# Machine Learning Open Course Notes (7): Support Vector Machines

## Support Vector Machines (SVM)

The logistic regression optimization objective is $$\min\limits_\theta \frac{1}{m}\left[\sum\limits_{i=1}^{m}y^{(i)}\left(-\log h_\theta(x^{(i)})\right) + (1-y^{(i)})\left(-\log(1-h_\theta(x^{(i)}))\right)\right] + \frac{\lambda}{2m}\sum\limits_{j=1}^{n}\theta_{j}^2$$ Dropping the constant factor $\frac{1}{m}$ and rewriting the $A+\lambda B$ form as $CA+B$ (with $C$ playing the role of $\frac{1}{\lambda}$) gives the SVM optimization objective $$\min\limits_\theta C\left[\sum\limits_{i=1}^{m}y^{(i)}\text{cost}_1(\theta^Tx^{(i)}) + (1-y^{(i)})\text{cost}_0(\theta^Tx^{(i)})\right] + \frac{1}{2}\sum\limits_{j=1}^{n}\theta_{j}^2$$
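Here $\text{cost}_1$ and $\text{cost}_0$ are the piecewise-linear hinge costs from the lectures; a common concrete choice is $\text{cost}_1(z)=\max(0,1-z)$ and $\text{cost}_0(z)=\max(0,1+z)$. A minimal numpy sketch of the objective (function names are illustrative; $\theta_0$ is left out of the regularizer, matching the sum over $j=1,\dots,n$):

```python
import numpy as np

def cost1(z):
    # hinge cost for y = 1: zero once z = theta^T x >= 1
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # hinge cost for y = 0: zero once z = theta^T x <= -1
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    # C * (sum of per-example hinge costs) + (1/2) * sum_{j>=1} theta_j^2
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)  # theta_0 not regularized
    return data_term + reg_term
```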

## Kernel

### SVM with kernels

- Large $C$: low bias, high variance
- Small $C$: high bias, low variance

- Large $\sigma^2$: high bias, low variance (the features $f_i$ vary more smoothly)
- Small $\sigma^2$: low bias, high variance (the features $f_i$ vary less smoothly)
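The trade-off above can be sketched with scikit-learn's `SVC` (an illustrative choice, not part of the notes); sklearn's RBF kernel is $\exp(-\gamma\lVert x-l\rVert^2)$, so `gamma` corresponds to $\frac{1}{2\sigma^2}$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0).astype(int)  # circular boundary

sigma2 = 0.5
clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / (2.0 * sigma2))
clf.fit(X, y)
# larger C or smaller sigma^2 fits the training set more tightly
train_acc = clf.score(X, y)
```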

## SVM in Practice

Linear kernel: specify no kernel, i.e. "No kernel", also called the "linear kernel" (used when the number of features $n$ is large and the number of examples $m$ is small).

Gaussian kernel: $f_i=\exp\left(-\frac{\lVert x-l^{(i)}\rVert^2}{2\sigma^2}\right)$, where $l^{(i)}=x^{(i)}$; the parameter $\sigma^2$ must be specified (used when $n$ is small and $m$ is large). Note that feature scaling is needed before using the Gaussian kernel.
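The similarity feature is a few lines of numpy (function name illustrative); without feature scaling, dimensions with large ranges dominate the squared distance:

```python
import numpy as np

def gaussian_feature(x, landmark, sigma2):
    # f = exp(-||x - l||^2 / (2 * sigma^2))
    diff = np.asarray(x, dtype=float) - np.asarray(landmark, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma2))
```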

- Polynomial kernel: $k(x, l) = (\alpha x^Tl+c)^{d}$, with tunable parameters slope $\alpha$, constant $c$, and polynomial degree $d$
- String kernel: operates on strings directly, with no need to convert them to numeric features; see Wikipedia: string kernel for the formula
- Chi-square kernel: $k(x, y)=1-\sum\limits_{k=1}^{n}\frac{(x_k-y_k)^2}{\frac{1}{2}(x_k+y_k)}$
- Histogram intersection kernel: $k(x, y) = \sum\limits_{k=1}^{n}\min(x_k, y_k)$
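The chi-square and histogram intersection kernels are easy to sketch in numpy (both assume non-negative feature vectors such as histograms; function names are illustrative). The resulting Gram matrix can be fed to an SVM library that accepts a precomputed kernel:

```python
import numpy as np

def histogram_intersection_gram(X, Y):
    # K[i, j] = sum_k min(X[i, k], Y[j, k])
    return np.minimum(X[:, None, :], Y[None, :, :]).sum(axis=2)

def chi_square_gram(X, Y, eps=1e-12):
    # K[i, j] = 1 - sum_k (X[i,k] - Y[j,k])^2 / ((X[i,k] + Y[j,k]) / 2)
    # eps guards the 0/0 case when both entries are zero (term treated as 0)
    num = (X[:, None, :] - Y[None, :, :]) ** 2
    den = 0.5 * (X[:, None, :] + Y[None, :, :]) + eps
    return 1.0 - (num / den).sum(axis=2)
```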

Multi-class classification: use the one-vs-all approach; for $k$ classes, train $k$ SVMs.
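A one-vs-all sketch with scikit-learn (an illustrative choice): `OneVsRestClassifier` trains one binary `LinearSVC` per class, as described above.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# three well-separated Gaussian blobs, one per class
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(50, 2)) for c in centers])
y = np.repeat([0, 1, 2], 50)  # k = 3 classes

ova = OneVsRestClassifier(LinearSVC()).fit(X, y)
n_svms = len(ova.estimators_)  # one binary SVM per class
```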

## References

[1] Andrew Ng. Machine Learning, Coursera, Week 7.

[2] Kernel Functions for Machine Learning Applications. http://crsouza.com/2010/03/kernel-functions-for-machine-learning-applications/#chisquare

[3] Wikipedia: string kernel. https://en.wikipedia.org/wiki/String_kernel

[4] Hofmann T, Schölkopf B, Smola A J. Kernel methods in machine learning[J]. The annals of statistics, 2008: 1171-1220.

posted @ 2016-01-13 22:33 by python27