This section covers an introduction to three kinds of support vector machines, namely the linearly separable support vector machine, the linear support vector machine, and the nonlinear support vector machine, together with kernel functions and a fast learning algorithm, the sequential minimal optimization (SMO) algorithm.
Linear Support Vector Machines and Soft-Margin Maximization
Suppose the training set is \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}\), where \(x_i \in \mathcal{X} = R^n,y_i \in \mathcal{Y} = \{+1,-1\},i = 1,2,\cdots,N\). This training set is not linearly separable: it contains some outliers, and once these outliers are removed, the remaining points are linearly separable. We therefore introduce a slack variable \(\xi_i \geq 0\) for each sample point \((x_i,y_i)\), so that the constraint becomes:
\[y_i(\omega \cdot x_i + b) \geq 1 - \xi_i
\]
At the same time, each slack variable incurs a cost, so the objective function becomes:
\[\frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i
\]
\(C \gt 0\) is called the penalty parameter; the larger it is, the more heavily misclassification is penalized.
The primal problem for learning the linear support vector machine is:
\[\begin{aligned}
\mathop{min}\limits_{\omega,b,\xi}\ &\frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i \\
s.t.\ &y_i(\omega \cdot x_i + b) \geq 1 - \xi_i,i = 1,2,\cdots,N \\
&\xi_i \geq 0,i = 1,2,\cdots,N
\end{aligned}
\]
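As a quick sanity check on this formulation, here is a minimal Python sketch (NumPy assumed; the toy data and the candidate hyperplane are made-up illustrations, not part of the original text) that evaluates the objective and verifies the constraints for a given \((\omega, b, \xi)\):

```python
import numpy as np

def primal_objective(w, xi, C):
    # (1/2)||w||^2 + C * sum_i xi_i
    return 0.5 * np.dot(w, w) + C * np.sum(xi)

def primal_feasible(w, b, xi, X, y):
    # y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0 for every i
    return np.all(y * (X @ w + b) >= 1 - xi) and np.all(xi >= 0)

# Toy data: two points per class in R^2 (hypothetical values)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w, b = np.array([1.0, 1.0]), -3.0            # a candidate hyperplane
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))  # smallest feasible slacks

print(primal_objective(w, xi, C=1.0), primal_feasible(w, b, xi, X, y))
```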
Introducing Lagrange multipliers, the Lagrangian of the primal optimization problem is:
\[L(\omega,b,\xi,\alpha,\mu) \equiv \frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i - \sum_{i = 1}^N\alpha_i(y_i(\omega \cdot x_i + b) - 1 + \xi_i) - \sum_{i = 1}^N\mu_i\xi_i
\]
where \(\alpha_i \geq 0,\mu_i \geq 0\).
First, minimize \(L(\omega,b,\xi,\alpha,\mu)\) with respect to \(\omega,b,\xi\):
\[\nabla_\omega L(\omega,b,\xi,\alpha,\mu) = \omega - \sum_{i = 1}^N\alpha_iy_ix_i = 0 \\
\nabla_b L(\omega,b,\xi,\alpha,\mu) = -\sum_{i = 1}^N\alpha_iy_i = 0 \\
\nabla_{\xi_i} L(\omega,b,\xi,\alpha,\mu) = C - \alpha_i -\mu_i = 0
\]
which gives:
\[\omega = \sum_{i = 1}^N\alpha_iy_ix_i \\
\sum_{i = 1}^N\alpha_iy_i = 0 \\
C - \alpha_i -\mu_i = 0
\]
Substituting the above back into the Lagrangian yields:
\[\mathop{min}\limits_{\omega,b,\xi} L(\omega,b,\xi,\alpha,\mu) = -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i
\]
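Spelling out the substitution (a step the text skips): by the three conditions above, the \(\xi_i\) terms and the \(b\) term vanish, and \(\sum_{i}\alpha_iy_i(\omega \cdot x_i) = \omega \cdot \omega = ||\omega||^2\), so
\[\begin{aligned}
L &= \frac{1}{2}||\omega||^2 + \sum_{i = 1}^N(C - \alpha_i - \mu_i)\xi_i - b\sum_{i = 1}^N\alpha_iy_i - \sum_{i = 1}^N\alpha_iy_i(\omega \cdot x_i) + \sum_{i = 1}^N\alpha_i \\
&= \frac{1}{2}||\omega||^2 - ||\omega||^2 + \sum_{i = 1}^N\alpha_i = -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i
\end{aligned}
\]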
Then maximizing \(\mathop{min}\limits_{\omega,b,\xi} L(\omega,b,\xi,\alpha,\mu)\) over \(\alpha\) gives the dual problem:
\[\begin{aligned}
\mathop{max}\limits_\alpha\ & -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\
s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\
&C - \alpha_i - \mu_i = 0 \\
&\alpha_i \geq 0 \\
&\mu_i \geq 0, i = 1,2,\cdots,N
\end{aligned}
\]
Using the equality constraint to eliminate \(\mu_i\) leaves only the variables \(\alpha_i\), and the problem becomes:
\[\begin{aligned}
\mathop{max}\limits_\alpha\ & -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\
s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\
&0 \leq \alpha_i \leq C \\
\end{aligned}
\]
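The box constraint \(0 \leq \alpha_i \leq C\) is exactly what the elimination produces: solving the equality constraint for \(\mu_i\) gives
\[\mu_i = C - \alpha_i \geq 0 \iff \alpha_i \leq C
\]
which, together with \(\alpha_i \geq 0\), collapses the three constraints on \(\alpha_i\) and \(\mu_i\) into the single interval constraint.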
Converting the maximization above into a minimization:
\[\begin{aligned}
\mathop{min}\limits_\alpha\ & \frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\
s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\
&0 \leq \alpha_i \leq C \\
\end{aligned}
\]
The solution of the primal problem is obtained by solving the dual problem. Let \(\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T\) be a solution of the dual problem. If there exists an index \(j\) such that \(0 \lt \alpha_j^* \lt C\), then the primal solution \(\omega^*,b^*\) can be recovered from \(\alpha^*\):
\[\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i \\
b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j)
\]
The proof is analogous to that for the linearly separable support vector machine.
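As a concrete illustration, here is a minimal, self-contained Python sketch (assuming NumPy and SciPy are available; the toy data, the SLSQP solver choice, and all variable names are illustrative assumptions, not part of the original text) that solves the dual problem numerically and recovers \(\omega^*\) and \(b^*\) by the formulas above:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-D data (hypothetical): two positive and two negative points
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [2.5, 2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
C = 10.0
N = len(y)

# Q_ij = y_i y_j (x_i . x_j), the matrix in the dual objective
Yx = y[:, None] * X
Q = Yx @ Yx.T

def dual_objective(alpha):
    # (1/2) sum_ij alpha_i alpha_j y_i y_j (x_i . x_j) - sum_i alpha_i
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(
    dual_objective,
    np.zeros(N),                                         # feasible start
    method="SLSQP",
    bounds=[(0.0, C)] * N,                               # 0 <= alpha_i <= C
    constraints={"type": "eq", "fun": lambda a: a @ y},  # sum_i alpha_i y_i = 0
)
alpha = res.x

w = (alpha * y) @ X                          # w* = sum_i alpha_i* y_i x_i
# Pick some j with 0 < alpha_j* < C (assumed to exist for this data)
j = int(np.argmax((alpha > 1e-6) & (alpha < C - 1e-6)))
b = y[j] - (alpha * y) @ (X @ X[j])          # b* = y_j - sum_i alpha_i* y_i (x_i . x_j)

print("alpha* =", np.round(alpha, 4), " w* =", w, " b* =", b)
```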
Linear support vector machine learning algorithm:
Input: training data set \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}\), where \(x_i \in \mathcal{X} = R^n,y_i \in \mathcal{Y} = \{+1,-1\},i = 1,2,\cdots,N\)
Output: the separating hyperplane and the classification decision function
- Choose the penalty parameter \(C \gt 0\), then construct and solve the constrained optimization problem below, obtaining the optimal solution \(\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T\):
\[\begin{aligned}
\mathop{min}\limits_{\alpha}\ &\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\
s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\
& 0 \leq \alpha_i \leq C,i = 1,2,\cdots,N
\end{aligned}
\]
- Compute:
\[\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i
\]
- Choose a component \(\alpha_j^*\) of \(\alpha^*\) such that \(0 \lt \alpha_j^* \lt C\) and compute:
\[b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j)
\]
- Obtain the separating hyperplane and the classification decision function:
\[\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^* = 0 \\
f(x) = sign\left(\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^*\right)
\]
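The decision function can also be evaluated directly in its dual form; a small hedged sketch (NumPy assumed; the function name and the example values are hypothetical):

```python
import numpy as np

def decision_function(x, alpha, b, X, y):
    # f(x) = sign(sum_i alpha_i* y_i (x . x_i) + b*)
    return np.sign((alpha * y) @ (X @ x) + b)

# Hypothetical solved values, e.g. taken from the dual sketch above
X = np.array([[3.0, 3.0], [1.0, 1.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.25, 0.25])
b = -2.0
print(decision_function(np.array([4.0, 4.0]), alpha, b, X, y))  # 1.0
```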
The instances \(x_i\) of the training samples \((x_i,y_i)\) with \(\alpha^*_i \gt 0\) are called support vectors:
- If \(\alpha_i^* \lt C\), then \(\xi_i = 0\) and the support vector lies exactly on the margin boundary
- If \(\alpha_i^* = C\) and \(0 \lt \xi_i \lt 1\), the support vector is classified correctly and lies between the margin boundary and the separating hyperplane
- If \(\alpha_i^* = C\) and \(\xi_i = 1\), the support vector lies on the separating hyperplane
- If \(\alpha_i^* = C\) and \(\xi_i \gt 1\), the support vector lies on the misclassified side of the separating hyperplane
- The distance from an instance \(x_i\) to the margin boundary is \(\frac{\xi_i}{||\omega||}\)
- The distance between the two margin boundaries is \(\frac{2}{||\omega||}\)
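These four cases can be checked programmatically once \(\alpha^*\), \(\omega^*\) and \(b^*\) are known, since the optimal slacks satisfy \(\xi_i = \max(0,\ 1 - y_i(\omega \cdot x_i + b))\). A minimal sketch (NumPy assumed; the function name and the tolerance are illustrative):

```python
import numpy as np

def categorize_support_vectors(alpha, w, b, X, y, C, tol=1e-6):
    # Optimal slacks: xi_i = max(0, 1 - y_i (w . x_i + b))
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    for i in np.where(alpha > tol)[0]:  # support vectors: alpha_i* > 0
        if alpha[i] < C - tol:          # 0 < alpha_i* < C  =>  xi_i = 0
            where = "on the margin boundary"
        elif xi[i] < 1.0 - tol:         # alpha_i* = C, 0 < xi_i < 1
            where = "between the margin boundary and the hyperplane"
        elif xi[i] <= 1.0 + tol:        # alpha_i* = C, xi_i = 1
            where = "on the separating hyperplane"
        else:                           # alpha_i* = C, xi_i > 1
            where = "on the misclassified side"
        print(f"x_{i}: alpha = {alpha[i]:.3f}, xi = {xi[i]:.3f}, {where}")
```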