Statistical Learning Methods Study Notes - 07 - Support Vector Machines (Part 2)

These notes cover the three kinds of support vector machines: the linearly separable support vector machine, the linear support vector machine, and the nonlinear support vector machine, along with kernel functions and a fast training algorithm, sequential minimal optimization (SMO).

Linear Support Vector Machines and Soft-Margin Maximization

Suppose the training set is \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\},x_i \in \mathcal{X} = R^n,y_i \in \mathcal{Y} = \{+1,-1\},i = 1,2,\cdots,N\). This training set is not linearly separable: it contains some outliers, and once these outliers are removed the remaining points are linearly separable. We therefore attach a slack variable \(\xi_i \geq 0\) to every sample point \((x_i,y_i)\), and the constraint becomes:

\[y_i(\omega \cdot x_i + b) \geq 1 - \xi_i \]

Each slack variable also incurs a cost, so the objective function becomes:

\[\frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i \]

\(C \gt 0\) is called the penalty parameter: the larger its value, the heavier the penalty on misclassification (conversely, a small \(C\) tolerates more margin violations).
The primal problem for learning a linear support vector machine is:

\[\begin{aligned} \mathop{min}\limits_{\omega,b,\xi}\ &\frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i \\ s.t.\ &y_i(\omega \cdot x_i + b) \geq 1 - \xi_i,i = 1,2,\cdots,N \\ &\xi_i \geq 0,i = 1,2,\cdots,N \end{aligned} \]
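
For intuition, this primal problem can be solved directly with a general-purpose solver on a tiny dataset. Below is a minimal sketch assuming scipy is available; the data, the value \(C = 1\), and all variable names are illustrative assumptions, not from the notes:

```python
import numpy as np
from scipy.optimize import minimize

# Toy 2-D data with one outlier (the last point); purely illustrative
X = np.array([[3.0, 3.0], [4.0, 3.0], [2.0, 2.5],
              [1.0, 1.0], [0.5, 2.0], [2.5, 1.0],
              [3.5, 3.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
N, d, C = len(y), X.shape[1], 1.0

# Decision variables packed as z = [w (d entries), b (1 entry), xi (N entries)]
def objective(z):
    w, xi = z[:d], z[d + 1:]
    return 0.5 * w @ w + C * xi.sum()   # (1/2)||w||^2 + C * sum_i xi_i

def margin_constraints(z):
    # y_i (w . x_i + b) - 1 + xi_i >= 0 for every i
    w, b, xi = z[:d], z[d], z[d + 1:]
    return y * (X @ w + b) - 1.0 + xi

bounds = [(None, None)] * (d + 1) + [(0.0, None)] * N   # enforces xi_i >= 0
res = minimize(objective, np.zeros(d + 1 + N),
               constraints=[{"type": "ineq", "fun": margin_constraints}],
               bounds=bounds, method="SLSQP")
w, b, xi = res.x[:d], res.x[d], res.x[d + 1:]
print("w =", w.round(3), " b =", round(b, 3), " xi =", xi.round(3))
```

At the solution, only the outlier should need a nonzero slack \(\xi_i\); every cleanly separated point satisfies the margin constraint with \(\xi_i = 0\).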

Introducing Lagrange multipliers, the Lagrangian of the primal optimization problem is:

\[L(\omega,b,\xi,\alpha,\mu) \equiv \frac{1}{2}||\omega||^2 + C\sum_{i = 1}^N\xi_i - \sum_{i = 1}^N\alpha_i(y_i(\omega \cdot x_i + b) - 1 + \xi_i) - \sum_{i = 1}^N\mu_i\xi_i \]

where \(\alpha_i \geq 0,\mu_i \geq 0\).
First, minimize \(L(\omega,b,\xi,\alpha,\mu)\) with respect to \(\omega,b,\xi\):

\[\nabla_\omega L(\omega,b,\xi,\alpha,\mu) = \omega - \sum_{i = 1}^N\alpha_iy_ix_i = 0 \\ \nabla_b L(\omega,b,\xi,\alpha,\mu) = -\sum_{i = 1}^N\alpha_iy_i = 0 \\ \nabla_{\xi_i} L(\omega,b,\xi,\alpha,\mu) = C - \alpha_i -\mu_i = 0 \]

which gives:

\[\omega = \sum_{i = 1}^N\alpha_iy_ix_i \\ \sum_{i = 1}^N\alpha_iy_i = 0 \\ C - \alpha_i -\mu_i = 0 \]

Substituting these back into the Lagrangian gives:

\[\mathop{min}\limits_{\omega,b,\xi} L(\omega,b,\xi,\alpha,\mu) = -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \]
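
In more detail (filling in the algebra the notes skip): \(C - \alpha_i - \mu_i = 0\) makes every \(\xi_i\) term cancel, \(\sum_{i = 1}^N\alpha_iy_i = 0\) removes the \(b\) term, and the rest reduces exactly as in the linearly separable case:

\[\begin{aligned} L(\omega,b,\xi,\alpha,\mu) &= \frac{1}{2}||\omega||^2 + \sum_{i = 1}^N(C - \alpha_i - \mu_i)\xi_i - \sum_{i = 1}^N\alpha_iy_i(\omega \cdot x_i + b) + \sum_{i = 1}^N\alpha_i \\ &= \frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\ &= -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \end{aligned} \]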

Maximizing \(\mathop{min}\limits_{\omega,b,\xi} L(\omega,b,\xi,\alpha,\mu)\) with respect to \(\alpha\) then yields the dual problem:

\[\begin{aligned} \mathop{max}\limits_\alpha\ & -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\ s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &C - \alpha_i - \mu_i = 0 \\ &\alpha_i \geq 0 \\ &\mu_i \geq 0, i = 1,2,\cdots,N \end{aligned} \]

Using the equality constraint to eliminate \(\mu_i\) leaves \(\alpha_i\) as the only variables, so the problem becomes:

\[\begin{aligned} \mathop{max}\limits_\alpha\ & -\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) + \sum_{i = 1}^N\alpha_i \\ s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &0 \leq \alpha_i \leq C \\ \end{aligned} \]

Turning the maximization into an equivalent minimization:

\[\begin{aligned} \mathop{min}\limits_\alpha\ & \frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\ s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ &0 \leq \alpha_i \leq C \\ \end{aligned} \]
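
As a concrete illustration, this dual problem can also be handed to a general-purpose solver. A minimal sketch on the same illustrative toy data as the primal sketch above (again, data and names are assumptions, not from the notes):

```python
import numpy as np
from scipy.optimize import minimize

# Same illustrative toy data as in the primal sketch above
X = np.array([[3.0, 3.0], [4.0, 3.0], [2.0, 2.5],
              [1.0, 1.0], [0.5, 2.0], [2.5, 1.0],
              [3.5, 3.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0])
N, C = len(y), 1.0

# Q_ij = y_i y_j (x_i . x_j)
Q = (y[:, None] * y[None, :]) * (X @ X.T)

def dual_objective(alpha):
    # (1/2) sum_ij alpha_i alpha_j y_i y_j (x_i . x_j) - sum_i alpha_i
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(dual_objective, np.zeros(N),
               jac=lambda a: Q @ a - np.ones(N),
               bounds=[(0.0, C)] * N,                               # 0 <= alpha_i <= C
               constraints=[{"type": "eq", "fun": lambda a: a @ y}],  # sum_i alpha_i y_i = 0
               method="SLSQP")
alpha = res.x
print("alpha* =", alpha.round(3))   # the outlier's alpha should sit at the bound C
```

SLSQP is fine at this toy scale; the SMO algorithm mentioned at the top of these notes is what solves this dual efficiently at realistic scale.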

The solution of the primal problem is obtained by solving the dual problem. Let \(\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T\) be a solution of the dual problem. If some component \(\alpha_j^*\) satisfies \(0 \lt \alpha_j^* \lt C\), then the primal solution \(\omega^*,b^*\) can be recovered from \(\alpha^*\):

\[\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i \\ b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j) \]

The proof is analogous to the one for the linearly separable support vector machine.

Learning algorithm for the linear support vector machine:

Input: training data set \(T = \{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\}\), where \(x_i \in \mathcal{X} = R^n,y_i \in \mathcal{Y} = \{+1,-1\},i = 1,2,\cdots,N\);
Output: the separating hyperplane and the classification decision function

  • Choose the penalty parameter \(C \gt 0\), then construct and solve the following constrained optimization problem to obtain the optimal solution \(\alpha^* = (\alpha^*_1,\alpha^*_2,\cdots,\alpha_N^*)^T\)

\[\begin{aligned} \mathop{min}\limits_{\alpha}\ &\frac{1}{2}\sum_{i = 1}^N\sum_{j = 1}^N\alpha_i\alpha_jy_iy_j(x_i \cdot x_j) - \sum_{i = 1}^N\alpha_i \\ s.t.\ &\sum_{i = 1}^N\alpha_iy_i = 0 \\ & 0 \leq \alpha_i \leq C,i = 1,2,\cdots,N \end{aligned} \]

  • Compute

\[\omega^* = \sum_{i = 1}^N\alpha_i^*y_ix_i \]

Choose a component \(\alpha_j^*\) of \(\alpha^*\) satisfying \(0 \lt \alpha_j^* \lt C\) and compute:

\[b^* = y_j - \sum_{i = 1}^N\alpha_i^*y_i(x_i \cdot x_j) \]

  • Obtain the separating hyperplane and the decision function (a numerical sketch follows this list):

\[\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^* = 0 \\ f(x) = sign\left(\sum_{i = 1}^N\alpha_i^*y_i(x \cdot x_i) + b^*\right) \]
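
As a cross-check on the algorithm above, here is a self-contained sketch using scikit-learn's SVC, whose libsvm backend solves this same dual with an SMO-type method; the toy data and \(C = 1\) are the same illustrative assumptions as before:

```python
import numpy as np
from sklearn.svm import SVC

# Same illustrative toy data as in the earlier sketches
X = np.array([[3.0, 3.0], [4.0, 3.0], [2.0, 2.5],
              [1.0, 1.0], [0.5, 2.0], [2.5, 1.0],
              [3.5, 3.5]])
y = np.array([1, 1, 1, -1, -1, -1, -1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# For a linear kernel sklearn exposes w* and b* directly
w, b = clf.coef_[0], clf.intercept_[0]
print("w* =", w.round(3), " b* =", round(b, 3))

# Decision function f(x) = sign(sum_i alpha_i* y_i (x . x_i) + b*) = sign(w* . x + b*)
x_new = np.array([[2.0, 2.0]])
print("manual :", np.sign(x_new @ w + b))
print("library:", clf.predict(x_new))
```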

For a training sample \((x_i,y_i)\) with \(\alpha^*_i \gt 0\), the point \(x_i\) is called a support vector. The possible cases are listed below (and checked numerically in the sketch after the list):

  • If \(\alpha_i^* \lt C\), then \(\xi_i = 0\) and the support vector lies exactly on the margin boundary
  • If \(\alpha_i^* = C\) and \(0 \lt \xi_i \lt 1\), the support vector is correctly classified and lies between the margin boundary and the separating hyperplane
  • If \(\alpha_i^* = C\) and \(\xi_i = 1\), the support vector lies on the separating hyperplane
  • If \(\alpha_i^* = C\) and \(\xi_i \gt 1\), the support vector lies on the misclassified side of the separating hyperplane
  • The distance from an instance \(x_i\) to its margin boundary is \(\frac{\xi_i}{||\omega||}\)
  • The distance between the two margin boundaries is \(\frac{2}{||\omega||}\)
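
These cases can be checked numerically: given \(f(x_i) = \omega^* \cdot x_i + b^*\), the slack at the optimum is \(\xi_i = \max(0,\ 1 - y_i f(x_i))\), so each support vector can be bucketed by comparing \(\alpha_i^*\) with \(C\) and \(\xi_i\) with \(0\) and \(1\). A sketch under the same toy-data assumptions as above:

```python
import numpy as np
from sklearn.svm import SVC

# Same illustrative toy data; the last point is an outlier on the wrong side
X = np.array([[3.0, 3.0], [4.0, 3.0], [2.0, 2.5],
              [1.0, 1.0], [0.5, 2.0], [2.5, 1.0],
              [3.5, 3.5]])
y = np.array([1, 1, 1, -1, -1, -1, -1])
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

alpha = np.abs(clf.dual_coef_[0])        # alpha_i* for each support vector
f = clf.decision_function(X)             # f(x_i) = w* . x_i + b*
xi = np.maximum(0.0, 1.0 - y * f)        # slack xi_i = max(0, 1 - y_i f(x_i))

eps = 1e-6
for a, i in zip(alpha, clf.support_):
    if a < C - eps:                      # 0 < alpha_i* < C  =>  xi_i = 0
        kind = "on the margin boundary"
    elif xi[i] < 1.0 - eps:              # alpha_i* = C, 0 < xi_i < 1
        kind = "inside the margin, correctly classified"
    elif xi[i] <= 1.0 + eps:             # alpha_i* = C, xi_i = 1
        kind = "on the separating hyperplane"
    else:                                # alpha_i* = C, xi_i > 1
        kind = "misclassified"
    print(f"x_{i}: alpha*={a:.3f}, xi={xi[i]:.3f} -> {kind}")
```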