# Introductory Data Mining Tutorial Series (Part 8.5): An Introduction to SVM and a From-Scratch Derivation of Its Formulas

## Linear Classification

$$h\left(\boldsymbol{x}_{i}\right)=\begin{cases} 1 & \text{if } y_{i}=1 \\ -1 & \text{if } y_{i}=-1 \end{cases} \tag{1}$$

$$f(\boldsymbol{x})=\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}+b \tag{2}$$

$$y_i^2 = 1 \tag{3}$$
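As a small illustration of equation (2), here is a minimal sketch of the linear decision function and the sign-based prediction. The function names and the parameter values are hypothetical, and mapping $f(\boldsymbol{x})=0$ to $+1$ is an arbitrary choice made for this example.

```python
def decision_function(w, x, b):
    """Return f(x) = w^T x + b for plain Python sequences w and x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, x, b):
    """Classify x as +1 or -1 by the sign of f(x); f(x) == 0 is mapped to +1 here."""
    return 1 if decision_function(w, x, b) >= 0 else -1


w, b = [1.0, -1.0], 0.5            # hypothetical parameters
print(predict(w, [2.0, 0.0], b))   # f = 2.5, so the label is 1
print(predict(w, [0.0, 2.0], b))   # f = -1.5, so the label is -1
```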

## Margin

$$\begin{aligned} & \gamma = \frac{f(x_0)}{\|w\|} \\ & \text{so the unsigned distance is:} \\ & \tilde{\gamma} = y_0\gamma \end{aligned}$$
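The two distances can be computed directly. The sketch below assumes plain Python lists for $w$ and $x_0$; the helper names and the sample hyperplane are made up for illustration.

```python
import math


def signed_distance(w, b, x):
    """gamma = f(x) / ||w||, positive on one side of the hyperplane, negative on the other."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return f / math.sqrt(sum(wi * wi for wi in w))


def unsigned_distance(w, b, x, y):
    """gamma_tilde = y * gamma, non-negative when x is classified correctly."""
    return y * signed_distance(w, b, x)


w, b = [3.0, 4.0], -5.0                                # hypothetical hyperplane 3x + 4y - 5 = 0
print(signed_distance(w, b, [0.0, 0.0]))               # -1.0
print(unsigned_distance(w, b, [0.0, 0.0], -1))         # 1.0
```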

## Maximum-Margin Classifier

$$\begin{aligned} & \max \quad \frac{2}{\|\boldsymbol{w}\|} \\ & \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_i + b) \geqslant 1, \quad i=1,2,\ldots,m \end{aligned} \tag{4}$$

$$\begin{aligned} & \min \quad \frac{1}{2}\|\boldsymbol{w}\|^2 \\ & \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_i + b) \geqslant 1, \quad i=1,2,\ldots,m \end{aligned} \tag{5}$$

## The Method of Lagrange Multipliers

### Deriving the Method of Lagrange Multipliers

$$\frac{{f_x}'(x_0,y_0)}{{g_x}'(x_0,y_0)}=\frac{{f_y}'(x_0,y_0)}{{g_y}'(x_0,y_0)}=-\lambda_0 \quad (\lambda_0 \text{ may be } 0)$$

$$\left\{\begin{matrix} {f_x}'(x_0,y_0)+\lambda_0{g_x}'(x_0,y_0)=0\\ \\ {f_y}'(x_0,y_0)+\lambda_0{g_y}'(x_0,y_0)=0\\ \\ g(x_0,y_0)=0 \end{matrix}\right. \tag{6}$$

$$\left\{\begin{matrix} \frac{\partial L(x,y, \lambda)}{\partial x}={f_x}'(x,y)+\lambda{g_x}'(x,y)=0\\ \\ \frac{\partial L(x,y, \lambda)}{\partial y}={f_y}'(x,y)+\lambda{g_y}'(x,y)=0\\ \\ \frac{\partial L(x,y, \lambda)}{\partial \lambda}=g(x,y)=0 \end{matrix}\right. \tag{7}$$
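To make system (7) concrete, consider a hypothetical example that is not from the original post: minimize $f(x,y)=x^2+y^2$ subject to $g(x,y)=x+y-1=0$. With $L=x^2+y^2+\lambda(x+y-1)$, system (7) gives $2x+\lambda=0$, $2y+\lambda=0$, $x+y=1$, hence $x=y=\tfrac12$ and $\lambda=-1$. The sketch below simply checks that candidate.

```python
def grad_L(x, y, lam):
    """Partial derivatives of L = x^2 + y^2 + lam * (x + y - 1) w.r.t. x, y and lam."""
    return (2 * x + lam, 2 * y + lam, x + y - 1)


# Candidate stationary point obtained by solving system (7) by hand.
x0, y0, lam0 = 0.5, 0.5, -1.0
print(grad_L(x0, y0, lam0))  # (0.0, 0.0, 0.0): all three equations of (7) hold
```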

### KKT Conditions (Karush-Kuhn-Tucker Conditions)

$$\begin{aligned} \min _{x} \quad & f(x) \\ \text{s.t.} \quad & h_{i}(x)=0 \quad(i=1, \ldots, m) \\ & g_{j}(x) \leqslant 0 \quad(j=1, \ldots, n) \end{aligned}$$

$$L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\sum_{i=1}^{m} \lambda_{i} h_{i}(\boldsymbol{x})+\sum_{j=1}^{n} \mu_{j} g_{j}(\boldsymbol{x}) \tag{8}$$

$$\left\{\begin{array}{l} g_{j}(\boldsymbol{x}) \leqslant 0 \\ \mu_{j} \geqslant 0 \\ \mu_{j} g_{j}(\boldsymbol{x})=0 \\ h_{i}(\boldsymbol{x})=0 \end{array}\right.$$

• Case 1: the minimizer $x^*$ lies inside the region $g_i(x)<0$
• Case 2: the minimizer $x^*$ lies on the boundary $g_i(x)=0$

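$$\begin{array}{l} h(x)=0\\ g(x)=0\\ \mu \geq 0 \end{array} \tag{9}$$

The conditions $h(x)=0$ and $g(x)=0$ are easy to understand, but why do we also restrict $\mu$, and why is the restriction $\mu \geq 0$? Consider the gradient directions of $f(x)$ and $g(x)$ at the point $x^*$. (These two gradients must be parallel at $x^*$; recall that the gradient points in the direction in which the function increases fastest.)

• For $f(x)$, the contour values grow from the center outward, so its gradient points into the feasible region. This is the red arrow in the figure.
• For $g(x)$, the gradient must point toward the side where $g(x)>0$, that is, away from the feasible region. This is the yellow arrow in the figure.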

$$\frac{{f_x}'(x_0,y_0)}{{g_x}'(x_0,y_0)}=\frac{{f_y}'(x_0,y_0)}{{g_y}'(x_0,y_0)}=-\lambda_0 \quad (\lambda_0 \text{ may be } 0)$$

$$\begin{array}{l} h(x)=0\\ g(x) \leq 0\\ \mu =0 \end{array}$$

• Case 1: $\mu = 0,\ g(x) \leq 0$
• Case 2: $\mu \geq 0,\ g(x)=0$

$$\left\{\begin{array}{l} g_{j}(\boldsymbol{x}) \leqslant 0 \quad(\text{primal feasibility})\\ \mu_{j} \geqslant 0 \quad(\text{dual feasibility})\\ \mu_{j} g_{j}(\boldsymbol{x})=0 \quad(\text{complementary slackness})\\ h_{i}(\boldsymbol{x})=0 \end{array}\right.$$
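The two cases can be checked on a one-dimensional toy problem, which is an assumption of this sketch and not part of the original post: minimize $(x-2)^2$ subject to $g(x)=x-c\leq 0$. The checker below tests primal feasibility, dual feasibility and complementary slackness as listed above, plus the stationarity condition $\partial L/\partial x=0$ that the derivation uses implicitly.

```python
def kkt_holds(x, mu, c, tol=1e-9):
    """Check the KKT system for: min (x - 2)^2  s.t.  g(x) = x - c <= 0."""
    g = x - c
    stationarity = abs(2 * (x - 2) + mu) <= tol   # d/dx [(x-2)^2 + mu * g(x)] = 0
    primal = g <= tol                             # g(x) <= 0
    dual = mu >= -tol                             # mu >= 0
    slackness = abs(mu * g) <= tol                # mu * g(x) = 0
    return stationarity and primal and dual and slackness


print(kkt_holds(x=2.0, mu=0.0, c=3.0))  # True: case 1, constraint inactive, mu = 0
print(kkt_holds(x=1.0, mu=2.0, c=1.0))  # True: case 2, constraint active, mu > 0
print(kkt_holds(x=1.0, mu=0.0, c=1.0))  # False: stationarity fails with mu = 0
```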

### The Dual Problem of the Lagrangian

$$\begin{aligned} \min _{x} \quad & f(x) \\ \text{s.t.} \quad & h_{i}(x)=0 \quad(i=1, \ldots, m) \\ & g_{j}(x) \leqslant 0 \quad(j=1, \ldots, n) \end{aligned} \tag{10}$$

$$\begin{aligned} & L(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})=f(\boldsymbol{x})+\sum_{i=1}^{m} \lambda_{i} h_{i}(\boldsymbol{x})+\sum_{j=1}^{n} \mu_{j} g_{j}(\boldsymbol{x}) \\ & \text{s.t.} \quad \mu_j \geq 0 \end{aligned}$$

$$\begin{aligned} \min _{\boldsymbol{x}} \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \quad & \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \\ \text{s.t.} \quad & \mu_{j} \geq 0, \quad j=1,2, \ldots, n \end{aligned}$$

$$\begin{aligned} & \min _{\boldsymbol{x}} \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}}\mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \\ =& \min _{\boldsymbol{x}}\left(f(\boldsymbol{x})+\max _{\boldsymbol{\lambda}, \boldsymbol{\mu}}\left(\sum_{i=1}^{m} \lambda_{i} h_{i}(\boldsymbol{x})+\sum_{j=1}^{n} \mu_{j} g_{j}(\boldsymbol{x})\right)\right) \\ =& \min _{\boldsymbol{x}}\left(f(\boldsymbol{x})+\begin{cases} 0 & \text{if } \boldsymbol{x} \text{ satisfies the constraints}\\ \infty & \text{otherwise} \end{cases}\right)\\ =& \min _{\boldsymbol{x}} f(\boldsymbol{x}) \quad (\text{over feasible } \boldsymbol{x}) \end{aligned}$$

$$\begin{aligned} \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \min _{\boldsymbol{x}} \quad & \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \\ \text{s.t.} \quad & \mu_{j} \geq 0, \quad j=1,2, \ldots, n \end{aligned}$$

$$p^* = \min _{\boldsymbol{x}} \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \ge \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \min _{\boldsymbol{x}} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = g^*$$

$$\begin{aligned} & \max _{x} \min _{y} f(x, y) \leq \min _{y} \max _{x} f(x, y)\\ & \text{let } g(x)=\min _{y} f(x, y)\\ & \text{then } g(x) \leq f(x, y), \ \forall y\\ & \therefore \max _{x} g(x) \leq \max _{x} f(x, y), \ \forall y\\ & \therefore \max _{x} g(x) \leq \min _{y} \max _{x} f(x, y) \end{aligned}$$
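The max-min inequality is easy to check numerically on a finite table. The sketch below treats a small, arbitrarily chosen matrix `F[x][y]` as the function $f(x,y)$; the values are made up purely for illustration.

```python
# F[x][y] plays the role of f(x, y); rows index x, columns index y.
F = [[3, 1, 4],
     [1, 5, 9],
     [2, 6, 5]]

# max over x of (min over y): best guaranteed value when y is chosen adversarially.
max_min = max(min(row) for row in F)

# min over y of (max over x): columns are scanned, taking the largest entry in each.
min_max = min(max(F[x][y] for x in range(len(F))) for y in range(len(F[0])))

print(max_min, min_max)  # 2 3
assert max_min <= min_max
```

Here the inequality is strict, which corresponds to a duality gap; strong duality is the special situation in which both sides coincide.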

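### Slater's Condition

Slater's theorem states that strong duality holds when Slater's condition is satisfied and the primal problem is a convex optimization problem. A few terms deserve attention here:

• Convex optimization problem

An optimization problem is called a convex optimization problem if it has the following form:

$$\begin{array}{lll} \min & f(x) & \\ \text{s.t.} & g_i(x)\le 0, & i=1,\ldots,m \\ & h_i(x)=0, & i=1,\ldots,p \end{array}$$

where $f(x)$ is a convex function, the inequality constraints $g_i(x)$ are also convex functions, and the equality constraints $h_i(x)$ are affine functions.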

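1. A convex function is a real-valued function $f$ defined on a convex subset $C$ (an interval) of some vector space such that, for any two points $x_1,x_2$ in its domain, $f\left(\frac{x_{1}+x_{2}}{2}\right) \leq \frac{f\left(x_{1}\right)+f\left(x_{2}\right)}{2}$. (For continuous $f$, this midpoint form is equivalent to the general definition $f(t x_1+(1-t) x_2) \leq t f(x_1)+(1-t) f(x_2)$ for all $t \in [0,1]$.)

2. Affine function

An affine function is a polynomial of degree at most 1, that is, a function of the form $h(x)=\boldsymbol{a}^{\mathrm{T}} x+c$.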

• Strong duality

Weak duality is $p^* = \min _{\boldsymbol{x}} \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) \ge \max _{\boldsymbol{\lambda}, \boldsymbol{\mu}} \min _{\boldsymbol{x}} \mathcal{L}(\boldsymbol{x}, \boldsymbol{\lambda}, \boldsymbol{\mu}) = g^*$, that is, $p^* \ge g^*$; strong duality is $p^* = g^*$.

1. The primal problem is a convex optimization problem.
2. There exists an $x$ for which $g(x) \le 0$ holds strictly. (In other words, there exists an $x$ such that $g(x) \lt 0$.)

## The Maximum-Margin Classifier and Lagrange Multipliers

$$\begin{aligned} & \min \quad \frac{1}{2}\|\boldsymbol{w}\|^2 \\ & \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_i + b) \geqslant 1, \quad i=1,2,\ldots,m \end{aligned} \tag{11}$$

$$\begin{aligned} & \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\alpha}):=\frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right) \\ & \text{s.t.} \quad \alpha_i \geq 0,\quad i=1,2,\ldots,m \end{aligned}$$

$$\begin{aligned} &\max _{\boldsymbol{\alpha}} \min _{\boldsymbol{w}, b}\left(\frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right)\right)\\ &\text{s.t.} \quad \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned}\tag{12}$$

$$\begin{aligned} &\min _{\boldsymbol{w}, b}\left( \frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}+\sum_{i=1}^{m} \alpha_{i}\left(1-y_{i}\left(\boldsymbol{w}^{\top} \boldsymbol{x}_{i}+b\right)\right)\right)\\ &\text{s.t.} \quad \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned}$$

$$\begin{aligned} \frac{\partial \mathcal{L}}{\partial \boldsymbol{w}}=\mathbf{0} &\Rightarrow \boldsymbol{w}=\sum_{i=1}^{m} \alpha_{i} y_{i} \boldsymbol{x}_{i} \\ \frac{\partial \mathcal{L}}{\partial b}=0 &\Rightarrow \sum_{i=1}^{m} \alpha_{i} y_{i}=0 \end{aligned} \tag{13}$$

$$\begin{aligned} \min _{\boldsymbol{\alpha}} \quad & \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \boldsymbol{x}_{i}^{\top} \boldsymbol{x}_{j}-\sum_{i=1}^{m} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{m} \alpha_{i} y_{i}=0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned} \tag{14}$$

$$\begin{aligned} \boldsymbol{w} &=\sum_{i=1}^{m} \alpha_{i} y_{i} \boldsymbol{x}_{i} \\ &=\sum_{i:\, \alpha_{i}=0} 0 \cdot y_{i} \boldsymbol{x}_{i}+\sum_{i:\, \alpha_{i}>0} \alpha_{i} y_{i} \boldsymbol{x}_{i} \\ &=\sum_{i \in SV} \alpha_{i} y_{i} \boldsymbol{x}_{i}\quad(SV \text{ denotes the set of all support vectors}) \end{aligned}$$

$$\begin{aligned} b&=y_k-\boldsymbol{w}^{\top} \boldsymbol{x}_k \quad (\text{for any } k \in SV) \\ &=y_{k}-\Big(\sum_{i \in SV} \alpha_{i} y_{i} \boldsymbol{x}_{i}\Big)^{\top}\boldsymbol{x}_k \\ &=y_k-\sum_{i \in SV} \alpha_{i} y_{i} \boldsymbol{x}_{i}^{\top}\boldsymbol{x}_k \end{aligned}$$
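The recovery of $\boldsymbol{w}$ and $b$ from the dual variables can be sketched on a two-point toy dataset, which is an assumption of this example: $\boldsymbol{x}_1=(1,0)$ with $y_1=+1$ and $\boldsymbol{x}_2=(-1,0)$ with $y_2=-1$. The constraint $\sum_i \alpha_i y_i=0$ forces $\alpha_1=\alpha_2=\alpha$, the dual objective (14) becomes $2\alpha^2-2\alpha$, and its minimizer is $\alpha=\tfrac12$. The helper name `recover_w_b` is hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def recover_w_b(X, y, alpha, tol=1e-12):
    """Rebuild w from the support vectors (alpha_i > 0), then b from one of them."""
    sv = [i for i, a in enumerate(alpha) if a > tol]
    w = [sum(alpha[i] * y[i] * X[i][d] for i in sv) for d in range(len(X[0]))]
    k = sv[0]                       # any support vector works
    b = y[k] - dot(w, X[k])
    return w, b


X = [[1.0, 0.0], [-1.0, 0.0]]       # toy data (assumed for this sketch)
y = [1, -1]
alpha = [0.5, 0.5]                  # dual optimum derived by hand in the lead-in
w, b = recover_w_b(X, y, alpha)
print(w, b)                         # [1.0, 0.0] 0.0
```

Both points then satisfy $y_i(\boldsymbol{w}^{\top}\boldsymbol{x}_i+b)=1$, so both are support vectors, as expected for the two closest points of opposite classes.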

## The Kernel Trick

### Kernel Functions

Let $\phi(x)$ denote the feature vector obtained by mapping $x$. The separating hyperplane can then be written as $f(x)=\boldsymbol{w}^{\mathrm{T}}\phi(x)+b$, and equation $(11)$ becomes:

$$\begin{array}{l} \min _{\boldsymbol{w},b} \ \frac{1}{2}\|\boldsymbol{w}\|^{2} \\ \text{s.t.} \ \ y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \phi\left(\boldsymbol{x}_{i}\right)+b\right) \geqslant 1, \quad i=1,2, \ldots, m \end{array}$$

$$\begin{aligned} \min _{\boldsymbol{\alpha}} \quad & \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \phi(\boldsymbol{x}_{i})^{\top} \phi(\boldsymbol{x}_{j})-\sum_{i=1}^{m} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{m} \alpha_{i} y_{i}=0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned} \tag{15}$$

$$\kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)=\phi\left(\boldsymbol{x}_{i}\right)^{\top} \phi\left(\boldsymbol{x}_{j}\right)$$

$$\begin{aligned} \min _{\boldsymbol{\alpha}} \quad & \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \kappa\left(\boldsymbol{x}_{i}, \boldsymbol{x}_{j}\right)-\sum_{i=1}^{m} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{m} \alpha_{i} y_{i}=0 \\ & \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned} \tag{15}$$
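A quick way to see why $\kappa$ can replace the inner product is to compare both sides for a kernel whose feature map is known. The sketch below uses the degree-2 homogeneous polynomial kernel in two dimensions, chosen here only as an example; its explicit map is $\phi(x)=(x_1^2,\sqrt{2}x_1x_2,x_2^2)$.

```python
import math


def phi(x):
    """Explicit feature map of the kernel (x . z)^2 for 2-D inputs."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]


def poly_kernel(x, z):
    """kappa(x, z) = (x . z)^2, computed without ever building phi."""
    return (x[0] * z[0] + x[1] * z[1]) ** 2


x, z = [1.0, 2.0], [3.0, 4.0]
lhs = poly_kernel(x, z)                              # (3 + 8)^2
rhs = sum(a * b for a, b in zip(phi(x), phi(z)))     # phi(x)^T phi(z)
print(lhs, math.isclose(lhs, rhs))                   # 121.0 True
```

The kernel evaluation works in the original 2-D space, while the explicit route needs the 3-D features; for richer kernels the gap in cost grows quickly.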

## Soft Margin

$$\min _{\boldsymbol{w}, b} \left(\frac{1}{2}\|\boldsymbol{w}\|^{2}+C \sum_{i=1}^{m} \ell_{0 / 1}\left(y_{i}\left(\boldsymbol{w}^{\mathrm{T}} \boldsymbol{x}_{i}+b\right)-1\right)\right) \tag{16}$$

$$\ell_{0 / 1}(z)=\begin{cases} 1, & \text{if } z<0 \\ 0, & \text{otherwise} \end{cases}$$

$$\begin{aligned} &\text{hinge loss: } \ell_{\text{hinge}}(z)=\max (0,1-z) \\ &\text{exponential loss: } \ell_{\exp }(z)=\exp (-z)\\ &\text{logistic loss: } \ell_{\log }(z)=\log (1+\exp (-z)) \end{aligned}$$
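The four losses are short enough to write out directly. This is a minimal sketch with hypothetical function names; the sample values of $z$ are arbitrary.

```python
import math


def loss_01(z):
    """0/1 loss: 1 if z < 0, else 0."""
    return 1.0 if z < 0 else 0.0


def hinge(z):
    """Hinge loss: max(0, 1 - z)."""
    return max(0.0, 1.0 - z)


def exp_loss(z):
    """Exponential loss: exp(-z)."""
    return math.exp(-z)


def log_loss(z):
    """Logistic loss: log(1 + exp(-z))."""
    return math.log(1.0 + math.exp(-z))


for z in (-1.0, 0.5, 2.0):
    print(z, loss_01(z), hinge(z), round(exp_loss(z), 4), round(log_loss(z), 4))
```

Unlike the 0/1 loss, the three surrogates are continuous in $z$, which is what makes them usable in an optimization problem.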

### Deriving the Soft-Margin SVM

The hinge loss $\ell_{\text{hinge}}(z)=\max (0,1-z)$ is equivalent to:

$$\xi_{i}=\begin{cases}0 & \text{if } y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right) \geq 1 \\1-y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right) & \text{otherwise}\end{cases}$$

We call $\xi_{i}$ a slack variable: the further a sample violates the constraint, the larger its slack variable. The optimization objective $(5)$ can therefore be written as:

$$\begin{aligned}\min _{\boldsymbol{w}, b, \boldsymbol{\xi}} \quad & \frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}+C \sum_{i=1}^{m} \xi_{i} \\ \text{s.t.} \quad & y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right) \geq 1-\xi_{i}, \quad i=1,2, \ldots, m \\& \xi_{i} \geq 0, \quad i=1,2, \ldots, m\end{aligned} \tag{17}$$
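To see how the slack variables enter objective (17), the sketch below evaluates them for a fixed $(\boldsymbol{w}, b)$. It assumes the identity feature map $\phi(\boldsymbol{x})=\boldsymbol{x}$ and a made-up 1-D dataset; the function names are hypothetical.

```python
def slack_variables(X, y, w, b):
    """xi_i = max(0, 1 - y_i * (w^T x_i + b)), taking phi as the identity map."""
    return [max(0.0, 1.0 - yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b))
            for xi, yi in zip(X, y)]


def soft_margin_objective(X, y, w, b, C):
    """Value of (1/2) w^T w + C * sum(xi) for the smallest feasible slacks."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slack_variables(X, y, w, b))


X = [[2.0], [0.5], [-1.0]]     # toy 1-D samples (assumed)
y = [1, 1, -1]
w, b = [1.0], 0.0

print(slack_variables(X, y, w, b))               # [0.0, 0.5, 0.0]
print(soft_margin_objective(X, y, w, b, C=2.0))  # 1.5
```

Only the second sample sits inside the margin, so it is the only one with a positive slack and the only one that adds to the penalty term.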

$$\begin{aligned} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}):=& \frac{1}{2} \boldsymbol{w}^{\top} \boldsymbol{w}+C \sum_{i=1}^{m} \xi_{i} \\ &+\sum_{i=1}^{m} \alpha_{i}\left(1-\xi_{i}-y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right)\right) \\ &+\sum_{i=1}^{m} \beta_{i}\left(-\xi_{i}\right) \end{aligned} \tag{18}$$

$$\left\{\begin{array}{l} 1 - \xi_{i}-y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right) \leq 0,\ -\xi_{i} \leq 0 \quad(\text{primal feasibility})\\ \alpha_{i} \geq 0,\ \beta_{i} \geq 0 \quad(\text{dual feasibility})\\ \alpha_{i}\left(1-\xi_{i}-y_{i}\left(\boldsymbol{w}^{\top} \phi\left(\boldsymbol{x}_{i}\right)+b\right)\right)=0,\ \beta_{i} \xi_{i}=0 \quad(\text{complementary slackness}) \end{array}\right.$$

$$\begin{aligned} \max _{\boldsymbol{\alpha}, \boldsymbol{\beta}} \min _{\boldsymbol{w}, b, \boldsymbol{\xi}} \quad & \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta}) \\ \text{s.t.} \quad & \alpha_{i} \geq 0, \quad i=1,2, \ldots, m \\ & \beta_{i} \geq 0, \quad i=1,2, \ldots, m \end{aligned}$$

The inner problem $\min _{\boldsymbol{w}, b, \boldsymbol{\xi}} \mathcal{L}(\boldsymbol{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\beta})$ is an unconstrained optimization problem, so we obtain the optimal $(\boldsymbol{w}, b, \boldsymbol{\xi})$ by setting the partial derivatives to zero:

$$\begin{aligned} \frac{\partial \mathcal{L}}{\partial \boldsymbol{w}}=\mathbf{0} & \Rightarrow \boldsymbol{w}=\sum_{i=1}^{m} \alpha_{i} y_{i} \phi\left(\boldsymbol{x}_{i}\right) \\ \frac{\partial \mathcal{L}}{\partial b}=0 & \Rightarrow \sum_{i=1}^{m} \alpha_{i} y_{i}=0 \\ \frac{\partial \mathcal{L}}{\partial \boldsymbol{\xi}}=\mathbf{0} & \Rightarrow \alpha_{i}+\beta_{i}=C \end{aligned}\tag{19}$$

$$\begin{array}{ll} \min _{\boldsymbol{\alpha}} & \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_{i} \alpha_{j} y_{i} y_{j} \phi\left(\boldsymbol{x}_{i}\right)^{\top} \phi\left(\boldsymbol{x}_{j}\right)-\sum_{i=1}^{m} \alpha_{i} \\ \text{s.t.} & \sum_{i=1}^{m} \alpha_{i} y_{i}=0,\\ & 0 \le \alpha_i \le C, \quad i=1,2,\ldots,m \end{array} \tag{20}$$

$$\min _{f} \left(\Omega(f)+C \sum_{i=1}^{m} \ell\left(f\left(\boldsymbol{x}_{i}\right), y_{i}\right)\right)$$

## The SMO Algorithm

$$\begin{aligned} \min _{\boldsymbol{\alpha}} \quad & \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_{i} \alpha_{j} y_{i} y_{j} K\left(x_{i}, x_{j}\right)-\sum_{i=1}^{N} \alpha_{i} \\ \text{s.t.} \quad & \sum_{i=1}^{N} \alpha_{i} y_{i}=0 \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2, \cdots, N \end{aligned} \tag{21}$$

$$\begin{array}{rl} \min _{\alpha_{1}, \alpha_{2}} & W\left(\alpha_{1}, \alpha_{2}\right)=\frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2}- \\ & \left(\alpha_{1}+\alpha_{2}\right)+y_{1} \alpha_{1} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 1}+y_{2} \alpha_{2} \sum_{i=3}^{N} y_{i} \alpha_{i} K_{i 2} \\ \text{s.t.} & \alpha_{1} y_{1}+\alpha_{2} y_{2}=-\sum_{i=3}^{N} y_{i} \alpha_{i}=\varsigma \\ & 0 \leqslant \alpha_{i} \leqslant C, \quad i=1,2 \end{array} \tag{22}$$

$$\left\{\begin{array}{l} \alpha_1 - \alpha_2 = k \quad(y_1 \ne y_2)\\ \alpha_1 + \alpha_2 = k \quad(y_1 = y_2) \end{array} \right. \quad \text{s.t. } 0 \leqslant \alpha_{1} \leqslant C,\ 0 \leqslant \alpha_{2} \leqslant C \tag{23}$$

Let the initial feasible solution of $(22)$ be $\alpha_{1}^{\text{old}}, \alpha_{2}^{\text{old}}$ and the optimal solution be $\alpha_{1}^{\text{new}}, \alpha_{2}^{\text{new}}$. The value of $\alpha_2^{\text{new}}$ must lie in the range:

$$L \leqslant \alpha_{2}^{\text{new}} \leqslant H$$

• Case 1 ($y_1 \ne y_2$): $L=\max \left(0, \alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right), \quad H=\min \left(C, C+\alpha_{2}^{\text{old}}-\alpha_{1}^{\text{old}}\right)$
• Case 2 ($y_1 = y_2$): $L=\max \left(0, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}-C\right), \quad H=\min \left(C, \alpha_{2}^{\text{old}}+\alpha_{1}^{\text{old}}\right)$

$$\alpha_2^{\text{new}}= \begin{cases} H, & \alpha_2^{\text{new,unc}} > H\\ \alpha_2^{\text{new,unc}}, & L \leq \alpha_2^{\text{new,unc}} \leq H\\ L, & \alpha_2^{\text{new,unc}} < L \end{cases}$$
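The bounds and the clipping rule translate almost line by line into code. In this sketch the function names `bounds` and `clip` are hypothetical, and the numeric inputs are arbitrary values chosen to be exactly representable in floating point.

```python
def bounds(alpha1_old, alpha2_old, y1, y2, C):
    """Return (L, H), the feasible interval for the new alpha_2."""
    if y1 != y2:   # case 1: alpha_1 - alpha_2 stays constant
        return (max(0.0, alpha2_old - alpha1_old),
                min(C, C + alpha2_old - alpha1_old))
    # case 2: alpha_1 + alpha_2 stays constant
    return (max(0.0, alpha2_old + alpha1_old - C),
            min(C, alpha2_old + alpha1_old))


def clip(alpha2_unc, L, H):
    """Project the unconstrained optimum back onto [L, H]."""
    if alpha2_unc > H:
        return H
    if alpha2_unc < L:
        return L
    return alpha2_unc


print(bounds(0.25, 0.5, 1, -1, C=1.0))   # (0.25, 1.0)
print(bounds(0.25, 0.5, 1, 1, C=1.0))    # (0.0, 0.75)
print(clip(1.5, 0.25, 1.0))              # 1.0
```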

$$\begin{aligned} & \text{Let } g(x) = \boldsymbol{w}^{*} \cdot \phi(x) + b^{*}\\ & \text{From (19), } \boldsymbol{w}=\sum_{i=1}^{m} \alpha_{i} y_{i} \phi(x_{i}) \\ & \text{Therefore } g(x)=\sum_{j=1}^{m}\alpha_j^{*}y_jK(x, x_j)+ b^{*} \end{aligned}$$

$$E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}, \quad i=1,2$$

$$v_{i}=\sum_{j=3}^{N} \alpha_{j} y_{j} K\left(x_{i}, x_{j}\right)=g\left(x_{i}\right)-\sum_{j=1}^{2} \alpha_{j} y_{j} K\left(x_{i}, x_{j}\right)-b, \quad i=1,2$$

$$\begin{aligned} W\left(\alpha_{1}, \alpha_{2}\right)=\ & \frac{1}{2} K_{11} \alpha_{1}^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{1} y_{2} K_{12} \alpha_{1} \alpha_{2}- \\ & \left(\alpha_{1}+\alpha_{2}\right)+y_{1} v_{1} \alpha_{1}+y_{2} v_{2} \alpha_{2} \end{aligned} \tag{24}$$

$$\alpha_{1}=\left(\varsigma-y_{2} \alpha_{2}\right) y_{1}$$

$$\begin{aligned} W\left(\alpha_{2}\right)=& \frac{1}{2} K_{11}\left(\varsigma-\alpha_{2} y_{2}\right)^{2}+\frac{1}{2} K_{22} \alpha_{2}^{2}+y_{2} K_{12}\left(\varsigma-\alpha_{2} y_{2}\right) \alpha_{2}-\\ &\left(\varsigma-\alpha_{2} y_{2}\right) y_{1}-\alpha_{2}+v_{1}\left(\varsigma-\alpha_{2} y_{2}\right)+y_{2} v_{2} \alpha_{2} \end{aligned}$$

$$\begin{aligned} \frac{\partial W}{\partial \alpha_{2}}=& K_{11} \alpha_{2}+K_{22} \alpha_{2}-2 K_{12} \alpha_{2}-\\ & K_{11} \varsigma y_{2}+K_{12} \varsigma y_{2}+y_{1} y_{2}-1-v_{1} y_{2}+y_{2} v_{2} \end{aligned}$$

$$\begin{aligned} \left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}=& y_{2}\left(y_{2}-y_{1}+\varsigma K_{11}-\varsigma K_{12}+v_{1}-v_{2}\right) \\ =& y_{2}\left[y_{2}-y_{1}+\varsigma K_{11}-\varsigma K_{12}+\left(g\left(x_{1}\right)-\sum_{j=1}^{2} y_{j} \alpha_{j} K_{1 j}-b\right)-\right.\\ &\left.\left(g\left(x_{2}\right)-\sum_{j=1}^{2} y_{j} \alpha_{j} K_{2 j}-b\right)\right] \end{aligned} \tag{25}$$

$$\begin{aligned} \left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{\text{new,unc}} &=y_{2}\left(\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{\text{old}} y_{2}+y_{2}-y_{1}+g\left(x_{1}\right)-g\left(x_{2}\right)\right) \\ &=\left(K_{11}+K_{22}-2 K_{12}\right) \alpha_{2}^{\text{old}}+y_{2}\left(E_{1}-E_{2}\right) \end{aligned} \tag{26}$$

$$\eta=K_{11}+K_{22}-2 K_{12}=\left\|\phi\left(x_{1}\right)-\phi\left(x_{2}\right)\right\|^{2}$$

$$\alpha_{2}^{\text{new,unc}}=\alpha_{2}^{\text{old}}+\frac{y_{2}\left(E_{1}-E_{2}\right)}{\eta} \tag{27}$$

$$\alpha_2^{\text{new}}= \begin{cases} H, & \alpha_2^{\text{new,unc}} > H\\ \alpha_2^{\text{new,unc}}, & L \leq \alpha_2^{\text{new,unc}} \leq H\\ L, & \alpha_2^{\text{new,unc}} < L \end{cases}$$
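Putting (27) and the clipping rule together gives one SMO update of the pair $(\alpha_1,\alpha_2)$. The sketch below is an illustration under stated assumptions, not a full solver: the function name `smo_step` is hypothetical, the pair is left unchanged when $\eta \le 0$, and $\alpha_1^{\text{new}}=\alpha_1^{\text{old}}+y_1y_2(\alpha_2^{\text{old}}-\alpha_2^{\text{new}})$ is obtained from the constraint $\alpha_1y_1+\alpha_2y_2=\varsigma$. The example data are two points $x_1=(1,0)$, $y_1=+1$ and $x_2=(-1,0)$, $y_2=-1$ with a linear kernel, starting from $\alpha=0$, $b=0$.

```python
def smo_step(a1_old, a2_old, y1, y2, E1, E2, K11, K22, K12, C):
    """One analytic update of (alpha_1, alpha_2) following (27) plus clipping."""
    eta = K11 + K22 - 2 * K12
    if eta <= 0:                      # degenerate pair: skipped in this sketch
        return a1_old, a2_old
    a2_unc = a2_old + y2 * (E1 - E2) / eta
    if y1 != y2:
        L, H = max(0.0, a2_old - a1_old), min(C, C + a2_old - a1_old)
    else:
        L, H = max(0.0, a2_old + a1_old - C), min(C, a2_old + a1_old)
    a2_new = min(H, max(L, a2_unc))   # clip to [L, H]
    a1_new = a1_old + y1 * y2 * (a2_old - a2_new)   # keeps alpha_1*y_1 + alpha_2*y_2 fixed
    return a1_new, a2_new


# Linear-kernel values for x1 = (1, 0), x2 = (-1, 0); with alpha = 0 and b = 0, g(x) = 0,
# so E1 = 0 - 1 = -1 and E2 = 0 - (-1) = 1.
print(smo_step(0.0, 0.0, 1, -1, E1=-1.0, E2=1.0, K11=1.0, K22=1.0, K12=-1.0, C=1.0))
# (0.5, 0.5)
```

For this two-point problem a single step already reaches $\alpha_1=\alpha_2=\tfrac12$, which is the minimizer of the dual objective $2\alpha^2-2\alpha$ under $\alpha_1=\alpha_2$.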

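### How SMO Chooses Its Variables

1. Choosing the first variable

We call the selection of the first variable the outer loop. The outer loop picks, among the training samples, the one that violates the KKT conditions most severely. The KKT conditions can be rewritten in the following form:

$$\begin{aligned} \alpha_{i}=0 &\Leftrightarrow y_{i} g\left(x_{i}\right) \geqslant 1 &\quad(1)\\ 0<\alpha_{i}<C &\Leftrightarrow y_{i} g\left(x_{i}\right)=1 &\quad(2)\\ \alpha_{i}=C &\Leftrightarrow y_{i} g\left(x_{i}\right) \leqslant 1 &\quad(3)\\ \text{where } g(x_{i}) &= \sum_{j=1}^{N}\alpha_{j}y_{j}K(x_{i},x_{j})+b \end{aligned} \tag{28}$$

The proof is as follows: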

For condition $(1)$ above:

$$\begin{aligned} &\because \alpha_i = 0,\ \alpha_i + \beta_i = C, \text{ and the KKT conditions give } \beta_{i}\xi_{i}=0\\ &\therefore \beta_i = C,\ \therefore \xi_i = 0\\ &\because \text{the KKT conditions also give } 1-\xi_i\le y_ig(x_i),\ \alpha_{i} [y_{i}g(x_{i})-(1-\xi_{i})]=0\\ &\therefore y_ig(x_i) \ge 1 \end{aligned}$$

For condition $(2)$ above:

$$\begin{aligned} &\because 0<\alpha_{i}<C,\ \alpha_i + \beta_i = C, \text{ and the KKT conditions give } \beta_{i}\xi_{i}=0\\ &\therefore 0 \lt\beta_i \lt C,\ \therefore \xi_i = 0\\ &\because \text{the KKT conditions also give } 1-\xi_i\le y_ig(x_i),\ \alpha_{i} [y_{i}g(x_{i})-(1-\xi_{i})]=0\\ &\therefore y_ig(x_i) = 1-\xi_i = 1 \end{aligned}$$

For condition $(3)$ above:

$$\begin{aligned} &\because \alpha_i = C,\ \alpha_i + \beta_i = C, \text{ and the KKT conditions give } \beta_{i}\xi_{i}=0\\ &\therefore \beta_i = 0,\ \xi_i \ge 0\\ &\because \text{the KKT conditions also give } 1-\xi_i\le y_ig(x_i),\ \alpha_{i} [y_{i}g(x_{i})-(1-\xi_{i})]=0\\ &\therefore y_ig(x_i) = 1-\xi_i \le 1 \end{aligned}$$

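Of course, we can also allow a tolerance $\varepsilon$, in which case the KKT conditions become:

$$\begin{array}{l} \alpha_{i}=0 \Leftrightarrow y_{i} g\left(x_{i}\right) \geq 1-\varepsilon \\ 0<\alpha_{i}<C \Leftrightarrow 1-\varepsilon \leq y_{i} g\left(x_{i}\right) \leq 1+\varepsilon \\ \alpha_{i}=C \Leftrightarrow y_{i} g\left(x_{i}\right) \leq 1+\varepsilon \end{array}$$

Using these relaxed KKT conditions, we take the sample that violates them most severely as the first variable, and that's it. So how do we measure the severity? Just look at how far $g\left(x_{i}\right)$ is from satisfying the KKT conditions.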
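One possible way to turn "how far from the KKT conditions" into a number is sketched below. The violation measure, the function names and the sample values are all assumptions made for illustration; practical SMO implementations use a variety of heuristics here.

```python
def kkt_violation(alpha_i, y_i, g_i, C, eps=1e-3):
    """Distance of y_i * g(x_i) from the interval allowed by the relaxed KKT conditions."""
    m = y_i * g_i
    if alpha_i <= 0:                      # alpha_i = 0 requires m >= 1 - eps
        return max(0.0, (1 - eps) - m)
    if alpha_i >= C:                      # alpha_i = C requires m <= 1 + eps
        return max(0.0, m - (1 + eps))
    return max(0.0, abs(m - 1) - eps)     # 0 < alpha_i < C requires |m - 1| <= eps


def choose_first(alpha, y, g, C, eps=1e-3):
    """Index of the worst violator, or None if every sample satisfies the conditions."""
    scores = [kkt_violation(a, yi, gi, C, eps) for a, yi, gi in zip(alpha, y, g)]
    worst = max(range(len(scores)), key=lambda i: scores[i])
    return worst if scores[worst] > 0 else None


alpha = [0.0, 1.0, 0.5]          # hypothetical current multipliers, C = 1
y = [1, 1, -1]
g = [0.5, 3.0, -1.0]             # hypothetical values of g(x_i)
print(choose_first(alpha, y, g, C=1.0, eps=0.0))   # 1
```

In the example, sample 0 misses its condition by 0.5, sample 1 by 2.0 and sample 2 satisfies it, so index 1 is picked.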

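2. Choosing the second variable

The selection of the second variable is called the inner loop. The criterion is to make $\alpha_2$ change as much as possible. From $(27)$ we know:

$$\alpha_{2}^{\text{new,unc}}=\alpha_{2}^{\text{old}}+\frac{y_{2}\left(E_{1}-E_{2}\right)}{\eta}$$

In other words, the change in $\alpha_2$ depends on $|E_1 - E_2|$, so we can choose the $\alpha_2$ for which $|E_1 - E_2|$ is largest. Since $\alpha_1$ has already been fixed, $E_1$ is fixed too, and we only need to determine $E_2$. If $E_1$ is positive, choose the $\alpha_2$ with the smallest $E_2$; if $E_1$ is negative, choose the $\alpha_2$ with the largest $E_2$.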
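The inner-loop heuristic amounts to an argmax over the cached errors. The function name `choose_second` and the error values below are hypothetical.

```python
def choose_second(i, E):
    """Return the index j != i that maximizes |E[i] - E[j]|."""
    candidates = [j for j in range(len(E)) if j != i]
    return max(candidates, key=lambda j: abs(E[i] - E[j]))


E = [0.5, -2.0, 2.0, 0.25]       # hypothetical cached errors E_k = g(x_k) - y_k
print(choose_second(0, E))       # 1: E[0] > 0, so the smallest E_j wins
print(choose_second(1, E))       # 2: E[1] < 0, so the largest E_j wins
```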

After the two variables have been optimized, the threshold $b$ has to be recomputed.

• If the updated $\alpha_1$ satisfies $0<\alpha_{1}<C$, then condition $(2)$ of $(28)$ gives:

$$\sum_{i=1}^{N} \alpha_{i} y_{i} K_{i 1}+b=y_{1}$$

  Therefore:

$$b_{1}^{\text{new}}=y_{1}-\sum_{i=3}^{N} \alpha_{i} y_{i} K_{i 1}-\alpha_{1}^{\text{new}} y_{1} K_{11}-\alpha_{2}^{\text{new}} y_{2} K_{21}$$

  From the definition $E_{i}=g\left(x_{i}\right)-y_{i}=\left(\sum_{j=1}^{N} \alpha_{j} y_{j} K\left(x_{j}, x_{i}\right)+b\right)-y_{i}$, $i=1,2$, we have:

$$E_{1}=\sum_{i=3}^{N} \alpha_{i} y_{i} K_{i 1}+\alpha_{1}^{\text{old}} y_{1} K_{11}+\alpha_{2}^{\text{old}} y_{2} K_{21}+b^{\text{old}}-y_{1}$$

  Hence:

$$y_{1}-\sum_{i=3}^{N} \alpha_{i} y_{i} K_{i 1}=-E_{1}+\alpha_{1}^{\text{old}} y_{1} K_{11}+\alpha_{2}^{\text{old}} y_{2} K_{21}+b^{\text{old}}$$

  Finally:

$$b_{1}^{\text{new}}=-E_{1}-y_{1} K_{11}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{21}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

• Similarly, if $0<\alpha_{2} \lt C$, then:

$$b_{2}^{\text{new}}=-E_{2}-y_{1} K_{12}\left(\alpha_{1}^{\text{new}}-\alpha_{1}^{\text{old}}\right)-y_{2} K_{22}\left(\alpha_{2}^{\text{new}}-\alpha_{2}^{\text{old}}\right)+b^{\text{old}}$$

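• If $\alpha_1^{\text{new}},\alpha_2^{\text{new}}$ both satisfy $0<\alpha_{i}^{\text{new}} \lt C$, then $b_1^{\text{new}}$ and $b_2^{\text{new}}$ coincide, and finally:

$$b^{\text{new}} = b_1^{\text{new}} = b_2^{\text{new}} = \frac{b_1^{\text{new}}+b_2^{\text{new}}}{2}$$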

• If $\alpha_1^{\text{new}},\alpha_2^{\text{new}}$ are each $0$ or $C$, then finally:

$$b^{\text{new}} = \frac{b_1^{\text{new}}+b_2^{\text{new}}}{2}$$

In summary:

$$b=\begin{cases} b_{1}^{\text{new}}, & 0<\alpha_{1}<C \\ b_{2}^{\text{new}}, & 0<\alpha_{2}<C \\ \frac{1}{2}\left(b_{1}^{\text{new}}+b_{2}^{\text{new}}\right), & \text{otherwise} \end{cases}$$

The errors are then updated with the new values:

$$\begin{aligned} E_{1}&=\sum_{i=3}^{N} \alpha_{i} y_{i} K_{i 1}+\alpha_{1}^{\text{new}} y_{1} K_{11}+\alpha_{2}^{\text{new}} y_{2}K_{21}+b^{\text{new}}-y_{1}\\ E_{2}&=\sum_{i=3}^{N} \alpha_{i} y_{i} K_{i 2}+\alpha_{1}^{\text{new}} y_{1} K_{12}+\alpha_{2}^{\text{new}} y_{2}K_{22}+b^{\text{new}}-y_{2} \end{aligned}$$
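The threshold update can be sketched as a single function that computes $b_1^{\text{new}}$ and $b_2^{\text{new}}$ and then applies the case distinction. The name `update_b` is hypothetical, and the sketch assumes a symmetric kernel so that $K_{21}=K_{12}$. The example reuses a two-point linear-kernel setting ($x_1=(1,0)$, $y_1=+1$, $x_2=(-1,0)$, $y_2=-1$) in which both multipliers move from $0$ to $0.5$.

```python
def update_b(b_old, E1, E2, y1, y2, a1_old, a1_new, a2_old, a2_new, K11, K12, K22, C):
    """Recompute the threshold b after (alpha_1, alpha_2) have been updated."""
    d1, d2 = a1_new - a1_old, a2_new - a2_old
    b1 = -E1 - y1 * K11 * d1 - y2 * K12 * d2 + b_old   # uses K21 = K12
    b2 = -E2 - y1 * K12 * d1 - y2 * K22 * d2 + b_old
    if 0 < a1_new < C:
        return b1
    if 0 < a2_new < C:
        return b2
    return 0.5 * (b1 + b2)       # both multipliers at a bound: take the midpoint


b_new = update_b(b_old=0.0, E1=-1.0, E2=1.0, y1=1, y2=-1,
                 a1_old=0.0, a1_new=0.5, a2_old=0.0, a2_new=0.5,
                 K11=1.0, K12=-1.0, K22=1.0, C=1.0)
print(b_new)   # 0.0
```

With $b=0$ and $\boldsymbol{w}=(1,0)$ both points satisfy $y_i g(x_i)=1$, so the refreshed errors $E_1$ and $E_2$ are both zero in this example.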

## Summary

### References

Posted 2020-04-13 00:08 by 渣渣辉啊