Machine Learning Notes, Part 3: Boosting Algorithm Principles, GBDT & XGBoost

2. Derivation of the Boosting Algorithm

A typical definition of $$L(y,F(\vec x))$$ is the squared error: $$L(y,F(\vec x)) = \frac{1}{2}(y-F(\vec x))^2$$

Another common choice is the absolute error:

$L(y,F(\vec x)) = | y-F(\vec x) |$

The boosted model is an additive expansion of base learners:

$F(\vec x) = \sum_{i=1}^{M}\gamma _if_i(\vec x) + C$

The model is initialized with the constant that minimizes the loss over the training set:

$F_0(\vec x) = \arg\min_c \sum_{i=1}^nL(y_i,c)$

Each round then greedily adds the base learner that most reduces the loss:

$F_m(\vec x) = F_{m-1}(\vec x) + \arg\min_{f \in H}\sum_{i=1}^nL(y_i,F_{m-1}(\vec x_i) + f(\vec x_i))$

Since this exact minimization is generally intractable, gradient boosting instead takes a steepest-descent step on the loss:

$F_m(\vec x) = F_{m-1}(\vec x) - \gamma_m \sum_{i=1}^n \nabla_F L(y_i,F_{m-1}(\vec x_i))$

$\gamma_m = \arg\min_\gamma \sum_{i=1}^{n}L\left(y_i,F_{m-1}(\vec x_i) - \gamma\cdot\nabla_F L(y_i,F_{m-1}(\vec x_i))\right)$
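For the squared loss defined above, the gradient has a simple closed form, which makes the descent step concrete:

$\frac{\partial L(y_i,F(\vec x_i))}{\partial F(\vec x_i)} = \frac{\partial}{\partial F(\vec x_i)}\frac{1}{2}\left(y_i - F(\vec x_i)\right)^2 = F(\vec x_i) - y_i$

so subtracting the gradient moves each prediction $F_{m-1}(\vec x_i)$ toward $y_i$ by a fraction $\gamma_m$ of the residual $y_i - F_{m-1}(\vec x_i)$.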

The gradient boosting procedure is therefore:

(1) Initialize the model with a constant $$F_0(\vec x)$$. For $$m=1$$ to $$M$$:
(2) Compute the pseudo-residuals: $$\gamma_{im} = \left[\frac{\partial L(y_i,F(\vec x_i))}{\partial F(\vec x_i)}\right]_{F(\vec x) = F_{m-1}(\vec x)},\ i=1,2,...,n$$
(3) Fit a base learner $$f_m(\vec x)$$ to the pseudo-residuals using the data $$\left\{ (\vec x_i,\gamma_{im})\right\}^n_{i=1}$$
(4) Compute the step size by line search: $$\gamma_m = \arg\min_\gamma \sum_{i=1}^{n}L(y_i,F_{m-1}(\vec x_i) - \gamma\cdot f_m(\vec x_i))$$

(5) Update the model: $$F_m(\vec x) = F_{m-1}(\vec x) - \gamma_m f_m(\vec x)$$
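The five steps above can be sketched for the squared loss, using depth-1 regression stumps as the base learners $f_m$. This is an illustrative sketch, not the post's reference implementation: the names (`fit_stump`, `n_rounds`, `lr`) are invented here, and a fixed shrinkage factor stands in for the line-searched step size $\gamma_m$.

```python
import numpy as np

def fit_stump(x, r):
    """Depth-1 regression tree: pick the threshold minimizing squared
    error and predict the mean residual on each side of it."""
    best = None
    for thr in (np.sort(x)[:-1] + np.sort(x)[1:]) / 2:
        left, right = r[x <= thr], r[x > thr]
        if left.size == 0 or right.size == 0:
            continue
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left.mean(), right.mean())
    _, thr, lo, hi = best
    return lambda z: np.where(z <= thr, lo, hi)

def gradient_boost(x, y, n_rounds=100, lr=0.1):
    F0 = y.mean()                 # step (1): best constant under squared loss
    F = np.full(len(y), F0)
    stumps = []
    for m in range(n_rounds):
        r = y - F                 # step (2): pseudo-residuals; for 1/2 (y-F)^2
                                  # the descent direction is simply y - F
        f = fit_stump(x, r)       # step (3): fit base learner to residuals
        F = F + lr * f(x)         # steps (4)-(5): fixed shrinkage lr stands in
        stumps.append(f)          # for the line-searched step size gamma_m
    return lambda z: F0 + lr * sum(f(z) for f in stumps)
```

With enough rounds, the additive stump model fits a smooth 1-D target closely even though each base learner alone is very weak.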

3. Derivation of GBDT

In GBDT, the base learner of round $m$ is a regression tree:

$T_m(\vec x) = \sum_{j=1}^J b_{jm}I(\vec x \in R_{jm})$

where $b_{jm}$ is the constant value the tree predicts in region $R_{jm}$. A line search gives the step size that minimizes the loss:

$F_m(\vec x) = F_{m-1}(\vec x) + \gamma_m \cdot T_m(\vec x)$

$\gamma_m = \arg\min_\gamma \sum_{i=1}^n L(y_i,F_{m-1}(\vec x_i) + \gamma \cdot T_m(\vec x_i))$

Rather than a single step size per tree, a separate optimal value can be chosen for each leaf region:

$F_m(\vec x) = F_{m-1}(\vec x) + \sum_{j=1}^{J}\gamma_{jm} \cdot I(\vec x \in R_{jm})$

$\gamma_{jm} = \arg\min_\gamma \sum_{\vec x_i \in R_{jm}} L(y_i,F_{m-1}(\vec x_i) + \gamma)$

For the squared loss, the objective being minimized reduces to fitting the residual $r = y - F_{m-1}(\vec x)$:

$L\left(y, F_{m-1}(\vec x) + \gamma_m \cdot T_m(\vec x)\right) = \left[y-F_{m-1}(\vec x) - \gamma_m T_m(\vec x)\right] ^2 = \left[ r - \gamma_mT_m(\vec x)\right]^2$
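Under the squared loss, the per-leaf step above also has a closed form: within a region the tree output is constant, so

$\gamma_{jm} = \arg\min_\gamma \sum_{\vec x_i \in R_{jm}} \left(r_i - \gamma\right)^2 = \frac{1}{|R_{jm}|}\sum_{\vec x_i \in R_{jm}} r_i$

that is, the optimal leaf value is simply the mean residual of the samples falling in that leaf.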

4. Derivation of XGBoost

XGBoost adds an explicit regularization term $\Omega(f_t)$ to the objective when learning the tree $f_t$ of round $t$ ($C$ collects everything constant with respect to $f_t$):

$J(f_t) = \sum_{i=1}^nL(y_i, \hat y_i^{(t-1)} + f_t(x_i)) + \Omega(f_t) + C$

Expanding the loss to second order around $\hat y_i^{(t-1)}$, with $g_i$ and $h_i$ the first and second derivatives of $L$ with respect to $\hat y_i^{(t-1)}$:

$= \sum_{i=1}^n \left[g_if_t(x_i) + \frac{1}{2}h_if_t^2(x_i)\right] + \Omega(f_t) + C$

Writing the tree as $f_t(x) = w_{q(x)}$, where $q$ maps each sample to one of the $T_t$ leaves and $w$ is the vector of leaf weights, and taking $\Omega(f_t) = \gamma T_t + \frac{1}{2}\lambda \sum_{j=1}^{T_t} w_j^2$:

$= \sum_{i=1}^n \left[g_i w_{q(x_i)} + \frac{1}{2}h_i w_{q(x_i)}^2\right] + \gamma T_t + \frac{1}{2}\lambda \sum_{j=1}^{T_t} w_j^2 + C$

Grouping the samples by leaf, with $I_j = \{i \mid q(x_i) = j\}$:

$=\sum_{j=1}^{T_t} \left[(\sum_{i \in I_j}g_i)w_j + \frac{1}{2}(\sum_{i \in I_j}h_i)w_j^2\right] + \gamma T_t + \frac{1}{2}\lambda \sum_{j=1}^{T_t} w_j^2 + C$

$=\sum_{j=1}^{T_t} \left[(\sum_{i \in I_j}g_i)w_j + \frac{1}{2}(\sum_{i \in I_j}h_i + \lambda)w_j^2\right] + \gamma T_t + C$

Define $$G_j = \sum_{i \in I_j} g_i$$ and $$H_j = \sum_{i \in I_j} h_i$$. Taking the partial derivative with respect to $$w_j$$ gives $$\frac{\partial J(f_t)}{\partial w_j} = G_j + (H_j + \lambda) w_j$$; setting it to zero yields $$w_j = - \frac {G_j}{H_j + \lambda}$$. Substituting back, the best objective achievable with a fixed tree structure is $$J^* = -\frac{1}{2}\sum_{j=1}^{T_t}\frac{G_j^2}{H_j+\lambda} + \gamma T_t + C$$, which is used to score candidate tree structures.

• For each candidate split, compute $$J(f)$$ after the split;
• Over all candidate splits, choose the one that decreases $$J(f)$$ the most.
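The enumeration above can be sketched as follows. Plugging the optimal $w_j$ back into $J(f_t)$ gives a per-leaf score of $-\frac{1}{2}G_j^2/(H_j+\lambda)$ plus $\gamma$ per leaf, so candidate splits on one feature can be compared by how much they lower the objective. This is an illustrative sketch of the exact greedy search, not XGBoost's actual implementation; the function and parameter names are invented here.

```python
import numpy as np

def leaf_score(G, H, lam):
    # A leaf's contribution to J at its optimal weight w* = -G / (H + lam).
    return -0.5 * G * G / (H + lam)

def best_split(x, g, h, lam=1.0, gamma=0.0):
    """Exact greedy split search over one feature: try every threshold
    and keep the split with the largest reduction in the objective."""
    order = np.argsort(x)
    xs, gs, hs = x[order], g[order], h[order]
    G, H = gs.sum(), hs.sum()
    G_L = H_L = 0.0
    best_gain, best_thr = 0.0, None
    for i in range(len(xs) - 1):
        G_L += gs[i]
        H_L += hs[i]
        G_R, H_R = G - G_L, H - H_L
        # Reduction in J: parent score minus the two child scores,
        # minus gamma for the one extra leaf the split creates.
        gain = (leaf_score(G, H, lam)
                - leaf_score(G_L, H_L, lam)
                - leaf_score(G_R, H_R, lam)
                - gamma)
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i] + xs[i + 1]) / 2
    return best_thr, best_gain
```

For the squared loss with all predictions at zero, $g_i = -y_i$ and $h_i = 1$, so on data with two well-separated clusters the search recovers the boundary between them.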

5. AdaBoost Algorithm

(1) Learn a base classifier from the training data weighted according to the distribution $$D_m$$: $$G_m(x): \chi \to \{-1,1\}$$
(2) Compute the classification error rate of $$G_m(x)$$ on the training set: $$e_m = P(G_m(x_i) \ne y_i) = \sum_{i=1}^N w_{mi}I(G_m(x_i) \ne y_i)$$
(3) Compute the coefficient of $$G_m(x)$$: $$\alpha_m = \frac{1}{2}\log\frac{1-e_m}{e_m}$$, where the logarithm is the natural logarithm.
(4) Update the weight distribution over the training data: $$D_{m+1} = (w_{m+1,1},w_{m+1,2},...,w_{m+1,i},...,w_{m+1,N})$$

$w_{m+1,i} = \frac{w_{mi}}{Z_m} \exp(-\alpha_my_iG_m(x_i)),\ i=1,2,...,N$

where $Z_m$ is the normalization factor

$Z_m = \sum_{i=1}^N w_{mi}\exp(-\alpha_my_iG_m(x_i))$

which makes $D_{m+1}$ a probability distribution.

(5) Build the linear combination of the base classifiers: $f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)$, which gives the final classifier: $G(x) = \mathrm{sign}(f(x)) = \mathrm{sign}\left( \sum_{m=1}^{M} \alpha_m G_m(x) \right)$
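The five AdaBoost steps above can be sketched with one-level decision stumps as the base classifiers $G_m$. The names here (`stump_predict`, `n_rounds`) are illustrative, not from the post.

```python
import numpy as np

def stump_predict(x, thr, sign):
    # One-level tree: sign * (+1 if x > thr else -1).
    return sign * np.where(x > thr, 1, -1)

def adaboost(x, y, n_rounds=10):
    N = len(y)
    w = np.full(N, 1.0 / N)                       # D_1 is uniform
    thresholds = (np.sort(x)[:-1] + np.sort(x)[1:]) / 2
    model = []
    for m in range(n_rounds):
        # (1)-(2): base classifier with the smallest weighted error e_m.
        thr, s = min(((t, s) for t in thresholds for s in (1, -1)),
                     key=lambda p: w[stump_predict(x, *p) != y].sum())
        pred = stump_predict(x, thr, s)
        e = w[pred != y].sum()
        if e >= 0.5:                              # no weak learner helps
            break
        e = max(e, 1e-12)                         # guard against log(0)
        # (3): classifier coefficient alpha_m (natural logarithm).
        alpha = 0.5 * np.log((1 - e) / e)
        # (4): reweight; dividing by the sum is the Z_m normalization.
        w = w * np.exp(-alpha * y * pred)
        w /= w.sum()
        model.append((alpha, thr, s))
    return model

def adaboost_predict(model, x):
    # (5): G(x) = sign(sum_m alpha_m * G_m(x))
    f = sum(a * stump_predict(x, thr, s) for a, thr, s in model)
    return np.sign(f)
```

On the classic one-dimensional example from Li Hang's *Statistical Learning Methods* (ten points, not separable by any single stump), three rounds suffice for zero training error even though each individual stump misclassifies at least three points.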

posted @ 2017-03-12 15:58  Farnear