Adaboost

Boosting is a very powerful technique for ensembling algorithms. Its outstanding performance is achieved by combining some or many weak classifiers into a strong one. Like bagging, it votes to decide a sample's category, but there is a significant difference: the base classifiers that make up the strong classifier usually have different 'voting rights'. The most widely used form of boosting is adaptive boosting (AdaBoost), which is the one we will talk about in detail.

There is another key difference between bagging and AdaBoost in training: the base classifiers are trained in sequence, which is necessary because the performance of the previous classifiers determines both the sample weights seen by the following classifier and that classifier's 'voting right'.

The points that are misclassified are assigned larger weights and the correctly classified ones smaller weights, so the next base classifier pays more 'attention' to the 'error' points. Repeat this process; once all the base classifiers have been trained, the final prediction combines all the classifiers' choices and picks the category with the largest total weight as a point's class.

Consider a two-class classification problem, in which the training data comprise the vectors \(x_1, x_2, ..., x_N\) along with corresponding binary target variables \(y_1, y_2, ..., y_N\), where \(y_n \in \{-1, 1\}\). We also have available a procedure for training a base classifier using weighted data to give a function \(G_m(x) \in \{-1, 1\}\). Each data point is given an associated weighting parameter \(w_n^{(1)}\), which is initially set to \(1/N\) for all data points.

AdaBoost Process:

1. Initialize the data weighting coefficients \(\{w_n\}\) by setting \(w_n^{(1)} = 1/N\) for \(n = 1,2,...,N\).

2. For m = 1,2,...,M:

  (a) Fit a classifier \(G_m(x)\) to the training data by minimizing the weighted error function:

      \(e_m = P(G_m(x_i) \neq y_i) = \sum_{i=1}^{N} w_i^{(m)} I(G_m(x_i) \neq y_i)\)

PS: as classifier m is a weak classifier, we can naturally assume that \(e_m\) is always greater than 0 (and, for a classifier better than random guessing, less than 1/2, so that \(\alpha_m > 0\) below).

  (b) Calculate classifier m's coefficient \(\alpha_m\):

      \(\alpha_m = \frac{1}{2} \ln\frac{1 - e_m}{e_m}\)

PS: the coefficient 1/2 disappears in some references; \(\alpha_m\) serves two purposes: it updates the weights \(w_i^{(m)}\), and it acts as classifier m's coefficient in the final prediction function.

From the formula, we can see that \(\alpha_m\) decreases as \(e_m\) increases: the smaller the weighted error, the larger the classifier's 'voting right'.
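
For instance, plugging in some concrete (purely illustrative) numbers: \(e_m = 0.3\) gives \(\alpha_m = \frac{1}{2}\ln\frac{0.7}{0.3} \approx 0.42\), while a more accurate classifier with \(e_m = 0.1\) gets \(\alpha_m = \frac{1}{2}\ln\frac{0.9}{0.1} \approx 1.10\), so it receives a larger 'voting right' in the final prediction.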

  (c) Update the weights of the sample points:

    \(w_i^{(m+1)} = \frac{w_i^{(m)}}{Z_m} \exp\big(\alpha_m I(G_m(x_i) \neq y_i)\big)\); where

\( Z_m = \sum_{i=1}^{N} w_i^{(m)} \exp\big(\alpha_m I(G_m(x_i) \neq y_i)\big)\) is a normalization factor, so the updated weights still sum to 1.
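
Continuing the illustrative numbers above: with \(\alpha_m \approx 0.42\), each misclassified point has its weight multiplied by \(e^{0.42} \approx 1.53\) relative to the correctly classified points (before dividing by \(Z_m\)), so the next base classifier pays roughly one and a half times more attention to every mistake of the current one.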

3. Make predictions by ensembling all the classifiers, each weighted by its own coefficient \(\alpha_m\):

  \(f(x) = \sum_{m=1}^{M} \alpha_m G_m(x)\)

  \(G(x) = \mathrm{sign}(f(x)) = \mathrm{sign}\Big(\sum_{m=1}^{M}\alpha_m G_m(x)\Big)\)
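
The procedure above translates almost directly into code. Below is a minimal sketch in Python/NumPy; it uses scikit-learn's `DecisionTreeClassifier` with `max_depth=1` (a decision stump) as the weak learner, and the function names, the number of rounds `M`, and the early-stopping check are my own illustrative choices rather than part of the algorithm as stated above.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, M=50):
    """Train M weak classifiers with the re-weighting rule described above.

    X: (N, d) feature array; y: (N,) labels in {-1, +1}.
    Returns the fitted weak classifiers and their coefficients alpha_m.
    """
    N = X.shape[0]
    w = np.full(N, 1.0 / N)                  # step 1: w_n^{(1)} = 1/N
    classifiers, alphas = [], []

    for m in range(M):
        # (a) fit a weak classifier (a decision stump) to the weighted data
        G_m = DecisionTreeClassifier(max_depth=1)
        G_m.fit(X, y, sample_weight=w)
        miss = (G_m.predict(X) != y)

        # weighted error e_m = sum_i w_i^{(m)} I(G_m(x_i) != y_i)
        e_m = np.dot(w, miss)
        if e_m <= 0 or e_m >= 0.5:           # stop on a perfect or useless stump
            break

        # (b) alpha_m = 1/2 * ln((1 - e_m) / e_m)
        alpha = 0.5 * np.log((1.0 - e_m) / e_m)

        # (c) boost the weights of misclassified points, then normalise by Z_m
        w = w * np.exp(alpha * miss)
        w /= w.sum()

        classifiers.append(G_m)
        alphas.append(alpha)

    return classifiers, alphas

def adaboost_predict(X, classifiers, alphas):
    """G(x) = sign( sum_m alpha_m * G_m(x) )."""
    f = sum(a * G.predict(X) for G, a in zip(classifiers, alphas))
    return np.sign(f)
```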

 

Actually, AdaBoost is an instance of the additive model (AM):

      \(f(x) = \sum_{m = 1}^{M}\beta_m b(x;r_m)\)

  \(f_m(x) = f_{m-1}(x) + \beta_m b(x;r_m)\)

where \(b(x;r_m)\) is called the base function, \(r_m\) denotes the parameters of the base function, and \(\beta_m\) is the coefficient of the base function.

Given the data and a loss function \(L\), the target of AM is to minimize the combined loss:

   \(\min_{\beta_m, r_m} \sum_{i=1}^{N} L\Big(y_i, \sum_{m=1}^{M}\beta_m b(x_i; r_m)\Big)\)

and this can be simplified by a forward stagewise algorithm:

at every step we only need to solve:

\(\min_{\beta, r} \sum_{i=1}^{N} L\big(y_i, f_{m-1}(x_i) + \beta b(x_i; r)\big)\), keeping the already-fitted \(f_{m-1}\) fixed and optimizing only the new base function.
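
In this additive-model view, AdaBoost corresponds to choosing the exponential loss \(L(y, f(x)) = \exp(-y f(x))\). As a simpler, self-contained illustration of the forward stagewise idea itself, the sketch below instead uses squared-error loss with regression-stump base functions; the loss choice, the base function family, and all names here are my own for illustration, not taken from the post or its references.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def forward_stagewise_fit(X, y, M=20):
    """Greedy additive fitting: f_m = f_{m-1} + b(x; r_m).

    With squared-error loss, the per-step problem
        min_r sum_i L(y_i, f_{m-1}(x_i) + b(x_i; r))
    reduces to fitting the new base function to the current residuals.
    """
    f = np.zeros(len(y))                 # f_0(x) = 0
    bases = []
    for m in range(M):
        residual = y - f                 # what f_{m-1} fails to explain
        b = DecisionTreeRegressor(max_depth=1)
        b.fit(X, residual)               # fit only the new base function
        f = f + b.predict(X)             # f_m = f_{m-1} + b(x; r_m), with beta_m folded in
        bases.append(b)
    return bases

def forward_stagewise_predict(X, bases):
    return sum(b.predict(X) for b in bases)
```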

 

 

 

Reference:

1. http://blog.csdn.net/v_july_v/article/details/40718799.

2. C. M. Bishop, Pattern Recognition and Machine Learning, pp. 657-662.

posted on 2017-05-20 13:40 Vpegasus