Ensemble Learning Algorithms
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Ensemble methods often deliver the best results in practice, at the cost of considerably longer training time.
In other words, ensemble methods combine several different machine learning algorithms, or several instances of one algorithm trained with different parameters, into a single predictor.
1. Blending and Bagging
- Motivation of Aggregation
- Uniform Blending
- Linear and Any Blending
- Bagging (Bootstrap Aggregation)
Selection by Validation
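Beyond picking a single best model by validation, the blending ideas listed above can be written in a few lines. A minimal sketch of uniform and linear blending, assuming a list of already-trained binary classifiers whose predict method returns {-1, +1} labels on numpy arrays (all names here are illustrative, not from the course):

```python
import numpy as np

def uniform_blending(models, X):
    """Uniform blending: every g_t gets one vote, G(x) = sign(sum_t g_t(x))."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.sign(votes)

def linear_blending(models, alphas, X):
    """Linear blending: G(x) = sign(sum_t alpha_t * g_t(x)); the alpha_t are
    typically learned on a validation set, using the g_t outputs as features."""
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)
```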



bootstrap sample D~_t: re-sample N examples from D uniformly with replacement; an arbitrary N' can also be used instead of the original N.
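A minimal sketch of bagging built directly on this bootstrap sampling, assuming numpy arrays and a scikit-learn-style base learner with fit/predict (the base learner choice and parameter values are illustrative); the resulting models can then be combined with the uniform voting shown above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bootstrap_sample(X, y, n_prime=None, rng=None):
    """Bootstrap sample D~_t: draw n_prime examples from D uniformly with replacement."""
    if rng is None:
        rng = np.random.default_rng()
    if n_prime is None:
        n_prime = len(X)  # the arbitrary N' defaults to the original N
    idx = rng.integers(0, len(X), size=n_prime)
    return X[idx], y[idx]

def bagging(X, y, base=None, T=25, seed=0):
    """Bootstrap AGGregatING: train T copies of the base learner, one per bootstrap sample."""
    rng = np.random.default_rng(seed)
    base = base if base is not None else DecisionTreeClassifier()
    return [clone(base).fit(*bootstrap_sample(X, y, rng=rng)) for _ in range(T)]
```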
2. AdaBoost



3. Decision Tree

Two ways of defining a decision tree:





4. Random Forest
- Random Forest Algorithm
- Out-Of-Bag Estimate
- Feature Selection
- Random Forest in Action
Random Forest Algorithm
- Bagging: reduces variance by voting/averaging.
- Decision Tree: large variance, especially if fully grown (very sensitive to the data).
- Putting them together? (i.e. aggregation of aggregation :-))

Sources of randomness in Random Forest:
1. re-sampling: every tree is trained on its own bootstrap sample of D (bagging);
2. another possibility for diversity: randomly sample d' features from x, with d' << d (efficient for large d); the original RF re-samples a new subspace for each b(x) inside C&RT.
   RF = bagging + random-subspace C&RT
3. combination with a random row p_i of a projection matrix P: φ_i(x) = p_i^T x.
   RF = bagging + random-combination C&RT
Randomness everywhere!
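A minimal sketch of "RF = bagging + random-subspace C&RT", using scikit-learn's DecisionTreeClassifier as the C&RT stand-in; its max_features option re-draws a feature subspace at every split, which approximates the per-b(x) subspace re-sampling described above (tree count, the d' heuristic and other parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest(X, y, T=100, d_prime=None, seed=0):
    """RF = bagging + random-subspace C&RT: each fully-grown tree is trained on a
    bootstrap sample and only considers d' << d random features at each split."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    if d_prime is None:
        d_prime = max(1, int(np.sqrt(d)))  # a common heuristic for d'
    forest = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)   # bootstrap sample D~_t
        tree = DecisionTreeClassifier(max_features=d_prime,  # random subspace per split
                                      random_state=int(rng.integers(1 << 31)))
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def predict_forest(forest, X):
    """Uniform voting over the fully-grown trees reduces their large variance."""
    return np.sign(np.sum([t.predict(X) for t in forest], axis=0))
```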
Out-Of-Bag Estimate


Feature Selection
Feature importance can be estimated with a permutation test: randomly shuffle the values of one feature and measure how much the model's performance drops; the larger the drop, the more important that feature is.
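A sketch of this permutation test, assuming an already-trained model and a held-out validation set of numpy arrays (in the original Random Forest the drop is measured on the out-of-bag examples instead; the function and variable names are illustrative):

```python
import numpy as np

def permutation_importance(model, X_val, y_val, rng=None):
    """importance(i) = accuracy on clean data minus accuracy after permuting feature i."""
    if rng is None:
        rng = np.random.default_rng()
    base_acc = np.mean(model.predict(X_val) == y_val)
    importances = np.zeros(X_val.shape[1])
    for i in range(X_val.shape[1]):
        X_perm = X_val.copy()
        X_perm[:, i] = rng.permutation(X_perm[:, i])  # destroy feature i, keep its marginal distribution
        importances[i] = base_acc - np.mean(model.predict(X_perm) == y_val)
    return importances
```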

Random Forest in Action


5. Gradient Boosted Decision Tree
- Adaptive Boosted Decision Tree
- Optimization View of AdaBoost
- Gradient Boosting
- Summary of Aggregation Models
Adaptive Boosted Decision Tree
weighted DTree(D, u^(t)):
weights u can be expressed by sampling proportional to u_n: request a size-N' data set D~ by sampling ∝ u on D.
AdaBoost-DTree: often realized as AdaBoost + sampling ∝ u^(t) + a pruned DTree(D~).
AdaBoost-Stump = special case of AdaBoost-DTree.
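A minimal sketch along these lines: each round samples a size-N data set with probability proportional to u^(t) and fits a pruned tree on it, then re-weights the examples as in standard AdaBoost (the scikit-learn tree, the max_depth pruning and labels in {-1, +1} are assumptions, not details from the course notes):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_dtree(X, y, T=50, max_depth=3, seed=0):
    """AdaBoost + sampling proportional to u^(t) + pruned DTree(D~); y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N = len(X)
    u = np.full(N, 1.0 / N)                         # example weights u^(1)
    trees, alphas = [], []
    for _ in range(T):
        idx = rng.choice(N, size=N, p=u / u.sum())  # D~ sampled proportional to u^(t)
        g = DecisionTreeClassifier(max_depth=max_depth).fit(X[idx], y[idx])
        wrong = g.predict(X) != y
        eps = np.dot(u, wrong) / u.sum()            # weighted error on the full data set
        if eps == 0 or eps >= 0.5:                  # stop: perfect or no better than random
            break
        scale = np.sqrt((1 - eps) / eps)
        u = np.where(wrong, u * scale, u / scale)   # up-weight mistakes, down-weight correct ones
        trees.append(g)
        alphas.append(np.log(scale))                # alpha_t = ln(scale)
    return trees, alphas

def predict_adaboost(trees, alphas, X):
    """G(x) = sign(sum_t alpha_t * g_t(x))"""
    return np.sign(sum(a * g.predict(X) for a, g in zip(alphas, trees)))
```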
Gradient Boosting
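Gradient boosting can be read as functional gradient descent: with squared error, the negative gradient at each point is simply the current residual, so each round fits a new regression tree to the residuals and adds a shrunken copy of it to the ensemble. A minimal GBDT-for-regression sketch under that standard formulation (the fixed learning rate eta and tree depth are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_regression(X, y, T=100, eta=0.1, max_depth=3):
    """Each round: fit g_t to the residuals y - s, then update s <- s + eta * g_t(x)."""
    s = np.zeros(len(y))  # current ensemble prediction, s_0 = 0
    trees = []
    for _ in range(T):
        residual = y - s                    # negative gradient of 1/2 * (y - s)^2
        g = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        s = s + eta * g.predict(X)          # shrunken additive update
        trees.append(g)
    return trees

def predict_gbdt(trees, X, eta=0.1):
    return eta * sum(g.predict(X) for g in trees)
```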

Summary of Aggregation Models



Sources:
Statistical Learning Methods (统计学习方法), Li Hang (李航)
Machine Learning Techniques open course, National Taiwan University, Hsuan-Tien Lin (林轩田)
