Ensemble Learning Algorithms
In statistics and machine learning, ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Ensemble methods often deliver the best results in practice, at the cost of considerably longer training time.
In other words, ensemble methods combine several different machine learning algorithms, or several instances of one algorithm trained with different parameters, into a single predictor.
1. Blending and Bagging
- Motivation of Aggregation
- Uniform Blending
- Linear and Any Blending
- Bagging (Bootstrap Aggregation)
Selection by Validation
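Beyond picking a single best model by validation, the blending ideas listed above can be written in a few lines. A minimal sketch of uniform and linear blending, assuming a list of already-trained binary classifiers whose predict method returns {-1, +1} labels on numpy arrays (all names here are illustrative, not from the course):

```python
import numpy as np

def uniform_blending(models, X):
    """Uniform blending: every g_t gets one vote, G(x) = sign(sum_t g_t(x))."""
    votes = np.sum([m.predict(X) for m in models], axis=0)
    return np.sign(votes)

def linear_blending(models, alphas, X):
    """Linear blending: G(x) = sign(sum_t alpha_t * g_t(x)); the alpha_t are
    typically learned on a validation set, using the g_t outputs as features."""
    scores = sum(a * m.predict(X) for a, m in zip(alphas, models))
    return np.sign(scores)
```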



bootstrap sample D~_t: re-sample N examples from D uniformly with replacement; an arbitrary N' can also be used instead of the original N.
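A minimal sketch of bagging built directly on this bootstrap sampling, assuming numpy arrays and a scikit-learn-style base learner with fit/predict (the base learner choice and parameter values are illustrative); the resulting models can then be combined with the uniform voting shown above:

```python
import numpy as np
from sklearn.base import clone
from sklearn.tree import DecisionTreeClassifier

def bootstrap_sample(X, y, n_prime=None, rng=None):
    """Bootstrap sample D~_t: draw n_prime examples from D uniformly with replacement."""
    if rng is None:
        rng = np.random.default_rng()
    if n_prime is None:
        n_prime = len(X)  # the arbitrary N' defaults to the original N
    idx = rng.integers(0, len(X), size=n_prime)
    return X[idx], y[idx]

def bagging(X, y, base=None, T=25, seed=0):
    """Bootstrap AGGregatING: train T copies of the base learner, one per bootstrap sample."""
    rng = np.random.default_rng(seed)
    base = base if base is not None else DecisionTreeClassifier()
    return [clone(base).fit(*bootstrap_sample(X, y, rng=rng)) for _ in range(T)]
```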
2. AdaBoost



3. Decision Tree

Two ways of defining a decision tree:





4. Random Forest
- Random Forest Algorithm
- Out-Of-Bag Estimate
- Feature Selection
- Random Forest in Action
Random Forest Algorithm
- Bagging: reduces variance by voting/averaging.
- Decision Tree: large variance, especially if fully grown (very sensitive to the data).
- Putting them together? (i.e. aggregation of aggregation :-))

Sources of randomness in Random Forest:
1. re-sampling: every tree is trained on its own bootstrap sample of D (bagging);
2. another possibility for diversity: randomly sample d' features from x, with d' << d (efficient for large d); the original RF re-samples a new subspace for each b(x) inside C&RT.
   RF = bagging + random-subspace C&RT
3. combination with a random row p_i of a projection matrix P: φ_i(x) = p_i^T x.
   RF = bagging + random-combination C&RT
Randomness everywhere!
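A minimal sketch of "RF = bagging + random-subspace C&RT", using scikit-learn's DecisionTreeClassifier as the C&RT stand-in; its max_features option re-draws a feature subspace at every split, which approximates the per-b(x) subspace re-sampling described above (tree count, the d' heuristic and other parameters are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def random_forest(X, y, T=100, d_prime=None, seed=0):
    """RF = bagging + random-subspace C&RT: each fully-grown tree is trained on a
    bootstrap sample and only considers d' << d random features at each split."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    if d_prime is None:
        d_prime = max(1, int(np.sqrt(d)))  # a common heuristic for d'
    forest = []
    for _ in range(T):
        idx = rng.integers(0, N, size=N)   # bootstrap sample D~_t
        tree = DecisionTreeClassifier(max_features=d_prime,  # random subspace per split
                                      random_state=int(rng.integers(1 << 31)))
        forest.append(tree.fit(X[idx], y[idx]))
    return forest

def predict_forest(forest, X):
    """Uniform voting over the fully-grown trees reduces their large variance."""
    return np.sign(np.sum([t.predict(X) for t in forest], axis=0))
```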
Out-Of-Bag Estimate


Feature Selection
Feature importance can be estimated with a permutation test: randomly shuffle the values of one feature and measure how much the model's performance drops; the larger the drop, the more important that feature is.
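A sketch of this permutation test, assuming an already-trained model and a held-out validation set of numpy arrays (in the original Random Forest the drop is measured on the out-of-bag examples instead; the function and variable names are illustrative):

```python
import numpy as np

def permutation_importance(model, X_val, y_val, rng=None):
    """importance(i) = accuracy on clean data minus accuracy after permuting feature i."""
    if rng is None:
        rng = np.random.default_rng()
    base_acc = np.mean(model.predict(X_val) == y_val)
    importances = np.zeros(X_val.shape[1])
    for i in range(X_val.shape[1]):
        X_perm = X_val.copy()
        X_perm[:, i] = rng.permutation(X_perm[:, i])  # destroy feature i, keep its marginal distribution
        importances[i] = base_acc - np.mean(model.predict(X_perm) == y_val)
    return importances
```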

Random Forest in Action


5. Gradient Boosted Decision Tree
- Adaptive Boosted Decision Tree
- Optimization View of AdaBoost
- Gradient Boosting
- Summary of Aggregation Models
Adaptive Boosted Decision Tree
weighted DTree(D, u^(t)):
weights u can be expressed by sampling proportional to u_n: request a size-N' data set D~ by sampling ∝ u on D.
AdaBoost-DTree: often realized as AdaBoost + sampling ∝ u^(t) + a pruned DTree(D~).
AdaBoost-Stump = special case of AdaBoost-DTree.
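A minimal sketch along these lines: each round samples a size-N data set with probability proportional to u^(t) and fits a pruned tree on it, then re-weights the examples as in standard AdaBoost (the scikit-learn tree, the max_depth pruning and labels in {-1, +1} are assumptions, not details from the course notes):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_dtree(X, y, T=50, max_depth=3, seed=0):
    """AdaBoost + sampling proportional to u^(t) + pruned DTree(D~); y must be in {-1, +1}."""
    rng = np.random.default_rng(seed)
    N = len(X)
    u = np.full(N, 1.0 / N)                         # example weights u^(1)
    trees, alphas = [], []
    for _ in range(T):
        idx = rng.choice(N, size=N, p=u / u.sum())  # D~ sampled proportional to u^(t)
        g = DecisionTreeClassifier(max_depth=max_depth).fit(X[idx], y[idx])
        wrong = g.predict(X) != y
        eps = np.dot(u, wrong) / u.sum()            # weighted error on the full data set
        if eps == 0 or eps >= 0.5:                  # stop: perfect or no better than random
            break
        scale = np.sqrt((1 - eps) / eps)
        u = np.where(wrong, u * scale, u / scale)   # up-weight mistakes, down-weight correct ones
        trees.append(g)
        alphas.append(np.log(scale))                # alpha_t = ln(scale)
    return trees, alphas

def predict_adaboost(trees, alphas, X):
    """G(x) = sign(sum_t alpha_t * g_t(x))"""
    return np.sign(sum(a * g.predict(X) for a, g in zip(alphas, trees)))
```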
Gradient Boosting
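Gradient boosting can be read as functional gradient descent: with squared error, the negative gradient at each point is simply the current residual, so each round fits a new regression tree to the residuals and adds a shrunken copy of it to the ensemble. A minimal GBDT-for-regression sketch under that standard formulation (the fixed learning rate eta and tree depth are illustrative choices):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbdt_regression(X, y, T=100, eta=0.1, max_depth=3):
    """Each round: fit g_t to the residuals y - s, then update s <- s + eta * g_t(x)."""
    s = np.zeros(len(y))  # current ensemble prediction, s_0 = 0
    trees = []
    for _ in range(T):
        residual = y - s                    # negative gradient of 1/2 * (y - s)^2
        g = DecisionTreeRegressor(max_depth=max_depth).fit(X, residual)
        s = s + eta * g.predict(X)          # shrunken additive update
        trees.append(g)
    return trees

def predict_gbdt(trees, X, eta=0.1):
    return eta * sum(g.predict(X) for g in trees)
```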

Summary of Aggregation Models



Sources:
Statistical Learning Methods (统计学习方法), Li Hang (李航)
Machine Learning Techniques open course, National Taiwan University, Hsuan-Tien Lin (林轩田)
