Stanford Machine Learning --- Lecture 6. How to Choose a Machine Learning Method / System

===============================

☆ Model selection; training / validation / test data

☆ Diagnosing bias versus variance

☆ Regularization and bias/variance

Learning Curve: when does adding more training data actually help?

===============================

When the learned hypothesis makes unacceptably large errors, the options to try are:

Get more training examples (fixes high variance)
Try smaller sets of features (fixes high variance)
Try adding polynomial features (e.g. x1^2, x2^2, x1*x2, ...) (fixes high bias)
Try decreasing λ (fixes high bias)
Try increasing λ (fixes high variance)

Diagnostics for Machine Learning methods:

- What is a diagnostic? A diagnostic is a test that tells you whether a learning algorithm is working, and that guides you in improving its performance.

Diagnostic: A test that you can run to gain insight into what is/isn't working with a learning algorithm, and gain guidance as to how best to improve its performance.

- The payoff of diagnostics: Diagnostics can take time to implement, but doing so can be a very good use of your time.

===============================

- Error for linear regression (squared error): J(θ) = 1/(2m) * Σ (h_θ(x^(i)) - y^(i))^2

- Error for logistic regression (0/1 misclassification error): err(h_θ(x), y) = 1 if (h_θ(x) ≥ 0.5 and y = 0) or (h_θ(x) < 0.5 and y = 1), else 0; the test error is the average of err over the test set.
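The two error measures above can be computed directly; a minimal sketch in plain Python (the sample predictions and labels below are made up for illustration):

```python
def linear_error(preds, ys):
    """Squared-error cost: J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(ys)
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / (2 * m)

def misclassification_error(probs, ys, threshold=0.5):
    """0/1 error: a mistake whenever the thresholded prediction != label."""
    m = len(ys)
    wrong = sum(1 for p, y in zip(probs, ys) if (p >= threshold) != (y == 1))
    return wrong / m

print(linear_error([1.0, 2.0, 3.0], [1.0, 2.5, 2.5]))       # 0.5/6 ≈ 0.0833
print(misclassification_error([0.9, 0.2, 0.6], [1, 0, 0]))  # 1/3
```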

===============================

- First, set up d candidate model hypotheses (the lecture's example has 10; d is the index). For each, find the θ vector that minimizes the training error on the training set, giving d values of θ.

- Then, for each of the d hypotheses, plug in its θ and compute J(cv) on the cross-validation set, and take as the final hypothesis the model with the smallest cv-set error. In the lecture's example J(cv) is smallest for the fourth model, so the hypothesis with d = 4 is chosen.

PS: d actually stands for degree: the hypothesis's highest polynomial term has degree d.

PS': In general, J(cv) ≥ J(train).
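The degree-selection procedure can be sketched with a small synthetic experiment (the cubic target function, the sample sizes, and the noise level are all illustrative choices, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(-2, 2, 40)
y_train = x_train**3 - x_train + rng.normal(0, 0.3, x_train.size)
x_cv = np.linspace(-2, 2, 20)
y_cv = x_cv**3 - x_cv + rng.normal(0, 0.3, x_cv.size)

def j(coeffs, x, y):
    """Squared-error cost: 1/(2m) * sum((h(x) - y)^2)."""
    h = np.polyval(coeffs, x)
    return float(np.mean((h - y) ** 2) / 2)

errors = {}
for d in range(1, 11):                        # candidate degrees d = 1..10
    theta = np.polyfit(x_train, y_train, d)   # minimize training error
    errors[d] = j(theta, x_cv, y_cv)          # evaluate on the cv set

best_d = min(errors, key=errors.get)
print(best_d)  # likely 3 for this cubic target
```

The key point the code mirrors: θ is fit only on the training set, and the model is chosen only by its cv-set error, so the cv set is never used for fitting.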

===============================

High bias: J(train) large, J(cv) large, J(train) ≈ J(cv). Bias arises when d is small, in the underfitting regime.

High variance: J(train) small, J(cv) large, J(train) << J(cv). Variance arises when d is large, in the overfitting regime.

-------------------------------------------------------------

MSE = 1/n * Σ (f(x) - t(x))^2

MSE (mean squared error) = Bias^2 + Variance + noise

Variance: measures the extent to which the solutions for individual data sets vary around their average; hence this measures the extent to which the function f(x) is sensitive to the particular choice of data set.

Bias: represents the extent to which the average prediction over all data sets differs from the desired regression function.

Our goal is to minimize the expected loss, which we have decomposed into the sum of a (squared) bias, a variance, and a constant noise term. As we shall see, there is a trade-off between bias and variance, with very flexible models (overfit) having low bias and high variance, and relatively rigid models (underfit) having high bias and low variance.

variance: the variance of the estimator itself.

bias: the difference between the expected value of the estimate and the regression function the sample data were drawn from.

Variance here means: the spread of the 20 estimated curves around their average (expected) curve, i.e. the variance of the estimate itself, which can never be exactly 0.

Bias here means: the distance between the average of the 20 estimated curves and the true best-fit function.
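The 20-curves experiment can be reproduced: draw 20 noisy data sets from one true function, fit a model to each, then measure bias^2 (mean curve vs truth) and variance (spread of curves around their mean). The true function sin(2πx), the noise level, and the two degrees are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
xs = np.linspace(0, 1, 25)
true = np.sin(2 * np.pi * xs)

def fit_curves(degree, n_sets=20, noise=0.3):
    """Fit `degree`-degree polynomials to n_sets noisy data sets."""
    curves = []
    for _ in range(n_sets):
        ys = true + rng.normal(0, noise, xs.size)
        coeffs = np.polyfit(xs, ys, degree)
        curves.append(np.polyval(coeffs, xs))
    curves = np.array(curves)
    mean_curve = curves.mean(axis=0)
    bias2 = float(np.mean((mean_curve - true) ** 2))      # mean curve vs truth
    variance = float(np.mean(curves.var(axis=0)))          # spread of curves
    return bias2, variance

b_rigid, v_rigid = fit_curves(degree=1)  # underfit: high bias, low variance
b_flex, v_flex = fit_curves(degree=9)    # overfit: low bias, high variance
print(b_rigid, v_rigid)
print(b_flex, v_flex)
```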

λ small, d large -> overfit (flexible) ->

the average of the estimates stays close to the true expected value -> bias small

λ large, d small -> underfit (stable) ->

the average of the estimates deviates from the true expected value, so the fit cannot track the regression function well -> bias large

===============================

λ too small leads to overfitting, i.e. high variance: J(train) << J(cv)

λ too large leads to underfitting, i.e. high bias: J(train) ≈ J(cv) (both large)
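The effect of λ can be seen in a small synthetic experiment using closed-form ridge regression on polynomial features (the data generator, degree, and λ values are illustrative; unlike the lecture's convention, this sketch also regularizes the bias term θ_0 for simplicity):

```python
import numpy as np

rng = np.random.default_rng(2)

def design(x, degree=8):
    """Polynomial feature matrix with columns x^0 .. x^degree."""
    return np.vander(x, degree + 1, increasing=True)

def ridge_fit(X, y, lam):
    """Closed-form ridge: theta = (X^T X + lam*I)^-1 X^T y."""
    n = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

def j(theta, X, y):
    """Unregularized squared-error cost, for reporting."""
    return float(np.mean((X @ theta - y) ** 2) / 2)

x_tr = rng.uniform(-1, 1, 15); y_tr = x_tr**2 + rng.normal(0, 0.2, 15)
x_cv = rng.uniform(-1, 1, 30); y_cv = x_cv**2 + rng.normal(0, 0.2, 30)
X_tr, X_cv = design(x_tr), design(x_cv)

for lam in (0.0, 1.0, 100.0):
    theta = ridge_fit(X_tr, y_tr, lam)
    print(lam, round(j(theta, X_tr, y_tr), 4), round(j(theta, X_cv, y_cv), 4))
```

With λ = 0 the degree-8 fit drives J(train) far below J(cv) (variance); with λ = 100 both errors rise and move together (bias).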

===============================

Learning Curve: when does adding more training data actually help?

High bias (underfitting): increasing m won't help!

High variance (overfitting): increasing m shrinks the gap between J(train) and J(cv), which does help improve performance!
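The high-variance learning curve can be simulated: fit a flexible model at growing training-set sizes m and watch the J(train)/J(cv) gap close (the sin target, degree, and sizes are illustrative; each size is averaged over a few random draws to smooth the curve):

```python
import numpy as np

rng = np.random.default_rng(3)

def one_run(m, degree=6):
    """Fit a degree-6 polynomial to m noisy points; return (J_train, J_cv)."""
    x_tr = rng.uniform(-1, 1, m)
    y_tr = np.sin(3 * x_tr) + rng.normal(0, 0.2, m)
    x_cv = rng.uniform(-1, 1, 200)
    y_cv = np.sin(3 * x_cv) + rng.normal(0, 0.2, 200)
    theta = np.polyfit(x_tr, y_tr, degree)
    jt = float(np.mean((np.polyval(theta, x_tr) - y_tr) ** 2) / 2)
    jc = float(np.mean((np.polyval(theta, x_cv) - y_cv) ** 2) / 2)
    return jt, jc

def avg_errors(m, reps=5):
    """Average (J_train, J_cv) over several random draws."""
    runs = [one_run(m) for _ in range(reps)]
    return tuple(np.mean(runs, axis=0))

for m in (10, 40, 160):
    jt, jc = avg_errors(m)
    print(f"m={m:4d}  J_train={jt:.4f}  J_cv={jc:.4f}  gap={jc - jt:.4f}")
```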

posted @ 2015-12-07 10:56 莫小