Lecture 7 Regularization
Overfitting: if we have too many features, the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g., fail to predict prices on new examples).
Addressing overfitting :
- Reduce number of features
- Manually select which features to keep
- Model selection algorithm
- Regularization
- Keep all the features,but reduce magnitude/values of parameters \(\theta_j\)
- Works well when we have a lot of features,each of which contributes a bit to predicting y
Cost function
\[J(\theta) = \frac{1}{2m}\left[\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})^2 + \lambda \sum^n_{j=1}\theta_j^2\right]
\]
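As a minimal NumPy sketch of this cost (the vectorized form and function name are my own, not from the lecture), note that the penalty sum starts at \(j=1\), so \(\theta_0\) is not regularized:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """Regularized linear-regression cost J(theta).

    X: m x (n+1) design matrix whose first column is all ones,
    y: m-vector of targets, lam: regularization parameter lambda.
    theta[0] (the intercept) is not penalized.
    """
    m = len(y)
    residuals = X @ theta - y                # h_theta(x^{(i)}) - y^{(i)} for every example
    fit_term = np.sum(residuals ** 2)        # sum of squared errors
    reg_term = lam * np.sum(theta[1:] ** 2)  # penalize theta_1 .. theta_n only
    return (fit_term + reg_term) / (2 * m)
```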
If \(\lambda\) is set to an extremely large value:
- The penalty drives \(\theta_1,\dots,\theta_n\) toward zero, leaving \(h_\theta(x) \approx \theta_0\).
- The algorithm therefore results in underfitting (it fails to fit even the training data well).
Regularized linear regression
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\\
\theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]
\]
}
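Note that the \(\theta_j\) update can be rewritten as \(\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_i(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}\), i.e., each step first shrinks \(\theta_j\) slightly and then applies the usual update. A small NumPy sketch of one such step (names and vectorization are my own):

```python
import numpy as np

def gradient_descent_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for linear regression."""
    m = len(y)
    error = X @ theta - y              # h_theta(x^{(i)}) - y^{(i)} for all i
    grad = (X.T @ error) / m           # unregularized gradient, all j
    grad[1:] += (lam / m) * theta[1:]  # add (lambda/m)*theta_j for j >= 1 only
    return theta - alpha * grad
```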
Normal equation
\[\theta=\left(X^TX+\lambda\underbrace{\left[
\begin{matrix}
0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
\end{matrix}
\right]}_{(n+1)\times(n+1)}\right)^{-1}X^Ty
\]
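A sketch of this closed-form solution in NumPy (my own wording; it solves the linear system rather than forming an explicit inverse, but implements the same formula):

```python
import numpy as np

def regularized_normal_equation(X, y, lam):
    """theta = (X^T X + lambda * L)^(-1) X^T y, where L is the
    (n+1) x (n+1) identity with L[0, 0] = 0 so that theta_0 is
    not regularized."""
    L = np.eye(X.shape[1])
    L[0, 0] = 0.0  # do not penalize the intercept term
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```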
Regularized logistic regression
Gradient descent
Repeat{
\[\theta_0:=\theta_0 - \alpha\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_0^{(i)}
\\
\theta_j:=\theta_j - \alpha\left[\frac{1}{m}\sum^m_{i=1}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)} + \frac{\lambda}{m}\theta_j\right]
\]
}
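The update looks identical to regularized linear regression, but here \(h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}\), so it is a different algorithm. A minimal NumPy sketch of one step (names and vectorization are my own):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent update for logistic regression:
    same form as linear regression, but h_theta(x) = sigmoid(theta^T x)."""
    m = len(y)
    error = sigmoid(X @ theta) - y     # h_theta(x^{(i)}) - y^{(i)}
    grad = (X.T @ error) / m
    grad[1:] += (lam / m) * theta[1:]  # theta_0 is not regularized
    return theta - alpha * grad
```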