Lasso Regression

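Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts because it tends to prefer solutions with fewer non-zero coefficients, which effectively reduces the number of features the solution depends on. For this reason, Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero coefficients.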

Mathematically, it consists of a linear model with an added regularization term (an l1 prior). The objective function to minimize is:

\min_{w} \frac{1}{2 n_{\text{samples}}} \|Xw - y\|_2^2 + \alpha \|w\|_1

The lasso estimator thus solves the least-squares minimization with the added penalty α||w||_1, where α is a constant and ||w||_1 is the l1-norm of the coefficient vector.
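To make the objective concrete, the following minimal sketch evaluates it directly with NumPy and compares the value at a fitted solution with the value at all-zero coefficients. The helper name `lasso_objective` is hypothetical, and the intercept term is an assumption added here because `Lasso` fits an intercept by default while the formula above omits it.

```python
import numpy as np
from sklearn.linear_model import Lasso


def lasso_objective(X, y, w, intercept, alpha):
    """Return (1 / (2 * n_samples)) * ||Xw + b - y||_2^2 + alpha * ||w||_1."""
    n_samples = X.shape[0]
    residual = X @ w + intercept - y
    return residual @ residual / (2 * n_samples) + alpha * np.abs(w).sum()


X = np.array([[0.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0])
alpha = 0.1

model = Lasso(alpha=alpha).fit(X, y)

# Objective at the fitted coefficients versus at w = 0 (intercept = mean of y).
fitted_value = lasso_objective(X, y, model.coef_, model.intercept_, alpha)
zero_value = lasso_objective(X, y, np.zeros(2), y.mean(), alpha)
print("objective at fitted solution:", fitted_value)
print("objective at w = 0:", zero_value)
```

The fitted solution should give the smaller value, since it trades a slightly larger penalty for a much smaller squared error.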

from sklearn.linear_model import Lasso

lasso = Lasso()
lasso.fit([[0, 0], [1, 1]], [0,1])
print("coef: {}".format(lasso.coef_))
print(lasso.predict([[1, 1]]))

coef: [0. 0.]
[0.5]

from sklearn.linear_model import Lasso

lasso01 = Lasso(alpha=0.1)
lasso01.fit([[0, 0], [1, 1]], [0,1])
print("coef: {}".format(lasso01.coef_))
print(lasso01.predict([[1, 1]]))

coef: [0.6 0. ]
[0.8]

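The next example estimates Lasso and Elastic-Net regression models on an artificially generated sparse signal corrupted by additive noise. The estimated coefficients are then compared with the true sparse coefficients.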

print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import r2_score
%matplotlib notebook

# Generate some sparse data
np.random.seed(42)
n_samples, n_features = 50, 200
X = np.random.randn(n_samples, n_features)  # randn(...) draws from a standard normal distribution
coef = 3 * np.random.randn(n_features)  # one coefficient per feature
inds = np.arange(n_features)
np.random.shuffle(inds)
coef[inds[10:]] = 0  # sparsify: randomly set 190 of the 200 coefficients to zero
y = np.dot(X, coef)  # linear model: y = X @ coef (matrix-vector product)

# Add noise: zero-mean Gaussian with standard deviation 0.01
y += 0.01 * np.random.normal(size=n_samples)

# Split the data into a training set and a test set
n_samples = X.shape[0]
X_train, y_train = X[: n_samples // 2], y[: n_samples // 2]
X_test, y_test = X[n_samples // 2: ], y[n_samples // 2: ]

# Train the Lasso model
from sklearn.linear_model import Lasso

alpha = 0.1
lasso = Lasso(alpha=alpha)

y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data:\n{:.2f}".format(r2_score_lasso))

# Train the ElasticNet model
from sklearn.linear_model import ElasticNet

enet = ElasticNet(alpha=alpha, l1_ratio=0.7)
y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(enet)
print("r^2 on test data:\n{:.2f}".format(r2_score_enet))

# Plot the estimated coefficients against the true ones
plt.plot(enet.coef_, color='lightgreen', linewidth=2, label='Elastic net coefficients')
plt.plot(lasso.coef_, color='gold', linewidth=2, label='Lasso coefficients')
plt.plot(coef, '--', color='navy', label='original coefficient')
plt.legend(loc='best')
plt.title("Lasso r^2: {:.2f}, ElasticNet r^2: {:.2f}".format(r2_score_lasso, r2_score_enet))

Automatically created module for IPython interactive environment
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data:
0.39
ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.7,
      max_iter=1000, normalize=False, positive=False, precompute=False,
      random_state=None, selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data:
0.24


[Figure: Elastic-Net and Lasso coefficient estimates plotted against the original coefficients. Title: Lasso r^2: 0.39, ElasticNet r^2: 0.24]

Setting the regularization parameter

The alpha parameter controls the degree of sparsity of the estimated coefficients.
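As a rough illustration of this effect, the sketch below fits Lasso for a few values of alpha on synthetic data and counts the non-zero coefficients. The data, the alpha values, and the names `alpha_max` and `counts` are assumptions chosen for the example. `alpha_max` is the smallest alpha at which every coefficient is zero when an intercept is fitted.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(60, 20)
true_coef = np.zeros(20)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -0.5]  # only 5 informative features
y = X @ true_coef + 0.1 * rng.randn(60)

# With a fitted intercept, all coefficients are zero once
# alpha >= max |X_centered^T y_centered| / n_samples.
X_centered = X - X.mean(axis=0)
y_centered = y - y.mean()
alpha_max = np.abs(X_centered.T @ y_centered).max() / X.shape[0]

alphas = [0.01, 0.1, 1.0, 1.01 * alpha_max]
counts = []
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X, y).coef_
    counts.append(int(np.count_nonzero(coef)))
    print("alpha={:.3f}: {} non-zero coefficients".format(alpha, counts[-1]))
```

Larger values of alpha should leave fewer non-zero coefficients, down to none at all just above `alpha_max`.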

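Using cross-validation

scikit-learn exposes objects that set the Lasso alpha parameter by cross-validation: LassoCV and LassoLarsCV. LassoLarsCV is based on the least angle regression algorithm.

For high-dimensional datasets with many collinear features, LassoCV is most often the preferred choice. However, LassoLarsCV has the advantage of exploring more relevant values of the alpha parameter, and when the number of samples is very small compared with the number of features, it is often faster than LassoCV.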

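Information-criteria based model selection

As an alternative, the LassoLarsIC estimator uses the Akaike information criterion (AIC) and the Bayes information criterion (BIC). Selecting alpha with an information criterion is computationally cheaper, because the regularization path is computed only once instead of k+1 times as with k-fold cross-validation. However, such criteria need a proper estimate of the degrees of freedom of the solution, are derived for large samples (asymptotic results), and assume that the model is correct, i.e. that the data are actually generated by this model. They also tend to break down when the problem is badly conditioned (more features than samples).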
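For reference, the two criteria are usually written in the following general form. This is the textbook definition rather than a transcription of the LassoLarsIC implementation. The symbols are assumptions introduced here: \hat{L} is the maximized likelihood, d is the estimated degrees of freedom (for Lasso, commonly taken to be the number of non-zero coefficients), and n is the number of samples.

```latex
\mathrm{AIC} = -2 \log \hat{L} + 2 d
\qquad
\mathrm{BIC} = -2 \log \hat{L} + \log(n)\, d
```

Both criteria penalize model complexity through d. BIC penalizes it more heavily than AIC as soon as \log(n) > 2, so it tends to select a larger alpha and hence a sparser model.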

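For cross-validation, we use 20 folds and compute the Lasso path with two algorithms: coordinate descent (implemented by the LassoCV class) and Lars, least angle regression (implemented by the LassoLarsCV class). Both algorithms give roughly the same results. They differ in execution speed and in their sources of numerical error.

Lars computes a path solution only at each kink of the path. It is therefore very efficient when there are only a few kinks, which is the case when there are few features or samples. It is also able to compute the full path without setting any meta-parameter. Coordinate descent, by contrast, computes the path points on a pre-specified grid (here we use the default). It is therefore more efficient when the number of grid points is smaller than the number of kinks in the path. In terms of numerical error, Lars accumulates more error for heavily correlated variables, whereas coordinate descent only samples the path on a grid.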
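The difference between kinks and grid points can be seen with the lower-level path functions. The sketch below is an illustration on the original diabetes features, not the setup used by the CV estimators. It compares the number of points returned by `lars_path` (one per kink) with the number returned by `lasso_path` on its default grid. Centering the data beforehand is an assumption made here because neither function fits an intercept.

```python
from sklearn import datasets
from sklearn.linear_model import lars_path, lasso_path

X, y = datasets.load_diabetes(return_X_y=True)

# Neither path function fits an intercept, so center the data first.
X = X - X.mean(axis=0)
y = y - y.mean()

# Lars: one solution per kink of the piecewise-linear Lasso path.
alphas_lars, _, coefs_lars = lars_path(X, y, method="lasso")

# Coordinate descent: solutions on a pre-specified grid (library defaults).
alphas_cd, coefs_cd, _ = lasso_path(X, y)

print("Lars path points (kinks):", len(alphas_lars))
print("Coordinate descent grid points:", len(alphas_cd))
```

With only 10 features the Lars path has few kinks, so it needs far fewer solutions than the default grid.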

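Note how the optimal value of alpha varies from fold to fold. This illustrates why nested cross-validation is necessary when evaluating the performance of a method whose parameter is chosen by cross-validation: the chosen parameter may not be optimal for unseen data.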
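A minimal sketch of nested cross-validation, assuming the plain diabetes dataset and arbitrary fold counts (5 inner, 3 outer), could look like this. The inner loop is the cross-validation that LassoCV performs itself, and the outer loop evaluates the whole "select alpha, then fit" procedure on data that took no part in the selection.

```python
from sklearn import datasets
from sklearn.linear_model import LassoCV
from sklearn.model_selection import KFold, cross_val_score

X, y = datasets.load_diabetes(return_X_y=True)

# Inner loop: LassoCV picks alpha by 5-fold CV on each outer training split.
inner_model = LassoCV(cv=5)

# Outer loop: each held-out fold scores a model whose alpha was chosen
# without ever seeing that fold.
outer_cv = KFold(n_splits=3, shuffle=True, random_state=0)
outer_scores = cross_val_score(inner_model, X, y, cv=outer_cv)

print("R^2 per outer fold:", outer_scores)
print("Mean R^2: {:.3f}".format(outer_scores.mean()))
```

The outer scores estimate how well the full procedure generalizes, which the inner cross-validation error alone tends to overstate.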

print(__doc__)

import time

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LassoCV, LassoLarsCV, LassoLarsIC
from sklearn import datasets

diabetes = datasets.load_diabetes()
X = diabetes.data
y = diabetes.target

rng = np.random.RandomState(42)
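X = np.c_[X, rng.randn(X.shape[0], 14)]  # add some uninformative (noise) features

# Normalize each column to unit norm so that the results are comparable
X /= np.sqrt(np.sum(X ** 2, axis=0))

# LassoLarsIC: least angle regression (Lars) with the BIC/AIC criterion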

model_bic = LassoLarsIC(criterion='bic')
t1 = time.time()
model_bic.fit(X, y)
t_bic = time.time() - t1
alpha_bic = model_bic.alpha_

model_aic = LassoLarsIC(criterion='aic')
model_aic.fit(X, y)
alpha_aic = model_aic.alpha_

def plot_ic_criterion(model, name, color):
    # alphas_ can contain 0, so add a small constant before taking the log
    # to avoid a divide-by-zero RuntimeWarning in log10.
    eps = 1e-4
    alpha_ = model.alpha_ + eps
    alphas_ = model.alphas_ + eps
    criterion_ = model.criterion_
    plt.plot(-np.log10(alphas_), criterion_, '--', color=color, linewidth=3, label='{} criterion'.format(name))
    plt.axvline(-np.log10(alpha_), color=color, linewidth=3, label='alpha: {} estimate'.format(name))
    plt.xlabel('-log(alpha)')
    plt.ylabel('criterion')

plt.figure()
plot_ic_criterion(model_aic, 'AIC', 'b')
plot_ic_criterion(model_bic, 'BIC', 'r')
plt.legend()
plt.title('Information-criterion for model selection (training time %.3fs)' % t_bic)

# LassoCV: coordinate descent

# Compute the regularization path
print("Computing regularization path using the coordinate descent lasso...")
t1 = time.time()
model = LassoCV(cv=20).fit(X, y)
t_lasso_cv = time.time() - t1

# Display the results
m_log_alphas = -np.log10(model.alphas_)

plt.figure()
ymin, ymax = 2300, 3800
plt.plot(m_log_alphas, model.mse_path_, ':')
plt.plot(m_log_alphas, model.mse_path_.mean(axis=-1), 'k', label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha: CV estimate')

plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean square error')
plt.title("Mean square error on each fold: coordinate descent (train time: {:.2f}s)".format(t_lasso_cv))

plt.axis('tight')
plt.ylim(ymin, ymax)

# LassoLarsCV: least angle regression

# Compute the regularization path
print("Computing regularization path using the Lars lasso...")
t1 = time.time()
model = LassoLarsCV(cv=20).fit(X, y)
t_lasso_lars_cv = time.time() - t1

# Display the results
# cv_alphas_ can contain 0, so add a small constant to avoid log10(0)
m_log_alphas = -np.log10(model.cv_alphas_ + 1e-4)

plt.figure()
plt.plot(m_log_alphas, model.mse_path_, ':')
plt.plot(m_log_alphas, model.mse_path_.mean(axis=1), 'k', label='Average across the folds', linewidth=2)
plt.axvline(-np.log10(model.alpha_), linestyle='--', color='k', label='alpha CV')
plt.legend()
plt.xlabel('-log(alpha)')
plt.ylabel('Mean square error')
plt.title('Mean square error on each fold: Lars (train time: {:.2f}s)'.format(t_lasso_lars_cv))
plt.axis('tight')
plt.ylim(ymin, ymax)

Automatically created module for IPython interactive environment
Computing regularization path using the coordinate descent lasso...
Computing regularization path using the Lars lasso...

[Figures: (1) AIC and BIC criterion curves versus -log(alpha), with each selected alpha marked; (2) mean square error on each fold versus -log(alpha) for coordinate descent; (3) mean square error on each fold versus -log(alpha) for Lars.]

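Comparison with the regularization parameter of SVM

The correspondence between alpha and the SVM regularization parameter C is alpha = 1 / C or alpha = 1 / (n_samples * C), depending on the estimator and on the exact objective function the model optimizes.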
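The two forms follow from rescaling the objective by a positive constant, which does not change the minimizer. In the sketch below, \ell_i(w) denotes a generic per-sample loss, R(w) a generic penalty, and n the number of samples. These symbols are assumptions introduced for illustration and do not refer to any specific estimator.

```latex
\min_{w} \; C \sum_{i=1}^{n} \ell_i(w) + R(w)
\;\Longleftrightarrow\;
\min_{w} \; \sum_{i=1}^{n} \ell_i(w) + \frac{1}{C} R(w)
\quad (\alpha = 1/C)

\min_{w} \; C \sum_{i=1}^{n} \ell_i(w) + R(w)
\;\Longleftrightarrow\;
\min_{w} \; \frac{1}{n} \sum_{i=1}^{n} \ell_i(w) + \frac{1}{n C} R(w)
\quad (\alpha = 1/(n C))
```

The first form applies when the alpha-parameterized objective sums the loss over samples, and the second when it averages the loss, as the Lasso objective above does with its 1 / (2 n_samples) factor.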

posted @ 2020-05-08 11:06 wsilj