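Handling overfitting: regularization
- Reducing the amplitude of the bumps in an overfitted curve brings it closer to a well-fitted curve.
- Regularization does this by penalizing large weights during fitting. The weights w of the high-order features are shrunk toward 0, so those features contribute less, the bumps flatten, and the curve approaches a well-fitted one. Note that L2 regularization shrinks weights toward 0 but rarely makes them exactly 0.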
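L2 regularization:
- Use a regression model with built-in regularization (Ridge regression) to deal with overfitting.
Ridge regression: a linear regression model with L2 regularization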
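For reference, the objective that Ridge minimizes can be sketched as below. The symbols are notation chosen for this note rather than taken from the original: X is the feature matrix, y the target vector, w the weight vector, and α the alpha parameter.

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \, \lVert w \rVert_2^2
```

The first term is the usual least-squares error. The second term grows with the squared size of the weights, so a larger α favors smaller weights; with α = 0 the expression reduces to ordinary least squares.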
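- API: from sklearn.linear_model import Ridge
- Ridge(alpha=1.0):
  - alpha: regularization strength. The larger it is, the closer the weights w of the high-order terms are pushed toward 0, and the smaller the bumps in the overfitted curve become.
    - Values: any non-negative float. Decimals between 0 and 1 or integers from 1 to 10 are common starting points.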
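Rather than guessing alpha by hand, one option is to let cross-validation choose it from a list of candidates. The sketch below uses scikit-learn's RidgeCV for that. The synthetic data, the polynomial degree, the scaling step, and the candidate grid are all assumptions made for illustration and do not come from the original notes.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data (an assumption for illustration): a noisy quadratic curve
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30).reshape(-1, 1)
y = 0.5 * x.ravel() ** 2 - 2.0 * x.ravel() + rng.normal(scale=2.0, size=30)

# Candidate strengths spanning several orders of magnitude
alphas = np.logspace(-3, 3, 13)

# Degree-5 polynomial features, scaled so the penalty treats them comparably
model = make_pipeline(
    PolynomialFeatures(degree=5, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=alphas),
)
model.fit(x, y)

# make_pipeline names each step after its lowercased class name
best_alpha = float(model.named_steps["ridgecv"].alpha_)
print("selected alpha:", best_alpha)
```

The selected value depends on the data and on the candidate grid, so it is worth widening the grid if the chosen alpha sits at either end of it.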
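Advantages of ridge regression:
- The estimated regression coefficients are more stable and more realistic.
- It is particularly valuable in studies dominated by ill-conditioned data, that is, data with strongly correlated features where ordinary least squares becomes unstable.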
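The stability claim can be illustrated with a small sketch. The data here is synthetic and purely an assumption for demonstration: two features that are almost copies of each other, with true weights of 1 for both. On data like this, least-squares coefficients tend to swing far from the true values, while the ridge coefficients stay close to them.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(42)
n = 50

# Two nearly identical features make the problem ill-conditioned
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=1e-4, size=n)  # almost an exact copy of x1
X = np.column_stack([x1, x2])

# True relationship: both weights equal 1, plus a little noise
y = x1 + x2 + rng.normal(scale=0.1, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("least squares coefficients:", ols.coef_)
print("ridge coefficients:        ", ridge.coef_)
```

The penalty makes ridge split the weight almost evenly between the two near-duplicate features instead of assigning large offsetting values to them.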
Usage
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge

# Ridge regression
# Training samples: features and target values
x_train = [[6], [8], [10], [14], [18]]  # feature: size
y_train = [[7], [9], [13], [17.5], [18]]  # target values
# Expand the single feature into polynomial terms up to degree 3
poly3 = PolynomialFeatures(degree=3)
x_train_poly3 = poly3.fit_transform(x_train)
# Baseline: ordinary linear regression without regularization
regressor_poly3 = LinearRegression()
regressor_poly3.fit(x_train_poly3, y_train)
print(regressor_poly3.coef_)
# Ridge regression shrinks the feature weights; the regularization
# strength alpha controls how strongly
for alpha in (0.1, 0.3, 0.5, 0.7):
    print(f"alpha = {alpha}")
    regressor_poly3 = Ridge(alpha=alpha)
    regressor_poly3.fit(x_train_poly3, y_train)
    print(regressor_poly3.coef_)
Output:
[[ 0. -1.42626096 0.31320489 -0.01103344]]
alpha = 0.1
[[ 0. -0.54302902 0.23512062 -0.00888958]]
alpha = 0.3
[[ 0. -0.23508271 0.20785367 -0.00813998]]
alpha = 0.5
[[ 0. -0.14579637 0.19991159 -0.00792083]]
alpha = 0.7
[[ 0. -0.10329945 0.19610472 -0.00781518]]
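To condense the shrinkage seen in the output above into a single number per alpha, the following sketch computes the L2 norm of the weight vector for each setting. It reuses the same training data. Using the norm as a summary is a choice made for this note, not something the original does.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Same training data as in the example above
x_train = [[6], [8], [10], [14], [18]]
y_train = [[7], [9], [13], [17.5], [18]]
x_train_poly3 = PolynomialFeatures(degree=3).fit_transform(x_train)

alphas = [0.1, 0.3, 0.5, 0.7]
norms = []
for alpha in alphas:
    ridge = Ridge(alpha=alpha).fit(x_train_poly3, y_train)
    # L2 norm of the weight vector: one number for the overall weight size
    norms.append(float(np.linalg.norm(ridge.coef_)))
    print(f"alpha = {alpha}: ||w|| = {norms[-1]:.4f}")
```

The norm falls as alpha rises. In the printed coefficients, most of that drop comes from the linear term, which moves from about -1.43 to about -0.10, while the cubic term changes only slightly. The features are not scaled here, so the penalty does not affect all terms equally.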
Saving and loading models
import joblib
- Save: joblib.dump(knn, './123.m')
- Load: knn = joblib.load('./123.m')
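# Build and save the model
import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
feature = iris['data']
target = iris['target']
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(x_train, y_train)
# Persist the fitted model to disk
joblib.dump(knn, './knn.m')
y_pred = knn.predict(x_test)
y_true = y_test
print("Predicted classes:", y_pred)
print("True classes:", y_true)
# Pass the test features and test labels; score returns the mean accuracy
score = knn.score(x_test, y_test)
print(score)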
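# Use the saved model
import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
feature = iris['data']
target = iris['target']
# The same random_state reproduces the split used when the model was trained
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
# Load the model
knn = joblib.load('./knn.m')
y_pred = knn.predict(x_test)
y_true = y_test
print("Predicted classes:", y_pred)
print("True classes:", y_true)
# Pass the test features and test labels; score returns the mean accuracy
score = knn.score(x_test, y_test)
print(score)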
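A quick way to check that saving and loading preserved the model is to compare predictions before and after the round trip. The sketch below does this in one script. The temporary directory and the file name knn.joblib are assumptions chosen so that the example leaves no files behind.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris["data"], iris["target"], test_size=0.2, random_state=2020
)
knn = KNeighborsClassifier(n_neighbors=3).fit(x_train, y_train)
pred_before = knn.predict(x_test)

# Save to a temporary directory so the sketch leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "knn.joblib")  # hypothetical file name
    joblib.dump(knn, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions exactly
pred_after = restored.predict(x_test)
same = bool(np.array_equal(pred_before, pred_after))
print("identical predictions:", same)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same scikit-learn version that produced them.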