xxdd123321


Ridge Regression

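Handling overfitting: regularization

  • Reducing the amplitude of the bumps and dips in an overfitted curve brings it closer to a well-fitted curve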

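  • Regularization penalizes large weights, which shrinks the weights w of the high-order features toward 0 (with L2 they become small but usually not exactly 0); as the high-order terms lose influence, the bumps and dips flatten and the curve approaches a well-fitted one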

L2 regularization:

  • Use a regression model with built-in regularization (Ridge regression) to deal with overfitting
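
For reference, the L2-regularized least-squares objective can be sketched as follows. Here X is the feature matrix, y the target vector, w the weight vector, and α the regularization strength; these symbols are introduced for illustration and do not appear in the original notes. The closed-form solution on the right assumes there is no intercept term, which is a simplification made here.

```latex
\min_{w} \; \lVert y - Xw \rVert_2^2 + \alpha \lVert w \rVert_2^2
\qquad\Longrightarrow\qquad
\hat{w} = \left( X^{\top} X + \alpha I \right)^{-1} X^{\top} y
```

As α grows, the penalty term carries more weight in the objective and the solution is pulled toward w = 0. The added term αI also keeps the matrix being inverted well-conditioned when the columns of X are nearly dependent.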

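Ridge regression: a linear regression model with L2 regularization

  • API: from sklearn.linear_model import Ridge

  • Ridge(alpha=1.0):

    • alpha: regularization strength; the larger it is, the closer the weights w of the high-order terms are pushed toward 0, and the smaller the bumps and dips of the overfitted curve

      • Values: any non-negative float is accepted; decimals between 0 and 1 or integers from 1 to 10 are common starting points

    • coef_: the regression coefficients of the fitted model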

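  • Advantages of ridge regression:

    • The regression coefficients it produces are more realistic and more reliable

    • It is especially useful in studies dominated by ill-conditioned data (for example, strongly correlated features)

  • Usage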

    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.linear_model import Ridge

    # Ridge regression
    # Training samples: features and target values
    x_train = [[6], [8], [10], [14], [18]]  # feature: size
    y_train = [[7], [9], [13], [17.5], [18]]  # target values
    poly3 = PolynomialFeatures(degree=3)
    x_train_poly3 = poly3.fit_transform(x_train)
    # Baseline: ordinary least squares on the cubic features
    regressor_poly3 = LinearRegression()
    regressor_poly3.fit(x_train_poly3, y_train)
    print(regressor_poly3.coef_)

    # Ridge can lower the weights of the high-order features by
    # controlling the regularization strength alpha
    for alpha in (0.1, 0.3, 0.5, 0.7):
        print(f"alpha = {alpha}")
        regressor_poly3 = Ridge(alpha=alpha)
        regressor_poly3.fit(x_train_poly3, y_train)
        print(regressor_poly3.coef_)


    [[ 0.         -1.42626096  0.31320489 -0.01103344]]
    alpha = 0.1
    [[ 0.         -0.54302902  0.23512062 -0.00888958]]
    alpha = 0.3
    [[ 0.         -0.23508271  0.20785367 -0.00813998]]
    alpha = 0.5
    [[ 0.         -0.14579637  0.19991159 -0.00792083]]
    alpha = 0.7
    [[ 0.         -0.10329945  0.19610472 -0.00781518]]

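
Instead of trying alpha values by hand, the candidates can be compared with cross-validation. The following is a minimal sketch using scikit-learn's RidgeCV on the same toy data; the candidate grid, the standardization step, and the use of a pipeline are assumptions made here for illustration, and with only five samples the selected value should not be read as meaningful.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Same toy data as in the example above (targets flattened to 1-D)
x_train = np.array([[6], [8], [10], [14], [18]], dtype=float)
y_train = np.array([7, 9, 13, 17.5, 18], dtype=float)

# Candidate regularization strengths (an illustrative grid, not a recommendation)
alphas = [0.1, 0.3, 0.5, 0.7, 1.0, 3.0, 10.0]

# Cubic features -> standardization -> ridge with built-in leave-one-out CV
model = make_pipeline(
    PolynomialFeatures(degree=3, include_bias=False),
    StandardScaler(),
    RidgeCV(alphas=alphas),
)
model.fit(x_train, y_train)

ridge = model.named_steps["ridgecv"]
print("selected alpha:", ridge.alpha_)
print("coefficients:", ridge.coef_)
```

Standardizing the polynomial features first puts x, x² and x³ on a comparable scale, so the single alpha penalizes all three weights in a similar way.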

Saving and loading a model

import joblib

  • joblib.dump(knn, './123.m')

  • knn = joblib.load('./123.m')

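# Build and save the model
import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = datasets.load_iris()
feature = iris['data']
target = iris['target']
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
knn = KNeighborsClassifier(n_neighbors=3)
knn = knn.fit(x_train, y_train)
joblib.dump(knn, './knn.m')
y_pred = knn.predict(x_test)
y_true = y_test
print("Predicted classes", y_pred)
print("True classes", y_true)
# Score the model on the test features and test labels
score = knn.score(x_test, y_test)
print(score)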

# Use the model
import joblib
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()
feature = iris['data']
target = iris['target']
x_train, x_test, y_train, y_test = train_test_split(feature, target, test_size=0.2, random_state=2020)
# Load the saved model
knn = joblib.load('./knn.m')
y_pred = knn.predict(x_test)
y_true = y_test
print("Predicted classes", y_pred)
print("True classes", y_true)
# Score the model on the test features and test labels
score = knn.score(x_test, y_test)
print(score)
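
One way to check that a reloaded model behaves like the original is to compare their predictions directly. The sketch below does the round trip inside a temporary directory so that it leaves no files behind; the file name knn.joblib and the variable names are hypothetical choices for this example.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=2020
)

knn = KNeighborsClassifier(n_neighbors=3).fit(x_train, y_train)

# Save and reload inside a temporary directory (removed automatically on exit)
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "knn.joblib")  # hypothetical file name
    joblib.dump(knn, model_path)
    restored = joblib.load(model_path)

# The restored model should predict exactly what the original predicts
same_predictions = np.array_equal(knn.predict(x_test), restored.predict(x_test))
print("predictions identical after reload:", same_predictions)
```

Files written by joblib are pickle-based, so they should only be loaded from trusted sources and ideally with the same library versions that were used to save them.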
posted on 2022-08-02 16:54  xxdd123321