Python Support Vector Regression (SVR): Predicting Forest Fire Burned Area in Practice

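I. Reading, Plotting, and Preprocessing the Data

1. Reading the data (I don't understand some of the indicators; this is just for practice)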

# Import third-party modules
from sklearn import svm
import pandas as pd
from sklearn import model_selection
from sklearn import metrics
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import preprocessing
import numpy as np

# Read the external data
forestfires = pd.read_csv(r'forestfires.csv')
# Show the first 5 rows of the data
forestfires.head()

Result:

2. Plotting the burned forest area

# Plot a histogram of the burned forest area
# Note: sns.distplot is deprecated since seaborn 0.11
sns.distplot(forestfires.area, bins = 50, kde = True, fit = norm, hist_kws = {'color':'steelblue'},
             kde_kws = {'color':'red', 'label':'Kernel Density'},
             fit_kws = {'color':'black','label':'Normal', 'linestyle':'--'})
# Show the legend
plt.legend()
# Show the figure
plt.show()

3. Data preprocessing (this involves handling skewed data; for reference see the blog post: https://www.cnblogs.com/wqbin/p/10346292.html)
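The skew handling mentioned here comes down to a log transform. The following is a minimal sketch on made-up values (not the forest fire data) showing that `np.log1p` compresses a long right tail, maps zero to zero, and is undone by `np.expm1`. The array `area_demo` is a hypothetical stand-in for a heavily right-skewed column.

```python
import numpy as np

# Hypothetical right-skewed values: zeros, small numbers, and one large outlier
area_demo = np.array([0.0, 0.0, 0.5, 1.2, 3.0, 10.0, 1000.0])

# log1p(x) = log(1 + x): defined at 0 and compresses large values
area_log = np.log1p(area_demo)

# expm1(x) = exp(x) - 1 is the inverse and recovers the original scale
area_back = np.expm1(area_log)

print("range before:", area_demo.max() - area_demo.min())
print("range after: ", area_log.max() - area_log.min())
print("round trip ok:", np.allclose(area_back, area_demo))
```

Using `log1p` rather than a plain `log` matters when the column contains zeros, since `log(0)` is undefined.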

# Drop the day variable
forestfires.drop('day',axis = 1, inplace = True)
# Convert month to numeric codes
forestfires.month = pd.factorize(forestfires.month)[0]

# Log-transform the area variable (smooths the data); the inverse is np.expm1(s)
y = np.log1p(forestfires.area)
# Standardize the predictors (all columns except the last one)
predictors = forestfires.columns[:-1]
print(predictors)
X = preprocessing.scale(forestfires[predictors])

Result:

II. Model Training and Prediction

# Split the data into training and test sets
X_train,X_test,y_train,y_test = model_selection.train_test_split(X, y, test_size = 0.25, random_state = 1234)

# Build an SVM regression model with default parameters
svr = svm.SVR()
# Fit the model on the training set
svr.fit(X_train,y_train)
# Predict on the test set
pred_svr = svr.predict(X_test)
# Compute the model's mean squared error (MSE)
metrics.mean_squared_error(y_test,pred_svr)

Result:

1.925863595333521
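Because `y` was transformed with `np.log1p`, the MSE above is measured in log units, not in the original area units. As a rough sketch of how one might put such a number in context, the snippet below uses made-up arrays (`y_train_log`, `y_test_log` and `pred_log` are hypothetical and not taken from the model above) to compare against a baseline that always predicts the training mean, and to map predictions back to the original scale with `np.expm1`.

```python
import numpy as np

# Hypothetical log-scale targets and predictions (illustrative values only)
y_train_log = np.array([0.0, 0.0, 1.1, 2.3, 0.7, 3.0])
y_test_log = np.array([0.0, 1.5, 2.0, 0.4])
pred_log = np.array([0.6, 1.0, 1.4, 0.8])

def mse(y_true, y_pred):
    """Mean squared error of two equally shaped arrays."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))

# Model error on the log scale (the scale metrics.mean_squared_error sees here)
mse_model = mse(y_test_log, pred_log)

# Baseline: always predict the mean of the training targets
baseline = np.full_like(y_test_log, y_train_log.mean())
mse_baseline = mse(y_test_log, baseline)

# Back-transform to the original units before reporting errors in area
pred_area = np.expm1(pred_log)
true_area = np.expm1(y_test_log)
rmse_area = float(np.sqrt(mse(true_area, pred_area)))

print(f"log-scale MSE, model:    {mse_model:.3f}")
print(f"log-scale MSE, baseline: {mse_baseline:.3f}")
print(f"RMSE in original units:  {rmse_area:.3f}")
```

A model whose log-scale MSE is not clearly below the mean baseline is adding little, whatever the absolute number looks like.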

III. Tuning the Model Parameters, Then Training and Predicting

1. Tuning the parameters

# Use grid search to choose the best C, epsilon, and gamma values for SVM regression
epsilon = np.arange(0.1,1.5,0.2)
C= np.arange(100,1000,200)
gamma = np.arange(0.001,0.01,0.002)
parameters = {'epsilon':epsilon,'C':C,'gamma':gamma}
grid_svr = model_selection.GridSearchCV(estimator = svm.SVR(),param_grid =parameters,
                                        scoring='neg_mean_squared_error',cv=5,verbose =1, n_jobs=2)
# Fit the grid search on the training set
grid_svr.fit(X_train,y_train)
# Print the best parameter values and the best score after cross-validation
print(grid_svr.best_params_, grid_svr.best_score_)

Result:

{'C': 300, 'epsilon': 1.1000000000000003, 'gamma': 0.001} -1.9940579497706303
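The value `1.1000000000000003` is a floating-point artifact of `np.arange` with a non-integer step, not a setting meaningfully different from 1.1. A small sketch of one way to get cleaner grid values by rounding; the names `eps_raw` and `eps_clean` are illustrative only.

```python
import numpy as np

# Same grid as above: a float step accumulates rounding error
eps_raw = np.arange(0.1, 1.5, 0.2)

# Rounding to one decimal gives the values one would write by hand
eps_clean = np.round(eps_raw, 1)

print(eps_raw.tolist())
print(eps_clean.tolist())
```

Passing `eps_clean` as the `epsilon` grid would search the same settings while reporting tidier values in `best_params_`.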

2. Model training and prediction

# Build the SVR model with the best parameters found by the grid search
svm_svr = svm.SVR(C = 300, epsilon = 1.1000000000000003, gamma = 0.001)
# Fit the model on the training set
svm_svr.fit(X_train,y_train)
# Predict on the test set and compute the MSE
pred_grid_svr = svm_svr.predict(X_test)
metrics.mean_squared_error(y_test,pred_grid_svr)

Result:

1.7455012238826595
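With the default `refit=True`, `GridSearchCV` refits the best parameter combination on the whole training set, so the fitted search object can predict directly without rebuilding the model by hand. Below is a minimal sketch on synthetic data from `make_regression` (not the forest fire data; the tiny grid and all variable names are illustrative). As a further assumption beyond the original code, it places scaling inside a `Pipeline`, so that each cross-validation fold is scaled using only its own training part.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data standing in for the real predictors and target
X_demo, y_demo = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=1234)

# Scaling is a pipeline step, so it is fitted on the training folds only
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR())])

# Deliberately tiny grid to keep the example fast
param_grid = {"svr__C": [1, 100], "svr__epsilon": [0.1, 1.0], "svr__gamma": [0.01, 0.1]}
grid = GridSearchCV(pipe, param_grid, scoring="neg_mean_squared_error", cv=5)
grid.fit(X_tr, y_tr)

# refit=True (the default) means grid.predict uses the refitted best model
pred_direct = grid.predict(X_te)
pred_best = grid.best_estimator_.predict(X_te)

print(grid.best_params_)
print(mean_squared_error(y_te, pred_direct))
```

Predicting through the search object avoids copying parameter values such as `1.1000000000000003` by hand.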

Posted on 2019-08-29 14:07 by LiErRui
