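Python Support Vector Regression (SVR): Predicting Forest Fire Burned Area in Practice
I. Reading, Plotting, and Preprocessing the Data
1. Reading the data (I don't understand some of the indicators; this is just for practice)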
# Import third-party modules
from sklearn import svm
import pandas as pd
from sklearn import model_selection
from sklearn import metrics
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import preprocessing
import numpy as np

# Read the external data
forestfires = pd.read_csv(r'forestfires.csv')
# Show the first 5 rows
forestfires.head()
Result:

2. Plotting the burned area
# Plot a histogram of the burned area
# Note: sns.distplot is deprecated in newer seaborn releases
sns.distplot(forestfires.area, bins=50, kde=True, fit=norm,
             hist_kws={'color': 'steelblue'},
             kde_kws={'color': 'red', 'label': 'Kernel Density'},
             fit_kws={'color': 'black', 'label': 'Normal', 'linestyle': '--'})
# Show the legend
plt.legend()
# Show the figure
plt.show()

3. Data preprocessing (this involves handling skewed data; see the blog post: https://www.cnblogs.com/wqbin/p/10346292.html)
# Drop the day variable
forestfires.drop('day', axis=1, inplace=True)
# Encode month as integer codes
forestfires.month = pd.factorize(forestfires.month)[0]

# Log-transform area to reduce its skew; the inverse transform is np.expm1(s)
y = np.log1p(forestfires.area)
# Standardize the predictors (every column except the last one, area)
predictors = forestfires.columns[:-1]
print(predictors)
X = preprocessing.scale(forestfires[predictors])
Result:

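To make the two transforms in the preprocessing step concrete, here is a minimal sketch on made-up numbers. The `area`, `temp`, and `wind` values and the `standardize` helper are illustrative assumptions, not taken from forestfires.csv. It shows that `np.log1p` is defined at zero and is undone by `np.expm1`, and that column-wise standardization amounts to subtracting the mean and dividing by the population standard deviation.

```python
import numpy as np

# Hypothetical burned areas in hectares (illustrative, not from forestfires.csv)
area = np.array([0.0, 0.0, 0.5, 2.0, 10.0, 200.0])

# log1p(x) = log(1 + x) is defined at x = 0, unlike log(x)
y = np.log1p(area)

# expm1 is the inverse of log1p, so values can be mapped back to hectares
recovered = np.expm1(y)

def standardize(X):
    """Column-wise z-score: zero mean and unit population standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical predictor matrix with two columns (temp, wind)
temp = np.array([8.2, 18.0, 14.6, 21.3, 11.4, 25.1])
wind = np.array([6.7, 0.9, 1.3, 4.0, 5.4, 2.2])
X_std = standardize(np.column_stack([temp, wind]))

print(np.allclose(recovered, area))          # round trip check
print(X_std.mean(axis=0), X_std.std(axis=0))
```

Because the target is modeled on the log scale, anything predicted from it is also on the log scale until `np.expm1` is applied.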
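II. Model Training and Prediction
# Split the data into training and test sets
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=1234)

# Build an SVM regression model with default parameters
svr = svm.SVR()
# Fit the model on the training set
svr.fit(X_train, y_train)
# Predict on the test set
pred_svr = svr.predict(X_test)
# Compute the mean squared error (MSE); y is on the log1p scale, so the MSE is too
metrics.mean_squared_error(y_test, pred_svr)
Result: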
1.925863595333521
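The MSE above is measured on the log1p scale, because the model was trained on `np.log1p(area)`. A rough sketch of how one might report the error in hectares instead is shown below. The `y_test_log` and `pred_log` arrays are hypothetical stand-ins for `y_test` and `pred_svr`, not real outputs of the model.

```python
import numpy as np

# Hypothetical stand-ins for y_test and pred_svr (both on the log1p scale)
y_test_log = np.log1p(np.array([0.0, 1.5, 8.0, 40.0]))
pred_log = np.array([0.2, 0.9, 1.8, 2.5])

# Error on the log1p scale, which is what the tutorial's MSE reports
mse_log = np.mean((y_test_log - pred_log) ** 2)

# Map both back to hectares with the inverse transform
area_true = np.expm1(y_test_log)
area_pred = np.expm1(pred_log)

# Root mean squared error in hectares
rmse_ha = np.sqrt(np.mean((area_true - area_pred) ** 2))

print(f"MSE on log1p scale: {mse_log:.3f}")
print(f"RMSE in hectares:   {rmse_ha:.3f}")
```

The two numbers are not directly comparable: a small error on the log scale can correspond to a large error in hectares for big fires.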
III. Tuning Model Parameters, Then Training and Predicting
1. Tuning the parameters
# Use grid search to pick the best C, epsilon, and gamma for SVM regression
epsilon = np.arange(0.1, 1.5, 0.2)
C = np.arange(100, 1000, 200)
gamma = np.arange(0.001, 0.01, 0.002)
parameters = {'epsilon': epsilon, 'C': C, 'gamma': gamma}
grid_svr = model_selection.GridSearchCV(estimator=svm.SVR(), param_grid=parameters,
                                        scoring='neg_mean_squared_error', cv=5,
                                        verbose=1, n_jobs=2)
# Fit on the training set
grid_svr.fit(X_train, y_train)
# Print the best parameters and the best score (negative MSE) from cross-validation
print(grid_svr.best_params_, grid_svr.best_score_)
Result:
{'C': 300, 'epsilon': 1.1000000000000003, 'gamma': 0.001} -1.9940579497706303
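The best score is negative because `scoring='neg_mean_squared_error'` reports the MSE with its sign flipped, so that larger is better. The sketch below, on synthetic data from `make_regression` with a deliberately small, made-up parameter grid, shows one way to read the score back as an MSE and to reuse the refitted model through `best_estimator_` rather than copying parameter values by hand.

```python
import numpy as np
from sklearn import svm, model_selection, metrics
from sklearn.datasets import make_regression

# Synthetic data standing in for the forest fire predictors (an assumption)
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.25, random_state=1234)

# Small illustrative grid; these values are not the tutorial's
param_grid = {'C': [1, 10, 100], 'epsilon': [0.1, 1.0], 'gamma': [0.01, 0.1]}
grid = model_selection.GridSearchCV(svm.SVR(), param_grid,
                                    scoring='neg_mean_squared_error', cv=5)
grid.fit(X_train, y_train)

# Flip the sign to get the cross-validated MSE
cv_mse = -grid.best_score_

# With the default refit=True, best_estimator_ is already refitted
# on the whole training set using the best parameters
best_model = grid.best_estimator_
test_mse = metrics.mean_squared_error(y_test, best_model.predict(X_test))

print(grid.best_params_, cv_mse, test_mse)
```

Reusing `best_estimator_` also avoids retyping floating-point artifacts such as `1.1000000000000003`, which come from `np.arange` with a fractional step.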
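2. Model training and prediction
# Build the SVR model with the best parameters found by grid search
svm_svr = svm.SVR(C=300, epsilon=1.1000000000000003, gamma=0.001)
# Fit the model on the training set
svm_svr.fit(X_train, y_train)
# Predict on the test set
pred_grid_svr = svm_svr.predict(X_test)
# Compute the MSE on the test set
metrics.mean_squared_error(y_test, pred_grid_svr)
Result: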
1.7455012238826595