R-多元线性回归的建模和验证
问题描述
分析地面采集的光谱和LiDAR结构信息,估计由于病虫害引起的失叶率
- 由光谱信息建立模型
- 由结构信息建立模型
- 由光谱信息和结构信息相结合建立模型
- 分别画预测和验证集散点图,计算R2和RMSE
代码实现
train = read.csv("F:/RData/实验一/实验数据/Simplified_Canopy_defoliation_analysis_training-simp.csv")
test = read.csv("F:/RData/实验一/实验数据/Simplified_Canopy_defoliation_analysis_vald-simp.csv")
mydata_train = train[-15,] #去除有NA值的一行
#library(Hmisc)
r = rcorr(as.matrix(mydata_train[,2:14]), type = 'pearson')
r
#library(corrplot)
corrplot(r$r, tl.pos = 'upper', tl.cex = 0.8, tl.col = 'black', bg = 'gray')
corrplot(r$P, add=TRUE, type="lower", method="number", diag = FALSE, tl.pos = "n", cl.pos="n", bg = 'gray')
#pearson相关系数表明自变量NONPHO_fraction_16和dNONPHO_fraction之间存在强负相关性
#光谱信息
model_1 = lm(Average.Defol..Status.x ~ NONPHO_fraction_15 + GV_fraction_15 +
Shade_fraction_15 + Fra_under_15 +
NONPHO_fraction_16 + GV_fraction_16 +
dFra_UDS, data = mydata_train)
summary(model_1)
#去除不显著的预测变量
model_11 = lm(Average.Defol..Status.x ~ Shade_fraction_15 + GV_fraction_16 + dFra_UDS, data = mydata_train)
summary(model_11)
#结构信息
model_2 = lm(Average.Defol..Status.x ~ b70_16 + dske + dkur + dint_p25, data = mydata_train)
summary(model_2)
model_21 = lm(Average.Defol..Status.x ~ b70_16 + dint_p25, data = mydata_train)
summary(model_21)
#光谱和结构
model_3 = lm(Average.Defol..Status.x ~ NONPHO_fraction_15 + GV_fraction_15 +
Shade_fraction_15 + Fra_under_15 +
NONPHO_fraction_16 + GV_fraction_16 +
dFra_UDS + b70_16 + dske +
dkur + dint_p25, data = mydata_train)
summary(model_3)
#去除不显著的预测变量和截距项
model_31 = lm(Average.Defol..Status.x ~ 0 + Shade_fraction_15 + GV_fraction_16 +
dFra_UDS + b70_16, data = mydata_train)
summary(model_31)
#进行预测和对比
tru = test[,10]
n = length(tru)
predict_data = matrix(0, n, 3)
predict_data[,1] = predict(model_1, test)
predict_data[,2] = predict(model_2, test)
predict_data[,3] = predict(model_31, test)
tit = c('光谱信息','结构信息','光谱信息和结构信息')
for (i in 1:3) {
plot(x = predict_data[,i], y = tru, xlab = 'Predict', ylab = 'True', xlim = c(-20,100), ylim = c(-20,100), main = tit[i])
lines(x = c(-20,100), y = c(-20,100), lty = 5)
}
for (i in 1:3) {
plot(predict_data[,i], col = 'blue', type = 'b', xlab = 'Row', ylab = 'Average.Defol..Status', ylim = c(-20,100), main = tit[i])
lines(tru, col = 'red', type = 'b')
legend('bottomleft', legend = c('predict','true'), pch = c(1,1), col = c('blue','red'))
}
#计算R_squre
pre = predict_data
sse = apply((pre-tru)^2, 2, sum)
sst = sum((tru- mean(tru))^2)
R_squre = 1- sse/sst
R_squre
#计算RMSE
RMSE=sqrt((apply((tru-pre)^2, 2, sum))/n)
RMSE

浙公网安备 33010602011771号