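[R Statistics] Principal Component Analysis 2: Principal Component Regression
Exercise:
A survey was carried out on the sales volume Y of a consumer product in a certain region. Y is related to the following four variables: x1, residents' disposable income; x2, the average price index of this type of consumer product; x3, the quantity of this product already held by society; x4, the average price index of other consumer products. The historical data are shown in the table below. Use principal component regression to build a regression equation for the sales volume Y on the four variables x1, x2, x3 and x4.
Data file data.txt: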
x1	x2	x3	x4	y
1	82.9	92	17.1	94	8.4
2	88.0	93	21.3	96	9.6
3	99.9	96	25.1	97	10.4
4	105.3	94	29.0	97	11.4
5	117.7	100	34.0	100	12.2
6	131.0	101	40.0	101	14.2
7	148.2	105	44.0	104	15.8
8	161.8	112	49.0	109	17.9
9	174.2	112	51.0	111	19.6
10	184.7	112	53.0	111	20.8
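If you would rather not create data.txt first, the same table can be read from a string. This is only a minimal sketch and is not part of the original post: the name `raw` is introduced here for illustration, and the fields are separated by plain whitespace instead of the tab separator the script expects.

```r
# Assumption: `raw` is a helper name used only in this sketch.
raw <- "x1 x2 x3 x4 y
1 82.9 92 17.1 94 8.4
2 88.0 93 21.3 96 9.6
3 99.9 96 25.1 97 10.4
4 105.3 94 29.0 97 11.4
5 117.7 100 34.0 100 12.2
6 131.0 101 40.0 101 14.2
7 148.2 105 44.0 104 15.8
8 161.8 112 49.0 109 17.9
9 174.2 112 51.0 111 19.6
10 184.7 112 53.0 111 20.8"

# The header line has one field fewer than the data rows, so read.table()
# uses the first column (1..10) as row names, not as a variable.
conomy <- read.table(text = raw, header = TRUE)

# Inspect the structure: 10 observations of x1, x2, x3, x4 and y.
str(conomy)
```

The resulting data frame has the same column names as the one the script reads from the file, so the remaining steps can be run on it unchanged.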
Script:
#270
#230
conomy <- read.table("data.txt",header = TRUE, sep = "\t");
#### Fit an ordinary linear regression (this script uses only x1, x2 and x3; x4 is not included in the model)
lm.sol<-lm(y~x1+x2+x3, data=conomy);
summary(lm.sol);
# Call:
# lm(formula = y ~ x1 + x2 + x3, data = conomy)
# Residuals:
# Min 1Q Median 3Q Max
# -0.44365 -0.20719 0.04925 0.18879 0.47673
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 0.23574 5.39534 0.044 0.96657
# x1 0.14167 0.02587 5.477 0.00155 **
# x2 -0.02763 0.07265 -0.380 0.71685
# x3 -0.04743 0.05903 -0.803 0.45235
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 0.349 on 6 degrees of freedom
# Multiple R-squared: 0.9957, Adjusted R-squared: 0.9935
# F-statistic: 462.5 on 3 and 6 DF, p-value: 1.744e-07
#### Run the principal component analysis (on the correlation matrix)
conomy.pr<-princomp(~x1+x2+x3, data=conomy, cor=TRUE);
summary(conomy.pr, loadings=TRUE);
# Importance of components:
# Comp.1 Comp.2 Comp.3
# Standard deviation 1.720206 0.17628306 0.099081994
# Proportion of Variance 0.986369 0.01035857 0.003272414
# Cumulative Proportion 0.986369 0.99672759 1.000000000
# Loadings:
# Comp.1 Comp.2 Comp.3
# x1 0.579 0.180 0.795
# x2 0.576 -0.781 -0.243
# x3 0.577 0.598 -0.556
#### Compute the principal component scores of the sample, then fit the principal component regression
pre<-predict(conomy.pr);
conomy$z1<-pre[,1];
conomy$z2<-pre[,2];
lm.sol<-lm(y~z1+z2, data=conomy);
summary(lm.sol);
# Call:
# lm(formula = y ~ z1 + z2, data = conomy)
# Residuals:
# Min 1Q Median 3Q Max
# -0.79867 -0.45194 0.06536 0.36712 0.83831
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 14.0300 0.1897 73.972 2.17e-11 ***
# z1 2.3763 0.1103 21.552 1.17e-07 ***
# z2 0.6977 1.0759 0.648 0.537
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 0.5998 on 7 degrees of freedom
# Multiple R-squared: 0.9852, Adjusted R-squared: 0.9809
# F-statistic: 232.5 on 2 and 7 DF, p-value: 3.975e-07
#### Transform back to obtain the regression equation in the original variables
beta<-coef(lm.sol); A<-loadings(conomy.pr);
x.bar<-conomy.pr$center; x.sd<-conomy.pr$scale;
coef<-(beta[2]*A[,1]+ beta[3]*A[,2])/x.sd;
beta0 <- beta[1]- sum(x.bar * coef);
c(beta0, coef);
# (Intercept) x1 x2 x3
# -7.75109994 0.04347167 0.10678004 0.14573976
### Conclusion: y = -7.75109994 + 0.04347167*x1 + 0.10678004*x2 + 0.14573976*x3
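The back-transformation step relies on each score being a linear combination of the standardized variables: z_k = sum_j A[j,k] * (x_j - x.bar_j) / x.sd_j. Substituting this into y = b0 + b1*z1 + b2*z2 gives a slope of (b1*A[j,1] + b2*A[j,2]) / x.sd_j for x_j and an intercept of b0 - sum_j(x.bar_j * slope_j). The sketch below is one way to sanity-check that algebra, not part of the original post: it refits the same model on the three predictors used in the script and compares predictions from the original-scale equation with the fitted values of the regression on z1 and z2. The names `pr`, `fit`, `slope`, `intercept` and `y.hat` are illustrative assumptions.

```r
# Self-contained check; the values are the x1, x2, x3 and y columns of data.txt.
conomy <- data.frame(
  x1 = c(82.9, 88.0, 99.9, 105.3, 117.7, 131.0, 148.2, 161.8, 174.2, 184.7),
  x2 = c(92, 93, 96, 94, 100, 101, 105, 112, 112, 112),
  x3 = c(17.1, 21.3, 25.1, 29.0, 34.0, 40.0, 44.0, 49.0, 51.0, 53.0),
  y  = c(8.4, 9.6, 10.4, 11.4, 12.2, 14.2, 15.8, 17.9, 19.6, 20.8)
)

# PCA on the correlation matrix, then regression on the first two scores.
pr <- princomp(~ x1 + x2 + x3, data = conomy, cor = TRUE)
scores <- predict(pr)
conomy$z1 <- scores[, 1]
conomy$z2 <- scores[, 2]
fit <- lm(y ~ z1 + z2, data = conomy)

# Back-transform the coefficients to the original variables.
b <- coef(fit)
A <- unclass(loadings(pr))
slope <- (b["z1"] * A[, 1] + b["z2"] * A[, 2]) / pr$scale
intercept <- b["(Intercept)"] - sum(pr$center * slope)

# Predict y directly from x1, x2, x3 with the back-transformed equation.
X <- as.matrix(conomy[, c("x1", "x2", "x3")])
y.hat <- as.vector(intercept + X %*% slope)

# Compare with the fitted values of the regression on the scores.
all.equal(y.hat, as.vector(fitted(fit)))
```

If the two sets of predictions agree up to floating-point tolerance, the equation in the original variables is just a rewriting of the regression on the components and not a separate fit.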
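The source code and the exercise in this post both come from the textbook 《统计建模与R软件》 (Statistical Modeling and R Software; ISBN: 9787302143666; author: 薛毅).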