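No.16 Correlation Analysis
Main contents:
- Inspecting correlation with charts
- Computing correlation coefficients
- Significance tests for correlation coefficients
- Visualizing the correlation matrix
1. Inspecting correlation with charts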
> mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
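A scatterplot matrix is one common way to inspect pairwise relationships visually before computing any coefficients. The sketch below is an illustration, not part of the original post: it uses base R's pairs(), and the variable name num and the choice of panel.smooth are assumptions made for the example.

```r
# Keep the nine columns used in the rest of this post (drop vs and am).
num <- mtcars[, !(names(mtcars) %in% c("vs", "am"))]

# Scatterplot matrix: each panel plots one pair of variables,
# with a lowess smoother added to make trends easier to see.
pairs(num, panel = panel.smooth, main = "mtcars: pairwise scatterplots")
```

Panels that look roughly linear (for example disp against wt) suggest a strong Pearson correlation; curved but monotone panels are better summarized by a rank coefficient.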
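1.1 First drop the two non-continuous columns vs and am (columns 8 and 9), then compute the correlation coefficients with cor()
cor(mtcars[,-c(8,9)])
Result:
> # cor() computes correlation coefficients; by default it returns the Pearson coefficient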
> cor(mtcars[,-c(8,9)])
            mpg        cyl       disp         hp        drat         wt        qsec       gear       carb
mpg   1.0000000 -0.8521620 -0.8475514 -0.7761684  0.68117191 -0.8676594  0.41868403  0.4802848 -0.5509251
cyl  -0.8521620  1.0000000  0.9020329  0.8324475 -0.69993811  0.7824958 -0.59124207 -0.4926866  0.5269883
disp -0.8475514  0.9020329  1.0000000  0.7909486 -0.71021393  0.8879799 -0.43369788 -0.5555692  0.3949769
hp   -0.7761684  0.8324475  0.7909486  1.0000000 -0.44875912  0.6587479 -0.70822339 -0.1257043  0.7498125
drat  0.6811719 -0.6999381 -0.7102139 -0.4487591  1.00000000 -0.7124406  0.09120476  0.6996101 -0.0907898
wt   -0.8676594  0.7824958  0.8879799  0.6587479 -0.71244065  1.0000000 -0.17471588 -0.5832870  0.4276059
qsec  0.4186840 -0.5912421 -0.4336979 -0.7082234  0.09120476 -0.1747159  1.00000000 -0.2126822 -0.6562492
gear  0.4802848 -0.4926866 -0.5555692 -0.1257043  0.69961013 -0.5832870 -0.21268223  1.0000000  0.2740728
carb -0.5509251  0.5269883  0.3949769  0.7498125 -0.09078980  0.4276059 -0.65624923  0.2740728  1.0000000
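To make the Pearson coefficient less of a black box, it can be reproduced from its definition: the covariance of the two variables divided by the product of their standard deviations. The following is a minimal sketch for a single pair (disp and wt); the variable names are illustrative.

```r
x <- mtcars$disp
y <- mtcars$wt

# Pearson's r is the covariance scaled by both standard deviations.
r_manual <- cov(x, y) / (sd(x) * sd(y))

# The built-in function should give the same value.
r_builtin <- cor(x, y)

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should match the disp/wt entry of the matrix above.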
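1.2 Spearman correlation coefficient (rank coefficient)
# cor() computes correlation coefficients; here the method is set to Spearman
cor(mtcars[,-c(8,9)], method = "spearman")
Result:
> # cor() computes correlation coefficients; here the method is set to Spearman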
> cor(mtcars[,-c(8,9)], method = "spearman")
            mpg        cyl       disp         hp        drat         wt        qsec       gear       carb
mpg   1.0000000 -0.9108013 -0.9088824 -0.8946646  0.65145546 -0.8864220  0.46693575  0.5427816 -0.6574976
cyl  -0.9108013  1.0000000  0.9276516  0.9017909 -0.67888119  0.8577282 -0.57235095 -0.5643105  0.5800680
disp -0.9088824  0.9276516  1.0000000  0.8510426 -0.68359210  0.8977064 -0.45978176 -0.5944703  0.5397781
hp   -0.8946646  0.9017909  0.8510426  1.0000000 -0.52012499  0.7746767 -0.66660602 -0.3314016  0.7333794
drat  0.6514555 -0.6788812 -0.6835921 -0.5201250  1.00000000 -0.7503904  0.09186863  0.7448162 -0.1252229
wt   -0.8864220  0.8577282  0.8977064  0.7746767 -0.75039041  1.0000000 -0.22540120 -0.6761284  0.4998120
qsec  0.4669358 -0.5723509 -0.4597818 -0.6666060  0.09186863 -0.2254012  1.00000000 -0.1481997 -0.6587181
gear  0.5427816 -0.5643105 -0.5944703 -0.3314016  0.74481617 -0.6761284 -0.14819967  1.0000000  0.1148870
carb -0.6574976  0.5800680  0.5397781  0.7333794 -0.12522294  0.4998120 -0.65871814  0.1148870  1.0000000
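The term "rank coefficient" can be made concrete: Spearman's coefficient is the Pearson coefficient computed on the ranks of the data, with tied values receiving average ranks. The sketch below checks this for one pair (mpg and wt); the variable names are illustrative.

```r
x <- mtcars$mpg
y <- mtcars$wt

# Spearman's rho as reported by cor().
rho_builtin <- cor(x, y, method = "spearman")

# The same quantity obtained as Pearson's r on the ranks
# (rank() assigns average ranks to ties by default).
rho_ranks <- cor(rank(x), rank(y))

print(c(builtin = rho_builtin, on_ranks = rho_ranks))
```

Because only the ordering of the values matters, the coefficient is unaffected by any monotone transformation of either variable.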
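1.3 Kendall correlation coefficient (rank coefficient)
# cor() computes correlation coefficients; here the method is set to Kendall
cor(mtcars[,-c(8,9)], method = "kendall")
Result:
> # cor() computes correlation coefficients; here the method is set to Kendall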
> cor(mtcars[,-c(8,9)], method = "kendall")
            mpg        cyl       disp         hp        drat         wt        qsec        gear        carb
mpg   1.0000000 -0.7953134 -0.7681311 -0.7428125  0.46454879 -0.7278321  0.31536522  0.43315089 -0.50439455
cyl  -0.7953134  1.0000000  0.8144263  0.7851865 -0.55131785  0.7282611 -0.44896982 -0.51254349  0.46542994
disp -0.7681311  0.8144263  1.0000000  0.6659987 -0.49898277  0.7433824 -0.30081549 -0.47597955  0.41373600
hp   -0.7428125  0.7851865  0.6659987  1.0000000 -0.38262689  0.6113081 -0.47290613 -0.27944584  0.59598416
drat  0.4645488 -0.5513178 -0.4989828 -0.3826269  1.00000000 -0.5471495  0.03272155  0.58392476 -0.09535193
wt   -0.7278321  0.7282611  0.7433824  0.6113081 -0.54714953  1.0000000 -0.14198812 -0.54359562  0.37137413
qsec  0.3153652 -0.4489698 -0.3008155 -0.4729061  0.03272155 -0.1419881  1.00000000 -0.09126069 -0.50643945
gear  0.4331509 -0.5125435 -0.4759795 -0.2794458  0.58392476 -0.5435956 -0.09126069  1.00000000  0.09801487
carb -0.5043945  0.4654299  0.4137360  0.5959842 -0.09535193  0.3713741 -0.50643945  0.09801487  1.00000000
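Kendall's coefficient compares every pair of observations and asks whether the two variables move in the same direction (concordant) or in opposite directions (discordant). The sketch below computes the tie-corrected form (often called tau-b) for mpg and wt from pairwise signs and compares it with cor(); it is an illustration with made-up variable names, not the way the original post computes it.

```r
x <- mtcars$mpg
y <- mtcars$wt

# Sign of the difference for every ordered pair of observations:
# +1 if the first is larger, -1 if smaller, 0 if tied.
sx <- sign(outer(x, x, "-"))
sy <- sign(outer(y, y, "-"))

# Concordant pairs contribute +1, discordant pairs -1, ties 0.
# The denominator counts the untied pairs in x and in y.
tau_manual <- sum(sx * sy) / sqrt(sum(sx^2) * sum(sy^2))

tau_builtin <- cor(x, y, method = "kendall")

print(c(manual = tau_manual, builtin = tau_builtin))
```

This pairwise comparison costs on the order of n^2 operations, which is negligible for the 32 rows of mtcars but can matter for large data sets.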
2. Significance tests for correlation coefficients
2.1 cor.test()
# Significance test for the correlation between two variables; it handles only one pair per call
cor.test(mtcars$disp,mtcars$wt)
Result:
> # Significance test for the correlation between two variables
> cor.test(mtcars$disp,mtcars$wt)
Pearson's product-moment correlation   # by default the Pearson coefficient is tested
data: mtcars$disp and mtcars$wt
t = 10.576, df = 30, p-value = 1.222e-11   # df: degrees of freedom
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:   # confidence interval
0.7811586 0.9442902
sample estimates:
cor
0.8879799   # value of the correlation coefficient
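The numbers in this output can be reconstructed by hand, which helps when reading them. The sketch below assumes the standard formulas for the Pearson test: the statistic t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom, and a confidence interval based on Fisher's z transformation. Variable names are illustrative.

```r
x <- mtcars$disp
y <- mtcars$wt
n <- length(x)
r <- cor(x, y)

# t statistic and two-sided p-value with n - 2 degrees of freedom.
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

# 95% confidence interval via Fisher's z transformation:
# atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
z <- atanh(r)
half_width <- qnorm(0.975) / sqrt(n - 3)
ci <- tanh(c(z - half_width, z + half_width))

# Compare with the built-in test.
ct <- cor.test(x, y)
print(c(t = t_stat, p = p_val))
print(ci)
```

Since the p-value is far below 0.05 and the interval excludes 0, the correlation between disp and wt is statistically significant at the usual level.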
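2.2 corr.test()
# Load the package
library(psych)
# Significance tests for the correlations in mtcars after dropping columns 8 and 9
corr.test(mtcars[,-c(8,9)])
Result:
> # Significance tests for the correlations in mtcars after dropping columns 8 and 9
> corr.test(mtcars[,-c(8,9)])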
Call:corr.test(x = mtcars[, -c(8, 9)])
Correlation matrix
       mpg   cyl  disp    hp  drat    wt  qsec  gear  carb
mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.48 -0.55
cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.49  0.53
disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.56  0.39
hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.13  0.75
drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.70 -0.09
wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.58  0.43
qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00 -0.21 -0.66
gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  1.00  0.27
carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66  0.27  1.00
Sample Size
[1] 32
Probability values (Entries above the diagonal are adjusted for multiple tests.)
      mpg cyl disp   hp drat   wt qsec gear carb
mpg  0.00   0 0.00 0.00 0.00 0.00 0.14 0.06 0.02
cyl  0.00   0 0.00 0.00 0.00 0.00 0.01 0.05 0.03
disp 0.00   0 0.00 0.00 0.00 0.00 0.13 0.02 0.18
hp   0.00   0 0.00 0.00 0.11 0.00 0.00 1.00 0.00
drat 0.00   0 0.00 0.01 0.00 0.00 1.00 0.00 1.00
wt   0.00   0 0.00 0.00 0.00 0.00 1.00 0.01 0.13
qsec 0.02   0 0.01 0.00 0.62 0.34 0.00 1.00 0.00
gear 0.01   0 0.00 0.49 0.00 0.00 0.24 0.00 0.77
carb 0.00   0 0.03 0.00 0.62 0.01 0.00 0.13 0.00
To see confidence intervals of the correlations, print with the short=FALSE option
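The layout of the probability matrix (raw p-values below the diagonal, adjusted ones above) can be imitated in base R, which may help in reading it. The sketch below does not call psych; it assumes that the adjustment applied is Holm's method, which the psych documentation lists as the default for corr.test(). The names p_raw and p_holm are illustrative.

```r
num <- mtcars[, -c(8, 9)]
k <- ncol(num)

# Raw two-sided p-values for every pair of columns, via cor.test().
p_raw <- matrix(0, k, k, dimnames = list(names(num), names(num)))
for (i in seq_len(k - 1)) {
  for (j in (i + 1):k) {
    p <- cor.test(num[[i]], num[[j]])$p.value
    p_raw[i, j] <- p
    p_raw[j, i] <- p
  }
}

# Holm adjustment across the k * (k - 1) / 2 distinct pairs;
# adjusted values go above the diagonal, raw values stay below.
p_holm <- p_raw
upper <- upper.tri(p_raw)
p_holm[upper] <- p.adjust(p_raw[upper], method = "holm")

print(round(p_holm, 2))
```

Adjusted p-values are never smaller than the raw ones, which is why some pairs that look significant below the diagonal no longer do above it.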
Spearman:
# Significance tests for the Spearman correlations in mtcars after dropping columns 8 and 9, without adjusting for multiple tests
corr.test(mtcars[,-c(8,9)], method = "spearman", adjust = "none")
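With adjust = "none", every reported p-value is the unadjusted test for a single pair. For one pair, a comparable rank-based test is available in base R through cor.test(). The sketch below is an illustration for disp and wt; passing exact = FALSE is a choice made here because the data contain ties, for which an exact p-value cannot be computed.

```r
x <- mtcars$disp
y <- mtcars$wt

# exact = FALSE requests the asymptotic p-value, which avoids the
# warning about exact p-values being unavailable for tied data.
st <- cor.test(x, y, method = "spearman", exact = FALSE)

print(st$estimate)  # Spearman's rho
print(st$p.value)   # unadjusted two-sided p-value
```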
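Visualization:
library(ggcorrplot)
a <- cor(mtcars[,-c(8,9)], method = "spearman") # assign the Spearman correlation matrix to a
ggcorrplot(a)

Parameters can be tuned:
ggcorrplot(a, lab = TRUE) # print the coefficients in the cells; see ?ggcorrplot for other parameters

ggcorrplot(a, lab = TRUE, type = "lower") # or type = "upper"

ggcorrplot(a, lab = TRUE, type = "upper", colors = c("red", "white", "green"))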
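The plots above show only the coefficients. To bring the significance tests into the picture as well, ggcorrplot accepts a matrix of p-values through its p.mat argument and can hide cells that are not significant. The sketch below is an illustration under the assumption that the ggcorrplot package is installed (the call is skipped otherwise); the names num, corr and p_mat are made up for the example.

```r
num <- mtcars[, -c(8, 9)]
corr <- cor(num, method = "spearman")

if (requireNamespace("ggcorrplot", quietly = TRUE)) {
  # cor_pmat() runs cor.test() on every pair of columns; extra
  # arguments such as method and exact are passed on to cor.test().
  p_mat <- ggcorrplot::cor_pmat(num, method = "spearman", exact = FALSE)

  # Leave cells blank when the p-value exceeds sig.level.
  plt <- ggcorrplot::ggcorrplot(
    corr,
    type = "lower",
    lab = TRUE,
    p.mat = p_mat,
    sig.level = 0.05,
    insig = "blank"
  )
  print(plt)
}
```

Note that these p-values are not adjusted for multiple tests, so a blank cell means "not significant at the 5% level for this pair taken alone".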

