# Principal Component Analysis in R: prcomp vs princomp

A very good introduction: http://strata.uga.edu/software/pdf/pcaTutorial.pdf

Another very good introduction: http://gastonsanchez.wordpress.com/2012/06/17/principal-components-analysis-in-r-part-1/

prcomp: Performs a principal components analysis on the given data matrix and returns the results as an object of class prcomp.

princomp: Performs a principal components analysis on the given numeric data matrix and returns the results as an object of class princomp.

```r
str(USArrests)
```

```
'data.frame':  50 obs. of  4 variables:
 $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
 $ Assault : int  236 263 294 190 276 204 110 238 335 211 ...
 $ UrbanPop: int  58 48 80 50 91 78 77 72 80 60 ...
 $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
```

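prcomp usage and examples:

```r
prcomp(x, ...)

prcomp(formula, data = NULL, subset, na.action, ...)

prcomp(x, retx = TRUE, center = TRUE, scale. = FALSE, tol = NULL, ...)
```

```r
# Inappropriate: the variances of the USArrests columns differ by orders
# of magnitude, so an unscaled PCA is dominated by Assault.
prcomp(USArrests)

# The argument is scale. (scale = TRUE also works through partial matching).
prcomp(USArrests, scale. = TRUE)  # data matrix interface

prcomp(~ Murder + Assault + Rape, data = USArrests, scale. = TRUE)  # formula interface

plot(prcomp(USArrests))

summary(prcomp(USArrests, scale. = TRUE))

biplot(prcomp(USArrests, scale. = TRUE))
```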
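Two claims above can be checked directly: that the raw variances are on very different scales, and that the "Proportion of Variance" row printed by summary() is just each squared standard deviation divided by their sum. A minimal sketch using only base R and the built-in USArrests data; the variable names are illustrative.

```r
# Variances of the raw USArrests columns: Assault dwarfs the others,
# so an unscaled PCA is mostly a PCA of Assault alone.
v <- apply(USArrests, 2, var)
print(round(v, 1))

p <- prcomp(USArrests, scale. = TRUE)

# Proportion of variance explained by each component:
# squared standard deviations divided by their sum.
pve <- p$sdev^2 / sum(p$sdev^2)
cum_pve <- cumsum(pve)

# summary() reports the same quantities (rounded) in its importance matrix.
imp <- summary(p)$importance
print(rbind(pve = pve, cumulative = cum_pve))
print(imp)
```

On standardized data the squared standard deviations sum to the number of variables (here 4), because they are the eigenvalues of the correlation matrix, whose trace is 4.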


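princomp usage and examples:

```r
princomp(x, ...)  # same as prcomp

princomp(formula, data = NULL, subset, na.action, ...)  # again the same

princomp(x, cor = FALSE, scores = TRUE, covmat = NULL,
         subset = rep(TRUE, nrow(as.matrix(x))), ...)  # the arguments differ
```

```r
# =^= prcomp(USArrests, scale. = TRUE): similar, but not identical.
# princomp uses divisor N rather than N - 1, so the variable scalings
# differ by a factor of sqrt(49/50); with cor = FALSE (compared with
# prcomp(USArrests)) the component standard deviations differ by that factor.
princomp(USArrests, cor = TRUE)

summary(pc.cr <- princomp(USArrests, cor = TRUE))

plot(pc.cr)  # shows a screeplot

biplot(pc.cr)
```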
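The sqrt(49/50) factor comes from the divisor (USArrests has n = 50 rows). A minimal sketch comparing the two functions on both the correlation-based and the covariance-based fits; it assumes nothing beyond base R.

```r
# Correlation-based fits: prcomp (divisor N - 1) vs princomp (divisor N).
p_cor  <- prcomp(USArrests, scale. = TRUE)
pp_cor <- princomp(USArrests, cor = TRUE)

# The correlation matrix does not depend on the divisor,
# so the component standard deviations agree...
same_sdev <- all.equal(unname(p_cor$sdev), unname(pp_cor$sdev))

# ...but the per-variable scalings differ by sqrt(49/50).
scale_ratio <- unname(pp_cor$scale / p_cor$scale)

# Covariance-based fits: here the component standard deviations differ.
p_cov  <- prcomp(USArrests)
pp_cov <- princomp(USArrests)
sdev_ratio <- unname(pp_cov$sdev / p_cov$sdev)

print(same_sdev)
print(scale_ratio)
print(sdev_ratio)
print(sqrt(49 / 50))
```

Because the scalings differ, the princomp scores are larger than the prcomp x values by the reciprocal factor sqrt(50/49), even when the component standard deviations match.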


Values returned by prcomp:

sdev

the standard deviations of the principal components (i.e., the square roots of the eigenvalues of the covariance/correlation matrix, though the calculation is actually done with the singular values of the data matrix).

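rotation

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). princomp returns this in the element loadings.

x

if retx is TRUE, the value of the rotated data (the centred, and scaled if requested, data multiplied by the rotation matrix). Hence cov(x) is the diagonal matrix diag(sdev^2).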

center, scale

the centering and scaling used, or FALSE.
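The relationships between these components can be verified numerically. A minimal sketch on the scaled USArrests fit: x is the centred and scaled data times rotation, its covariance is diagonal with sdev^2 on the diagonal, and sdev^2 equals the eigenvalues of the correlation matrix.

```r
p <- prcomp(USArrests, scale. = TRUE)

# x is the centred and scaled data multiplied by the rotation matrix.
z <- scale(USArrests, center = p$center, scale = p$scale)
x_manual <- z %*% p$rotation

# The columns of x are uncorrelated, with variances sdev^2 (divisor N - 1).
cov_x <- cov(p$x)

# sdev^2 are the eigenvalues of the correlation matrix.
ev <- eigen(cor(USArrests), symmetric = TRUE)$values

print(max(abs(x_manual - p$x)))
print(round(cov_x, 10))
print(rbind(sdev2 = p$sdev^2, eigen = ev))
```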

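Values returned by princomp:

sdev

the standard deviations of the principal components.

loadings

the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors). This is of class "loadings".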

center

the means that were subtracted.

scale

the scalings applied to each variable.

n.obs

the number of observations.

scores

if scores = TRUE, the scores of the supplied data on the principal components. These are non-null only if x was supplied, and if covmat was also supplied if it was a covariance list. For the formula method, napredict() is applied to handle the treatment of values omitted by the na.action.

call

the matched call.

na.action

If relevant.
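For princomp the analogous checks use center, scale, loadings and scores. A minimal sketch on the correlation-based USArrests fit; unclass() strips the "loadings" class so the object can be used as a plain matrix.

```r
pc <- princomp(USArrests, cor = TRUE)

# loadings has class "loadings"; unclass() gives the plain matrix.
L <- unclass(pc$loadings)

# scores = data centred by pc$center, scaled by pc$scale, times the loadings.
scores_manual <- scale(USArrests, center = pc$center, scale = pc$scale) %*% L

# With divisor N, the mean of the squared (already centred) scores
# in each column equals sdev^2.
var_n <- colMeans(pc$scores^2)

print(max(abs(scores_manual - pc$scores)))
print(rbind(var_n = var_n, sdev2 = pc$sdev^2))
```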

Details for prcomp:

The calculation is done by a singular value decomposition of the (centered and possibly scaled) data matrix, not by using eigen on the covariance matrix. This is generally the preferred method for numerical accuracy.

The print method for these objects prints the results in a nice format and the plot method produces a scree plot.

Unlike princomp, variances are computed with the usual divisor N - 1.
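The SVD route and the divisor can be reproduced by hand. A minimal sketch, assuming standardized USArrests data: the singular values divided by sqrt(N - 1) give sdev, the right singular vectors give rotation up to the sign of each column, and the eigen route on the correlation matrix agrees here.

```r
X <- as.matrix(USArrests)
n <- nrow(X)
p <- prcomp(X, scale. = TRUE)

# SVD of the centred and scaled data matrix.
s <- svd(scale(X))

# Singular values -> component standard deviations, using divisor N - 1.
sdev_svd <- s$d / sqrt(n - 1)

# The eigen route on the correlation matrix gives the same answer here.
sdev_eig <- sqrt(eigen(cor(X), symmetric = TRUE)$values)

# Right singular vectors are the rotation, up to the sign of each column.
sign_free_diff <- max(abs(abs(s$v) - abs(p$rotation)))

print(rbind(prcomp = p$sdev, svd = sdev_svd, eigen = sdev_eig))
print(sign_free_diff)
```

The two routes agree on a small, well-conditioned matrix like this one; the numerical-accuracy advantage of the SVD matters when the covariance matrix is nearly singular.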

Note that scale. = TRUE cannot be used if there are zero or constant (for center = TRUE) variables.

Details for princomp:

princomp is a generic function with "formula" and "default" methods.

The calculation is done using eigen on the correlation or covariance matrix, as determined by cor. This is done for compatibility with the S-PLUS result. A preferred method of calculation is to use svd on x, as is done in prcomp.

Note that the default calculation uses divisor N for the covariance matrix.

The print method for these objects prints the results in a nice format and the plot method produces a scree plot (screeplot). There is also a biplot method.

If x is a formula then the standard NA-handling is applied to the scores (if requested): see napredict.

princomp only handles so-called R-mode PCA, that is, feature extraction of variables. If a data matrix is supplied (possibly via a formula), it is required that there are at least as many units as variables. For Q-mode PCA use prcomp.

Difference between R-mode and Q-mode:

- R-mode PCA examines the correlations or covariances among variables.
- Q-mode PCA focuses on the correlations or covariances among samples.
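The "at least as many units as variables" restriction is easy to see with a wide matrix. A minimal sketch using simulated data (the 5 by 10 matrix W is hypothetical, generated with rnorm): princomp stops with an error, while prcomp still works through the SVD.

```r
set.seed(1)
# Hypothetical wide matrix: 5 samples (rows) by 10 variables (columns).
W <- matrix(rnorm(5 * 10), nrow = 5, ncol = 10)

# princomp refuses data with fewer units than variables.
pp <- try(princomp(W), silent = TRUE)
print(inherits(pp, "try-error"))

# prcomp returns min(5, 10) = 5 components; after centring, 5 rows span
# at most 4 dimensions, so the last standard deviation is essentially 0.
pw <- prcomp(W)
print(length(pw$sdev))
print(pw$sdev)
```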

Parameter Estimates

| Variable | DF | Parameter Estimate | Standard Error | t Value | Pr > \|t\| | Variance Inflation |
|---|---|---|---|---|---|---|
| Intercept | 1 | 134.96790 | 237.81430 | 0.57 | 0.5778 | 0 |
| occup | 1 | -1.28377 | 0.80469 | -1.60 | 0.1291 | 2.16276 |
| checkin | 1 | 1.80351 | 0.51624 | 3.49 | 0.0028 | 4.52397 |
| hours | 1 | 0.66915 | 1.84640 | 0.36 | 0.7215 | 1.35735 |
| common | 1 | -21.42263 | 10.17160 | -2.11 | 0.0504 | 2.33264 |
| wings | 1 | 5.61923 | 14.74609 | 0.38 | 0.7079 | 3.65318 |
| cap | 1 | -14.48025 | 4.22018 | -3.43 | 0.0032 | 37.12912 |
| rooms | 1 | 29.32475 | 6.36590 | 4.61 | 0.0003 | 63.70809 |
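The Variance Inflation column is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors; the large values for cap and rooms signal strong collinearity. The data behind this table is not given here, so the following minimal sketch uses three USArrests columns as hypothetical stand-in predictors and shows that the regression definition matches the diagonal of the inverse correlation matrix.

```r
# Hypothetical stand-in predictors (the table's own data is not available).
X <- USArrests[, c("Assault", "UrbanPop", "Rape")]

# VIF_j = 1 / (1 - R_j^2), regressing predictor j on the other predictors.
vif_lm <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  fit <- lm(reformulate(others, response = v), data = X)
  1 / (1 - summary(fit)$r.squared)
})

# Equivalent closed form: the diagonal of the inverse correlation matrix.
vif_cor <- diag(solve(cor(X)))

print(rbind(lm = vif_lm, inverse_cor = vif_cor))
```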

Eigenvalues of the Correlation Matrix

| Component | Eigenvalue | Difference | Proportion | Cumulative |
|---|---|---|---|---|
| 1 | 4.64302239 | 3.90281147 | 0.6633 | 0.6633 |
| 2 | 0.74021092 | 0.03390878 | 0.1057 | 0.7690 |
| 3 | 0.70630215 | 0.25669541 | 0.1009 | 0.8699 |
| 4 | 0.44960674 | 0.15020062 | 0.0642 | 0.9342 |
| 5 | 0.29940611 | 0.14798282 | 0.0428 | 0.9769 |
| 6 | 0.15142329 | 0.14139489 | 0.0216 | 0.9986 |
| 7 | 0.01002840 | | 0.0014 | 1.0000 |
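The derived columns follow from the eigenvalues alone: Difference is each eigenvalue minus the next, Proportion is each eigenvalue divided by the total, and Cumulative is the running sum of Proportion. The total is 7 because the trace of a 7 by 7 correlation matrix equals the number of variables. A minimal sketch recomputing the table from the eigenvalues copied above:

```r
# Eigenvalues copied from the table above (7 variables).
ev <- c(4.64302239, 0.74021092, 0.70630215, 0.44960674,
        0.29940611, 0.15142329, 0.01002840)

# Trace of a correlation matrix = number of variables, so the sum is 7.
total <- sum(ev)

difference <- -diff(ev)            # ev[k] - ev[k + 1]
proportion <- ev / total
cumulative <- cumsum(ev) / total

print(total)
print(data.frame(Eigenvalue = ev,
                 Difference = c(difference, NA),
                 Proportion = round(proportion, 4),
                 Cumulative = round(cumulative, 4)))
```

The recomputed differences can disagree with the table in the last digit (for example 0.03390877 vs 0.03390878) because the printed eigenvalues are themselves rounded.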


Parameter Estimates

| Variable | DF | Variance Inflation |
|---|---|---|
| Intercept | 1 | 0 |
| Prin1 | 1 | 1.00000 |
| Prin2 | 1 | 1.00000 |
| Prin3 | 1 | 1.00000 |
| Prin4 | 1 | 1.00000 |
| Prin5 | 1 | 1.00000 |
| Prin6 | 1 | 1.00000 |
| Prin7 | 1 | 1.00000 |
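Every principal component has a variance inflation of exactly 1 because the component scores are mutually uncorrelated, so their correlation matrix (and its inverse) is the identity. A minimal sketch of this effect, again using three USArrests columns as hypothetical stand-in predictors since the table's own data is not available:

```r
# Hypothetical stand-in predictors (the table's own data is not available).
X <- USArrests[, c("Assault", "UrbanPop", "Rape")]

# Principal component scores of the standardized predictors.
scores <- prcomp(X, scale. = TRUE)$x

# VIF = diagonal of the inverse correlation matrix.
vif_raw <- diag(solve(cor(X)))
vif_pc  <- diag(solve(cor(scores)))

print(round(vif_raw, 3))
print(round(vif_pc, 10))
```

Keeping all components is only a rotation of the original predictors; the practical benefit of principal component regression comes from dropping components with very small eigenvalues, such as component 7 (eigenvalue 0.0100) in the table above.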


### References ###

http://blog.csdn.net/youliye/article/details/16892723

