Correlation Analysis
1. Pearson (or Bravais–Pearson) correlation coefficient
1.1 Formulation
Assumption:
- Two variables \(X\) and \(Y\) are measured on a continuous scale
- They are linearly related
The correlation coefficient \(r(X,Y)=r\) measures the degree of linear relationship between \(X\) and \(Y\) using
\[ r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2\sum_{i=1}^{n}(y_i-\bar{y})^2}} \]
where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(X\) and \(Y\).
1.2 Properties
- The correlation coefficient is independent of the units of measurement of \(X\) and \(Y\).
- e.g., if one person measures height and weight in metres and kilograms and another measures them in centimetres and grams, the correlation coefficients computed from the two data sets will be the same.
- The correlation coefficient is symmetric, i.e., \(r(X,Y)=r(Y,X)\).
- The limits of \(r\) are \(-1 \leq r \leq 1\).
- If the relationship between \(X\) and \(Y\) is perfectly linear and increasing, then \(r=+1\); if it is perfectly linear and decreasing, then \(r=-1\). The sign of \(r\) thus determines the direction of the association.
- If \(r\) is close to zero, the variables are uncorrelated (which does not imply independence) or the relationship is not linear.
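The unit-invariance property can be checked numerically. The following sketch (the height/weight values are made up for illustration) computes \(r\) once in metres and kilograms and once in centimetres and grams:

```python
# Check: r is unchanged by linear rescaling of the measurement units.
def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

height_m  = [1.62, 1.75, 1.68, 1.80, 1.71]    # metres
weight_kg = [58.0, 74.5, 63.2, 81.0, 68.4]    # kilograms
height_cm = [h * 100  for h in height_m]      # same heights in centimetres
weight_g  = [w * 1000 for w in weight_kg]     # same weights in grams

r1 = pearson_r(height_m, weight_kg)
r2 = pearson_r(height_cm, weight_g)
# r1 and r2 agree up to floating-point rounding
```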
1.3 \(t\)-test of a sample (Pearson) correlation coefficient
Assumption:
- The relationship is linear.
- Random variables \(X\) and \(Y\) originate from a bivariate normal distribution.
- Under the null hypothesis, the population correlation coefficient is zero.
Null hypothesis \(H_0\): the population correlation coefficient is zero, i.e., \(\rho = 0\)
The test statistic
\[ t = \frac{|r|\sqrt{n-2}}{\sqrt{1-r^2}} \]
is calculated and follows Student's \(t\)-distribution with \(n - 2\) degrees of freedom.
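As a minimal numerical sketch, the statistic can be computed directly from the Heumann et al. example data that also appears in the implementation section below:

```python
# t-statistic for H0: rho = 0, t = |r| * sqrt(n-2) / sqrt(1 - r^2), df = n - 2
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x1 = [10.85, 10.44, 10.50, 10.89, 10.62]
x2 = [7.84, 7.96, 7.81, 7.47, 7.74]
n = len(x1)
r = pearson_r(x1, x2)
t = abs(r) * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
# t is then compared against the t-distribution with n - 2 = 3 degrees of freedom
```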
1.4 \(Z\)-test of a population (Pearson) correlation coefficient
Assumption:
- The relationship is linear.
- Random variables \(X\) and \(Y\) originate from a bivariate normal distribution
- The variance of \(Y\) is independent of \(X\).
- The sample correlation coefficient is observed as \(r\)
- Correlation coefficient of population has a specified value \(\rho_0\).
Null hypothesis \(H_0\): population correlation coefficient is equal to \(\rho_0\), i.e., \(\rho=\rho_0\)
Using the Fisher \(Z\)-transformation we have
\[ Z_1 = \frac{1}{2}\ln\frac{1+r}{1-r} = \operatorname{arctanh}(r) \]
where \(r\) is the sample correlation coefficient.
The distribution of \(Z_1\) is approximately normal with mean \(\mu_{Z_1}\) and standard deviation \(\sigma_{Z_1}\), where
\[ \mu_{Z_1} = \frac{1}{2}\ln\frac{1+\rho_0}{1-\rho_0}, \qquad \sigma_{Z_1} = \frac{1}{\sqrt{n-3}} \]
The test statistic is
\[ Z = \frac{Z_1 - \mu_{Z_1}}{\sigma_{Z_1}} = (Z_1 - \mu_{Z_1})\sqrt{n-3} \]
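The transformation and test statistic can be sketched as follows; the numerical check reuses \(r = -0.6932787\) and \(n = 5\) from the Heumann et al. example data used later:

```python
# Fisher Z-test: Z1 = arctanh(r), mu = arctanh(rho0), sigma = 1/sqrt(n-3)
import math

def fisher_z_stat(r, rho0, n):
    z1 = math.atanh(r)      # Fisher transformation of the sample coefficient
    mu = math.atanh(rho0)   # transformed hypothesised value rho_0
    sigma = 1 / math.sqrt(n - 3)
    return (z1 - mu) / sigma

# matches the self-coded R result cor.pearson.test.z(x1, x2, rho=0.2)
z = fisher_z_stat(-0.6932787, 0.2, 5)
```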
1.5 Implementation
(1) R
(a) Pearson correlation coefficient
cor(x1, x2, method='pearson')
# x1, x2 are two vectors
cor(data, method='pearson')
# data is a data frame
Self-written code
# self-coding
cor.pearson = function(x, y) {
x.mean = mean(x)
y.mean = mean(y)
r = sum((x - x.mean) * (y - y.mean)) / sqrt(sum((x - x.mean)^2) * sum((y - y.mean)^2))
return (r)
}
Example. The following example is selected from Heumann et al.'s book (2016, p. 83, Table 4.7).
x1 = c(10.85, 10.44, 10.50, 10.89, 10.62)
x2 = c( 7.84, 7.96, 7.81, 7.47, 7.74)
cor(x1, x2, method='pearson')
cor.pearson(x1, x2)
# Output: [1] -0.6932787 [2] -0.6932787
(b) t-test
# self-coding
cor.pearson.test.t = function(x, y) {
n = length(x)
r = cor.pearson(x, y)
Sr = sqrt((1 - r^2) / (n-2))
t = abs(r) / Sr
return (t)
}
The cor.test() function performs the t-test:
cor.test(x1, x2, method="pearson", conf.level=0.95, alternative="two.sided")$statistic
cor.pearson.test.t(x1, x2)
# Output [1] -1.66622 [2] 1.66622
Note that Student's t is a symmetric distribution.
The corr.test() function (requires the psych package) performs the t-test for multiple pairs of variables:
states <- state.x77
states = as.data.frame(states)
states = na.omit(states)
library(psych)
corr.test(states, method="pearson", use="pairwise", alpha=0.05)
(c) Z-test
# self-coding
cor.pearson.test.z = function(x, y, rho=0) {
n = length(x)
r = cor.pearson(x, y)
z1 = atanh(r)
mu1 = atanh(rho)
sigma1 = 1 / sqrt(n-3)
z = (z1 - mu1) / sigma1
return (z)
}
cor.pearson.test.z(x1, x2, rho=0.2)
cor.pearson.test.z(x1, x2, rho=0.7)
cor.pearson.test.z(x1, x2, rho=-0.7)
# Output: [1] -1.494787 [2] -2.434628 [3] 0.01846855
(2) Python
Pearson correlation coefficient
numpy.corrcoef(x, y=None)
- Return: the correlation coefficient matrix; the diagonal entries are ones.
scipy.stats.pearsonr(x, y)
- Return:
- r: Pearson's correlation coefficient
- p-value: two-tailed p-value
s1.corr(s2, method='pearson')
# s1, s2 : pandas.Series
df1.corr(method='pearson')
# df1 : pandas.DataFrame
Example
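The calls above can be exercised on the same data as the R example (Heumann et al. 2016, p. 83); a sketch assuming numpy and scipy are installed:

```python
import numpy as np
from scipy import stats

x1 = np.array([10.85, 10.44, 10.50, 10.89, 10.62])
x2 = np.array([7.84, 7.96, 7.81, 7.47, 7.74])

# numpy: returns the 2x2 correlation matrix; off-diagonal entries hold r
r_np = np.corrcoef(x1, x2)[0, 1]

# scipy: returns the coefficient together with a two-tailed p-value
r_sp, p = stats.pearsonr(x1, x2)
# r_np == r_sp, agreeing with the R output above
```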
2. Spearman's Rank Correlation Coefficient
2.1 Formulation
Assumption:
- two ranked variables \(X\) and \(Y\)
- the lowest score \(x_i\) is assigned rank 1
This measure is suitable for both ordinal and continuous variables.
Denotations:
- \(\text{Rank}(x_i)\) denotes the rank of the \(i\)th observation on \(X\)
- \(\text{Rank}(x_i)\) is the position of \(x_i\) among the ordered values of \(X\)
- Tied observations (i.e., \(x_i = x_j\)) receive the same rank, namely the average of the ranks they occupy.
- \(\text{Rank}(y_i)\) denotes the rank of the \(i\)th observation on \(Y\)
- \(d_i = \text{Rank}(x_i) - \text{Rank}(y_i)\) is the difference between the two rank values
Spearman's rank correlation coefficient is defined as the Pearson product-moment sample correlation of the \(\text{Rank}(x_i)\) and the \(\text{Rank}(y_i)\). That is
\[ R = \frac{\sum_{i=1}^{n}\bigl(\text{Rank}(x_i)-\overline{\text{Rank}(x)}\bigr)\bigl(\text{Rank}(y_i)-\overline{\text{Rank}(y)}\bigr)}{\sqrt{\sum_{i=1}^{n}\bigl(\text{Rank}(x_i)-\overline{\text{Rank}(x)}\bigr)^2\sum_{i=1}^{n}\bigl(\text{Rank}(y_i)-\overline{\text{Rank}(y)}\bigr)^2}} \]
When no ties are present within a sample, this is equivalent to the computationally efficient formula (Hollander et al. 2014, p. 428; Heumann et al. 2016, p. 84):
\[ R = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \]
If there are tied \(X\)'s and/or tied \(Y\)'s, the above formula should be revised; see Equation 8.77 on p. 429 of Hollander et al. (2014).
2.2 Properties
- The values of \(R\) lie between \(-1\) and \(+1\) and measure the degree of correlation between the ranks of \(X\) and \(Y\).
- The value of \(R\) remains the same whether we choose an ascending or descending order of the ranks.
- \(R=1\) indicates that all observations receive exactly the same ranks on both variables.
- \(R=-1\) indicates that the observations receive exactly opposite ranks on the two variables.
2.3 The Spearman rank correlation test (paired observations)
Assumption:
- Paired observations of \(X\) and \(Y\) that can be ranked.
For large samples (\(n > 10\)), \(R\) asymptotically follows the normal distribution with mean zero and variance \(\frac{1}{n-1}\) (Hollander et al. 2014, p. 427). Thus, the test statistic is
\[ Z = \frac{R}{\sqrt{1/(n-1)}} = R\sqrt{n-1} \]
Note that Kanji (2006, p. 109) also gives a formulation, but it is the additive inverse of the one above.
For small samples, the test statistic (Kanji 2006, p. 109)
\[ R = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)} \]
must be compared with critical values obtained from Table 26 in Kanji's book (2006, p. 226).
2.4 Implementation
(1) R
(a) Spearman's Rank Correlation Coefficient
# compute by definition, used for tied and non-tied sample
cor.spearman.1 = function(x, y) {
x.rank = rank(x)
y.rank = rank(y)
r = cor.pearson(x.rank, y.rank)
return (r)
}
# computationally efficient formula, only used for non-tied sample
cor.spearman.2 = function(x, y) {
n = length(x)
x.rank = rank(x)
y.rank = rank(y)
r = 1 - 6 * sum((x.rank - y.rank)^2) / (n * (n^2 - 1))
return (r)
}
Examples
# non-tied samples
x1 = c(10.85, 10.44, 10.50, 10.89, 10.62)
x2 = c( 7.84, 7.96, 7.81, 7.47, 7.74)
cor(x1, x2, method='spearman') # used for both tied and non-tied sample
cor.spearman.1(x1, x2) # used for both tied and non-tied sample
cor.spearman.2(x1, x2) # only used for non-tied sample
# output
# [1] -0.7 [2] -0.7 [3] -0.7
# tied samples
x1 = c(7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4)
x2 = c(2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0)
cor(x1, x2, method='spearman') # used for both tied and non-tied sample
cor.spearman.1(x1, x2) # used for both tied and non-tied sample
cor.spearman.2(x1, x2) # only used for non-tied sample
# output
# [1] 0.7 [2] 0.7 [3] 0.7053571
(b) Hypothesis testing for non-tied samples
cor.spearman.test.z = function(x, y) {
n = length(x)
S = sum((rank(x) - rank(y))^2)
r = cor.spearman.1(x, y)
z = r / sqrt(1 / (n-1))
res = list(z=z, S=S)
return (res)
}
Using the self-written cor.spearman.test.z() function:
# non-tied samples
x1 = c(10.85, 10.44, 10.50, 10.89, 10.62)
x2 = c( 7.84, 7.96, 7.81, 7.47, 7.74)
cor.spearman.test.z(x1, x2)
# Output: z = -1.4 S = 34
Using the cor.test(x, y, method="spearman") function:
cor.test(x1, x2, method="spearman", conf.level=0.95, alternative="two.sided")
# Output: S = 34, p-value = 0.2333
Note that the statistic S of cor.test() is \(\sum_{i=1}^{n} d_i^2\), i.e., sum((rank(x1) - rank(y1))^2).
Use the pSpearman() function (requires the SuppDists package)
library(SuppDists)
pSpearman(q, r, lower.tail=TRUE, log.p=FALSE)
- Parameters:
- q: quantile, here Spearman's rank correlation coefficient
- r: number of observations
- Return: probability, i.e., the p-value
r = cor.spearman.1(x1, x2)
library(SuppDists)
pSpearman(q=-abs(r), r=length(x1), lower.tail=T)*2 # two-sided test
# Output: p-value = 0.2333333
(c) Hypothesis testing for tied samples
Example: the following tied example is selected from Hollander et al.'s book (2014, Example 8.5, p. 430).
# tied samples
x1 = c(7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4)
x2 = c(2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0)
r = cor(x1, x2, method="spearman")
- Using the self-written cor.spearman.test.z():
cor.spearman.test.z(x1, x2)
# Output: S = 16.5; z = 1.714643
z = cor.spearman.test.z(x1, x2)$z
pnorm(-abs(z), lower.tail=T)*2
# Output: p-value = 0.08641073
Note that the statistic z agrees with the result in Hollander et al.'s book (p. 431, \(r_s^*\)).
- Using cor.test(x1, x2, method="spearman") for tied samples will raise a warning.
- Using pSpearman():
r = cor(x1, x2, method="spearman")
pSpearman(q=-abs(r), r=length(x1))*2
# Output: [1] 0.08809524
(2) Python
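A sketch assuming scipy is installed; scipy.stats.spearmanr returns the coefficient and a p-value, handling ties via average ranks like R's cor(..., method='spearman'):

```python
# Tied example reused from the R section above (Hollander et al. 2014, Example 8.5)
import numpy as np
from scipy import stats

x1 = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
x2 = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])

# coefficient and two-sided p-value; ties receive average ranks
rho, p = stats.spearmanr(x1, x2)
# rho matches cor(x1, x2, method='spearman') in R
```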
3. Kendall Rank Correlation Coefficient
3.1 Formulation
Denotations:
- \((x_{1},y_{1}),...,(x_{n},y_{n})\): an ordered set of observations of the joint random variables \(X\) and \(Y\), such that all the values of \(x_{i}\) and all the values of \(y_{i}\) are unique (ties are neglected for simplicity).
Definitions:
- For any pair of observations \((x_{i},y_{i})\) and \((x_{j},y_{j})\), where \(i<j\):
- concordant: if \(x_{i}>x_{j}\) and \(y_{i}>y_{j}\) (or \(x_{i}<x_{j}\) and \(y_{i}<y_{j}\))
- discordant: if \(x_{i}>x_{j}\) and \(y_{i}<y_{j}\) (or \(x_{i}<x_{j}\) and \(y_{i}>y_{j}\))
- tied: if \(x_{i}=x_{j}\) or \(y_{i}=y_{j}\)
The Kendall \(\tau\) coefficient is defined as:
\[ \tau = \frac{n_c - n_d}{\binom{n}{2}} \]
where \(n_c\) is the number of concordant pairs, \(n_d\) is the number of discordant pairs, and \({n \choose 2}={n(n-1) \over 2}\) is the binomial coefficient for the number of ways to choose two items from \(n\) items.
Further denotations:
- When \(x_i − x_j\) and \(y_i − y_j\) have the same sign a score of \(+1\) is obtained.
- When they have opposite signs a score of \(−1\) is obtained.
- When there is a difference of zero, no score is obtained.
- These scores are summed together and this sum is denoted \(S\).
The explicit expression:
\[ \tau = \frac{2}{n(n-1)} \sum_{i<j} \operatorname{sgn}(x_i - x_j)\,\operatorname{sgn}(y_i - y_j) \]
where:
- \(\operatorname{sgn}(\cdot)\) is the sign function
3.2 Properties
- The denominator is the total number of pair combinations, thus \(-1 \leq \tau \leq 1\).
- If the agreement between the two rankings is perfect (i.e., the two rankings are the same), the coefficient has value 1.
- If the disagreement between the two rankings is perfect (i.e., one ranking is the reverse of the other), the coefficient has value \(-1\).
- If \(X\) and \(Y\) are independent, then we would expect the coefficient to be approximately zero.
3.3 The Kendall rank correlation test (paired observations)
Hypothesis test for \(\tau\) (Kanji, 2006 & Wikipedia 2022)
Null Hypothesis \(H_0\): \(\tau = 0\)
For large sample size \(n\) (\(n > 10\)), \(\tau\) approximately follows a normal distribution with mean zero and variance \(\frac{2(2n+5)}{9n(n-1)}\) (Hollander et al. 2014, pp. 393-399; Wikipedia 2022). Thus, the test statistic is
\[ Z = \frac{\tau}{\sqrt{\frac{2(2n+5)}{9n(n-1)}}} \]
For small samples, critical values of \(S\) may be obtained from tables (e.g., in Kanji 2006).
3.4 Accounting for ties
- A pair \(\{(x_i, y_i), (x_j, y_j)\}\) is said to be tied if \(x_i = x_j\) or \(y_i = y_j\).
- A tied pair is neither concordant nor discordant.
- When tied pairs arise in the data, the coefficient may be modified in a number of ways to keep it in the range \([-1, 1]\).
Tau-a \(\tau_A\)
- The Tau-a statistic tests the strength of association of the cross tabulations.
- Both variables have to be ordinal.
- Tau-a makes no adjustment for ties.
\[ \tau_A = \frac{n_c - n_d}{n_0} \]
where:
- \(n_0 = n(n-1)/2\)
- \(n_{c} = \text{number of concordant pairs}\)
- \(n_{d} = \text{number of discordant pairs}\)
Tau-b \(\tau_B\)
- Tau-b adjusts for ties; a value of zero indicates the absence of association.
\[ \tau_B = \frac{n_c - n_d}{\sqrt{(n_0 - n_1)(n_0 - n_2)}} \]
where:
- \(n_0 = n(n-1)/2\)
- \(n_1 = \sum_i t_i (t_i-1)/2\)
- \(n_2 = \sum_j u_j (u_j-1)/2\)
- \(t_{i} = \text{number of tied values in the } i^{\text{th}} \text{ group of ties for the first quantity}\)
- \(u_{j} = \text{number of tied values in the } j^{\text{th}} \text{ group of ties for the second quantity}\)
Tau-c \(\tau_C\) (also called Stuart-Kendall Tau-c)
- Tau-c is more suitable than Tau-b for data based on non-square (i.e., rectangular) contingency tables, where the numbers of rows and columns differ.
- Use Tau-b if the underlying scale of both variables has the same number of possible values (before ranking) and Tau-c if they differ.
\[ \tau_C = \frac{2(n_c - n_d)}{n^2\,\frac{m-1}{m}} \]
where:
- \(r = \text{number of rows of the contingency table}\)
- \(c = \text{number of columns of the contingency table}\)
- \(m = \min(r, c)\)
Hypothesis Test
The test statistics for \(\tau_A\) and \(\tau_B\) are described on Wikipedia.
3.5 Implementation
(1) R
(a) Kendall Rank Correlation Coefficient
# tau_A, only used for non-tied samples
cor.kendall.1 = function(x, y) {
n = length(x)
S = 0
for (i in 1:(n-1)){ for (j in (i+1):n){
S = S + sign(x[i]-x[j]) * sign(y[i]-y[j])
} }
r = 2 / (n * (n-1)) * S
return (r)
}
# tau_B, used for tied and non-tied samples. When the sample has no ties, tau_B = tau_A
cor.kendall.2 = function(x, y) {
n = length(x)
S = c(0, 0, 0, 0) # S = c(nc, nd, n1, n2)
for (i in 1:(n-1)){ for (j in (i+1):n){
sign1 = sign(x[i]-x[j])
sign2 = sign(y[i]-y[j])
if (sign1 * sign2 > 0) { S[1] = S[1] + 1 }      # concordant pair
else if (sign1 * sign2 < 0) { S[2] = S[2] + 1 } # discordant pair
else {                                          # at least one tie
if (sign1 == 0) { S[3] = S[3] + 1 }             # pair tied in x
if (sign2 == 0) { S[4] = S[4] + 1 }             # pair tied in y
} } }
r = (S[1] - S[2]) / sqrt((n*(n-1) / 2 - S[3]) * (n*(n-1) / 2 - S[4]))
return (r)
}
The cor(x1, x2, method='kendall') function computes Tau-b:
Example
# Example: non-tied samples
x1 = c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
x2 = c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
cor(x1, x2, method='kendall') # used for tied and non-tied samples
cor.kendall.2(x1, x2) # tau_B, used for tied and non-tied samples
cor.kendall.1(x1, x2) # tau_A, only used for non-tied samples
# Output: [1] 0.4444444 [2] 0.4444444 [3] 0.4444444
# Example: tied samples
x1 = c(7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4)
x2 = c(2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0)
cor(x1, x2, method='kendall') # used for tied and non-tied samples
cor.kendall.2(x1, x2) # tau_B, used for tied and non-tied samples
cor.kendall.1(x1, x2) # tau_A, only used for non-tied samples
# Output: [1] 0.55 [2] 0.55 [3] 0.5238095
(b) Hypothesis Testing for non-tied samples
# Self-coding
cor.kendall.test.z = function(x, y){
n = length(x)
r = cor.kendall.1(x, y)
z = r / sqrt( 2 * (2*n+5) / (9 * n * (n-1)))
S = 0 # the number of concordant pairs
for (i in 1:(n-1)) { for (j in (i+1):n) {
s1 = sign(x[i]-x[j]) * sign(y[i]-y[j])
if (s1 > 0) { S = S + s1}
} }
res = list("S"=S, "z"=z)
return (res)
}
Samples with no ties. The following example with non-tied samples is selected from Hollander et al.'s book (2014, Example 8.1, p. 397).
x1 = c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
x2 = c( 2.6, 3.1, 2.5, 5.0, 3.6, 4.0, 5.2, 2.8, 3.8)
# Kendall Correlation Coefficient
r = cor(x1, x2, method="kendall")
# r = 0.4444444444
Using the self-written code:
cor.kendall.test.z(x1, x2)
# Output: S = 26; z = 1.668115
z = cor.kendall.test.z(x1, x2)$z
pnorm(-abs(z), lower.tail=T)*2 # two-sided test
# p-value = 0.09852102
Note that the statistic z agrees with the result in Hollander et al.'s book (p. 399, \(K^*\)).
Using cor.test(x1, x2, method='kendall') function
# H0: r = 0
cor.test(x1, x2, method='kendall', conf.level=0.95, alternative='two.sided')
# Output: T = 26; p-value = 0.1194
Note that the statistic T of cor.test(method='kendall') is the number of concordant pairs
Use the pKendall() function (requires the SuppDists package)
library(SuppDists)
# p-value of two-sided test
pKendall(-abs(r), N=length(x1), lower.tail=T)*2 # two-sided test, H0 : r=0
# Output: 0.1194389
(c) Hypothesis Testing for tied samples
Using cor.test(x1, x2, method="kendall") for tied samples will raise a warning
Using pKendall()
# Example: tied samples
x1 = c(7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4)
x2 = c(2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0)
r = cor(x1, x2, method='kendall')
library(SuppDists)
pKendall(-abs(r), N=length(x1), lower.tail=T)*2 # two-sided test, H0 : r=0
# Output : p-value = 0.1361111
(2) Python
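A sketch assuming SciPy ≥ 1.7 (for the variant parameter); scipy.stats.kendalltau computes Tau-b by default and Stuart-Kendall Tau-c with variant='c':

```python
# Tied example reused from the R section above
import numpy as np
from scipy import stats

x1 = np.array([7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4])
x2 = np.array([2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0])

tau_b, p_b = stats.kendalltau(x1, x2)               # Tau-b (default), adjusts for ties
tau_c, p_c = stats.kendalltau(x1, x2, variant='c')  # Stuart-Kendall Tau-c
# tau_b matches cor(x1, x2, method='kendall') in R
```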
4. Goodman and Kruskal's Gamma
4.1 Formulation
Denotation
- Consider a contingency table with cell frequencies \(n_{ij}\), with rows \(i = 1,\dots,r\) and columns \(j = 1,\dots,c\).
Goodman and Kruskal's \(\gamma\) is
\[ \gamma = \frac{K - D}{K + D} \]
where \(K\) and \(D\) are computed as:
\[ K = \sum_{i<i'}\sum_{j<j'} n_{ij}\,n_{i'j'}, \qquad D = \sum_{i<i'}\sum_{j>j'} n_{ij}\,n_{i'j'} \]
where:
- \(K=n_c\) is the number of concordant observation pairs
- \(D=n_d\) is the number of discordant observation pairs
4.2 Properties
- \(-1 \leq \gamma \leq 1\)
- Larger absolute values indicate a stronger association, and the sign indicates the direction of the association.
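A brute-force computation of \(\gamma\) from raw paired observations (the function name is my own; the data are reused from the tied Kendall example above):

```python
# Goodman-Kruskal gamma over all pairs: gamma = (K - D) / (K + D);
# tied pairs contribute to neither count.
from itertools import combinations

def goodman_kruskal_gamma(x, y):
    K = D = 0  # concordant / discordant pair counts
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            K += 1
        elif s < 0:
            D += 1
    return (K - D) / (K + D)

# Data reused from the tied Kendall example above
x1 = [7.1, 7.1, 7.2, 8.3, 9.4, 10.5, 11.4]
x2 = [2.8, 2.9, 2.8, 2.6, 3.5, 4.6, 5.0]
g = goodman_kruskal_gamma(x1, x2)
# here K = 15 and D = 4, so g = 11/19
```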
Reference
Pearson Correlation Analysis: Theoretical Introduction (in Chinese), website
Wikipedia, Kendall rank correlation coefficient, website
Heumann, C., Schomaker, M., & Shalabh. (2016). Introduction to Statistics and Data Analysis. Springer International Publishing.
Hollander, M., Wolfe, D. A., & Chicken, E. (2014). Nonparametric Statistical Methods (3rd ed.). John Wiley & Sons, Inc.
Kanji, G. K. (2006). 100 Statistical Tests (3rd ed.). Sage Publications.
- Test 12: t-test of a correlation coefficient, p. 39
- Test 13: Z-test of a correlation coefficient, p. 39
- Test 58: The Spearman rank correlation test (paired observations), p. 109
- Test 59: The Kendall rank correlation test (paired observations), p. 110
