Statistical Inference - Implement in R

1. Distribution Functions

1.1. Basics

R provides four kinds of functions for each statistical distribution. Each kind is identified by a prefix on the function name:

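d (x, distparams, log = FALSE)：probability density function (PDF) or probability mass function (PMF)
- x：a number or a vector of quantiles
- log：logical; if TRUE, log(p) is returned.
  - d(x, log=TRUE) = log(d(x, log=FALSE))
- distparams：the parameters of the distribution, listed in the tables below
p (q, distparams, lower.tail = TRUE, log.p = FALSE)：cumulative distribution function (CDF)
- q：a quantile or a vector of quantiles
- lower.tail：logical; if TRUE, $p(x)=P(X\leq x)$; if FALSE, $p(x)=P(X > x)$
- log.p：logical; if TRUE, log(p) is returned.
  - p(q, log.p=TRUE) = log(p(q, log.p=FALSE))
q (p, distparams, lower.tail = TRUE, log.p = FALSE)：quantile function (inverse CDF)
- p：a probability or a vector of probabilities
- log.p：logical; if TRUE, the argument p is interpreted as log(p).
  - q(p, log.p=FALSE) = q(log(p), log.p=TRUE)
r (n, distparams)：random number generation (sampling)
- n：the number of values to draw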
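The four prefixes and the `log` / `log.p` identities listed above can be checked on the normal distribution (suffix `norm`). This is a minimal sketch; the value 1.96 and the seed are arbitrary choices for illustration.

```r
# The four prefixes applied to the standard normal distribution.
x <- 1.96                                  # arbitrary evaluation point

dens  <- dnorm(x, mean = 0, sd = 1)        # d: density f(x)
prob  <- pnorm(x, mean = 0, sd = 1)        # p: P(X <= x)
upper <- pnorm(x, lower.tail = FALSE)      # p: P(X > x)
quant <- qnorm(prob)                       # q: inverse of pnorm, recovers x

# log / log.p return the logarithm of the same quantity
stopifnot(isTRUE(all.equal(dnorm(x, log = TRUE), log(dens))))
stopifnot(isTRUE(all.equal(pnorm(x, log.p = TRUE), log(prob))))

# with log.p = TRUE the quantile function expects log(p) as its input
stopifnot(isTRUE(all.equal(qnorm(log(prob), log.p = TRUE), quant)))

set.seed(1)                                # arbitrary seed, for reproducibility
draws <- rnorm(5)                          # r: five random draws
```

The same pattern applies to every suffix in the tables that follow, e.g. `dbinom`, `ppois`, `qgamma`, `rexp`.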

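The tables below list the available distributions. Prepend one of the prefixes above to the function name to select the corresponding function (for example, `dbinom` or `pnorm`):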

1.2. Discrete Distribution

Distribution Name	Mathematical Expression	Function Name in `R`	Distribution parameters
binomial	$p(x) = {n \choose x} {p}^{x} {(1-p)}^{n-x} \quad x = 0, \ldots, n$	`binom`	`size`：$n$ `prob`：$p$
Poisson	$p(x) = \dfrac{\lambda^x e^{-\lambda}}{x!} \quad x = 0, 1, 2, \ldots$	`pois`	`lambda`：$\lambda$
geometric	$p(x) = p {(1-p)}^{x}$ $x = 0, 1, 2, \ldots; \quad 0 < p \le 1$	`geom`	`prob`：$p$
hypergeometric	$p(x) = \dfrac{{m \choose x}{n \choose k-x}}{{m+n \choose k}} \quad x = 0, \ldots, k$	`hyper`	`m`：number of white balls in the urn `n`：number of black balls in the urn `k`：number of balls drawn
negative binomial	$p(x) = \dfrac{\Gamma(x+n)}{\Gamma(n) x!} p^n (1-p)^x = {x+n-1 \choose x} p^n (1-p)^x$ $x = 0, 1, 2, \ldots, \quad n > 0, \quad 0 < p \le 1$	`nbinom`	`size`：$n$ `prob`：$p$ `mu`：alternative parametrization
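As a quick sanity check, the mass functions in the table can be compared against R's `d` functions. The sketch below uses arbitrary parameter values; the variable names (`n_black`, `k_drawn`, and so on) are illustrative only.

```r
# Binomial: dbinom() against the closed-form expression
n <- 10; p <- 0.3; x <- 0:n
stopifnot(isTRUE(all.equal(dbinom(x, size = n, prob = p),
                           choose(n, x) * p^x * (1 - p)^(n - x))))

# Poisson
lambda <- 2.5; k <- 0:5
stopifnot(isTRUE(all.equal(dpois(k, lambda = lambda),
                           lambda^k * exp(-lambda) / factorial(k))))

# Hypergeometric: m white balls, n black balls, k balls drawn
m <- 5; n_black <- 7; k_drawn <- 4; xs <- 0:k_drawn
stopifnot(isTRUE(all.equal(
  dhyper(xs, m = m, n = n_black, k = k_drawn),
  choose(m, xs) * choose(n_black, k_drawn - xs) / choose(m + n_black, k_drawn))))

# Negative binomial: the `mu` parametrization corresponds to
# prob = size / (size + mu)
size <- 3; mu <- 2
nb_mu   <- dnbinom(0:4, size = size, mu = mu)
nb_prob <- dnbinom(0:4, size = size, prob = size / (size + mu))
stopifnot(isTRUE(all.equal(nb_mu, nb_prob)))
```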

1.3. Continuous Distribution

Distribution Name	Mathematical Expression	Function Name in `R`	Distribution parameters
uniform	$f(x) = \dfrac{1}{\max-\min} \quad \min \le x \le \max$	`unif`	`min=0`：lower limit `max=1`：upper limit
normal	$f(x) = \dfrac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$	`norm`	`mean=0`：$\mu$ `sd=1`：$\sigma$
logistic	$f(x)= \dfrac{1}{\sigma}\dfrac{e^{(x-\mu)/\sigma}}{(1 + e^{(x-\mu)/\sigma})^2}$ $F(x) = \dfrac{1}{1 + e^{-(x-\mu)/\sigma}}$	`logis`	`location=0`：$\mu$ `scale=1`：$\sigma$
exponential	$f(x) = \lambda {e}^{- \lambda x} \quad x \geq 0$	`exp`	`rate=1`：$\lambda$
chi-squared	$f_n(x) = \dfrac{1}{{2}^{\frac{n}{2}} \Gamma (\frac{n}{2})} {x}^{\frac{n}{2}-1} {e}^{-\frac{x}{2}} \quad x>0$	`chisq`	`df`：$n$ `ncp=0`：non-centrality parameter, for the non-central chi-squared distribution
Student's $t$	$f(x) = \dfrac{\Gamma (\frac{\nu+1}{2})}{\sqrt{\pi \nu} \Gamma (\frac{\nu}{2})} \left(1 + \dfrac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}$	`t`	`df`：$\nu$ `ncp`：for noncentral t-distribution
F	$f(x) = \dfrac{\Gamma \left(\frac{n_1}{2} + \frac{n_2}{2} \right)}{\Gamma(\frac{n_1}{2}) \Gamma(\frac{n_2}{2})} \left(\dfrac{n_1}{n_2}\right)^{\frac{n_1}{2}} x^{\frac{n_1}{2} -1} \left(1 + \dfrac{n_1}{n_2} x \right)^{-\frac{n_1+n_2}{2}}$ $x>0$	`f`	`df1`：$n_1$ `df2`：$n_2$ `ncp`：for noncentral F-distribution
Gamma	$f(x)= \dfrac{1}{{\sigma}^{\alpha}\Gamma(\alpha)} {x}^{\alpha-1} e^{-\frac{x}{\sigma}} = \dfrac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}$ $x \ge 0, \quad \alpha, \sigma, \beta> 0$	`gamma`	`shape`：$\alpha$ `rate = 1`：$\beta$ `scale = 1/rate`：$\sigma$
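The relationship between the `rate` and `scale` parametrizations of the Gamma distribution, and a couple of the other densities in the table, can be verified numerically. This is a small sketch with arbitrary parameter values.

```r
x <- c(0.5, 1, 2)                          # arbitrary evaluation points

# Exponential: f(x) = lambda * exp(-lambda * x)
stopifnot(isTRUE(all.equal(dexp(x, rate = 2), 2 * exp(-2 * x))))

# Gamma: `rate` (beta) and `scale` (sigma = 1 / beta) describe the same density
alpha <- 3; beta <- 2
by_rate  <- dgamma(x, shape = alpha, rate = beta)
by_scale <- dgamma(x, shape = alpha, scale = 1 / beta)
manual   <- beta^alpha * x^(alpha - 1) * exp(-beta * x) / gamma(alpha)
stopifnot(isTRUE(all.equal(by_rate, by_scale)))
stopifnot(isTRUE(all.equal(by_rate, manual)))

# Logistic: plogis() is the CDF F(x) = 1 / (1 + exp(-(x - mu) / sigma))
mu <- 1; sigma <- 2
stopifnot(isTRUE(all.equal(plogis(x, location = mu, scale = sigma),
                           1 / (1 + exp(-(x - mu) / sigma)))))
```

Only one of `rate` and `scale` should be supplied in a call to the Gamma functions.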

2. Statistical Tests

2.1. Functions in `R`

2.1.1. $z$-test

z.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, sigma.x = NULL, sigma.y = NULL,
       conf.level = 0.95)

Parameters (`z.test` is not part of base R; the signature above matches `z.test` in the BSDA package):

x：the first sample
y：the second sample; y=NULL indicates a one-sample test
alternative：a string specifying the alternative hypothesis, one of "two.sided", "less", "greater".
- alternative = "two.sided" represents a two-sided test
- alternative = "less" or alternative = "greater" represents a one-sided test
- alternative = "greater" is the alternative that x has a larger mean than y.
mu：$\mu_0$
- for one-sample test: $H_0: \mu = \mu_0$
- for two-sample test: $H_0: \mu_1 - \mu_2 = \mu_0$
sigma.x：$\sigma_1$, the population standard deviation for x
sigma.y：$\sigma_2$, the population standard deviation for y
conf.level：confidence level of the interval, equal to 1 minus significance level: $1-\alpha$
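Because `z.test` is not available in base R, the one-sample case can be sketched with `pnorm` and `qnorm` alone. The helper `z_test_one_sample` below is hypothetical (it is not part of any package) and the data are made up; it only mirrors the `mu`, `sigma.x`, `alternative` and `conf.level` arguments described above.

```r
# Hypothetical helper: one-sample z-test with known population sd `sigma.x`.
z_test_one_sample <- function(x, mu = 0, sigma.x,
                              alternative = c("two.sided", "less", "greater"),
                              conf.level = 0.95) {
  alternative <- match.arg(alternative)
  se <- sigma.x / sqrt(length(x))          # standard error of the mean
  z  <- (mean(x) - mu) / se                # test statistic under H0: mu = mu0
  p.value <- switch(alternative,
    two.sided = 2 * pnorm(-abs(z)),
    less      = pnorm(z),
    greater   = pnorm(z, lower.tail = FALSE))
  alpha <- 1 - conf.level
  conf.int <- switch(alternative,
    two.sided = mean(x) + c(-1, 1) * qnorm(1 - alpha / 2) * se,
    less      = c(-Inf, mean(x) + qnorm(conf.level) * se),
    greater   = c(mean(x) - qnorm(conf.level) * se, Inf))
  list(statistic = z, p.value = p.value, conf.int = conf.int)
}

x <- c(5.1, 4.9, 5.6, 5.8, 5.2, 5.0, 5.4, 5.3)   # made-up measurements
res <- z_test_one_sample(x, mu = 5, sigma.x = 0.3)
res$statistic
res$p.value
res$conf.int
```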

2.1.2. $t$-test

t.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)

mu：$\mu_0$
- for one-sample test: $H_0: \mu = \mu_0$
- for two-sample test: $H_0: \mu_1 - \mu_2 = \mu_0$
var.equal：logical; only used in the two-sample test. Indicates whether the variances of the two populations are treated as equal.
- var.equal=TRUE：$\sigma_1^2=\sigma_2^2$
- var.equal=FALSE：$\sigma_1^2 \neq \sigma_2^2$
conf.level：confidence level of the interval, equal to 1 minus significance level: $1-\alpha$
paired：logical; indicates whether to perform a paired $t$-test
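A short usage sketch for the one-sample case, with simulated data (the seed, sample size and hypothesized mean are arbitrary). It also recomputes the statistic by hand to show what `mu` does.

```r
set.seed(42)                               # arbitrary seed
x <- rnorm(20, mean = 10.5, sd = 2)        # simulated sample

# H0: mu = 10 against H1: mu > 10
res <- t.test(x, mu = 10, alternative = "greater", conf.level = 0.95)

res$statistic    # t statistic
res$parameter    # degrees of freedom: length(x) - 1
res$p.value
res$conf.int     # one-sided interval, upper bound is Inf

# The statistic is (sample mean - mu0) / (s / sqrt(n))
t_manual <- (mean(x) - 10) / (sd(x) / sqrt(length(x)))
stopifnot(isTRUE(all.equal(unname(res$statistic), t_manual)))
```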

2.1.3. Binomial test

binom.test(x, n, p = 0.5,
           alternative = c("two.sided", "less", "greater"),
           conf.level = 0.95)

x：number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
n：number of trials; ignored if x has length 2.
p：$p_0$ hypothesized probability of success.
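The two ways of passing the data (successes plus trials, or a length-2 vector of successes and failures) can be illustrated as follows. The counts are made up for the example.

```r
# 7 successes in 20 trials, H0: p = 0.5
res1 <- binom.test(x = 7, n = 20, p = 0.5)

# Same data given as c(successes, failures); n is then ignored
res2 <- binom.test(x = c(7, 13), p = 0.5)
stopifnot(isTRUE(all.equal(res1$p.value, res2$p.value)))

res1$estimate    # observed proportion of successes, 7 / 20
res1$conf.int    # exact (Clopper-Pearson) confidence interval

# For alternative = "less" the p-value is P(X <= 7) under H0
res_less <- binom.test(7, 20, p = 0.5, alternative = "less")
stopifnot(isTRUE(all.equal(res_less$p.value,
                           pbinom(7, size = 20, prob = 0.5))))
```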

2.1.4. $F$-test (variances test)

var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)

ratio: $H_0: \sigma_1^2 / \sigma_2^2 =\text{ratio}$. the hypothesized ratio of the population variances of x and y.

2.1.5. $\chi^2$-test

chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)

x: a numeric vector or matrix; x and y can also both be factors.
y: a numeric vector or matrix
- ignored if x is a matrix
- If x is a factor, y should be a factor of the same length
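correct: logical; indicates whether to apply the continuity correction when computing the test statistic for 2 by 2 tables.
p: a vector of probabilities of the same length as x.
rescale.p：
- If rescale.p=TRUE, then p is rescaled (if necessary) to sum to 1.
- If rescale.p=FALSE and p does not sum to 1, an error is given.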
simulate.p.value: logical; indicating whether to compute p-values by Monte Carlo simulation.
B: an integer specifying the number of replicates used in the Monte Carlo test.
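The effect of `rescale.p` can be seen by passing weights that do not sum to 1. The counts and weights below are invented for illustration.

```r
observed <- c(18, 22, 30, 30)              # made-up category counts
weights  <- c(1, 1, 2, 2)                  # relative weights, sum is 6, not 1

# rescale.p = TRUE divides the weights by their sum before testing
res_rescaled <- chisq.test(observed, p = weights, rescale.p = TRUE)
res_explicit <- chisq.test(observed, p = weights / sum(weights))
stopifnot(isTRUE(all.equal(res_rescaled$p.value, res_explicit$p.value)))

# With the default rescale.p = FALSE the same call stops with an error
err <- tryCatch(chisq.test(observed, p = weights),
                error = function(e) conditionMessage(e))
err

res_rescaled$expected   # expected counts under H0
```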

2.2. Application

2.2.1. Inference on the mean $\mu$ of populations

The population mean of one sample
- One-sample Gauss test ($z$-test): test for the population mean when the population variance $\sigma^2$ is known
  
  z.test(x, ...)
- One-sample $t$-test: test for the population mean when the population variance is unknown
  
  t.test(x, ...)
One-sample binomial test for the probability $p$

binom.test(x, n, p, ...)
Comparing the population means of two independent samples
- Two-sample Gauss test ($z$-test): the population variances $\sigma_1^2$, $\sigma_2^2$ are known
  
  z.test(x, y, ...)
- Pooled $t$-test: the population variances are unknown but equal, $\sigma_1^2=\sigma_2^2$
  
  t.test(x, y, var.equal=TRUE, ...)
- Welch's $t$-test: the population variances are unknown and unequal
  
  t.test(x, y, var.equal=FALSE, ...)
Comparing the population means of two dependent samples
- Paired $t$-test: test on the differences between paired observations
  
  t.test(x, y, paired=TRUE, ...)
Testing the ratio of two population variances

var.test(x, y, ratio, ...)
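The three two-sample variants differ mainly in how the standard error and the degrees of freedom are computed. A sketch on simulated data (seed, sizes and means are arbitrary):

```r
set.seed(123)                              # arbitrary seed
x <- rnorm(12, mean = 5.0, sd = 1)         # simulated sample 1
y <- rnorm(12, mean = 5.8, sd = 1)         # simulated sample 2

pooled <- t.test(x, y, var.equal = TRUE)   # pooled t-test
welch  <- t.test(x, y, var.equal = FALSE)  # Welch's t-test (the default)
paired <- t.test(x, y, paired = TRUE)      # paired t-test

pooled$parameter   # df = n1 + n2 - 2 = 22
welch$parameter    # Welch-Satterthwaite df, at most 22
paired$parameter   # df = number of pairs - 1 = 11

# A paired t-test is a one-sample t-test on the differences
one_sample <- t.test(x - y)
stopifnot(isTRUE(all.equal(unname(paired$statistic),
                           unname(one_sample$statistic))))
```

Pairing is only meaningful when the i-th element of `x` and the i-th element of `y` belong to the same unit; the simulated vectors here are independent and are paired purely to show the call.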

2.2.2. Statistical Test

$\chi^2$ Goodness-of-Fit Test
- $H_0$: $F(x)=F_0(x)$
- $H_1$: $F(x)\neq F_0(x)$
  
  chisq.test(x, p)
  - x: the observed absolute frequencies
  - p: the probabilities calculated from the assumed distribution $F_0(x)$ under $H_0$
$\chi^2$ Independence Test
- $H_0$: The two classification variables are statistically independent.
- $H_1$: The two classification variables are not statistically independent.
  
  chisq.test(x=matrix)
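Both uses of `chisq.test` can be sketched on small made-up data sets: die-roll counts for the goodness-of-fit test and a 2 by 3 contingency table for the independence test. The row and column labels are purely illustrative.

```r
# Goodness of fit: H0 says all six faces have probability 1/6
rolls <- c(8, 12, 9, 11, 10, 10)           # made-up counts from 60 rolls
gof <- chisq.test(rolls, p = rep(1 / 6, 6))
gof$statistic    # sum((observed - expected)^2 / expected)
gof$parameter    # df = number of categories - 1 = 5
gof$p.value

# Independence: rows and columns of a contingency table
tab <- matrix(c(20, 15,
                30, 25,
                10, 20),
              nrow = 2,                    # filled column by column
              dimnames = list(group = c("A", "B"),
                              answer = c("yes", "no", "unsure")))
ind <- chisq.test(tab)
ind$expected     # row total * column total / grand total
ind$parameter    # df = (rows - 1) * (columns - 1) = 2
ind$p.value
```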

posted @ 2023-01-06 14:17 veager
