Statistical Inference - Implementation in R

1. Distribution Functions

1.1. Basics

R provides four kinds of functions for every statistical distribution. The prefixes and their meanings are:

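  • d (x, distparams, log = FALSE): density function, i.e. the PDF for continuous distributions or the PMF for discrete ones
    • x: a number or a vector
    • log: logical; if TRUE, returns the log density.
      • d(x, log = TRUE) = log(d(x, log = FALSE))
    • distparams: the parameters of the distribution, see the tables below
  • p (q, distparams, lower.tail = TRUE, log.p = FALSE): cumulative distribution function (CDF)
    • q: quantile(s)
    • lower.tail: logical; if TRUE, \(p(x)=P(X\leq x)\); if FALSE, \(p(x)=P(X > x)\)
    • log.p: logical; if TRUE, returns log(p).
      • p(q, log.p = TRUE) = log(p(q, log.p = FALSE))
  • q (p, distparams, lower.tail = TRUE, log.p = FALSE): quantile function (inverse of the CDF)
    • p: probability
    • log.p: logical; if TRUE, p is given on the log scale.
      • q(log(p), log.p = TRUE) = q(p, log.p = FALSE)
  • r (n, distparams): random number generation (sampling)
    • n: the number of observations to generate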
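The four prefixes can be checked against each other with the normal distribution. This is a minimal sketch using only the stats package (loaded by default); the evaluation point 1.5, the seed and the N(10, 2^2) sampling parameters are arbitrary illustrative choices.

```r
# d/p/q/r functions of the normal distribution (stats package, loaded by default)
x <- 1.5

dens  <- dnorm(x)                        # density f(1.5) of N(0, 1)
cdf   <- pnorm(x)                        # P(X <= 1.5)
upper <- pnorm(x, lower.tail = FALSE)    # P(X > 1.5)
q975  <- qnorm(0.975)                    # 97.5% quantile, about 1.96

set.seed(1)                              # make the random draws reproducible
smp <- rnorm(5, mean = 10, sd = 2)       # 5 draws from N(10, 2^2)

# The identities listed above
stopifnot(
  isTRUE(all.equal(dnorm(x, log = TRUE), log(dens))),
  isTRUE(all.equal(pnorm(x, log.p = TRUE), log(cdf))),
  isTRUE(all.equal(cdf + upper, 1)),
  isTRUE(all.equal(qnorm(log(0.975), log.p = TRUE), q975)),
  isTRUE(all.equal(pnorm(q975), 0.975))  # q is the inverse of p
)
```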

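The tables below list the distributions. Each R function name is one of the prefixes above followed by the name in the "Function name in R" column, e.g. dnorm, pnorm, qnorm and rnorm for the normal distribution.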

1.2. Discrete Distributions

| Distribution | Mathematical expression | Function name in R | Distribution parameters |
|---|---|---|---|
| binomial | \(p(x) = {n \choose x} p^{x} (1-p)^{n-x}, \quad x = 0, \ldots, n\) | binom | size: \(n\); prob: \(p\) |
| Poisson | \(p(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots\) | pois | lambda: \(\lambda\) |
| geometric | \(p(x) = p (1-p)^{x}, \quad x = 0, 1, 2, \ldots; \quad 0 < p \le 1\) | geom | prob: \(p\) |
| hypergeometric | \(p(x) = \dfrac{{m \choose x}{n \choose k-x}}{{m+n \choose k}}, \quad x = 0, \ldots, k\) | hyper | m: number of white balls in the urn; n: number of black balls in the urn; k: number of balls drawn |
| negative binomial | \(p(x) = \dfrac{\Gamma(x+n)}{\Gamma(n)\, x!} p^n (1-p)^x = {x+n-1 \choose x} p^n (1-p)^x, \quad x = 0, 1, 2, \ldots; \quad n > 0, \quad 0 < p \le 1\) | nbinom | size: \(n\); prob: \(p\); mu: alternative parametrization via the mean \(\mu = n(1-p)/p\) |
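A quick way to confirm that the formulas in the table match R's parametrizations is to evaluate both side by side. The sketch below uses arbitrary illustrative parameter values; the variable names (binom_manual, nb, and so on) are just local names for this check.

```r
# Compare R's discrete PMFs with the formulas in the table (illustrative parameters)
n <- 10; p <- 0.3; x <- 0:n

binom_manual <- choose(n, x) * p^x * (1 - p)^(n - x)
stopifnot(isTRUE(all.equal(dbinom(x, size = n, prob = p), binom_manual)))

lambda <- 2.5
pois_manual <- lambda^x * exp(-lambda) / factorial(x)
stopifnot(isTRUE(all.equal(dpois(x, lambda = lambda), pois_manual)))

# Geometric: x counts the failures before the first success
geom_manual <- p * (1 - p)^x
stopifnot(isTRUE(all.equal(dgeom(x, prob = p), geom_manual)))

# Hypergeometric: m white balls, nb black balls, k balls drawn
m <- 7; nb <- 5; k <- 4; xh <- 0:k
hyper_manual <- choose(m, xh) * choose(nb, k - xh) / choose(m + nb, k)
stopifnot(isTRUE(all.equal(dhyper(xh, m = m, n = nb, k = k), hyper_manual)))

# Negative binomial: mu = size * (1 - prob) / prob gives the same distribution
size <- 3
mu <- size * (1 - p) / p
stopifnot(isTRUE(all.equal(dnbinom(x, size = size, prob = p),
                           dnbinom(x, size = size, mu = mu))))
```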

1.3. Continuous Distributions

| Distribution | Mathematical expression | Function name in R | Distribution parameters |
|---|---|---|---|
| uniform | \(f(x) = \dfrac{1}{\max-\min}, \quad \min \le x \le \max\) | unif | min = 0; max = 1 |
| normal | \(f(x) = \dfrac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) | norm | mean = 0: \(\mu\); sd = 1: \(\sigma\) |
| logistic | \(f(x)= \dfrac{1}{\sigma}\dfrac{e^{(x-\mu)/\sigma}}{(1 + e^{(x-\mu)/\sigma})^2}\), with CDF \(F(x) = \dfrac{1}{1 + e^{-(x-\mu)/\sigma}}\) | logis | location = 0: \(\mu\); scale = 1: \(\sigma\) |
| exponential | \(f(x) = \lambda e^{-\lambda x}, \quad x \ge 0\) | exp | rate = 1: \(\lambda\) |
| chi-squared | \(f_n(x) = \dfrac{1}{2^{n/2}\, \Gamma(n/2)} x^{n/2-1} e^{-x/2}, \quad x > 0\) | chisq | df: \(n\); ncp = 0: non-centrality parameter (non-central chi-squared distribution) |
| Student's t | \(f(x) = \dfrac{\Gamma (\frac{\nu+1}{2})}{\sqrt{\pi \nu}\, \Gamma (\frac{\nu}{2})} \left(1 + \dfrac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}\) | t | df: \(\nu\); ncp: non-centrality parameter (non-central t-distribution) |
| F | \(f(x) = \dfrac{\Gamma \left(\frac{n_1}{2} + \frac{n_2}{2} \right)}{\Gamma(\frac{n_1}{2}) \Gamma(\frac{n_2}{2})} \left(\dfrac{n_1}{n_2}\right)^{\frac{n_1}{2}} x^{\frac{n_1}{2} -1} \left(1 + \dfrac{n_1}{n_2} x \right)^{-\frac{n_1+n_2}{2}}, \quad x > 0\) | f | df1: \(n_1\); df2: \(n_2\); ncp: non-centrality parameter (non-central F-distribution) |
| Gamma | \(f(x)= \dfrac{1}{\sigma^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-x/\sigma} = \dfrac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x \ge 0; \quad \alpha, \sigma, \beta > 0\) | gamma | shape: \(\alpha\); rate = 1: \(\beta\); scale = 1/rate: \(\sigma\) |
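The rate/scale conventions are where mistakes usually creep in, so a small numerical check can help. This sketch, with arbitrary illustrative values, compares the Gamma density under both parametrizations with the formula in the table, uses the fact that an exponential distribution is a Gamma with shape 1, checks the logistic CDF, and integrates one chi-squared density numerically.

```r
# Check a few continuous-distribution parametrizations from the table
x <- c(0.5, 1, 2.5)
alpha <- 2; beta <- 3                      # shape and rate (illustrative values)

# Gamma: rate and scale = 1/rate describe the same distribution
g_rate   <- dgamma(x, shape = alpha, rate = beta)
g_scale  <- dgamma(x, shape = alpha, scale = 1 / beta)
g_manual <- beta^alpha * x^(alpha - 1) * exp(-beta * x) / gamma(alpha)
stopifnot(isTRUE(all.equal(g_rate, g_scale)),
          isTRUE(all.equal(g_rate, g_manual)))

# Exponential(rate) is Gamma(shape = 1, rate)
stopifnot(isTRUE(all.equal(dexp(x, rate = beta), dgamma(x, shape = 1, rate = beta))))

# Logistic CDF: F(x) = 1 / (1 + exp(-(x - mu) / sigma))
mu <- 1; sigma <- 2
stopifnot(isTRUE(all.equal(plogis(x, location = mu, scale = sigma),
                           1 / (1 + exp(-(x - mu) / sigma)))))

# The chi-squared density with 4 df integrates to 1 (numerically)
total <- integrate(dchisq, lower = 0, upper = Inf, df = 4)$value
stopifnot(abs(total - 1) < 1e-4)
```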

2. Statistical Tests

2.1. Functions in R

2.1.1. \(z\)-test

z.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, sigma.x = NULL, sigma.y = NULL,
       conf.level = 0.95)

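z.test is not part of base R; the signature above matches z.test() from the BSDA package (library(BSDA)). Parameters: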

  • x: numeric vector, the first sample

  • y: numeric vector, the second sample; y = NULL gives a one-sample test

  • alternative: a string, one of "two.sided", "less", "greater", specifying the alternative hypothesis.

    • alternative = "two.sided" gives a two-sided test

    • alternative = "less" or alternative = "greater" gives a one-sided test

    • alternative = "greater" means the alternative is that x has a larger mean than y (or than \(\mu_0\) in the one-sample case).

  • mu: \(\mu_0\), the hypothesized value

    • for a one-sample test: \(H_0: \mu = \mu_0\)
    • for a two-sample test: \(H_0: \mu_1 - \mu_2 = \mu_0\)
  • sigma.x: \(\sigma_1\), the population standard deviation of x

  • sigma.y: \(\sigma_2\), the population standard deviation of y

  • conf.level: confidence level of the interval, equal to 1 minus the significance level, \(1-\alpha\)
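To see what a one-sample \(z\)-test computes, here is a minimal sketch in base R only. z_test_one is a hypothetical helper written for illustration, not part of BSDA or base R; it uses the statistic \(z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})\) and, for simplicity, always returns a two-sided confidence interval. If BSDA is installed, BSDA::z.test(x, mu = 5, sigma.x = 0.3) should report the same statistic and p-value for the data below.

```r
# Hypothetical helper: one-sample z-test with known population sd (base R only)
z_test_one <- function(x, mu = 0, sigma,
                       alternative = c("two.sided", "less", "greater"),
                       conf.level = 0.95) {
  alternative <- match.arg(alternative)
  n  <- length(x)
  se <- sigma / sqrt(n)                     # standard error with known sigma
  z  <- (mean(x) - mu) / se                 # test statistic
  p  <- switch(alternative,
               two.sided = 2 * pnorm(-abs(z)),
               less      = pnorm(z),
               greater   = pnorm(z, lower.tail = FALSE))
  crit <- qnorm(1 - (1 - conf.level) / 2)   # critical value for a two-sided interval
  list(statistic = z, p.value = p,
       conf.int = mean(x) + c(-1, 1) * crit * se)
}

x <- c(5.1, 4.9, 5.6, 5.3, 4.8, 5.4)        # illustrative data
res <- z_test_one(x, mu = 5, sigma = 0.3)
print(res)
```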

2.1.2. \(t\)-test

t.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
  • mu: \(\mu_0\), the hypothesized value

    • for a one-sample test: \(H_0: \mu = \mu_0\)
    • for a two-sample test: \(H_0: \mu_1 - \mu_2 = \mu_0\)
  • var.equal: only used in the two-sample test; whether the two population variances are treated as equal.

    • var.equal = TRUE: \(\sigma_1^2=\sigma_2^2\) (pooled t-test)
    • var.equal = FALSE: \(\sigma_1^2 \neq \sigma_2^2\) (Welch's t-test, the default)
  • conf.level: confidence level of the interval, equal to 1 minus the significance level, \(1-\alpha\)

  • paired: logical; whether to perform a paired t-test
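The statistics reported by t.test can be reproduced by hand, which makes the role of var.equal concrete. This sketch uses illustrative data: it recomputes the one-sample statistic \((\bar{x}-\mu_0)/(s/\sqrt{n})\) and the pooled two-sample statistic based on \(s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}\), and contrasts the pooled degrees of freedom with Welch's.

```r
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)   # illustrative samples
y <- c(11.2, 11.8, 12.0, 10.9, 11.5, 11.1)

# One-sample t-test of H0: mu = 12
one <- t.test(x, mu = 12)
t_manual <- (mean(x) - 12) / (sd(x) / sqrt(length(x)))
stopifnot(isTRUE(all.equal(unname(one$statistic), t_manual)))

# Welch (default) vs pooled two-sample t-tests
welch  <- t.test(x, y)                      # var.equal = FALSE
pooled <- t.test(x, y, var.equal = TRUE)

n1 <- length(x); n2 <- length(y)
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)   # pooled variance
t_pooled <- (mean(x) - mean(y)) / sqrt(sp2 * (1 / n1 + 1 / n2))
stopifnot(isTRUE(all.equal(unname(pooled$statistic), t_pooled)),
          unname(pooled$parameter) == n1 + n2 - 2)

# Welch uses the Welch-Satterthwaite df, which never exceeds n1 + n2 - 2
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
```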

2.1.3. Binomial test

binom.test(x, n, p = 0.5,
           alternative = c("two.sided", "less", "greater"),
           conf.level = 0.95)
  • x: number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
  • n: number of trials; ignored if x has length 2.
  • p: \(p_0\), the hypothesized probability of success.
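binom.test computes an exact p-value from the binomial distribution rather than a normal approximation. With illustrative data of 7 successes in 10 trials and \(p_0 = 0.5\), the null distribution is symmetric, so the two-sided p-value is \(2P(X \ge 7) = 352/1024\). The sketch also shows the length-2 form of x.

```r
# 7 successes in 10 trials, H0: p = 0.5 (illustrative data)
res <- binom.test(7, 10, p = 0.5)

# Under H0 the distribution is symmetric, so the two-sided p-value is 2 * P(X >= 7)
p_manual <- 2 * pbinom(6, size = 10, prob = 0.5, lower.tail = FALSE)
stopifnot(isTRUE(all.equal(res$p.value, p_manual)))

# x may also be given as c(successes, failures); n is then not needed
res2 <- binom.test(c(7, 3), p = 0.5)
stopifnot(isTRUE(all.equal(res$p.value, res2$p.value)))

res$p.value   # 352/1024, about 0.344
```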

2.1.4. \(F\)-test (variances test)

var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)
  • ratio: the hypothesized ratio of the population variances of x and y, \(H_0: \sigma_1^2 / \sigma_2^2 = \text{ratio}\).
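The \(F\) statistic of var.test is the ratio of the sample variances divided by ratio, with \((n_1 - 1, n_2 - 1)\) degrees of freedom. The sketch below, on illustrative data, recomputes the statistic and the two-sided p-value \(2\min\{F_{cdf}(F), 1 - F_{cdf}(F)\}\) by hand.

```r
x <- c(12.1, 11.4, 13.0, 12.7, 11.9, 12.5)         # illustrative samples
y <- c(11.2, 11.8, 12.0, 10.9, 11.5, 11.1, 12.3)

res <- var.test(x, y, ratio = 1)

# F = (s_x^2 / s_y^2) / ratio with (n_x - 1, n_y - 1) degrees of freedom
f_manual <- var(x) / var(y)
df1 <- length(x) - 1; df2 <- length(y) - 1
p_lower  <- pf(f_manual, df1, df2)
p_manual <- 2 * min(p_lower, 1 - p_lower)          # two-sided p-value

stopifnot(isTRUE(all.equal(unname(res$statistic), f_manual)),
          isTRUE(all.equal(unname(res$parameter), c(df1, df2))),
          isTRUE(all.equal(res$p.value, p_manual)))
```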

2.1.5. \(\chi^2\)-test

chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)
  • x: a numeric vector or matrix; x and y can also both be factors.

  • y: a numeric vector or matrix

    • ignored if x is a matrix
    • If x is a factor, y should be a factor of the same length
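  • correct: logical; whether to apply Yates' continuity correction when computing the test statistic for 2 × 2 tables.

  • p: a vector of probabilities of the same length as x (used in the goodness-of-fit test).

  • rescale.p: logical

    • if rescale.p = TRUE, p is rescaled (if necessary) to sum to 1.
    • if rescale.p = FALSE and p does not sum to 1, an error is raised.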
  • simulate.p.value: logical; indicating whether to compute p-values by Monte Carlo simulation.

  • B: an integer specifying the number of replicates used in the Monte Carlo test.
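The effect of correct, rescale.p and simulate.p.value can be seen on small illustrative data. This sketch recomputes the Pearson statistic \(\sum (O-E)^2/E\) by hand, then the Yates-corrected version, which (following R's implementation) subtracts \(\min(0.5, \min|O-E|)\) from every \(|O-E|\); in a 2 × 2 table all \(|O-E|\) are equal.

```r
# 2 x 2 contingency table (illustrative counts)
tab <- matrix(c(12, 5, 7, 16), nrow = 2)
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)   # E under independence
d <- abs(tab - expected)

# correct = FALSE: Pearson statistic sum((O - E)^2 / E)
plain <- chisq.test(tab, correct = FALSE)
stopifnot(isTRUE(all.equal(unname(plain$statistic), sum(d^2 / expected))))

# correct = TRUE (the default; only applied to 2 x 2 tables):
# every |O - E| is reduced by min(0.5, smallest |O - E|)
yates <- chisq.test(tab)
yates_manual <- sum((d - min(0.5, d))^2 / expected)
stopifnot(isTRUE(all.equal(unname(yates$statistic), yates_manual)))

# rescale.p = TRUE turns relative weights c(1, 2, 3) into probabilities c(1, 2, 3) / 6
gof <- chisq.test(c(10, 20, 30), p = c(1, 2, 3), rescale.p = TRUE)

# Monte Carlo p-value from B = 2000 simulated tables
set.seed(42)
sim <- chisq.test(tab, simulate.p.value = TRUE, B = 2000)
```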

2.2. Applications

2.2.1. Inference on the population mean \(\mu\)

  • The population mean of one sample

    • One-sample Gauss test (\(z\)-test): test for the population mean when the population variance \(\sigma^2\) is known (pass \(\sigma\) as sigma.x)

      z.test(x, ...)

    • One-sample \(t\)-test: test for the population mean when the population variance is unknown

      t.test(x, ...)

  • One-sample binomial test for the probability \(p\)

    binom.test(x, n, p, ...)

  • Comparing the population mean of two independent samples

    • Two-sample Gauss test (\(z\)-test): the population variances \(\sigma_1^2\), \(\sigma_2^2\) are known (pass \(\sigma_1\), \(\sigma_2\) as sigma.x, sigma.y)

      z.test(x, y, ...)

    • Pooled \(t\)-test: the population variances are unknown but equal, \(\sigma_1^2=\sigma_2^2\).

      t.test(x, y, var.equal = TRUE, ...)

    • Welch's \(t\)-test: the population variances are unknown and unequal

      t.test(x, y, var.equal = FALSE, ...)

    • Paired \(t\)-test: test for comparing the population means of two dependent samples

      t.test(x, y, paired = TRUE, ...)

  • Testing the ratio of two population variances

    var.test(x, y, ratio, ...)
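The paired \(t\)-test in the list above is a one-sample \(t\)-test on the differences x - y, which the sketch below checks on illustrative before/after measurements. It also runs a Welch test on the same data, which ignores the pairing and treats the two samples as independent.

```r
# Illustrative before/after measurements on the same 6 subjects
before <- c(140, 152, 138, 145, 160, 149)
after  <- c(135, 148, 139, 140, 151, 146)

paired  <- t.test(before, after, paired = TRUE)
on_diff <- t.test(before - after, mu = 0)    # one-sample test on the differences
welch   <- t.test(before, after)             # ignores the pairing

stopifnot(isTRUE(all.equal(paired$statistic, on_diff$statistic)),
          isTRUE(all.equal(paired$p.value, on_diff$p.value)),
          isTRUE(all.equal(paired$conf.int, on_diff$conf.int)))

print(c(paired = paired$p.value, welch = welch$p.value))
```

For these numbers the subject-to-subject variation is large compared with the within-subject changes, so the paired test gives a much smaller p-value than the Welch test.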

2.2.2. \(\chi^2\) Tests

  • \(\chi^2\) Goodness-of-Fit Test

    • \(H_0\): \(F(x)=F_0(x)\)

    • \(H_1\): \(F(x)\neq F_0(x)\)

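      chisq.test(x, p = p)

      • x: the observed absolute frequencies
      • p: the cell probabilities computed from the hypothesized distribution \(F_0(x)\) under \(H_0\); p must be passed by name, because the second positional argument of chisq.test is y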
  • \(\chi^2\) Independence Test

    • \(H_0\): The two classification variables are statistically independent.

    • \(H_1\): The two classification variables are not statistically independent.

      chisq.test(x)

      • x: the contingency table of observed counts, given as a matrix
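Both tests in the list above can be run on small illustrative data: counts from 120 hypothetical die rolls for the goodness-of-fit test, and a 2 × 3 table for the independence test. One caveat: chisq.test always uses length(x) - 1 degrees of freedom in the goodness-of-fit case. If parameters of \(F_0\) were estimated from the data, the p-value would need to be recomputed with pchisq using fewer degrees of freedom.

```r
# Goodness of fit: is a die fair? (illustrative counts from 120 rolls)
counts <- c(18, 22, 16, 25, 19, 20)
gof <- chisq.test(counts, p = rep(1 / 6, 6))   # p must be passed by name
E <- sum(counts) / 6                            # expected count per face
stopifnot(isTRUE(all.equal(unname(gof$statistic), sum((counts - E)^2 / E))),
          unname(gof$parameter) == length(counts) - 1)

# Independence: x is a contingency table (matrix)
tab <- matrix(c(30, 20, 25, 25, 15, 35), nrow = 2,
              dimnames = list(group = c("A", "B"),
                              answer = c("yes", "maybe", "no")))
ind <- chisq.test(tab)                          # no Yates correction for 2 x 3
stopifnot(unname(ind$parameter) == (nrow(tab) - 1) * (ncol(tab) - 1))
```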

posted @ 2023-01-06 14:17  veager