Statistical Inference - Implementation in R
1. Distribution Functions
1.1. Basics
R provides four kinds of functions for each statistical distribution, distinguished by a one-letter prefix:
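- `d(x, distparams, log = FALSE)`: density function (PDF for continuous distributions, PMF for discrete ones).
  - `x`: a number or a vector of quantiles.
  - `log`: logical; if `TRUE`, the log-density is returned, i.e. `d(x, log = TRUE)` = `log(d(x, log = FALSE))`.
- `distparams`: the parameters of the distribution; see the tables below.
- `p(q, distparams, lower.tail = TRUE, log.p = FALSE)`: cumulative distribution function (CDF).
  - `q`: a number or a vector of quantiles.
  - `lower.tail`: logical; if `TRUE`, returns \(P(X \le q)\); if `FALSE`, returns \(P(X > q)\).
  - `log.p`: logical; if `TRUE`, returns \(\log p\), i.e. `p(q, log.p = TRUE)` = `log(p(q, log.p = FALSE))`.
- `q(p, distparams, lower.tail = TRUE, log.p = FALSE)`: quantile function (the inverse of the CDF).
  - `p`: a probability or a vector of probabilities.
  - `log.p`: logical; if `TRUE`, the probabilities are supplied on the log scale, i.e. `q(log(p), log.p = TRUE)` = `q(p, log.p = FALSE)`.
- `r(n, distparams)`: random number generation (sampling).
  - `n`: the number of observations to draw.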
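As a quick illustration of the four prefixes, here is a small sketch using the standard normal distribution (`norm`); the same pattern should carry over to every name in the tables below. The numbers in the comments are rounded.

```r
# d, p, q, r applied to the standard normal distribution (mean = 0, sd = 1)
dnorm(0)                          # density at 0, 1/sqrt(2*pi), about 0.3989
pnorm(1.96)                       # P(X <= 1.96), about 0.975
pnorm(1.96, lower.tail = FALSE)   # P(X > 1.96), about 0.025
qnorm(0.975)                      # 0.975 quantile, about 1.96
set.seed(1)
draws <- rnorm(5)                 # five random draws, reproducible via set.seed

# The log / log.p arguments
all.equal(dnorm(1, log = TRUE), log(dnorm(1)))         # TRUE
all.equal(pnorm(1, log.p = TRUE), log(pnorm(1)))       # TRUE
all.equal(qnorm(log(0.9), log.p = TRUE), qnorm(0.9))   # TRUE: q expects log(p) when log.p = TRUE

# p and q are inverse functions of each other
all.equal(qnorm(pnorm(0.7)), 0.7)                      # TRUE
```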
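The tables below list the distribution names used in R; prefixing a name with `d`, `p`, `q` or `r` gives the corresponding function (e.g. `dnorm`, `pnorm`, `qnorm`, `rnorm`):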
1.2. Discrete Distributions

| Distribution Name | Mathematical Expression | Function Name in R | Distribution parameters |
|---|---|---|---|
| binomial | \(p(x) = {n \choose x} p^{x} (1-p)^{n-x}, \quad x = 0, \ldots, n\) | `binom` | `size`: \(n\); `prob`: \(p\) |
| Poisson | \(p(x) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots\) | `pois` | `lambda`: \(\lambda\) |
| geometric | \(p(x) = p (1-p)^{x}, \quad x = 0, 1, 2, \ldots; \quad 0 < p \le 1\) | `geom` | `prob`: \(p\) |
| hypergeometric | \(p(x) = \dfrac{{m \choose x}{n \choose k-x}}{{m+n \choose k}}, \quad x = 0, \ldots, k\) | `hyper` | `m`: number of successes (white balls) in the population; `n`: number of failures (black balls); `k`: number of draws |
| negative binomial | \(p(x) = \dfrac{\Gamma(x+n)}{\Gamma(n)\, x!} p^n (1-p)^x = {x+n-1 \choose x} p^n (1-p)^x, \quad x = 0, 1, 2, \ldots; \quad n > 0, \quad 0 < p \le 1\) | `nbinom` | `size`: \(n\); `prob`: \(p\); `mu`: alternative parametrization by the mean \(\mu\), with \(p = n/(n+\mu)\) |
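A sketch that checks a few of the PMFs above against their formulas, using arbitrarily chosen parameter values; it also shows that the `(size, mu)` parametrization of `nbinom` describes the same law as `(size, prob)` with `prob = size / (size + mu)`.

```r
# Binomial PMF from the formula vs. dbinom (size = n, prob = p)
n <- 10; p <- 0.3; x <- 0:n
pmf_manual <- choose(n, x) * p^x * (1 - p)^(n - x)
all.equal(pmf_manual, dbinom(x, size = n, prob = p))          # TRUE

# Geometric: x counts the failures before the first success
all.equal(dgeom(3, prob = 0.2), 0.2 * 0.8^3)                   # TRUE

# Hypergeometric: m successes and n failures in the urn, k draws
all.equal(dhyper(2, m = 5, n = 7, k = 4),
          choose(5, 2) * choose(7, 2) / choose(12, 4))         # TRUE

# Negative binomial: (size, mu) equals (size, prob = size / (size + mu))
size <- 4; mu <- 6
all.equal(dnbinom(0:5, size = size, mu = mu),
          dnbinom(0:5, size = size, prob = size / (size + mu)))   # TRUE
```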
1.3. Continuous Distributions

| Distribution Name | Mathematical Expression | Function Name in R | Distribution parameters |
|---|---|---|---|
| uniform | \(f(x) = \dfrac{1}{\max-\min}, \quad \min \le x \le \max\) | `unif` | `min = 0`: lower bound; `max = 1`: upper bound |
| normal | \(f(x) = \dfrac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\) | `norm` | `mean = 0`: \(\mu\); `sd = 1`: \(\sigma\) |
| logistic | \(f(x) = \dfrac{1}{\sigma}\dfrac{e^{(x-\mu)/\sigma}}{(1 + e^{(x-\mu)/\sigma})^2}, \quad F(x) = \dfrac{1}{1 + e^{-(x-\mu)/\sigma}}\) | `logis` | `location = 0`: \(\mu\); `scale = 1`: \(\sigma\) |
| exponential | \(f(x) = \lambda e^{-\lambda x}, \quad x \ge 0\) | `exp` | `rate = 1`: \(\lambda\) |
| chi-squared | \(f_n(x) = \dfrac{1}{2^{n/2}\, \Gamma(\frac{n}{2})} x^{\frac{n}{2}-1} e^{-\frac{x}{2}}, \quad x > 0\) | `chisq` | `df`: \(n\); `ncp = 0`: non-centrality parameter (non-central chi-squared distribution) |
| Student's \(t\) | \(f(x) = \dfrac{\Gamma(\frac{\nu+1}{2})}{\sqrt{\pi \nu}\, \Gamma(\frac{\nu}{2})} \left(1 + \dfrac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}}\) | `t` | `df`: \(\nu\); `ncp`: non-centrality parameter (non-central \(t\) distribution) |
| F | \(f(x) = \dfrac{\Gamma\left(\frac{n_1}{2} + \frac{n_2}{2}\right)}{\Gamma(\frac{n_1}{2})\, \Gamma(\frac{n_2}{2})} \left(\dfrac{n_1}{n_2}\right)^{\frac{n_1}{2}} x^{\frac{n_1}{2}-1} \left(1 + \dfrac{n_1}{n_2} x\right)^{-\frac{n_1+n_2}{2}}, \quad x > 0\) | `f` | `df1`: \(n_1\); `df2`: \(n_2\); `ncp`: non-centrality parameter (non-central \(F\) distribution) |
| Gamma | \(f(x) = \dfrac{1}{\sigma^{\alpha}\Gamma(\alpha)} x^{\alpha-1} e^{-\frac{x}{\sigma}} = \dfrac{\beta^{\alpha} x^{\alpha-1} e^{-\beta x}}{\Gamma(\alpha)}, \quad x \ge 0; \quad \alpha, \sigma, \beta > 0, \quad \beta = 1/\sigma\) | `gamma` | `shape`: \(\alpha\); `rate = 1`: \(\beta\); `scale = 1/rate`: \(\sigma\) |
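A sketch checking some of the continuous densities against the formulas in the table, at a few arbitrary points; it also illustrates that `rate` and `scale = 1/rate` give the same Gamma distribution, and two well-known special cases (exponential and chi-squared as Gamma distributions).

```r
x <- c(0.5, 1, 2, 5)

# Normal density from the formula
mu <- 1; sigma <- 2
all.equal(dnorm(x, mean = mu, sd = sigma),
          exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma))      # TRUE

# Logistic CDF F(x) = 1 / (1 + exp(-(x - mu) / sigma))
all.equal(plogis(x, location = mu, scale = sigma),
          1 / (1 + exp(-(x - mu) / sigma)))                               # TRUE

# Gamma: rate = beta and scale = 1/rate = sigma describe the same distribution
all.equal(dgamma(x, shape = 2, rate = 3), dgamma(x, shape = 2, scale = 1/3))   # TRUE

# Special cases: Exp(lambda) = Gamma(1, rate = lambda); chi^2_n = Gamma(n/2, rate = 1/2)
all.equal(dexp(x, rate = 2), dgamma(x, shape = 1, rate = 2))            # TRUE
all.equal(dchisq(x, df = 5), dgamma(x, shape = 5/2, rate = 1/2))        # TRUE
```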
2. Statistical Tests
2.1. Functions in R
2.1.1. \(z\)-test in
```r
z.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, sigma.x = NULL, sigma.y = NULL,
       conf.level = 0.95)
```
Parameters:
- `x`: sample 1.
- `y`: sample 2; `y = NULL` indicates a one-sample test.
- `alternative`: a string, one of `"two.sided"`, `"less"`, `"greater"`, specifying the alternative hypothesis.
  - `alternative = "two.sided"` gives a two-sided test.
  - `alternative = "less"` or `alternative = "greater"` gives a one-sided test.
  - `alternative = "greater"` is the alternative that `x` has a larger mean than `y`.
- `mu`: \(\mu_0\)
  - for a one-sample test: \(H_0: \mu = \mu_0\)
  - for a two-sample test: \(H_0: \mu_1 - \mu_2 = \mu_0\)
- `sigma.x`: \(\sigma_1\), the population standard deviation of `x`.
- `sigma.y`: \(\sigma_2\), the population standard deviation of `y`.
- `conf.level`: confidence level of the interval, equal to 1 minus the significance level: \(1-\alpha\).
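`z.test()` is not part of base R's **stats** package; the signature above matches `z.test()` from the BSDA package. To show what it computes, here is a base-R sketch of the two-sided one-sample case with known \(\sigma\): \(z = (\bar x - \mu_0)/(\sigma/\sqrt{n})\), p-value \(2\,\Phi(-|z|)\), and interval \(\bar x \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}\). `z_test_one` is a hypothetical helper written for this illustration, not a library function, and the data are simulated.

```r
# Hypothetical helper (not from any package): two-sided one-sample z-test, sigma known
z_test_one <- function(x, mu0, sigma, conf.level = 0.95) {
  n <- length(x)
  se <- sigma / sqrt(n)                   # standard error of the sample mean
  z <- (mean(x) - mu0) / se               # test statistic
  p_value <- 2 * pnorm(-abs(z))           # two-sided p-value
  q <- qnorm(1 - (1 - conf.level) / 2)    # critical value z_{1 - alpha/2}
  list(statistic = z, p.value = p_value,
       conf.int = mean(x) + c(-1, 1) * q * se)
}

set.seed(42)
x <- rnorm(25, mean = 10.5, sd = 2)       # simulated data; sigma = 2 treated as known
z_test_one(x, mu0 = 10, sigma = 2)
```

If BSDA is installed, `BSDA::z.test(x, mu = 10, sigma.x = 2)` should report the same statistic and p-value.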
2.1.2. \(t\)-test in
```r
t.test(x, y = NULL,
       alternative = "two.sided",
       mu = 0, paired = FALSE, var.equal = FALSE,
       conf.level = 0.95, ...)
```
- `mu`: \(\mu_0\)
  - for a one-sample test: \(H_0: \mu = \mu_0\)
  - for a two-sample test: \(H_0: \mu_1 - \mu_2 = \mu_0\)
- `var.equal`: only takes effect for a two-sample test; whether the variances of the two populations are assumed equal.
  - `var.equal = TRUE`: \(\sigma_1^2 = \sigma_2^2\)
  - `var.equal = FALSE`: \(\sigma_1^2 \neq \sigma_2^2\)
- `conf.level`: confidence level of the interval, equal to 1 minus the significance level: \(1-\alpha\).
- `paired`: logical, indicating whether to perform a paired \(t\)-test.
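A sketch that checks `t.test()` against the textbook quantities on simulated data (the sample sizes and means are arbitrary): the one-sample statistic is \((\bar x - \mu_0)/(s/\sqrt{n})\) with \(n-1\) degrees of freedom, the pooled two-sample test (`var.equal = TRUE`) has \(n_1+n_2-2\) degrees of freedom, and Welch's test (`var.equal = FALSE`) uses an approximate, usually non-integer, df.

```r
set.seed(1)
x <- rnorm(12, mean = 5, sd = 1)     # simulated sample 1
y <- rnorm(15, mean = 4.5, sd = 2)   # simulated sample 2

# One-sample t-test of H0: mu = 5, and the same statistic by hand
tt <- t.test(x, mu = 5)
t_manual <- (mean(x) - 5) / (sd(x) / sqrt(length(x)))
all.equal(unname(tt$statistic), t_manual)   # TRUE
unname(tt$parameter)                        # df = n - 1 = 11

# Two-sample tests: pooled (equal variances) vs. Welch (unequal variances)
pooled <- t.test(x, y, var.equal = TRUE)
welch  <- t.test(x, y, var.equal = FALSE)
unname(pooled$parameter)                    # n1 + n2 - 2 = 25
unname(welch$parameter)                     # Welch-Satterthwaite df, generally non-integer
```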
2.1.3. Binomial test
```r
binom.test(x, n, p = 0.5,
           alternative = c("two.sided", "less", "greater"),
           conf.level = 0.95)
```
- `x`: number of successes, or a vector of length 2 giving the numbers of successes and failures, respectively.
- `n`: number of trials; ignored if `x` has length 2.
- `p`: \(p_0\), the hypothesized probability of success.
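For example, with made-up counts of 7 successes in 20 trials and \(H_1: p < 0.5\), the exact one-sided p-value of `binom.test()` is simply the binomial CDF \(P(X \le 7)\) under \(p_0 = 0.5\), which can be checked with `pbinom()` from Section 1. The sketch also shows the length-2 form of `x`.

```r
# 7 successes in 20 trials; H0: p = 0.5 vs. H1: p < 0.5
bt <- binom.test(7, 20, p = 0.5, alternative = "less")
# Exact one-sided p-value: P(X <= 7) for X ~ Binomial(20, 0.5)
all.equal(bt$p.value, pbinom(7, size = 20, prob = 0.5))   # TRUE
bt$conf.int                                               # one-sided confidence interval for p

# x given as c(successes, failures); n is then not needed
all.equal(binom.test(c(7, 13), p = 0.5)$p.value,
          binom.test(7, 20, p = 0.5)$p.value)             # TRUE
```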
2.1.4. \(F\)-test (comparing two variances)
```r
var.test(x, y, ratio = 1,
         alternative = c("two.sided", "less", "greater"),
         conf.level = 0.95, ...)
```
- `ratio`: the hypothesized ratio of the population variances of `x` and `y`, i.e. \(H_0: \sigma_1^2 / \sigma_2^2 = \text{ratio}\).
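Under \(H_0\) the statistic \(F = (s_1^2/s_2^2)/\text{ratio}\) follows an \(F\) distribution with \(n_1-1\) and \(n_2-1\) degrees of freedom. A sketch on simulated data that reproduces the statistic, degrees of freedom and two-sided p-value of `var.test()` by hand:

```r
set.seed(7)
x <- rnorm(10, sd = 1)      # simulated sample 1, n1 = 10
y <- rnorm(16, sd = 1.5)    # simulated sample 2, n2 = 16
vt <- var.test(x, y, ratio = 1)

F_manual <- var(x) / var(y)                  # (s1^2 / s2^2) / ratio, with ratio = 1
all.equal(unname(vt$statistic), F_manual)    # TRUE
unname(vt$parameter)                         # num df = 9, denom df = 15

# Two-sided p-value: twice the smaller tail probability
p_lower <- pf(F_manual, df1 = 9, df2 = 15)
p_manual <- 2 * min(p_lower, 1 - p_lower)
all.equal(vt$p.value, p_manual)              # TRUE
```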
2.1.5. \(\chi^2\)-test
```r
chisq.test(x, y = NULL, correct = TRUE,
           p = rep(1/length(x), length(x)), rescale.p = FALSE,
           simulate.p.value = FALSE, B = 2000)
```
- `x`: a numeric vector or matrix; `x` and `y` can also both be factors.
- `y`: a numeric vector or matrix.
  - ignored if `x` is a matrix;
  - if `x` is a factor, `y` should be a factor of the same length.
- `correct`: a logical indicating whether to apply a continuity correction when computing the test statistic for \(2 \times 2\) tables.
- `p`: a vector of probabilities of the same length as `x`.
- `rescale.p`:
  - if `rescale.p = TRUE`, `p` is rescaled (if necessary) to sum to 1;
  - if `rescale.p = FALSE` and `p` does not sum to 1, an error is raised.
- `simulate.p.value`: logical; whether to compute p-values by Monte Carlo simulation.
- `B`: an integer specifying the number of replicates used in the Monte Carlo test.
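A small sketch of the `rescale.p` behaviour described above, on made-up counts: relative weights are accepted only with `rescale.p = TRUE`, and they then give the same test as the normalized probabilities.

```r
obs <- c(30, 50, 20)   # illustrative observed counts

# rescale.p = TRUE: p may be given as relative weights
a <- chisq.test(obs, p = c(1, 2, 1), rescale.p = TRUE)
b <- chisq.test(obs, p = c(0.25, 0.5, 0.25))
all.equal(a$statistic, b$statistic)   # TRUE; both use expected counts 25, 50, 25

# rescale.p = FALSE (the default): weights that do not sum to 1 raise an error
inherits(try(chisq.test(obs, p = c(1, 2, 1)), silent = TRUE), "try-error")   # TRUE
```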
2.2. Application
2.2.1. Inference on the population mean \(\mu\)
- The population mean of one sample
  - One-sample Gauss test (\(z\)-test): tests the population mean when the population variance \(\sigma^2\) is known.
    `z.test(x, ...)`
  - One-sample \(t\)-test: tests the population mean when the population variance is unknown.
    `t.test(x, ...)`
- One-sample binomial test for the probability \(p\)
  `binom.test(x, n, p, ...)`
- Comparing the population means of two independent samples
  - Two-sample Gauss test (\(z\)-test): the population variances \(\sigma_1^2\), \(\sigma_2^2\) are known.
    `z.test(x, y, ...)`
  - **Pooled \(t\)-test:** the population variances are unknown but equal, \(\sigma_1^2=\sigma_2^2\).
    `t.test(x, y, var.equal = TRUE, ...)`
  - **Welch's \(t\)-test:** the population variances are unknown and unequal.
    `t.test(x, y, var.equal = FALSE, ...)`
- Paired \(t\)-test: compares the population means of two dependent samples.
  `t.test(x, y, paired = TRUE, ...)`
- Testing the ratio of two population variances
  `var.test(x, y, ratio, ...)`
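A paired \(t\)-test is a one-sample \(t\)-test on the within-pair differences, so the two calls below should agree. The before/after measurements are simulated for illustration only.

```r
set.seed(3)
before <- rnorm(10, mean = 120, sd = 8)           # simulated measurements
after  <- before - rnorm(10, mean = 3, sd = 4)    # same subjects, measured again

res_paired <- t.test(before, after, paired = TRUE)
res_diff   <- t.test(before - after, mu = 0)      # one-sample test on differences

all.equal(unname(res_paired$statistic), unname(res_diff$statistic))   # TRUE
all.equal(res_paired$p.value, res_diff$p.value)                       # TRUE
```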
2.2.2. \(\chi^2\) Tests
- \(\chi^2\) goodness-of-fit test
  - \(H_0\): \(F(x)=F_0(x)\)
  - \(H_1\): \(F(x)\neq F_0(x)\)

  `chisq.test(x, p)`
  - `x`: the observed absolute frequencies
  - `p`: the cell probabilities, calculated from the hypothesized distribution \(F_0(x)\) under \(H_0\)
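The statistic is \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\) with \(E_i = N p_i\), compared with a \(\chi^2\) distribution with \(k-1\) degrees of freedom. A sketch testing whether a die is fair, using made-up counts from 120 rolls:

```r
# Illustrative observed counts of faces 1..6 in 120 rolls
obs <- c(18, 23, 16, 21, 24, 18)
gof <- chisq.test(obs, p = rep(1/6, 6))

expected <- sum(obs) * rep(1/6, 6)         # E_i = N * p_i = 20 per face
stat <- sum((obs - expected)^2 / expected)
all.equal(unname(gof$statistic), stat)     # TRUE
all.equal(gof$p.value,
          pchisq(stat, df = length(obs) - 1, lower.tail = FALSE))   # TRUE
```

`chisq.test()` always uses \(k-1\) degrees of freedom; if the parameters of \(F_0\) were estimated from the same data, the degrees of freedom should be reduced by the number of estimated parameters and the p-value recomputed with `pchisq()`.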
- \(\chi^2\) independence test
  - \(H_0\): the two classification variables are statistically independent.
  - \(H_1\): the two classification variables are not statistically independent.

  `chisq.test(x)`, where `x` is a matrix (contingency table) of observed counts
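Under independence the expected count of each cell is (row total) \(\times\) (column total) / (grand total), and the statistic has \((r-1)(c-1)\) degrees of freedom. A sketch on a made-up \(2 \times 3\) table; a larger-than-\(2 \times 2\) table is used because for \(2 \times 2\) tables `correct = TRUE` applies a continuity correction, which would complicate the hand computation.

```r
# Illustrative 2 x 3 contingency table of observed counts
tab <- matrix(c(20, 30, 50,
                30, 30, 40), nrow = 2, byrow = TRUE)
ind <- chisq.test(tab)

# Expected counts under H0: row total * column total / grand total
E <- outer(rowSums(tab), colSums(tab)) / sum(tab)
all.equal(unname(ind$expected), E)        # TRUE
stat <- sum((tab - E)^2 / E)
all.equal(unname(ind$statistic), stat)    # TRUE
unname(ind$parameter)                     # df = (2 - 1) * (3 - 1) = 2
```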