# 为什么样本方差（sample variance）的分母是 n-1？

(A side note: the variance estimator the question asks about is usually obtained by the method of moments. If you use maximum likelihood instead, the story is no different from what you might hope: the ML estimator of the variance is also biased in expectation. Interested readers can verify this themselves for the normal distribution.)

---

### Sample variance


The sample variance of a random variable demonstrates two aspects of estimator bias: first, the naive estimator is biased, which can be corrected by a scale factor; second, the unbiased estimator is not optimal in terms of mean squared error (MSE), which can be minimized by using a different scale factor, resulting in a biased estimator with lower MSE than the unbiased one. Concretely, the naive estimator sums the squared deviations and divides by n, and is biased. Dividing instead by n − 1 yields an unbiased estimator. Conversely, MSE can be minimized by dividing by yet another number (which depends on the distribution), but this again results in a biased estimator. This number is always larger than n − 1, so the result is known as a shrinkage estimator, as it "shrinks" the unbiased estimator towards zero; for the normal distribution the optimal divisor is n + 1.
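The bias and MSE claims above are easy to check numerically. The sketch below is a minimal Monte Carlo in Python (sample size, repetition count, and seed are illustrative choices, not from the original text) comparing the divisors n − 1, n, and n + 1 on normal data with true variance 1:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, true_var = 10, 200_000, 1.0  # illustrative settings

# Draw many samples of size n from N(0, 1), so the true variance is 1.
samples = rng.normal(0.0, np.sqrt(true_var), size=(reps, n))

# Sum of squared deviations of each sample around its own mean.
ss = ((samples - samples.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

for divisor in (n - 1, n, n + 1):
    est = ss / divisor
    bias = est.mean() - true_var
    mse = ((est - true_var) ** 2).mean()
    print(f"divide by {divisor:2d}: bias = {bias:+.4f}, MSE = {mse:.4f}")
```

With these settings the n − 1 estimator comes out essentially unbiased, while dividing by n + 1 gives the smallest MSE, matching the shrinkage claim for normal data.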

Suppose $X_1, \ldots, X_n$ are independent and identically distributed (i.i.d.) random variables with expectation $\mu$ and variance $\sigma^2$. If the sample mean and uncorrected sample variance are defined as

$\overline{X}=\frac{1}{n}\sum_{i=1}^nX_i, \qquad S^2=\frac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2,$

then $S^2$ is a biased estimator of $\sigma^2$, because

\begin{align} \operatorname{E}[S^2] &= \operatorname{E}\left[ \frac{1}{n}\sum_{i=1}^n \left(X_i-\overline{X}\right)^2 \right] = \operatorname{E}\bigg[ \frac{1}{n}\sum_{i=1}^n \big((X_i-\mu)-(\overline{X}-\mu)\big)^2 \bigg] \\[8pt] &= \operatorname{E}\bigg[ \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2 - 2(\overline{X}-\mu)\frac{1}{n}\sum_{i=1}^n (X_i-\mu) + (\overline{X}-\mu)^2 \bigg] \\[8pt] &= \operatorname{E}\bigg[ \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2 - (\overline{X}-\mu)^2 \bigg] = \sigma^2 - \operatorname{E}\left[ (\overline{X}-\mu)^2 \right] < \sigma^2. \end{align}

In other words, the expected value of the uncorrected sample variance does not equal the population variance $\sigma^2$, unless multiplied by a normalization factor. The sample mean, on the other hand, is an unbiased estimator of the population mean $\mu$.

The reason that $S^2$ is biased stems from the fact that the sample mean is an ordinary least squares (OLS) estimator for $\mu$: $\overline{X}$ is the number that makes the sum $\sum_{i=1}^n (X_i-\overline{X})^2$ as small as possible. That is, when any other number is plugged into this sum, the sum can only increase. In particular, the choice $\mu \ne \overline{X}$ gives

$\frac{1}{n}\sum_{i=1}^n (X_i-\overline{X})^2 < \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2,$

and then

\begin{align} \operatorname{E}[S^2] &= \operatorname{E}\bigg[ \frac{1}{n}\sum_{i=1}^n (X_i-\overline{X})^2 \bigg] < \operatorname{E}\bigg[ \frac{1}{n}\sum_{i=1}^n (X_i-\mu)^2 \bigg] = \sigma^2. \end{align}
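The minimizing property of the sample mean is easy to see on concrete numbers. A small illustration (the sample values and the alternative point are arbitrary, chosen here for demonstration):

```python
import numpy as np

x = np.array([2.0, 3.0, 7.0, 8.0])  # arbitrary sample
xbar = x.mean()                      # sample mean = 5.0

def ssq(c):
    """Sum of squared deviations of x around the point c."""
    return float(((x - c) ** 2).sum())

# The sum is smallest at the sample mean ...
print(ssq(xbar))   # 26.0
# ... and strictly larger at any other point, e.g. a hypothetical mu = 4:
print(ssq(4.0))    # 30.0
```

This matches the identity $\sum_i (X_i - c)^2 = \sum_i (X_i - \overline{X})^2 + n(c - \overline{X})^2$: here $26 + 4 \cdot (4-5)^2 = 30$.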

Note that the usual definition of sample variance is

$s^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\overline{X}\,)^2,$
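In NumPy, this divisor is controlled by the `ddof` ("delta degrees of freedom") argument of `np.var`: the default `ddof=0` gives the uncorrected $S^2$ above, and `ddof=1` gives the $n-1$ version. A quick check (the sample values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = len(x)

s2_biased = np.var(x)            # divides by n     -> 4.0
s2_unbiased = np.var(x, ddof=1)  # divides by n - 1 -> 32/7

# The two differ exactly by the factor n / (n - 1):
print(s2_biased, s2_unbiased, s2_biased * n / (n - 1))
```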

and this is an unbiased estimator of the population variance. This can be seen by noting the following formula, which follows from the Bienaymé formula, for the term in the inequality for the expectation of the uncorrected sample variance above:

$\operatorname{E}\big[ (\overline{X}-\mu)^2 \big] = \frac{1}{n}\sigma^2 .$
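Substituting this into the expectation computed above makes the correction explicit:

$\operatorname{E}[S^2] = \sigma^2 - \frac{1}{n}\sigma^2 = \frac{n-1}{n}\,\sigma^2, \qquad \operatorname{E}[s^2] = \frac{n}{n-1}\operatorname{E}[S^2] = \sigma^2 .$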

The factor n/(n − 1) by which the uncorrected estimate must be multiplied to obtain the unbiased one is known as Bessel's correction.

posted @ 2015-07-20 19:38  菜鸡一枚