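Derivation of the Density Functions of the Three Major Distributions
I. Derivation of the \(\chi^{2}\) density
Let \(Y_{1}, \cdots, Y_{n}\) be independent and identically distributed, each \(Y_i\) following the standard normal distribution \(N(0,1)\). By definition,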
\[X = Y_{1}^{2} + \cdots + Y_{n}^{2} \sim \chi_{n}^{2}
\]
Let \(h(x)\) be any nonnegative function such that \(h(X)\) is a random variable. Then
\[E[h(X)] = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} h\left(y_{1}^{2} + \cdots + y_{n}^{2}\right) \left(\frac{1}{\sqrt{2 \pi}}\right)^{n} e^{-\frac{1}{2}\left(y_{1}^{2} + \cdots + y_{n}^{2}\right)} dy_{1} \cdots dy_{n}
\]
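The reason for working with an arbitrary \(h\) is a standard fact: if a function \(f\) satisfies \(E[h(X)] = \int h(u) f(u)\, du\) for every admissible nonnegative \(h\), then \(f\) is the density of \(X\). As a brief sketch (the indicator notation \(\mathbf{1}\) is introduced here for illustration only), taking \(h\) to be the indicator of \((-\infty, a]\) gives
\[P(X \leq a) = E\left[\mathbf{1}_{(-\infty, a]}(X)\right] = \int_{-\infty}^{a} f(u)\, du
\]
for every real \(a\). The aim of the computation is therefore to bring \(E[h(X)]\) into the form \(\int h(u) f(u)\, du\).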
Apply the \(n\)-dimensional spherical coordinate transformation:
\[\begin{cases}
y_{1} = r \cos \theta_{1} \\
y_{2} = r \sin \theta_{1} \cos \theta_{2} \\
y_{3} = r \sin \theta_{1} \sin \theta_{2} \cos \theta_{3} \\
\vdots \\
y_{n-1} = r \sin \theta_{1} \cdots \sin \theta_{n-2} \cos \theta_{n-1} \\
y_{n} = r \sin \theta_{1} \cdots \sin \theta_{n-2} \sin \theta_{n-1}
\end{cases}
\]
where \(0 \leq r < \infty\), \(0 \leq \theta_{i} \leq \pi\) for \(i=1, \cdots, n-2\), and \(0 \leq \theta_{n-1} \leq 2\pi\).
The Jacobian determinant of the transformation is
\[J = r^{n-1} \sin^{n-2} \theta_{1} \sin^{n-3} \theta_{2} \cdots \sin^{2} \theta_{n-3} \sin \theta_{n-2}
\]
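As a sanity check of this formula (a worked sketch for small \(n\), added for illustration), the cases \(n=2\) and \(n=3\) reduce to the familiar polar and spherical coordinates. For \(n=2\), with \(y_{1} = r\cos\theta_{1}\) and \(y_{2} = r\sin\theta_{1}\),
\[J = \left|\begin{array}{cc}
\cos\theta_{1} & -r\sin\theta_{1} \\
\sin\theta_{1} & r\cos\theta_{1}
\end{array}\right| = r
\]
For \(n=3\), with \(y_{1} = r\cos\theta_{1}\), \(y_{2} = r\sin\theta_{1}\cos\theta_{2}\), \(y_{3} = r\sin\theta_{1}\sin\theta_{2}\), expanding along the first row gives
\[J = \left|\begin{array}{ccc}
\cos\theta_{1} & -r\sin\theta_{1} & 0 \\
\sin\theta_{1}\cos\theta_{2} & r\cos\theta_{1}\cos\theta_{2} & -r\sin\theta_{1}\sin\theta_{2} \\
\sin\theta_{1}\sin\theta_{2} & r\cos\theta_{1}\sin\theta_{2} & r\sin\theta_{1}\cos\theta_{2}
\end{array}\right| = r^{2}\cos^{2}\theta_{1}\sin\theta_{1} + r^{2}\sin^{3}\theta_{1} = r^{2}\sin\theta_{1}
\]
Both agree with the general expression \(r^{n-1}\sin^{n-2}\theta_{1}\cdots\sin\theta_{n-2}\), where the product of sines is empty for \(n=2\).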
By definition (the subscript \(+\) indicating the absolute value of the determinant),
\[J = \left|\begin{array}{cccc}
\frac{\partial y_{1}}{\partial r} & \frac{\partial y_{1}}{\partial \theta_{1}} & \cdots & \frac{\partial y_{1}}{\partial \theta_{n-1}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial y_{n}}{\partial r} & \frac{\partial y_{n}}{\partial \theta_{1}} & \cdots & \frac{\partial y_{n}}{\partial \theta_{n-1}}
\end{array}\right|_{+} = r^{n-1} c
\]
This holds because every column from the second through the last contains a common factor \(r\); once these factors are pulled out, what remains depends only on \(\theta_{1}, \cdots, \theta_{n-1}\) and is denoted by \(c\). Hence
\[\begin{aligned}
E[h(X)] &= \int_{0}^{\infty} dr \int_{0}^{\pi} \cdots \int_{0}^{\pi} d\theta_{1} \cdots d\theta_{n-2} \int_{0}^{2\pi} h\left(r^{2}\right) \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac{1}{2} r^{2}} r^{n-1} c \, d\theta_{n-1} \\
&= c' \int_{0}^{\infty} h\left(r^{2}\right) r^{n-1} e^{-\frac{1}{2} r^{2}} dr
\end{aligned}
\]
where \(c'\) is a constant, obtained by integrating \(\left(\frac{1}{\sqrt{2\pi}}\right)^{n} c\) over the angles.
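Substituting \(u = r^{2}\), so that \(r^{n-1}\, dr = \frac{1}{2} u^{\frac{n}{2}-1}\, du\), simplifies this further to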
\[E[h(X)] = c'' \int_{0}^{\infty} h(u) u^{\frac{n}{2}-1} e^{-\frac{1}{2} u} du
\]
where \(c''\) is a constant.
To determine \(c''\), take \(h \equiv 1\); since the total probability is 1, we must have
\[1 = c'' \int_{0}^{\infty} u^{\frac{n}{2}-1} e^{-\frac{1}{2} u} du = c'' 2^{\frac{n}{2}} \int_{0}^{\infty} y^{\frac{n}{2}-1} e^{-y} dy = c'' 2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)
\]
that is,
\[c'' = \left(2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)\right)^{-1}
\]
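As a quick check of this constant (a worked sketch for \(n=3\), added for illustration), the angular integration can be carried out by hand. Here \(J = r^{2}\sin\theta_{1}\), so \(c = \sin\theta_{1}\) and
\[c' = \left(\frac{1}{\sqrt{2\pi}}\right)^{3} \int_{0}^{\pi} \sin\theta_{1}\, d\theta_{1} \int_{0}^{2\pi} d\theta_{2} = \frac{4\pi}{(2\pi)^{3/2}}, \qquad c'' = \frac{c'}{2} = \frac{1}{\sqrt{2\pi}}
\]
The general formula gives the same value, since \(\Gamma\left(\frac{3}{2}\right) = \frac{\sqrt{\pi}}{2}\) and hence \(\left(2^{\frac{3}{2}} \Gamma\left(\frac{3}{2}\right)\right)^{-1} = \left(\sqrt{2}\sqrt{\pi}\right)^{-1} = \frac{1}{\sqrt{2\pi}}\).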
This proves that the density function of the \(\chi_{n}^{2}\) distribution is
\[f(x) = \frac{1}{2^{\frac{n}{2}} \Gamma\left(\frac{n}{2}\right)} x^{\frac{n}{2}-1} e^{-\frac{1}{2} x}, \qquad x > 0
\]
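The density just derived can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `chi2_pdf` and `midpoint_integral`, the choice \(n=3\), and the truncation of the integral at 80 are assumptions made for illustration. It confirms that the density integrates to approximately 1 and has mean approximately \(n\).

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    return x ** (n / 2 - 1) * math.exp(-x / 2) / (2 ** (n / 2) * math.gamma(n / 2))

def midpoint_integral(g, a, b, steps=100_000):
    """Composite midpoint rule; never evaluates g at the endpoints a and b."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

n = 3  # illustrative choice of degrees of freedom
# The tail beyond 80 is negligible for small n, so [0, 80] stands in for [0, inf).
total = midpoint_integral(lambda x: chi2_pdf(x, n), 0.0, 80.0)
mean = midpoint_integral(lambda x: x * chi2_pdf(x, n), 0.0, 80.0)
print(f"integral of density = {total:.6f}, mean = {mean:.6f}")
```

The midpoint rule is used because it avoids the endpoint \(x=0\), where the density is unbounded for \(n=1\).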
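This formula can also be derived directly by induction, relying mainly on the following facts:
- By Example 1.18, the density of \(\chi_{1}^{2}\) is \(\frac{1}{\sqrt{2\pi}} x^{-1/2} e^{-x/2}\) for \(x > 0\);
- If the random variables \(X\) and \(Y\) are independent with densities \(f_1(x)\) and \(f_2(y)\) respectively, then \(Z = X + Y\) has density \(f(z) = \int_{-\infty}^{\infty} f_1(z-x) f_2(x) dx\);
- Using these two facts, induction on \(n\) yields the formula; the argument uses a simple property of the beta function.
Interested readers may wish to try this. In addition, Chapter 9 of reference [22] gives other derivations of the \(\chi^{2}\) distribution.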
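The inductive step can be sketched as follows (one possible way to carry out the argument, written here with \(f_{n}\) denoting the \(\chi_{n}^{2}\) density, a notation assumed for illustration). Suppose the formula holds for \(n\). Since \(\chi_{n+1}^{2}\) is the sum of independent \(\chi_{n}^{2}\) and \(\chi_{1}^{2}\) variables and both densities vanish on the negative axis, for \(z > 0\)
\[\begin{aligned}
f_{n+1}(z) &= \int_{0}^{z} f_{n}(z-x) f_{1}(x)\, dx = \frac{e^{-z/2}}{2^{\frac{n+1}{2}} \Gamma\left(\frac{n}{2}\right) \Gamma\left(\frac{1}{2}\right)} \int_{0}^{z} (z-x)^{\frac{n}{2}-1} x^{-\frac{1}{2}}\, dx \\
&= \frac{z^{\frac{n+1}{2}-1} e^{-z/2}}{2^{\frac{n+1}{2}} \Gamma\left(\frac{n}{2}\right) \Gamma\left(\frac{1}{2}\right)} \int_{0}^{1} (1-s)^{\frac{n}{2}-1} s^{-\frac{1}{2}}\, ds = \frac{z^{\frac{n+1}{2}-1} e^{-z/2}}{2^{\frac{n+1}{2}} \Gamma\left(\frac{n+1}{2}\right)}
\end{aligned}
\]
where the substitution \(x = zs\) was used, together with \(\Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}\) and the beta function identity \(\int_{0}^{1} (1-s)^{\frac{n}{2}-1} s^{-\frac{1}{2}}\, ds = B\left(\frac{1}{2}, \frac{n}{2}\right) = \Gamma\left(\frac{1}{2}\right)\Gamma\left(\frac{n}{2}\right) / \Gamma\left(\frac{n+1}{2}\right)\).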
II. Derivation of the t density
Let \(X \sim N(0,1)\) and \(Y \sim \chi_{n}^{2}\) be independent. Then the random variable
\[T = \frac{\sqrt{n} X}{\sqrt{Y}}
\]
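has the t distribution with \(n\) degrees of freedom. By assumption, the joint density of \(X\) and \(Y\) is, for \(-\infty < x < \infty\) and \(y > 0\),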
\[C_{n} e^{-x^{2}/2} e^{-y/2} y^{n/2-1}
\]
where
\[C_{n} = \frac{1}{\sqrt{2\pi} 2^{n/2} \Gamma(n/2)}
\]
Let \(h(t)\) be any nonnegative function such that \(h(T)\) is a random variable. Then
\[E[h(T)] = C_{n} \int_{-\infty}^{\infty} dx \int_{0}^{\infty} h\left(\frac{\sqrt{n} x}{\sqrt{y}}\right) e^{-x^{2}/2} e^{-y/2} y^{n/2-1} dy
\]
Apply the transformation
\[\left\{
\begin{array}{l}
t = \frac{\sqrt{n} x}{\sqrt{y}} \\
y = y
\end{array}
\right.
\]
or, equivalently,
\[\left\{
\begin{array}{l}
x = \frac{\sqrt{y} t}{\sqrt{n}} \\
y = y
\end{array}
\right.
\]
The Jacobian determinant of the transformation is
\[J = \left|\begin{array}{cc}
\frac{\partial x}{\partial t} & \frac{\partial x}{\partial y} \\
\frac{\partial y}{\partial t} & \frac{\partial y}{\partial y}
\end{array}\right|_{+} = \left|\begin{array}{cc}
\sqrt{y} / \sqrt{n} & \frac{1}{2} t y^{-1/2} / \sqrt{n} \\
0 & 1
\end{array}\right|_{+} = \sqrt{y / n}
\]
Substituting gives
\[E[h(T)] = C_{n} \int_{-\infty}^{\infty} h(t)\, dt \int_{0}^{\infty} e^{-y t^{2} / 2 n} e^{-y / 2} y^{n / 2-1} \sqrt{y / n}\, dy
\]
The inner integral on the right-hand side is
\[\frac{1}{\sqrt{n}} \int_{0}^{\infty} y^{(n-1) / 2} e^{-\frac{y}{2}\left(1+\frac{t^{2}}{n}\right)} dy
\]
Let
\[z = \left(1 + \frac{t^{2}}{n}\right) y
\]
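Then, apart from the factor \(\frac{1}{\sqrt{n}}\), this integral becomes
\[\begin{aligned}
& \int_{0}^{\infty} \left(1 + \frac{t^{2}}{n}\right)^{-(n-1) / 2} z^{(n-1) / 2} e^{-z / 2} \left(1 + \frac{t^{2}}{n}\right)^{-1} dz \\
&= \left(1 + \frac{t^{2}}{n}\right)^{-(n-1) / 2 - 1} \int_{0}^{\infty} z^{(n-1) / 2} e^{-z / 2} dz \\
&= \left(1 + \frac{t^{2}}{n}\right)^{-(n+1) / 2} \int_{0}^{\infty} z^{(n+1) / 2 - 1} e^{-z / 2} dz \\
&= \left(1 + \frac{t^{2}}{n}\right)^{-(n+1) / 2} \Gamma\left(\frac{n+1}{2}\right) 2^{(n+1) / 2}
\end{aligned}
\]
The last step uses the fact that the integrand is, up to a constant factor, the density of \(\chi^{2}_{n+1}\), whose normalizing constant is \(\left(2^{(n+1)/2} \Gamma\left(\frac{n+1}{2}\right)\right)^{-1}\). Substituting the result into the expression (3.55) for \(E[h(T)]\) above gives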
\[\begin{aligned}
E[h(T)] &= C_{n} \left(\frac{1}{\sqrt{n}}\right) \Gamma\left(\frac{n+1}{2}\right) 2^{(n+1) / 2} \int_{-\infty}^{\infty} h(t) \left(1 + \frac{t^{2}}{n}\right)^{-(n+1) / 2} dt \\
&= \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n \pi} \Gamma\left(\frac{n}{2}\right)} \int_{-\infty}^{\infty} h(t) \left(1 + \frac{t^{2}}{n}\right)^{-(n+1) / 2} dt
\end{aligned}
\]
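This proves that the density function of the t distribution with \(n\) degrees of freedom is, for \(-\infty < t < \infty\),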
\[f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n \pi} \Gamma\left(\frac{n}{2}\right)} \left(1 + \frac{t^{2}}{n}\right)^{-(n+1) / 2}
\]
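The t density above can be checked against two known special cases. The following is a minimal sketch using only the Python standard library; the names `t_pdf`, `normal_pdf`, and `midpoint_integral`, the use of `math.lgamma` to avoid overflow for large \(n\), and the truncation of the integral to \([-100, 100]\) are assumptions made for illustration. For \(n=1\) the formula should reduce to the Cauchy density \(1/(\pi(1+t^{2}))\), and for large \(n\) it should be close to the standard normal density.

```python
import math

def t_pdf(t, n):
    """Density of the t distribution with n degrees of freedom."""
    # Gamma ratio computed on the log scale so large n does not overflow.
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    coef = math.exp(log_ratio) / math.sqrt(n * math.pi)
    return coef * (1 + t * t / n) ** (-(n + 1) / 2)

def normal_pdf(t):
    """Standard normal density, used only for comparison."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def midpoint_integral(g, a, b, steps=200_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

# n = 1: the formula should coincide with the Cauchy density.
cauchy_gap = abs(t_pdf(0.7, 1) - 1 / (math.pi * (1 + 0.7 ** 2)))

# n = 5: the density should integrate to about 1 (tails beyond 100 are negligible).
total = midpoint_integral(lambda t: t_pdf(t, 5), -100.0, 100.0)

# Large n: the density should be close to the standard normal density.
normal_gap = abs(t_pdf(0.0, 100) - normal_pdf(0.0))

print(cauchy_gap, total, normal_gap)
```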
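III. Derivation of the F density
Let \(X \sim \chi_{m}^{2}\) and \(Y \sim \chi_{n}^{2}\) be independent. Then \(F = \frac{X / m}{Y / n} = \frac{n}{m} \frac{X}{Y}\) has the F distribution with \(m\) and \(n\) degrees of freedom. By assumption, the joint density of \(X\) and \(Y\) is, for \(x > 0\) and \(y > 0\),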
\[C_{m, n} x^{\frac{m}{2}-1} y^{\frac{n}{2}-1} e^{-\frac{1}{2}(x+y)}
\]
where
\[C_{m, n}^{-1} = 2^{(n+m) / 2} \Gamma(m / 2) \Gamma(n / 2)
\]
Let \(h(f)\) be any nonnegative function such that \(h(F)\) is a random variable. Then
\[E[h(F)] = \int_{0}^{\infty} \int_{0}^{\infty} h\left(\frac{n}{m} \frac{x}{y}\right) C_{m, n} x^{\frac{m}{2}-1} y^{\frac{n}{2}-1} e^{-\frac{1}{2}(x+y)} dx dy
\]
Apply the transformation
\[\left\{
\begin{array}{l}
f = \frac{n}{m} \frac{x}{y} \\
y = y
\end{array}
\right.
\]
or, equivalently,
\[\left\{
\begin{array}{l}
x = \frac{m}{n} f y \\
y = y
\end{array}
\right.
\]
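The Jacobian determinant of the transformation is
\[J = \left|\begin{array}{cc}
\frac{\partial x}{\partial f} & \frac{\partial x}{\partial y} \\
\frac{\partial y}{\partial f} & \frac{\partial y}{\partial y}
\end{array}\right|_{+} = \left|\begin{array}{cc}
\frac{m}{n} y & \frac{m}{n} f \\
0 & 1
\end{array}\right|_{+} = \frac{m}{n} y
\]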
Hence
\[\begin{aligned}
E[h(F)] &= C_{m, n} \int_{0}^{\infty} h(f)\, df \int_{0}^{\infty} \left(\frac{m}{n} f y\right)^{\frac{m}{2}-1} y^{\frac{n}{2}-1} \\
&\quad \times e^{-\frac{y}{2}\left(1 + \frac{m}{n} f\right)} \left(\frac{m}{n} y\right) dy \\
&= C_{m, n} \left(\frac{m}{n}\right)^{\frac{m}{2}} \int_{0}^{\infty} h(f) f^{\frac{m}{2}-1}\, df \int_{0}^{\infty} y^{\frac{m+n}{2}-1} \\
&\quad \times e^{-\frac{y}{2}\left(1 + \frac{m}{n} f\right)} dy
\end{aligned}
\]
Let
\[z = \left(1 + \frac{m}{n} f\right) y
\]
Then the inner integral on the right-hand side of the expression for \(E[h(F)]\) becomes
\[\int_{0}^{\infty} \left(1 + \frac{m}{n} f\right)^{-\frac{m+n}{2}} z^{\frac{m+n}{2}-1} e^{-z / 2} dz = \left(1 + \frac{m}{n} f\right)^{-\frac{m+n}{2}} \Gamma\left(\frac{m+n}{2}\right) 2^{\frac{m+n}{2}}
\]
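Therefore, since \(C_{m, n}\, 2^{\frac{m+n}{2}} \Gamma\left(\frac{m+n}{2}\right) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right) \Gamma\left(\frac{n}{2}\right)} = \frac{1}{B\left(\frac{m}{2}, \frac{n}{2}\right)}\),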
\[E[h(F)] = \frac{1}{B\left(\frac{m}{2}, \frac{n}{2}\right)} \left(\frac{m}{n}\right)^{\frac{m}{2}} \int_{0}^{\infty} h(f) f^{\frac{m}{2}-1} \left(1 + \frac{m}{n} f\right)^{-\frac{m+n}{2}} df
\]
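This proves that the density function of the F distribution with \(m\) and \(n\) degrees of freedom is, for \(f > 0\),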
\[f(f) = \frac{\Gamma\left(\frac{m+n}{2}\right)}{\Gamma\left(\frac{m}{2}\right) \Gamma\left(\frac{n}{2}\right)} \left(\frac{m}{n}\right)^{\frac{m}{2}} \frac{f^{\frac{m}{2}-1}}{\left(1 + \frac{m}{n} f\right)^{\frac{m+n}{2}}}
\]
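The F density can be cross-checked against the t density derived in Section II: if \(T\) has the t distribution with \(n\) degrees of freedom, then \(T^{2}\) has the F distribution with \(1\) and \(n\) degrees of freedom, so the F density at \(f\) should equal \(t\text{-density}(\sqrt{f}) / \sqrt{f}\). The following is a minimal sketch using only the Python standard library; the names `f_pdf`, `t_pdf`, and `midpoint_integral`, the sample arguments, and the truncation of the integral at 200 are assumptions made for illustration.

```python
import math

def f_pdf(f, m, n):
    """Density of the F distribution with (m, n) degrees of freedom (zero for f <= 0)."""
    if f <= 0:
        return 0.0
    # Constant Gamma((m+n)/2) / (Gamma(m/2) Gamma(n/2)) * (m/n)^(m/2), on the log scale.
    log_coef = (math.lgamma((m + n) / 2) - math.lgamma(m / 2) - math.lgamma(n / 2)
                + (m / 2) * math.log(m / n))
    return math.exp(log_coef) * f ** (m / 2 - 1) * (1 + m * f / n) ** (-(m + n) / 2)

def t_pdf(t, n):
    """Density of the t distribution with n degrees of freedom."""
    log_ratio = math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
    return math.exp(log_ratio) / math.sqrt(n * math.pi) * (1 + t * t / n) ** (-(n + 1) / 2)

def midpoint_integral(g, a, b, steps=200_000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (k + 0.5) * h) for k in range(steps))

# m = n = 2: the formula simplifies to 1 / (1 + f)^2.
simple_gap = abs(f_pdf(1.0, 2, 2) - 0.25)

# T^2 ~ F(1, n): compare the two densities at an arbitrary point.
f0, n0 = 0.7, 9
square_gap = abs(f_pdf(f0, 1, n0) - t_pdf(math.sqrt(f0), n0) / math.sqrt(f0))

# The density should integrate to about 1 (the tail beyond 200 is negligible here).
total = midpoint_integral(lambda f: f_pdf(f, 4, 10), 0.0, 200.0)

print(simple_gap, square_gap, total)
```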