Probability Density Function (pdf) (概率密度函数)
The pdf is defined as the derivative of the cdf:
\[f_{X}(x)=\frac{\text{d}F_{X}(x)}{\text{d}x}=\lim_{h\rightarrow 0}\frac{F_{X}(x+h)-F_{X}(x)}{h}
\]
Note that the pdf is NOT the probability that \(X=x\); rather, it is proportional to the probability that \(X\) lies close to \(x\):
\[\begin{aligned}
P[x<X\leq x+h]&=F_{X}(x+h)-F_{X}(x)\\\\
&=\frac{F_{X}(x+h)-F_{X}(x)}{h}\cdot h\\\\
&\approx f_{X}(x)\cdot h \text{ (if } h \text{ is small)}
\end{aligned}
\]
Properties
-
\(f_{X}(x)\geq 0\)
-
\(F_{X}(x)=\int_{-\infty}^xf_{X}(t)\text{d}t\)
-
\(\int_{-\infty }^{\infty}f_{X}(t)\text{d}t=1\)
-
\(P[a<X\leq b]=\int_{a^{+}}^bf_{X}(x)\text{d}x\)
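As a numeric sanity check of \(P[x<X\leq x+h]\approx f_{X}(x)h\), the sketch below assumes a standard normal \(X\) for illustration (its cdf is expressed through `math.erf`) and compares the exact cdf increment with the pdf-times-\(h\) estimate:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """cdf of N(mu, sigma^2), written via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf of N(mu, sigma^2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# P[x < X <= x+h] ~= f_X(x) * h for small h
x, h = 0.5, 1e-4
exact = normal_cdf(x + h) - normal_cdf(x)
approx = normal_pdf(x) * h
assert abs(exact - approx) / exact < 1e-3
```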
Expectation
\[E[X]=\int_{-\infty}^{\infty}xf_{X}(x)\text{d}x
\]
Variance
\[\text{VAR}[X]=E[(X-E[X])^2]
\]
Exponential (指数分布) Random Variable
cdf:
\[F_{X}(x)=
\begin{cases}
0, &x<0\\\\
1-e^{-\lambda x}, &x\geq 0
\end{cases}
\]
pdf:
\[f_{X}(x)=
\begin{cases}
0, &x<0\\\\
\lambda e^{-\lambda x}, &x\geq 0
\end{cases}
\]
- \(E[X]=\frac{1}{\lambda}\)
Proof
\[\begin{aligned}
E[X]&=\int_{0}^{\infty} xf_{X}(x)\text{d}x\\\\
&=\int_{0}^{\infty} x\lambda e^{-\lambda x}\text{d}x\\\\
&=[-xe^{-\lambda x}]|_{0}^{\infty}+\int_{0}^{\infty} e^{-\lambda x}\text{d}x &\text{(integration by parts)}\\\\
&=0+[-\frac{1}{\lambda}e^{-\lambda x}]|_{0}^{\infty}\\\\
&=\frac{1}{\lambda}
\end{aligned}
\]
- \(\text{VAR}[X]=\frac{1}{\lambda^2}\)
Proof
\[\begin{aligned}
E[X^2]&=\int_{0}^{\infty} x^2f_{X}(x)\text{d}x\\\\
&=\int_{0}^{\infty} x^2\lambda e^{-\lambda x}\text{d}x\\\\
&=[(-x^2-\frac{2x}{\lambda}-\frac{2}{\lambda^2})e^{-\lambda x}]|_{0}^{\infty}=\frac{2}{\lambda^2}
\end{aligned}
\]
\[\text{VAR}[X]=E[X^2]-(E[X])^2=\frac{2}{\lambda^2}-(\frac{1}{\lambda})^2=\frac{1}{\lambda^2}
\]

- The exponential random variable is the limiting form of the geometric random variable.
i.e. for a geometric RV \(M\) with success probability \(p\) and an exponential RV \(X\), fix \(\lambda=np\) (so \(p=\lambda/n\)); then
\(P[X\leq t]=P[M\leq nt]=1-P[M>nt]=1-(1-p)^{nt}\stackrel{n\rightarrow \infty}{\longrightarrow} 1-e^{-\lambda t}\)
so the approximation holds when \(n\) is sufficiently large.
- Memoryless Property: \(P[X>t+h|X>t]=P[X>h]\)
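The mean, variance, and memoryless property can all be checked by simulation. The sketch below draws exponential samples by inverse-cdf sampling; the choices \(\lambda=2\), the sample size, and the test points \(t, h\) are arbitrary assumptions:

```python
import math
import random

random.seed(0)
lam = 2.0
n = 200_000
# Inverse-cdf sampling: if U ~ Uniform(0,1), then -ln(1-U)/lam ~ Exponential(lam)
samples = [-math.log(1.0 - random.random()) / lam for _ in range(n)]

mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
assert abs(mean - 1 / lam) < 0.01       # E[X] = 1/lambda
assert abs(var - 1 / lam ** 2) < 0.01   # VAR[X] = 1/lambda^2

# Memoryless property: P[X > t+h | X > t] == P[X > h]
t, h = 0.5, 0.3
tail = [s for s in samples if s > t]
p_cond = sum(1 for s in tail if s > t + h) / len(tail)
p_h = sum(1 for s in samples if s > h) / n
assert abs(p_cond - p_h) < 0.02
```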

Gaussian (高斯分布) Random Variable
\[X\sim \mathscr{N}(\mu, \sigma^2)
\]
Here \(\mu\triangleq E[X]\), \(\sigma^2\triangleq \text{VAR}[X]\)
pdf:
\[f_{X}(x)=\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}
\]
- Special values
-
\(P[\mu-\sigma<X\leq \mu+\sigma]=68\%\)
-
\(P[\mu-2\sigma<X\leq \mu+2\sigma]=95\%\)
-
\(P[\mu-3\sigma<X\leq \mu+3\sigma]=99.7\%\)
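These 68-95-99.7 values can be reproduced from the error function; a minimal sketch:

```python
import math

def normal_cdf(z):
    """cdf of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# P[mu - k*sigma < X <= mu + k*sigma] reduces to P[-k < Z <= k] for Z ~ N(0,1)
for k, target in [(1, 0.6827), (2, 0.9545), (3, 0.9973)]:
    p = normal_cdf(k) - normal_cdf(-k)
    assert abs(p - target) < 5e-4
```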
Normalized Gaussian
-
If \(X\sim\mathscr{N}(\mu, \sigma^2)\) and \(Y\triangleq\frac{X-\mu}{\sigma}\), then \(Y\sim\mathscr{N}(0, 1)\).
-
cdf of the Normalized Gaussian
\[\Phi(x)=P[Y\leq x]=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^xe^{\frac{-t^2}{2}}\text{d}t
\]
-
cdf of \(X\sim\mathscr{N}(\mu, \sigma^2)\)
\[F_{X}(x)=\Phi(\frac{x-\mu}{\sigma})
\]
Q-function for the Normalized Gaussian
\[Q(x)=P[Y>x]
\]
where \(Y\) is the normalized Gaussian.
Q-function for \(X\sim\mathscr{N}(\mu, \sigma^2)\)
\[P[X>x]=Q(\frac{x-\mu}{\sigma})
\]
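In code, the Q-function is conveniently computed from the complementary error function. A small sketch (the values \(\mu=3\), \(\sigma=2\), and the threshold are arbitrary assumptions):

```python
import math

def Q(z):
    """Q(z) = P[Z > z] for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# For Y ~ N(mu, sigma^2): P[Y > y] = Q((y - mu) / sigma)
mu, sigma, y = 3.0, 2.0, 5.0
p = Q((y - mu) / sigma)          # = Q(1)
assert abs(p - 0.15866) < 1e-4   # 1 - Phi(1) ~= 0.1587
```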
Recurrence for the raw moments of a normal distribution
Let the mean be \(\mu\) and the variance \(\sigma^2\):
\[E[X^{n+1}]=\mu E[X^{n}]+n\sigma^2 E[X^{n-1}]
\]
Proof
\[\begin{aligned}
E[X^{n-1}]&=\int_{-\infty}^{+\infty}x^{n-1}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\text{d}x\\\\
&=\int_{-\infty}^{+\infty}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\text{d}\frac{x^{n}}{n}\\\\
&=[\frac{x^{n}}{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}]|_{x=-\infty}^{+\infty}-\int_{-\infty}^{+\infty}\frac{x^{n}}{n}\text{d}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\\\\
&=0-\int_{-\infty}^{+\infty}\frac{x^{n}}{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\cdot\frac{\mu-x}{\sigma^2}\text{d}x\\\\
&=-\frac{\mu}{n\sigma^2}\int_{-\infty}^{+\infty}x^{n}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\text{d}x+\frac{1}{n\sigma^2}\int_{-\infty}^{+\infty}x^{n+1}\frac{1}{\sqrt{2\pi}\sigma}e^{\frac{-(x-\mu)^2}{2\sigma^2}}\text{d}x\\\\
&=-\frac{\mu E[X^n]}{n\sigma^2}+\frac{E[X^{n+1}]}{n\sigma^2}\\\\
\end{aligned}
\]
Thus
\[n\sigma^2E[X^{n-1}]=-\mu E[X^n]+E[X^{n+1}]
\]
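The recurrence can be verified against the known closed-form raw moments of a normal distribution; a minimal sketch (the values of \(\mu\) and \(\sigma^2\) are arbitrary):

```python
mu, sigma2 = 1.5, 0.8

# Raw moments from the recurrence E[X^{n+1}] = mu E[X^n] + n sigma^2 E[X^{n-1}],
# seeded with E[X^0] = 1 and E[X^1] = mu.
m = [1.0, mu]
for n in range(1, 4):
    m.append(mu * m[n] + n * sigma2 * m[n - 1])

# Compare with the standard closed forms for the normal raw moments
assert abs(m[2] - (mu**2 + sigma2)) < 1e-9
assert abs(m[3] - (mu**3 + 3 * mu * sigma2)) < 1e-9
assert abs(m[4] - (mu**4 + 6 * mu**2 * sigma2 + 3 * sigma2**2)) < 1e-9
```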
Joint Cumulative Distribution Function
\[F_{X, Y}(x, y)=P[\lbrace X\leq x\rbrace\cap\lbrace Y\leq y\rbrace]
\]
Marginal cdf
\[F_{X}(x)=F_{X, Y}(x, \infty)
\]
\[F_{Y}(y)=F_{X, Y}(\infty, y)
\]
Jointly Continuous Random Variables
We say that two random variables are jointly continuous if their joint cumulative
distribution function is continuous and differentiable.
Joint Probability Density Function
\[f_{X, Y}(x, y)=\frac{\partial^2 F_{X, Y}(x, y)}{\partial x\partial y}
\]
- Note that the probability density function is not necessarily continuous.
Marginal Densities
\[f_{X}(x)=\int_{-\infty}^{\infty}f_{X, Y}(x, \beta)\text{d}\beta
\]
\[f_{Y}(y)=\int_{-\infty}^{\infty}f_{X, Y}(\alpha, y)\text{d}\alpha
\]
Joint Distributions (\(X\) Discrete, \(Y\) Continuous)
Joint CDF: \(F_{X, Y}(x, y)=P[X\leq x, Y\leq y]\)
PMF in \(X\) and CDF in \(Y\): \(p_{X, Y}(x, y)=P[X=x, Y\leq y]\)
PMF in \(X\) and PDF in \(Y\): \(f_{X, Y}(x, y)=\frac{\text{d}}{\text{d}y}P[X=x, Y\leq y]\)
Conditional cdf (Continuous \(X\) and \(Y\))
Define the conditional cdf of \(Y\) given \(X=x\) by
\[F_{Y|X}(y|x)=\frac{\int_{-\infty}^yf_{X, Y}(x, \beta)\text{d}\beta}{f_{X}(x)}
\]
Proof
\[\begin{aligned}
F_{Y|X}(y|x)&=\lim_{h\rightarrow 0}\frac{P[\lbrace Y\leq y\rbrace\cap \lbrace x\leq X\leq x+h\rbrace]}{P[\lbrace x\leq X\leq x+h\rbrace]}\\\\
&=\lim_{h\rightarrow 0}\frac{\int_{-\infty}^y\int_{x}^{x+h}f_{X, Y}(\alpha, \beta)\text{d}\alpha\text{d}\beta}{\int_{x}^{x+h}f_{X}(\alpha)\text{d}\alpha}\approx\lim_{h\rightarrow 0}\frac{h\int_{-\infty}^yf_{X, Y}(x, \beta)\text{d}\beta}{f_{X}(x)h}\\\\
&=\frac{\int_{-\infty}^yf_{X, Y}(x, \beta)\text{d}\beta}{f_{X}(x)}
\end{aligned}
\]
Conditional pdf (Continuous \(X\) and \(Y\))
Define the conditional pdf of \(Y\) given \(X=x\) by
\[f_{Y|X}(y|x)=\frac{f_{X, Y}(x, y)}{f_{X}(x)}
\]
Properties
\[f_{X, Y}(x, y)=f_{X|Y}(x|y)f_{Y}(y)
\]
\[f_{X, Y}(x, y)=f_{Y|X}(y|x)f_{X}(x)
\]
- Total Probability Theorem
\[f_{Y}(y)=\int_{-\infty}^{\infty}f_{Y|X}(y|x)f_{X}(x)\text{d}x
\]
\[f_{X|Y}(x|y)=\frac{f_{Y|X}(y|x)f_{X}(x)}{\int_{-\infty}^{\infty}f_{Y|X}(y|t)f_{X}(t)\text{d}t}
\]
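The continuous Bayes rule above can be exercised numerically. The sketch below assumes a hypothetical model, \(X\sim\text{Uniform}(0,1)\) and \(Y|X=x\sim\mathscr{N}(x, 0.1^2)\), and checks that the resulting posterior density integrates to 1 and peaks at the observation:

```python
import math

# Hypothetical model (for illustration): X ~ Uniform(0,1), Y | X = x ~ N(x, 0.1^2)
def f_y_given_x(y, x, s=0.1):
    return math.exp(-0.5 * ((y - x) / s) ** 2) / (s * math.sqrt(2 * math.pi))

y_obs = 0.4
N = 10_000
xs = [i / N for i in range(N + 1)]
vals = [f_y_given_x(y_obs, x) for x in xs]   # f_{Y|X}(y|x) * f_X(x), with f_X = 1

# Trapezoid rule for the evidence f_Y(y) = \int f_{Y|X}(y|x) f_X(x) dx
evidence = sum((vals[i] + vals[i + 1]) / 2 / N for i in range(N))
posterior = [v / evidence for v in vals]     # f_{X|Y}(x|y) by Bayes rule

# The posterior density integrates to 1 and peaks at the observation
total = sum((posterior[i] + posterior[i + 1]) / 2 / N for i in range(N))
assert abs(total - 1.0) < 1e-9
assert abs(xs[posterior.index(max(posterior))] - y_obs) < 1e-3
```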
Conditional pmf or pdf (\(X\) discrete, \(Y\) continuous)
Let \(f_{X, Y}(x, y)=\frac{\text{d}}{\text{d}y}P[X=x, Y\leq y]\) (joint pmf in \(X\) and pdf in \(Y\)), with \(p_{X}(x)=P[X=x]\) and \(f_{Y}(y)=\sum_{x}f_{X, Y}(x, y)\).
- Conditional pmf of \(X\) given \(Y=y\):
\[p_{X|Y}(x|y)=
\begin{cases}
\frac{f_{X, Y}(x, y)}{f_{Y}(y)}, &f_{Y}(y)>0\\\\
\text{undefined}, &\text{otherwise}
\end{cases}
\]
- Conditional pdf of \(Y\) given \(X=x\):
\[f_{Y|X}(y|x)=
\begin{cases}
\frac{f_{X, Y}(x, y)}{p_{X}(x)}, &p_{X}(x)>0\\\\
\text{undefined}, &\text{otherwise}
\end{cases}
\]
Gamma function
\[\Gamma(z)=\int_{0}^{\infty}x^{z-1}e^{-x}\text{d}x
\]
\(\Gamma(z)=(z-1)!\) for any positive integer \(z\); in general, \(\Gamma(z+1)=z\Gamma(z)\).
[Zhihu] A Beginner's Guide to Special Functions: The Gamma Function (Part 1) - fell
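Python's standard library exposes the Gamma function as `math.gamma`; a quick check of the factorial identity and the recurrence:

```python
import math

# Gamma(z) = (z-1)! for positive integers, and Gamma(1/2) = sqrt(pi)
assert math.isclose(math.gamma(5), math.factorial(4))   # both are 24
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))

# Recurrence Gamma(z+1) = z * Gamma(z), at an arbitrary non-integer point
z = 2.7
assert math.isclose(math.gamma(z + 1), z * math.gamma(z))
```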
Independence (Continuous Random Variables)
The following three statements are equivalent:
-
\(X\) and \(Y\) are independent.
-
\(F_{X, Y}(x, y)=F_{X}(x)F_{Y}(y)\) for all \(x\) and \(y\)
-
\(f_{X, Y}(x, y)=f_{X}(x)f_{Y}(y)\) for all \(x\) and \(y\)
Moments and Central Moments
\[E[X^jY^k]=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}x^jy^kf_{X, Y}(x, y)\text{d}x\text{d}y
\]
- Central moment: \(E[(X-E[X])^j(Y-E[Y])^k]\)
Conditional Expectation
\[E[Y|x]=
\begin{cases}
\int_{-\infty}^{\infty} yf_{Y|X}(y|x)\text{d}y, & Y \text{ is continuous}\\\\
\sum_{y} yp_{Y|X}(y|x), & Y \text{ is discrete}\\\\
\end{cases}
\]
Two Joint Gaussian RVs
Joint pdf
If \(X_1\) and \(X_2\) are jointly Gaussian, their joint pdf is given by
\[f_{X_1, X_2}(x_1, x_2)=\frac{1}{2\pi\sigma_{1}\sigma_2 \sqrt{1-\rho^2} }\text{exp}\left\lbrace-\frac{\frac{(x_1-m_1)^2}{\sigma_1^2}-2\rho\frac{(x_1-m_1)(x_2-m_2)}{\sigma_1\sigma_2}+\frac{(x_2-m_2)^2}{\sigma_2^2}}{2(1-\rho^2)}\right\rbrace
\]
where \(m_i=E[X_i], \sigma_i^2=\text{VAR}[X_i]\), \(\rho\) is the correlation coefficient.

Vector Notation
\[\vec{X}=
\left[
\begin{matrix}
X_1\\\\
X_2\\\\
\vdots\\\\
X_n
\end{matrix}
\right]
\]
\[\begin{aligned}
C&=E[(\vec{X}-E[\vec{X}])(\vec{X}-E[\vec{X}])^{T}]\\\\
&=
\left[
\begin{matrix}
E[(X_1-E[X_1])(X_1-E[X_1])] &E[(X_1-E[X_1])(X_2-E[X_2])] &\cdots &E[(X_1-E[X_1])(X_n-E[X_n])]\\\\
E[(X_2-E[X_2])(X_1-E[X_1])] &E[(X_2-E[X_2])(X_2-E[X_2])] &\cdots &E[(X_2-E[X_2])(X_n-E[X_n])]\\\\
\vdots &\vdots &\ddots &\vdots\\\\
E[(X_n-E[X_n])(X_1-E[X_1])] &E[(X_n-E[X_n])(X_2-E[X_2])] &\cdots &E[(X_n-E[X_n])(X_n-E[X_n])]\\\\
\end{matrix}
\right]
\end{aligned}
\]
\[f_{\vec{X}}(\vec{x})=\frac{1}{(2\pi)^{n/2}\sqrt{\det(C)}}\text{exp}\left\lbrace -\frac{1}{2}(\vec{x}-E[\vec{X}])^{T}C^{-1}(\vec{x}-E[\vec{X}])\right\rbrace
\]
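For \(n=2\), the vector form should reduce to the two-variable pdf above. The sketch below compares the two formulas at a point, with the \(2\times 2\) determinant and inverse written out by hand (the parameter values and the evaluation point are arbitrary assumptions):

```python
import math

m1, m2 = 1.0, -0.5
s1, s2, rho = 1.5, 0.8, 0.6

def pdf_scalar(x1, x2):
    """Two-variable jointly Gaussian pdf, written with sigmas and rho."""
    q = ((x1 - m1)**2 / s1**2
         - 2 * rho * (x1 - m1) * (x2 - m2) / (s1 * s2)
         + (x2 - m2)**2 / s2**2)
    return math.exp(-q / (2 * (1 - rho**2))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho**2))

def pdf_vector(x1, x2):
    """Vector form with C = [[s1^2, rho s1 s2], [rho s1 s2, s2^2]], n = 2."""
    c11, c12, c22 = s1**2, rho * s1 * s2, s2**2
    det = c11 * c22 - c12**2
    d1, d2 = x1 - m1, x2 - m2
    quad = (c22 * d1 * d1 - 2 * c12 * d1 * d2 + c11 * d2 * d2) / det  # (x-m)^T C^{-1} (x-m)
    return math.exp(-0.5 * quad) / (2 * math.pi * math.sqrt(det))

assert math.isclose(pdf_scalar(1.3, 0.2), pdf_vector(1.3, 0.2))
```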

Properties
Assume that \(X_1\) and \(X_2\) are jointly Gaussian.
-
The marginal densities of \(X_1\) and \(X_2\) are Gaussian.
-
If \(X_1\) and \(X_2\) are uncorrelated, they are also independent.
-
The conditional density of \(X_1\) given \(X_2\) is Gaussian.
-
Any affine combination of \(X_1\) and \(X_2\) is Gaussian.
Markov Inequality
Let \(X\) be a non-negative RV, i.e. \(X\geq 0\). Then
\[P(X\geq a)\leq \frac{E(X)}{a}
\]
Proof
\[\begin{aligned}
E(X)&=\int_{0}^{\infty} tf_{X}(t)\text{d}t\\\\
&\geq\int_{a}^{\infty} tf_{X}(t)\text{d}t\\\\
&\geq \int_{a}^{\infty} af_{X}(t)\text{d}t=aP(X\geq a)
\end{aligned}
\]
Chebyshev Inequality
Let \(X\) be a RV. with mean \(m\) and variance \(\sigma^2\).
\[P[|X-m|\geq a]\leq \frac{\sigma^2}{a^2}
\]
In other words, for any \(a\) large compared with the standard deviation, the probability that \(X\) is further than \(a\) from the mean is negligible (可忽略). This is very useful for bounding errors.
Proof
\[\begin{aligned}
P(|X-m|\geq a)&=P((X-m)^2\geq a^2)\\\\
&\leq \frac{E[(X-m)^2]}{a^2}=\frac{\sigma^2}{a^2}
\end{aligned}
\]
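Both the Markov and the Chebyshev bounds can be checked empirically. The sketch below uses Exponential(1) samples as an arbitrary non-negative test distribution (mean 1, variance 1):

```python
import math
import random

random.seed(1)
n = 100_000
# Exponential(1) samples via inverse-cdf sampling; X >= 0, so Markov applies
xs = [-math.log(1.0 - random.random()) for _ in range(n)]
mean = sum(xs) / n
var = sum((x - mean) ** 2 for x in xs) / n

for a in (2.0, 3.0, 5.0):
    p_markov = sum(1 for x in xs if x >= a) / n
    assert p_markov <= mean / a        # Markov: P[X >= a] <= E[X]/a
    p_cheb = sum(1 for x in xs if abs(x - mean) >= a) / n
    assert p_cheb <= var / a ** 2      # Chebyshev: P[|X-m| >= a] <= sigma^2/a^2
```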
Independent and Identically Distributed (i.i.d.) Random Variables
In probability theory and statistics, a collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent.
Central Limit Theorem
Suppose \(X_i\) for \(i\in\lbrace 1, 2, \cdots, n\rbrace\) are i.i.d. random variables with mean \(\mu\) and variance \(\sigma^2\).
Define \(S_n=\sum_{i=1}^n X_i, Z_n=\frac{S_n-n\mu}{\sigma\sqrt n}\), then
\[\lim_{n\rightarrow \infty}P[Z_n<z]=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{z}\text{exp}(-\frac{x^2}{2})\text{d}x
\]
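A simulation sketch of the theorem, assuming \(X_i\sim\text{Uniform}(0,1)\) (so \(\mu=1/2\), \(\sigma^2=1/12\)); the empirical cdf of \(Z_n\) is compared with the standard normal cdf at a few points:

```python
import math
import random

random.seed(0)

def phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# X_i ~ Uniform(0,1): mu = 1/2, sigma^2 = 1/12
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
n, trials = 50, 20_000
zs = []
for _ in range(trials):
    s = sum(random.random() for _ in range(n))        # S_n
    zs.append((s - n * mu) / (sigma * math.sqrt(n)))  # Z_n

# The empirical cdf of Z_n should be close to the standard normal cdf
for z in (-1.0, 0.0, 1.0):
    emp = sum(1 for v in zs if v < z) / trials
    assert abs(emp - phi(z)) < 0.02
```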