Applied Statistics Notes for [2 Discrete Random Variables]

Random Variables

  • A random variable \(X\) is a function that \(\underline{\text{assigns a number to every outcome}}\) of an experiment.

  • A discrete random variable assumes values from a \(\underline{\text{countable}}\) set \(S_X=\{x_1, x_2, \cdots\}\)

  • A discrete random variable is finite if \(\underline{\text{its range is finite}}\).

Probability Mass Function (pmf)

Consider a discrete random variable \(X\) that assumes values from a finite or countable set \(S_X=\{x_1, x_2, \cdots\}\). The pmf of \(X\) is defined as:

\[p_X(x)=P[X=x]=P[\{{\zeta:X(\zeta)=x}\}] \]

Properties of Probability Mass Functions

  • \(p_X(x)\geq 0\)

  • \(\sum_{x\in S_X}p_X(x)=1\)

  • \(P[X\in B]=\sum_{x\in B}p_{X}(x)\) where \(B\subseteq S_X\)
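These properties are easy to check numerically. A minimal sketch, using a fair six-sided die as a hypothetical example (exact arithmetic via `fractions`):

```python
from fractions import Fraction

# pmf of a fair six-sided die (hypothetical example)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# p_X(x) >= 0 for every x
assert all(p >= 0 for p in pmf.values())

# the probabilities sum to 1
assert sum(pmf.values()) == 1

# P[X in B] = sum of p_X(x) over x in B; here B = "X is even"
B = {2, 4, 6}
p_B = sum(pmf[x] for x in B)
print(p_B)  # 1/2
```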

Cumulative Distribution Function (cdf)

A cumulative distribution function (cdf) of a random variable(r.v.) \(X\) is defined as

\[F_X(x)=P(X\leq x), \text{ for all real values } x \]

  • For a discrete r.v., \(F_X(a)=\sum_{x\leq a}p_X(x)\)
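For a discrete r.v. the cdf is a running sum of the pmf, so it is a non-decreasing step function that jumps by \(p_X(x)\) at each \(x\in S_X\). A minimal sketch (the die pmf is a hypothetical example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, illustrative

def cdf(a, pmf):
    """F_X(a) = sum of p_X(x) over all x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

print(cdf(3, pmf))    # 1/2
print(cdf(0.5, pmf))  # 0 (below the smallest value)
print(cdf(10, pmf))   # 1 (at or above the largest value)
```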

Expected Value

  • The expected value of a discrete r.v. is defined by

\[E[X]=\sum_{x\in S_X}xp_X(x) \]

  • The expected value is defined if the sum converges absolutely: \(\sum_{x}|x|p_X(x)<\infty\)

  • If this sum does not converge, then the expected value does not exist (DNE).
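Both cases can be sketched numerically; the fair die and the pmf \(P[X=2^k]=2^{-k}\) below are hypothetical examples:

```python
from fractions import Fraction

# E[X] for a fair die: sum of x * p_X(x)
pmf = {x: Fraction(1, 6) for x in range(1, 7)}
E = sum(x * p for x, p in pmf.items())
print(E)  # 7/2

# A pmf whose mean does not exist: P[X = 2**k] = 2**(-k), k = 1, 2, ...
# Every term of sum |x| p_X(x) equals 1, so the partial sums grow without bound.
partial = sum((2 ** k) * (0.5 ** k) for k in range(1, 51))
print(partial)  # 50.0 -- and it keeps growing with more terms
```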

Properties of Expected Value

In short, these properties express the linearity of expectation.

  • \(E[aX]=aE[X]\)

  • \(E[c]=c\)

  • \(E[X+c]=E[X]+c\)

  • \(E\left[\sum_i X_i\right]=\sum_i E[X_i]\)
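These identities can be checked directly for any finite pmf. A minimal sketch with a fair die (a hypothetical example; `a` and `c` are arbitrary constants):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die

def expect(f, pmf):
    """E[f(X)] for a discrete pmf."""
    return sum(f(x) * p for x, p in pmf.items())

a, c = 3, 5
E_X = expect(lambda x: x, pmf)
assert expect(lambda x: a * x, pmf) == a * E_X  # E[aX] = a E[X]
assert expect(lambda x: c, pmf) == c            # E[c] = c
assert expect(lambda x: x + c, pmf) == E_X + c  # E[X+c] = E[X] + c
print(E_X)  # 7/2
```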

Variance

Define \(D\) to be the difference between the value of the r.v. and its expected value:

\[D=X-E[X] \]

The variance of a r.v., \(\text{VAR}[X]\) (often denoted by \(\sigma^2\)), is defined as \(\underline{\text{the expected value of the squared difference}}\):

\[\text{VAR}[X]=E[D^2]=E[(X-E[X])^2]=E[X^2]-(E[X])^2 \]
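The definition and the shortcut formula can be checked against each other numerically. A sketch with a fair die (hypothetical example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
E = sum(x * p for x, p in pmf.items())          # E[X] = 7/2

# definition: E[(X - E[X])^2]
var_def = sum((x - E) ** 2 * p for x, p in pmf.items())
# shortcut: E[X^2] - (E[X])^2
var_short = sum(x * x * p for x, p in pmf.items()) - E * E

assert var_def == var_short
print(var_def)  # 35/12
```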

Properties of Variance

In short: adding a constant does not change the variance, while scaling by \(c\) scales it by \(c^2\) (note that variance is not linear).

  • \(\text{VAR}[c]=0\)

  • \(\text{VAR}[X+c]=\text{VAR}[X]\)

  • \(\text{VAR}[cX]=c^2\text{VAR}[X]\)

Moments & Central Moments

  • The \(n^{th}\) moment of a r.v. \(X\) is defined as

\(E[X^n]=\sum_{x}x^np_X(x)\)

  • The \(n^{th}\) central moment of a r.v. \(X\) is defined as

\(E[(X-E[X])^n]=\sum_{x}(x-E[X])^np_X(x)\)
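The mean and variance are the special cases \(n=1\) (moment) and \(n=2\) (central moment). A minimal sketch with a fair die (hypothetical example):

```python
from fractions import Fraction

pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die, illustrative

def moment(n, pmf):
    """n-th moment E[X^n]."""
    return sum(x ** n * p for x, p in pmf.items())

def central_moment(n, pmf):
    """n-th central moment E[(X - E[X])^n]."""
    mu = moment(1, pmf)
    return sum((x - mu) ** n * p for x, p in pmf.items())

print(moment(1, pmf))          # 7/2   (mean)
print(central_moment(2, pmf))  # 35/12 (variance)
print(central_moment(3, pmf))  # 0     (the pmf is symmetric about 7/2)
```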

Distribution

Bernoulli

\[p_X(0)=1-p, p_X(1)=p \]

  • \(E[X]=p\)

  • \(\text{VAR}[X]=p(1-p)\)

Binomial

\[p_X(k)=C_{n}^kp^k(1-p)^{n-k},\quad k=0, 1, \cdots, n \]

  • \(E[X]=np\)

  • \(\text{VAR}[X]=np(1-p)\)

Geometric

\[p_M(k)=(1-p)^{k-1}p,\quad k=1, 2, \cdots \]

  • \(E[M]=\frac{1}{p}\)

  • \(\text{VAR}[M]=\frac{1-p}{p^2}\)
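The stated means and variances can be verified by summing the pmfs directly. A sketch with arbitrary parameter values (the geometric series is truncated far into its negligible tail; Bernoulli is the \(n=1\) case of the binomial):

```python
from math import comb, isclose

p, n = 0.3, 10

# Binomial(n, p): enumerate the full pmf
binom = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}
mean_b = sum(k * q for k, q in binom.items())
var_b = sum(k * k * q for k, q in binom.items()) - mean_b ** 2
assert isclose(mean_b, n * p)           # E[X] = np
assert isclose(var_b, n * p * (1 - p))  # VAR[X] = np(1-p)

# Geometric(p): truncate the infinite sum
geom = {k: (1 - p) ** (k - 1) * p for k in range(1, 500)}
mean_g = sum(k * q for k, q in geom.items())
var_g = sum(k * k * q for k, q in geom.items()) - mean_g ** 2
assert isclose(mean_g, 1 / p)            # E[M] = 1/p
assert isclose(var_g, (1 - p) / p ** 2)  # VAR[M] = (1-p)/p^2
print(mean_b, var_b, mean_g, var_g)
```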

Poisson

\[P[N=k]=\frac{\alpha^k}{k!}e^{-\alpha},\quad k=0, 1, \cdots \]

(\(\alpha>0\) is a constant.)

  • \(E[N]=\alpha\)
Proof

\[\begin{aligned} E[N]&=\sum_{k=0}^\infty k\cdot\frac{\alpha^k}{k!}e^{-\alpha}\\\\ &=\alpha e^{-\alpha}\sum_{k=1}^\infty\frac{\alpha^{k-1}}{(k-1)!}\\\\ &=\alpha e^{-\alpha}\sum_{k=0}^\infty\frac{\alpha^{k}}{k!}\\\\ &=\alpha e^{-\alpha}\cdot e^\alpha=\alpha \end{aligned} \]

  • \(\text{VAR}[N]=\alpha\)
Proof

\[\begin{aligned} \text{VAR}[N]&=\sum_{k=0}^\infty\frac{\alpha^k}{k!}e^{-\alpha}\cdot (k-\alpha)^2\\\\ &=\frac{1}{e^\alpha}\left[\sum_{k=1}^\infty\frac{\alpha^k\cdot k}{(k-1)!}-2\sum_{k=1}^\infty\frac{\alpha^{k+1}}{(k-1)!}+\sum_{k=0}^\infty\frac{\alpha^{k+2}}{k!}\right]\\\\ &=\frac{1}{e^\alpha}\left[\sum_{k=0}^\infty\frac{\alpha^{k+1}\cdot(k+1)}{k!}-2\alpha^2e^\alpha+\alpha^2e^\alpha\right]\\\\ &=\frac{1}{e^\alpha}\left[\alpha^2e^\alpha+\alpha e^\alpha-2\alpha^2e^\alpha+\alpha^2e^\alpha\right]\\\\ &=\alpha \end{aligned} \]

  • \(P[N=k]\) achieves its maximum value at \(k=\lfloor\alpha\rfloor\) (when \(\alpha\) is an integer, \(k=\alpha\) and \(k=\alpha-1\) tie)

  • For \(\alpha\triangleq np\) held constant as \(n\rightarrow\infty\) (so \(p=\alpha/n\)),

\[\lim_{n\rightarrow \infty}C_{n}^kp^k(1-p)^{n-k}=\frac{\alpha^k}{k!}e^{-\alpha} \]

Proof

Consider \(p_0\), the probability of no successes:

\[\lim_{n\rightarrow \infty}p_0=\lim_{n\rightarrow \infty}(1-p)^n=\lim_{n\rightarrow \infty}\left(1-\frac{\alpha}{n}\right)^n=e^{-\alpha} \]

Consider the ratio

\[\begin{aligned} \frac{p_{k+1}}{p_k}&=\frac{C_{n}^{k+1}p^{k+1}(1-p)^{n-k-1}}{C_{n}^kp^k(1-p)^{n-k}}\\\\ &=\frac{(n-k)p}{(k+1)(1-p)}\\\\ &=\frac{\left(1-\frac{k}{n}\right)\alpha}{(k+1)(1-\frac{\alpha}{n})} \end{aligned} \]

Since

\[\lim_{n\rightarrow \infty}\frac{\left(1-\frac{k}{n}\right)\alpha}{(k+1)(1-\frac{\alpha}{n})}=\frac{\alpha}{k+1} \]

And

\[\lim_{n\rightarrow\infty}p_0=e^{-\alpha} \]

Combining these, induction on \(k\) gives

\[\lim_{n\rightarrow \infty}p_k=\frac{\alpha^k}{k!}e^{-\alpha} \]
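This convergence is easy to see numerically: fix \(\alpha\), set \(p=\alpha/n\), and compare the binomial pmf with the Poisson pmf as \(n\) grows (the values of \(\alpha\) and \(k\) below are arbitrary):

```python
from math import comb, exp, factorial

alpha, k = 4.0, 5

def poisson(k):
    """Poisson pmf P[N = k] with mean alpha."""
    return alpha ** k / factorial(k) * exp(-alpha)

# Binomial pmf with p = alpha/n approaches the Poisson pmf as n grows
for n in (10, 100, 10_000):
    p = alpha / n
    binom_k = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(n, binom_k, poisson(k))
```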

Joint probability mass function (joint pmf)

\[p_{X, Y}(i, j)=P[\{X=i\}\cap\{Y=j\}] \]

Joint moment

\[E[X^jY^k]=\sum_{x}\sum_{y}x^jy^kp_{X, Y}(x, y) \]

Covariance

\[\text{COV}[X, Y]=E[(X-E[X])(Y-E[Y])] \]

  • When the covariance is positive: if \(X\) is greater than its mean, \(Y\) also tends to be greater than its mean.

  • When the covariance is negative: if \(X\) is greater than its mean, \(Y\) tends to be less than its mean.

\[\text{COV}[X, Y]=E[XY]-E[X]\cdot E[Y] \]

  • \(X\) and \(Y\) are uncorrelated if \(\text{COV}[X, Y]=0\).

  • \(X\) and \(Y\) are uncorrelated if and only if \(E[XY]=E[X]\cdot E[Y]\)

\[\text{VAR}[X+Y]=\text{VAR}[X]+2\text{COV}[X, Y]+\text{VAR}[Y] \]

Proof

\[\begin{aligned} \text{VAR}[X+Y]&=E[(X+Y-E[X+Y])^2]\\\\ &=E[(X+Y-E[X]-E[Y])^2]\\\\ &=E[((X-E[X])+(Y-E[Y]))^2]\\\\ &=E[(X-E[X])^2]+2E[(X-E[X])(Y-E[Y])]+E[(Y-E[Y])^2]\\\\ &=\text{VAR}[X]+2\text{COV}[X, Y]+\text{VAR}[Y] \end{aligned} \]

  • If \(X\) and \(Y\) are uncorrelated, then \(\text{VAR}[X+Y]=\text{VAR}[X]+\text{VAR}[Y]\)
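A small numeric check of \(\text{COV}[X, Y]\) and the \(\text{VAR}[X+Y]\) identity, using a hypothetical joint pmf on \(\{0,1\}\times\{0,1\}\) in which \(X\) and \(Y\) tend to agree:

```python
from fractions import Fraction

# hypothetical joint pmf: X and Y tend to take the same value
joint = {(0, 0): Fraction(2, 5), (0, 1): Fraction(1, 10),
         (1, 0): Fraction(1, 10), (1, 1): Fraction(2, 5)}
assert sum(joint.values()) == 1

E = lambda f: sum(f(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

cov = E(lambda x, y: x * y) - EX * EY  # COV = E[XY] - E[X]E[Y]
var_x = E(lambda x, y: x * x) - EX ** 2
var_y = E(lambda x, y: y * y) - EY ** 2
var_sum = E(lambda x, y: (x + y) ** 2) - (EX + EY) ** 2

assert var_sum == var_x + 2 * cov + var_y
print(cov)  # 3/20 > 0: when X is above its mean, Y tends to be as well
```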

Correlation coefficient

\[\rho_{X, Y}=\frac{\text{COV}[X, Y]}{\sigma_X\sigma_Y} \]

where \(\sigma^2_X=\text{VAR}[X]\) and \(\sigma^2_Y=\text{VAR}[Y]\).

Range: \(\rho_{X, Y}\in[-1, 1]\).

(Comment: \(\rho_{X, Y}\) measures the strength of the linear relationship between \(X\) and \(Y\); it is closely related to linear regression.)
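A short sketch computing \(\rho_{X, Y}\) for a hypothetical joint pmf on \(\{0,1\}\times\{0,1\}\):

```python
from math import sqrt

# hypothetical joint pmf on {0,1} x {0,1}
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

E = lambda f: sum(f(x, y) * p for (x, y), p in joint.items())
EX, EY = E(lambda x, y: x), E(lambda x, y: y)

cov = E(lambda x, y: x * y) - EX * EY
sigma_x = sqrt(E(lambda x, y: x * x) - EX ** 2)
sigma_y = sqrt(E(lambda x, y: y * y) - EY ** 2)
rho = cov / (sigma_x * sigma_y)

assert -1 <= rho <= 1
print(rho)  # approximately 0.6 for this pmf
```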

Conditional pmf

\[p_{Y|X}(k|j)=\frac{p_{X, Y}(j, k)}{p_{X}(j)} \]

Conditional expectation

\[E[Y|x]=\sum_{y}y\cdot p_{Y|X}(y|x) \]

Conditional variance

\[\text{VAR}[Y|x]=E[Y^2|x]-(E[Y|x])^2 \]

Product Rule

\[p_{X, Y}(j, k)=p_{Y|X}(k|j)p_{X}(j) \]
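The conditional pmf, conditional moments, and the product rule can be sketched together for a hypothetical joint pmf:

```python
from fractions import Fraction

# hypothetical joint pmf of (X, Y) on {0,1} x {0,1}
joint = {(0, 0): Fraction(1, 4), (0, 1): Fraction(1, 4),
         (1, 0): Fraction(1, 8), (1, 1): Fraction(3, 8)}

def p_X(j):
    """marginal pmf of X"""
    return sum(p for (x, y), p in joint.items() if x == j)

def p_Y_given_X(k, j):
    """conditional pmf p_{Y|X}(k|j) = p_{X,Y}(j,k) / p_X(j)"""
    return joint[(j, k)] / p_X(j)

# conditional expectation and variance of Y given X = 1
E_Y_1 = sum(k * p_Y_given_X(k, 1) for k in (0, 1))
E_Y2_1 = sum(k * k * p_Y_given_X(k, 1) for k in (0, 1))
var_Y_1 = E_Y2_1 - E_Y_1 ** 2
print(E_Y_1, var_Y_1)  # 3/4 3/16

# product rule: p_{X,Y}(j,k) = p_{Y|X}(k|j) p_X(j)
assert all(joint[(j, k)] == p_Y_given_X(k, j) * p_X(j) for (j, k) in joint)
```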

Posted @ 2025-04-16 12:51 by Displace