# Gaussian Distribution - Notes (1)

### 1 - Univariate Gaussian Distribution

$p(x)=\frac{1}{\sqrt{2\pi}\sigma}\exp\left\{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right\} \tag{1.1}$

$\mu=E(x)=\int_{-\infty}^\infty xp(x)dx \tag{1.2}$

$\sigma^2=\int_{-\infty}^\infty(x-\mu)^2p(x)dx \tag{1.3}$
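The density (1.1) and the moment integrals (1.2) and (1.3) can be checked numerically; a minimal numpy sketch (the grid bounds and the example values $\mu=1$, $\sigma=2$ are arbitrary choices):

```python
import numpy as np

def gauss_pdf(x, mu=0.0, sigma=1.0):
    """Univariate normal density, eq. (1.1)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Check (1.2) and (1.3) by numerical integration on a wide grid.
x = np.linspace(-10.0, 10.0, 20001)
dx = x[1] - x[0]
p = gauss_pdf(x, mu=1.0, sigma=2.0)
mean = np.sum(x * p) * dx                # should be close to mu = 1
var = np.sum((x - mean) ** 2 * p) * dx   # should be close to sigma^2 = 4
```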

### 2 - Multivariate Gaussian Distribution

$p({\bf x})=\frac{1}{(2\pi)^{\frac{d}{2}}|\Sigma|^\frac{1}{2}}\exp\{-\frac{1}{2}({\bf x-\mu})^T{\Sigma}^{-1}({\bf x-\mu})\} \tag{2.1}$

where:

- ${\bf \mu}=[\mu_1,\mu_2,...,\mu_d]^T$ is the $d$-dimensional mean column vector;
- $\Sigma$ is the $d\times d$ covariance matrix;
- ${\Sigma}^{-1}$ is the inverse of $\Sigma$;
- $|\Sigma|$ is the determinant of $\Sigma$;
- $({\bf x-\mu})^T$ is the transpose of $({\bf x-\mu})$; and

$\mu=E(\bf x) \tag{2.2}$

$\Sigma=E\{(\bf x-\bf \mu)(\bf x - \mu)^T\}\tag{2.3}$

$\mu_i=E(x_i)=\int_{-\infty}^\infty x_ip(x_i)dx_i \tag{2.4}$

$p(x_i)=\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty p({\bf x})\,dx_1\cdots dx_{i-1}\,dx_{i+1}\cdots dx_d \tag{2.5}$

$\begin{eqnarray}\sigma_{ij}^2 &=&E[(x_i-\mu_i)(x_j-\mu_j)]\\ &=&\int_{-\infty}^\infty\int_{-\infty}^\infty(x_i-\mu_i)(x_j-\mu_j)p(x_i,x_j)dx_idx_j \end{eqnarray} \tag{2.6}$

$\Sigma= \begin{bmatrix} \sigma_{11}^2 & \sigma_{12}^2 & \cdots & \sigma_{1d}^2 \\ \sigma_{12}^2 & \sigma_{22}^2 & \cdots & \sigma_{2d}^2 \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1d}^2 & \sigma_{2d}^2 & \cdots & \sigma_{dd}^2 \end{bmatrix}$
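Equation (2.1) is straightforward to evaluate directly; a small numpy sketch (the 2-D diagonal-covariance values below are an arbitrary illustration). With a diagonal $\Sigma$ the features are independent, so the joint density must factor into univariate densities, which gives a sanity check:

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Multivariate normal density, eq. (2.1)."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 1.0])
Sigma = np.diag([1.0, 4.0])    # sigma_1 = 1, sigma_2 = 2
x = np.array([0.5, 2.0])
p = mvn_pdf(x, mu, Sigma)      # equals N(0.5; 0, 1) * N(2; 1, 4)
```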

The density is constant on the hyperellipsoids

$({\bf x}-\mu)^T{\Sigma}^{-1}({\bf x-\mu})=\text{constant} \tag{2.7}$

Define the squared Mahalanobis distance from $\bf x$ to $\mu$:

$\gamma^2=({\bf x}-\mu)^T{\Sigma}^{-1}({\bf x}-\mu)$

The volume of the hyperellipsoid of Mahalanobis radius $\gamma$ is

$V=V_d|\Sigma|^{\frac{1}{2}}\gamma^d$

where

$V_d=\begin{cases}\frac{\pi^{\frac{d}{2}}}{(\frac{d}{2})!},&d\ \text{even}\\ \frac{2^d\pi^{\frac{d-1}{2}}(\frac{d-1}{2})!}{d!},&d\ \text{odd} \end{cases}$
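The case formula for $V_d$ (the volume of the $d$-dimensional unit hypersphere) can be sanity-checked against the known low-dimensional values: length 2, circle area $\pi$, sphere volume $4\pi/3$. A short sketch:

```python
from math import pi, factorial

def unit_ball_volume(d):
    """V_d from the case formula above (d-dimensional unit hypersphere)."""
    if d % 2 == 0:
        return pi ** (d // 2) / factorial(d // 2)
    return 2 ** d * pi ** ((d - 1) // 2) * factorial((d - 1) // 2) / factorial(d)
```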

#### 2.1 - 2-D Representation of the Mahalanobis Distance for the Multivariate Gaussian

With the eigendecomposition $\Sigma={\bf U\Lambda U}^T$, where $\bf U$ is orthonormal (${\bf U}^{-1}={\bf U}^T$) and $\Lambda=diag(\lambda_1,...,\lambda_d)$:

${\bf\Sigma}^{-1}={\bf U^{-T}\Lambda^{-1}U^{-1}}={\bf U\Lambda^{-1}U^T}=\sum_{i=1}^d\frac{1}{\lambda_i}{\bf u}_i{\bf u}_i^T$

$\begin{eqnarray}({\bf x-\mu})^T{\Sigma}^{-1}({\bf x-\mu}) &=&({\bf x-\mu})^T\left(\sum_{i=1}^d\frac{1}{\lambda_i}{\bf u}_i{\bf u}_i^T\right)({\bf x-\mu})\\ &=&\sum_{i=1}^d\frac{1}{\lambda_i}({\bf x-\mu})^T{\bf u}_i{\bf u}_i^T({\bf x-\mu})\\ &=&\sum_{i=1}^d\frac{y_i^2}{\lambda_i} \end{eqnarray}$

where $y_i={\bf u}_i^T({\bf x-\mu})$.

In two dimensions, setting the Mahalanobis distance to 1 gives the ellipse

$\frac{y_1^2}{\lambda_1}+\frac{y_2^2}{\lambda_2}=1$

Note: the Mahalanobis distance is therefore the Euclidean distance computed after centering by $\bf \mu$, rotating into the eigenbasis $\bf U$, and rescaling each coordinate by $\frac{1}{\sqrt{\lambda_i}}$.
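The identity above, that the quadratic form equals $\sum_i y_i^2/\lambda_i$ with $y_i={\bf u}_i^T({\bf x}-\mu)$, is easy to verify numerically; the covariance below is an arbitrary symmetric positive-definite example:

```python
import numpy as np

# Arbitrary symmetric positive-definite covariance for the check.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3.0 * np.eye(3)
mu = np.array([1.0, -2.0, 0.5])
x = np.array([2.0, 0.0, 1.0])

# Direct quadratic form (x - mu)^T Sigma^{-1} (x - mu)
direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)

# Eigen-route: y_i = u_i^T (x - mu), then sum_i y_i^2 / lambda_i
lam, U = np.linalg.eigh(Sigma)   # Sigma = U diag(lam) U^T
y = U.T @ (x - mu)
via_eig = np.sum(y ** 2 / lam)
```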

#### 2.2 - Maximum Likelihood Estimation for the Multivariate Gaussian

Given i.i.d. samples ${\bf x}_1,...,{\bf x}_N$, the ML estimates are

$\hat\mu=\frac{1}{N}\sum_{i=1}^N{\bf x}_i=\overline{\bf x}\tag{2.2.1}$

$\begin{eqnarray}\hat{\Sigma} &=&\frac{1}{N}\sum_{i=1}^N({\bf x}_i-{\bf\overline x})({\bf x}_i-{\bf\overline x})^T\\ &=&\frac{1}{N}\sum_{i=1}^N\left({\bf x}_i{\bf x}_i^T-{\bf x}_i{\bf \overline x}^T-{\bf \overline x}{\bf x}_i^T+{\bf \overline x}{\bf \overline x}^T\right)\\ &=&\frac{1}{N}\sum_{i=1}^N\left({\bf x}_i{\bf x}_i^T\right)-2{\bf \overline x}{\bf \overline x}^T+{\bf \overline x}{\bf \overline x}^T\\ &=&\frac{1}{N}\sum_{i=1}^N\left({\bf x}_i{\bf x}_i^T\right)-{\bf \overline x}{\bf \overline x}^T \end{eqnarray}\tag{2.2.2}$
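The first and last lines of (2.2.2) are two ways to compute the same matrix, which can be checked against each other on random data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))   # N = 500 samples of dimension d = 3
N = X.shape[0]
xbar = X.mean(axis=0)

# First line of (2.2.2): average outer product of centered samples
S1 = (X - xbar).T @ (X - xbar) / N

# Last line of (2.2.2): mean of x_i x_i^T minus xbar xbar^T
S2 = X.T @ X / N - np.outer(xbar, xbar)
```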

${\bf x^TAx}=tr({\bf x^TAx})=tr({\bf xx^TA})=tr({\bf Axx^T})\tag{2.2.3}$
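A quick numerical check of the trace identity (2.2.3):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=4)
A = rng.normal(size=(4, 4))

lhs = x @ A @ x                      # scalar x^T A x
mid = np.trace(np.outer(x, x) @ A)   # tr(x x^T A)
rhs = np.trace(A @ np.outer(x, x))   # tr(A x x^T)
```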

Writing $\Lambda=\Sigma^{-1}$ for the precision matrix, the likelihood of the data $D=\{{\bf x}_1,...,{\bf x}_N\}$ is

$p(D|\mu,\Sigma)=\prod_{i=1}^N\frac{1}{(2\pi)^{d/2}}|\Lambda|^{1/2}\exp\left[-\frac{1}{2}({\bf x}_i-\mu)^T{\Lambda}({\bf x}_i-\mu)\right]\tag{2.2.4}$

$\begin{eqnarray} \scr L({\bf \mu},\Sigma) &=&\log p(D|{\bf \mu},\Sigma)\\ &=&\text{const}+\frac{N}{2}\log|{\bf \Lambda}|-\frac{1}{2}\sum_{i=1}^N({{\bf x}_i-\mu})^T{\bf \Lambda}({{\bf x}_i-\mu}) \end{eqnarray}\tag{2.2.5}$

Let ${\bf y}_i={\bf x}_i-\mu$. Then

$\begin{eqnarray} \frac{d}{d\mu}\left(({{\bf x}_i-\mu})^T{\Sigma}^{-1}({{\bf x}_i-\mu})\right) &=&\frac{d}{d{\bf y}_i}\left({\bf y}_i^T\Sigma^{-1}{\bf y}_i\right)\frac{d{\bf y}_i}{d\mu}\\ &=&(\Sigma^{-1}+\Sigma^{-T}){\bf y}_i(-1)\\ &=&-(\Sigma^{-1}+\Sigma^{-T}){\bf y}_i \end{eqnarray}$

Since $\Sigma$ is symmetric, $\Sigma^{-T}=\Sigma^{-1}$, so:

$\begin{eqnarray} \frac{d}{d\mu}{\scr L}(\mu,\Sigma) &=&0+\frac{d}{d\mu}\left(-\frac{1}{2}\sum_{i=1}^N({{\bf x}_i-\mu})^T{\bf \Lambda}({{\bf x}_i-\mu})\right)\\ &=&-\frac{1}{2}\sum_{i=1}^N\left(-(\Sigma^{-1}+\Sigma^{-T}){\bf y}_i\right)\\ &=&\sum_{i=1}^N\Sigma^{-1}{\bf y}_i\\ &=&\Sigma^{-1}\sum_{i=1}^N({\bf x}_i-\mu)=0 \end{eqnarray}$

Since $\Sigma^{-1}$ is nonsingular, this forces $\sum_{i=1}^N({\bf x}_i-\mu)=0$, which recovers $\hat\mu=\overline{\bf x}$ in (2.2.1).

To optimize over $\Sigma$ (equivalently over $\Lambda$), rewrite the quadratic terms with the trace trick (2.2.3). From

$\bf A_1B+A_2B=(A_1+A_2)B$
$tr({\bf A})+tr({\bf B})=tr(\bf A+B)$

it follows that

$tr({\bf A_1 B})+tr({\bf A_2 B})=tr[(\bf A_1+A_2)B]$

$\begin{eqnarray} \scr L({\bf \mu},\Sigma) &=&\log p(D|{\bf \mu},\Sigma)\\ &=&\text{const}+\frac{N}{2}\log|{\bf \Lambda}|-\frac{1}{2}\sum_{i=1}^Ntr[({{\bf x}_i-\mu})({{\bf x}_i-\mu})^T{\bf \Lambda}]\\ &=&\text{const}+\frac{N}{2}\log|{\bf \Lambda}|-\frac{1}{2}tr({\bf S_\mu}{\bf \Lambda}) \end{eqnarray}\tag{2.2.6}$

where ${\bf S}_\mu=\sum_{i=1}^N({\bf x}_i-\mu)({\bf x}_i-\mu)^T$ is the scatter matrix.

Setting the derivative with respect to ${\bf \Lambda}$ to zero:

$\frac{d\scr L(\mu,\Sigma)}{d{\bf \Lambda}}=\frac{N}{2}{\bf \Lambda^{-T}}-\frac{1}{2}{\bf S}_\mu^T=0$

${\bf \Lambda^{-T}}={\bf \Lambda^{-1}}=\Sigma=\frac{1}{N}{\bf S}_\mu$

$\hat{\Sigma} =\frac{1}{N}\sum_{i=1}^N({\bf x}_i-{\bf\hat\mu})({\bf x}_i-{\bf\hat\mu})^T$

#### 2.3 - Classification Based on the Multivariate Gaussian

1 - All classes share the same covariance, $\Sigma_{c_k}=\Sigma$:

$p(X={\bf x}|Y=c_k,{\bf \theta}) = {\cal N}({\bf x|\mu}_{c_k},\Sigma_{c_k})\tag{3.1}$

Note: the probability of $\bf x$ given class $k$ is computed by selecting the class-$k$ samples and evaluating their multivariate Gaussian density. If $\Sigma_{c_k}$ is diagonal (i.e., the features are mutually independent), the model reduces to naive Bayes.

$\begin{eqnarray}\hat y({\bf x}) &=&\arg\max_{c_k}P(Y={c_k}|X={\bf x})\\ &=&\arg\max_{c_k}\frac{P(Y={c_k},X={\bf x})}{P(X={\bf x})} \end{eqnarray}\tag{3.2}$

Since the denominator in (3.2) does not depend on $c_k$:

$\hat y({\bf x})=\arg\max_{c_k}P(X={\bf x}|Y={c_k})P(Y={c_k})$

$P(X={\bf x}|Y={c_k})=\frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}}\exp[-\frac{1}{2}({\bf x-\mu}_{c_k})^T\Sigma^{-1}({\bf x-\mu}_{c_k})]$
$P(Y={c_k})=\pi_{c_k}$

$\begin{eqnarray}P(Y={c_k}|X={\bf x}) &\propto& \pi_{c_k}\exp[-\frac{1}{2}({\bf x-\mu}_{c_k})^T\Sigma^{-1}({\bf x-\mu}_{c_k})]\\ &=&\pi_{c_k}\exp[-\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x}+\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf \mu}_{c_k}+\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf x}-\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf \mu}_{c_k}]\\ &=&\pi_{c_k}\exp[-\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x}+{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf x}-\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf \mu}_{c_k}]\\ &=&\exp[{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf x}-\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf \mu}_{c_k}+\log\pi_{c_k}]\exp[-\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x}]\\ &=&\frac{\exp[{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf x}-\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf \mu}_{c_k}+\log\pi_{c_k}]}{\exp[\frac{1}{2}{\bf x}^T\Sigma^{-1}{\bf x}]} \end{eqnarray}$

$P(Y={c_k}|X={\bf x})=\frac{\exp({\beta_{c_k}^T{\bf x}+\gamma_{c_k})}}{\sum_{k'=1}^{|c|}\exp({\beta_{c_{k'}}^T{\bf x}+\gamma_{c_{k'}})}}=S(\eta)_{c_k}$

where $\beta_{c_k}=\Sigma^{-1}{\bf \mu}_{c_k}$, $\gamma_{c_k}=-\frac{1}{2}{\bf \mu}_{c_k}^T\Sigma^{-1}{\bf \mu}_{c_k}+\log\pi_{c_k}$, and $\eta_{c_k}=\beta_{c_k}^T{\bf x}+\gamma_{c_k}$.

$S(\eta)_{c_k}=\frac{\exp(\eta_{c_k})}{\sum_{k'=1}^{|c|}\exp(\eta_{c_{k'}})}$
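The claim that the shared-covariance Gaussian posterior is exactly a softmax over linear scores can be verified numerically; the two-class parameters below are arbitrary illustrative values:

```python
import numpy as np

# Arbitrary two-class, 2-D example with a shared covariance.
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0])]
pis = [0.4, 0.6]
x = np.array([1.0, 0.5])
Si = np.linalg.inv(Sigma)

# Posterior via Bayes' rule; the (2 pi)^{d/2} |Sigma|^{1/2} factor is
# shared by all classes and cancels in the normalization.
joint = np.array([pis[k] * np.exp(-0.5 * (x - mus[k]) @ Si @ (x - mus[k]))
                  for k in range(2)])
posterior = joint / joint.sum()

# Softmax over the linear scores eta_k = beta_k^T x + gamma_k
etas = np.array([mus[k] @ Si @ x
                 - 0.5 * mus[k] @ Si @ mus[k]
                 + np.log(pis[k]) for k in range(2)])
softmax = np.exp(etas) / np.exp(etas).sum()
```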

The softmax is so named because it acts like a smoothed (soft) version of the max function.

The decision boundary between classes $c_k$ and $c_{k'}$ is where the posteriors are equal:

$P(Y={c_k}|X={\bf x})=P(Y={c_k'}|X={\bf x})$
$\beta_{c_k}^T{\bf x}+\gamma_{c_k}=\beta_{c_k'}^T{\bf x}+\gamma_{c_k'}$
${\bf x}^T(\beta_{c_k'}-\beta_{c_k})=\gamma_{c_k}-\gamma_{c_k'}$

so the boundary is linear in $\bf x$.
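A point constructed to satisfy the boundary equation does receive equal (unnormalized) posteriors from both classes; the parameters below are arbitrary:

```python
import numpy as np

# Arbitrary two-class setup with identity covariance and equal priors.
Sigma = np.eye(2)
mu1, mu2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
pi1 = pi2 = 0.5
Si = np.linalg.inv(Sigma)

# Difference of the linear scores: (beta_2 - beta_1)^T x + (gamma_2 - gamma_1)
beta = Si @ (mu2 - mu1)
gamma = -0.5 * mu2 @ Si @ mu2 + 0.5 * mu1 @ Si @ mu1 + np.log(pi2 / pi1)

# x = (1, 5) satisfies beta^T x + gamma = 0, so it lies on the boundary;
# its unnormalized posteriors for the two classes should coincide.
x = np.array([1.0, 5.0])
p1 = pi1 * np.exp(-0.5 * (x - mu1) @ Si @ (x - mu1))
p2 = pi2 * np.exp(-0.5 * (x - mu2) @ Si @ (x - mu2))
```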


posted @ 2018-10-11 16:15 仙守