# VAE (Variational Auto-Encoder)

## The Structure of a VAE

1. To make the output resemble the input as closely as possible, the gap between them should be minimized. This part is measured with an MSE loss, i.e., we minimize the MSE between input and output.
2. If during training we only minimize the error between input and output, then as training proceeds $$\sigma$$ tends toward 0, the VAE degenerates into a plain AE, the model overfits the data, and the noise in the encoding vanishes, so the model can no longer generate unseen data. To solve this, we constrain $$\mu$$ and $$\sigma$$ so that the normal distribution they define stays close to the standard normal distribution. Concretely, we compute the KL divergence between $$\mathcal{N}\left(\mu, \sigma^2\right)$$ and $$\mathcal{N}\left(0, 1\right)$$, i.e., we minimize the following expression (the derivation is given below):

$\mathrm{KL}(\mathcal{N}\left(\mu, \sigma^2\right)\|\mathcal{N}\left(0, 1\right))=\frac{1}{2}\left(-\log \sigma^{2}+\mu^{2}+\sigma^{2}-1\right)$
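The full objective can be sketched in a few lines of NumPy (the function name, argument names, and array shapes are our own assumptions; in practice this would be written in a deep-learning framework and averaged over a batch):

```python
import numpy as np

def vae_loss(x_hat, x, mu, log_var):
    """Hypothetical VAE objective: reconstruction MSE plus the KL term above.

    x_hat, x    : reconstruction and input, arrays of the same shape
    mu, log_var : encoder outputs per latent dimension (sigma^2 = exp(log_var))
    """
    recon = np.sum((x_hat - x) ** 2)  # MSE reconstruction term
    # 0.5 * (-log sigma^2 + mu^2 + sigma^2 - 1), summed over latent dimensions
    kl = 0.5 * np.sum(-log_var + mu ** 2 + np.exp(log_var) - 1)
    return recon + kl
```

Note that when $$\mu = 0$$ and $$\sigma = 1$$ (i.e., `log_var = 0`), the KL term is exactly zero, matching the formula.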

## Derivation of the KL Loss

### Preliminaries

$\mathrm{KL}(P \| Q)=\int P(x) \log \frac{P(x)}{Q(x)} d x$

$E(X)=\int_{-\infty}^{\infty} x f(x) d x$

$E(Y)=E(g(X))=\int_{-\infty}^{\infty} g(x) f(x) d x$

$\mathrm D(\mathrm X) = \mathrm E\big([\mathrm X-\mathrm E(\mathrm X)]^2\big)= \mathrm E(\mathrm X^2)-[\mathrm E(\mathrm X)]^2$
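The variance identity $$\mathrm D(\mathrm X)=\mathrm E(\mathrm X^2)-[\mathrm E(\mathrm X)]^2$$, used in the derivation below, can be checked numerically (the sample size and the distribution parameters here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)  # X ~ N(2, 3^2)

var_direct = np.mean((x - x.mean()) ** 2)       # E([X - E(X)]^2)
var_identity = np.mean(x ** 2) - x.mean() ** 2  # E(X^2) - [E(X)]^2
# Both estimates agree and are close to sigma^2 = 9
```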

The probability density function of $$X \sim N\left(\mu, \sigma^{2}\right)$$:

$f(x)=\frac{1}{\sqrt{2 \pi} \sigma} \exp \left(-\frac{(x-\mu)^{2}}{2 \sigma^{2}}\right)$

### The Derivation

\begin{aligned} \mathrm{KL}(\mathcal{N}\left(\mu, \sigma^2\right)\|\mathcal{N}\left(0, 1\right)) &=\int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\left(\log \frac{\exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big) \big/ \sqrt{2 \pi \sigma^{2}}}{\exp\big({-x^{2} / 2}\big) \big/ \sqrt{2 \pi}}\right)\ dx\\ &=\int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\log \Big( \frac{1}{\sqrt{\sigma^2}}\cdot \exp\big(\tfrac 1 2(x^2-(x-\mu)^2/\sigma^2)\big) \Big)\ d x\\ &=-\frac 1 2 \int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\Big(\log\sigma^2-x^2+(x-\mu)^2/\sigma^2\Big)\ dx \end{aligned}

The first term: since the density integrates to 1,

\begin{aligned} &\int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\cdot \log\sigma^2 \ dx\\ &=\log\sigma^2\int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big) \ dx\\ &=\log\sigma^2 \end{aligned}

The second term: using $$\mathrm E(\mathrm X^2)=\mathrm D(\mathrm X)+[\mathrm E(\mathrm X)]^2$$,

\begin{aligned} \int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\cdot x^2 \ dx &=\mathrm E(\mathrm X^2) \\ &=\mathrm D(\mathrm X)+[\mathrm E(\mathrm X)]^2\\ &=\sigma^2+\mu^2 \end{aligned}

The third term: it is $$1/\sigma^2$$ times the variance,

\begin{aligned} \int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\cdot \frac{(x-\mu)^2}{\sigma^2} \ dx &=\frac {1} {\sigma^2} \int \frac{(x-\mu)^2}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\ dx\\ &=\frac {1} {\sigma^2} \cdot \mathrm E\big([\mathrm X-\mathrm E(\mathrm X)]^2\big)\\ &=\frac {1} {\sigma^2} \cdot \mathrm D(\mathrm X) \\ &= \frac {1} {\sigma^2} \cdot \sigma^2\\ &=1 \end{aligned}

Putting the three terms together:

\begin{aligned} \mathrm{KL}(\mathcal{N}\left(\mu, \sigma^2\right)\|\mathcal{N}\left(0, 1\right)) &=-\frac 1 2 \int \frac{1}{\sqrt{2 \pi \sigma^{2}}} \exp\big({-(x-\mu)^{2} / 2 \sigma^{2}}\big)\Big(\log\sigma^2-x^2+(x-\mu)^2/\sigma^2\Big)\ dx\\ &=-\frac 1 2(\log\sigma^2-\sigma^2-\mu^2+1) \end{aligned}
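The closed form can be sanity-checked against direct numerical integration of $$\int p \log(p/q)\,dx$$ (the integration grid and the test values $$\mu=1.5$$, $$\sigma=0.7$$ are arbitrary choices):

```python
import numpy as np

mu, sigma = 1.5, 0.7
x = np.linspace(-20.0, 20.0, 400_001)
dx = x[1] - x[0]

p = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)  # N(mu, sigma^2)
q = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)                                     # N(0, 1)

kl_numeric = np.sum(p * np.log(p / q)) * dx  # Riemann sum of the KL integral
kl_closed = 0.5 * (-np.log(sigma ** 2) + mu ** 2 + sigma ** 2 - 1)
# kl_numeric and kl_closed agree to high precision
```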

# Summary

- An AE is mainly used to compress and reconstruct data; for generating data, a VAE is used.
- An AE maps data directly to a numeric code, whereas a VAE first maps data to a distribution and then samples a numeric code from that distribution.
- The drawback of a VAE is that the generated data are not necessarily very "real"; to make the generated data more realistic, a GAN is needed.

# References

https://datawhalechina.github.io/leeml-notes/#/chapter29/chapter29

Carl Doersch, *Tutorial on Variational Autoencoders*

posted @ 2021-04-21 18:49 by 火锅先生