Variational Autoencoder (VAE)

1. Background

1.1 ELBO

1.1.1 Why introduce a latent variable \(z\)?

Because the objects we observe in the real world can themselves be seen as generated from higher-level representations, which summarize abstract attributes such as color, size, and shape.

1.1.2 How is the ELBO (Evidence Lower Bound) derived?

An unconditional generative model learns to model the true data distribution \(p\left (x\right )\), so we have:

\[\begin{align}
\log{\underbrace{p\left (x\right )}_{\text{evidence}}} =& \log{p\left (x\right )}\int \underbrace{q_{\phi}\left (z\mid x\right )}_{\text{approximate posterior}}\,dz \\
=&\int q_{\phi}\left (z\mid x\right )\log{p\left (x\right )}\,dz \\
=&\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{p\left (x\right )}\right ] \\
=&\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (x,z\right )}{p\left (z\mid x\right )}}\right ] \\
=&\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [ \log{\frac{p\left (x,z\right )\,q_{\phi}\left (z\mid x\right )}{p\left (z\mid x\right )\,q_{\phi}\left (z\mid x\right )}}\right ] \\
=&\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (x, z\right )}{q_{\phi}\left (z\mid x\right )}}\right ] + \mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{q_{\phi}\left (z\mid x\right )}{p\left (z\mid x\right )}}\right ] \\
=&\underbrace{\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (x,z\right )}{q_{\phi}\left (z\mid x\right )}}\right ]}_{\text{ELBO}} + \underbrace{D_{KL}\left (\underbrace{q_{\phi}\left (z\mid x\right )}_{\text{approximate posterior}} \,\Vert\, \underbrace{p\left (z\mid x\right )}_{\text{true posterior}}\right )}_{\geq 0} \\
&\geq \underbrace{\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (x, z\right )}{q_{\phi}\left (z\mid x\right )}}\right ]}_{\text{ELBO}}
\end{align}\]

1.1.3 Why maximize the ELBO?

Reason 1: we want the learned approximate posterior \(q_{\phi}\left (z\mid x\right )\) to come as close as possible to the true posterior \(p\left (z\mid x\right )\), but the \(D_{KL}\) term in Eq. (7) cannot be computed directly:

\[\begin{align} \min_{\phi}{\underbrace{D_{KL}\left (\underbrace{\underbrace{q_{\phi}\left (z\mid x\right )}_{\text{approximate posterior}} }_{\text{Encoder is learnable}} \,\Vert\, \underbrace{\underbrace{p\left (z\mid x\right )}_{\text{true posterior}}}_{\text{unknown}}\right )}_{\text{intractable}}} \end{align}\]

Reason 2: for any sample \(x_i \sim p\left (x\right )\), \(p\left (x_i\right )\) is a constant, so \(\max_{\phi}{\text{ELBO}}\) is equivalent to \(\min_{\phi}{D_{KL}}\):

\[\begin{align} \because\log{\underbrace{p\left (x_i\right )}_{\text{constant}}} =& \underbrace{\mathbb{E}_{q_{\phi}\left (z\mid x_i\right )}\left [\log{\frac{p\left (x_{i},z\right )}{q_{\phi}\left (z\mid x_i\right )}}\right ]}_{\text{ELBO}} + \underbrace{D_{KL}\left (\underbrace{q_{\phi}\left (z\mid x_i\right )}_{\text{approximate posterior}} \,\Vert\, \underbrace{p\left (z\mid x_i\right )}_{\text{true posterior}}\right )}_{\geq 0} \\ \therefore \min_{\phi}{D_{KL}} &\iff \max_{\phi}{\text{ELBO}} \end{align}\]
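The identity \(\log p(x) = \text{ELBO} + D_{KL}\) is easy to sanity-check numerically. Below is a small self-contained sketch, using an invented discrete toy model (the probability values are made up purely for illustration), which verifies the identity for an arbitrary choice of \(q\):

```python
import math

# Toy discrete model: latent z in {0, 1}, observation x in {0, 1}.
# All numbers below are illustrative, not taken from any real model.
p_z = {0: 0.6, 1: 0.4}                       # prior p(z)
p_x_given_z = {0: {0: 0.9, 1: 0.1},          # likelihood p(x|z)
               1: {0: 0.2, 1: 0.8}}

x = 1                                        # observed sample

# Evidence p(x) = sum_z p(x|z) p(z)
p_x = sum(p_x_given_z[z][x] * p_z[z] for z in p_z)

# An arbitrary approximate posterior q(z|x) -- need not be the true one.
q = {0: 0.5, 1: 0.5}

# ELBO = E_q[log p(x, z) / q(z|x)]
elbo = sum(q[z] * math.log(p_x_given_z[z][x] * p_z[z] / q[z]) for z in q)

# KL(q || p(z|x)), with the true posterior p(z|x) = p(x|z) p(z) / p(x)
kl = sum(q[z] * math.log(q[z] / (p_x_given_z[z][x] * p_z[z] / p_x)) for z in q)

# log p(x) = ELBO + KL holds exactly, and KL >= 0,
# so the ELBO really is a lower bound on the evidence.
print(math.log(p_x), elbo + kl, kl)
```

Because the KL term is non-negative for any \(q\), the printed ELBO-plus-KL value matches \(\log p(x)\) exactly while the ELBO alone sits below it.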

2. VAE (Variational Autoencoder)

2.1 Why "Variational"?

Because the \(q_{\phi}\left (z\mid x\right )\) we optimize over is restricted to a family of distributions parameterized by \(\phi\); optimizing over such a variational family is where the name comes from.

2.2 Why "Autoencoder"?

Because, like an autoencoder, the model compresses the data into a lower-dimensional representation that captures its salient information.

2.3 The VAE objective

\[\begin{align}
&\max_{\phi}\underbrace{\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (x, z\right )}{q_{\phi}\left (z\mid x\right )}}\right ]}_{\text{ELBO}} \\
=& \max_{\phi,\theta}\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p_{\theta}\left (x\mid z\right )p\left (z\right )}{q_{\phi}\left (z\mid x\right )}}\right ] \\
=&\max_{\phi,\theta}\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{p_{\theta}\left (x\mid z\right )}\right ] + \mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\frac{p\left (z\right )}{q_{\phi}\left (z\mid x\right )}}\right ]\\
=&\max_{\phi,\theta}\underbrace{\mathbb{E}_{q_{\phi}\left (z\mid x\right )}\left [\log{\underbrace{p_{\theta}\left (x\mid z\right )}_{\text{Decoder}}}\right ]}_{\text{reconstruction term}} - \underbrace{D_{KL}\left (\underbrace{q_{\phi}\left (z\mid x\right )}_{\text{Encoder}} \,\Vert\, \underbrace{p\left (z\right )}_{\text{prior}}\right )}_{\text{prior matching term}} \\
\approx & \max_{\phi,\theta}\frac{1}{L}\sum_{l=1}^{L}\log{p_{\theta}\left (x\mid z^{(l)}\right )} - D_{KL}\left (\underbrace{q_{\phi}\left (z\mid x\right )}_{\mathcal{N}\left (\mu,\sigma^2\right )} \,\Vert\, \underbrace{p\left (z\right )}_{\mathcal{N}\left (0,1\right )}\right ) \quad \text{(Monte Carlo estimate)} \\
=&\max_{\phi,\theta}\frac{1}{L}\sum_{l=1}^{L}\log{p_{\theta}\left (x\mid z^{(l)}\right )} - \frac{1}{2}\left (-\log{\sigma^2} + \mu^2 + \sigma^2 - 1\right )
\end{align}\]

The objective consists of two terms:

  • the reconstruction term forces the decoder to learn to recover the original sample from the latent variable \(\boldsymbol{z}\);
  • the prior matching term forces the encoder to learn to map the original samples into the prior distribution (a standard normal).
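The closed-form KL in the last line of the objective, \(\tfrac{1}{2}(-\log\sigma^2 + \mu^2 + \sigma^2 - 1)\) for a univariate \(\mathcal{N}(\mu,\sigma^2)\) against \(\mathcal{N}(0,1)\), can be checked against a direct Monte Carlo estimate. The sketch below uses arbitrary illustrative values for \(\mu\) and \(\sigma\):

```python
import math
import random

def kl_closed_form(mu, sigma):
    """KL( N(mu, sigma^2) || N(0, 1) ), per the last line of the objective."""
    return 0.5 * (-math.log(sigma ** 2) + mu ** 2 + sigma ** 2 - 1.0)

def kl_monte_carlo(mu, sigma, n=200_000, seed=0):
    """Estimate the same KL as E_q[log q(z) - log p(z)] by sampling z ~ q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(mu, sigma)
        log_q = -0.5 * math.log(2 * math.pi * sigma ** 2) \
                - (z - mu) ** 2 / (2 * sigma ** 2)
        log_p = -0.5 * math.log(2 * math.pi) - z ** 2 / 2
        total += log_q - log_p
    return total / n

mu, sigma = 0.7, 1.3   # arbitrary encoder outputs, for illustration only
print(kl_closed_form(mu, sigma), kl_monte_carlo(mu, sigma))
```

Note that the closed form vanishes exactly at \(\mu = 0, \sigma = 1\), which is why the prior matching term pulls the encoder's output distribution toward the standard normal.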

2.4 VAE model architecture

(Figure: schematic of the VAE model.)

3. Training a VAE

During training, batches of images are fed into the model. Each image is passed through the encoder to produce \(\mu\) and \(\sigma\), from which a latent variable \(\boldsymbol{z} \sim \mathcal{N}\left (\mu, \sigma^2\right )\) is sampled; the decoder then generates the reconstructed image. The overall flow is:

\[\underbrace{x}_{x \sim p\left (x\right )} \rightarrow\underbrace{\text{Encoder}}_{q_{\phi}\left (z\mid x\right )}\rightarrow \mu,\sigma \rightarrow \underbrace{\underbrace{z\sim \mathcal{N}\left (\mu,\sigma^2\right )}_{z=\mu + \sigma \odot \epsilon,\ \text{with } \epsilon \sim \mathcal{N}\left (0,I\right )}}_{\text{reparameterization trick}} \rightarrow \underbrace{\text{Decoder}}_{p_{\theta}\left (x\mid z\right )}\rightarrow\hat{x} \]

The reparameterization trick makes the whole pipeline differentiable: \(\mu\) and \(\sigma\) become deterministic, differentiable outputs of the encoder, while the randomness is isolated in \(\epsilon\), which is treated as a constant that requires no gradient and is kept out of the computation graph.
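The trick can be illustrated in a few lines. This sketch (with illustrative values for \(\mu\) and \(\sigma\)) checks both that \(z = \mu + \sigma\epsilon\) has the intended distribution and that, for a fixed \(\epsilon\), \(z\) depends on \(\mu\) and \(\sigma\) deterministically with \(\partial z/\partial\mu = 1\) and \(\partial z/\partial\sigma = \epsilon\):

```python
import random
import statistics

rng = random.Random(0)
mu, sigma = 2.0, 0.5   # pretend these are encoder outputs for one image

# Reparameterization: sample eps ~ N(0, 1), then set z = mu + sigma * eps.
# The randomness lives entirely in eps; mu and sigma enter deterministically,
# so gradients can flow through them while eps is treated as a constant.
eps = [rng.gauss(0.0, 1.0) for _ in range(100_000)]
z = [mu + sigma * e for e in eps]

# z indeed follows N(mu, sigma^2):
print(statistics.mean(z), statistics.stdev(z))

# For a fixed eps, dz/dmu = 1 and dz/dsigma = eps,
# verified here with finite differences:
h, e = 1e-4, eps[0]
dz_dmu = ((mu + h + sigma * e) - (mu + sigma * e)) / h
dz_dsigma = ((mu + (sigma + h) * e) - (mu + sigma * e)) / h
print(dz_dmu, dz_dsigma, e)
```

Sampling \(z\) directly from \(\mathcal{N}(\mu, \sigma^2)\) would bury \(\mu\) and \(\sigma\) inside a non-differentiable sampling operation; the rewrite above is what lets backpropagation reach the encoder.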

4. Inference with a VAE

At inference time, generating a new sample only requires sampling the latent variable \(z\) from a standard normal distribution and decoding it, because the prior matching term in the VAE objective pushes \(q_{\phi}\left (z\mid x\right )\) toward the standard normal prior. The overall flow is:

\[\underbrace{z}_{z \sim \mathcal{N}\left (0,I\right )} \rightarrow \underbrace{\text{Decoder}}_{p_{\theta}\left (x \mid z\right )} \rightarrow \underbrace{\hat{x}}_{\text{new sample}} \]
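The sample-then-decode pipeline can be sketched end to end. Since no trained weights are available here, the decoder below is a hypothetical stand-in (a random linear map followed by a sigmoid); it only demonstrates the flow, not a real model:

```python
import math
import random

rng = random.Random(0)
LATENT_DIM, DATA_DIM = 2, 4   # toy sizes, chosen only for illustration

# Hypothetical stand-in for a trained decoder p_theta(x|z): a random linear
# map plus sigmoid (a real VAE would load learned decoder weights here).
W = [[rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
     for _ in range(DATA_DIM)]
b = [rng.uniform(-1.0, 1.0) for _ in range(DATA_DIM)]

def decoder(z):
    logits = [sum(W[i][j] * z[j] for j in range(LATENT_DIM)) + b[i]
              for i in range(DATA_DIM)]
    return [1.0 / (1.0 + math.exp(-t)) for t in logits]  # values in (0, 1)

# Inference: sample z from the standard normal prior N(0, I), then decode.
z = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
x_hat = decoder(z)
print(x_hat)
```

Each fresh draw of \(z\) yields a different \(\hat{x}\), which is exactly how a trained VAE produces novel samples.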

posted @ 2024-04-27 09:35 RenjieW