AIGC拾遗:DDIM

Preface

In the post AIGC拾遗:扩散模型, we derived the forward diffusion process, the training objective, and the reverse-diffusion sampling of diffusion models. The derivation assumed that \(x_{0} \rightarrow x_{1} \rightarrow \dots \rightarrow x_{T}\) forms a Markov chain, yet during sampling we only used the distribution \(p(x_{t-1}|x_{t}, x_{0})\): \(x_{t-1}\) can be sampled through the path \(x_{t} \rightarrow \hat{x}_{0} \rightarrow \hat{x}_{t-1}\), bypassing \(p(x_{t-1}|x_{t})\) altogether. This raises two questions. First, can we relax the prior on the forward diffusion and drop the per-step noising assumption \(p(x_{t}|x_{t-1})\)? Second, when using a diffusion model, the number of sampling steps trades generation speed against generation quality: more steps give higher quality but take longer, so can the number of steps be reduced? DDIM answers both questions and makes the diffusion process more flexible.

Diffusion Framework

Forward diffusion: DDIM drops the prior for noising \(x_{t-1}\) into \(x_{t}\) and keeps only the prior for noising \(x_{0}\) into \(x_{t}\), i.e. \(p(x_{t}|x_{0}) \sim \mathcal{N}(x_{t}; \bar{\alpha}_{t}x_{0}, \bar{\beta}_{t}^{2}I)\). At the same time, it retains the form of \(p(x_{t-1}|x_{t}, x_{0})\), assuming \(p(x_{t-1}|x_{t}, x_{0}) \sim \mathcal{N}(x_{t-1}; m_{t}x_{t}+n_{t}x_{0}, \sigma_{t}^{2}I)\). Hence

\[x_{t}=\bar{\alpha}_{t}x_{0}+\bar{\beta}_{t}\epsilon_{t} \]

\[x_{t-1}=\bar{\alpha}_{t-1}x_{0}+\bar{\beta}_{t-1}\epsilon_{t-1} \]

\[x_{t-1}=m_{t}x_{t}+n_{t}x_{0}+\sigma_{t}z=(\bar{\alpha}_{t}m_{t}+n_{t})x_{0}+\sqrt{m_{t}^{2}\bar{\beta}_{t}^{2}+\sigma^{2}_{t}}\epsilon \]

Combining the three equations gives

\[m_{t}=\sqrt{\frac{\bar{\beta}_{t-1}^{2}-\sigma_{t}^{2}}{\bar{\beta}_{t}^{2}}}, ~~~~~~~ n_{t}=\bar{\alpha}_{t-1} - \bar{\alpha}_{t}\sqrt{\frac{\bar{\beta}_{t-1}^{2}-\sigma_{t}^{2}}{\bar{\beta}_{t}^{2}}} \]
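
As a quick sanity check of this solution, here is a minimal Monte-Carlo sketch, assuming a hypothetical noise schedule and the conventions \(\alpha_{t}^{2}+\beta_{t}^{2}=1\), \(\bar{\alpha}_{t}=\prod_{s\le t}\alpha_{s}\), \(\bar{\beta}_{t}=\sqrt{1-\bar{\alpha}_{t}^{2}}\) of the earlier post: drawing \(x_{t}\) from \(p(x_{t}|x_{0})\) and then \(x_{t-1}=m_{t}x_{t}+n_{t}x_{0}+\sigma_{t}z\) should reproduce the marginal \(\mathcal{N}(\bar{\alpha}_{t-1}x_{0}, \bar{\beta}_{t-1}^{2}I)\).

```python
import numpy as np

# Hypothetical schedule: beta_t^2 linear in t, with alpha_t^2 + beta_t^2 = 1.
rng = np.random.default_rng(0)
T = 1000
beta = np.sqrt(np.linspace(1e-4, 2e-2, T))
alpha = np.sqrt(1.0 - beta ** 2)
alpha_bar = np.cumprod(alpha)              # \bar{alpha}_t
beta_bar = np.sqrt(1.0 - alpha_bar ** 2)   # \bar{beta}_t

t = 500
sigma = 0.5 * beta_bar[t - 1]              # any 0 <= sigma_t <= \bar{beta}_{t-1}
m = np.sqrt((beta_bar[t - 1] ** 2 - sigma ** 2) / beta_bar[t] ** 2)
n = alpha_bar[t - 1] - alpha_bar[t] * m

x0 = 0.7                                   # a scalar "image" suffices for the check
x_t = alpha_bar[t] * x0 + beta_bar[t] * rng.standard_normal(1_000_000)
x_prev = m * x_t + n * x0 + sigma * rng.standard_normal(1_000_000)

print(x_prev.mean(), alpha_bar[t - 1] * x0)   # the two numbers should agree
print(x_prev.std(), beta_bar[t - 1])          # and so should these
```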

To match the above, we assume \(q(x_{t-1}|x_{t}, x_{0}) \sim \mathcal{N}(x_{t-1}; m_{t}x_{t}+n_{t}x_{0}(x_{t}, t, \theta), \sigma_{t}^{2}I)\), where \(x_{0}(x_{t}, t, \theta)=\frac{1}{\bar{\alpha}_{t}}(x_{t}-\bar{\beta}_{t}\epsilon(x_{t}, t, \theta))\) is the network's estimate of \(x_{0}\).

Training Objective

\[\begin{align} loss&=\mathbb{E}_{x_{0} \sim p(x_{0}), x_{t} \sim p(x_{t}|x_{0}), x_{t-1} \sim p(x_{t-1}|x_{0}, x_{t})}[-\log{q(x_{t-1}|x_{t}, x_{0})}] \notag\\ &=\mathbb{E}_{x_{0} \sim p(x_{0}), x_{t} \sim p(x_{t}|x_{0}), x_{t-1} \sim p(x_{t-1}|x_{0}, x_{t})}[\frac{1}{2\sigma_{t}^{2}}\|x_{t-1}-m_{t}x_{t}-n_{t}x_{0}(x_{t}, t, \theta)\|^2] \notag\\ &=\mathbb{E}_{x_{0} \sim p(x_{0}), x_{t} \sim p(x_{t}|x_{0}), z \sim \mathcal{N}(0, I)}[\frac{1}{2\sigma_{t}^{2}}\|m_{t}x_{t}+n_{t}x_{0}-m_{t}x_{t}-n_{t}x_{0}(x_{t}, t, \theta)+\sigma_{t}z\|^2] \notag\\ &\Leftrightarrow \mathbb{E}_{x_{0} \sim p(x_{0}), x_{t} \sim p(x_{t}|x_{0})}[\frac{n_{t}^{2}}{2\sigma_{t}^{2}}\|x_{0}-x_{0}(x_{t}, t, \theta)\|^2] \notag\\ &\Leftrightarrow \mathbb{E}_{x_{0} \sim p(x_{0}), \epsilon \sim \mathcal{N}(0, I)}[\frac{n_{t}^{2}\bar{\beta}_{t}^{2}}{2\sigma_{t}^{2}\bar{\alpha}_{t}^{2}}\|\epsilon-\epsilon(x_{t}, t, \theta)\|^2] \notag\\ \end{align} \]
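
As a sketch, the last line above can be implemented directly as a per-sample training loss. The noise-prediction network is a placeholder `eps_model(x_t, t)`, and the arrays `alpha_bar`, `beta_bar`, `sigma` (indexed by timestep, with \(\sigma_{t}>0\)) are assumptions for illustration, not fixed by the text.

```python
import numpy as np

def ddim_training_loss(x0, t, eps_model, alpha_bar, beta_bar, sigma, rng):
    """Weighted noise-prediction loss from the last line of the derivation (t >= 1, sigma[t] > 0)."""
    eps = rng.standard_normal(x0.shape)                  # epsilon ~ N(0, I)
    x_t = alpha_bar[t] * x0 + beta_bar[t] * eps          # x_t ~ p(x_t | x_0)
    m = np.sqrt((beta_bar[t - 1] ** 2 - sigma[t] ** 2) / beta_bar[t] ** 2)
    n = alpha_bar[t - 1] - alpha_bar[t] * m              # n_t from the previous section
    w = n ** 2 * beta_bar[t] ** 2 / (2 * sigma[t] ** 2 * alpha_bar[t] ** 2)
    # In practice the per-step weight w is often replaced by 1, giving the plain
    # ||epsilon - epsilon_theta||^2 objective.
    return w * np.sum((eps - eps_model(x_t, t)) ** 2)
```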

Sampling

\[\begin{align} x_{t-1} &= m_{t}x_{t}+n_{t}x_{0}+\sigma_{t}z \notag\\ &= \frac{1}{\alpha_{t}}(x_{t}-(\bar{\beta}_{t}-\alpha_{t}\sqrt{\bar{\beta}_{t-1}^{2}-\sigma_{t}^{2}})\epsilon(x_{t}, t, \theta))+\sigma_{t}z \notag\\ &= \bar{\alpha}_{t-1}x_{0}(x_{t}, t, \theta)+\sqrt{\bar{\beta}_{t-1}^{2}-\sigma_{t}^{2}}\epsilon(x_{t}, t, \theta)+\sigma_{t}z\tag{1}\label{1}\\ \end{align} \]
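
A minimal sketch of one reverse step implementing Eq. \eqref{1}, parameterised by \(\sigma_{t}\); `eps_model` and the schedule arrays are the same kind of placeholder assumptions as above.

```python
import numpy as np

def ddim_step(x_t, t, sigma_t, eps_model, alpha_bar, beta_bar, rng):
    """One reverse step x_t -> x_{t-1} of Eq. (1), valid for any 0 <= sigma_t <= beta_bar[t-1]."""
    eps = eps_model(x_t, t)                                        # epsilon(x_t, t, theta)
    x0_pred = (x_t - beta_bar[t] * eps) / alpha_bar[t]             # x_0(x_t, t, theta)
    mean = alpha_bar[t - 1] * x0_pred + np.sqrt(beta_bar[t - 1] ** 2 - sigma_t ** 2) * eps
    z = rng.standard_normal(x_t.shape) if sigma_t > 0 else 0.0
    return mean + sigma_t * z
```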

\(\sigma_{t}=\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}\)时,上述的采样公式变为

\[\begin{align} x_{t-1} &= \frac{1}{\alpha_{t}}(x_{t}-(\bar{\beta}_{t}-\alpha_{t}\sqrt{\bar{\beta}_{t-1}^{2}-\sigma_{t}^{2}})\epsilon(x_{t}, t, \theta))+\sigma_{t}z \notag\\ &= \frac{1}{\alpha_{t}}(x_{t}-\frac{\beta_{t}^{2}}{\bar{\beta}_{t}}\epsilon(x_{t}, t, \theta))+\frac{\beta_{t}\bar{\beta}_{t-1}}{\bar{\beta}_{t}}z \notag \end{align} \]

This is exactly the formula derived in AIGC拾遗:扩散模型, so Eq. \eqref{1} can be viewed as a more general version of it.

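A quick numerical check, under the same hypothetical schedule and conventions as before, that this choice of \(\sigma_{t}\) indeed turns the \(\epsilon\)-coefficient of Eq. \eqref{1} into the DDPM one, \(\beta_{t}^{2}/\bar{\beta}_{t}\):

```python
import numpy as np

T = 1000
beta = np.sqrt(np.linspace(1e-4, 2e-2, T))    # hypothetical schedule, beta_t^2 linear in t
alpha = np.sqrt(1.0 - beta ** 2)
alpha_bar = np.cumprod(alpha)
beta_bar = np.sqrt(1.0 - alpha_bar ** 2)

t = 700
sigma = beta[t] * beta_bar[t - 1] / beta_bar[t]                        # the DDPM choice of sigma_t
lhs = beta_bar[t] - alpha[t] * np.sqrt(beta_bar[t - 1] ** 2 - sigma ** 2)
assert np.isclose(lhs, beta[t] ** 2 / beta_bar[t])                     # equals beta_t^2 / beta_bar_t
```
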
\(\sigma_{t}=\bar{\beta}_{t-1}\)时,上述采样公式变为

\[x_{t-1}=\frac{1}{\alpha_{t}}(x_{t}-\bar{\beta}_{t}\epsilon(x_{t}, t, \theta))+\bar{\beta}_{t-1}z=\bar{\alpha}_{t-1}x_{0}(x_{t}, t, \theta)+\bar{\beta}_{t-1}z \]

In this case, \(x_{t-1}\) depends only on the predicted \(x_{0}\) and not directly on \(x_{t}\), and its form is identical to that of the forward noising step.

\(\sigma_{t}=0\)时,上述采样公式变为

\[x_{t-1}=\frac{1}{\alpha_{t}}(x_{t}-(\bar{\beta}_{t}-\alpha_{t}\bar{\beta}_{t-1})\epsilon(x_{t}, t, \theta))=\bar{\alpha}_{t-1}x_{0}(x_{t}, t, \theta)+\bar{\beta}_{t-1}\epsilon(x_{t}, t, \theta) \]

In this case, \(x_{t-1}\) is a linear interpolation between \(x_{0}\) and \(x_{t}\); the reverse diffusion process \(x_{T} \rightarrow x_{T-1} \rightarrow \dots \rightarrow x_{0}\) is fully deterministic, and the trajectory depends only on the initial noise \(x_{T}\).
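
Below is a minimal sketch of the resulting deterministic sampler: starting from noise \(x_{T}\) and applying the \(\sigma_{t}=0\) update at every step, the output is a deterministic function of \(x_{T}\). The placeholder `eps_model` and the indexing convention \(\bar{\alpha}_{0}=1\), \(\bar{\beta}_{0}=0\) are assumptions.

```python
import numpy as np

def ddim_sample_deterministic(x_T, eps_model, alpha_bar, beta_bar):
    """Full reverse pass with sigma_t = 0; arrays indexed 0..T with alpha_bar[0] = 1, beta_bar[0] = 0."""
    x = x_T
    T = len(alpha_bar) - 1
    for t in range(T, 0, -1):
        eps = eps_model(x, t)
        x0_pred = (x - beta_bar[t] * eps) / alpha_bar[t]            # x_0(x_t, t, theta)
        x = alpha_bar[t - 1] * x0_pred + beta_bar[t - 1] * eps      # sigma_t = 0 case of Eq. (1)
    return x
```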

Accelerated Sampling

Suppose we pick a subset \(\mathcal{S}\) of the full set of diffusion steps \(\mathcal{T}\) and aim to produce the sampling trajectory \(x_{s} \rightarrow x_{s-1} \rightarrow \dots \rightarrow x_{0}\), where \(s-1\) denotes the step preceding \(s\) within \(\mathcal{S}\). Eq. \eqref{1} models the reverse diffusion as \(q(x_{t-1}|x_{t}, x_{0}) \sim \mathcal{N}(x_{t-1}; m_{t}x_{t}+n_{t}x_{0}(x_{t}, t, \theta), \sigma_{t}^{2}I)\): during sampling we first compute \(x_{0}\) from \(x_{t}\) and then draw \(x_{t-1}\) from \(q(x_{t-1}|x_{t}, x_{0})\). This suggests that we can likewise first denoise \(x_{s}\) to obtain \(x_{0}\), then re-noise it to obtain \(x_{s-1}\):

\[x_{s} \rightarrow x_{0}: ~~~~~~~~ x_{0} = \frac{1}{\bar{\alpha}_{s}}(x_{s} - \bar{\beta}_{s}\epsilon(x_s, s ,\theta)) \]

\[\begin{align} x_{s}, x_{0} \rightarrow x_{s-1}: ~~~~~~~~ x_{s-1}&=m_{s}x_{s}+n_{s}x_{0}+\sigma_{s}z \notag\\ &=\bar{\alpha}_{s-1}x_{0}(x_{s}, s, \theta)+\sqrt{\bar{\beta}_{s-1}^{2}-\sigma_{s}^{2}}\epsilon(x_{s}, s, \theta)+\sigma_{s}z \tag{2}\label{2} \end{align} \]
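
A sketch of accelerated sampling along the sub-sequence \(\mathcal{S}\), applying Eq. \eqref{2} between consecutive elements of \(\mathcal{S}\). The decreasing list `timesteps` (ending at 0), the placeholder `eps_model`, and the per-step noise scale `sigma_fn` are assumptions for illustration.

```python
import numpy as np

def ddim_sample_subsequence(x_start, timesteps, eps_model, alpha_bar, beta_bar, sigma_fn, rng):
    """Sample along a sub-sequence S of timesteps (decreasing, ending at 0) using Eq. (2)."""
    # Arrays are assumed indexed 0..T with alpha_bar[0] = 1 and beta_bar[0] = 0.
    x = x_start
    for s, s_prev in zip(timesteps[:-1], timesteps[1:]):
        eps = eps_model(x, s)                                      # epsilon(x_s, s, theta)
        x0_pred = (x - beta_bar[s] * eps) / alpha_bar[s]           # x_s -> x_0
        sigma = sigma_fn(s, s_prev)                                # needs 0 <= sigma <= beta_bar[s_prev]
        mean = alpha_bar[s_prev] * x0_pred + np.sqrt(beta_bar[s_prev] ** 2 - sigma ** 2) * eps
        z = rng.standard_normal(x.shape) if sigma > 0 else 0.0
        x = mean + sigma * z                                       # x_0 -> x_{s-1}, Eq. (2)
    return x
```

With `sigma_fn` always returning 0 this reduces to the deterministic sampler above, run only on the steps in \(\mathcal{S}\); for example, with T = 1000, `timesteps = list(range(1000, -1, -20))` gives a 50-step sampler.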

Summary

This post reviewed the DDIM algorithm, which removes the Markov assumption of the conventional DDPM and thereby makes the diffusion process more flexible. Eq. \eqref{1} shows that, when sampling \(x_{t-1}\), we can change how the added noise is composed (a weighted mix of the already-predicted noise \(\epsilon(x_{t}, t, \theta)\) and fresh noise \(z\)) while keeping the overall noise scale fixed at \(\bar{\beta}_{t-1}\), and thus reuse the form of the forward diffusion. From this viewpoint, the corresponding accelerated sampling formula \eqref{2} also follows easily.

References
https://spaces.ac.cn/archives/9181
