DDPM Loss Function
Starting from this post we formally introduce the DDPM loss function. In the diffusion-model prerequisites post we first presented the maximum-likelihood objective: generating images amounts to maximizing the probability of \(x_0\) under the reverse process \(p\):
\[\mathbb E[-\log p_\theta(x_0)]\le\mathbb E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right]
\]
\[p_\theta(x_{0:T}):=p(x_T)\prod_{t=1}^Tp_\theta(x_{t-1}|x_t)
\]
\[q(x_{1:T}|x_0):=\prod_{t=1}^T q(x_t|x_{t-1})
\]
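To make the two factorizations above concrete, here is a minimal 1-D numeric sketch (our own toy code, not the paper's implementation; the \(\beta\) schedule and helper names are illustrative): it samples a forward trajectory and evaluates \(\log q(x_{1:T}|x_0)\) as a sum of Gaussian transition log-densities, mirroring \(q(x_{1:T}|x_0)=\prod_{t=1}^T q(x_t|x_{t-1})\).

```python
import math
import random

def gauss_logpdf(x, mean, var):
    # log N(x; mean, var) for a scalar Gaussian
    return -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)

def forward_sample(x0, betas, rng):
    # One forward trajectory: q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) x_{t-1}, beta_t)
    xs = [x0]
    for b in betas:
        xs.append(math.sqrt(1 - b) * xs[-1] + math.sqrt(b) * rng.gauss(0, 1))
    return xs

def log_q_trajectory(xs, betas):
    # log q(x_{1:T} | x_0) = sum_t log q(x_t | x_{t-1})
    return sum(
        gauss_logpdf(xs[t + 1], math.sqrt(1 - b) * xs[t], b)
        for t, b in enumerate(betas)
    )

rng = random.Random(0)
betas = [0.1, 0.2, 0.3]  # illustrative schedule, T = 3
xs = forward_sample(1.0, betas, rng)
print(log_q_trajectory(xs, betas))
```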
Main text
Substituting \(p_\theta\) and \(q\) into the bound:
\[\begin{aligned}
-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)} &= -\log\frac{p(x_T)\prod_{t=1}^Tp_\theta(x_{t-1}|x_t)}{\prod_{t=1}^T q(x_t|x_{t-1})} \\
&=-\log p(x_T)-\log\prod_{t=1}^Tp_\theta(x_{t-1}|x_t)+\log \prod_{t=1}^T q(x_t|x_{t-1})\\
&=-\log p(x_T)-\sum_{t=1}^T\log p_\theta(x_{t-1}|x_t)+\sum_{t=1}^T\log q(x_t|x_{t-1}) \\
&=-\log p(x_T)-\sum_{t=1}^T\log p_\theta(x_{t-1}|x_t)+\sum_{t=1}^T\log \frac{q(x_{t-1}|x_t)q(x_t)}{q(x_{t-1})} \\
&=-\log p(x_T)-\sum_{t=1}^T\log p_\theta(x_{t-1}|x_t)+\sum_{t=1}^T\log q(x_{t-1}|x_t)+\log \frac{q(x_T)}{q(x_0)} \\
&=-\log p(x_T)-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t)}+\log \frac{q(x_T)}{q(x_0)}
\end{aligned}
\tag{Eq.1}
\]
In Eq.1, \(\frac{q(x_T)}{q(x_0)}\) is a constant (it does not depend on \(\theta\)), so maximum-likelihood estimation only needs the first two terms; this agrees with the result of Eq. (3) in the appendix:
\[\mathbb E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right]=\mathbb E_q\left[ -\log p(x_T)-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t)}\right] =:L
\tag{Eq.2}
\]
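The algebra behind Eq.1 (Bayes' rule plus the telescoping sum) can be sanity-checked numerically by replacing every density with an arbitrary positive number; the identity is purely algebraic, so it must hold for any such assignment. A small sketch (all names are ours, \(T=5\) is arbitrary):

```python
import math
import random

rng = random.Random(42)
T = 5

# Arbitrary positive stand-ins for the densities appearing in Eq.1.
p_T = rng.uniform(0.1, 1.0)                             # p(x_T)
p_theta = [rng.uniform(0.1, 1.0) for _ in range(T)]     # p_theta(x_{t-1}|x_t), t = 1..T
q_fwd = [rng.uniform(0.1, 1.0) for _ in range(T)]       # q(x_t|x_{t-1}),       t = 1..T
q_marg = [rng.uniform(0.1, 1.0) for _ in range(T + 1)]  # q(x_t),               t = 0..T

# Bayes' rule: q(x_{t-1}|x_t) = q(x_t|x_{t-1}) q(x_{t-1}) / q(x_t)
q_rev = [q_fwd[t] * q_marg[t] / q_marg[t + 1] for t in range(T)]

# Left-hand side of Eq.1
lhs = -math.log(p_T * math.prod(p_theta) / math.prod(q_fwd))

# Last line of Eq.1
rhs = (-math.log(p_T)
       - sum(math.log(p_theta[t] / q_rev[t]) for t in range(T))
       + math.log(q_marg[T] / q_marg[0]))

assert abs(lhs - rhs) < 1e-9
```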
Introducing the condition \(x_0\) into the loss
In the earlier reverse-process derivation we introduced the condition \(x_0\) into \(q\) to make the computation tractable; we can do the same for the loss. We rearrange the final result of Eq.1, grouping terms over the same variables:
\[\begin{aligned}
-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)} &=-\log p(x_T)-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t)}+\log \frac{q(x_T)}{q(x_0)} \\
&= -\log\frac{p(x_T)}{q(x_T)}-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t)}-\log(q(x_0)) \\
&\Rightarrow -\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log(q(x_0|x_0)) &&(\text{introduce the condition }x_0) \\
&=-\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t=1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)} &&(q(x_0|x_0)=1) \\
&=-\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t\gt1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log\frac{p_\theta(x_0|x_1)}{q(x_0|x_1,x_0)} &&(\text{split off the }t=1\text{ term})\\
&=-\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t\gt1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log p_\theta(x_0|x_1) &&(q(x_0|x_1,x_0)=1)
\end{aligned}
\tag{Eq.3}
\]
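Eq.3 can be checked numerically the same way as Eq.1. Here the reverse conditionals are defined through the Markov property, \(q(x_{t-1}|x_t,x_0)=\frac{q(x_t|x_{t-1})q(x_{t-1}|x_0)}{q(x_t|x_0)}\), with \(q(x_0|x_0)=1\) and \(q(x_1|x_0)\) playing a double role as both the first forward transition and the first conditional marginal. A sketch with made-up positive numbers (all names are ours):

```python
import math
import random

rng = random.Random(7)
T = 5

p_T = rng.uniform(0.1, 1.0)                           # p(x_T)
p_theta = [rng.uniform(0.1, 1.0) for _ in range(T)]   # p_theta(x_{t-1}|x_t)
q_fwd = [rng.uniform(0.1, 1.0) for _ in range(T)]     # q(x_t|x_{t-1})

# Conditional marginals q(x_t|x_0): q(x_0|x_0) = 1, q(x_1|x_0) = q_fwd[0]
q_cond = [1.0, q_fwd[0]] + [rng.uniform(0.1, 1.0) for _ in range(T - 1)]

# Markov property: q(x_{t-1}|x_t,x_0) = q(x_t|x_{t-1}) q(x_{t-1}|x_0) / q(x_t|x_0)
q_rev = [q_fwd[t] * q_cond[t] / q_cond[t + 1] for t in range(T)]

# Left-hand side (same as in Eq.1)
lhs = -math.log(p_T * math.prod(p_theta) / math.prod(q_fwd))

# Last line of Eq.3
rhs = (-math.log(p_T / q_cond[T])
       - sum(math.log(p_theta[t] / q_rev[t]) for t in range(1, T))
       - math.log(p_theta[0]))

assert abs(lhs - rhs) < 1e-9
assert abs(q_rev[0] - 1.0) < 1e-12  # q(x_0|x_1,x_0) = 1
```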
Taking the expectation of Eq.3 under \(q\), the final result is:
\[\begin{aligned}
\mathbb E_q\left[-\log\frac{p_\theta(x_{0:T})}{q(x_{1:T}|x_0)}\right]&=\mathbb E_q\left[-\log\frac{p(x_T)}{q(x_T|x_0)}-\sum_{t\gt1}^T\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}-\log p_\theta(x_0|x_1)\right] \\
&=\mathbb E_q\left[-\int q(x_T|x_0)\log\frac{p(x_T)}{q(x_T|x_0)}dx_T-\sum_{t\gt1}^T\int q(x_{t-1}|x_t,x_0)\log \frac{p_\theta(x_{t-1}|x_t)}{q(x_{t-1}|x_t,x_0)}dx_{t-1}-\log p_\theta(x_0|x_1)\right] \\
&=\mathbb E_q\left[D_{KL}[q(x_T|x_0)\|p(x_T)]+\sum_{t\gt1}^TD_{KL}[q(x_{t-1}|x_t,x_0)\|p_\theta(x_{t-1}|x_t)]-\log p_\theta(x_0|x_1)\right] &&(\text{definition of KL divergence})
\end{aligned}
\tag{Eq.4}
\]
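Because both \(q(x_{t-1}|x_t,x_0)\) and \(p_\theta(x_{t-1}|x_t)\) are Gaussian in DDPM, each KL term in Eq.4 has a closed form. A minimal scalar helper for the general Gaussian KL (the DDPM-specific means and variances come from the reverse-process post; this sketch only shows the formula the KL terms reduce to):

```python
import math

def kl_gauss(m1, v1, m2, v2):
    """KL( N(m1, v1) || N(m2, v2) ) for scalar Gaussians with variances v1, v2."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # identical Gaussians -> 0.0
print(kl_gauss(1.0, 1.0, 0.0, 1.0))  # unit mean shift, unit variance -> 0.5
```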
Eq.4 agrees with Eq. (5) in the appendix. In summary, starting from the maximum-likelihood loss we derived a training objective that uses KL divergence to measure how similar \(q\) and \(p_\theta\) are. One additional remark on \(q(x_{t-1}|x_t,x_0)\) in the second term: by the Markov-chain property, the condition \(x_0\) could arguably be dropped here, but the paper keeps it, so for consistency with the paper we keep \(x_0\) as well.
At this point we have covered everything in the background section, so reading the DDPM paper should pose few obstacles. The diffusion-model prerequisites post covered the left-hand sides of appendix Eqs. (1), (2), and (3); the forward-process post covered the right-hand side of Eq. (2) and Eq. (4); the reverse-process post covered Eqs. (6) and (7); and this post has covered Eqs. (3) and (5).
Appendix (the background section of the original paper)

