平方损失函数为例的BP的关键公式推导

看了刘建平老师的博客https://www.cnblogs.com/pinard/p/6422831.html对如下其中两个公式进行详细推导
损失函数为(大写字母为矩阵,小写字母字母加粗为列向量,其中\(W^L\)的维度为\(M_L*M_{L-1}\),即第\(L\)层神经元个数乘以第\(L-1\)层神经元个数):

\[J(W, \mathbf{b}, \mathbf {x}, \mathbf{y})=\frac{1}{2}\left\|\mathbf{a}^{L}-\mathbf{y}\right\|_{2}^{2}=\frac{1}{2}\left\|\sigma\left(W^{L} \mathbf {a}^{L-1}+\mathbf {b}^{L}\right)-\mathbf {y}\right\|_{2}^{2} \]

推导的两个公式如下:

\[\frac{\partial J(W, \mathbf{b}, \mathbf {x}, \mathbf{y})}{\partial W^{L}}=\left[\left(\mathbf {a}^{L}-\mathbf {y}\right) \odot \sigma^{\prime}\left(\mathbf {z}^{L}\right)\right]\left(\mathbf {a}^{L-1}\right)^{T} \]

\[\frac{\partial J(W, \mathbf{b}, \mathbf {x}, \mathbf{y})}{\partial \mathbf {b}^{L}}=\left(\mathbf {a}^{L}-\mathbf {y}\right) \odot \sigma^{\prime}\left(\mathbf{z}^{L}\right) \]

用到两个链式求导法则如下(都来源于刘建平老师博客,链接在文章末尾)
当标量对n个向量进行链式求导,即\(\mathbf{y}_{1} \rightarrow \mathbf{y}_{2} \rightarrow \ldots \rightarrow \mathbf{y}_{\mathbf{n}} \rightarrow z_{1}\),链式求导法则如下:

\[\begin{equation}\frac{\partial z}{\partial \mathbf{y}_{\mathbf{1}}}=\left(\frac{\partial \mathbf{y}_{\mathbf{n}}}{\partial \mathbf{y}_{\mathbf{n}-\mathbf{1}}} \frac{\partial \mathbf{y}_{\mathbf{n}-\mathbf{1}}}{\partial \mathbf{y}_{\mathbf{n}-\mathbf{2}}} \ldots \frac{\partial \mathbf{y}_{\mathbf{2}}}{\partial \mathbf{y}_{\mathbf{1}}}\right)^{T} \frac{\partial z}{\partial \mathbf{y}_{\mathbf{n}}}\tag {1}\end{equation} \]

\(z=f(\mathbf {y}),\mathbf {y}=X\mathbf {a}+\mathbf {b}\)\(X\rightarrow \mathbf{y}\rightarrow z\) 其中\(X\)为矩阵,\(\mathbf {y}\)为向量,链式求导结果如下:

\[\begin{equation}\frac{\partial z}{\partial X}=\frac {\partial z}{\partial {\mathbf{y}}}a^T\tag {2}\end{equation} \]

先推导第一个公式,考虑如下复合结构(注意最后所求的\(J\)是标量)

\[W^L\rightarrow \mathbf{z}^L\rightarrow \mathbf{u}^L\rightarrow J \]

其中$$J=\frac{1}{2}\Vert \mathbf{u}^L \Vert_2^2 $$

\[\mathbf{u}^L=\mathbf{a}^L-\mathbf{y}=\sigma (\mathbf{z}^L)-\mathbf{y} \]

\[\mathbf{z}^L=W^L\mathbf{a}^{L-1}+\mathbf{b}^L \]

由公式\((2)\)可得

\[\frac{\partial J}{\partial W^{L}}=\frac{\partial J}{\partial \mathbf {z}^{L}}(\mathbf a^{L-1})^T \]

又有公式\((1)\)可得

\[\frac{\partial J}{\partial \mathbf {z}^{L}}=(\frac {\partial \mathbf{u}^L}{\partial \mathbf{z}^L})^T\frac {\partial J}{\partial \mathbf{u}^L} \]

其中后半部分比较简单

\[\frac {\partial J}{\partial \mathbf{u}^L}=\mathbf {u}^L=\mathbf{a}^L-\mathbf{y} \]

前半部分向量对向量求导,布局为雅克比矩阵形式,结果如下:

\[\frac{\partial \mathbf{u}^{L}}{\partial \mathbf{z}^{L}}=\frac{\partial\left(\sigma\left(\mathbf{z}^{L}\right)-\mathbf{y}\right)}{\partial \mathbf{z}^{L}}=\left(\begin{array}{lllc} \frac{\partial \sigma\left(z_{1}^{L}\right)}{\partial z_{1}^{L}} & 0 & \cdots & 0 \\ 0 & \frac{\partial \sigma\left(z_{2}^{L}\right)}{\partial z_{2}^{L}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{\partial \sigma\left(z_{M_{L}}^{L}\right)}{\partial z_{M_{L}}^{L}} \end{array}\right)=\operatorname{diag}\left(\sigma^{\prime}\left(\mathbf{z}^{L}\right)\right) \]

得到的矩阵为对称矩阵,带入到上式,结果为:

\[\frac{\partial J}{\partial \mathbf {z}^{L}}=\operatorname{diag}\left(\sigma^{\prime}\left(\mathbf{z}^{L}\right)\right)(\mathbf{a}^L-\mathbf{y})=\sigma^{\prime}\left(\mathbf{z}^{L}\right)\odot (\mathbf{a}^L-\mathbf{y})=(\mathbf{a}^L-\mathbf{y})\odot\sigma^{\prime}\left(\mathbf{z}^{L}\right) \]

\[\frac{\partial J}{\partial W^{L}}=\left[\left(\mathbf {a}^{L}-\mathbf {y}\right) \odot \sigma^{\prime}\left(\mathbf {z}^{L}\right)\right]\left(\mathbf {a}^{L-1}\right)^{T} \]

第二个式子推导就很简单,由公式\((1)\)可得

\[\frac {\partial J}{\partial \mathbf{b}^L}=(\frac {\partial \mathbf{z}^L}{\partial \mathbf{b}^L})^T\frac {\partial J}{\partial \mathbf{z}^L} \]

前半部分为单位矩阵\(E\)后半部分求第一个式子时已经求过,故

\[\frac{\partial J}{\partial \mathbf {b}^{L}}=\left(\mathbf {a}^{L}-\mathbf {y}\right) \odot \sigma^{\prime}\left(\mathbf{z}^{L}\right) \]

参考博客(矩阵向量求导的知识):
求导定义与布局
矩阵向量求导之定义法
矩阵向量求导之微分法法
矩阵向量求导链式法则

posted @ 2022-05-30 16:26  lcxxx  Views(288)  Comments(0)    收藏  举报