Calculus Notes 05: Applications of Matrix Differentiation in Deep Learning

5.1 Algorithm Overview

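Suppose we have a sample image of \(\sqrt n \times \sqrt n\) pixels, so the image contains \(n\) pixels in total.

A neural network classifies the image as follows:

(1) Build the vector \(X_{1\times n}\)

Store the \(n\) pixels of the image in a row vector \(X_{1\times n}\):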

\[\tag{1} X_{1\times n}= \begin {bmatrix} x_1&x_2&x_3&...&x_n \end {bmatrix} \]

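(2) Build the hidden layer:

The hidden layer maps the vector \(X\) to a vector \(Y\) with \(t\) columns (typically \(t=512\)).

Let \(W_{n\times t}\) be a weight matrix; then: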

\[\tag{2} Y_{1\times t}=X_{1 \times n}\cdot W_{n \times t} = \begin {bmatrix} y_1&y_2&y_3&...&y_t \end {bmatrix} \]

Define the function:

\[relu(x)= \begin{cases} x&x\geq 0\\ 0&x<0 \end{cases} \]

Applying \(relu()\) elementwise to \(Y\) gives the vector \(Z\):

\[\tag{3} Z_{1\times t}=relu(Y) \]
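The hidden-layer step in Eqs. (2) and (3) can be sketched in plain Python. This is only an illustration under stated assumptions: the toy sizes `n = 4`, `t = 3`, the random pixel values and weights, and the helper names `row_times_matrix` and `relu` are all made up for the example and are not part of the notes.

```python
import random

def row_times_matrix(x, W):
    """Row vector x (length n) times matrix W (n x t) -> row vector (length t), as in Eq. (2)."""
    n, t = len(W), len(W[0])
    return [sum(x[i] * W[i][j] for i in range(n)) for j in range(t)]

def relu(v):
    """Elementwise relu, as in Eq. (3): keep non-negative entries, zero out the rest."""
    return [a if a >= 0 else 0.0 for a in v]

random.seed(0)
n, t = 4, 3  # toy sizes (assumption); the notes use t = 512
x = [random.random() for _ in range(n)]                            # stand-in pixel values
W = [[random.uniform(-1, 1) for _ in range(t)] for _ in range(n)]  # stand-in weights

y = row_times_matrix(x, W)  # Y = X . W
z = relu(y)                 # Z = relu(Y)
```

Each entry of `z` equals the matching entry of `y` when that entry is non-negative and is 0 otherwise.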

(3) Build the classification layer

Suppose the sample image can match one of \(r\) image classes, and let \(O_{1\times r}\) and \(W'_{t \times r}\) be matrices.

Then:

\[\tag{4} O_{1\times r}=Z_{1\times t} \cdot W'_{t \times r} = \begin{bmatrix} o_1&o_2&o_3&...&o_r \end{bmatrix} \]

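(4) Produce the probability distribution

To obtain the probability that the sample image matches each class, apply the \(softmax()\) function to \(O_{1\times r}\), giving the probability vector \(S_{1\times r}\):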

\[\tag{5} O_{1\times r}= \begin{bmatrix} o_1&o_2&o_3&...&o_r \end{bmatrix} \stackrel{softmax()} \Longrightarrow S_{1\times r}= \begin{bmatrix} s_1&s_2&s_3&...&s_r \end{bmatrix} \]

For every element \(s_i\) of \(S_{1\times r}\):

\[\tag{6} 0<s_i=softmax(O)_i<100\% \]

\[(i=1,2,3,...,r) \]

and, since \(S\) is a probability distribution:

\[\tag{7} \sum_{i=1}^r s_i=100\% \]
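Properties (6) and (7) are easy to check numerically. The sketch below uses the standard softmax definition \(s_i=e^{o_i}/\sum_j e^{o_j}\); subtracting the maximum score before exponentiating is a common numerical-stability trick added here as an assumption, and it does not change the result. The scores in `o` are arbitrary example values.

```python
import math

def softmax(o):
    """Map a list of scores to probabilities: s_i = exp(o_i) / sum_j exp(o_j)."""
    m = max(o)                            # shift by the max score (stability trick, an addition)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

o = [2.0, 1.0, 0.1]  # arbitrary example scores
s = softmax(o)

each_in_range = all(0.0 < si < 1.0 for si in s)  # property (6)
total_prob = sum(s)                              # property (7): equals 1 up to rounding
best_class = s.index(max(s))                     # index k of the largest probability
```

Because softmax is monotone in each score, the class with the largest score in `o` is also the class with the largest probability in `s`.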

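5.2 Computing the Partial Derivatives

For the procedure in 5.1, suppose that after training on \(m\) different sample images we obtain a probability \(s_k\) that is the largest entry of \(S\); then class \(k\), the class whose score is \(o_k\) in \(O\), is the best match for the sample image.

To recognize arbitrary images, the matrices \(W\) and \(W'\) must be determined from the \(s_k\) and \(o_k\) obtained in training, so that \(W\) and \(W'\) transform the pixel vector of any image effectively.

For the matrices \(W_{n\times t}\) and \(W'_{t\times r}\) in Eqs. (2) and (4) of 5.1, consider the following function \(J\):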

\[\tag{8} J(W,W') \]

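By the definition of the gradient, taking the partial derivatives of \(J\) with respect to \(W\) and \(W'\) yields the corresponding gradients, which are then used to solve for \(W\) and \(W'\).

Because the vectors \(S\) and \(O\) that contain \(s_k\) and \(o_k\) are themselves produced from \(W\) and \(W'\), the partial derivatives with respect to \(S\) and \(O\) must be computed first.

5.2.1 Partial derivatives, stage 1: computing \(\frac{\partial J}{\partial O}\)

From the definition of \(softmax()\) and basic probability: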

\[\tag{9} s_k=\frac{e^{o_k}}{\sum_{i=1}^r e^{o_i}} \]

Let:

\[\tag{10} \begin{aligned} J&=-\ln(s_k)\\ &=-\ln\left(\frac{e^{o_k}}{\sum_{i=1}^r e^{o_i}}\right)\\ &=\ln\left(\sum_{i=1}^r e^{o_i}\right)-\ln(e^{o_k}) \end{aligned} \]

Then:

\[\tag{11} \begin{aligned} \frac{\partial J}{\partial O} &=\frac{\partial \left[\ln\left(\sum_{i=1}^r e^{o_i}\right)-\ln(e^{o_k})\right]}{\partial O}\\ &=\frac{\left(\sum_{i=1}^r e^{o_i}\right)'_O}{\sum_{i=1}^r e^{o_i}}-(o_k)'_O\\ &=\begin{bmatrix} s_1-0&s_2-0&...&s_k-1&...&s_r-0 \end{bmatrix}\\ &=\begin{bmatrix} s_1&s_2&...&s_k-1&...&s_r \end{bmatrix} \end{aligned} \]

Here the \(j\)-th component of the first term is \(e^{o_j}/\sum_{i=1}^r e^{o_i}=s_j\), and \((o_k)'_{o_j}\) equals 1 for \(j=k\) and 0 otherwise.
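Eq. (11) can be checked against a finite-difference approximation of \(J=-\ln(s_k)\). The following is a minimal sketch, not a definitive implementation: the scores `o`, the index `k`, the step `eps`, and the helper names are all assumptions chosen for illustration.

```python
import math

def softmax(o):
    m = max(o)                            # max shift for numerical stability (an addition)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def loss(o, k):
    """J = -ln(s_k), Eq. (10)."""
    return -math.log(softmax(o)[k])

def grad_O(o, k):
    """Analytic gradient from Eq. (11): s_j everywhere, s_k - 1 at index k."""
    s = softmax(o)
    return [s[j] - (1.0 if j == k else 0.0) for j in range(len(o))]

o = [0.5, -1.2, 2.0, 0.3]  # arbitrary example scores
k = 2                      # index of the class used in J (0-based here)

analytic = grad_O(o, k)

# Central differences: (J(o + eps*e_j) - J(o - eps*e_j)) / (2*eps)
eps = 1e-6
numeric = []
for j in range(len(o)):
    up, down = o[:], o[:]
    up[j] += eps
    down[j] -= eps
    numeric.append((loss(up, k) - loss(down, k)) / (2 * eps))

max_error = max(abs(a - b) for a, b in zip(analytic, numeric))
```

Since the \(s_j\) sum to 1, the entries of the gradient in Eq. (11) sum to 0, which gives a second quick sanity check on `analytic`.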

5.2.2 Partial derivatives, stage 2: computing \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\)

Building on \(\frac{\partial J}{\partial O}\), we can backtrack one step further to compute \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\).

Let:

\[W'_{t\times r}= \begin{bmatrix} w_{11}&w_{12}&w_{13}&\cdots&w_{1r}\\ w_{21}&w_{22}&w_{23}&\cdots&w_{2r}\\ w_{31}&w_{32}&w_{33}&\cdots&w_{3r}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ w_{t1}&w_{t2}&w_{t3}&\cdots&w_{tr} \end{bmatrix} \]

From Eq. (4) of 5.1, \(O_{1\times r}=Z_{1\times t}\cdot W'_{t \times r}\), it follows that:

\[\tag{12} o_1=\sum^t_{i=1}z_i \cdot w_{i1},\quad o_2=\sum^t_{i=1}z_i \cdot w_{i2},\quad \cdots,\quad o_r=\sum^t_{i=1}z_i \cdot w_{ir} \]

(where \(t\) is the number of rows of \(W'\))

The partial derivative \(\frac{\partial J}{\partial Z}\) is computed as follows:

\[\frac{\partial J}{\partial Z}=\frac{\partial J}{\partial O}\cdot \frac{\partial O}{\partial Z}=\frac{\partial J}{\partial O}\cdot \begin{bmatrix} (o_1)'_{z_1}&(o_1)'_{z_2}&(o_1)'_{z_3}&\cdots&(o_1)'_{z_t}\\ (o_2)'_{z_1}&(o_2)'_{z_2}&(o_2)'_{z_3}&\cdots&(o_2)'_{z_t}\\ (o_3)'_{z_1}&(o_3)'_{z_2}&(o_3)'_{z_3}&\cdots&(o_3)'_{z_t}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ (o_r)'_{z_1}&(o_r)'_{z_2}&(o_r)'_{z_3}&\cdots&(o_r)'_{z_t} \end{bmatrix} \]

By Eq. (12), \((o_j)'_{z_i}=w_{ij}\), so:

\[\tag{13} \begin{aligned} \frac{\partial J}{\partial Z} &=\frac{\partial J}{\partial O}\cdot \begin{bmatrix} w_{11}&w_{21}&w_{31}&\cdots&w_{t1}\\ w_{12}&w_{22}&w_{32}&\cdots&w_{t2}\\ w_{13}&w_{23}&w_{33}&\cdots&w_{t3}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ w_{1r}&w_{2r}&w_{3r}&\cdots&w_{tr} \end{bmatrix}\\ &=\frac{\partial J}{\partial O}\cdot W'^T \end{aligned} \]
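As a small worked instance of Eq. (13), the sketch below multiplies a \(1\times r\) gradient by \(W'^T\) without forming the transpose explicitly. The numbers in `Wp` and `dO` and the helper name `grad_Z` are hypothetical, chosen so that the arithmetic is easy to follow by hand.

```python
def grad_Z(dO, Wp):
    """Eq. (13): dJ/dZ = dJ/dO . W'^T, where dO has length r and Wp is t x r.

    Entry i of the result is sum_j dO[j] * Wp[i][j], i.e. dO dotted with row i of W'.
    """
    t, r = len(Wp), len(Wp[0])
    return [sum(dO[j] * Wp[i][j] for j in range(r)) for i in range(t)]

Wp = [[1.0, 2.0],
      [3.0, 4.0],
      [5.0, 6.0]]    # hypothetical W' with t = 3, r = 2
dO = [0.25, -0.25]   # hypothetical dJ/dO, e.g. [s_1 - 1 + ..., ...] from Eq. (11)

dZ = grad_Z(dO, Wp)  # one entry per hidden unit, so the result has length t
```

The result has the same shape as \(Z\) (\(1\times t\)), which is a quick way to confirm that the transpose sits on the correct side of the product.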

The partial derivative \(\frac{\partial J}{\partial W'}\) is computed as follows:

By Eq. (12), \(w_{ij}\) appears only in \(o_j\), so every term whose derivative is 0 drops out of the chain rule and, entry by entry:

\[\frac{\partial J}{\partial w_{ij}}=\sum_{l=1}^r \frac{\partial J}{\partial o_l}\cdot (o_l)'_{w_{ij}}=\frac{\partial J}{\partial o_j}\cdot (o_j)'_{w_{ij}}=\frac{\partial J}{\partial o_j}\cdot z_i \]

\[(i=1,2,3,...,t;\ j=1,2,3,...,r) \]

Collecting these entries into a \(t\times r\) matrix and substituting Eq. (11):

\[\tag{14} \begin{aligned} \frac{\partial J}{\partial W'} &=\begin{bmatrix} z_1 s_1&z_1 s_2&\cdots&z_1 (s_k-1)&\cdots&z_1 s_r\\ z_2 s_1&z_2 s_2&\cdots&z_2 (s_k-1)&\cdots&z_2 s_r\\ z_3 s_1&z_3 s_2&\cdots&z_3 (s_k-1)&\cdots&z_3 s_r\\ \vdots&\vdots&&\vdots&&\vdots\\ z_t s_1&z_t s_2&\cdots&z_t (s_k-1)&\cdots&z_t s_r \end{bmatrix}_{t\times r}\\ &=\begin{bmatrix} z_1\\ z_2\\ z_3\\ \vdots\\ z_t \end{bmatrix}_{t\times 1} \cdot \begin{bmatrix} s_1& s_2& \cdots& s_k-1& \cdots& s_r \end{bmatrix}_{1\times r}\\ &=Z^T \cdot \frac{\partial J}{\partial O} \end{aligned} \]
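Eq. (14) says that \(\frac{\partial J}{\partial W'}\) is the outer product of \(Z^T\) and \(\frac{\partial J}{\partial O}\). A minimal sketch follows, with hypothetical values for `z` and `dO` and an illustrative helper name `grad_Wp`.

```python
def grad_Wp(z, dO):
    """Eq. (14): dJ/dW' = Z^T . dJ/dO, a t x r matrix whose (i, j) entry is z[i] * dO[j]."""
    return [[zi * dj for dj in dO] for zi in z]

z = [1.0, 0.0, 2.0]   # hypothetical relu outputs (t = 3); the middle unit is inactive
dO = [0.25, -0.25]    # hypothetical dJ/dO (r = 2)

dWp = grad_Wp(z, dO)  # same shape as W': 3 rows, 2 columns
```

A hidden unit whose output \(z_i\) is 0 contributes a row of zeros, so the corresponding row of \(W'\) receives no gradient for this sample.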

5.2.3 Partial derivatives, stage 3: computing \(\frac{\partial J}{\partial Y}\), \(\frac{\partial J}{\partial X}\), and \(\frac{\partial J}{\partial W}\)

The partial derivative \(\frac{\partial J}{\partial Y}\) is computed as follows:

It is obtained by backtracking from \(\frac{\partial J}{\partial Z}\). From Eq. (3) of 5.1, \(Z=relu(Y)\), we have:

\[\tag{15} z_i=relu(y_i) \]

Then:

\[\tag{16} \frac{\partial z_i}{\partial y_i}= \begin{cases} 1,&y_i\geq 0\\ 0,&y_i<0 \end{cases} \]

\[(i=1,2,3,...,t) \]

The partial derivatives \(\frac{\partial J}{\partial X}\) and \(\frac{\partial J}{\partial W}\) are computed as follows:

They are obtained by backtracking further from \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\). Let (here \(w_{ij}\) denotes the entries of \(W\), not of \(W'\)):

\[W_{n\times t}= \begin{bmatrix} w_{11}&w_{12}&w_{13}&\cdots&w_{1t}\\ w_{21}&w_{22}&w_{23}&\cdots&w_{2t}\\ w_{31}&w_{32}&w_{33}&\cdots&w_{3t}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ w_{n1}&w_{n2}&w_{n3}&\cdots&w_{nt} \end{bmatrix} \]

From Eq. (2) of 5.1, \(Y_{1\times t}=X_{1 \times n}\cdot W_{n \times t}\), it follows that:

\[\tag{17} y_1=\sum_{i=1}^n x_i \cdot w_{i1},\quad y_2=\sum_{i=1}^n x_i \cdot w_{i2},\quad \cdots,\quad y_t=\sum_{i=1}^n x_i \cdot w_{it} \]

Eq. (17) has the same form as Eq. (12), so the derivation of Eqs. (13) and (14) in 5.2.2 carries over:

\[\tag{18} \frac{\partial J}{\partial X}=\frac{\partial J}{\partial Y} \cdot W^T \]

\[\tag{19} \frac{\partial J}{\partial W}=X^T\cdot\frac{\partial J}{\partial Y} \]

where \(\frac{\partial J}{\partial Y}\) is given by:

\[\tag{20} \frac{\partial J}{\partial Y} = \frac{\partial J}{\partial Z} \cdot \frac{\partial Z}{\partial Y}=\frac{\partial J}{\partial O}\cdot \begin{bmatrix} (o_1)'_{z_1}\cdot(z_1)'_{y_1}&(o_1)'_{z_2}\cdot(z_2)'_{y_2}&(o_1)'_{z_3}\cdot(z_3)'_{y_3}&\cdots&(o_1)'_{z_t}\cdot(z_t)'_{y_t}\\ (o_2)'_{z_1}\cdot(z_1)'_{y_1}&(o_2)'_{z_2}\cdot(z_2)'_{y_2}&(o_2)'_{z_3}\cdot(z_3)'_{y_3}&\cdots&(o_2)'_{z_t}\cdot(z_t)'_{y_t}\\ (o_3)'_{z_1}\cdot(z_1)'_{y_1}&(o_3)'_{z_2}\cdot(z_2)'_{y_2}&(o_3)'_{z_3}\cdot(z_3)'_{y_3}&\cdots&(o_3)'_{z_t}\cdot(z_t)'_{y_t}\\ \vdots&\vdots&\vdots&\ddots&\vdots\\ (o_r)'_{z_1}\cdot(z_1)'_{y_1}&(o_r)'_{z_2}\cdot(z_2)'_{y_2}&(o_r)'_{z_3}\cdot(z_3)'_{y_3}&\cdots&(o_r)'_{z_t}\cdot(z_t)'_{y_t} \end{bmatrix} \]

where:

\[(z_i)'_{y_i}= \frac{\partial z_i}{\partial y_i}= \begin{cases} 1,&y_i\geq 0\\ 0,&y_i<0 \end{cases} \]

\[(i=1,2,3,...,t) \]
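Putting the stages together, the sketch below runs the forward pass of 5.1 and the backward pass of 5.2 on a tiny network, then compares the analytic gradients with central differences. It is a hedged illustration, not a definitive implementation: the sizes (\(n=3\), \(t=2\), \(r=2\)), all numeric values, the class index `k`, and every helper name are assumptions made for the example. The ReLU derivative at \(y_i=0\) is taken as 1, following the convention used in Eq. (16).

```python
import math

def matmul(A, B):
    """Plain-Python product of A (p x q) and B (q x s), both lists of rows."""
    return [[sum(A[i][m] * B[m][j] for m in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def softmax(o):
    m = max(o)                            # max shift for numerical stability (an addition)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def forward(X, W, Wp):
    Y = matmul(X, W)                        # Eq. (2)
    Z = [[max(0.0, v) for v in Y[0]]]       # Eq. (3)
    O = matmul(Z, Wp)                       # Eq. (4)
    S = [softmax(O[0])]                     # Eq. (5)
    return Y, Z, O, S

def loss(X, W, Wp, k):
    return -math.log(forward(X, W, Wp)[3][0][k])   # Eq. (10)

def backward(X, W, Wp, k):
    Y, Z, O, S = forward(X, W, Wp)
    dO = [[s - (1.0 if j == k else 0.0) for j, s in enumerate(S[0])]]   # Eq. (11)
    dWp = matmul(transpose(Z), dO)                                      # Eq. (14)
    dZ = matmul(dO, transpose(Wp))                                      # Eq. (13)
    dY = [[g * (1.0 if y >= 0 else 0.0) for g, y in zip(dZ[0], Y[0])]]  # Eqs. (16), (20)
    dW = matmul(transpose(X), dY)                                       # Eq. (19)
    dX = matmul(dY, transpose(W))                                       # Eq. (18)
    return dW, dWp, dX

# Hypothetical toy data: Y works out to [-0.75, 0.65], so hidden unit 0 is inactive.
X = [[0.5, -1.0, 2.0]]
W = [[0.1, -0.3], [0.4, 0.2], [-0.2, 0.5]]
Wp = [[0.3, -0.1], [0.2, 0.7]]
k = 0

dW, dWp, dX = backward(X, W, Wp, k)

def numeric_grad(M, i, j, eps=1e-6):
    """Central difference of the loss w.r.t. M[i][j]; M is W or Wp, edited in place and restored."""
    old = M[i][j]
    M[i][j] = old + eps
    up = loss(X, W, Wp, k)
    M[i][j] = old - eps
    down = loss(X, W, Wp, k)
    M[i][j] = old
    return (up - down) / (2 * eps)

err_W = max(abs(numeric_grad(W, i, j) - dW[i][j]) for i in range(3) for j in range(2))
err_Wp = max(abs(numeric_grad(Wp, i, j) - dWp[i][j]) for i in range(2) for j in range(2))
```

In this example the first column of `dW` is zero because the first hidden unit has \(y_1<0\), so Eq. (16) blocks the gradient from flowing through it.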

posted @ 2025-03-14 21:49  nafe