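Calculus Notes 05: Applications of Matrix Differentiation in Deep Learning
5.1 Overview of the Algorithm
Consider a sample image of \(\sqrt n \times \sqrt n\) pixels, so that it contains \(n\) pixels in total.
A neural network is used to classify this image, as follows:
(1) Build the vector \(X_{1\times n}\):
Store the \(n\) pixels of the image in a row vector \(X_{1\times n}\), written as: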
\[\tag{1}
X_{1\times n}=
\begin {bmatrix}
x_1&x_2&x_3&...&x_n
\end {bmatrix}
\]
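(2) Build the hidden layer:
The hidden layer maps the vector \(X\) to a row vector \(Y\) with \(t\) entries (typically \(t=512\)).
Given a weight matrix \(W_{n\times t}\), we have: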
\[\tag{2}
Y_{1\times t}=X_{1 \times n}\cdot W_{n \times t}
=
\begin {bmatrix}
y_1&y_2&y_3&...&y_t
\end {bmatrix}
\]
Define the function:
\[relu(x)=
\begin{cases}
x&x\geq 0\\
0&x<0
\end{cases}
\]
Applying \(relu()\) to each entry of \(Y\) gives the vector \(Z\):
\[\tag{3}
Z_{1\times t}=relu(Y)
\]
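Steps (1) and (2) can be sketched in plain Python. This is only a minimal illustration under stated assumptions: the sizes \(n=4\) and \(t=3\), the pixel values, the weights and the helper name `vec_mat` are hypothetical and were chosen so that the arithmetic is easy to follow by hand.

```python
def relu(x):
    """ReLU as defined above: x for x >= 0, otherwise 0."""
    return x if x >= 0 else 0.0


def vec_mat(v, M):
    """Multiply a row vector v (length n) by a matrix M (n rows, t columns)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


# Toy input (assumed): n = 4 pixels, t = 3 hidden units.
X = [0.5, 0.0, 1.0, 0.25]
W = [
    [2.0, -1.0, 0.0],
    [0.0, 2.0, 1.0],
    [-1.0, 1.5, 0.0],
    [2.0, 0.0, -4.0],
]

Y = vec_mat(X, W)         # Eq. (2): Y = X . W
Z = [relu(y) for y in Y]  # Eq. (3): Z = relu(Y), applied entry by entry

print(Y)
print(Z)
```

The negative third entry of `Y` is clipped to zero in `Z`, while the non-negative entries pass through unchanged.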
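(3) Build the classification layer
Suppose there are \(r\) candidate image classes, and let \(O_{1\times r}\) be the output vector and \(W'_{t \times r}\) a second weight matrix.
Then: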
\[\tag{4}
O_{1\times r}=Z_{1\times t} \cdot W'_{t \times r}
=
\begin{bmatrix}
o_1&o_2&o_3&...&o_r
\end{bmatrix}
\]
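(4) Produce the probability distribution
To obtain the probability that the sample image matches each class, apply the \(softmax()\) function to \(O_{1\times r}\), which yields the probability vector \(S_{1\times r}\):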
\[\tag{5}
O_{1\times r}=
\begin{bmatrix}
o_1&o_2&o_3&...&o_r
\end{bmatrix}
\stackrel{softmax()}
\Longrightarrow
S_{1\times r}=
\begin{bmatrix}
s_1&s_2&s_3&...&s_r
\end{bmatrix}
\]
Each element \(s_i\) of \(S_{1\times r}\) is the \(i\)-th component of \(softmax(O)\) and satisfies:
\[\tag{6}
0<s_i<1
\]
\[(i=1,2,3,...,r)
\]
Since \(S\) is a probability distribution, we also have:
\[\tag{7}
\sum_{i=1}^r s_i=1
\]
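The two properties of \(S\) can be checked numerically. The sketch below assumes the usual exponential definition of softmax (the one written out as Eq. (9) in 5.2.1). The scores in `O` are made-up values, and subtracting `max(o)` is a common numerical-stability step that is not part of these notes; it does not change the result.

```python
import math


def softmax(o):
    """Map a list of scores to probabilities that sum to 1."""
    m = max(o)  # shifting by the maximum avoids overflow in exp()
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


O = [2.0, 1.0, 0.1]  # hypothetical scores for r = 3 classes
S = softmax(O)
k = S.index(max(S))  # 0-based index of the most probable class

print(S)
print(sum(S))
print(k)
```

Every entry of `S` lies strictly between 0 and 1, and the largest score in `O` produces the largest probability in `S`.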
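5.2 Computing the Partial Derivatives
For the procedure in 5.1, suppose the network is trained on \(m\) different sample images. For a given image, let \(s_k\) be the largest entry of \(S\); then class \(k\), whose score in \(O\) is \(o_k\), is the best match for that image.
To recognize arbitrary images, the matrices \(W\) and \(W'\) must be determined from the \(s_k\) and \(o_k\) obtained during training, so that they transform the pixel vector of any image effectively.
For the matrices \(W_{n\times t}\) and \(W'_{t\times r}\) in Eqs. (2) and (4) of 5.1, define a function \(J\):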
\[\tag{8}
J(W,W')
\]
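By the definition of the gradient, the partial derivatives of \(J\) with respect to \(W\) and \(W'\) give the corresponding gradients, which are then used to solve for \(W\) and \(W'\).
Since the vectors \(S\) and \(O\), which contain \(s_k\) and \(o_k\), are themselves obtained from \(W\) and \(W'\), the partial derivatives with respect to \(S\) and \(O\) are computed first.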
5.2.1 Stage 1: computing \(\frac{\partial J}{\partial O}\)
From the definition of \(softmax()\) and basic probability:
\[\tag{9}
s_k=\frac{e^{o_k}}{\sum_{i=1}^r e^{o_i}}
\]
Let:
\[\tag{10}
\begin{aligned}
J&=-\ln(s_k)\\
&=-\ln\left(\frac{e^{o_k}}{\sum_{i=1}^r e^{o_i}}\right)\\
&=\ln\left(\sum_{i=1}^r e^{o_i}\right)-\ln(e^{o_k})
\end{aligned}
\]
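Then:
\[\tag{11}
\begin{aligned}
\frac{\partial J}{\partial O}
&=\frac{\partial \left[\ln\left(\sum_{i=1}^r e^{o_i}\right)-\ln(e^{o_k})\right]}{\partial O}\\
&=\frac{\left(\sum_{i=1}^r e^{o_i}\right)'_O}{\sum_{i=1}^r e^{o_i}}-(o_k)'_O\\
&=
\begin{bmatrix}
s_1-0&s_2-0&...&s_k-1&...&s_r-0
\end{bmatrix}\\
&=
\begin{bmatrix}
s_1&s_2&...&s_k-1&...&s_r
\end{bmatrix}
\end{aligned}
\]
The third line follows because the derivative of \(\sum_{i=1}^r e^{o_i}\) with respect to \(o_j\) is \(e^{o_j}\), so the \(j\)-th component of the first term is \(s_j\), while \((o_k)'_{o_j}\) equals 1 for \(j=k\) and 0 otherwise.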
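Eq. (11) can be sanity-checked against finite differences. The following is a sketch rather than part of the original derivation: the scores in `O`, the target index `k` (0-based here, whereas the notes count from 1) and the step size `h` are all assumed values.

```python
import math


def softmax(o):
    """Eq. (9) for every component, shifted by max(o) for numerical stability."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def loss(o, k):
    """J = -ln(s_k), Eq. (10)."""
    return -math.log(softmax(o)[k])


def grad_analytic(o, k):
    """Eq. (11): [s_1, ..., s_k - 1, ..., s_r]."""
    s = softmax(o)
    return [s[j] - (1.0 if j == k else 0.0) for j in range(len(o))]


def grad_numeric(o, k, h=1e-6):
    """Central finite differences of J with respect to each o_j."""
    grad = []
    for j in range(len(o)):
        up = list(o)
        up[j] += h
        down = list(o)
        down[j] -= h
        grad.append((loss(up, k) - loss(down, k)) / (2 * h))
    return grad


O = [2.0, 1.0, 0.1]  # hypothetical scores
k = 0                # hypothetical target class (0-based)

analytic = grad_analytic(O, k)
numeric = grad_numeric(O, k)
max_diff = max(abs(a - b) for a, b in zip(analytic, numeric))

print(analytic)
print(max_diff)
```

The two gradients should agree up to the truncation and rounding error of the finite-difference scheme. The components of the analytic gradient also sum to zero, which follows from Eq. (7).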
5.2.2 Stage 2: computing \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\)
Starting from \(\frac{\partial J}{\partial O}\), we propagate one step backwards to obtain \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\).
Let:
\[W'_{t\times r}=
\begin {bmatrix}
w_{11}&w_{12}&w_{13}&...&w_{1r}\\
w_{21}&w_{22}&w_{23}&...&w_{2r}\\
w_{31}&w_{32}&w_{33}&...&w_{3r}\\
&&&...\\
w_{t1}&w_{t2}&w_{t3}&...&w_{tr}
\end{bmatrix}
\]
From Eq. (4) in 5.1, \(O_{1\times r}=Z_{1\times t}\cdot W'_{t \times r}\), it follows that:
\[\tag{12}
o_1=\sum^t_{i=1}z_i \cdot w_{i1},\quad o_2=\sum^t_{i=1}z_i \cdot w_{i2},\quad ...,\quad o_r=\sum^t_{i=1}z_i \cdot w_{ir}
\]
where \(t\) is the number of rows of \(W'\).
The partial derivative \(\frac{\partial J}{\partial Z}\) is obtained by the chain rule:
\[\frac{\partial J}{\partial Z}=\frac{\partial J}{\partial O}\cdot \frac{\partial O}{\partial Z}=\frac{\partial J}{\partial O}\cdot
\begin {bmatrix}
(o_1)'_{z_1}&(o_1)'_{z_2}&(o_1)'_{z_3}&...&(o_1)'_{z_t}\\
(o_2)'_{z_1}&(o_2)'_{z_2}&(o_2)'_{z_3}&...&(o_2)'_{z_t}\\
(o_3)'_{z_1}&(o_3)'_{z_2}&(o_3)'_{z_3}&...&(o_3)'_{z_t}\\
&&&...\\
(o_r)'_{z_1}&(o_r)'_{z_2}&(o_r)'_{z_3}&...&(o_r)'_{z_t}
\end{bmatrix}
\]
By Eq. (12), \((o_j)'_{z_i}=w_{ij}\), so:
\[\tag{13}
\begin{aligned}
\frac{\partial J}{\partial Z}
&=
\frac{\partial J}{\partial O}\cdot
\begin {bmatrix}
w_{11}&w_{21}&w_{31}&...&w_{t1}\\
w_{12}&w_{22}&w_{32}&...&w_{t2}\\
w_{13}&w_{23}&w_{33}&...&w_{t3}\\
&&&...\\
w_{1r}&w_{2r}&w_{3r}&...&w_{tr}
\end{bmatrix}\\
&=\frac{\partial J}{\partial O}\cdot {W'}^T
\end{aligned}
\]
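The shapes in Eq. (13) are easy to verify on a small example: a \(1\times r\) row vector times the \(r\times t\) matrix \({W'}^T\) gives a \(1\times t\) row vector. The sketch below uses assumed toy values (\(t=2\), \(r=3\)); the names `W2`, `vec_mat` and `transpose` are hypothetical helpers, and `dJ_dO` is simply a vector of the form given by Eq. (11).

```python
def vec_mat(v, M):
    """Multiply a row vector v by a matrix M given as a list of rows."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]


# Assumed toy values: t = 2 hidden units, r = 3 classes.
W2 = [
    [1.0, 2.0, 4.0],
    [4.0, 0.0, -2.0],
]                            # plays the role of W' (t x r)
dJ_dO = [0.25, -0.5, 0.25]   # of the form [s_1, s_2 - 1, s_3], shape 1 x r

dJ_dZ = vec_mat(dJ_dO, transpose(W2))  # Eq. (13), shape 1 x t

print(dJ_dZ)
```

Each entry of `dJ_dZ` is the dot product of `dJ_dO` with one row of `W2`, which is exactly the sum \(\sum_j \frac{\partial J}{\partial o_j} w_{ij}\) implied by Eq. (12).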
The partial derivative \(\frac{\partial J}{\partial W'}\) is computed entry by entry. By Eq. (12), \(w_{ij}\) appears only in \(o_j\), with \((o_j)'_{w_{ij}}=z_i\); all other derivatives of \(O\) with respect to \(w_{ij}\) are 0 and can be dropped. Hence:
\[\frac{\partial J}{\partial w_{ij}}=\frac{\partial J}{\partial o_j}\cdot (o_j)'_{w_{ij}}=z_i\cdot \frac{\partial J}{\partial o_j}
\]
\[(i=1,2,3,...,t;\ j=1,2,3,...,r)
\]
Collecting these entries into a \(t\times r\) matrix and substituting Eq. (11) gives:
\[\tag{14}
\begin{aligned}
\frac{\partial J}{\partial W'}
&=
\begin {bmatrix}
z_1 s_1&z_1 s_2&...&z_1 (s_k-1)&...&z_1 s_r\\
z_2 s_1&z_2 s_2&...&z_2 (s_k-1)&...&z_2 s_r\\
z_3 s_1&z_3 s_2&...&z_3 (s_k-1)&...&z_3 s_r\\
&&&...\\
z_t s_1&z_t s_2&...&z_t (s_k-1)&...&z_t s_r
\end{bmatrix}_{t\times r}\\
&=
\begin{bmatrix}
z_1\\
z_2\\
z_3\\
...\\
z_t
\end{bmatrix}_{t\times 1}
\cdot
\begin{bmatrix}
s_1&
s_2&
...&
s_k-1&
...&
s_r
\end{bmatrix}_{1\times r}\\
&=Z^T \cdot \frac{\partial J}{\partial O}
\end{aligned}
\]
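Eq. (14) says that the gradient with respect to \(W'\) is the outer product of \(Z\) and \(\frac{\partial J}{\partial O}\). The sketch below checks this against finite differences. The values of `Z`, `W2` (standing in for \(W'\)), the 0-based target index `k` and the step `h` are assumptions made for illustration.

```python
import math


def softmax(o):
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def loss(Z, W2, k):
    """J = -ln(s_k) with O = Z . W' and S = softmax(O)."""
    return -math.log(softmax(vec_mat(Z, W2))[k])


# Assumed toy values: t = 3 hidden units, r = 2 classes.
Z = [0.5, 1.0, 0.0]
W2 = [[0.5, -1.0], [1.0, 0.5], [2.0, 2.0]]
k = 1

S = softmax(vec_mat(Z, W2))
dJ_dO = [S[j] - (1.0 if j == k else 0.0) for j in range(len(S))]  # Eq. (11)
dJ_dW2 = [[z * g for g in dJ_dO] for z in Z]                      # Eq. (14), t x r


def numeric_entry(i, j, h=1e-6):
    """Central finite difference of J with respect to the (i, j) entry of W'."""
    up = [row[:] for row in W2]
    down = [row[:] for row in W2]
    up[i][j] += h
    down[i][j] -= h
    return (loss(Z, up, k) - loss(Z, down, k)) / (2 * h)


max_err = max(
    abs(dJ_dW2[i][j] - numeric_entry(i, j))
    for i in range(len(W2))
    for j in range(len(W2[0]))
)
print(dJ_dW2)
print(max_err)
```

The last row of `dJ_dW2` is zero because the corresponding entry of `Z` is zero: a hidden unit that outputs 0 contributes nothing to the gradient of its outgoing weights.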
5.2.3 Stage 3: computing \(\frac{\partial J}{\partial Y}\), \(\frac{\partial J}{\partial X}\) and \(\frac{\partial J}{\partial W}\)
The partial derivative \(\frac{\partial J}{\partial Y}\) is obtained by propagating back from \(\frac{\partial J}{\partial Z}\). From Eq. (3) in 5.1, \(Z=relu(Y)\), so:
\[\tag{15}
z_i=relu(y_i)
\]
and therefore:
\[\tag{16}
\frac{\partial z_i}{\partial y_i}=
\begin {cases}
1,&y_i\geq 0\\
0,&y_i<0
\end{cases}
\]
\[(i=1,2,3,...,t)
\]
The partial derivatives \(\frac{\partial J}{\partial X}\) and \(\frac{\partial J}{\partial W}\) are obtained by propagating back one more layer, in the same way as \(\frac{\partial J}{\partial Z}\) and \(\frac{\partial J}{\partial W'}\). Let (here \(w_{ij}\) denotes the entries of \(W\), not of \(W'\)):
\[W_{n\times t}=
\begin {bmatrix}
w_{11}&w_{12}&w_{13}&...&w_{1t}\\
w_{21}&w_{22}&w_{23}&...&w_{2t}\\
w_{31}&w_{32}&w_{33}&...&w_{3t}\\
&&&...\\
w_{n1}&w_{n2}&w_{n3}&...&w_{nt}
\end{bmatrix}
\]
From Eq. (2) in 5.1, \(Y_{1\times t}=X_{1 \times n}\cdot W_{n \times t}\), it follows that:
\[\tag{17}
y_1=\sum_{i=1}^n x_i \cdot w_{i1},\quad y_2=\sum_{i=1}^n x_i \cdot w_{i2},\quad ...,\quad y_t=\sum_{i=1}^n x_i \cdot w_{it}
\]
Eq. (17) has the same form as Eq. (12), so repeating the derivations of Eqs. (13) and (14) in 5.2.2 gives:
\[\tag{18}
\frac{\partial J}{\partial X}=\frac{\partial J}{\partial Y} \cdot W^T
\]
\[\tag{19}
\frac{\partial J}{\partial W}=X^T\cdot\frac{\partial J}{\partial Y}
\]
Here \(\frac{\partial J}{\partial Y}\) follows from the chain rule:
\[\tag{20}
\frac{\partial J}{\partial Y} = \frac{\partial J}{\partial Z} \cdot \frac{\partial Z}{\partial Y}=\frac{\partial J}{\partial O}\cdot
\begin {bmatrix}
(o_1)'_{z_1}\cdot(z_1)'_{y_1}&(o_1)'_{z_2}\cdot(z_2)'_{y_2}&(o_1)'_{z_3}\cdot(z_3)'_{y_3}&...&(o_1)'_{z_t}\cdot(z_t)'_{y_t}\\
(o_2)'_{z_1}\cdot(z_1)'_{y_1}&(o_2)'_{z_2}\cdot(z_2)'_{y_2}&(o_2)'_{z_3}\cdot(z_3)'_{y_3}&...&(o_2)'_{z_t}\cdot(z_t)'_{y_t}\\
(o_3)'_{z_1}\cdot(z_1)'_{y_1}&(o_3)'_{z_2}\cdot(z_2)'_{y_2}&(o_3)'_{z_3}\cdot(z_3)'_{y_3}&...&(o_3)'_{z_t}\cdot(z_t)'_{y_t}\\
&&&...\\
(o_r)'_{z_1}\cdot(z_1)'_{y_1}&(o_r)'_{z_2}\cdot(z_2)'_{y_2}&(o_r)'_{z_3}\cdot(z_3)'_{y_3}&...&(o_r)'_{z_t}\cdot(z_t)'_{y_t}
\end{bmatrix}
\]
where:
\[(z_i)'_{y_i}=
\frac{\partial z_i}{\partial y_i}=
\begin {cases}
1,&y_i\geq 0\\
0,&y_i<0
\end{cases}
\]
\[(i=1,2,3,...,t)
\]
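The three stages can be combined into one backward pass and checked numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: all sizes and values are made up, `W2` stands in for \(W'\), `k` is a 0-based target index, and the helper names are hypothetical. Because each \(z_i\) depends only on \(y_i\), Eq. (20) reduces to multiplying each entry of \(\frac{\partial J}{\partial Z}\) by the matching ReLU derivative, which is how `dY` is computed here.

```python
import math


def vec_mat(v, M):
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def transpose(M):
    return [list(col) for col in zip(*M)]


def outer(u, v):
    """Outer product u^T . v of two row vectors, as a list of rows."""
    return [[a * b for b in v] for a in u]


def softmax(o):
    m = max(o)
    exps = [math.exp(x - m) for x in o]
    total = sum(exps)
    return [e / total for e in exps]


def forward(X, W, W2, k):
    Y = vec_mat(X, W)                      # Eq. (2)
    Z = [y if y >= 0 else 0.0 for y in Y]  # Eq. (3)
    S = softmax(vec_mat(Z, W2))            # Eqs. (4), (5)
    return Y, Z, S, -math.log(S[k])        # Eq. (10)


def backward(X, W, W2, k):
    Y, Z, S, _ = forward(X, W, W2, k)
    dO = [S[j] - (1.0 if j == k else 0.0) for j in range(len(S))]      # Eq. (11)
    dZ = vec_mat(dO, transpose(W2))                                    # Eq. (13)
    dW2 = outer(Z, dO)                                                 # Eq. (14)
    dY = [dZ[i] * (1.0 if Y[i] >= 0 else 0.0) for i in range(len(Y))]  # Eq. (20)
    dX = vec_mat(dY, transpose(W))                                     # Eq. (18)
    dW = outer(X, dY)                                                  # Eq. (19)
    return dW, dW2, dX


# Assumed toy network: n = 4 inputs, t = 3 hidden units, r = 2 classes.
X = [0.5, 0.0, 1.0, 0.25]
W = [[2.0, -1.0, 0.0], [0.0, 2.0, 1.0], [-1.0, 1.5, 0.0], [2.0, 0.0, -4.0]]
W2 = [[0.5, -1.0], [1.0, 0.5], [2.0, 2.0]]
k = 1

dW, dW2, dX = backward(X, W, W2, k)


def numeric_dW(i, j, h=1e-6):
    """Central finite difference of J with respect to the (i, j) entry of W."""
    up = [row[:] for row in W]
    down = [row[:] for row in W]
    up[i][j] += h
    down[i][j] -= h
    return (forward(X, up, W2, k)[3] - forward(X, down, W2, k)[3]) / (2 * h)


max_err = max(
    abs(dW[i][j] - numeric_dW(i, j))
    for i in range(len(W))
    for j in range(len(W[0]))
)
print(max_err)
```

The toy values keep every \(y_i\) well away from 0, where ReLU is not differentiable and a finite-difference check would be unreliable. The third column of `dW` is zero because the third hidden unit has \(y_3<0\), so by Eq. (16) no gradient flows through it.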