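Derivation of the Softmax Partial Derivatives and the BP Process
Softmax differentiation
PyTorch actually performs the BP (backpropagation) pass automatically; the derivation is worked through here only out of a need for completeness.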
A
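Part A proves the softmax derivative and the softmax BP process.
I had meant to type the formulas out by hand, but decided against it; the reference part gives the LaTeX formulas.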
A.1
Softmax derivative
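As a sketch of how the results quoted in Part B combine, the softmax Jacobian can be derived as follows. This assumes the denominator layout of B.15 (entry \((i, j)\) is \(\partial s_j / \partial x_i\)); the names \(\vec{s}\) and \(u\) are introduced here for illustration only.

```latex
\[
\begin{aligned}
& \vec{s}=\operatorname{softmax}(\vec{x})=u^{-1} \exp (\vec{x}), \quad u=\vec{1}^{\top} \exp (\vec{x}) \in R \\
& \frac{\partial \vec{s}}{\partial \vec{x}}
=u^{-1} \frac{\partial \exp (\vec{x})}{\partial \vec{x}}+\frac{\partial u^{-1}}{\partial \vec{x}} \cdot \exp (\vec{x})^{\top} \\
& \quad =u^{-1} \operatorname{diag}(\exp (\vec{x}))-u^{-2} \exp (\vec{x}) \cdot \exp (\vec{x})^{\top} \\
& \quad =\operatorname{diag}(\vec{s})-\vec{s}\, \vec{s}^{\top}
\end{aligned}
\]
```

The second line applies B.15 with \(y=u^{-1}\) and \(\vec{z}=\exp(\vec{x})\). The third line uses derivation (1) for \(\partial \exp(\vec{x}) / \partial \vec{x}\), and derivation (2) with the scalar chain rule for \(\partial u^{-1} / \partial \vec{x}=-u^{-2} \exp(\vec{x})\).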

A.2
Softmax gradient descent
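A minimal pure-Python sketch of the backward pass and one gradient-descent step. It assumes a cross-entropy loss with a one-hot target, which the text here does not specify, and relies on the standard softmax Jacobian \(\operatorname{diag}(\vec{s})-\vec{s}\vec{s}^{\top}\). The function names and the learning rate `lr` are hypothetical choices made for illustration.

```python
import math

def softmax(x):
    """Numerically stable softmax of a list of floats."""
    m = max(x)
    e = [math.exp(v - m) for v in x]
    total = sum(e)
    return [v / total for v in e]

def softmax_backward(s, g):
    """Apply the softmax Jacobian to an upstream gradient g:
    (diag(s) - s s^T) g = s * (g - <s, g>), elementwise."""
    dot = sum(si * gi for si, gi in zip(s, g))
    return [si * (gi - dot) for si, gi in zip(s, g)]

def cross_entropy(x, y):
    """Loss -sum_i y_i * log(softmax(x)_i) for a one-hot target y (assumed loss)."""
    s = softmax(x)
    return -sum(yi * math.log(si) for si, yi in zip(s, y) if yi > 0)

def cross_entropy_grad(x, y):
    """Gradient of the loss with respect to the logits x, via the backward pass."""
    s = softmax(x)
    g = [-yi / si for si, yi in zip(s, y)]  # dL/ds
    return softmax_backward(s, g)

def sgd_step(x, y, lr=0.1):
    """One plain gradient-descent update on the logits (lr is an assumed value)."""
    grad = cross_entropy_grad(x, y)
    return [xi - lr * gi for xi, gi in zip(x, grad)]

x = [1.0, 2.0, 0.5]
y = [0.0, 1.0, 0.0]
grad = cross_entropy_grad(x, y)
x_new = sgd_step(x, y)
```

Under these assumptions the gradient collapses to \(\vec{s}-\vec{y}\), so `grad` should match `softmax(x)` minus `y` elementwise, and the loss at `x_new` should be lower than at `x`.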

B
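Almost all of this is borrowed from others' work; the citations and references are given here.
References:
Several theorems are quoted: B.15 and B.16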
\((B.15)\)
\[
\begin{aligned}
& \vec{x} \in R^{M \times 1}, y \in R, \vec{z} \in R^{N \times 1},\quad \text{then:} \\
& \frac{\partial y \vec{z}}{\partial \vec{x}}=y \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top} \in R^{M \times N}
\end{aligned}
\]
\[\begin{aligned}
& \text{[Proof]:} \\
& d(y\vec{z}) \\
& =d y \cdot \vec{z}+y \cdot d \vec{z} \\
&=\vec{z} \cdot d y+y \cdot d \vec{z} \\
&=\vec{z} \cdot \left(\frac{\partial y}{\partial \vec{x}}\right)^{\top} d \vec{x}+y \cdot\left(\frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\
&=\left(\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top}+y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}\right)^{\top} d \vec{x} \\
& \therefore \frac{\partial y \vec{z}}{\partial \vec{x}}=y \cdot \frac{\partial \vec{z}}{\partial \vec{x}}+\frac{\partial y}{\partial \vec{x}} \cdot \vec{z}^{\top}
\end{aligned}
\]
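The product rule in B.15 can be checked numerically. The sketch below uses central differences in pure Python, with \(M=2\) and \(N=3\); the particular functions `y_fn` and `z_fn` are arbitrary examples chosen here, not taken from the source, and the Jacobian is stored in the same denominator layout (`J[i][j]` is \(\partial f_j / \partial x_i\)).

```python
def y_fn(x):
    """Example scalar function y(x) = x0 * x1."""
    return x[0] * x[1]

def z_fn(x):
    """Example vector function z(x) in R^3."""
    return [x[0] + x[1], x[0] ** 2, 3.0 * x[1]]

def yz_fn(x):
    """The product y(x) * z(x), a vector in R^3."""
    y = y_fn(x)
    return [y * zj for zj in z_fn(x)]

def numeric_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian, denominator layout: J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        rows.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return rows

x = [0.7, -1.3]
y, z = y_fn(x), z_fn(x)
dy_dx = [x[1], x[0]]                              # M x 1, computed by hand
dz_dx = [[1.0, 2 * x[0], 0.0], [1.0, 0.0, 3.0]]   # M x N, computed by hand

# Right-hand side of B.15: y * dz/dx + (dy/dx) z^T
rhs = [[y * dz_dx[i][j] + dy_dx[i] * z[j] for j in range(3)] for i in range(2)]
# Left-hand side: Jacobian of the product, estimated numerically
lhs = numeric_jacobian(yz_fn, x)
max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(3))
```

`max_err` should be close to zero, limited only by finite-difference and rounding error.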
\((B.26)\)
\[\begin{aligned}
& \vec{x} \in R^n, \quad \vec{f}(\vec{x})=\left[f\left(x_1\right), f\left(x_2\right), \ldots, f\left(x_n\right)\right]^{\top} \in R^n, \quad \text{then} \\
& \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right)
\end{aligned}
\]
\[\begin{aligned}
& \text{[Proof]: writing } f_j=f\left(x_j\right), \text{ each } f_j \text{ depends only on } x_j, \text{ so } \frac{\partial f_j}{\partial x_i}=0 \text{ for } i \neq j: \\
& \frac{\partial \vec{f}(\vec{x})}{\partial \vec{x}}=\left[\begin{array}{cccc}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_2}{\partial x_1} & \cdots & \frac{\partial f_n}{\partial x_1} \\
\vdots & \vdots & & \vdots \\
\frac{\partial f_1}{\partial x_n} & \frac{\partial f_2}{\partial x_n} & \cdots & \frac{\partial f_n}{\partial x_n}
\end{array}\right]=\left[\begin{array}{llll}
f^{\prime}\left(x_1\right) & & & \\
& f^{\prime}\left(x_2\right) & & \\
& & \ddots & \\
& & & f^{\prime}\left(x_n\right)
\end{array}\right]=\operatorname{diag}\left(\vec{f}^{\prime}(\vec{x})\right)
\end{aligned}
\]
Two derivations that Part A relies on:
\((1)\)
\[\begin{aligned}
& \vec{x} \in R^n, \exp (\vec{x})=\left[\begin{array}{c}
\exp \left(x_1\right) \\
\vdots \\
\exp \left(x_n\right)
\end{array}\right] \in R^n\\
& \text{so the partial derivative is: } \frac{\partial \exp (\vec{x})}{\partial \vec{x}}=\left[\begin{array}{ccc}
\frac{\partial \exp \left(x_1\right)}{\partial x_1} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_1} \\
\vdots & & \vdots \\
\frac{\partial \exp \left(x_1\right)}{\partial x_n} & \cdots & \frac{\partial \exp \left(x_n\right)}{\partial x_n}
\end{array}\right]=\operatorname{diag}(\exp (\vec{x}))
\end{aligned}
\]
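Derivation (1) is the special case \(f=\exp\) of B.26, and it is easy to confirm numerically. The pure-Python sketch below estimates the Jacobian of the elementwise exponential by central differences and compares it with \(\operatorname{diag}(\exp(\vec{x}))\); the helper names and the test point are illustrative assumptions.

```python
import math

def exp_vec(x):
    """Elementwise exponential of a list of floats."""
    return [math.exp(v) for v in x]

def numeric_jacobian(f, x, h=1e-6):
    """Central-difference Jacobian, denominator layout: J[i][j] = d f_j / d x_i."""
    rows = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        fp, fm = f(xp), f(xm)
        rows.append([(a - b) / (2 * h) for a, b in zip(fp, fm)])
    return rows

x = [0.2, -1.0, 1.5]
n = len(x)
jac = numeric_jacobian(exp_vec, x)

# Expected result from derivation (1): a diagonal matrix holding exp(x_i)
expected = [[math.exp(x[i]) if i == j else 0.0 for j in range(n)] for i in range(n)]
max_err = max(abs(jac[i][j] - expected[i][j]) for i in range(n) for j in range(n))
```

Swapping `math.exp` for another elementwise function and its derivative gives the same kind of check for B.26 in general.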
\((2)\)
\[\begin{aligned}
& d\left(\vec{1}^{\top} \exp (\vec{x})\right) \\
& =\vec{1}^{\top} d \exp (\vec{x}) \\
&=\vec{1}^{\top}\left(\exp ^{\prime}(\vec{x}) \odot d \vec{x}\right) \\
&=\left(\vec{1} \odot \exp ^{\prime}(\vec{x})\right)^{\top} d \vec{x} \\
& \text{hence: } \frac{\partial \vec{1}^{\top} \exp (\vec{x})}{\partial \vec{x}}=\vec{1} \odot \exp ^{\prime}(\vec{x})=\exp ^{\prime}(\vec{x})=\exp (\vec{x})
\end{aligned}
\]
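One consequence of derivation (2), written out here as an illustrative sketch rather than something stated in the source: combining it with the scalar chain rule for \(\log\) shows that the gradient of log-sum-exp is softmax itself.

```latex
\[
\begin{aligned}
\frac{\partial \log \left(\vec{1}^{\top} \exp (\vec{x})\right)}{\partial \vec{x}}
&=\frac{1}{\vec{1}^{\top} \exp (\vec{x})} \cdot \frac{\partial \vec{1}^{\top} \exp (\vec{x})}{\partial \vec{x}} \\
&=\frac{\exp (\vec{x})}{\vec{1}^{\top} \exp (\vec{x})}=\operatorname{softmax}(\vec{x})
\end{aligned}
\]
```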
C
My understanding may be biased or incomplete in places.
