Softmax regression derivation
Let \(y\) be the true-label vector (one-hot encoded: exactly one component is 1 and all others are 0), of dimension \(m\) for \(m\) classes:
\[y=\begin{bmatrix}y_1\\ y_2\\ \vdots \\y_m\end{bmatrix}
\]
The vector \(z\) is the input to the softmax function, with the same dimension \(m\) as the label vector \(y\):
\[z=\begin{bmatrix}z_1\\ z_2\\ \vdots \\z_m\end{bmatrix}
\]
The vector \(s\) is the output of the softmax function, again of dimension \(m\):
\[s=\begin{bmatrix}s_1\\ s_2\\ \vdots \\s_m\end{bmatrix}
\]
\[s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}
\]
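The softmax formula above can be checked numerically. The sketch below is a minimal NumPy implementation (the max-subtraction trick is a standard numerical-stability detail not discussed in the text; it leaves the result unchanged because it cancels in the ratio):

```python
import numpy as np

def softmax(z):
    """s_i = exp(z_i) / sum_k exp(z_k), with max(z) subtracted for stability."""
    e = np.exp(z - np.max(z))  # subtracting a constant cancels in the ratio
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
s = softmax(z)
print(s)          # all components positive
print(s.sum())    # components sum to 1
```

The output \(s\) is a valid probability distribution: every \(s_i > 0\) and \(\sum_i s_i = 1\).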
Cross-entropy loss function:
\[c=-\sum_{j=1}^{m}y_j\ln s_j
\]
\]
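Because \(y\) is one-hot, only one term of the sum survives, so the loss reduces to \(-\ln s_j\) at the true class. A small sketch (the vectors here are illustrative values, not from the text):

```python
import numpy as np

def cross_entropy(y, s):
    """c = -sum_j y_j * ln(s_j); with one-hot y only the true class survives."""
    return -np.sum(y * np.log(s))

s = np.array([0.7, 0.2, 0.1])   # a hypothetical softmax output
y = np.array([1.0, 0.0, 0.0])   # one-hot label: true class is index 0
c = cross_entropy(y, s)
print(c)                        # equals -ln(0.7)
```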
Take the partial derivative of the loss with respect to each component \(z_i\) of \(z\):
\[\frac{\partial c}{\partial z_i}=-\sum_{j=1}^{m}\frac{\partial (y_j\ln s_j)}{\partial s_j}\cdot\frac{\partial s_j}{\partial z_i}
=-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}
\]
When \(j=i\), apply the quotient rule:
\[\frac{\partial s_j}{\partial z_i}=\frac{\partial (\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}})}{\partial z_i}
=\frac{e^{z_i}\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_i}\cdot e^{z_i}}{(\sum_{k=1}^{m}e^{z_k})^2}
=\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{\sum_{k=1}^{m}e^{z_k}-e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}
=\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot(1-\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}})
=s_i(1-s_i)
\]
When \(j\neq i\), the numerator \(e^{z_j}\) does not depend on \(z_i\):
\[\frac{\partial s_j}{\partial z_i}=\frac{\partial (\frac{e^{z_{j}}}{\sum_{k=1}^{m}e^{z_{k}}})}{\partial z_i}
=\frac{0\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_j}\cdot e^{z_i}}{(\sum_{k=1}^{m}e^{z_k})^2}
=-\frac{e^{z_j}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}
=-s_js_i
\]
Therefore:
\[\frac{\partial s_j}{\partial z_i}=\begin{cases}s_i(1-s_i)& j=i \\ -s_js_i& j\neq{i} \end{cases}
\]
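In matrix form this Jacobian is \(\operatorname{diag}(s) - ss^{T}\). The sketch below builds it from the two cases above and checks it against central finite differences (the test vector \(z\) is an arbitrary illustrative choice):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def softmax_jacobian(s):
    """J[j, i] = ds_j/dz_i: s_i(1 - s_i) on the diagonal, -s_j * s_i off it,
    which is exactly diag(s) - s s^T."""
    return np.diag(s) - np.outer(s, s)

z = np.array([2.0, 1.0, 0.1])
J = softmax_jacobian(softmax(z))

# Central finite-difference approximation of each column ds/dz_i
eps = 1e-6
J_num = np.empty_like(J)
for i in range(len(z)):
    dz = np.zeros_like(z)
    dz[i] = eps
    J_num[:, i] = (softmax(z + dz) - softmax(z - dz)) / (2 * eps)

print(np.max(np.abs(J - J_num)))  # difference should be tiny
```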
Substituting this back into the derivative of the loss, splitting off the \(j=i\) term:
\[\frac{\partial c}{\partial z_i}
=-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}
=-(\frac{y_i}{s_i}\cdot\frac{\partial s_i}{\partial z_i}+\sum_{j\neq{i}}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i})
=-(\frac{y_i}{s_i}\cdot s_i(1-s_i)+\sum_{j\neq{i}}^{m}\frac{y_j}{s_j}\cdot(-s_js_i))
\]
\[=-y_i(1-s_i)+\sum_{j\neq{i}}^{m}y_js_i
=-y_i+s_iy_i+\sum_{j\neq{i}}^{m}y_js_i
=-y_i+s_i\sum_{j=1}^{m}y_j
=s_i-y_i
\]
where the last step uses \(\sum_{j=1}^{m}y_j=1\), which holds because \(y\) is one-hot.
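The final result \(\partial c/\partial z = s - y\) can be verified numerically against finite differences of the loss (the vectors below are illustrative values chosen for the check):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    """Cross-entropy of softmax(z) against one-hot label y."""
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])
y = np.array([0.0, 1.0, 0.0])      # one-hot label: true class is index 1

grad_analytic = softmax(z) - y     # the derived gradient: dc/dz = s - y

# Central finite-difference gradient, one coordinate at a time
eps = 1e-6
grad_num = np.array([
    (loss(z + eps * e_i, y) - loss(z - eps * e_i, y)) / (2 * eps)
    for e_i in np.eye(len(z))
])

print(np.max(np.abs(grad_analytic - grad_num)))  # difference should be tiny
```

This simple closed form, \(s - y\), is what makes softmax with cross-entropy so convenient for backpropagation.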