Softmax Regression Derivation

Let the vector \(y\) be the true class label in one-hot encoding (exactly one entry is 1, all others are 0), with dimension \(m\) for \(m\) classes:

\[y=\begin{bmatrix}y_1\\ y_2\\ \vdots\\y_m\end{bmatrix} \]

Let the vector \(z\) be the input to the softmax function, with the same dimension \(m\) as the label vector \(y\):

\[z=\begin{bmatrix}z_1\\ z_2\\ \vdots\\z_m\end{bmatrix} \]

Let the vector \(s\) be the output of the softmax function, again with the same dimension \(m\):

\[s=\begin{bmatrix}s_1\\ s_2\\ \vdots\\s_m\end{bmatrix} \]

\[s_{i}=\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}} \]
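The definition of \(s_i\) above can be sketched in NumPy (the helper name `softmax` is my own). Subtracting \(\max(z)\) before exponentiating is a standard numerical-stability trick; the shift cancels in the ratio, so the output is mathematically unchanged:

```python
import numpy as np

def softmax(z):
    """Compute s_i = exp(z_i) / sum_k exp(z_k) for a 1-D input vector z."""
    # Subtracting max(z) keeps exp() from overflowing for large inputs;
    # the common factor exp(-max(z)) cancels in the ratio.
    e = np.exp(z - np.max(z))
    return e / e.sum()
```

The outputs are positive and sum to 1, so \(s\) can be read as a probability distribution over the \(m\) classes.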

The cross-entropy loss function:

\[c=-\sum_{j=1}^{m}y_j\ln s_j \]
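A minimal NumPy sketch of this loss (the function name `cross_entropy` is illustrative): since \(y\) is one-hot, the sum collapses to \(-\ln s_j\) for the single true class \(j\).

```python
import numpy as np

def cross_entropy(y, s):
    """c = -sum_j y_j * ln(s_j), with y one-hot and s a softmax output."""
    # With one-hot y, only the true class contributes: c = -ln(s_true).
    return -np.sum(y * np.log(s))
```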

Take the partial derivative of the loss with respect to each component \(z_i\) of the vector \(z\), applying the chain rule through \(s_j\):

\[\frac{\partial c}{\partial z_i}=-\sum_{j=1}^{m}\frac{\partial (y_j\ln s_j)}{\partial s_j}\cdot\frac{\partial s_j}{\partial z_i} =-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i} \]

When \(j=i\):

\[\frac{\partial s_j}{\partial z_i}=\frac{\partial \left(\frac{e^{z_{i}}}{\sum_{k=1}^{m}e^{z_{k}}}\right)}{\partial z_i} =\frac{e^{z_i}\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_i}\cdot e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^2} =\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{\sum_{k=1}^{m}e^{z_k}-e^{z_i}}{\sum_{k=1}^{m}e^{z_k}} =\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\cdot\left(1-\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}}\right) =s_i(1-s_i) \]

When \(j\neq i\):

\[\frac{\partial s_j}{\partial z_i}=\frac{\partial \left(\frac{e^{z_{j}}}{\sum_{k=1}^{m}e^{z_{k}}}\right)}{\partial z_i} =\frac{0\cdot\sum_{k=1}^{m}e^{z_k}-e^{z_j}\cdot e^{z_i}}{\left(\sum_{k=1}^{m}e^{z_k}\right)^2} =-\frac{e^{z_j}}{\sum_{k=1}^{m}e^{z_k}}\cdot\frac{e^{z_i}}{\sum_{k=1}^{m}e^{z_k}} =-s_js_i \]

Therefore:

\[\frac{\partial s_j}{\partial z_i}=\begin{cases}s_i(1-s_i)& j=i \\ -s_js_i& j\neq{i} \end{cases} \]
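The two cases combine into a single matrix expression for the softmax Jacobian, \(\operatorname{diag}(s)-ss^{T}\). A sketch (the function name `softmax_jacobian` is my own):

```python
import numpy as np

def softmax_jacobian(s):
    """J[j, i] = ds_j/dz_i: s_i*(1-s_i) on the diagonal, -s_j*s_i off it."""
    # diag(s) contributes s_i on the diagonal; outer(s, s) contributes
    # s_j*s_i everywhere, so the difference reproduces both cases at once.
    return np.diag(s) - np.outer(s, s)
```

Note the Jacobian is symmetric, and each column sums to zero because \(\sum_j s_j = 1\) is constant.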

Substituting these two cases back into the partial derivative of the loss with respect to \(z_i\):

\[\frac{\partial c}{\partial z_i} =-\sum_{j=1}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i} =-\left(\frac{y_i}{s_i}\cdot\frac{\partial s_i}{\partial z_i}+\sum_{j\neq{i}}^{m}\frac{y_j}{s_j}\cdot\frac{\partial s_j}{\partial z_i}\right) =-\left(\frac{y_i}{s_i}\cdot s_i(1-s_i)+\sum_{j\neq{i}}^{m}\frac{y_j}{s_j}\cdot(-s_js_i)\right) \]

\[=-y_i(1-s_i)+\sum_{j\neq{i}}^{m}y_js_i =-y_i+y_is_i+\sum_{j\neq{i}}^{m}y_js_i =-y_i+s_i\sum_{j=1}^{m}y_j =s_i-y_i \]

The last step uses \(\sum_{j=1}^{m}y_j=1\), which holds because \(y\) is one-hot.
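The closed-form gradient \(\frac{\partial c}{\partial z_i}=s_i-y_i\) can be verified against central finite differences of the loss; a self-contained sketch (all function names are illustrative):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def loss(z, y):
    # Cross-entropy of softmax(z) against one-hot label y.
    return -np.sum(y * np.log(softmax(z)))

def grad(z, y):
    # Closed-form gradient derived above: dc/dz_i = s_i - y_i.
    return softmax(z) - y

def numeric_grad(z, y, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(z)
    for i in range(len(z)):
        dz = np.zeros_like(z)
        dz[i] = eps
        g[i] = (loss(z + dz, y) - loss(z - dz, y)) / (2 * eps)
    return g
```

The two gradients should agree to within finite-difference error, which is one reason the softmax + cross-entropy pairing is so convenient in practice: the combined gradient is just \(s-y\), with no division by \(s_j\) that could be numerically fragile.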

posted @ 2019-06-18 20:43 JohnRed