Deriving the Cross-Entropy Loss Function by Hand
1 Cross-Entropy Loss Function
\[\begin{aligned}
L_{\mathrm{CE}}(\hat{y},y)& =-\log p(y|x)~=~-[y\log\hat{y}+(1-y)\log(1-\hat{y})] \\
&=-[y\log\sigma(w\cdot x+b)+(1-y)\log{(1-\sigma(w\cdot x+b))}]
\end{aligned}
\]
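As a quick sanity check, the loss above can be evaluated directly for a single example. The snippet below is only a minimal sketch: the function names `sigmoid` and `cross_entropy` and the numeric values are made up for illustration and do not come from any particular library.

```python
import math

def sigmoid(z):
    """Logistic function, split into two branches to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def cross_entropy(w, x, b, y):
    """Binary cross-entropy for one example with label y in {0, 1}."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b   # z = w . x + b
    y_hat = sigmoid(z)                             # predicted probability
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

# Hypothetical example values, chosen so that z = 0.
w, x, b = [0.5, -0.25], [2.0, 4.0], 0.0
loss_value = cross_entropy(w, x, b, y=1)
print(loss_value)  # sigma(0) = 0.5, so the loss equals log 2
```

Note that this sketch takes the logarithm of \(\hat{y}\) directly, so it will fail if the sigmoid saturates to exactly 0 or 1 in floating point; practical implementations usually work from \(z\) instead.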
2 Gradient with Respect to the Weight \(w_j\)
Let \(z = w \cdot x + b\). Applying the chain rule gives
\[\begin{aligned}
\frac{\partial L_{\mathrm{CE}}(\hat{y},y)}{\partial w_j}& =-\left(\frac{y}{\sigma(z)}-\frac{1-y}{1-\sigma(z)}\right)\frac{\partial\sigma(z)}{\partial w_j} \\
&=-\left(\frac y{\sigma(z)}-\frac{(1-y)}{1-\sigma(z)}\right)\sigma^{\prime}(z)x_j \\
&=\frac{\sigma^{\prime}(z)x_j}{\sigma(z)(1-\sigma(z))}(\sigma(z)-y) \\
&=x_j(\sigma(z)-y)
\end{aligned}
\]
where the last step uses
\[\sigma^{\prime}(z)=\sigma(z)(1-\sigma(z))
\]
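One way to gain confidence in the result \(x_j(\sigma(z)-y)\) is to compare it with a finite-difference approximation of the loss. The following is a rough sketch under assumed values; the helper names (`loss`, `analytic_grad`, `numeric_grad`) and the step size `eps` are illustrative choices, not part of the derivation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, y):
    """Binary cross-entropy for one example, with z = w . x + b."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    s = sigmoid(z)
    return -(y * math.log(s) + (1 - y) * math.log(1 - s))

def analytic_grad(w, x, b, y):
    """Gradient from the derivation: dL/dw_j = x_j * (sigma(z) - y)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return [xj * (sigmoid(z) - y) for xj in x]

def numeric_grad(w, x, b, y, eps=1e-6):
    """Central finite differences, perturbing one weight at a time."""
    grad = []
    for j in range(len(w)):
        w_plus = list(w)
        w_plus[j] += eps
        w_minus = list(w)
        w_minus[j] -= eps
        grad.append((loss(w_plus, x, b, y) - loss(w_minus, x, b, y)) / (2 * eps))
    return grad

# Hypothetical example values.
w, x, b, y = [0.3, -0.7, 0.1], [1.5, 2.0, -0.5], 0.2, 1
ga = analytic_grad(w, x, b, y)
gn = numeric_grad(w, x, b, y)
max_diff = max(abs(a - n) for a, n in zip(ga, gn))
print(ga)
print(gn)
print(max_diff)  # should be tiny if the formula is right
```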
This identity for the derivative of the sigmoid is proved as follows:
\[\begin{aligned}
\sigma^{\prime}(z)& =(\frac1{1+e^{-z}})^{\prime} \\
&=(-1)(1+e^{-z})^{(-1)-1}\cdot(e^{-z})^{\prime} \\
&=\frac{1}{\left(1+e^{-z}\right)^{2}}\cdot(e^{-z}) \\
&=\frac{1}{1+e^{-z}}\cdot\frac{e^{-z}}{1+e^{-z}} \\
&=\frac{1}{1+e^{-z}}\cdot(1-\frac{1}{1+e^{-z}}) \\
&=\sigma(z)(1-\sigma(z))
\end{aligned}
\]
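The identity \(\sigma'(z)=\sigma(z)(1-\sigma(z))\) can also be checked numerically. This is just an illustrative sketch; the test points and the step size `eps` are arbitrary assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed form from the proof: sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def finite_diff(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)

# Compare the closed form with the numerical derivative at a few points.
points = (-3.0, -1.0, 0.0, 0.5, 2.0)
errors = [abs(sigmoid_prime(z) - finite_diff(sigmoid, z)) for z in points]
print(max(errors))  # small, limited by floating-point and truncation error
```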
I have exams at the moment, so I will fill in more details later. If you spot any mistakes, please point them out. Thanks a lot!