Cross-entropy
Formula derivation
The sigmoid activation function
\[\begin{aligned}
\frac{\partial C}{\partial b} &=\frac{\partial C}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial b} \\
&=\frac{\partial C}{\partial a} \cdot \sigma^{\prime}(z) \cdot \frac{\partial(w x+b)}{\partial b} \\
&=\frac{\partial C}{\partial a} \cdot \sigma^{\prime}(z) \\
&=\frac{\partial C}{\partial a} \cdot a(1-a)
\end{aligned}
\]
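The last step uses the identity σ'(z) = a(1 − a), where a = σ(z). As a quick numerical sanity check (my own sketch, not part of the original notes), the closed form can be compared against a central finite difference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z = 0.7
a = sigmoid(z)
analytic = a * (1 - a)  # sigma'(z) = a(1 - a)

h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
assert abs(analytic - numeric) < 1e-8
```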
The quadratic cost function
With the quadratic cost, the gradient picks up a factor of σ'(z). If we instead demand that ∂C/∂b = (a − y), we can work backwards to find the cost function that produces it:
\[\begin{array}{l}
\frac{\partial C}{\partial b}=(a-y) \sigma^{\prime}(z) \\
\frac{\partial C}{\partial b}=(a-y) \\
\frac{\partial C}{\partial a} \cdot a(1-a)=(a-y) \\
\frac{\partial C}{\partial a}=\frac{a-y}{a(1-a)}=\frac{1-y}{1-a}-\frac{y}{a} \\
C=-[y \ln a+(1-y) \ln (1-a)]+\text { constant }
\end{array}
\]
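The candidate cost recovered above should satisfy ∂C/∂a · a(1 − a) = (a − y). A quick numerical check (my own sketch, not from the original notes), using a finite difference for ∂C/∂a:

```python
import math

def C(a, y):
    # candidate cost: -[y ln a + (1 - y) ln(1 - a)]
    return -(y * math.log(a) + (1 - y) * math.log(1 - a))

a, y = 0.3, 1.0
h = 1e-6
dC_da = (C(a + h, y) - C(a - h, y)) / (2 * h)  # numerical dC/da

# dC/da * a(1 - a) should reduce to (a - y)
assert abs(dC_da * a * (1 - a) - (a - y)) < 1e-6
```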
Differentiating the cross-entropy cost
\[C=-\frac{1}{n} \sum_{x}[y \ln a+(1-y) \ln (1-a)]
\]
Derivative with respect to w
\[\begin{aligned}
\frac{\partial C}{\partial w_{j}} &=-\frac{1}{n} \sum_{x}\left(\frac{y}{\sigma(z)}-\frac{(1-y)}{1-\sigma(z)}\right) \frac{\partial \sigma}{\partial w_{j}} \\
&=-\frac{1}{n} \sum_{x}\left(\frac{y}{\sigma(z)}-\frac{(1-y)}{1-\sigma(z)}\right) \sigma^{\prime}(z) x_{j} \\
&=\frac{1}{n} \sum_{x} \frac{\sigma^{\prime}(z) x_{j}}{\sigma(z)(1-\sigma(z))}(\sigma(z)-y) \\
&=\frac{1}{n} \sum_{x} x_{j}(\sigma(z)-y)
\end{aligned}
\]
using the identity
\[\sigma^{\prime}(z)=\sigma(z)(1-\sigma(z))
\]
Derivative with respect to b
\[\frac{\partial C}{\partial b}=\frac{1}{n} \sum_{x}(\sigma(z)-y)
\]
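Both closed-form gradients can be verified against finite differences. A minimal NumPy sketch (the names X, y, w, b and the random data are my own, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                   # n = 50 samples, 3 features
y = rng.integers(0, 2, size=50).astype(float)  # binary targets
w = rng.normal(size=3)
b = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b):
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

a = sigmoid(X @ w + b)
grad_w = X.T @ (a - y) / len(y)  # (1/n) sum_x x_j (sigma(z) - y)
grad_b = np.mean(a - y)          # (1/n) sum_x (sigma(z) - y)

# finite-difference checks
h = 1e-6
num_b = (cost(w, b + h) - cost(w, b - h)) / (2 * h)
e0 = np.array([h, 0.0, 0.0])
num_w0 = (cost(w + e0, b) - cost(w - e0, b)) / (2 * h)
assert abs(grad_b - num_b) < 1e-6
assert abs(grad_w[0] - num_w0) < 1e-6
```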
Why not use the quadratic cost?
\[C=\frac{1}{2 n} \sum_{x}\left\|y(x)-a^{L}(x)\right\|^{2}
\]
If we train an ANN with the quadratic cost, the behavior observed in practice is that a larger error can produce a smaller parameter update, making training slower: both ∂C/∂w and ∂C/∂b carry a factor of σ'(z), which approaches zero as the sigmoid output saturates toward 0 or 1.
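Since the quadratic cost gives ∂C/∂b = (a − y)σ'(z) while cross-entropy gives ∂C/∂b = (a − y), the contrast shows up directly in the numbers. A sketch (my own illustration) for a neuron with target 0 whose output saturates toward 1:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

y = 0.0  # target
grads = []
for z in (2.0, 5.0, 10.0):           # increasingly saturated, increasingly wrong
    a = sigmoid(z)
    quad = (a - y) * a * (1 - a)     # dC/db under the quadratic cost
    ce = a - y                       # dC/db under cross-entropy
    grads.append((quad, ce))
    print(f"z={z:4.1f}  error={a - y:.4f}  quad_grad={quad:.6f}  ce_grad={ce:.6f}")

# the quadratic gradient shrinks even as the error grows; cross-entropy's does not
assert grads[2][0] < grads[0][0]
assert grads[2][1] > grads[0][1]
```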
