Cross-entropy

Deriving the formula

The sigmoid activation function

With activation a = σ(z) and weighted input z = wx + b, the chain rule gives the derivative of the cost C with respect to the bias b:

\[\begin{aligned} \frac{\partial C}{\partial b} &=\frac{\partial C}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial b} \\ &=\frac{\partial C}{\partial a} \cdot \sigma^{\prime}(z) \cdot \frac{\partial(w x+b)}{\partial b} \\ &=\frac{\partial C}{\partial a} \cdot \sigma^{\prime}(z) \\ &=\frac{\partial C}{\partial a} \cdot a(1-a) \end{aligned} \]
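The σ′(z) = σ(z)(1 − σ(z)) step used above can be sanity-checked numerically; here is a minimal sketch using NumPy and a central finite difference (the sample points and step size are arbitrary choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Compare a central finite difference against the analytic derivative a(1 - a)
z = np.linspace(-5.0, 5.0, 11)
eps = 1e-5
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
analytic = sigmoid(z) * (1 - sigmoid(z))
print(np.max(np.abs(numeric - analytic)))  # tiny: truncation error is O(eps**2)
```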

Quadratic cost function

With the quadratic cost, the gradient is ∂C/∂b = (a − y)σ′(z). Suppose we instead demand a cost whose gradient is simply ∂C/∂b = (a − y); substituting the chain-rule expression above and integrating with respect to a yields the cross-entropy:

\[\begin{array}{l} \frac{\partial C}{\partial b}=(a-y) \sigma^{\prime}(z) \\ \frac{\partial C}{\partial b}=(a-y) \\ \frac{\partial C}{\partial a} \cdot a(1-a)=(a-y) \\ \frac{\partial C}{\partial a}=\frac{a-y}{a(1-a)} \\ C=-[y \ln a+(1-y) \ln (1-a)]+\text { constant } \end{array} \]

Differentiating the cost function

Averaged over the n training examples x, the cross-entropy cost is:

\[C=-\frac{1}{n} \sum_{x}[y \ln a+(1-y) \ln (1-a)] \]
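As a sketch, this cost can be implemented directly in NumPy (the clipping constant is an assumption added to avoid log(0) for saturated outputs):

```python
import numpy as np

def cross_entropy_cost(a, y):
    """C = -(1/n) * sum over the n samples of [y ln a + (1-y) ln(1-a)]."""
    a = np.clip(a, 1e-12, 1 - 1e-12)  # guard against log(0)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

# A confident correct prediction costs little; a = 0.5 costs ln 2 per sample
print(cross_entropy_cost(np.array([0.99]), np.array([1.0])))
print(cross_entropy_cost(np.array([0.5]), np.array([1.0])))
```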

Derivative with respect to w

\[\begin{aligned} \frac{\partial C}{\partial w_{j}} &=-\frac{1}{n} \sum_{x}\left(\frac{y}{\sigma(z)}-\frac{(1-y)}{1-\sigma(z)}\right) \frac{\partial \sigma}{\partial w_{j}} \\ &=-\frac{1}{n} \sum_{x}\left(\frac{y}{\sigma(z)}-\frac{(1-y)}{1-\sigma(z)}\right) \sigma^{\prime}(z) x_{j} \\ &=\frac{1}{n} \sum_{x} \frac{\sigma^{\prime}(z) x_{j}}{\sigma(z)(1-\sigma(z))}(\sigma(z)-y) \\ &=\frac{1}{n} \sum_{x} x_{j}(\sigma(z)-y) \end{aligned} \]

The last two steps use the sigmoid derivative identity:

\[\sigma^{\prime}(z)=\sigma(z)(1-\sigma(z)) \]

Derivative with respect to b

\[\frac{\partial C}{\partial b}=\frac{1}{n} \sum_{x}(\sigma(z)-y) \]
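Both gradient formulas can be checked against finite differences for a single sigmoid neuron; here is a sketch on made-up toy data (the shapes, seed, and step size are all arbitrary assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: n = 100 samples with 3 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100).astype(float)
w = rng.normal(size=3)
b = 0.1

def cost(w, b):
    # C = -(1/n) sum_x [y ln a + (1-y) ln(1-a)],  with a = sigma(w.x + b)
    a = np.clip(sigmoid(X @ w + b), 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

# Analytic gradients from the derivation above
a = sigmoid(X @ w + b)
grad_w = X.T @ (a - y) / len(y)  # dC/dw_j = (1/n) sum_x x_j (sigma(z) - y)
grad_b = np.mean(a - y)          # dC/db   = (1/n) sum_x (sigma(z) - y)

# Central finite-difference comparison
eps = 1e-6
num_w = np.array([
    (cost(w + eps * np.eye(3)[j], b) - cost(w - eps * np.eye(3)[j], b)) / (2 * eps)
    for j in range(3)
])
num_b = (cost(w, b + eps) - cost(w, b - eps)) / (2 * eps)
print(np.max(np.abs(num_w - grad_w)), abs(num_b - grad_b))  # both tiny
```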

Why not the quadratic cost function?

\[C=\frac{1}{2 n} \sum_{x}\left\|y(x)-a^{L}(x)\right\|^{2} \]

If an ANN is trained with the quadratic cost, the gradient contains the factor σ′(z), which approaches zero when the neuron saturates (|z| large). The observed effect is that the larger the error, the smaller the parameter updates can be, so training slows down. With the cross-entropy the σ′(z) factor cancels, and the gradient scales directly with the error (a − y).
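The slowdown is easy to see numerically: for a single sigmoid neuron with target y = 0, compare ∂C/∂b under the two costs as the neuron saturates (the sample z values are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

y = 0.0  # target; the output a = sigma(z) is badly wrong for large z
for z in [0.0, 2.0, 5.0, 10.0]:
    a = sigmoid(z)
    quad_grad = (a - y) * a * (1 - a)  # quadratic cost: (a - y) * sigma'(z)
    ce_grad = a - y                    # cross-entropy: sigma'(z) cancels
    print(f"z={z:5.1f}  error={a - y:.4f}  quad grad={quad_grad:.2e}  CE grad={ce_grad:.2e}")
```

Even though the error grows toward 1, the quadratic gradient shrinks toward 0, while the cross-entropy gradient grows with the error.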

posted @ 2020-10-15 16:59 庵摩罗果