Why Neural Networks Use Cross-Entropy Instead of Classification Error

Intro

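Neural network classifiers use cross-entropy, rather than classification error, as the cost function. Why is that? I found an article through Google that explains it (see Ref).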

Analysis

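Suppose a neural network predicts political party affiliation. It uses softmax as the output activation, so it outputs a probability for each of 3 classes. For example: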

computed       | targets              | correct?
-----------------------------------------------
0.3  0.3  0.4  | 0  0  1 (democrat)   | yes
0.3  0.4  0.3  | 0  1  0 (republican) | yes
0.1  0.2  0.7  | 1  0  0 (other)      | no

The classification error is $ \frac{1}{3} $ and the accuracy is $ \frac{2}{3} $. The cross-entropy error for this network's first input is $ -\ln{0.3} \times 0 - \ln{0.3} \times 0 - \ln{0.4} \times 1 = -\ln{0.4} \approx 0.92 $. The average cross-entropy error over the three inputs is $ (-\ln{0.4} - \ln{0.4} - \ln{0.1}) / 3 \approx 1.38 $, where the terms multiplied by a target of 0 have been dropped.
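Written generally, for $ N $ inputs and $ K $ classes, with network outputs $ y_{ik} $ and one-hot targets $ t_{ik} $ (this notation is an assumption made here for illustration, not taken from the referenced article), the average cross-entropy error is:

$$ E = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} t_{ik} \ln{y_{ik}} $$

For the table above, $ N = 3 $ and $ K = 3 $, and for each input only the term with $ t_{ik} = 1 $ contributes.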

computed       | targets              | correct?
-----------------------------------------------
0.1  0.2  0.7  | 0  0  1 (democrat)   | yes
0.1  0.7  0.2  | 0  1  0 (republican) | yes
0.3  0.4  0.3  | 1  0  0 (other)      | no

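Likewise, for the network in the second table the classification error is $ \frac{1}{3} $ and the accuracy is $ \frac{2}{3} $; the average cross-entropy error is $ (-\ln{0.7} - \ln{0.7} - \ln{0.3}) / 3 \approx 0.64 $. Although the classification errors are identical, the two cross-entropy errors differ: the second is smaller than the first. This matches the tables: the second network is more confident on the two inputs it gets right (0.7 versus 0.4) and only narrowly wrong on the third (0.3 versus 0.4 for the winning class), while the first network is barely right twice and badly wrong once.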
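The figures for both networks can be rechecked with a few lines of Python. This is a minimal sketch using only the standard library; the names `NET_1`, `NET_2`, `TARGETS`, `classification_error` and `mean_cross_entropy` are made up for this example, and the numbers are copied from the two tables.

```python
import math

# Outputs and one-hot targets copied from the two tables above.
TARGETS = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
NET_1 = [(0.3, 0.3, 0.4), (0.3, 0.4, 0.3), (0.1, 0.2, 0.7)]
NET_2 = [(0.1, 0.2, 0.7), (0.1, 0.7, 0.2), (0.3, 0.4, 0.3)]


def classification_error(outputs, targets):
    """Fraction of rows whose largest output is not at the target index."""
    wrong = 0
    for y, t in zip(outputs, targets):
        predicted = max(range(len(y)), key=lambda k: y[k])
        if t[predicted] != 1:
            wrong += 1
    return wrong / len(outputs)


def mean_cross_entropy(outputs, targets):
    """Average over all rows of -sum_k t_k * ln(y_k)."""
    total = 0.0
    for y, t in zip(outputs, targets):
        total -= sum(t_k * math.log(y_k) for y_k, t_k in zip(y, t))
    return total / len(outputs)


for name, net in (("network 1", NET_1), ("network 2", NET_2)):
    print(name,
          round(classification_error(net, TARGETS), 2),
          round(mean_cross_entropy(net, TARGETS), 2))
# network 1 0.33 1.38
# network 2 0.33 0.64
```

Both networks get the same classification error, while the cross-entropy separates them.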

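MSE (mean squared error), i.e. the squared L2 distance between outputs and targets, is actually not a bad choice either. The first network's error is $(0.54 + 0.54 + 1.34) / 3 = 0.81$ and the second network's error is $(0.14 + 0.14 + 0.74) / 3 = 0.34$, where each term is the sum of squared differences for one input. According to the referenced article, however, MSE puts too much emphasis on the incorrect outputs. With one-hot targets, cross-entropy depends only on the probability assigned to the true class, whereas MSE also penalizes every probability assigned to a wrong class.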
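One way to see how the two measures differ: with one-hot targets, cross-entropy uses only the output for the true class, while squared error uses all three outputs. The sketch below recomputes the MSE figures and then compares two made-up output vectors. `even_split` and `uneven_split` are illustrative values, not taken from the referenced article, and the function names are assumptions for this example.

```python
import math

# Outputs and one-hot targets copied from the two tables above.
TARGETS = [(0, 0, 1), (0, 1, 0), (1, 0, 0)]
NET_1 = [(0.3, 0.3, 0.4), (0.3, 0.4, 0.3), (0.1, 0.2, 0.7)]
NET_2 = [(0.1, 0.2, 0.7), (0.1, 0.7, 0.2), (0.3, 0.4, 0.3)]


def squared_error(y, t):
    """Sum of squared differences between one output row and its target."""
    return sum((y_k - t_k) ** 2 for y_k, t_k in zip(y, t))


def mean_squared_error(outputs, targets):
    """Average of the per-row squared errors."""
    return sum(squared_error(y, t) for y, t in zip(outputs, targets)) / len(outputs)


def cross_entropy(y, t):
    """Cross-entropy of one output row against a one-hot target."""
    return -sum(t_k * math.log(y_k) for y_k, t_k in zip(y, t))


print(round(mean_squared_error(NET_1, TARGETS), 2))  # 0.81
print(round(mean_squared_error(NET_2, TARGETS), 2))  # 0.34

# Hypothetical outputs: both give the true class (index 2) a probability
# of 0.4 but split the remaining 0.6 differently.
target = (0, 0, 1)
even_split = (0.3, 0.3, 0.4)
uneven_split = (0.5, 0.1, 0.4)

# Cross-entropy is the same for both, since only the 0.4 enters.
print(round(cross_entropy(even_split, target), 2),
      round(cross_entropy(uneven_split, target), 2))  # 0.92 0.92

# Squared error differs, since the wrong-class outputs enter too.
print(round(squared_error(even_split, target), 2),
      round(squared_error(uneven_split, target), 2))  # 0.54 0.62
```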

Ref

why-you-should-use-cross-entropy-error-instead-of-classification-error-or-mean-squared-error-for-neural-network-classifier-training

posted @ 2015-09-01 15:59 nrail
