The popular belief is that this effectiveness stems from controlling the change of the layers’ input distributions during training to reduce the so-called “internal covariate shift”. In this work, we demonstrate that such distributional stability of layer inputs has little to do with the success of BatchNorm. Instead, we uncover a more fundamental impact of BatchNorm on the training process: it makes the optimization landscape significantly smoother. This smoothness induces a more predictive and stable behavior of the gradients, allowing for faster training.
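To make the "predictive gradients" claim concrete, here is a minimal sketch (not the paper's experimental setup) of one way to probe it: at a fixed point in parameter space, compute the gradient, move the parameters along it by a few step sizes, and check how closely the observed loss decrease tracks what the gradient predicted. The small MLP, synthetic data, and step sizes below are illustrative assumptions, with BatchNorm toggled on and off for comparison.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def make_mlp(use_bn: bool) -> nn.Sequential:
    """Small MLP, optionally with BatchNorm after each hidden linear layer."""
    layers = []
    dims = [32, 64, 64, 10]  # illustrative layer sizes
    for i in range(len(dims) - 1):
        layers.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:
            if use_bn:
                layers.append(nn.BatchNorm1d(dims[i + 1]))
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)


def loss_along_gradient(model, x, y, step_sizes):
    """Loss at the current point and after stepping by -eta * grad for each eta."""
    criterion = nn.CrossEntropyLoss()
    params = list(model.parameters())
    loss = criterion(model(x), y)
    grads = torch.autograd.grad(loss, params)
    losses = []
    with torch.no_grad():
        for eta in step_sizes:
            # Temporarily shift parameters along the negative gradient ...
            for p, g in zip(params, grads):
                p -= eta * g
            losses.append(criterion(model(x), y).item())
            # ... then restore them.
            for p, g in zip(params, grads):
                p += eta * g
    return loss.item(), losses


x = torch.randn(256, 32)
y = torch.randint(0, 10, (256,))

for use_bn in (False, True):
    model = make_mlp(use_bn)
    base, ahead = loss_along_gradient(model, x, y, step_sizes=[0.05, 0.1, 0.5])
    print(f"BatchNorm={use_bn}: loss {base:.3f} -> "
          + ", ".join(f"{l:.3f}" for l in ahead))
```

If the landscape is smoother around the current point, the loss measured after a gradient step should change gradually and consistently with the step size, rather than jumping erratically; the idea is to compare that behavior between the two models.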