Paper Study 4: Layer Normalization

Abstract

2025-03-09 09:54:04, Sunday

To overcome the limitation that batch normalization is not easily applicable to recurrent neural networks, the authors proposed layer normalization.

Introduction

Layer normalization directly estimates the normalization statistics from the summed inputs to the neurons within a hidden layer, so the normalization does not depend on the mini-batch.
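As a minimal sketch (NumPy; the names gain, bias, and eps are my own, not from the paper), the statistics are computed over the hidden units of a single example:

```python
import numpy as np

def layer_norm(a, gain, bias, eps=1e-5):
    """Normalize the summed inputs a (shape: hidden_size,) of one example."""
    mu = a.mean()      # mean over the hidden units, not over the batch
    sigma = a.std()    # standard deviation over the hidden units
    return gain * (a - mu) / (sigma + eps) + bias
```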

Layer normalization

When applying batch normalization to an RNN, one needs to compute and store separate statistics for each time step in a sequence. Layer normalization does not have this problem.

A layer normalized RNN is invariant to re-scaling of all of the summed inputs to a layer, which leads to more stable hidden-to-hidden dynamics.
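A sketch of one layer normalized recurrent step (my own minimal version, assuming a vanilla tanh RNN; W_xh, W_hh, gain, and bias are illustrative names). The statistics are recomputed from the current summed inputs at every step, so nothing has to be stored per time step:

```python
import numpy as np

def ln_rnn_step(x_t, h_prev, W_xh, W_hh, gain, bias, eps=1e-5):
    a_t = W_hh @ h_prev + W_xh @ x_t     # summed inputs at this time step
    mu, sigma = a_t.mean(), a_t.std()    # statistics over hidden units only
    return np.tanh(gain * (a_t - mu) / (sigma + eps) + bias)
```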

Analysis

Batch and weight normalization are invariant to re-scaling of the incoming weights of a single neuron, whereas layer normalization is invariant to re-scaling of the entire weight matrix.
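A quick numerical check of the weight-matrix invariance (my own sketch, not code from the paper): scaling the whole matrix scales both the mean and the standard deviation of the summed inputs by the same factor, so the normalized output is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
x = rng.normal(size=4)

def ln(a, eps=1e-5):
    return (a - a.mean()) / (a.std() + eps)

# Re-scale the entire weight matrix by a constant factor.
print(np.allclose(ln(W @ x), ln((3.0 * W) @ x), atol=1e-4))  # True
```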

The authors showed that learning the gain parameters of the batch normalized and layer normalized models depends only on the magnitude of the prediction error.

Experimental results

The authors used LN together with CNMeM and obtained better performance in both speed and final results.

They also added LN to DRAW, which had achieved state-of-the-art performance in generative modeling of MNIST, and obtained better results.

Conclusion

They find that RNNs benefit the most from LN.
