Caffe Batch Normalization Derivation 2

Caffe Batch Normalization Derivation

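If you read Caffe's BN layer implementation carefully, you will find that it does not quite match the paper.

There are no $\gamma$ and $\beta$ parameters. (If they are needed, a Scale layer can be added after the BN layer.) Here I derive the same diff formula that appears in the comment of Caffe's backward pass.
Let us first define the problem. Let $L$ be the network's loss function. We are given $\frac{\partial L}{\partial y_i}$, where $y_i = \frac{x_i - \bar{x}}{\sqrt{\sigma^2 + \epsilon}}$, $\bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i$, and $\sigma^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \bar{x})^2$, and we want to find $\frac{\partial L}{\partial x_i}$.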
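As a concrete reference for the definitions above, here is a minimal sketch of the forward computation for a single channel, treating the batch as a plain list of $m$ scalars. The function name `bn_forward` and the default `eps=1e-5` are assumptions made for illustration; this is not Caffe's code.

```python
import math

def bn_forward(x, eps=1e-5):
    """Normalize m scalars with the batch mean and the biased batch variance."""
    m = len(x)
    mean = sum(x) / m                            # x-bar
    var = sum((xi - mean) ** 2 for xi in x) / m  # sigma^2, divided by m (biased)
    inv_std = 1.0 / math.sqrt(var + eps)         # (sigma^2 + eps)^(-1/2)
    y = [(xi - mean) * inv_std for xi in x]
    return y, mean, var

y, mean, var = bn_forward([1.0, 2.0, 3.0, 4.0])
print("mean:", mean, "var:", var)
print("y:", y)
```

Note that the variance divides by $m$, not $m-1$, which matches the definition of $\sigma^2$ used in this derivation.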
The derivation is essentially a repeated application of the chain rule. The steps are as follows:

$$\frac{\partial L}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial L}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i}$$

$$\frac{\partial \bar{x}}{\partial x_i} = \frac{1}{m}$$

$$\frac{\partial \sigma^2}{\partial x_i} = \frac{1}{m}\left(\sum_{j=1}^{m}\left(2(x_j - \bar{x}) \cdot \left(-\frac{1}{m}\right)\right) + 2(x_i - \bar{x})\right)$$

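Since the deviations from the mean sum to zero, $\sum_{j=1}^{m}\left(2(x_j - \bar{x}) \cdot \left(-\frac{1}{m}\right)\right) = 0$, and therefore $\frac{\partial \sigma^2}{\partial x_i} = \frac{2}{m}(x_i - \bar{x})$.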
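The variance derivative can be checked numerically. The sketch below compares the closed form $\frac{2}{m}(x_i - \bar{x})$ against a central finite difference; the helper names and the sample values are made up for illustration.

```python
def variance(x):
    """Biased variance: mean of squared deviations from the batch mean."""
    m = len(x)
    mean = sum(x) / m
    return sum((xj - mean) ** 2 for xj in x) / m

def dvar_analytic(x, i):
    """Closed form from the derivation: (2/m) * (x_i - mean)."""
    m = len(x)
    mean = sum(x) / m
    return 2.0 / m * (x[i] - mean)

def dvar_numeric(x, i, h=1e-6):
    """Central finite difference of the variance with respect to x_i."""
    xp = list(x)
    xp[i] += h
    xm = list(x)
    xm[i] -= h
    return (variance(xp) - variance(xm)) / (2 * h)

x = [0.5, -1.2, 3.0, 0.7]
for i in range(len(x)):
    print(i, dvar_analytic(x, i), dvar_numeric(x, i))
```

The two columns should agree up to floating-point rounding, because the variance is quadratic in each $x_i$ and a central difference has no truncation error on a quadratic.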

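$$\frac{\partial y_j}{\partial x_i} = -\frac{1}{2}(\sigma^2 + \epsilon)^{-\frac{3}{2}}(x_j - \bar{x}) \cdot \frac{2}{m}(x_i - \bar{x}) - \frac{1}{m}(\sigma^2 + \epsilon)^{-\frac{1}{2}} \qquad (j \neq i)$$

$$\frac{\partial y_j}{\partial x_i} = -\frac{1}{2}(\sigma^2 + \epsilon)^{-\frac{3}{2}}(x_j - \bar{x}) \cdot \frac{2}{m}(x_i - \bar{x}) + \left(1 - \frac{1}{m}\right)(\sigma^2 + \epsilon)^{-\frac{1}{2}} \qquad (j = i)$$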
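To sanity-check the two cases of $\frac{\partial y_j}{\partial x_i}$, the following sketch evaluates them in a single expression (using an indicator that is 1 when $j = i$ and 0 otherwise) and compares every entry of the Jacobian with a finite difference. The function names, sample data, and `eps` value are illustrative assumptions.

```python
import math

def bn_forward(x, eps=1e-5):
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def dy_dx_analytic(x, j, i, eps=1e-5):
    """d y_j / d x_i from the chain rule; covers both j == i and j != i."""
    m = len(x)
    mean = sum(x) / m
    s = sum((v - mean) ** 2 for v in x) / m + eps   # sigma^2 + eps
    # Term that comes from the dependence of sigma^2 on x_i.
    via_var = -0.5 * s ** -1.5 * (x[j] - mean) * (2.0 / m) * (x[i] - mean)
    # Term from the numerator x_j - mean: (1 - 1/m) if j == i, else -1/m.
    delta = 1.0 if j == i else 0.0
    return via_var + (delta - 1.0 / m) * s ** -0.5

def dy_dx_numeric(x, j, i, eps=1e-5, h=1e-6):
    xp = list(x)
    xp[i] += h
    xm = list(x)
    xm[i] -= h
    return (bn_forward(xp, eps)[j] - bn_forward(xm, eps)[j]) / (2 * h)

x = [0.5, -1.2, 3.0, 0.7]
m = len(x)
max_err = max(
    abs(dy_dx_analytic(x, j, i) - dy_dx_numeric(x, j, i))
    for j in range(m) for i in range(m)
)
print("max abs error over the Jacobian:", max_err)
```

Each row of this Jacobian sums to zero over $i$, which reflects the fact that adding the same constant to every $x_i$ leaves $y$ unchanged.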

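Next, we substitute $\frac{\partial y_j}{\partial x_i}$ into $\frac{\partial L}{\partial x_i} = \sum_{j=1}^{m} \frac{\partial L}{\partial y_j} \cdot \frac{\partial y_j}{\partial x_i}$: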

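$$\frac{\partial L}{\partial x_i} = \sum_{\substack{j=1 \\ j \neq i}}^{m} \frac{\partial L}{\partial y_j} \cdot \left(-\frac{1}{2}(\sigma^2 + \epsilon)^{-\frac{3}{2}}(x_j - \bar{x}) \cdot \frac{2}{m}(x_i - \bar{x}) - \frac{1}{m}(\sigma^2 + \epsilon)^{-\frac{1}{2}}\right) + \frac{\partial L}{\partial y_i} \cdot \left(-\frac{1}{2}(\sigma^2 + \epsilon)^{-\frac{3}{2}}(x_i - \bar{x}) \cdot \frac{2}{m}(x_i - \bar{x}) + \left(1 - \frac{1}{m}\right)(\sigma^2 + \epsilon)^{-\frac{1}{2}}\right)$$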

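Notice that the $j = i$ case differs from the $j \neq i$ case only by an extra $(\sigma^2 + \epsilon)^{-\frac{1}{2}}$ term. We can therefore merge the two cases into the following expression: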

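$$\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial y_i} \cdot (\sigma^2 + \epsilon)^{-\frac{1}{2}} + \sum_{j=1}^{m} \frac{\partial L}{\partial y_j} \cdot \left(-\frac{1}{m}(\sigma^2 + \epsilon)^{-\frac{3}{2}}(x_j - \bar{x})(x_i - \bar{x}) - \frac{1}{m}(\sigma^2 + \epsilon)^{-\frac{1}{2}}\right)$$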

Then we factor out the common term $(\sigma^2 + \epsilon)^{-\frac{1}{2}}$, which gives

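$$\frac{\partial L}{\partial x_i} = (\sigma^2 + \epsilon)^{-\frac{1}{2}}\left(\frac{\partial L}{\partial y_i} - \frac{1}{m}\sum_{j=1}^{m}\frac{\partial L}{\partial y_j} - \frac{1}{m}\sum_{j=1}^{m}\frac{\partial L}{\partial y_j} \cdot \frac{x_j - \bar{x}}{\sqrt{\sigma^2 + \epsilon}} \cdot \frac{x_i - \bar{x}}{\sqrt{\sigma^2 + \epsilon}}\right)$$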

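I deliberately split the $(\sigma^2 + \epsilon)$ factor into two square roots, because $y_i = \frac{x_i - \bar{x}}{\sqrt{\sigma^2 + \epsilon}}$ can then be substituted into the expression, which becomes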

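$$\frac{\partial L}{\partial x_i} = (\sigma^2 + \epsilon)^{-\frac{1}{2}}\left(\frac{\partial L}{\partial y_i} - \frac{1}{m}\sum_{j=1}^{m}\frac{\partial L}{\partial y_j} - \frac{1}{m}\sum_{j=1}^{m}\frac{\partial L}{\partial y_j} \cdot y_j \cdot y_i\right)$$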
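The final formula can be verified with a gradient check. The sketch below implements it for one channel and compares the result with finite differences of a toy loss $L = \sum_j w_j y_j$, so that $\frac{\partial L}{\partial y_j} = w_j$. The loss, the weights `w`, and all function names are assumptions chosen only to make the check possible; they do not come from Caffe.

```python
import math

def bn_forward(x, eps=1e-5):
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * inv_std for v in x], inv_std

def bn_backward(x, dy, eps=1e-5):
    """dL/dx_i = inv_std * (dy_i - mean(dy) - mean(dy * y) * y_i)."""
    y, inv_std = bn_forward(x, eps)
    m = len(x)
    mean_dy = sum(dy) / m
    mean_dy_y = sum(d * v for d, v in zip(dy, y)) / m
    return [inv_std * (d - mean_dy - mean_dy_y * v) for d, v in zip(dy, y)]

def loss(x, w, eps=1e-5):
    """Toy loss: weighted sum of the normalized outputs."""
    y, _ = bn_forward(x, eps)
    return sum(wj * yj for wj, yj in zip(w, y))

def numeric_grad(x, w, h=1e-6):
    grad = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        xm = list(x)
        xm[i] -= h
        grad.append((loss(xp, w) - loss(xm, w)) / (2 * h))
    return grad

x = [0.5, -1.2, 3.0, 0.7, 2.1]
w = [0.3, -0.8, 1.5, 0.1, -0.4]   # plays the role of dL/dy
analytic = bn_backward(x, w)
numeric = numeric_grad(x, w)
max_err = max(abs(a - n) for a, n in zip(analytic, numeric))
print("max abs error:", max_err)
```

A useful side property is that the entries of the resulting gradient sum to zero, since the normalized output does not change when a constant is added to every input.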

At this point we have arrived at the same expression as the one at line 187 of Caffe's batch_norm_layer.cpp (except that Caffe writes it in matrix form):

// if Y = (X-mean(X))/(sqrt(var(X)+eps)), then
//
// dE(Y)/dX =
//   (dE/dY - mean(dE/dY) - mean(dE/dY \cdot Y) \cdot Y)
//     ./ sqrt(var(X) + eps)
//
// where \cdot and ./ are hadamard product and elementwise division,
// respectively, dE/dY is the top diff, and mean/var/sum are all computed
// along all dimensions except the channels dimension.  In the above
// equation, the operations allow for expansion (i.e. broadcast) along all
// dimensions except the channels dimension where required.
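The comment says the statistics are computed along every dimension except the channel dimension. A rough sketch of what that means, assuming a simplified input of shape N x C stored as a list of rows (Caffe's blobs are typically N x C x H x W, which is not modeled here), is to apply the one-dimensional formula to each channel column independently. The function names are hypothetical.

```python
import math

def bn_backward_1d(x, dy, eps=1e-5):
    """Per-channel formula: (dy - mean(dy) - mean(dy * y) * y) / sqrt(var + eps)."""
    m = len(x)
    mean = sum(x) / m
    var = sum((v - mean) ** 2 for v in x) / m
    inv_std = 1.0 / math.sqrt(var + eps)
    y = [(v - mean) * inv_std for v in x]
    mean_dy = sum(dy) / m
    mean_dy_y = sum(d * v for d, v in zip(dy, y)) / m
    return [inv_std * (d - mean_dy - mean_dy_y * v) for d, v in zip(dy, y)]

def bn_backward_nc(x, dy, eps=1e-5):
    """x and dy are lists of N rows with C values each; channels are independent."""
    n, c = len(x), len(x[0])
    dx = [[0.0] * c for _ in range(n)]
    for ch in range(c):
        col_x = [row[ch] for row in x]     # all samples of this channel
        col_dy = [row[ch] for row in dy]
        col_dx = bn_backward_1d(col_x, col_dy, eps)
        for r in range(n):
            dx[r][ch] = col_dx[r]
    return dx

x = [[1.0, 10.0], [2.0, 30.0], [4.0, 20.0]]
dy = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.6]]
dx = bn_backward_nc(x, dy)
for row in dx:
    print(row)
```

Looping over channels is chosen here for readability; the matrix form in the comment expresses the same computation with broadcasting instead of an explicit loop.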

Posted 2018-03-19 11:20 by 菜鸡一枚
