# Batch Normalization in Neural Networks

#### What this post covers

• How batch normalization reduces internal covariate shift and improves the training of neural networks.
• How to implement a batch normalization layer in PyTorch.
• Some simple experiments showing the advantages of using batch normalization.

### Reducing Internal Covariate Shift

• Instead of jointly whitening the features at a layer's inputs and outputs, we normalize each scalar feature independently (giving it zero mean and unit variance).
• Rather than computing statistics over the entire dataset, we use mini-batches: each mini-batch yields an estimate of the mean and variance of each activation.
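To make the per-feature statistics concrete, here is a tiny plain-Python sketch with a made-up mini-batch of four scalar values (the numbers are hypothetical, chosen only to keep the arithmetic visible):

```python
# A toy mini-batch of 4 examples, each with a single scalar feature.
batch = [2.0, 4.0, 6.0, 8.0]

# Mini-batch estimates of the mean and (biased) variance.
mean = sum(batch) / len(batch)                               # 5.0
variance = sum((x - mean) ** 2 for x in batch) / len(batch)  # 5.0

# Normalize each value to zero mean and unit variance.
normalized = [(x - mean) / variance ** 0.5 for x in batch]

print(normalized)
```

After normalization the four values average to zero and have unit variance, which is exactly what the batch normalization transform guarantees per feature (before the learnable scale and shift are applied).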

### The Batch Normalization Algorithm

#### At training time

##### Fully connected layers

```python
# Per-feature statistics over the batch dimension.
mean = torch.mean(X, axis=0)
variance = torch.mean((X - mean) ** 2, axis=0)
# Normalize, then scale and shift with the learnable gamma and beta.
X_hat = (X - mean) / torch.sqrt(variance + eps)
out = gamma * X_hat + beta
```
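A quick sanity check of the snippet above on random data (the shapes and the identity gamma/beta here are arbitrary choices for the demo):

```python
import torch

# Hypothetical mini-batch: 32 examples, 8 features; gamma/beta as identity.
X = torch.randn(32, 8)
gamma, beta, eps = torch.ones(8), torch.zeros(8), 1e-5

mean = torch.mean(X, axis=0)
variance = torch.mean((X - mean) ** 2, axis=0)
X_hat = (X - mean) / torch.sqrt(variance + eps)
out = gamma * X_hat + beta

# Each output feature should now have roughly zero mean and unit variance.
print(out.mean(axis=0).abs().max())  # close to 0
```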

##### Convolutional layers

```python
# Per-channel statistics over the batch and spatial dimensions.
N, C, H, W = X.shape
mean = torch.mean(X, axis=(0, 2, 3))
variance = torch.mean((X - mean.reshape((1, C, 1, 1))) ** 2, axis=(0, 2, 3))
# Reshape to (1, C, 1, 1) so the statistics broadcast over N, H and W.
X_hat = (X - mean.reshape((1, C, 1, 1))) / torch.sqrt(variance.reshape((1, C, 1, 1)) + eps)
out = gamma.reshape((1, C, 1, 1)) * X_hat + beta.reshape((1, C, 1, 1))
```
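The same check for the convolutional case, on a hypothetical 4D activation tensor: each channel's statistics are computed over all N * H * W values of that channel.

```python
import torch

# Hypothetical activation map: batch of 16, 3 channels, 8x8 spatial.
X = torch.randn(16, 3, 8, 8)
gamma, beta, eps = torch.ones(3), torch.zeros(3), 1e-5

N, C, H, W = X.shape
mean = torch.mean(X, axis=(0, 2, 3))
variance = torch.mean((X - mean.reshape((1, C, 1, 1))) ** 2, axis=(0, 2, 3))
X_hat = (X - mean.reshape((1, C, 1, 1))) / torch.sqrt(variance.reshape((1, C, 1, 1)) + eps)
out = gamma.reshape((1, C, 1, 1)) * X_hat + beta.reshape((1, C, 1, 1))

# Each channel is normalized over its 16 * 8 * 8 values.
print(out.mean(axis=(0, 2, 3)))  # close to 0
```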


### The Complete Module

```python
import torch
import torch.nn as nn
from torch.distributions import uniform


class CustomBatchNorm(nn.Module):

    def __init__(self, in_size, momentum=0.9, eps=1e-5):
        super(CustomBatchNorm, self).__init__()

        self.momentum = momentum
        self.insize = in_size
        self.eps = eps

        # Learnable scale (gamma) and shift (beta); gamma is drawn
        # uniformly from [0, 1) here.
        U = uniform.Uniform(torch.tensor([0.0]), torch.tensor([1.0]))
        self.gamma = nn.Parameter(U.sample(torch.Size([self.insize])).view(self.insize))
        self.beta = nn.Parameter(torch.zeros(self.insize))

        # Running statistics used at inference time; registered as
        # buffers so they are saved with the model but not trained.
        self.register_buffer('running_mean', torch.zeros(self.insize))
        self.register_buffer('running_var', torch.ones(self.insize))

    def forward(self, input):
        X = input

        if len(X.shape) not in (2, 4):
            raise ValueError("only supports dense (2D) or 2D-conv (4D) inputs")

        # Fully connected layer: normalize each feature over the batch.
        if len(X.shape) == 2:
            if self.training:
                mean = torch.mean(X, axis=0)
                variance = torch.mean((X - mean) ** 2, axis=0)

                # Update running statistics; Bessel's correction makes
                # the stored variance unbiased.
                n = X.shape[0]
                self.running_mean = self.momentum * self.running_mean + (1.0 - self.momentum) * mean
                self.running_var = self.momentum * self.running_var + (1.0 - self.momentum) * (n / (n - 1) * variance)
            else:
                mean = self.running_mean
                variance = self.running_var

            X_hat = (X - mean) / torch.sqrt(variance + self.eps)
            out = self.gamma * X_hat + self.beta

        # Convolutional layer: normalize each channel over the batch
        # and spatial dimensions.
        else:
            N, C, H, W = X.shape
            if self.training:
                mean = torch.mean(X, axis=(0, 2, 3))
                variance = torch.mean((X - mean.reshape((1, C, 1, 1))) ** 2, axis=(0, 2, 3))

                # Each channel statistic is computed over N * H * W values.
                n = N * H * W
                self.running_mean = self.momentum * self.running_mean + (1.0 - self.momentum) * mean
                self.running_var = self.momentum * self.running_var + (1.0 - self.momentum) * (n / (n - 1) * variance)
            else:
                mean = self.running_mean
                variance = self.running_var

            X_hat = (X - mean.reshape((1, C, 1, 1))) / torch.sqrt(variance.reshape((1, C, 1, 1)) + self.eps)
            out = self.gamma.reshape((1, C, 1, 1)) * X_hat + self.beta.reshape((1, C, 1, 1))

        return out
```
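The running statistics above follow a simple exponential moving average. A plain-Python sketch of that update rule (using the module's default momentum of 0.9 and a hypothetical sequence of batch means) shows how the buffer drifts toward the true statistic over successive batches:

```python
momentum = 0.9
running_mean = 0.0  # the buffer starts at zero, as in the module

# Hypothetical per-batch means observed during training; the true
# population mean in this toy example is 2.0.
batch_means = [2.0, 2.0, 2.0, 2.0]

for m in batch_means:
    running_mean = momentum * running_mean + (1.0 - momentum) * m

# After t batches the value is (1 - momentum**t) * 2.0, so it
# converges to 2.0 as more batches arrive.
print(running_mean)
```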


### An Experiment on MNIST

```python
class SimpleNet(nn.Module):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(28 * 28, 64),
            nn.ReLU(),
            nn.Linear(64, 128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x


class SimpleNetBN(nn.Module):
    def __init__(self):
        super(SimpleNetBN, self).__init__()
        self.classifier = nn.Sequential(
            nn.Linear(28 * 28, 64),
            CustomBatchNorm(64),
            nn.ReLU(),
            nn.Linear(64, 128),
            CustomBatchNorm(128),
            nn.ReLU(),
            nn.Linear(128, 10)
        )

    def forward(self, x):
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
```
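Before training on real MNIST data, a forward-pass smoke test is useful. The sketch below builds the same layer stack as SimpleNet's classifier inline and feeds it random tensors in place of MNIST images (batch size and input values are arbitrary):

```python
import torch
import torch.nn as nn

# Same stack as SimpleNet's classifier, built inline for a quick check.
model = nn.Sequential(
    nn.Linear(28 * 28, 64),
    nn.ReLU(),
    nn.Linear(64, 128),
    nn.ReLU(),
    nn.Linear(128, 10),
)

# A fake batch of 5 "images" (random noise) flattened like the
# forward() method does, standing in for real MNIST data.
x = torch.randn(5, 1, 28, 28).view(5, -1)
logits = model(x)
print(logits.shape)  # torch.Size([5, 10]) -- one logit per digit class
```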


### Conclusion

#### Advantages of training with batch normalization

• The loss gradient computed on a mini-batch is an estimate of the gradient over the whole training set, and the quality of that estimate improves as the batch size grows.
• Thanks to the parallelism provided by GPUs, computing over a batch is far more efficient than many separate computations on individual examples.
• Applying batch normalization at every layer to reduce internal covariate shift greatly improves the learning efficiency of the network.


posted @ 2020-06-04 12:57  人工智能遇见磐创