L1Penalty
To encourage sparsity (i.e. to drive a certain fraction of the hidden-layer activations to zero), an L1 penalty is added. A PyTorch implementation is as follows:
import copy
import torch
import torch.nn as nn

class L1Penalty(torch.autograd.Function):
    """
    In the forward pass we receive a Tensor containing the input and return
    a Tensor containing the output. ctx is a context object that can be used
    to stash information for backward computation. You can cache arbitrary
    objects for use in the backward pass using the ctx.save_for_backward method.
    """
    @staticmethod
    def forward(ctx, input, l1weight):
        ctx.save_for_backward(input)
        ctx.l1weight = l1weight
        return input

    """
    In the backward pass we receive a Tensor containing the gradient of the loss
    with respect to the output, and we need to compute the gradient of the loss
    with respect to the input.
    """
    @staticmethod
    def backward(ctx, grad_output):
        # The L1 penalty l1weight * |h| contributes l1weight * sign(h) to the
        # gradient; add it to the gradient coming from the reconstruction loss.
        input, = ctx.saved_tensors
        grad_input = input.clone().sign().mul(ctx.l1weight)
        grad_input += grad_output
        return grad_input, None
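A quick sanity check of this Function (not from the original post; the numbers are illustrative): the forward pass is the identity, and the backward pass adds l1weight * sign(input) to the incoming gradient.

    # Illustrative check: forward is identity, backward adds l1weight * sign(h).
    h = torch.tensor([0.5, -2.0, 3.0], requires_grad=True)
    out = L1Penalty.apply(h, 0.1)
    out.sum().backward()   # upstream gradient is all ones
    print(h.grad)          # tensor([1.1000, 0.9000, 1.1000]) = 1 + 0.1 * sign(h)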
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(28*28, 400),
                                     nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(400, 28*28),
                                     nn.Sigmoid())

    def forward(self, x):
        x = self.encoder(x)
        x = L1Penalty.apply(x, 0.1)  # l1weight = 0.1: strength of the L1 sparsity penalty on the hidden code
        x = self.decoder(x)
        return x

net = Autoencoder()
print(net)

GPU = torch.cuda.is_available()  # move the model to the GPU when one is available
if GPU:
    net = net.cuda()

# Keep a copy of the initial encoder/decoder weights for later comparison
init_weightsE = copy.deepcopy(net.encoder[0].weight.data)
init_weightsD = copy.deepcopy(net.decoder[0].weight.data)
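For completeness, a minimal training sketch, assuming a DataLoader `train_loader` that yields batches of flattened 28*28 MNIST images; the optimizer, learning rate and epoch count below are illustrative and not taken from the original code:

    optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
    criterion = nn.MSELoss()

    for epoch in range(10):
        for images, _ in train_loader:           # train_loader is assumed to exist
            images = images.view(images.size(0), -1)
            if GPU:
                images = images.cuda()
            recon = net(images)                  # forward pass applies L1Penalty to the hidden code
            loss = criterion(recon, images)      # reconstruction loss; the L1 term only acts in backward()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()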
Sparse AutoEncoder (Andrew Ng)
In an autoencoder we usually want the hidden layer to have as few dimensions as possible, so that it learns a "compressed" representation of the data. In practice, a hidden layer learned this way often behaves much like PCA.
By adding extra constraints, interesting structure can still be discovered even when the hidden layer is large. One such constraint is sparsity: each hidden neuron should be inactive most of the time (output close to 0 for a sigmoid, or close to -1 for a tanh).
\[\hat{\rho}_j=\frac{1}{m}\sum_{i=1}^m\left[a_j^{(2)}(x^{(i)})\right]\]
Here \(\hat{\rho}_j\) is the average activation of hidden unit \(j\), and \(a_j^{(2)}(x^{(i)})\) denotes the activation of hidden unit \(j\) for input \(x^{(i)}\). We would like \(\hat{\rho}_j=\rho\), where \(\rho\) is a sparsity parameter, typically a small value close to 0 (e.g. \(\rho=0.05\)), meaning each hidden unit should be inactive about 95% of the time. To satisfy this constraint the hidden-unit activations must stay close to 0, which is enforced by adding a penalty term that punishes \(\hat{\rho}_j\) for deviating significantly from \(\rho\):
\[\sum_{j=1}^{s_2}\left[\rho\log\frac{\rho}{\hat{\rho}_j}+(1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right]\]
which can be written as a sum of KL divergences:
\[\sum_{j=1}^{s_2}\text{KL}(\rho\,\|\,\hat{\rho}_j)\]
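As a sketch of how this penalty can be evaluated in practice (not part of the original post; the function name and the assumption of sigmoid activations in (0, 1) are mine), the average activation \(\hat{\rho}_j\) and the KL term can be computed from a batch of hidden activations:

    def kl_sparsity_penalty(hidden, rho=0.05, eps=1e-8):
        # hidden: (batch, s2) activations assumed to lie in (0, 1), e.g. from a sigmoid encoder
        rho_hat = hidden.mean(dim=0).clamp(eps, 1 - eps)  # average activation of each hidden unit
        kl = rho * torch.log(rho / rho_hat) + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))
        return kl.sum()  # sum over the s2 hidden units

Such a term would be scaled by a weight and added to the reconstruction loss, which requires the model to also return its hidden code.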
When \(\rho=0.2\), the plot of \(\text{KL}(\rho\,\|\,\hat{\rho}_j)\) as a function of \(\hat{\rho}_j\) (figure omitted here) reaches 0 at \(\hat{\rho}_j=\rho\) and grows steeply as \(\hat{\rho}_j\) approaches 0 or 1.
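A small sketch to reproduce that plot (numpy and matplotlib usage assumed; not from the original post):

    import numpy as np
    import matplotlib.pyplot as plt

    rho = 0.2
    rho_hat = np.linspace(0.001, 0.999, 500)
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

    plt.plot(rho_hat, kl)
    plt.xlabel(r'$\hat{\rho}_j$')
    plt.ylabel(r'$\mathrm{KL}(\rho\,\|\,\hat{\rho}_j)$')
    plt.title(r'KL sparsity penalty, $\rho = 0.2$')
    plt.show()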