L1Penalty

To encourage sparsity (i.e. to drive a certain fraction of the hidden-layer coefficients to zero), an L1 penalty is added. A PyTorch implementation is as follows:

import copy

import torch
import torch.nn as nn


class L1Penalty(torch.autograd.Function):
  
    """
    In the forward pass we receive a Tensor containing the input and return
    a Tensor containing the output. ctx is a context object that can be used
    to stash information for backward computation. You can cache arbitrary
    objects for use in the backward pass using the ctx.save_for_backward method.
    """
    @staticmethod
    def forward(ctx, input, l1weight):
        ctx.save_for_backward(input)
        ctx.l1weight = l1weight
        return input

    """
    In the backward pass we receive a Tensor containing the gradient of the loss
    with respect to the output, and we need to compute the gradient of the loss
    with respect to the input.
    """
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # The gradient of l1weight * |x| w.r.t. x is l1weight * sign(x);
        # add this subgradient to the gradient flowing back from the decoder.
        grad_input = input.clone().sign().mul(ctx.l1weight)
        grad_input += grad_output
        return grad_input, None


class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(28*28, 400),
                                     nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(400, 28*28),
                                     nn.Sigmoid())
  
    def forward(self, x):
        x = self.encoder(x)
        x = L1Penalty.apply(x, 0.1)  # apply the L1 sparsity penalty to the hidden code, with weight 0.1
        x = self.decoder(x)
        return x
    
net = Autoencoder()
print(net)

GPU = torch.cuda.is_available()  # GPU flag (assumed here); move the model to CUDA when available
if GPU:
    net = net.cuda()

# snapshot the initial encoder / decoder weights
init_weightsE = copy.deepcopy(net.encoder[0].weight.data)
init_weightsD = copy.deepcopy(net.decoder[0].weight.data)
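
As a minimal training sketch (not from the original code): assuming an MSE reconstruction loss, the Adam optimizer, and a `train_loader` that yields batches of labelled 28×28 images, training could look like this:

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(10):
    for images, _ in train_loader:
        x = images.view(images.size(0), -1)  # flatten to (batch, 784)
        if GPU:
            x = x.cuda()
        recon = net(x)                       # the L1 penalty is applied inside forward()
        loss = criterion(recon, x)
        optimizer.zero_grad()
        loss.backward()                      # L1Penalty.backward adds the sparsity gradient
        optimizer.step()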

Sparse AutoEncoder

(Andrew Ng)

In an autoencoder we usually want the hidden layer to have as few dimensions as possible, so that the network learns a "compressed" representation of the data. In practice, the hidden representation learned this way often ends up similar to what PCA produces.


By imposing additional constraints, the autoencoder can still discover interesting structure in the data even when the hidden layer is large. Here we introduce a sparsity constraint: each hidden neuron should be inactive most of the time (output close to 0 for a sigmoid activation, or close to -1 for tanh).

\[\hat{\rho}_j=\frac{1}{m}\sum_{i=1}^m[a_j^{(2)}(x^{(i)})] \]

\(\hat{\rho}_j\) is the average activation of hidden unit \(j\), where \(a_j^{(2)}(x^{(i)})\) denotes the activation of hidden unit \(j\) (in layer 2) on input \(x^{(i)}\). We would like to enforce \(\hat{\rho}_j=\rho\), where \(\rho\) is a sparsity parameter, typically a small value close to 0 (e.g. \(\rho=0.05\)), i.e. each hidden unit should be inactive about 95% of the time. To satisfy this constraint the average activation of each hidden unit has to stay close to \(\rho\), which is enforced by adding a penalty term that punishes \(\hat{\rho}_j\) for deviating significantly from \(\rho\):

\[\sum_{j=1}^{s_2}\left[\rho\log\frac{\rho}{\hat{\rho}_j}+(1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right] \]

that is, a sum of KL-divergence terms:

\[\sum_{j=1}^{s_2}\text{KL}(\rho||\hat{\rho}_j) \]
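
As a sketch (not from the original post), both \(\hat{\rho}_j\) and this penalty can be computed in a few lines of PyTorch; the batch of hidden activations `a2` below is a random stand-in:

import torch

rho = 0.05                                  # sparsity parameter
a2 = torch.sigmoid(torch.randn(100, 400))   # stand-in for a batch of hidden activations, shape (m, s_2)
rho_hat = a2.mean(dim=0)                    # \hat{\rho}_j: average activation of each hidden unit
kl_penalty = (rho * torch.log(rho / rho_hat)
              + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
# kl_penalty would be added to the reconstruction loss, usually scaled by a weight beta.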

For \(\rho=0.2\), the KL divergence as a function of \(\hat{\rho}_j\) looks like this:

[Figure: \(\text{KL}(\rho\|\hat{\rho}_j)\) plotted against \(\hat{\rho}_j\) for \(\rho=0.2\)]
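
The curve can be reproduced with a short script (a sketch assuming numpy and matplotlib, neither of which appears in the original code):

import numpy as np
import matplotlib.pyplot as plt

rho = 0.2
rho_hat = np.linspace(0.01, 0.99, 200)
kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

plt.plot(rho_hat, kl)                       # minimum of 0 at rho_hat = rho, rising steeply toward 0 and 1
plt.xlabel(r'$\hat{\rho}_j$')
plt.ylabel(r'$\mathrm{KL}(\rho\,\|\,\hat{\rho}_j)$')
plt.show()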