L1Penalty

To encourage sparsity (i.e. to drive a certain fraction of the hidden-layer coefficients to zero), an L1 penalty is added. A PyTorch implementation is as follows:

import copy

import torch
import torch.nn as nn


class L1Penalty(torch.autograd.Function):
  
    """
    In the forward pass we receive a Tensor containing the input and return
    a Tensor containing the output. ctx is a context object that can be used
    to stash information for backward computation. You can cache arbitrary
    objects for use in the backward pass using the ctx.save_for_backward method.
    """
    @staticmethod
    def forward(ctx, input, l1weight):
        ctx.save_for_backward(input)
        ctx.l1weight = l1weight
        return input

    """
    In the backward pass we receive a Tensor containing the gradient of the loss
    with respect to the output, and we need to compute the gradient of the loss
    with respect to the input.
    """
    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        # The gradient of l1weight * |x| w.r.t. x is l1weight * sign(x);
        # add this subgradient to the gradient flowing back from the decoder.
        grad_input = input.clone().sign().mul(ctx.l1weight)
        grad_input += grad_output
        return grad_input, None


class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = nn.Sequential(nn.Linear(28*28, 400),
                                     nn.Tanh())
        self.decoder = nn.Sequential(nn.Linear(400, 28*28),
                                     nn.Sigmoid())
  
    def forward(self, x):
        x = self.encoder(x)
        x = L1Penalty.apply(x, 0.1)  # apply the L1 sparsity penalty to the hidden code, with weight 0.1
        x = self.decoder(x)
        return x
    
net = Autoencoder()
print(net)

GPU = torch.cuda.is_available()  # GPU flag (assumed here); move the model to CUDA when available
if GPU:
    net = net.cuda()

# snapshot the initial encoder / decoder weights
init_weightsE = copy.deepcopy(net.encoder[0].weight.data)
init_weightsD = copy.deepcopy(net.decoder[0].weight.data)
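
As a minimal training sketch (not from the original code): assuming an MSE reconstruction loss, the Adam optimizer, and a `train_loader` that yields batches of labelled 28×28 images, training could look like this:

criterion = nn.MSELoss()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)

for epoch in range(10):
    for images, _ in train_loader:
        x = images.view(images.size(0), -1)  # flatten to (batch, 784)
        if GPU:
            x = x.cuda()
        recon = net(x)                       # the L1 penalty is applied inside forward()
        loss = criterion(recon, x)
        optimizer.zero_grad()
        loss.backward()                      # L1Penalty.backward adds the sparsity gradient
        optimizer.step()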

Sparse AutoEncoder

(Andrew Ng)

In an autoencoder we usually want the hidden layer to have as few dimensions as possible, so that the network learns a "compressed" representation of the data. In practice, the hidden representation learned this way often ends up similar to what PCA produces.


By imposing additional constraints, the autoencoder can still discover interesting structure in the data even when the hidden layer is large. Here we introduce a sparsity constraint: each hidden neuron should be inactive most of the time (output close to 0 for a sigmoid activation, or close to -1 for tanh).

\[\hat{\rho}_j=\frac{1}{m}\sum_{i=1}^m[a_j^{(2)}(x^{(i)})] \]

\(\hat{\rho}_j\) is the average activation of hidden unit \(j\), where \(a_j^{(2)}(x^{(i)})\) denotes the activation of hidden unit \(j\) (in layer 2) on input \(x^{(i)}\). We would like to enforce \(\hat{\rho}_j=\rho\), where \(\rho\) is a sparsity parameter, typically a small value close to 0 (e.g. \(\rho=0.05\)), i.e. each hidden unit should be inactive about 95% of the time. To satisfy this constraint the average activation of each hidden unit has to stay close to \(\rho\), which is enforced by adding a penalty term that punishes \(\hat{\rho}_j\) for deviating significantly from \(\rho\):

\[\sum_{j=1}^{s_2}\left[\rho\log\frac{\rho}{\hat{\rho}_j}+(1-\rho)\log\frac{1-\rho}{1-\hat{\rho}_j}\right] \]

that is, a sum of KL-divergence terms:

\[\sum_{j=1}^{s_2}\text{KL}(\rho||\hat{\rho}_j) \]
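
As a sketch (not from the original post), both \(\hat{\rho}_j\) and this penalty can be computed in a few lines of PyTorch; the batch of hidden activations `a2` below is a random stand-in:

import torch

rho = 0.05                                  # sparsity parameter
a2 = torch.sigmoid(torch.randn(100, 400))   # stand-in for a batch of hidden activations, shape (m, s_2)
rho_hat = a2.mean(dim=0)                    # \hat{\rho}_j: average activation of each hidden unit
kl_penalty = (rho * torch.log(rho / rho_hat)
              + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()
# kl_penalty would be added to the reconstruction loss, usually scaled by a weight beta.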

For \(\rho=0.2\), the KL divergence as a function of \(\hat{\rho}_j\) looks like this:

[Figure: \(\text{KL}(\rho\|\hat{\rho}_j)\) plotted against \(\hat{\rho}_j\) for \(\rho=0.2\)]
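
The curve can be reproduced with a short script (a sketch assuming numpy and matplotlib, neither of which appears in the original code):

import numpy as np
import matplotlib.pyplot as plt

rho = 0.2
rho_hat = np.linspace(0.01, 0.99, 200)
kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))

plt.plot(rho_hat, kl)                       # minimum of 0 at rho_hat = rho, rising steeply toward 0 and 1
plt.xlabel(r'$\hat{\rho}_j$')
plt.ylabel(r'$\mathrm{KL}(\rho\,\|\,\hat{\rho}_j)$')
plt.show()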