Exercise: Learning color features with Sparse Autoencoders
Reference:
http://deeplearning.stanford.edu/wiki/index.php/Linear_Decoders
Experiment overview: This exercise learns features from color images. The data set is STL-10, from which 100,000 patches of size 8*8 are sampled. Because solving for the parameters is memory-intensive, only 5,000 patches are used in this experiment.
Environment: MATLAB 2010a
Experiment model:
The Linear Decoder model is similar to the three-layer autoencoder; the only difference is that the output layer uses a linear activation function, i.e. f(z) = z.
Theoretical background:
Sparse Autoencoder Recap: A sparse autoencoder is a three-layer neural network: an input layer, a hidden layer, and an output layer. The hidden layer and the output layer use the same activation function (sigmoid).

The output-layer activation a(3) is the autoencoder's reconstruction of the input x. Because the model uses a sigmoid activation, its outputs lie in [0, 1], so the inputs must also be scaled to [0, 1]. This scaling works well for a data set such as MNIST, but it is not suitable for every application. For example, if we apply PCA whitening to the input samples, the inputs no longer lie in [0, 1], and the sigmoid-output autoencoder cannot handle them.
Linear Decoder: A simple way to solve this problem is to set a(3) = z(3) at the output layer, i.e. to use the identity function f(z) = z as the output activation. This is called a linear activation function. The output a(3) can then take any value, without the [0, 1] restriction, and the resulting model is a straightforward extension of the sparse autoencoder.
The model therefore still consists of three layers: input, hidden, and output. The hidden layer uses the sigmoid activation and the output layer uses the linear activation. Because the output activation has changed, the gradient computation changes accordingly: since f'(z) = 1 for the identity function, the output error term loses its derivative factor, as sketched below.
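The change shows up only in the output-layer forward pass and in the output error term d3. A minimal sketch (illustrative only, not the full cost function; W1, W2, b1, b2 and the data matrix follow the notation of the code below):

m = size(data, 2);                       % number of training examples
z2 = W1*data + repmat(b1, 1, m);
a2 = 1 ./ (1 + exp(-z2));                % hidden layer: sigmoid
z3 = W2*a2 + repmat(b2, 1, m);
a3 = z3;                                 % output layer: linear, f(z) = z

% Output error term:
% sigmoid output layer: d3 = -(data - a3) .* a3 .* (1 - a3);
% linear  output layer: f'(z3) = 1, so the derivative factor disappears:
d3 = -(data - a3);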

Experiment steps:
0. Previous exercises used grayscale images; here we use color images with three RGB channels. The input is formed by concatenating the values of the three channels, so each input has size 8*8*3 = 192.
1. First, run the Linear Decoder model directly on the patches to learn features.
2. Then apply ZCA whitening to the inputs and feed the whitened patches to the Linear Decoder to learn features (see the sketch after this list).
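A minimal sketch of the preprocessing for steps 0 and 2, assuming the sampled color patches are already stored as columns of a 192 x numPatches matrix named patches (the three RGB channels concatenated); the regularization constant epsilon = 0.1 follows the value suggested by the exercise:

% Remove the mean patch so the data are zero-mean before whitening.
meanPatch = mean(patches, 2);
patches = bsxfun(@minus, patches, meanPatch);

% ZCA whitening: rotate into the PCA basis, rescale each direction by
% 1/sqrt(eigenvalue + epsilon), and rotate back to the original pixel space.
epsilon = 0.1;
sigma = patches * patches' / size(patches, 2);
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
patches = ZCAWhite * patches;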
Experiment results:
Raw patches: (figure)

Whitened patches: (figure)

Visualized weights (learned features): (figure)

As the visualization shows, the learned features are edge-like.
Main experiment code:
sparseAutoencoderLinearCost.m
Note: In the starter code, this function is declared with three return values, [cost, grad, features], but only the first two are used at the call site, so the features output can simply be removed.
function [cost,grad] = sparseAutoencoderLinearCost(theta, visibleSize, hiddenSize, ...
                                                   lambda, sparsityParam, beta, data)
% visibleSize: the number of input units
% hiddenSize: the number of hidden units
% lambda: weight decay parameter
% sparsityParam: the desired average activation for the hidden units (denoted
%                rho in the lecture notes)
% beta: weight of the sparsity penalty term
% data: matrix containing the training data, so data(:,i) is the i-th training example.

% The input theta is a vector (because minFunc expects the parameters to be a vector).
% We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this
% follows the notation convention of the lecture notes.
W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize);
b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize);
b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);

% Cost and gradient variables, initialized to zeros.
cost = 0;
W1grad = zeros(size(W1));
W2grad = zeros(size(W2));
b1grad = zeros(size(b1));
b2grad = zeros(size(b2));

%% ---------- YOUR CODE HERE --------------------------------------
% Instructions: Compute the cost/optimization objective J_sparse(W,b) for the
% Sparse Autoencoder and the corresponding gradients W1grad, W2grad, b1grad,
% b2grad using backpropagation. W1grad(i,j) should be the partial derivative of
% J_sparse(W,b) with respect to W1(i,j), i.e. the term
% [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the lecture notes (and similarly
% for W2grad, b1grad, b2grad). Stated differently, if we were using batch
% gradient descent, the update to W1 would be W1 := W1 - alpha * W1grad, and
% similarly for W2, b1, b2.

Jcost = 0;    % reconstruction error term
Jweight = 0;  % weight decay term
Jsparse = 0;  % sparsity penalty term
[n, m] = size(data);  % n: input dimension, m: number of training examples

% Forward propagation (modified part: the output layer is linear).
z2 = W1*data + repmat(b1, 1, m);   % replicate b1 across the m examples
a2 = sigmoid(z2);
z3 = W2*a2 + repmat(b2, 1, m);
a3 = z3;                           % linear decoder: a3 = f(z3) = z3

% Cost: squared reconstruction error + weight decay + sparsity penalty.
Jcost = (0.5/m)*sum(sum((a3-data).^2));
Jweight = (1/2)*(sum(sum(W1.^2)) + sum(sum(W2.^2)));
rho = (1/m).*sum(a2, 2);           % average activation of each hidden unit
Jsparse = sum(sparsityParam.*log(sparsityParam./rho) + ...
    (1-sparsityParam).*log((1-sparsityParam)./(1-rho)));
cost = Jcost + lambda*Jweight + beta*Jsparse;

% Backpropagation (modified part: no sigmoid derivative on the output layer).
d3 = -(data - a3);
sterm = beta*(-sparsityParam./rho + (1-sparsityParam)./(1-rho));  % sparsity term
d2 = (W2'*d3 + repmat(sterm, 1, m)).*sigmoidInv(z2);

% W1grad
W1grad = W1grad + d2*data';
W1grad = (1/m)*W1grad + lambda*W1;

% W2grad
W2grad = W2grad + d3*a2';
W2grad = (1/m).*W2grad + lambda*W2;

% b1grad
b1grad = b1grad + sum(d2, 2);
b1grad = (1/m)*b1grad;

% b2grad
b2grad = b2grad + sum(d3, 2);
b2grad = (1/m)*b2grad;

%-------------------------------------------------------------------
% After computing the cost and gradient, convert the gradients back to a
% vector format (suitable for minFunc): unroll the gradient matrices into a vector.
grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];

end

%-------------------------------------------------------------------
% Sigmoid function: given a (row or column) vector (z1, z2, z3), returns
% (f(z1), f(z2), f(z3)).
function sigm = sigmoid(x)
    sigm = 1 ./ (1 + exp(-x));
end

% Derivative of the sigmoid function.
function sigmInv = sigmoidInv(x)
    sigmInv = sigmoid(x).*(1-sigmoid(x));
end
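A hedged sketch of how this cost function can be checked and then trained with minFunc. The parameter values are example settings (the exercise's own values may differ slightly), and initializeParameters / computeNumericalGradient are the helper functions provided with the earlier UFLDL exercises, assumed to be on the MATLAB path:

% Example settings: 8*8 color patches (192 inputs), 400 hidden units.
imageChannels = 3; patchDim = 8;
visibleSize = patchDim*patchDim*imageChannels;   % 192
hiddenSize = 400;
lambda = 3e-3; sparsityParam = 0.035; beta = 5;

% Gradient check on a tiny problem before training.
debugTheta = initializeParameters(5, 8);
debugData = rand(8, 10);
[~, grad] = sparseAutoencoderLinearCost(debugTheta, 8, 5, lambda, sparsityParam, beta, debugData);
numGrad = computeNumericalGradient(@(t) sparseAutoencoderLinearCost(t, 8, 5, lambda, ...
    sparsityParam, beta, debugData), debugTheta);
disp(norm(numGrad - grad)/norm(numGrad + grad));  % should be very small

% Train on the (whitened) patches with minFunc; note the two return values,
% matching the note above.
theta = initializeParameters(hiddenSize, visibleSize);
options = struct('Method', 'lbfgs', 'maxIter', 400, 'display', 'on');
optTheta = minFunc(@(t) sparseAutoencoderLinearCost(t, visibleSize, hiddenSize, ...
    lambda, sparsityParam, beta, patches), theta, options);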
