Exercise: Sparse Coding

Reference:

http://deeplearning.stanford.edu/wiki/index.php/Exercise:Sparse_Coding

Preface:

  Using the IMAGES dataset, sample 20000 patches from its 10 images, learn a set of sparse coding features, and visualize them.

Experimental procedure:

  1. Sample 20000 patches.

  2. Group the features. I did not fully understand the grouping code, but the general idea is clear; see the sketch after this list.

  3. Randomly initialize A and s (the gradient-checking step uses Gaussian initialization, the main loop uniform).

  4. Solve for the optimum iteratively; for details see the author's post on sparse coding theory.
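A minimal sketch of the grouping idea, shown on a 3×3 feature torus instead of the exercise's 11×11 (illustration only, not the exercise code): each row of groupMatrix selects one poolDim×poolDim window of features, and circshift slides the window with wraparound, so every feature ends up in poolDim^2 overlapping groups.

% Illustration: grouping on a 3 x 3 feature torus with 2 x 2 windows.
numFeatures = 9;
donutDim = 3;       % torus side: sqrt(numFeatures)
poolDim = 2;        % window side (poolDim <= donutDim)
groupMatrix = zeros(numFeatures, donutDim, donutDim);
for g = 1:numFeatures
    groupMatrix(g, 1:poolDim, 1:poolDim) = 1;           % window at the current origin
    groupMatrix = circshift(groupMatrix, [0 0 -1]);     % slide every window one column (wraps)
    if mod(g, donutDim) == 0
        groupMatrix = circshift(groupMatrix, [0 -1 0]); % move on to the next row of windows
    end
end
groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);
disp(sum(groupMatrix, 2)');   % each group contains poolDim^2 = 4 features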

Theory:

Objective function (the formula on Ng's wiki page has an error; this is the corrected version, reconstructed here to match the cost actually computed in the code below):

J(A, s) = \frac{1}{m} \lVert As - x \rVert_F^2 + \lambda \sum_{r,i} \sqrt{\big( V (s \odot s) \big)_{r,i} + \epsilon} + \gamma \lVert A \rVert_F^2

where x holds the m patches column-wise, A is the basis (weightMatrix), s the features (featureMatrix), V the grouping matrix, and ⊙ the element-wise product.

Gradient with respect to s (note the element-wise products):

\nabla_s J = \frac{2}{m} A^\top (As - x) + \lambda \left[ V^\top \big( V (s \odot s) + \epsilon \big)^{-1/2} \right] \odot s
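For completeness (not in the original notes, but it follows from the same cost terms), the gradient with respect to A, which gives the closed-form weight update used in STEP 3:

\nabla_A J = \frac{2}{m} (As - x) s^\top + 2\gamma A = 0 \quad\Rightarrow\quad A = x s^\top \left( s s^\top + \gamma m I \right)^{-1}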

 

 

For details, see the author's post on computing the gradient of the sparse coding objective with the BP algorithm (使用BP算法计算sparse coding目标函数的梯度).

 

Notes:

1. Topographic and non-topographic sparse coding are handled uniformly here; in the non-topographic case, V is the identity matrix.

2. According to tornadomeet's post, training with L-BFGS gives poor results for this exercise, so the 'cg' method is used instead.

3. After verifying sparseCodingWeightCost.m and sparseCodingFeatureCost.m, comment out the gradient-checking code.

4. The exercise page only provides part of the files; download the remaining code from Exercise:Sparse Autoencoder.

  

Results:

  As expected, the non-topographic features show no correlation with one another, while in the topographic case adjacent features are similar. tornadomeet's post reports that 8×8 patches work better than 16×16; the author used 16×16 patches (the listing below sets patchDim = 8, so change it to 16 to reproduce that run), and interested readers can try the 8×8 case.

 

Non-topographic sparse coding (learned basis figure):

Topographic sparse coding (learned basis figure):

 

Future work:

  1. Try 8×8 patch blocks and compare the results.

  2. Try other optimization algorithms; this experiment only used the 'cg' method, so L-BFGS is worth a try next (a sketch of the option change follows).
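A minimal sketch of that change, assuming the minFunc package bundled with the earlier exercises ('lbfgs' is minFunc's name for its L-BFGS method):

% Swap the optimizer by changing only the minFunc options.
options.Method = 'lbfgs';   % instead of 'cg'
options.maxIter = 20;
options.display = 'off';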

Main code:

sparseCodingExercise:

%% CS294A/CS294W Sparse Coding Exercise

%  Instructions
%  ------------
% 
%  This file contains code that helps you get started on the
%  sparse coding exercise. In this exercise, you will need to modify
%  sparseCodingFeatureCost.m and sparseCodingWeightCost.m. You will also
%  need to modify this file, sparseCodingExercise.m slightly.

% Add the paths to your earlier exercises if necessary
% addpath /path/to/solution

%%======================================================================
%% STEP 0: Initialization
%  Here we initialize some parameters used for the exercise.

numPatches = 20000;   % number of patches
numFeatures = 121;    % number of features to learn
patchDim = 8;         % patch dimension
visibleSize = patchDim * patchDim; 

% dimension of the grouping region (poolDim x poolDim) for topographic sparse coding
poolDim = 3;
% number of patches per batch
batchNumPatches = 2000; 

lambda = 5e-5;  % L1-regularisation parameter (on features)
epsilon = 1e-5; % L1-regularisation epsilon |x| ~ sqrt(x^2 + epsilon)
gamma = 1e-2;   % L2-regularisation parameter (on basis)

%%======================================================================
%% STEP 1: Sample patches

images = load('IMAGES.mat');
images = images.IMAGES;

patches = sampleIMAGES(images, patchDim, numPatches);
display_network(patches(:, 1:64));

%%======================================================================
% %% STEP 2: Implement and check sparse coding cost functions
% %  Implement the two sparse coding cost functions and check your gradients.
% %  The two cost functions are
% %  1) sparseCodingFeatureCost (in sparseCodingFeatureCost.m) for the features 
% %     (used when optimizing for s, which is called featureMatrix in this exercise) 
% %  2) sparseCodingWeightCost (in sparseCodingWeightCost.m) for the weights
% %     (used when optimizing for A, which is called weightMatrix in this exercise)
% 
% % We reduce the number of features and number of patches for debugging
% numFeatures = 5;
% patches = patches(:, 1:5);
% numPatches = 5;
% %initialize the weightMatrix and featureMatrix
% weightMatrix = randn(visibleSize, numFeatures) * 0.005;
% featureMatrix = randn(numFeatures, numPatches) * 0.005;
% 
% %% STEP 2a: Implement and test weight cost
% %  Implement sparseCodingWeightCost in sparseCodingWeightCost.m and check
% %  the gradient
% 
% [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon);
% 
% numgrad = computeNumericalGradient( @(x) sparseCodingWeightCost(x, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon), weightMatrix(:) );
% % Uncomment the line below to display the numerical and analytic gradients side by side
% % disp([numgrad grad]);     
% diff = norm(numgrad-grad)/norm(numgrad+grad);
% fprintf('Weight difference: %g\n', diff);
% assert(diff < 1e-8, 'Weight difference too large. Check your weight cost function. ');
% 
% %% STEP 2b: Implement and test feature cost (non-topographic)
% %  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
% %  the gradient. You may wish to implement the non-topographic version of
% %  the cost function first, and extend it to the topographic version later.
% 
% % Set epsilon to a larger value so checking the gradient numerically makes sense
% epsilon = 1e-2;
% 
% % We pass in the identity matrix as the grouping matrix, putting each
% % feature in a group on its own.
% groupMatrix = eye(numFeatures);
% 
% [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);
% 
% numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% % Uncomment the line below to display the numerical and analytic gradients side by side
% % disp([numgrad grad]); 
% diff = norm(numgrad-grad)/norm(numgrad+grad);
% fprintf('Feature difference (non-topographic): %g\n', diff);
% assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');
% 
% %% STEP 2c: Implement and test feature cost (topographic)
% %  Implement sparseCodingFeatureCost in sparseCodingFeatureCost.m and check
% %  the gradient. This time, we will pass a random grouping matrix in to
% %  check if your costs and gradients are correct for the topographic
% %  version.
% 
% % Set epsilon to a larger value so checking the gradient numerically makes sense
% epsilon = 1e-2;
% 
% % This time we pass in a random grouping matrix to check if the grouping is
% % correct.
% groupMatrix = rand(100, numFeatures);
% 
% [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix);
% 
% numgrad = computeNumericalGradient( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix), featureMatrix(:) );
% % Uncomment the line below to display the numerical and analytic gradients side by side
% % disp([numgrad grad]); 
% diff = norm(numgrad-grad)/norm(numgrad+grad);
% fprintf('Feature difference (topographic): %g\n', diff);
% assert(diff < 1e-8, 'Feature difference too large. Check your feature cost function. ');

%%======================================================================
%% STEP 3: Iterative optimization
%  Once you have implemented the cost functions, you can now optimize for
%  the objective iteratively. The code to do the iterative optimization 
%  using mini-batching and good initialization of the features has already
%  been included for you. 
% 
%  However, you will still need to derive and fill in the analytic solution 
%  for optimizing the weight matrix given the features. 
%  Derive the solution and implement it in the code below, verify the
%  gradient as described in the instructions below, and then run the
%  iterative optimization.

% Initialize options for minFunc
options.Method = 'cg';
options.display = 'off';
options.verbose = 0;

% Initialize matrices
weightMatrix = rand(visibleSize, numFeatures);
featureMatrix = rand(numFeatures, batchNumPatches);

% Initialize grouping matrix
assert(floor(sqrt(numFeatures)) ^2 == numFeatures, 'numFeatures should be a perfect square');
donutDim = floor(sqrt(numFeatures));
assert(donutDim * donutDim == numFeatures,'donutDim^2 must be equal to numFeatures');

groupMatrix = zeros(numFeatures, donutDim, donutDim);
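% How the loop below works: each row of groupMatrix picks out one
% poolDim x poolDim window of features on the donutDim x donutDim torus;
% the circshift calls slide the window with wraparound, so every feature
% belongs to poolDim^2 overlapping groups.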

groupNum = 1;
for row = 1:donutDim
    for col = 1:donutDim
        groupMatrix(groupNum, 1:poolDim, 1:poolDim) = 1;
        groupNum = groupNum + 1;
        groupMatrix = circshift(groupMatrix, [0 0 -1]);
    end
    groupMatrix = circshift(groupMatrix, [0 -1, 0]);
end

groupMatrix = reshape(groupMatrix, numFeatures, numFeatures);
if isequal(questdlg('Initialize grouping matrix for topographic or non-topographic sparse coding?', 'Topographic/non-topographic?', 'Non-topographic', 'Topographic', 'Non-topographic'), 'Non-topographic')
    groupMatrix = eye(numFeatures);
end

% Initial batch
indices = randperm(numPatches);
indices = indices(1:batchNumPatches);
batchPatches = patches(:, indices);                           

fprintf('%6s%12s%12s%12s%12s\n','Iter', 'fObj','fResidue','fSparsity','fWeight');
addpath minFunc/
for iteration = 1:20                      
    error = weightMatrix * featureMatrix - batchPatches;
    error = sum(error(:) .^ 2) / batchNumPatches;
    
    fResidue = error;
    
    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);    
    fSparsity = lambda * sum(R(:));    
    
    fWeight = gamma * sum(weightMatrix(:) .^ 2);
    
    fprintf('  %4d  %10.4f  %10.4f  %10.4f  %10.4f\n', iteration, fResidue+fSparsity+fWeight, fResidue, fSparsity, fWeight)
               
    % Select a new batch
    indices = randperm(numPatches);
    indices = indices(1:batchNumPatches);
    batchPatches = patches(:, indices);                    
    
    % Reinitialize featureMatrix with respect to the new batch.
    % Since x ~ weightMatrix * featureMatrix, initialize
    % featureMatrix = weightMatrix' * x, then divide each row by the
    % squared norm of the corresponding basis vector to keep the
    % sparsity penalty small.
    featureMatrix = weightMatrix' * batchPatches;
    normWM = sum(weightMatrix .^ 2)';
    featureMatrix = bsxfun(@rdivide, featureMatrix, normWM); 
    
    % Optimize for feature matrix    
    options.maxIter = 20;
    [featureMatrix, cost] = minFunc( @(x) sparseCodingFeatureCost(weightMatrix, x, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix), ...
                                           featureMatrix(:), options);
    featureMatrix = reshape(featureMatrix, numFeatures, batchNumPatches);                                      
       
    % Optimize for weight matrix  
    weightMatrix = zeros(visibleSize, numFeatures);     
   
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Fill in the analytic solution for weightMatrix that minimizes 
    %   the weight cost here.     
    %   Once that is done, use the code provided below to check that your
    %   closed form solution is correct.
    %   Once you have verified that your closed form solution is correct,
    %   you should comment out the checking code before running the
    %   optimization.
    % When featureMatrix is fixed, minimizing the cost over weightMatrix is
    % a convex quadratic problem with a closed-form solution, so we plug in
    % the formula directly.
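    % Derivation sketch (using the cost in sparseCodingWeightCost): setting
    %   (2/m) * (A*s - x) * s' + 2*gamma*A = 0
    % and solving for A gives
    %   A = (x*s') / (s*s' + gamma*m*I),  with m = batchNumPatches.
    % The gamma*m factor appears because fResidue is averaged over the
    % batch while fWeight is not.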

    weightMatrix = (batchPatches * featureMatrix') / (featureMatrix * featureMatrix' + gamma * batchNumPatches * eye(size(featureMatrix, 1)));
    
    [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures, batchPatches, gamma, lambda, epsilon, groupMatrix);
%      assert(norm(grad) < 1e-12, 'Weight gradient should be close to 0. Check your closed form solution for weightMatrix again.')
%      error('Weight gradient is okay. Comment out checking code before running optimization.');
    % -------------------- YOUR CODE HERE --------------------   
    % Visualize the learned basis after each outer iteration.
    figure(1);
    display_network(weightMatrix);
end
save('weightMatrix.mat', 'weightMatrix');
fprintf('Congratulations! Sparse coding finished.\n');
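To revisit the learned basis later without retraining, a small sketch (display_network comes from the earlier exercises):

% Reload and display the saved basis.
load('weightMatrix.mat');                 % restores the variable weightMatrix
figure; display_network(weightMatrix);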

 

sparseCodingWeightCost.m

function [cost, grad] = sparseCodingWeightCost(weightMatrix, featureMatrix, visibleSize, numFeatures,  patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingWeightCost - given the features in featureMatrix, 
%                         computes the cost and gradient with respect to
%                         the weights, given in weightMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);
    
    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   weights given in weightMatrix.     
    % -------------------- YOUR CODE HERE --------------------    
    % Being lazy here: reuse the cost computation from sparseCodingExercise.m.
    error = weightMatrix * featureMatrix - patches;
    error = sum(error(:) .^ 2) / numExamples;
    fResidue = error;
    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);
    fSparsity = lambda * sum(R(:));
    fWeight = gamma * sum(weightMatrix(:) .^ 2);
    cost = fResidue + fSparsity + fWeight;
    % Gradient w.r.t. the weights: reconstruction term plus weight decay.
    grad = (2 / numExamples) * (weightMatrix * featureMatrix * featureMatrix' - patches * featureMatrix') + 2 * gamma * weightMatrix;
    grad = grad(:);
end

 

sparseCodingFeatureCost.m

function [cost, grad] = sparseCodingFeatureCost(weightMatrix, featureMatrix, visibleSize, numFeatures, patches, gamma, lambda, epsilon, groupMatrix)
%sparseCodingFeatureCost - given the weights in weightMatrix,
%                          computes the cost and gradient with respect to
%                          the features, given in featureMatrix
% parameters
%   weightMatrix  - the weight matrix. weightMatrix(:, c) is the cth basis
%                   vector.
%   featureMatrix - the feature matrix. featureMatrix(:, c) is the features
%                   for the cth example
%   visibleSize   - number of pixels in the patches
%   numFeatures   - number of features
%   patches       - patches
%   gamma         - weight decay parameter (on weightMatrix)
%   lambda        - L1 sparsity weight (on featureMatrix)
%   epsilon       - L1 sparsity epsilon
%   groupMatrix   - the grouping matrix. groupMatrix(r, :) indicates the
%                   features included in the rth group. groupMatrix(r, c)
%                   is 1 if the cth feature is in the rth group and 0
%                   otherwise.

    if exist('groupMatrix', 'var')
        assert(size(groupMatrix, 2) == numFeatures, 'groupMatrix has bad dimension');
    else
        groupMatrix = eye(numFeatures);
    end

    numExamples = size(patches, 2);

    weightMatrix = reshape(weightMatrix, visibleSize, numFeatures);
    featureMatrix = reshape(featureMatrix, numFeatures, numExamples);

    % -------------------- YOUR CODE HERE --------------------
    % Instructions:
    %   Write code to compute the cost and gradient with respect to the
    %   features given in featureMatrix.     
    %   You may wish to write the non-topographic version, ignoring
    %   the grouping matrix groupMatrix first, and extend the 
    %   non-topographic version to the topographic version later.
    % -------------------- YOUR CODE HERE --------------------
    error = weightMatrix * featureMatrix - patches;
    error = sum(error(:) .^ 2) / numExamples;
    fResidue = error;

    R = groupMatrix * (featureMatrix .^ 2);
    R = sqrt(R + epsilon);
    fSparsity = lambda * sum(R(:));

    fWeight = gamma * sum(weightMatrix(:) .^ 2);

    cost = fResidue + fSparsity + fWeight;

    % Gradient w.r.t. the features: reconstruction term plus the smoothed
    % (topographic) L1 sparsity term; note the element-wise product with
    % featureMatrix in the sparsity term.
    gradResidue = (2 / numExamples) * weightMatrix' * (weightMatrix * featureMatrix - patches);
    gradSparsity = lambda * (groupMatrix' * (groupMatrix * (featureMatrix .^ 2) + epsilon) .^ (-1/2)) .* featureMatrix;
    grad = gradResidue + gradSparsity;
    grad = grad(:);
   
end

 

posted @ 2014-12-08 17:23  dupuleng