Exercise: Softmax Regression
The previous section covered the theory behind softmax regression; this section walks through a concrete implementation. For details see http://deeplearning.stanford.edu/wiki/index.php/Exercise:Softmax_Regression
Experiment overview:
The task is handwritten digit recognition on the MNIST database: digits 0-9, 60,000 training samples, and 10,000 test samples. Each sample is a 28*28 image.
Environment: MATLAB 2010a
Theoretical background:
This experiment uses the softmax model. Since the raw pixels are used directly as features, there is no feature-extraction stage, so the model has only an input layer and an output layer: the input layer has n neurons (the sample length, 28*28 = 784) and the output layer has K neurons, one per class. The cost function and the corresponding gradient are:

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{K} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{j=1}^{K}\sum_{k=1}^{n}\theta_{jk}^2

\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)};\theta)\right)\right] + \lambda\,\theta_j

Note: the regularization (weight decay) term is already included; lambda is the weight decay parameter.
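One numerical detail worth noting (it shows up later in softmaxCost.m): the exponentials e^{theta_j^T x} overflow easily when the scores are large, so the column-wise maximum is subtracted from the scores before exponentiating, which leaves the resulting probabilities unchanged. A minimal sketch of the idea (the variable names below are illustrative, not taken from the exercise code):

% Illustrative only: why subtracting the column-wise maximum avoids overflow.
scores = [1000; 1001; 999];                         % one column of theta*data with large scores
% exp(scores) would overflow to Inf, so shift by the maximum first:
shifted = bsxfun(@minus, scores, max(scores, [], 1));
p = exp(shifted) ./ sum(exp(shifted));              % same result as exp(scores)/sum(exp(scores)), but finite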
Relevant MATLAB functions:
bsxfun: applies an element-wise binary operation (plus, minus, rdivide, ldivide, or, and, ...) to two matrices A and B. The two inputs must either be the same size, or one of them must be a row or column vector whose length matches the corresponding dimension of the other, in which case it is implicitly expanded (broadcast) to a matrix of the same size before the operation is applied.
sparse: S = sparse(i,j,s,m,n,nzmax) builds an m*n matrix S with room for nzmax nonzero entries from the vectors i, j, s, with S(i(k), j(k)) = s(k). When a matrix is extremely sparse (only a handful of entries are nonzero), representing each nonzero entry by its (i, j, s) triple saves a great deal of memory.
full: converts a sparse matrix back into an ordinary dense matrix, typically to recover the result of a sparse computation. Combined with sparse, e.g. full(sparse(i,j,s,m,n)), it is a convenient way to build a dense indicator matrix directly, as shown in the sketch below.
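A minimal sketch of how these three functions fit together in this exercise (the names labels, numCases, and groundTruth mirror those used in softmaxCost.m; the actual numbers are made up for illustration):

labels = [2; 1; 3; 2];                         % class label of each of 4 samples
numCases = numel(labels);
% groundTruth(c, i) = 1 if sample i belongs to class c, and 0 otherwise
groundTruth = full(sparse(labels, 1:numCases, 1));
% bsxfun broadcasts the 1-by-numCases row vector of column sums,
% so every column of p sums to 1
M = rand(3, numCases);
p = bsxfun(@rdivide, M, sum(M, 1));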
A few issues:
1. When loading the samples with the provided loadMNISTImages.m, a few lines of code deserve attention:
images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);
First the image data is read in. Because the images come out transposed when first read, permute is used to swap the row and column dimensions. With the full data set, however, a 32-bit machine does not have enough memory for this operation and MATLAB reports: Error using permute out of memory, Type HELP MEMORY for your options.
To work around this, we can read only part of the data for training; the code is as follows:
numImages = 10000;
images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);
That is, adjust the value of numImages according to your machine until the memory error disappears; I used 10,000 training samples. An alternative is sketched below.
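As an alternative (a hedged sketch only, not part of the original exercise code), the transpose can be done one image at a time, which avoids the second full-size copy of the array that the one-shot permute needs; fp, numCols, numRows, and numImages are as in loadMNISTImages.m:

images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
for k = 1:numImages
    images(:,:,k) = images(:,:,k)';   % transpose each image in place
end                                   % (valid here because numCols == numRows == 28)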
2. The exercise uses computeNumericalGradient and minFunc, but the corresponding code is not included in this package; it can be downloaded from Andrew Ng's Sparse Autoencoder exercise.
Results:
Because only part of the training set was used, the accuracy is only:
Accuracy: 90.300%
which is lower than the authors' 92.6%.
Exercise code:
softmaxExercise.m:
%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPred.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28;   % Size of input vector (MNIST images are 28x28)
numClasses = 10;       % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4;         % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.
%

% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

images = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create synthetic dataset using random data for testing

DEBUG = false; % Set DEBUG to true when debugging.
if DEBUG
    inputSize = 8;
    inputData = randn(8, 100);
    labels = randi(10, 100, 1);
end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

if DEBUG
    numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                        inputSize, lambda, inputData, labels), theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    % Compare numerically computed gradients with those computed analytically
    diff = norm(numGrad-grad)/norm(numGrad+grad);
    disp(diff);
    % The difference should be small.
    % In our implementation, these values are usually less than 1e-7.

    % When your gradients are correct, congratulations!
end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.
%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('t10k-images.idx3-ubyte');
labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
softmaxCost.m:

function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % reshape the parameter column vector into a numClasses x N matrix
numCases = size(data, 2);                      % number of input samples

% sparse builds a sparse matrix whose nonzero entries all take the third argument, 1;
% the entry positions are given by the pairs (labels(i), i) for i = 1:numCases
groundTruth = full(sparse(labels, 1:numCases, 1)); % numClasses * M
cost = 0;
thetagrad = zeros(numClasses, inputSize);          % numClasses * N

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

M = bsxfun(@minus, theta*data, max(theta*data, [], 1)); % (numClasses x N) * (N x M) = numClasses x M
M = exp(M);                                             % numClasses * M
p = bsxfun(@rdivide, M, sum(M));                        % numClasses * M

cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
% Using a(:) to flatten each matrix into a column vector before computing is a neat trick that
% saves a sum: the first term is an element-wise product summed over all entries, so the
% conventional version needs two sum calls, whereas the flattened version needs only one.
% cost = -1/numCases * sum( sum(groundTruth .* log(p)) ) + lambda/2 * sum( sum( theta.^2) ); % two sum calls

thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
softmaxPredict.m:
function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

M = theta * data;               % numClasses x M matrix of class scores
[preValue, pred] = max(M);      % the row index of the largest score in each column is the predicted class

% ---------------------------------------------------------------------
end
