Exercise: Softmax Regression
The previous section covered the theory behind softmax regression; this section walks through a concrete implementation. For details see http://deeplearning.stanford.edu/wiki/index.php/Exercise:Softmax_Regression
Experiment overview:
The task is handwritten digit recognition on the MNIST database: digits 0-9, 60,000 training samples, and 10,000 test samples. Each sample is a 28*28 image.
Environment: MATLAB 2010a
Theoretical background:
This experiment uses the softmax model. Since the raw pixels are used directly as features, there is no feature-extraction stage, so the model has only an input layer and an output layer: the input layer has n neurons (the sample length, 28*28 = 784) and the output layer has K neurons, one per class. The cost function and the corresponding gradient are:

J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{j=1}^{K} 1\{y^{(i)}=j\}\,\log\frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^T x^{(i)}}}\right] + \frac{\lambda}{2}\sum_{j=1}^{K}\sum_{k=1}^{n}\theta_{jk}^2

\nabla_{\theta_j} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[x^{(i)}\left(1\{y^{(i)}=j\} - p(y^{(i)}=j \mid x^{(i)};\theta)\right)\right] + \lambda\,\theta_j

Note: the regularization (weight decay) term is already included; lambda is the weight decay parameter.
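One numerical detail worth noting (it shows up later in softmaxCost.m): the exponentials e^{theta_j^T x} overflow easily when the scores are large, so the column-wise maximum is subtracted from the scores before exponentiating, which leaves the resulting probabilities unchanged. A minimal sketch of the idea (the variable names below are illustrative, not taken from the exercise code):

% Illustrative only: why subtracting the column-wise maximum avoids overflow.
scores = [1000; 1001; 999];                         % one column of theta*data with large scores
% exp(scores) would overflow to Inf, so shift by the maximum first:
shifted = bsxfun(@minus, scores, max(scores, [], 1));
p = exp(shifted) ./ sum(exp(shifted));              % same result as exp(scores)/sum(exp(scores)), but finite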
Relevant MATLAB functions:
bsxfun: applies an element-wise binary operation (plus, minus, rdivide, ldivide, or, and, ...) to two matrices A and B. The two inputs must either be the same size, or one of them must be a row or column vector whose length matches the corresponding dimension of the other, in which case it is implicitly expanded (broadcast) to a matrix of the same size before the operation is applied.
sparse: S = sparse(i,j,s,m,n,nzmax) builds an m*n matrix S with room for nzmax nonzero entries from the vectors i, j, s, with S(i(k), j(k)) = s(k). When a matrix is extremely sparse (only a handful of entries are nonzero), representing each nonzero entry by its (i, j, s) triple saves a great deal of memory.
full: converts a sparse matrix back into an ordinary dense matrix, typically to recover the result of a sparse computation. Combined with sparse, e.g. full(sparse(i,j,s,m,n)), it is a convenient way to build a dense indicator matrix directly, as shown in the sketch below.
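A minimal sketch of how these three functions fit together in this exercise (the names labels, numCases, and groundTruth mirror those used in softmaxCost.m; the actual numbers are made up for illustration):

labels = [2; 1; 3; 2];                         % class label of each of 4 samples
numCases = numel(labels);
% groundTruth(c, i) = 1 if sample i belongs to class c, and 0 otherwise
groundTruth = full(sparse(labels, 1:numCases, 1));
% bsxfun broadcasts the 1-by-numCases row vector of column sums,
% so every column of p sums to 1
M = rand(3, numCases);
p = bsxfun(@rdivide, M, sum(M, 1));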
A few issues:
1. When loading the samples with the provided loadMNISTImages.m, a few lines of code deserve attention:
images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);
First the image data is read in. Because the images come out transposed when first read, permute is used to swap the row and column dimensions. With the full data set, however, a 32-bit machine does not have enough memory for this operation and MATLAB reports: Error using permute out of memory, Type HELP MEMORY for your options.
To work around this, we can read only part of the data for training; the code is as follows:
numImages = 10000;
images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
images = permute(images,[2 1 3]);
That is, adjust the value of numImages according to your machine until the memory error disappears; I used 10,000 training samples. An alternative is sketched below.
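As an alternative (a hedged sketch only, not part of the original exercise code), the transpose can be done one image at a time, which avoids the second full-size copy of the array that the one-shot permute needs; fp, numCols, numRows, and numImages are as in loadMNISTImages.m:

images = fread(fp, numCols*numRows*numImages, 'unsigned char');
images = reshape(images, numCols, numRows, numImages);
for k = 1:numImages
    images(:,:,k) = images(:,:,k)';   % transpose each image in place
end                                   % (valid here because numCols == numRows == 28)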
2. The exercise uses computeNumericalGradient and minFunc, but the corresponding code is not included in this package; it can be downloaded from Andrew Ng's Sparse Autoencoder exercise.
Results:
Because only part of the training set was used, the accuracy is only:
Accuracy: 90.300%
which is lower than the authors' 92.6%.
Exercise code:
softmaxExercise.m:
%% CS294A/CS294W Softmax Exercise

%  Instructions
%  ------------
%
%  This file contains code that helps you get started on the
%  softmax exercise. You will need to write the softmax cost function
%  in softmaxCost.m and the softmax prediction function in softmaxPred.m.
%  For this exercise, you will not need to change any code in this file,
%  or any other files other than those mentioned above.
%  (However, you may be required to do so in later exercises)

%%======================================================================
%% STEP 0: Initialise constants and parameters
%
%  Here we define and initialise some constants which allow your code
%  to be used more generally on any arbitrary input.
%  We also initialise some parameters used for tuning the model.

inputSize = 28 * 28;   % Size of input vector (MNIST images are 28x28)
numClasses = 10;       % Number of classes (MNIST images fall into 10 classes)

lambda = 1e-4;         % Weight decay parameter

%%======================================================================
%% STEP 1: Load data
%
%  In this section, we load the input and output data.
%  For softmax regression on MNIST pixels,
%  the input data is the images, and
%  the output data is the labels.
%

% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte

images = loadMNISTImages('train-images.idx3-ubyte');
labels = loadMNISTLabels('train-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% For debugging purposes, you may wish to reduce the size of the input data
% in order to speed up gradient checking.
% Here, we create synthetic dataset using random data for testing

DEBUG = false; % Set DEBUG to true when debugging.
if DEBUG
    inputSize = 8;
    inputData = randn(8, 100);
    labels = randi(10, 100, 1);
end

% Randomly initialise theta
theta = 0.005 * randn(numClasses * inputSize, 1);

%%======================================================================
%% STEP 2: Implement softmaxCost
%
%  Implement softmaxCost in softmaxCost.m.

[cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, inputData, labels);

%%======================================================================
%% STEP 3: Gradient checking
%
%  As with any learning algorithm, you should always check that your
%  gradients are correct before learning the parameters.
%

if DEBUG
    numGrad = computeNumericalGradient( @(x) softmaxCost(x, numClasses, ...
                                        inputSize, lambda, inputData, labels), theta);

    % Use this to visually compare the gradients side by side
    disp([numGrad grad]);

    % Compare numerically computed gradients with those computed analytically
    diff = norm(numGrad-grad)/norm(numGrad+grad);
    disp(diff);
    % The difference should be small.
    % In our implementation, these values are usually less than 1e-7.

    % When your gradients are correct, congratulations!
end

%%======================================================================
%% STEP 4: Learning parameters
%
%  Once you have verified that your gradients are correct,
%  you can start training your softmax regression code using softmaxTrain
%  (which uses minFunc).

options.maxIter = 100;
softmaxModel = softmaxTrain(inputSize, numClasses, lambda, ...
                            inputData, labels, options);

% Although we only use 100 iterations here to train a classifier for the
% MNIST data set, in practice, training for more iterations is usually
% beneficial.
%%======================================================================
%% STEP 5: Testing
%
%  You should now test your model against the test images.
%  To do this, you will first need to write softmaxPredict
%  (in softmaxPredict.m), which should return predictions
%  given a softmax model and the input data.

images = loadMNISTImages('t10k-images.idx3-ubyte');
labels = loadMNISTLabels('t10k-labels.idx1-ubyte');
labels(labels==0) = 10; % Remap 0 to 10

inputData = images;

% You will have to implement softmaxPredict in softmaxPredict.m
[pred] = softmaxPredict(softmaxModel, inputData);

acc = mean(labels(:) == pred(:));
fprintf('Accuracy: %0.3f%%\n', acc * 100);

% Accuracy is the proportion of correctly classified images
% After 100 iterations, the results for our implementation were:
%
% Accuracy: 92.200%
%
% If your values are too low (accuracy less than 0.91), you should check
% your code for errors, and make sure you are training on the
% entire data set of 60000 28x28 training images
% (unless you modified the loading code, this should be the case)
softmaxCost.m:

function [cost, grad] = softmaxCost(theta, numClasses, inputSize, lambda, data, labels)

% numClasses - the number of classes
% inputSize - the size N of the input vector
% lambda - weight decay parameter
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
% labels - an M x 1 matrix containing the labels corresponding for the input data
%

% Unroll the parameters from theta
theta = reshape(theta, numClasses, inputSize); % reshape the parameter column vector into a numClasses x N matrix
numCases = size(data, 2);                      % number of input samples

% sparse builds a sparse matrix whose nonzero entries all take the third argument, 1;
% the entry positions are given by the pairs (labels(i), i) for i = 1:numCases
groundTruth = full(sparse(labels, 1:numCases, 1)); % numClasses * M
cost = 0;
thetagrad = zeros(numClasses, inputSize);          % numClasses * N

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute the cost and gradient for softmax regression.
%                You need to compute thetagrad and cost.
%                The groundTruth matrix might come in handy.

M = bsxfun(@minus, theta*data, max(theta*data, [], 1)); % (numClasses x N) * (N x M) = numClasses x M
M = exp(M);                                             % numClasses * M
p = bsxfun(@rdivide, M, sum(M));                        % numClasses * M

cost = -1/numCases * groundTruth(:)' * log(p(:)) + lambda/2 * sum(theta(:) .^ 2);
% Using a(:) to flatten each matrix into a column vector before computing is a neat trick that
% saves a sum: the first term is an element-wise product summed over all entries, so the
% conventional version needs two sum calls, whereas the flattened version needs only one.
% cost = -1/numCases * sum( sum(groundTruth .* log(p)) ) + lambda/2 * sum( sum( theta.^2) ); % two sum calls

thetagrad = -1/numCases * (groundTruth - p) * data' + lambda * theta;

% ------------------------------------------------------------------
% Unroll the gradient matrices into a vector for minFunc
grad = [thetagrad(:)];
end
softmaxPredict.m:
function [pred] = softmaxPredict(softmaxModel, data)

% softmaxModel - model trained using softmaxTrain
% data - the N x M input matrix, where each column data(:, i) corresponds to
%        a single test set
%
% Your code should produce the prediction matrix
% pred, where pred(i) is argmax_c P(y(c) | x(i)).

% Unroll the parameters from theta
theta = softmaxModel.optTheta;  % this provides a numClasses x inputSize matrix
pred = zeros(1, size(data, 2));

%% ---------- YOUR CODE HERE --------------------------------------
%  Instructions: Compute pred using theta assuming that the labels start
%                from 1.

M = theta * data;               % numClasses x M matrix of class scores
[preValue, pred] = max(M);      % the row index of the largest score in each column is the predicted class

% ---------------------------------------------------------------------
end
