Softmax Regression Review
Preface:
Today I revisited another tutorial on softmax regression from Andrew Ng's UFLDL pages. The main content is essentially the same as the previous one, but the programming implementation differs in a few places; the two sets of code were probably written by different people. I originally only meant to review the material, but comparing the different implementations also turned up a few issues.
Theoretical background:
For convenience, the method from the original tutorial is referred to as Method 1 below, and the method in this section as Method 2.
The parameters of softmax regression have a notable property: redundancy. That is, once a set of parameters has been learned, subtracting the same vector from every class's parameters does not change the final predictions, as the short derivation below shows.
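A quick check of this redundancy (standard softmax algebra, not quoted from the original tutorial text): subtracting the same vector ψ from every class's parameter vector cancels between numerator and denominator, so the class probabilities are unchanged:

$$P(y=k \mid x;\, \theta-\psi) \;=\; \frac{e^{(\theta_k-\psi)^\top x}}{\sum_{l=1}^{K} e^{(\theta_l-\psi)^\top x}} \;=\; \frac{e^{-\psi^\top x}\, e^{\theta_k^\top x}}{e^{-\psi^\top x} \sum_{l=1}^{K} e^{\theta_l^\top x}} \;=\; \frac{e^{\theta_k^\top x}}{\sum_{l=1}^{K} e^{\theta_l^\top x}}$$

In particular, choosing ψ = θ_K makes the last class's parameters exactly zero, which is what Method 2 exploits.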

In Method 1, the author deals with this redundancy by adding a regularization (weight decay) term to the objective function:
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} 1\{y^{(i)}=k\}\,\log\frac{e^{\theta_k^\top x^{(i)}}}{\sum_{l=1}^{K} e^{\theta_l^\top x^{(i)}}} \;+\; \frac{\lambda}{2}\sum_{k=1}^{K}\sum_{j=1}^{n}\theta_{kj}^{2}$$
In Method 2, by contrast, the author handles it directly in the code by fixing the last class's parameters to zero (θ_K = 0; only num_classes-1 columns of theta are optimized), without adding a regularization term.
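For reference, the gradient used by the code below (a standard softmax-regression result; the formula itself is not spelled out in the original post) is, for class k,

$$\nabla_{\theta_k} J(\theta) = -\frac{1}{m}\sum_{i=1}^{m} x^{(i)}\Big(1\{y^{(i)}=k\} - P(y^{(i)}=k \mid x^{(i)};\theta)\Big)$$

which corresponds to the vectorized line g = -1/m * X * (gt - p)' in softmax_regression_vec.m; with weight decay, λθ_k is added to it.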

Experimental results:
Method 1 accuracy: 90.3%
Method 2 accuracy: 87.5%
Method 2 + regularization: 89.5%
Analysis: As the results show, the regularized version performs slightly better. Regularization is also a technique commonly used for this kind of problem in machine learning, e.g. in SVMs and sparse coding. With a regularization term, the weight-decay factor λ controls the relative importance of the data (error) term and the penalty term, which is more flexible; simply fixing θ_K = 0 is the blunter approach. A sketch of running the weight-decay variant with different values of λ is given below.
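As a minimal sketch of the "Method 2 + regularization" run: the wrapper below is my own hypothetical helper (saved as softmax_regression_wd.m), not part of the starter code. It assumes the softmax_regression_vec.m listed below returns the unregularized objective and gradient, and that minFunc forwards extra arguments to the objective (as the starter script itself relies on). This lets λ be varied without editing the original file.

function [f, g] = softmax_regression_wd(theta, X, y, lambda)
  % Hypothetical wrapper: unregularized softmax objective/gradient plus weight decay.
  % theta is the long vector used by minFunc (n x (num_classes-1) entries);
  % the implicit last class column is fixed at zero, so it contributes nothing.
  [f, g] = softmax_regression_vec(theta, X, y);  % data term and its gradient
  f = f + lambda/2 * sum(theta.^2);              % add the L2 penalty
  g = g + lambda * theta;                        % and its gradient
end

A possible sweep over λ, reusing the variables set up in ex1c_softmax.m (n, num_classes, options, train, test):

for lambda = [1e-5 1e-4 1e-3]
  theta0 = rand(n, num_classes-1) * 0.001;
  t = minFunc(@softmax_regression_wd, theta0(:), options, train.X, train.y, lambda);
  t = [reshape(t, n, num_classes-1), zeros(n,1)];   % append the fixed zero column
  fprintf('lambda = %g, test accuracy: %.1f%%\n', ...
          lambda, 100 * multi_classifier_accuracy(t, test.X, test.y));
end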
Main experiment code:
For the Method 1 code, see the author's other blog post.
Method 2 code:
Starter code: http://ufldl.stanford.edu/tutorial/StarterCode/
ex1c_softmax.m
addpath ../common
addpath ../common/minFunc_2012/minFunc
addpath ../common/minFunc_2012/minFunc/compiled

% Load the MNIST data for this exercise.
% train.X and test.X will contain the training and testing images.
%   Each matrix has size [n,m] where:
%      m is the number of examples.
%      n is the number of pixels in each image.
% train.y and test.y will contain the corresponding labels (0 to 9).
binary_digits = false;
num_classes = 10;
[train,test] = ex1_load_mnist(binary_digits);

% Add row of 1s to the dataset to act as an intercept term.
train.X = [ones(1,size(train.X,2)); train.X];
test.X = [ones(1,size(test.X,2)); test.X];
train.y = train.y+1; % make labels 1-based.
test.y = test.y+1;   % make labels 1-based.

% Training set info
m=size(train.X,2);
n=size(train.X,1);

% Train softmax classifier using minFunc
options = struct('MaxIter', 200);

% Initialize theta.  We use a matrix where each column corresponds to a class,
% and each row is a classifier coefficient for that class.
% Inside minFunc, theta will be stretched out into a long vector (theta(:)).
% We only use num_classes-1 columns, since the last column is always assumed 0.
theta = rand(n,num_classes-1)*0.001;

% Call minFunc with the softmax_regression_vec.m file as objective.
%
% TODO:  Implement batch softmax regression in the softmax_regression_vec.m
% file using a vectorized implementation.
%
tic;
theta(:)=minFunc(@softmax_regression_vec, theta(:), options, train.X, train.y);
fprintf('Optimization took %f seconds.\n', toc);
theta=[theta, zeros(n,1)]; % expand theta to include the last class.

% Print out training accuracy.
tic;
accuracy = multi_classifier_accuracy(theta,train.X,train.y);
fprintf('Training accuracy: %2.1f%%\n', 100*accuracy);

% Print out test accuracy.
accuracy = multi_classifier_accuracy(theta,test.X,test.y);
fprintf('Test accuracy: %2.1f%%\n', 100*accuracy);

% % for learning curves
% global test
% global train
% test.err{end+1} = multi_classifier_accuracy(theta,test.X,test.y);
% train.err{end+1} = multi_classifier_accuracy(theta,train.X,train.y);
softmax_regression_vec.m
function [f,g] = softmax_regression(theta, X,y)
  %
  % Arguments:
  %   theta - A vector containing the parameter values to optimize.
  %       In minFunc, theta is reshaped to a long vector.  So we need to
  %       resize it to an n-by-(num_classes-1) matrix.
  %       Recall that we assume theta(:,num_classes) = 0.
  %
  %   X - The examples stored in a matrix.
  %       X(i,j) is the i'th coordinate of the j'th example.
  %   y - The label for each example.  y(j) is the j'th example's label.
  %
  m=size(X,2);
  n=size(X,1);

  % theta is a vector;  need to reshape to n x num_classes.
  % last column is all zeros
  theta=reshape(theta, n, []);
  num_classes=size(theta,2)+1;

  % initialize objective value and gradient.
  f = 0;
  g = zeros(size(theta));

  %
  % TODO:  Compute the softmax objective function and gradient using vectorized code.
  %        Store the objective function value in 'f', and the gradient in 'g'.
  %        Before returning g, make sure you form it back into a vector with g=g(:);
  %
  %%% YOUR CODE HERE %%%
  lambda = 1e-4;
  theta = [ theta zeros(n,1) ];            % append the zero column for the last class
  h = exp( theta' * X );                   % unnormalized class scores, num_classes x m
  p = bsxfun(@rdivide, h, sum(h,1));       % normalize each column to get class probabilities
  log_p = log( p );
  gt = full( sparse( y, 1:m, 1 ) );        % one-hot ground-truth matrix, num_classes x m

  % with weight decay term
  % f = -1/m * gt(:)' * log_p(:) + lambda/2 * sum( theta(:) .^ 2 );
  % g = -1/m * X * (gt - p)' + lambda * theta;

  % without weight decay term
  f = -1/m * gt(:)' * log_p(:);
  g = -1/m * X * (gt - p)';

  g = g(:, 1:end-1);   % drop the last column (it corresponds to the fixed zero parameters)
  g = g(:);            % make gradient a vector for minFunc
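Before running on MNIST, the implementation can be sanity-checked by comparing the analytic gradient against a finite-difference approximation on a tiny random problem. This is my own small sketch (it assumes the unregularized branch above and that softmax_regression_vec.m is on the path); it is not part of the starter code:

% Hypothetical gradient check.  Every class must appear in y so that the
% one-hot matrix built with sparse(y,1:m,1) has the full number of rows.
n = 5; m = 20; K = 4;
X = [ones(1,m); randn(n-1,m)];          % intercept row of ones plus random features
y = [1:K, randi(K, 1, m-K)];            % labels 1..K, each class present at least once
theta = 0.01 * randn(n*(K-1), 1);       % parameters for the first K-1 classes only

[f, g] = softmax_regression_vec(theta, X, y);

delta = 1e-6;
g_num = zeros(size(theta));
for i = 1:numel(theta)
  e = zeros(size(theta)); e(i) = delta;
  g_num(i) = (softmax_regression_vec(theta+e, X, y) - ...
              softmax_regression_vec(theta-e, X, y)) / (2*delta);
end
fprintf('max |g - g_num| = %g\n', max(abs(g - g_num)));   % should be tiny, on the order of 1e-8 or less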
References:
http://ufldl.stanford.edu/tutorial/StarterCode/
http://ufldl.stanford.edu/tutorial/supervised/SoftmaxRegression/
http://deeplearning.stanford.edu/wiki/index.php/Softmax_Regression
http://www.cnblogs.com/dupuleng/articles/4118178.html
http://www.cnblogs.com/dupuleng/articles/4118387.html
