Exercise : Logistic Regression 2

Preface:

The previous post solved Logistic Regression with the L-BFGS method; this post solves it with Newton's method, following the exercise at http://openclassroom.stanford.edu/MainFolder/DocumentPage.php?course=DeepLearning&doc=exercises/ex4/ex4.html .

Theory:

Hypothesis function:

　　 $h_{\theta}(x) = g(\theta^{T}x) = \frac{1}{1+e^{-\theta^{T}x}}$
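As a minimal sketch of the hypothesis, the snippet below applies the logistic (sigmoid) function to a linear score. The matrix `X` and the vector `theta` are made-up illustrative values, not data or results from the exercise.

```matlab
% Logistic (sigmoid) function, applied element-wise.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy inputs: each row is [1, x1, x2] (intercept term first).
X = [1 0 0; 1 2 -1; 1 -3 4];
theta = [0.5; 1; -1];      % illustrative parameters, not fitted values

h = sigmoid(X * theta);    % one predicted probability per row, each in (0, 1)
disp(h');
```

Large positive scores map close to 1 and large negative scores close to 0, which is what lets `h` be read as a probability.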

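Objective function:

　　 $J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[ y^{(i)}\log h_{\theta}(x^{(i)}) + (1-y^{(i)})\log\left(1-h_{\theta}(x^{(i)})\right) \right]$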

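Why choose cross entropy rather than squared error as the objective? With the sigmoid hypothesis, the cross-entropy objective is convex in $\theta$, whereas the squared-error objective is non-convex, so optimizing it can get stuck in a local optimum.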
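The convexity claim can be probed numerically. The sketch below assumes a single made-up training example ($x = 1$, $y = 0$), so that both losses are functions of a scalar $\theta$, and tests the midpoint inequality $f(\frac{a+b}{2}) \le \frac{f(a)+f(b)}{2}$ that every convex function must satisfy. The end points `a` and `b` are arbitrary choices.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Single hypothetical training example with x = 1 and y = 0,
% so both losses depend on a scalar theta only.
sq_err    = @(t) (sigmoid(t) - 0).^2;     % squared error
cross_ent = @(t) -log(1 - sigmoid(t));    % cross entropy for y = 0

a = 0; b = 10; mid = (a + b) / 2;

% A convex f must satisfy f(mid) <= (f(a) + f(b)) / 2.
sq_violates = sq_err(mid)    > (sq_err(a)    + sq_err(b))    / 2;
ce_violates = cross_ent(mid) > (cross_ent(a) + cross_ent(b)) / 2;

fprintf('squared error violates midpoint convexity: %d\n', sq_violates);
fprintf('cross entropy violates midpoint convexity: %d\n', ce_violates);
```

On this example the squared error fails the check while the cross entropy passes it. One failed check is enough to show non-convexity; one passed check is only consistent with convexity and does not prove it.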

Because $h_{\theta}(x) \in (0,1)$, we only need to look at the shape of the log function on that interval.

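Gradient:

　　 $\nabla_{\theta}J = \frac{1}{m}\sum_{i=1}^{m}\left(h_{\theta}(x^{(i)}) - y^{(i)}\right)x^{(i)}$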

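Hessian matrix:

　　 $H = \frac{1}{m}\sum_{i=1}^{m} h_{\theta}(x^{(i)})\left(1-h_{\theta}(x^{(i)})\right)x^{(i)}(x^{(i)})^{T}$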
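A quick way to gain confidence in the vectorized gradient `1/m * x' * (h - y)` and Hessian `1/m * x' * diag(h .* (1-h)) * x` used in the code of this post is to compare the gradient against central finite differences and to confirm that the Hessian is positive definite. The sketch below does both on made-up toy data; the data, the test point `theta`, and the step `eps_fd` are all assumptions chosen for illustration.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data: intercept column plus two features.
x = [1 0.5 1.2; 1 -1.0 0.3; 1 2.0 -0.7; 1 0.1 0.9];
y = [1; 0; 1; 0];
m = size(x, 1);

J    = @(t) -1/m * (y' * log(sigmoid(x*t)) + (1-y)' * log(1 - sigmoid(x*t)));
grad = @(t) 1/m * x' * (sigmoid(x*t) - y);

theta = [0.1; -0.2; 0.3];   % arbitrary test point
eps_fd = 1e-6;              % finite-difference step

% Central differences, one coordinate at a time.
num_grad = zeros(size(theta));
for k = 1:numel(theta)
    e = zeros(size(theta));
    e(k) = eps_fd;
    num_grad(k) = (J(theta + e) - J(theta - e)) / (2 * eps_fd);
end
max_diff = max(abs(num_grad - grad(theta)));

% Hessian at the same point; positive eigenvalues mean J is locally convex.
h = sigmoid(x * theta);
H = 1/m * x' * diag(h .* (1 - h)) * x;
H = (H + H') / 2;           % remove round-off asymmetry before calling eig
min_eig = min(eig(H));

fprintf('max |numeric - analytic| = %g, min eig(H) = %g\n', max_diff, min_eig);
```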

Newton's method update:

　　 $\theta^{(t+1)} = \theta^{(t)} - H^{-1}\nabla_{\theta}J$
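The Newton update can be sketched in a few lines. The toy data below is made up and deliberately not linearly separable, so that a finite optimum exists; the fixed count of 10 iterations is an arbitrary choice for the sketch rather than a convergence test.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical, non-separable toy data (intercept column first).
x = [1 0.5; 1 1.0; 1 1.5; 1 2.0; 1 2.5; 1 3.0];
y = [0; 0; 1; 0; 1; 1];
m = size(x, 1);

theta = zeros(size(x, 2), 1);
for it = 1:10
    h = sigmoid(x * theta);
    grad = 1/m * x' * (h - y);
    H = 1/m * x' * diag(h .* (1 - h)) * x;
    theta = theta - H \ grad;    % solve H * step = grad instead of forming inv(H)
end

% At the optimum the gradient should vanish.
grad_norm = norm(1/m * x' * (sigmoid(x * theta) - y));
fprintf('theta = [%g, %g], |grad| = %g\n', theta(1), theta(2), grad_norm);
```

Using `H \ grad` rather than `inv(H) * grad` computes the same step but avoids forming the inverse explicitly, which is generally cheaper and numerically safer.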

Gradient descent update:

　　 $\theta^{(t+1)} = \theta^{(t)} - \alpha \nabla_{\theta}J$
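For comparison, here is a minimal gradient descent sketch on made-up toy data. The learning rate `alpha = 0.5` and the count of 200 iterations are hand-picked assumptions for this data set, not recommended values.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical, non-separable toy data (intercept column first).
x = [1 0.5; 1 1.0; 1 1.5; 1 2.0; 1 2.5; 1 3.0];
y = [0; 0; 1; 0; 1; 1];
m = size(x, 1);

alpha = 0.5;                 % learning rate (assumed, tuned by hand)
n_iter = 200;
theta = zeros(size(x, 2), 1);
J_hist = zeros(1, n_iter);
for it = 1:n_iter
    h = sigmoid(x * theta);
    % Objective at the current theta, recorded before the update.
    J_hist(it) = -1/m * (y' * log(h) + (1 - y)' * log(1 - h));
    theta = theta - alpha * (1/m * x' * (h - y));
end

fprintf('J at start: %g, J after %d updates: %g\n', ...
        J_hist(1), n_iter - 1, J_hist(end));
```

Each update needs only the gradient, but many more updates are required than with Newton's method, and a learning rate that is too large can make the objective increase.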

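Newton's method:

　　Pros: converges fast; no extra parameters to tune

　　Cons: every iteration has to form the Hessian and solve a linear system with it, which is expensive: roughly $O(n^3)$ in the number of features $n$ (per training example, forming $H$ costs $O(n^2)$ and the gradient costs $O(n)$)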

Gradient descent:

　　Pros: each iteration is cheap, $O(n)$ per training example

　　Cons: converges slowly; needs an extra parameter, the learning rate

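So how do we choose between the two methods?

When the data dimension is small, the Hessian is cheap to compute, so choose Newton's method; when the dimension is large, computing the Hessian becomes very time-consuming, so choose gradient descent.

As a rule of thumb, fewer than 1000 dimensions counts as small and more than 10000 counts as large.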

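Once the optimal $\theta$ has been found, how do we plot the decision boundary? The boundary is $\theta^{T}x = 0$. In this experiment $x$ has dimension 3 (the intercept term plus two features), i.e.

$\theta(1) + \theta(2)\,x(1) + \theta(3)\,x(2) = 0$

$x(2) = \frac{-1}{\theta(3)} \left( \theta(1) + \theta(2)\,x(1) \right)$

Two points determine a line, so it is enough to take the minimum and maximum of x(1), compute the corresponding x(2) for each, and draw the line through the two points.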
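The two-point construction can be sketched as follows. The values of `theta` and the feature range `x1` are made up for illustration and are not the fitted result of the exercise.

```matlab
% Hypothetical parameters and feature range, for illustration only.
theta = [-16; 0.15; 0.16];
x1 = [20, 60];               % assumed min and max of the first feature

% Solve theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2.
x2 = (-1 / theta(3)) .* (theta(1) + theta(2) .* x1);

% Both end points should satisfy the boundary equation up to round-off.
residual = theta(1) + theta(2) .* x1 + theta(3) .* x2;

disp([x1; x2]);              % first row: x1 values, second row: x2 values
```

Note that the division is by `theta(3)`, the coefficient of the feature plotted on the vertical axis; the construction breaks down if that coefficient is zero, in which case the boundary is a vertical line.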

Results:

Convergence takes only 6 iterations, which shows how fast Newton's method converges.

Objective function versus number of iterations

Code:

data = load('ex4x.dat');
label = load('ex4y.dat');
data = [ones( size(data,1) , 1 ) data ]; % add intercept term
positiveIndex = find( label == 1 );
negativeIndex = find( label == 0 );

positive = data( positiveIndex , : );
negative = data( negativeIndex , : );

positiveLabel = label( positiveIndex , : );
negativeLabel = label( negativeIndex , : );
% plot original data
figure;
plot( positive(: , 2) , positive(: , 3 ) , 'r+'); hold on;
plot( negative(: , 2) , negative(: , 3 ) , 'g*');

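% Newton's Method
sigmoid = @(z) 1 ./ ( 1 + exp(-z) ); % logistic function, defined here so the script is self-contained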
MAX_ITER = 20;
m = size( data , 1 );
theta = zeros( size(data , 2) , 1 );
grad = zeros( size( theta ) );
J = inf;
x = data;
y = label;
for i = 1 : MAX_ITER
    h = sigmoid( data * theta );
    J_new =  -1/m * ( y' * log( h ) + (1-y)' * log( 1-h ) );
    if abs( J_new - J(1,end) ) < 1e-14 
        break;
    else
        J = [ J J_new ];
    end

    grad = 1/m *  x' * ( h - y );
    H = 1/m * x' * diag( h ) * diag( 1-h ) * x;
    theta = theta - H \ grad;
end



if i < MAX_ITER
    fprintf( '|J - J_new| < 1e-14\nIteration times = %d\n' , i-1 );
else
    fprintf( 'Exceeded Maximum Number of Iterations\n' );
end

% plot decision boundary
plot_x = [min( x(:,2) ) - 4   , max( x(: , 2) ) + 4 ];
% solve theta(1) + theta(2)*x1 + theta(3)*x2 = 0 for x2
plot_y = (-1/theta(3) ) .* ( theta(2) .* plot_x  + theta(1) );
plot( plot_x , plot_y );
legend('Admitted' , 'Not admitted','Decision Boundary' );
hold off;
% plot J against the number of Newton updates applied
% J(1) is the initial inf placeholder, so it is skipped
figure;
plot( 0:numel(J)-2 , J(2:end) , 'o--' , 'MarkerFaceColor' , 'r' , 'MarkerSize' , 8 );
xlabel('#iteration');
ylabel('J( theta )');

posted @ 2014-12-19 18:15 dupuleng
