# Machine Learning, Part 2: Classification Algorithms: Logistic Regression

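## Principles

### 1. Model hypothesis

#### Understanding the function g(z)

Here $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid (logistic) function, and the hypothesis is $h_\theta(x) = g(\theta^T x)$. This form follows from the maximum entropy principle: it is obtained by taking partial derivatives with the method of Lagrange multipliers (a technique for finding the extrema of a multivariable function subject to one or more constraints). The value of $h_\theta(x)$ is therefore the probability P that the model assigns to the sample belonging to class "1", that is: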

$h_\theta(x) = P(y=1|x)$
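To make the probability reading concrete, here is a minimal sketch that evaluates $h_\theta(x)$ for a single sample, assuming $g$ is the sigmoid $1/(1+e^{-z})$. The parameter vector and the sample are made-up values chosen only for illustration, and `sigmoid` is defined inline as an anonymous function.

```matlab
% Sigmoid g(z) = 1 / (1 + e^(-z)), applied element-wise
sigmoid = @(z) 1 ./ (1 + exp(-z));

theta = [-3; 1; 1];   % hypothetical parameters [theta_0; theta_1; theta_2]
x = [1; 2; 1];        % hypothetical sample; x_0 = 1 is the intercept term

z = theta' * x;       % linear score: -3 + 2 + 1 = 0
h = sigmoid(z);       % h_theta(x) = P(y = 1 | x)

fprintf('z = %g, P(y=1|x) = %g\n', z, h);   % a score of 0 gives probability 0.5
```

A score of exactly zero sits on the decision boundary; positive scores push the probability toward 1 and negative scores toward 0.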

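#### Advantages of the sigmoid function

Compared with a step function that jumps from 0 to 1 at a threshold $t_0$, the sigmoid avoids two problems:

(1) With a step function it is ambiguous whether the value at $t_0$ itself should be 0 or 1.

(2) A step function jumps at $t_0$, which makes differentiation awkward; the sigmoid is smooth and differentiable everywhere.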
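The differentiability point can be checked numerically. The sketch below compares a finite-difference slope of a step function and of the sigmoid at the threshold, taking $t_0 = 0$ as an assumed value; `step` is a hypothetical helper that arbitrarily returns 1 at the threshold.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
step    = @(z) double(z >= 0);   % hypothetical step function, threshold t_0 = 0

z0 = 0;
d  = 1e-6;

% Central finite differences around t_0
dstep = (step(z0 + d) - step(z0 - d)) / (2 * d);        % grows without bound as d shrinks
dsig  = (sigmoid(z0 + d) - sigmoid(z0 - d)) / (2 * d);  % approaches g'(0)

% Closed form for the sigmoid: g'(z) = g(z) * (1 - g(z))
analytic = sigmoid(z0) * (1 - sigmoid(z0));

fprintf('step slope: %g, sigmoid slope: %g, analytic: %g\n', dstep, dsig, analytic);
```

The sigmoid slope agrees with the closed form $g'(z) = g(z)(1 - g(z))$, which is what makes the gradient of the cost function easy to write down.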

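### 2. Cost function

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m}\left[-y^{(i)}\log(h_\theta(x^{(i)})) - (1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]$

The $j$-th component of the gradient (`grad` in the code) is:

$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$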
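The two formulas can be evaluated term by term with explicit loops. The following sketch does so on a tiny made-up dataset (the data and the all-zero parameter vector are assumptions for illustration), which keeps the result easy to verify by hand.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Hypothetical toy data: the first column is the intercept term x_0 = 1
X = [1 1; 1 2; 1 3; 1 4];
y = [0; 0; 1; 1];
theta = [0; 0];

m = size(X, 1);
J = 0;
grad = zeros(size(theta));
for i = 1:m
    h_i = sigmoid(X(i, :) * theta);                          % h_theta(x^(i))
    J = J + (-y(i) * log(h_i) - (1 - y(i)) * log(1 - h_i));  % i-th term of the sum
    for j = 1:numel(theta)
        grad(j) = grad(j) + (h_i - y(i)) * X(i, j);          % (h - y) * x_j
    end
end
J = J / m;
grad = grad / m;

fprintf('J = %.4f\n', J);                         % theta = 0 gives h = 0.5, so J = log(2)
fprintf('grad = [%g; %g]\n', grad(1), grad(2));   % [0; -0.5] for this data
```

With all parameters at zero every prediction is 0.5, so the cost is $\log 2 \approx 0.693$ regardless of the data.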

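## I. The simplest binary classification: first-order features, linear boundary

### 1. Cost function and partial derivatives

#### 1.1 Cost function implementation

function [J, grad] = costFunction(theta, X, y)
% COSTFUNCTION Logistic regression cost J and gradient grad.
%   X is m x (n+1) with a leading column of ones, y is m x 1 with 0/1
%   labels, theta is (n+1) x 1.
%   sigmoid is assumed to implement g(z) = 1 ./ (1 + exp(-z)).

m = length(y); % number of training examples

h = sigmoid(X*theta); % predicted P(y=1|x) for every example, m x 1

% Cross-entropy cost averaged over the m examples
J = ((-y' * log(h)) - (1-y)' * log(1-h))/m;

% Gradient: one entry per parameter
grad = (X' * (h-y)) / m;

end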
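One way to gain confidence in the gradient expression is to compare it with a finite-difference estimate of the cost. The sketch below is a minimal version of such a check; `costJ` and `gradJ` are hypothetical inline stand-ins for the two outputs of `costFunction`, and the data, parameters and step size are made-up values.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
% Hypothetical inline equivalents of the two outputs of costFunction
costJ = @(theta, X, y) (-y' * log(sigmoid(X*theta)) - (1-y)' * log(1 - sigmoid(X*theta))) / length(y);
gradJ = @(theta, X, y) (X' * (sigmoid(X*theta) - y)) / length(y);

X = [1 1; 1 2; 1 3; 1 4];   % toy data with intercept column
y = [0; 1; 0; 1];
theta = [0.1; -0.2];

d = 1e-5;                   % finite-difference step (assumed)
numgrad = zeros(size(theta));
for j = 1:numel(theta)
    e = zeros(size(theta));
    e(j) = d;               % perturb only the j-th parameter
    numgrad(j) = (costJ(theta + e, X, y) - costJ(theta - e, X, y)) / (2 * d);
end

max_diff = max(abs(numgrad - gradJ(theta, X, y)));
fprintf('largest difference between numerical and analytic gradient: %g\n', max_diff);
```

A difference many orders of magnitude below the gradient entries themselves suggests the analytic expression matches the cost.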


#### 1.2 Testing the cost function

% X (m x 2 feature matrix) and y (m x 1 vector of 0/1 labels) are assumed
% to be loaded already.
[m, n] = size(X);

% Add intercept term to X
X = [ones(m, 1) X];

% Initialize fitting parameters
initial_theta = zeros(n + 1, 1);

% Compute and display initial cost and gradient
[cost, grad] = costFunction(initial_theta, X, y);

fprintf('Cost at initial theta (zeros): %f\n', cost);
fprintf('Expected cost (approx): 0.693\n');
fprintf('Gradient at initial theta (zeros): \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n -0.1000\n -12.0092\n -11.2628\n');

% Compute and display cost and gradient with non-zero theta
test_theta = [-24; 0.2; 0.2];
[cost, grad] = costFunction(test_theta, X, y);

fprintf('\nCost at test theta: %f\n', cost);
fprintf('Expected cost (approx): 0.218\n');
fprintf('Gradient at test theta: \n');
fprintf(' %f \n', grad);
fprintf('Expected gradients (approx):\n 0.043\n 2.566\n 2.647\n');

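### 2. Minimizing the cost

Instead of a hand-written gradient descent loop, this step calls `fminunc`, which uses the cost and gradient returned by `costFunction` and chooses its own step sizes.

%  Set options for fminunc: 'GradObj' 'on' tells it that costFunction
%  also returns the gradient; 'MaxIter' caps the number of iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);

%  Run fminunc to obtain the optimal theta
%  This function will return theta and the cost
[theta, cost] = fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

% Print theta to screen
fprintf('Cost at theta found by fminunc: %f\n', cost);
fprintf('Expected cost (approx): 0.203\n');
fprintf('theta: \n');
fprintf(' %f \n', theta);
fprintf('Expected theta (approx):\n');
fprintf(' -25.161\n 0.206\n 0.201\n');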
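For comparison with the `fminunc` call, plain batch gradient descent can be sketched in a few lines. This is only an illustration under stated assumptions: the dataset is made up, the learning rate `alpha` and the iteration count are hand-picked, and `costJ` and `gradJ` are hypothetical inline stand-ins for `costFunction`.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));
costJ = @(theta, X, y) (-y' * log(sigmoid(X*theta)) - (1-y)' * log(1 - sigmoid(X*theta))) / length(y);
gradJ = @(theta, X, y) (X' * (sigmoid(X*theta) - y)) / length(y);

% Hypothetical 1-D data that is not linearly separable
X = [ones(6, 1), [0.5; 1.0; 1.5; 2.0; 2.5; 3.0]];
y = [0; 0; 1; 0; 1; 1];

alpha = 0.5;          % assumed learning rate
num_iters = 5000;     % assumed iteration count
theta = zeros(2, 1);
J_hist = zeros(num_iters, 1);
for k = 1:num_iters
    theta = theta - alpha * gradJ(theta, X, y);   % update all theta_j simultaneously
    J_hist(k) = costJ(theta, X, y);               % record the cost to monitor progress
end

fprintf('final cost: %f\n', J_hist(end));
fprintf('theta: [%f; %f]\n', theta(1), theta(2));
```

Unlike `fminunc`, this loop needs a learning rate that suits the scale of the features; with unscaled features such as raw exam scores a much smaller `alpha` or feature normalization would likely be needed.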


### 3. Prediction

% Admission probability for a student with exam scores 45 and 85
% (the leading 1 is the intercept term)
prob = sigmoid([1 45 85] * theta);
fprintf(['For a student with scores 45 and 85, we predict an admission ' ...
         'probability of %f\n'], prob);
fprintf('Expected value: 0.775 +/- 0.002\n\n');

% Compute accuracy on our training set
p = predict(theta, X);

fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);
fprintf('Expected accuracy (approx): 89.0\n');
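The linear boundary in this section is the set of points where $\theta^T x = 0$, that is, where the predicted probability is exactly 0.5. As a rough sketch, the code below solves that equation for the second score given the first, using the rounded "Expected theta" values printed above as an assumption (the exact optimizer output differs slightly).

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Rounded values taken from the "Expected theta" printout (approximate)
theta = [-25.161; 0.206; 0.201];

x1 = 45;                                       % first exam score
x2 = -(theta(1) + theta(2) * x1) / theta(3);   % second score on the line theta' * x = 0
p_boundary = sigmoid([1 x1 x2] * theta);       % 0.5 by construction

fprintf('at x1 = %g the boundary lies at x2 = %.2f (p = %.3f)\n', x1, x2, p_boundary);
```

A second score above this value puts the sample on the "admitted" side of the line, which is consistent with the probability above 0.5 reported for scores 45 and 85.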

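## II. Polynomial decision boundary

### 1. Data preprocessing: generating polynomial features

X = mapFeature(X(:,1), X(:,2));

#### The polynomial features are generated as follows:

function out = mapFeature(X1, X2)
% MAPFEATURE Maps two features to all polynomial terms up to degree 6:
%   1, X1, X2, X1.^2, X1.*X2, X2.^2, ..., X2.^6 (28 columns in total).
%   The leading column of ones is the intercept term.
degree = 6;
out = ones(size(X1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (X1.^(i-j)).*(X2.^j);
    end
end

end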
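To see what the double loop produces, the sketch below runs the same loop on a single made-up sample and inspects the column count and the first few terms. The values of `x1` and `x2` are assumptions chosen so the products are easy to read.

```matlab
x1 = 2;          % hypothetical first feature
x2 = 3;          % hypothetical second feature
degree = 6;

out = 1;         % intercept term
for i = 1:degree
    for j = 0:i
        % term of total degree i: x1^(i-j) * x2^j
        out(:, end+1) = (x1.^(i-j)) .* (x2.^j);
    end
end

fprintf('number of features: %d\n', size(out, 2));   % 1 + (2 + 3 + ... + 7) = 28
disp(out(1:6));   % 1, x1, x2, x1^2, x1*x2, x2^2  ->  1 2 3 4 6 9
```

Two input features therefore become 28 columns, which is what lets a model that is linear in its parameters draw a curved boundary in the original feature space.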

### 2. Training

% Initialize fitting parameters
initial_theta = zeros(size(X, 2), 1);

% Set options: costFunction supplies the gradient; at most 400 iterations
options = optimset('GradObj', 'on', 'MaxIter', 400);

% Optimize
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunction(t, X, y)), initial_theta, options);

### 3. Prediction

p = predict(theta, X);
fprintf('Train Accuracy: %f\n', mean(double(p == y)) * 100);

The `predict` function labels a sample 1 when the predicted probability is at least 0.5, and 0 otherwise:

function p = predict(theta, X)
% PREDICT Returns an m x 1 vector of 0/1 labels, one per row of X.
p = double(sigmoid(X*theta) >= 0.5);
end
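Because the sigmoid is increasing and equals 0.5 at zero, thresholding the probability at 0.5 is the same as checking the sign of `X*theta`. The sketch below illustrates this and the accuracy computation on made-up scores and labels.

```matlab
sigmoid = @(z) 1 ./ (1 + exp(-z));

z = [-2; -0.1; 0; 0.3; 4];   % hypothetical values of X*theta for five samples
y = [0; 1; 1; 1; 1];         % hypothetical true labels

p_prob  = double(sigmoid(z) >= 0.5);   % threshold on the probability
p_score = double(z >= 0);              % equivalent threshold on the raw score

accuracy = mean(double(p_prob == y)) * 100;   % percentage of matching labels

disp([p_prob p_score]);                % the two columns are identical
fprintf('Accuracy: %f\n', accuracy);   % 4 of 5 correct here
```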


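## III. Multi-class classification

X holds the pixel values of each image, so the number of features is n = 400.

y is the digit shown in each image, one of the ten digits 0 to 9. The code below indexes classes as 1 to `num_labels`, so it assumes the labels are stored as 1 to 10 (for example, with the digit 0 stored as label 10).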

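### 1. Training

[all_theta] = oneVsAll(X, y, num_labels);

function [all_theta] = oneVsAll(X, y, num_labels)
% ONEVSALL Trains one logistic regression classifier per class.
%   Row c of all_theta holds the parameters of the classifier for class c.
m = size(X, 1);
n = size(X, 2);
all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix
X = [ones(m, 1) X];

% Optimizer options (the iteration cap of 50 is an assumed value)
options = optimset('GradObj', 'on', 'MaxIter', 50);

initial_theta = zeros(n + 1, 1);
for c = 1:num_labels
    % (y == c) relabels the data as "class c" (1) versus "all other classes" (0).
    % fmincg is a separately supplied conjugate-gradient minimizer, not a built-in.
    theta_c = fmincg(@(t)(costFunction(t, X, double(y == c))), initial_theta, options);
    all_theta(c, :) = theta_c';
end
end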
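The key step in the loop is the expression `y == c`, which turns a multi-class label vector into a binary one for each classifier. A small sketch with made-up labels (three classes instead of ten, purely for readability) shows the binary targets each classifier is trained on.

```matlab
y = [1; 3; 2; 3; 1];   % hypothetical labels in 1..3
num_labels = 3;

Y = zeros(numel(y), num_labels);
for c = 1:num_labels
    Y(:, c) = double(y == c);   % column c: 1 where the sample belongs to class c
end

disp(Y);   % each row contains exactly one 1, in the column of its class
```

Column `c` of `Y` is the target vector passed to the cost function when training classifier `c`.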

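### 2. Prediction

pred = predictOneVsAll(all_theta, X);

fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);

function p = predictOneVsAll(all_theta, X)
% PREDICTONEVSALL Returns, for each row of X, the class (1..num_labels)
%   whose classifier outputs the highest probability.

m = size(X, 1);
X = [ones(m, 1) X];

% Probability of every class for every sample (m x num_labels), then take
% the column index of the largest value in each row
h = sigmoid(X * all_theta');
[~, p] = max(h, [], 2);
end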
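The row-wise `max` call is what turns per-class probabilities into a single predicted label. The sketch below applies it to a small made-up probability matrix; the numbers are assumptions for illustration only.

```matlab
% Hypothetical classifier outputs: 3 samples (rows) x 3 classes (columns)
h = [0.10 0.70 0.20;
     0.80 0.05 0.30;
     0.20 0.25 0.90];

% Third argument 2 means "take the maximum along each row";
% the second output is the column index of that maximum
[best, p] = max(h, [], 2);

disp([best p]);   % p is [2; 1; 3]
```

The rows need not sum to 1, since each column comes from an independently trained classifier; only the ordering within a row matters for the prediction.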

