机器学习——逻辑回归（分类）

前言：真的是改了很多次！细节真的很多！

机器学习专栏：

文章目录

逻辑回归（分类）

1、基本原理
4、梯度下降法
4、sklearn实现逻辑回归
5、多分类问题

5.1多分类原理
5.2sklearn实现多分类

逻辑回归（分类）

1、基本原理

逻辑回归用于分类，是对样本属于某一类的概率进行预测，对数几率函数：
$g(z)=\frac{1}{1+e^{-z}}$

给定数据集 $D={(x^{(1)},y^{(1)});(x^{(2)},y^{(2)});...;(x^{(m)},y^{(i)} )}$ ,其中 $x^{(i)}$ 表示第 $i$ 个样本点 $x^{(i)}\in{R^n}$ (表示有n个属性值)。
考虑到 $y=\theta_0+\theta_1 x^{(i)}_1+...+\theta_nx^{(i)}_n$ 取值是连续的，因此它不能拟合离散变量。可以考虑用它来拟合条件概率，因为概率的取值也是连续的。但是其取值为 R ，不符合概率取值为 0 到 1，因此考虑采用广义线性模型。
对于一个简单的二分类问题，我们用logistics函数来代替理想的阶跃函数来作为连接函数：
$h_\theta(x^{(i)})=\frac{1}{1+e^{-\theta^Tx^{(i)}}}$
令 $z=\theta^Tx^{(i)}$

于是有：
$ln\frac{h_\theta(x^{(i)})}{1-h_\theta(x^{(i)})}=\theta^T x^{(i)}$
事件发生与不发生的概率比值称为几率（odds）, $h_\theta(x^{(i)})$ 表示发生的概率，即：
$\left\{\begin{matrix} P(y=1|x^{(i)},\theta)=h_\theta(x^{(i)})\\ P(y=0|x^{(i)},\theta)=1-h_\theta(x^{(i)}) \end{matrix}\right.$
综合两式可得：
$P(y|x^{(i)};\theta)=(h_\theta(x^{(i)}))^y(1-h_\theta(x^{(i)}))^{1-y}$
因此逻辑回归的思路是，先拟合决策边界(不局限于线性，还可以是多项式，这个过程可以理解为感知机)，再建立这个边界与分类的概率联系（通过对数几率函数），从而得到了二分类情况下的概率。

关于对数似然估计的概念我这里就不作过多介绍了，可参考浙江大学的《概率论与数理统计》，我们由“最大似然估计法”去得出代价函数，我们要求每个样本属于其真实标记的概率越大越好，所以：
$max\quad L(\theta)=\prod_{i=1}^{m}P(y^{(i)}|x^{(i)},\theta)$
取“对数似然”得：
$max\quad logL(\theta)=\sum_{i=1}^{m}logP(y^{(i)}|x^{(i)},\theta)$
由上，我们将代价函数定为：
$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}C(h_\theta(x^{(i)}),y^{(i)})=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)}))]$
一次性计算出所有样本的预测值（是个概率值）:
$h=g(X\theta)$
其中， $X=\begin{bmatrix} x_0^1 & x_1^1 &... &x_n^1 \\ x_0^2 & x_1^1 &... &x_n^1 \\ : & : &... &:\\ x_0^m & x_1^m &... &x_n^m \end{bmatrix}$ 表示训练集， $\theta=\begin{bmatrix} \theta_0\\ \theta_1\\ :\\ \theta_n\end{bmatrix}$
将代价函数写成矩阵形式：
$J(\theta)=-\frac{1}{m}(Y^Tlog(h)-(1-Y)^Tlog(1-h))$
其中， $Y=\begin{bmatrix} y^{(1)}\\ y^{(2)}\\ :\\ y^{(m)}\end{bmatrix}$ 表示由所有训练样本输出构成的向量， $h=\begin{bmatrix} h(1)\\ h(2)\\ :\\ h(m) \end{bmatrix}$ 表示计算得出所有样本的预测值（是个概率值）

4、梯度下降法

梯度下降公式：
$\theta_j:=\theta_j-\frac{\alpha}{m}\frac{\partial}{\partial\theta_j}J(\theta) \\ \theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$
【logistics回归梯度下降公式的简单推导】
$\theta_j:=\theta_j-\frac{\alpha}{m}\frac{\partial}{\partial\theta_j}J(\theta) \\J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(h_\theta(x^{(i)}))+(1-y^{(i)})log(1-h_\theta(x^{(i)})) \\ J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}log(g(\theta^Tx^{(i)}))+(1-y^{(i)})log(1-g(\theta^Tx^{(i)})) \\ \frac{\partial J(\theta)}{\partial \theta_j}=-\frac{1}{m}\sum_{i=1}^{m}[\frac{y^{(i)}}{g(\theta^Tx^{(i)})}-(1-y^{(i)})\frac{1}{1-g(\theta^Tx^{(i)})}]\frac{\partial g(\theta^Tx^{(i)})}{\partial \theta_j} \\ 先求：\frac{\partial g(\theta^Tx^{(i)})}{\partial \theta_j}=\frac{\frac{\partial (1+e^{-\theta^Tx^{(i)}})}{\partial \theta_j}}{(1+e^{-\theta^Tx^{(i)}})^2}=-\frac{e^{-\theta^Tx^{(i)}}x_j^{(i)}}{(1+e^{-\theta^Tx^{(i)}})^2} \\即：\frac{\partial g(\theta^Tx^{(i)})}{\partial \theta_j}=h_\theta(x^{(i)})(1-h_\theta(x^{(i)}))x_j^{(i)} \\ 代入\frac{\partial J(\theta)}{\partial \theta_j}中，得：\theta_j:=\theta_j-\frac{\alpha}{m}\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})x_j^{(i)}$

4、sklearn实现逻辑回归

# -*- coding: utf-8 -*-
"""
Created on Tue Nov 12 19:28:12 2019

@author: 1
"""

from sklearn.model_selection import train_test_split
#导入logistics回归模型
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd


df=pd.read_csv('D:\\workspace\\python\machine learning\\data\\breast_cancer.csv',sep=',',header=None,skiprows=1)
X = df.iloc[:,0:29]
y = df.iloc[:,30]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)
train_score = model.score(X_train, y_train)#R2值越接近1越好
cv_score = model.score(X_test, y_test)
print('train_score:{0:.6f}, cv_score:{1:.6f}'.format(train_score, cv_score))

y_pre = model.predict(X_test)
y_pre_proba = model.predict_proba(X_test)#输出概率

print('matchs:{0}/{1}'.format(np.equal(y_pre, y_test).shape[0], y_test.shape[0]))#shape[0]列，shape[1]行
#print('y_pre:{}, \ny_pre_proba:{}'.format(y_pre, y_pre_proba))#输出概率预测值

5、多分类问题

5.1多分类原理

为了实现多分类，我们将多个类(D)中的一个类标记为正向类(y=1），然后将其他所有类都标记为负向类，这个模型记作 $h_\theta^{(1)}(X)$ 。接着，类似地第我们选择另一个类标记为正向类(y=2)，再将其它类都标记为负向类，将这个模型记作 $h_\theta^{(2)}(X)$ 依此类推。最后我们得到一系列的模型简记为：
$h_\theta^{(k)}(X)=P(y=k|X,\theta)$ 其中 $k=1,2,...,D$
最后，在做预测时，对每一个输入的测试变量，我们将所有的分类机都运行一遍，选择可能性最高的分类机的输出结果作为分类结果：
$max\;h_\theta^{(k)}(x^{(i)})$

5.2sklearn实现多分类

# -*- coding: utf-8 -*-
"""
Created on Tue Nov 12 22:07:34 2019

@author: 1
"""

from sklearn.model_selection import train_test_split
#导入logistics回归模型
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score#预测准确率
import pandas as pd
import matplotlib.pyplot as plt


df=pd.read_csv('D:\\workspace\\python\machine learning\\data\\iris.csv',sep=',')
X = df.iloc[:,0:1]
y = df.iloc[:,4]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = LogisticRegression(solver='liblinear')
model.fit(X_train, y_train)
y_pre=model.predict(X_test)
print('accuracy_score：{}'.format(accuracy_score(y_test,y_pre)))#预测准确率
y_pre_proba = model.predict_proba(X_test)
print('y_pre:{}, \ny_pre_proba:{}'.format(y_pre, y_pre_proba))#输出概率预测值

#画原始数据图
colors = ['blue', 'red','green']
plt.figure(1)
for i in range(3):
    plt.scatter(df.loc[df['virginica']==i].iloc[:,0],df.loc[df['virginica']==i].iloc[:,1],c=colors[i])
plt.title('原始数据分类结果')

#画分类结果图
colors = ['blue', 'red','green']
plt.figure(2)
df['virginica_pre']=model.predict(X)
for i in range(3):
    plt.scatter(df.loc[df['virginica_pre']==i].iloc[:,0],df.loc[df['virginica_pre']==i].iloc[:,1],c=colors[i])
plt.title('预测数据分类结果')

结果可视化：

给大家推荐一个博客：一文详尽讲解什么是逻辑回归

posted @ 2019-11-17 14:19 Tao_RY 阅读(206) 评论(0) 收藏举报

刷新页面返回顶部

Tao_RY的博客