100 Days of Machine Learning - Days 4, 5, 6, 8: Logistic Regression


 

1. Importing the data

 

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Raw string so the backslashes in the Windows path are taken literally
dataset = pd.read_csv(r'D:\100Days\datasets\Social_Network_Ads.csv')
print(dataset.head(5))

 

    User ID  Gender  Age  EstimatedSalary  Purchased
0  15624510    Male   19            19000          0
1  15810944    Male   35            20000          0
2  15668575  Female   26            43000          0
3  15603246  Female   27            57000          0
4  15804002    Male   19            76000          0

 

 

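Convert the categorical variable into dummy variables

# dtype=int keeps the dummy columns as 0/1; newer pandas versions default to bool
dataset = pd.get_dummies(dataset, columns=['Gender'], dtype=int)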
print(dataset.head())
    User ID  Age  EstimatedSalary  Purchased  Gender_Female  Gender_Male
0  15624510   19            19000          0              0            1
1  15810944   35            20000          0              0            1
2  15668575   26            43000          0              1            0

 

 

Check for NaN values

print(dataset.isnull().sum())
User ID            0
Age                0
EstimatedSalary    0
Purchased          0
Gender_Female      0
Gender_Male        0
dtype: int64

 

 

Splitting the dataset

# Split the dataset
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X = dataset[['Age','EstimatedSalary','Gender_Female','Gender_Male']]
ss = StandardScaler()
X = ss.fit_transform(X)
Y = dataset['Purchased']
X_train,X_test,Y_train,Y_test = train_test_split(X,Y,test_size=0.25,random_state=0)

 

The features in X are standardized with StandardScaler (zero mean, unit variance per column) before the split.
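The code above fits StandardScaler on all of X before splitting, so the test rows contribute to the mean and standard deviation used for scaling. A common alternative is to split first and compute the scaling statistics from the training rows only. The sketch below shows the idea in plain Python, using the five rows printed earlier as toy data. The helper names fit_scaler and transform are hypothetical; with scikit-learn the equivalent would be ss.fit_transform(X_train) followed by ss.transform(X_test).

```python
from statistics import fmean, pstdev

def fit_scaler(rows):
    """Return per-column (mean, std) computed from the given rows only."""
    columns = list(zip(*rows))
    # Population standard deviation (ddof=0), which is what StandardScaler uses.
    return [(fmean(col), pstdev(col)) for col in columns]

def transform(rows, stats):
    """Standardize each column with previously fitted statistics."""
    return [
        [(value - mean) / std if std > 0 else 0.0
         for value, (mean, std) in zip(row, stats)]
        for row in rows
    ]

# (Age, EstimatedSalary) from the five rows printed earlier, used as toy data:
# four rows for training, one for testing.
train_rows = [[19, 19000], [35, 20000], [26, 43000], [27, 57000]]
test_rows = [[19, 76000]]

stats = fit_scaler(train_rows)              # statistics come from training data only
train_scaled = transform(train_rows, stats)
test_scaled = transform(test_rows, stats)   # test data reuses the training statistics

print(train_scaled)
print(test_scaled)
```

Fitting on the training rows alone means the reported test accuracy is not influenced by information from the test set, although the numbers may then differ slightly from the results shown in this post.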

 

2. Logistic regression model

 

from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(X_train,Y_train)
y_pred = logistic.predict(X_test)
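For a binary problem like this one, a fitted logistic regression scores a sample with a linear combination of its features, passes the score through the sigmoid function to get a probability, and predicts class 1 when that probability exceeds 0.5. A minimal sketch of that prediction step follows. The weights and bias are invented for illustration and are not the coefficients learned above.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative scores, use an equivalent form that avoids overflow in exp.
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(x, weights, bias):
    """Probability of class 1 for one standardized feature vector."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(score)

def predict(x, weights, bias, threshold=0.5):
    """Predict 1 when the class-1 probability exceeds the threshold."""
    return 1 if predict_proba(x, weights, bias) > threshold else 0

# Hypothetical weights for (Age, EstimatedSalary), not taken from the fitted model.
weights = [2.0, 1.0]
bias = -1.0

older_high_salary = [1.5, 1.0]      # standardized values, above average
younger_low_salary = [-1.0, -0.5]   # standardized values, below average

print(predict(older_high_salary, weights, bias))    # 1
print(predict(younger_low_salary, weights, bias))   # 0
```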

 

 

3. Evaluating the predictions

Generate the confusion matrix

from sklearn import metrics
cm = metrics.confusion_matrix(Y_test,y_pred)
print(cm)
print(metrics.accuracy_score(Y_test,y_pred))

 

 

[[65  3]
 [ 6 26]]
0.91

 

 

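A confusion matrix is a common tool in machine learning, especially in statistical classification, for judging how well a classifier performs. With 1 (purchased) as the positive class, its four entries are: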

TP (True Positive): actual 1, predicted 1

FN (False Negative): actual 1, predicted 0

FP (False Positive): actual 0, predicted 1

TN (True Negative): actual 0, predicted 0

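Matrix (as laid out by sklearn's confusion_matrix: rows are actual classes, columns are predicted classes):

              Predicted 0   Predicted 1
    Actual 0      TN            FP
    Actual 1      FN            TP

Overall accuracy:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)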

 

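This explains the confusion matrix and accuracy in the example: 65 true negatives, 3 false positives, 6 false negatives and 26 true positives, so the accuracy is (65 + 26) / 100 = 0.91.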
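To make the four counts concrete, the sketch below tallies them from two label lists in plain Python, treating 1 (purchased) as the positive class, and then recomputes the accuracy from the matrix printed above. The function name confusion_counts and the short label lists are illustrative assumptions; metrics.confusion_matrix performs the same tally and arranges it as [[TN, FP], [FN, TP]].

```python
def confusion_counts(y_true, y_pred):
    """Count TN, FP, FN, TP for binary labels, with 1 as the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp

def accuracy(tn, fp, fn, tp):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tn + fp + fn + tp)

# Toy labels, for illustration only.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))    # (2, 1, 1, 2)

# Counts taken from the matrix printed above: [[65, 3], [6, 26]].
print(accuracy(tn=65, fp=3, fn=6, tp=26))  # 0.91
```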

 

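4. Logistic regression in depth (Day 8)

Recommended reading

A translated article: https://blog.csdn.net/Neuf_Soleil/article/details/81712097 (the page includes a link to the original article).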

 

 

posted @ 2019-01-16 11:30 forthlss