Python for Data Science - Naive Bayes Classifiers

Chapter 6 - Other Popular Machine Learning Methods

Segment 5 - Naive Bayes Classifiers

Naive Bayes Classifiers

Naive Bayes is a machine learning method you can use to predict the likelihood that an event will occur given evidence that's present in your data.

Conditional Probability

\[P(B \mid A) = \frac{P(A \text{ and } B)}{P(A)}\]
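As a small worked example of the formula, the sketch below computes P(B|A) from raw counts. The counts and the choice of events (A: an email contains the word "free", B: the email is spam) are invented purely for illustration and do not come from the Spambase dataset.

```python
# Hypothetical counts, invented for illustration only.
total_emails = 200
emails_with_word = 50   # event A: the email contains the word "free"
spam_with_word = 40     # event A and B: contains "free" and is spam

p_a = emails_with_word / total_emails        # P(A)
p_a_and_b = spam_with_word / total_emails    # P(A and B)
p_b_given_a = p_a_and_b / p_a                # P(B|A) = P(A and B) / P(A)

print(round(p_b_given_a, 3))
```

With these made-up numbers, 40 of the 50 emails containing the word are spam, so the conditional probability is 40/50.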

Three Types of Naive Bayes Models

  • Multinomial
  • Bernoulli
  • Gaussian

Naive Bayes Use Cases

  • Spam Detection
  • Customer Classification
  • Credit Risk Prediction
  • Health Risk Prediction

Naive Bayes Assumptions

Predictors are independent of each other.
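
One way to see what this assumption buys: if the predictors are treated as independent given the class, the probability of a class factors into one term per predictor. The symbols here are introduced only for illustration, with y the class (for example spam or not spam) and x_1, ..., x_n the predictors:

\[P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} P(x_i \mid y)\]

Each factor P(x_i | y) can be estimated separately from the training data, which is what keeps the method simple and fast.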

A priori assumption: the assumption that past conditions still hold true. When we make predictions from historical values, we will get incorrect results if present circumstances have changed.

  • All regression models rely on an a priori assumption as well.

import numpy as np
import pandas as pd
import urllib
import sklearn

from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

Naive Bayes

Using Naive Bayes to predict spam

url = "https://archive.ics.uci.edu/ml/machine-learning-databases/spambase/spambase.data"

import urllib.request

raw_data = urllib.request.urlopen(url)
dataset = np.loadtxt(raw_data, delimiter=',')
print(dataset[0])
[  0.      0.64    0.64    0.      0.32    0.      0.      0.      0.
   0.      0.      0.64    0.      0.      0.      0.32    0.      1.29
   1.93    0.      0.96    0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.      0.      0.
   0.      0.      0.      0.      0.      0.      0.778   0.      0.
   3.756  61.    278.      1.   ]
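# Use the first 48 columns (the word-frequency features) as predictors.
X = dataset[:,0:48]

# The last column is the class label (1 = spam, 0 = not spam).
y = dataset[:,-1]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=17)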
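# binarize is a numeric threshold: feature values above it map to 1, all others to 0.
# Passing True here acts as a threshold of 1.0.
BernNB = BernoulliNB(binarize=True)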
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)

print(accuracy_score(y_expect, y_pred))
BernoulliNB(binarize=True)
0.8577633007600435
MultiNB = MultinomialNB()
MultiNB.fit(X_train, y_train)
print(MultiNB)

y_pred = MultiNB.predict(X_test)

print(accuracy_score(y_expect, y_pred))
MultinomialNB()
0.8816503800217155
GausNB = GaussianNB()
GausNB.fit(X_train, y_train)
print(GausNB)

y_pred = GausNB.predict(X_test)

print(accuracy_score(y_expect, y_pred))
GaussianNB()
0.8197611292073833
BernNB = BernoulliNB(binarize=0.1)
BernNB.fit(X_train, y_train)
print(BernNB)

y_expect = y_test
y_pred = BernNB.predict(X_test)

print(accuracy_score(y_expect, y_pred))
BernoulliNB(binarize=0.1)
0.9109663409337676
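
The only difference between the two BernoulliNB runs above is the binarize threshold, so the change in accuracy comes from how the word frequencies are turned into 0/1 features. The sketch below mimics that thresholding with a plain NumPy comparison. The array X_demo is hypothetical data made up for illustration, and the strict "greater than" comparison is an assumption about how the threshold is applied.

```python
import numpy as np

# Hypothetical word-frequency rows, not taken from the dataset.
X_demo = np.array([[0.0, 0.64, 1.29],
                   [0.32, 0.05, 1.93]])

# Values strictly above the threshold become 1, everything else becomes 0.
at_1_0 = (X_demo > 1.0).astype(int)   # threshold used by binarize=True
at_0_1 = (X_demo > 0.1).astype(int)   # threshold used by binarize=0.1

print(at_1_0)
print(at_0_1)
```

With a threshold of 1.0, only unusually frequent words register as present, while 0.1 keeps most non-zero frequencies. That may be why the lower threshold scores better here.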
posted @ 2021-01-30 16:11  晨风_Eric