Getting Started Series: Building a Machine Learning Classifier in Python with Scikit-learn

Prerequisites

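• A local Python 3 programming environment
• Jupyter Notebook installed in a virtualenv. Jupyter Notebooks are very useful for running machine learning experiments: you can run short blocks of code and see the results right away, which makes testing and debugging easy.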

Step 1 - Importing Scikit-learn

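First, activate the Python 3 programming environment:

$ . my_env/bin/activate

With the environment activated, check whether the Scikit-learn module is already installed:

(my_env) $ python -c "import sklearn"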


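If sklearn is installed, the command finishes without printing anything. If it is not installed, the command fails with an error like this:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: No module named 'sklearn'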


If you see this error, install the library with pip:

(my_env) $ pip install scikit-learn[alldeps]

Once the installation completes, launch Jupyter Notebook:

(my_env) $ jupyter notebook


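In Jupyter, create a new Python notebook (named ML Tutorial here) and import the sklearn module in its first cell:

ML Tutorial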

import sklearn
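To confirm that the notebook picked up the library from the virtualenv, it can help to print the installed version. This is a minimal sketch using the standard library's importlib. The check and the variable names are additions for illustration, not part of the tutorial's steps.

```python
import importlib.util

# find_spec returns None when the top-level module cannot be located
spec = importlib.util.find_spec("sklearn")
sklearn_installed = spec is not None

if sklearn_installed:
    import sklearn
    print("scikit-learn version:", sklearn.__version__)
else:
    print("scikit-learn is not installed in this environment")
```

Unlike a bare import, this check does not raise an exception when the module is missing, so it is safe to leave in a notebook cell.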



Step 2 - Importing Scikit-learn's Dataset

Scikit-learn ships with several datasets that we can load into Python, including the one we want to use. Import and load the dataset:

ML Tutorial

...
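The loading code itself is not shown above. Here is a minimal sketch, assuming the dataset is the Breast Cancer Wisconsin dataset that the tutorial's complete script imports through load_breast_cancer:

```python
from sklearn.datasets import load_breast_cancer

# Load the dataset bundled with scikit-learn (no download needed)
data = load_breast_cancer()

# The returned object behaves like a dictionary; list its keys
print(sorted(data.keys()))
```

The data variable defined here is the one the later cells index with keys such as 'target_names' and 'data'.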



ML Tutorial

...

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']


ML Tutorial

...

# Look at our data
print(label_names)
print(labels[0])
print(feature_names[0])
print(features[0])
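Printing a single row shows what one sample looks like, but not how large the dataset is or how the classes are balanced. The sketch below adds that overview, assuming the Breast Cancer Wisconsin dataset loaded through load_breast_cancer. The counts variable is introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
label_names = data['target_names']
labels = data['target']
features = data['data']

# One row per sample, one column per measured attribute
print(features.shape)  # (569, 30)

# Count how many samples carry each class label (0 and 1)
counts = np.bincount(labels)
for name, count in zip(label_names, counts):
    print(name, count)
```

Checking the class balance matters because accuracy, the metric used later, can look deceptively good when one class dominates.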


Step 3 - Organizing Data into Sets

ML Tutorial

...

from sklearn.model_selection import train_test_split

# Split our data
train, test, train_labels, test_labels = train_test_split(features,
                                                          labels,
                                                          test_size=0.33,
                                                          random_state=42)
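To see what the split produces, the sizes of the two sets can be printed. This is a minimal sketch, assuming the Breast Cancer Wisconsin dataset from load_breast_cancer. Here test_size=0.33 reserves roughly a third of the samples for testing, and random_state=42 fixes the shuffle so the split is reproducible.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
features, labels = data['data'], data['target']

# Hold out about one third of the samples for evaluation
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.33, random_state=42)

# Rows in each set; the two sizes add up to the full dataset
print(len(train), len(test))

# Features and labels stay aligned within each set
print(len(train) == len(train_labels), len(test) == len(test_labels))
```

The test set must stay untouched during training so that the accuracy measured on it reflects how the model handles data it has not seen.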


Step 4 - Building and Evaluating the Model

ML Tutorial

...

from sklearn.naive_bayes import GaussianNB

# Initialize our classifier
gnb = GaussianNB()

# Train our classifier
model = gnb.fit(train, train_labels)
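The training cell assigns the result of fit to model, while the later cells call predict on gnb. Both names refer to the same object, because scikit-learn estimators return themselves from fit. The sketch below illustrates this, again assuming the dataset from load_breast_cancer, and peeks at two attributes that GaussianNB learns during training.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)

gnb = GaussianNB()
model = gnb.fit(train, train_labels)

# fit returns the estimator itself, so both names are interchangeable
print(model is gnb)  # True

# Class labels seen during training
print(gnb.classes_)

# Per-class feature means learned during fit: (n_classes, n_features)
print(gnb.theta_.shape)  # (2, 30)
```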


ML Tutorial

...

# Make predictions
preds = gnb.predict(test)
print(preds)
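The printed predictions are an array of 0s and 1s. To make them easier to read, they can be mapped back to class names, and predict_proba shows how confident the classifier is in each one. This is a minimal sketch, assuming the dataset from load_breast_cancer. The names pred_names and probs are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
label_names = data['target_names']
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)

gnb = GaussianNB()
gnb.fit(train, train_labels)
preds = gnb.predict(test)

# Index the array of class names with the integer predictions
pred_names = label_names[preds]
print(pred_names[:5])

# Estimated probability of each class, one row per test sample
probs = gnb.predict_proba(test)
print(np.round(probs[:5], 3))
```

Each row of probs sums to 1, and predict returns the class whose column holds the larger value.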


Step 5 - Evaluating the Model's Accuracy

ML Tutorial

...

from sklearn.metrics import accuracy_score

# Evaluate accuracy
print(accuracy_score(test_labels, preds))
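Accuracy is the fraction of test samples whose predicted label matches the true label. The sketch below computes it by hand to show what accuracy_score does. It also adds a confusion matrix, which is not part of the tutorial's steps, to show which class the errors fall in. It assumes the dataset from load_breast_cancer.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

data = load_breast_cancer()
train, test, train_labels, test_labels = train_test_split(
    data['data'], data['target'], test_size=0.33, random_state=42)

gnb = GaussianNB()
preds = gnb.fit(train, train_labels).predict(test)

# Library helper and manual computation give the same number
acc = accuracy_score(test_labels, preds)
manual_acc = np.mean(preds == test_labels)
print(acc, manual_acc)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(test_labels, preds)
print(cm)
```

The diagonal of the confusion matrix holds the correctly classified samples, so its sum divided by the total equals the accuracy.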


The complete code from the notebook:

ML Tutorial

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load dataset
data = load_breast_cancer()

# Organize our data
label_names = data['target_names']
labels = data['target']
feature_names = data['feature_names']
features = data['data']

# Look at our data
print(label_names)
print('Class label = ', labels[0])
print(feature_names)
print(features[0])

# Split our data
train, test, train_labels, test_labels = train_test_split(features,
                                                          labels,
                                                          test_size=0.33,
                                                          random_state=42)

# Initialize our classifier
gnb = GaussianNB()

# Train our classifier
model = gnb.fit(train, train_labels)

# Make predictions
preds = gnb.predict(test)
print(preds)

# Evaluate accuracy
print(accuracy_score(test_labels, preds))


posted @ 2018-08-02 10:42  腾讯云+社区