鸢尾花数据集的分类

基础知识:
  特征抽取:

   

 

 

      

 

    文本特征抽取:

       

 

   鸢尾花数据集:

    

iris = load_iris()
# 查看描述:
print(iris["DESCR"])
# 查看特征值名字:
print(iris.feature_names)
# 数据集类型
print(iris.data.shape)
# 数据集的划分:构建,检验模型:
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
print(x_test, x_train.shape)

 

 

 利用决策树分类:

  

def decision_iris():
# 1)获取数据集
iris = load_iris()

# 2)划分数据集
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, random_state=22)

# 3)决策树预估器
estimator = DecisionTreeClassifier(criterion="entropy")
estimator.fit(x_train, y_train)

# 4)模型评估
# 方法1:直接比对真实值和预测值
y_predict = estimator.predict(x_test)
print("y_predict:\n", y_predict)
print("直接比对真实值和预测值:\n", y_test == y_predict)

# 计算准确率
score = estimator.score(x_test, y_test)
print("准确率为:\n", score)

# 可视化决策树
export_graphviz(estimator, out_file="iris_tree.dot", feature_names=iris.feature_names)

return None

准确率为:
0.8947368421052632


dot类型:
digraph Tree {
node [shape=box, fontname="helvetica"] ;
edge [fontname="helvetica"] ;
0 [label="petal length (cm) <= 2.45\nentropy = 1.584\nsamples = 112\nvalue = [39, 37, 36]"] ;
1 [label="entropy = 0.0\nsamples = 39\nvalue = [39, 0, 0]"] ;
0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;
2 [label="petal width (cm) <= 1.75\nentropy = 1.0\nsamples = 73\nvalue = [0, 37, 36]"] ;
0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] ;
3 [label="petal length (cm) <= 5.05\nentropy = 0.391\nsamples = 39\nvalue = [0, 36, 3]"] ;
2 -> 3 ;
4 [label="petal width (cm) <= 1.65\nentropy = 0.183\nsamples = 36\nvalue = [0, 35, 1]"] ;
3 -> 4 ;
5 [label="entropy = 0.0\nsamples = 34\nvalue = [0, 34, 0]"] ;
4 -> 5 ;
6 [label="sepal width (cm) <= 2.75\nentropy = 1.0\nsamples = 2\nvalue = [0, 1, 1]"] ;
4 -> 6 ;
7 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 0, 1]"] ;
6 -> 7 ;
8 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1, 0]"] ;
6 -> 8 ;
9 [label="sepal length (cm) <= 6.05\nentropy = 0.918\nsamples = 3\nvalue = [0, 1, 2]"] ;
3 -> 9 ;
10 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1, 0]"] ;
9 -> 10 ;
11 [label="entropy = 0.0\nsamples = 2\nvalue = [0, 0, 2]"] ;
9 -> 11 ;
12 [label="petal length (cm) <= 4.85\nentropy = 0.191\nsamples = 34\nvalue = [0, 1, 33]"] ;
2 -> 12 ;
13 [label="entropy = 0.0\nsamples = 1\nvalue = [0, 1, 0]"] ;
12 -> 13 ;
14 [label="entropy = 0.0\nsamples = 33\nvalue = [0, 0, 33]"] ;
12 -> 14 ;
}
Graphviz Online (dreampuf.github.io)运行后结果如下:

  可视化如图:

 利用KNN算法分类(K-紧邻算法):

 

 

  

准确率为:
0.9736842105263158

posted @ 2022-02-22 19:05  GIPV  阅读(31)  评论(0编辑  收藏  举报