Machine Learning: Data Preparation and Model Evaluation

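I. Lab Objectives
Become familiar with basic Python operations, and learn to read and write datasets and to evaluate model performance in code;
Deepen understanding of training sets, test sets, N-fold cross-validation, and model evaluation metrics.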
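
To make the idea of N-fold cross-validation concrete, here is a minimal sketch in plain Python of how 150 samples can be partitioned into 5 folds, with each fold serving once as the test set while the rest form the training set. The helper name `k_fold_indices` is hypothetical and used only for illustration; it is not a scikit-learn function.

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous (train, test) index pairs.

    Hypothetical helper for illustration only.
    """
    # The first (n_samples % n_folds) folds receive one extra sample.
    base, extra = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        # Everything outside the current test block is training data.
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train_idx, test_idx))
        start += size
    return splits


splits = k_fold_indices(n_samples=150, n_folds=5)
for fold, (train_idx, test_idx) in enumerate(splits, start=1):
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```

This contiguous split ignores class labels. The iris samples are ordered by class, so an unshuffled contiguous split would put a single class in most test folds. That is why classification tasks normally use stratified folds; scikit-learn's `cross_validate` does so when it is given an integer `cv` together with a classifier.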

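II. Lab Content
(1) Read the iris dataset from a local file with pandas;
(2) Load the iris dataset directly from scikit-learn;
(3) Train the model with five-fold cross-validation;
(4) Compute and print the model's accuracy, precision, recall, and F1 score.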
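
Task (1) could look roughly like the sketch below. The file name `iris.csv`, the column names, and the six sample rows are assumptions made so the example runs on its own; a real local file may use a different layout.

```python
import os
import tempfile

import pandas as pd

# Stand-in for a local iris file: a few rows in an assumed column layout.
sample_csv = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "iris.csv")  # assumed file name
    with open(csv_path, "w", encoding="utf-8") as f:
        f.write(sample_csv)
    # Read the local file into a DataFrame.
    df = pd.read_csv(csv_path)

# Separate the four feature columns from the label column.
X_local = df.drop(columns=["species"]).to_numpy()
# Encode the string labels as integers 0, 1, 2 in order of first appearance.
y_local, class_names = pd.factorize(df["species"])

print(df.head())
print(X_local.shape, y_local.tolist(), list(class_names))
```

With a real file, only the `pd.read_csv(...)` call and the label column name would need to change; the resulting `X_local` and `y_local` arrays can be passed to scikit-learn in the same way as `iris.data` and `iris.target`.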

# Key code excerpt
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score

# 1. Load the data
iris = load_iris()  # load the iris dataset
X = iris.data       # feature matrix, shape (150, 4)
y = iris.target     # target labels, shape (150,)

# 2. Create the model
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)

# 3. Define the evaluation metrics
scoring = {
    'accuracy': 'accuracy',
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1': make_scorer(f1_score, average='weighted')
}

# 4. Five-fold cross-validation
cv_results = cross_validate(
    estimator=rf_classifier,
    X=X, 
    y=y,
    cv=5,
    scoring=scoring,
    return_train_score=False
)

# 5. Extract the per-fold results
accuracy_scores = cv_results['test_accuracy']
precision_scores = cv_results['test_precision']
recall_scores = cv_results['test_recall']
f1_scores = cv_results['test_f1']

# 6. Summarize the results (mean ± standard deviation over the five folds)
print(f"Accuracy: {np.mean(accuracy_scores):.4f} (±{np.std(accuracy_scores):.4f})")
print(f"Precision: {np.mean(precision_scores):.4f} (±{np.std(precision_scores):.4f})")
print(f"Recall: {np.mean(recall_scores):.4f} (±{np.std(recall_scores):.4f})")
print(f"F1 score: {np.mean(f1_scores):.4f} (±{np.std(f1_scores):.4f})")
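
The scorers above pass `average='weighted'`, which computes each metric per class and then averages the per-class values weighted by the number of true samples in each class. The following plain-Python sketch illustrates that calculation on a toy example; the function `weighted_scores` and the labels are hypothetical and are not part of the experiment.

```python
from collections import Counter


def weighted_scores(y_true, y_pred):
    """Return support-weighted precision, recall and F1 for multiclass labels.

    Hypothetical helper for illustration only.
    """
    support = Counter(y_true)  # number of true samples per class
    n = len(y_true)
    precision = recall = f1 = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        p_c = tp / predicted if predicted else 0.0  # per-class precision
        r_c = tp / count                            # per-class recall
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = count / n
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return precision, recall, f1


# Toy labels (made up): one class-1 sample is misclassified as class 2.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2]

p, r, f = weighted_scores(y_true, y_pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f:.4f}")
# prints: precision=0.8889 recall=0.8333 f1=0.8222
```

Support-weighted recall always equals overall accuracy (here 5/6), so the accuracy and recall lines of the experiment report the same value in every fold.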
posted @ 2025-11-18 21:51  Look_Back