Machine Learning: Data Preparation and Model Evaluation
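I. Objectives
Become familiar with basic Python operations, and learn to implement dataset reading and writing as well as model performance evaluation;
Deepen the understanding of training sets, test sets, N-fold cross-validation, and model evaluation metrics.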
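II. Contents
(1) Read the iris dataset from a local file with the pandas library;
(2) Load the iris dataset directly from the scikit-learn library;
(3) Train the model with five-fold cross-validation;
(4) Compute and print the model's accuracy, precision, recall, and F1 score.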
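Item (1) is not covered by the code excerpt in this report, so here is a minimal sketch of how reading the data with pandas might look. The column names, the three sample rows, and the use of an in-memory string in place of a real file are assumptions made so the example runs on its own; with an actual local file, `pd.read_csv` would take the file path instead (the path is not given in this report).

```python
import io

import pandas as pd

# Hypothetical stand-in for a local iris CSV file; the column names are assumed.
# With a real file this would be: df = pd.read_csv(<path to the local CSV>)
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

df = pd.read_csv(io.StringIO(csv_text))

# Separate the four feature columns from the label column.
X_local = df.drop(columns="species").to_numpy()

# Encode the string labels as integers, in order of first appearance.
y_local, class_names = pd.factorize(df["species"])

print(df.head())
print(X_local.shape, y_local.tolist(), list(class_names))
```

Arrays built this way can be passed to scikit-learn in the same way as `iris.data` and `iris.target`.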
# Key code excerpt
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer, accuracy_score, precision_score, recall_score, f1_score
# 1. Load the data
iris = load_iris()  # load the Iris dataset
X = iris.data       # feature matrix, shape (150, 4)
y = iris.target     # class labels, shape (150,)
# 2. Create the model
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# 3. Define the evaluation metrics
# average='weighted' averages the per-class scores, weighted by class support
scoring = {
    'accuracy': 'accuracy',
    'precision': make_scorer(precision_score, average='weighted'),
    'recall': make_scorer(recall_score, average='weighted'),
    'f1': make_scorer(f1_score, average='weighted'),
}
# 4. Five-fold cross-validation
# For a classifier, an integer cv uses stratified k-fold splitting
cv_results = cross_validate(
    estimator=rf_classifier,
    X=X,
    y=y,
    cv=5,
    scoring=scoring,
    return_train_score=False,
)
# 5. Extract the per-fold test scores
accuracy_scores = cv_results['test_accuracy']
precision_scores = cv_results['test_precision']
recall_scores = cv_results['test_recall']
f1_scores = cv_results['test_f1']
# 6. Summarize the results as mean (± standard deviation) over the five folds
print(f"Accuracy: {np.mean(accuracy_scores):.4f} (±{np.std(accuracy_scores):.4f})")
print(f"Precision: {np.mean(precision_scores):.4f} (±{np.std(precision_scores):.4f})")
print(f"Recall: {np.mean(recall_scores):.4f} (±{np.std(recall_scores):.4f})")
print(f"F1 score: {np.mean(f1_scores):.4f} (±{np.std(f1_scores):.4f})")
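To make clearer what the `cross_validate` call does, the same five-fold evaluation can be written as an explicit loop. This is only a sketch, assuming that `cv=5` resolves to an unshuffled `StratifiedKFold(n_splits=5)` for a classifier; the names `fold_scores` and `fold_model` are introduced here for illustration and do not appear in the code above.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import StratifiedKFold

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=42)

# Assumed equivalent of cv=5 for a classifier: stratified folds, no shuffling.
skf = StratifiedKFold(n_splits=5)

fold_scores = {"accuracy": [], "precision": [], "recall": [], "f1": []}
for train_idx, test_idx in skf.split(X, y):
    # Train a fresh, unfitted copy of the model on the four training folds.
    fold_model = clone(model)
    fold_model.fit(X[train_idx], y[train_idx])

    # Evaluate on the held-out fold only.
    y_true = y[test_idx]
    y_pred = fold_model.predict(X[test_idx])
    fold_scores["accuracy"].append(accuracy_score(y_true, y_pred))
    fold_scores["precision"].append(precision_score(y_true, y_pred, average="weighted"))
    fold_scores["recall"].append(recall_score(y_true, y_pred, average="weighted"))
    fold_scores["f1"].append(f1_score(y_true, y_pred, average="weighted"))

# Report mean and standard deviation over the five folds.
for name, values in fold_scores.items():
    print(f"{name}: {np.mean(values):.4f} (±{np.std(values):.4f})")
```

Each sample is used for testing exactly once and for training in the other four folds. Note that support-weighted recall is equal to accuracy by definition, so those two lines of the summary always coincide; weighted precision and F1 can differ from them.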
