9.23 - 柟 - 博客园

9.23

1. Algorithm pseudocode

BEGIN

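// (1) Load the Iris dataset and split it into training and test sets

Load the iris dataset

Set X to the iris features

Set y to the iris labels

Split (X, y) into (X_train, y_train) and (X_test, y_test), holding out 1/3 as the test set, stratified by y, with random seed 42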

// (2) Train a pruned decision tree model (max_depth and min_samples_leaf both act as pre-pruning limits)

Initialize classifier clf as DecisionTreeClassifier with max_depth = 3, min_samples_leaf = 2, random seed 42

Fit clf on (X_train, y_train)

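// (3) Evaluate the model with five-fold cross-validation

Initialize cv as StratifiedKFold with n_splits = 5

scores_accuracy = cross_val_score(clf, X_train, y_train, cv, scoring = 'accuracy')

scores_precision = cross_val_score(clf, X_train, y_train, cv, scoring = 'precision_macro')

scores_recall = cross_val_score(clf, X_train, y_train, cv, scoring = 'recall_macro')

scores_f1 = cross_val_score(clf, X_train, y_train, cv, scoring = 'f1_macro')

// Print the cross-validated results on the training set

Print 'Training-set CV accuracy: ' + mean(scores_accuracy)

Print 'Training-set CV precision: ' + mean(scores_precision)

Print 'Training-set CV recall: ' + mean(scores_recall)

Print 'Training-set CV F1: ' + mean(scores_f1)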

// (4) Evaluate the model on the test set

y_pred = predict(clf, X_test)

test_accuracy = accuracy_score(y_test, y_pred)

test_precision = precision_score(y_test, y_pred, average = 'macro')

test_recall = recall_score(y_test, y_pred, average = 'macro')

test_f1 = f1_score(y_test, y_pred, average = 'macro')

// Print the test-set results

Print 'Test-set accuracy: ' + test_accuracy

Print 'Test-set precision: ' + test_precision

Print 'Test-set recall: ' + test_recall

Print 'Test-set F1: ' + test_f1

END
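Step (3) of the pseudocode runs cross-validation four times, once per metric. As a minimal sketch of an alternative, scikit-learn's `cross_validate` accepts a list of scorers and evaluates all of them on the same folds in one call. The names `scoring`, `results`, and `cv_means` are chosen here for illustration and do not appear in the original post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_validate, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Same data split and model settings as in the pseudocode above.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42, stratify=y
)
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2, random_state=42)
cv = StratifiedKFold(n_splits=5)

# One call evaluates all four metrics on the same five folds.
scoring = ["accuracy", "precision_macro", "recall_macro", "f1_macro"]
results = cross_validate(clf, X_train, y_train, cv=cv, scoring=scoring)

# cross_validate returns a dict holding one "test_<metric>" array per scorer.
cv_means = {name: results[f"test_{name}"].mean() for name in scoring}
for name, value in cv_means.items():
    print(f"{name}: {value:.2f}")
```

Besides being shorter, this fits each fold only once instead of once per metric.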

2. Main algorithm code

Complete source code / library calls (with function parameter notes)

from sklearn.datasets import load_iris

from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# (1) Load the iris dataset and hold out 1/3 as the test set

iris = load_iris()

X, y = iris.data, iris.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=42, stratify=y)

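# (2) Train a pruned decision tree on the training set

# max_depth=3 and min_samples_leaf=2 are both pre-pruning limits: they stop the tree from growing further.

# Post-pruning in scikit-learn is done through ccp_alpha (cost-complexity pruning), which is not set here.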

clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=2, random_state=42)

clf.fit(X_train, y_train)

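# (3) Evaluate the model with five-fold cross-validation

# cross_val_score refits a clone of clf on each fold; the model fitted above is left unchanged.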

cv = StratifiedKFold(n_splits=5)

scores_accuracy = cross_val_score(clf, X_train, y_train, cv=cv, scoring='accuracy')

scores_precision = cross_val_score(clf, X_train, y_train, cv=cv, scoring='precision_macro')

scores_recall = cross_val_score(clf, X_train, y_train, cv=cv, scoring='recall_macro')

scores_f1 = cross_val_score(clf, X_train, y_train, cv=cv, scoring='f1_macro')

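# Print the cross-validated results on the training set (mean over the five folds)

print(f'Training-set CV accuracy: {scores_accuracy.mean():.2f}')

print(f'Training-set CV precision: {scores_precision.mean():.2f}')

print(f'Training-set CV recall: {scores_recall.mean():.2f}')

print(f'Training-set CV F1: {scores_f1.mean():.2f}')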

# (4) Evaluate the model on the test set

y_pred = clf.predict(X_test)

test_accuracy = accuracy_score(y_test, y_pred)

test_precision = precision_score(y_test, y_pred, average='macro')

test_recall = recall_score(y_test, y_pred, average='macro')

test_f1 = f1_score(y_test, y_pred, average='macro')

# Print the test-set results

print(f'Test-set accuracy: {test_accuracy:.2f}')

print(f'Test-set precision: {test_precision:.2f}')

print(f'Test-set recall: {test_recall:.2f}')

print(f'Test-set F1: {test_f1:.2f}')
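The code above restricts tree growth with `max_depth` and `min_samples_leaf`, which are pre-pruning settings. For post-pruning, scikit-learn offers cost-complexity pruning through the `ccp_alpha` parameter. The following is a rough sketch of how it could be added, not the method used in this post: it takes candidate alphas from `cost_complexity_pruning_path` and picks the one with the best five-fold accuracy. Choosing by accuracy, and the names `full_tree`, `alphas`, `best_alpha`, and `pruned`, are assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42, stratify=y
)

# Grow an unconstrained tree and compute its cost-complexity pruning path.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# The largest alpha prunes the tree down to its root, so skip it when possible.
alphas = path.ccp_alphas[:-1] if len(path.ccp_alphas) > 1 else path.ccp_alphas
alphas = np.clip(alphas, 0.0, None)  # guard against tiny negative rounding errors

# Assumed selection rule: keep the alpha with the best mean CV accuracy.
cv = StratifiedKFold(n_splits=5)
best_alpha, best_score = 0.0, -1.0
for alpha in alphas:
    candidate = DecisionTreeClassifier(ccp_alpha=float(alpha), random_state=42)
    score = cross_val_score(candidate, X_train, y_train, cv=cv, scoring="accuracy").mean()
    if score > best_score:
        best_alpha, best_score = float(alpha), score

# Refit on the full training set with the chosen alpha.
pruned = DecisionTreeClassifier(ccp_alpha=best_alpha, random_state=42).fit(X_train, y_train)
print(f"chosen ccp_alpha: {best_alpha:.4f}")
print(f"leaves: {pruned.get_n_leaves()} (unpruned: {full_tree.get_n_leaves()})")
print(f"test accuracy: {pruned.score(X_test, y_test):.2f}")
```

Larger values of `ccp_alpha` remove more subtrees, so the pruned tree never has more leaves than the unpruned one built with the same seed.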

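3. Screenshot of training results (accuracy, precision, recall, F1)

IV. Analysis of experimental results

1. Screenshot of test results (accuracy, precision, recall, F1)

2. Comparative analysis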

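Consistency analysis: if the metrics (accuracy, precision, recall, and F1) on the training set and on the test set are close to each other, the model generalizes well. That is, it performs well not only on the training data but also on data it has never seen. Note that the training-set figures reported here are means over five-fold cross-validation, so each fold is already scored on samples the fold's model did not train on.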

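Overfitting check: if performance on the training set is far higher than on the test set, the model is probably overfitting, and its parameters need further adjustment (for example, changing the tree depth, typically by reducing it, or trying a different pruning method).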
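The overfitting check can be made concrete by comparing accuracy on the training data with accuracy on the test data at several tree depths. The sketch below is illustrative only: the depth values and the `gaps` dictionary are assumptions, and it scores the fitted model directly on its own training data rather than using the cross-validated scores from the main code.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=42, stratify=y
)

gaps = {}
for depth in (1, 2, 3, 5, None):  # None lets the tree grow without a depth limit
    model = DecisionTreeClassifier(max_depth=depth, random_state=42)
    model.fit(X_train, y_train)
    train_acc = model.score(X_train, y_train)  # accuracy on data the model has seen
    test_acc = model.score(X_test, y_test)     # accuracy on held-out data
    gaps[depth] = train_acc - test_acc
    print(f"max_depth={depth}: train={train_acc:.2f} test={test_acc:.2f} gap={gaps[depth]:+.2f}")
```

A gap that widens as the depth grows, while test accuracy stalls or drops, is the pattern the paragraph above describes.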

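Precision versus recall: how to balance precision and recall depends on the task. Some applications care more about recall (such as disease screening), while others care more about precision (such as spam detection).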
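To see what the macro-averaged precision and recall used in this post actually measure, they can be recomputed by hand from a confusion matrix. The labels below are hypothetical values made up for illustration, not results from this experiment.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels for a three-class problem (not results from the post).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 2])

cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted class
true_positives = np.diag(cm)

# Precision: of the samples predicted as a class, how many really belong to it.
precision_per_class = true_positives / cm.sum(axis=0)
# Recall: of the samples that belong to a class, how many were found.
recall_per_class = true_positives / cm.sum(axis=1)

# Macro averaging is the unweighted mean over classes.
macro_precision = precision_per_class.mean()
macro_recall = recall_per_class.mean()

print("precision per class:", precision_per_class)
print("recall per class:", recall_per_class)
print(f"macro precision: {macro_precision:.3f}")
print(f"macro recall: {macro_recall:.3f}")
```

The per-class values show which class drives a low macro score, which is useful when one class matters more than the others.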

posted on 2025-01-06 15:21 柟
