可解释故障诊断的研究：热力学仿真辅助随机森林

原始文献📜:
Thermodynamic Simulation-assisted Random Forest: Towards explainable fault diagnosis of combustion chamber components of marine diesel engines, Measurement, 2025.

引言

在柴油机燃烧室故障诊断领域，工程实践通常会遇到三类共性瓶颈：

样本不足：真实故障样本获取难、成本高，限制了数据驱动模型的泛化能力；
机理模型难实时应用：纯物理建模计算量大，难以直接支撑在线监测与快速判别；
模型解释弱：传统黑箱方法给出“结果”较多，却难解释“为什么”。

针对上述问题，《Measurement》中的这项研究提出了一个关键思路：让物理仿真成为机器学习的“先验增强器”，而非仅依赖数据拟合全部机理。该路径兼顾了工程可用性与诊断可信性。

由此，作者构建了热力学仿真辅助随机森林框架。其核心优势在于：在小样本条件下仍保持较高识别精度，并通过可解释分析将诊断结果与热力学机理建立对应关系，提升结论的工程可采信度。

热力学仿真辅助随机森林整体框架

论文提出的热力学仿真辅助随机森林（Thermodynamic Simulation-assisted Random Forest, TSRF）框架，本质上是将“机理建模—数据学习—结果解释”串联成统一闭环。

数据生成与预处理：依托热力学仿真生成覆盖正常/故障工况的数据，并完成标准化处理；
模型训练与验证：采用随机森林开展分类训练，并以交叉验证评估稳定性与准确性；
可解释性分析：借助SHAP解释模型输出
，识别关键热力学变量及其对故障判别的贡献方向与强度。

热力学仿真辅助随机森林(TSRF)整体框架图

一维热力学模型构建与校准

研究首先搭建了一维热力学模型，用于刻画柴油机燃烧室在运行过程中的关键状态演化。随后通过实测数据对模型进行参数标定，确保仿真结果在主要热力学特征上与真实工况一致。

柴油机一维热力学模型示意图

此外，作者利用数据采集模块(Data Collection Module, DCM)的现场运行数据持续校准模型，使仿真输出在关键热力学指标层面具备可对照性与可验证性。

数据采集模块(DCM)图

燃烧室典型故障的物理建模与仿真

完成模型校准后，作者围绕燃烧室关键参数进行定向扰动，构建并仿真五类典型故障场景，且每一类故障均具有明确的机理支撑。

故障编号	故障类型	物理机制	关键参数调节
F1	缸盖裂纹	热–机械载荷导致裂纹产生，结构与散热能力退化	缸盖表面温度 TH 提升至 346 °C
F2	活塞烧蚀	材料退化引发热烧蚀，加剧窜气	活塞温度 TP 升高 + 轻微窜气（0.01 kg/s）
F3	缸套磨损	磨粒侵入导致几何变形与严重密封失效	缸径增大 + 大量窜气（0.03 kg/s）
F4	活塞环磨损	磨损变形引发密封退化，形成窜气正反馈	窜气质量流量调节（0.02 kg/s）
F5	活塞环粘着	积碳、润滑不足与油泥堆积	缸径变化 + 缸套温度升高 + 窜气

通过上述故障建模与仿真，研究获得了覆盖正常工况与多故障状态的综合数据集，为后续机器学习训练提供了高一致性输入。

基于RF与SHAP的特征筛选

在数据集构建完成后，作者采用随机森林（Random Forest, RF）执行故障分类，利用其对非线性关系的建模能力完成多类别识别。为提升诊断透明度，研究进一步引入SHAP（SHapley Additive exPlanations）框架，对模型判别依据进行量化分解。

采用以下两阶段策略进行筛选：

1.随机森林预识别

利用 RF 学习“参数—故障类型”映射关系；
基于预测结果估计各参数的边际贡献度。

2.Tree SHAP 定量分析

计算各参数对应的 SHAP 值；
依据 SHAP 权重筛选高贡献且具明确物理含义的关键参数。

基于SHAP值的参数筛选流程图

实验结果与性能评估

实验结果显示，TSRF框架在小样本条件下仍实现95%以上的诊断准确率，整体表现优于常见黑箱模型。

同时，SHAP结果揭示了不同故障类别下关键热力学参数的重要性差异，为工程场景中的机理分析与根因定位提供了更具可解释性的证据链。

SHAP可视化代码如下：

点击查看代码

import shap
import matplotlib.pyplot as plt
import numpy as np

# --- 0. Setup & Global Settings ---
plt.rcParams['font.family'] = 'Arial'
plt.rcParams['font.size'] = '24'
plt.rcParams['axes.unicode_minus'] = False

# Assumption: 'best_model' is your trained XGBoost/RF model
# Assumption: 'X_train' and 'X_test' are pandas DataFrames

# 1. Calculate SHAP values as Numpy Arrays (for Beeswarm, Dependence, Interaction)
explainer_tree = shap.TreeExplainer(best_model)
shap_values_numpy = explainer_tree.shap_values(X_train)

# 2. Calculate SHAP values as Explanation Object (Specifically for Waterfall)
explainer_obj = shap.Explainer(best_model, X_test)
shap_values_obj = explainer_obj(X_test)


##################################################################
#                                                                #
#                      (a) Waterfall Plot                        #
#          Visualizes contribution for a single sample           #
#                                                                #
##################################################################

class_idx = 4  # Target class
sample_idx = 3 # Specific sample to explain

plt.figure()
shap.plots.waterfall(
    shap_values_obj[sample_idx, :, class_idx],
    max_display=9,
    show=False
)

# Customizing style
ax = plt.gca()
ax.set_xlabel(ax.get_xlabel(), fontsize=36)
ax.set_ylabel(ax.get_ylabel(), fontsize=36)
ax.spines['bottom'].set_linewidth(3)
plt.show()


##################################################################
#                                                                #
#                      (b) Beeswarm Plot                         #
#              Global summary of feature importance              #
#                                                                #
##################################################################

class_idx = 5
plt.figure(figsize=(10, 8))

shap.summary_plot(
    shap_values_numpy[..., class_idx],
    X_train,
    feature_names=X_train.columns,
    plot_type="dot",
    show=False,
    cmap='Greys' # or 'plasma'
)

# Customize Color Bar
cbar = plt.gcf().axes[-1]
cbar.set_ylabel('Parameter Value', fontsize=24)
cbar.tick_params(labelsize=20)
plt.show()


##################################################################
#                                                                #
#                     (c) Interaction Plot                       #
#          Visualizes interaction effects between features       #
#                                                                #
##################################################################

# Note: Calculation can be expensive
shap_interaction_values = explainer_tree.shap_interaction_values(X_test)
class_idx = 4

plt.figure()
shap.summary_plot(
    shap_interaction_values[..., class_idx],
    X_test,
    show=False,
    max_display=6,
    cmap='Greys'
)

# Clean up subplots
axes = plt.gcf().axes
for ax in axes:
    ax.spines['bottom'].set_linewidth(2)
    ax.tick_params(axis="x", labelsize=18, width=2)
    ax.set_title(ax.get_title(), fontsize=14)

plt.subplots_adjust(wspace=0.3, hspace=0.4)
plt.show()


##################################################################
#                                                                #
#                     (d) Dependence Plot                        #
#           Feature relationship colored by interaction          #
#                                                                #
##################################################################

Feature_X = 'P06'  # Main feature
Feature_Y = 'P07'  # Interaction feature
class_idx = 4

shap.dependence_plot(
    Feature_X,
    shap_values_numpy[..., class_idx],
    X_train,
    interaction_index=Feature_Y,
    dot_size=100,
    show=False
)

# Customize Axes
ax = plt.gca()
ax.tick_params(axis='both', which='major', labelsize=36, width=2)
ax.set_ylabel(f'SHAP value ({Feature_X})', fontsize=36)
ax.spines['bottom'].set_linewidth(3)
ax.spines['left'].set_linewidth(3)
plt.show()


##################################################################
#                                                                #
#            (e) Advanced Composite Plot (Beeswarm + Bar)        #
#      Combines Beeswarm (Bottom Axis) & Importance (Top Axis)   #
#                                                                #
##################################################################

class_idx = 5
fig, ax1 = plt.subplots(figsize=(10, 8))

# 1. Main Beeswarm Plot (on ax1)
shap.summary_plot(
    shap_values_numpy[..., class_idx],
    X_train,
    feature_names=X_train.columns,
    plot_type="dot",
    show=False,
    color_bar=True,
    cmap='Greys' # or 'plasma'
)

# Customize Color Bar
cbar = plt.gcf().axes[-1]
cbar.set_ylabel('Parameter Value', fontsize=24)
cbar.tick_params(labelsize=20)

# Adjust layout to make room for the top axis
plt.gca().set_position([0.2, 0.2, 0.65, 0.65])

# 2. Feature Importance Bar Plot (on Top Axis ax2)
# Create a twin axis sharing the y-axis
ax2 = ax1.twiny()

shap.summary_plot(
    shap_values_numpy[..., class_idx],
    X_train,
    plot_type="bar",
    show=False
)

# Align position with the main plot
plt.gca().set_position([0.2, 0.2, 0.65, 0.65])

# Style the bars (Transparent & Light Color)
bars = ax2.patches
for bar in bars:
    bar.set_color('#CCE5FB') # Light blue background bars
    bar.set_alpha(0.4)       # Transparency

# Customize Axes Labels
ax1.set_xlabel(f'Shapley Value Contribution (F{class_idx})', fontsize=24, labelpad=5)
ax1.set_ylabel('Parameters', fontsize=24)
ax2.set_xlabel('Mean Shapley Value (Parameter Importance)', fontsize=24, labelpad=10)

# Move ax2 (Bar plot axis) to the top
ax2.xaxis.set_label_position('top')
ax2.xaxis.tick_top()

# Ensure ax1 (dots) is drawn ON TOP OF ax2 (bars)
ax1.set_zorder(ax1.get_zorder() + 1)
ax1.patch.set_visible(False) # Make ax1 background transparent

plt.show()

数据集及更多

点击此处 →

posted on 2026-03-01 16:05 伊宸阅读(19) 评论(0) 收藏举报