Borrowed phrasings for describing experiments

Reinforcement Learning Enhanced Explainer for Graph Neural Networks, 2021 NeurIPS

  • From the table, we see that our method RG-Explainer achieves the best results on 5 out of 6 datasets.

(Pattern: open with the overall result: the method is best on x of n datasets.)

  • For example, the AUC score of RG-Explainer on Tree-Grid is 0.927 while that of the runner-up is only 0.714, leading to an improvement of 29.8%.

(Pattern: for one dataset, compare the best method with the runner-up and quantify the relative improvement.)
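The 29.8% figure above is the relative improvement over the runner-up. A minimal sketch of the arithmetic (the helper name is mine, not from the paper):

```python
def relative_improvement(best: float, runner_up: float) -> float:
    """Improvement of `best` over `runner_up`, as a percentage of the runner-up."""
    return (best - runner_up) / runner_up * 100

# Reproduces the quoted Tree-Grid number: (0.927 - 0.714) / 0.714 * 100
print(round(relative_improvement(0.927, 0.714), 1))  # → 29.8
```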

  • On the BA-Shapes dataset, RG-Explainer achieves comparable results with the winner’s and significantly outperforms GNNExplainer.

(Pattern: when the method is not the winner on a dataset, note that it is comparable to the winner and still significantly beats another baseline.)

  • These results show the advantage of applying reinforcement learning techniques in constructing explanatory subgraphs.

(Pattern: the results demonstrate the advantage of applying a particular technique.)

  • We also test the performance of the locator and find that the locator selects ∼ 66% and ∼ 84% accurate seed nodes (i.e., nodes in the ground-truth motif) for BA-2Motifs and MUTAG, respectively. This further explains the good performance of RG-Explainer for graph classification.

(Pattern: also evaluate a single component; report its accuracy at some action as x and y on datasets a and b respectively, then use this to explain the overall performance.)

  • We further test the performance of RG-Explainer in the inductive setting.

(Pattern: we further test the method's performance in a particular setting.)

  • We compare it with PGExplainer, as both are learning-based methods.

(Pattern: compare the method with another that shares the same key characteristic.)

  • Specifically, we vary the training set sizes from {10%; 30%; 50%; 70%; 90%} and take the remaining instances for testing.

(Pattern: vary the training-set size and use the remaining instances for testing.)
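The split protocol described above can be sketched as follows; `instances` and the exact fractions are illustrative, not from the paper:

```python
import random

def split(instances, train_frac, seed=0):
    """Shuffle once, keep a train_frac share for training, the rest for testing."""
    rng = random.Random(seed)
    shuffled = list(instances)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

instances = list(range(100))
for frac in (0.1, 0.3, 0.5, 0.7, 0.9):
    train, test = split(instances, frac)
    print(f"{frac:.0%}: {len(train)} train / {len(test)} test")
```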

  • For each dataset, we run the experiments 10 times and compute the average AUC scores.

(Pattern: run each experiment n times and report the average metric.)
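Averaging over repeated runs looks like this in outline; `run_experiment` is a hypothetical stand-in for one full train/evaluate cycle returning an AUC score:

```python
from statistics import mean

def run_experiment(seed: int) -> float:
    # Placeholder: a real run would train and evaluate the model with this seed.
    return 0.90 + 0.001 * (seed % 5)

n_runs = 10
scores = [run_experiment(seed) for seed in range(n_runs)]
print(round(mean(scores), 4))  # → 0.902
```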

  • From the figure, RG-Explainer generally outperforms PGExplainer as the training set size increases.

(Pattern: as the quantity on the x-axis increases, the method pulls ahead of the baseline.)

  • For example, with only 10% training instances in the Tree-Grid dataset, RG-Explainer significantly outperforms PGExplainer by a large margin. This shows that RG-Explainer generalizes better than PGExplainer.

(Pattern: the method significantly outperforms the baseline; the adverbial phrase "by a large margin" stresses the size of the gap.)

POMO: Policy Optimization with Multiple Optima for Reinforcement Learning, 2020 NeurIPS

Given 10,000 random instances of TSP20 and TSP50, POMO finds near-optimal solutions with optimality gaps of 0.0006% in seconds and 0.025% in tens of seconds, respectively. For TSP100, POMO achieves the optimality gap of 0.14% in a minute, outperforming all other learning-based heuristics significantly, both in terms of the quality of the solutions and the time it takes to solve.

(Pattern: first state the solution quality achieved within a given time budget, then claim superiority in both quality and solve time.)
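The optimality gap quoted above is the relative excess of the solution cost over the optimal (or best-known) cost. A sketch with made-up tour lengths, not numbers from the paper:

```python
def optimality_gap(cost: float, optimal_cost: float) -> float:
    """Percentage by which `cost` exceeds `optimal_cost`."""
    return (cost - optimal_cost) / optimal_cost * 100

# Illustrative values only:
print(round(optimality_gap(10.014, 10.0), 2))  # → 0.14
```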

In the table, results under the “AM, greedy” method and the “POMO, single trajec.” method come from the identical network structure tested with the same inference technique. The only difference was the training, so the substantial improvement (e.g., from 3.51% to 1.07% optimality gap on TSP100) indicates the superiority of the POMO training method. As for the inference techniques, the combined use of multiple greedy rollouts of POMO and the ×8 instance augmentation reduces the optimality gap even further, by an order of magnitude.

Learning curves of TSP50 and TSP100 in Figure 3 show that POMO training is more stable and sample-efficient. In reading these graphs, one should keep in mind that POMO uses N-times more trajectories than simple REINFORCE for each training epoch. POMO training time is, however, comparable to that of REINFORCE, thanks to the parallel processing on trajectory generation. For example, TSP100 training takes about 7 minutes per epoch for POMO while it takes 6 minutes for REINFORCE.

This naive way of applying POMO can still make a powerful solver. Experiment results on CVRP with 20, 50, and 100 customer nodes are reported in Table 3, and POMO is shown to outperform simple REINFORCE by a large margin. Note that there is no algorithm yet that can find optimal solutions of 10,000 random CVRP instances in a reasonable time, so the “Gap” values in the table are given relative to LKH3 [33] results. POMO has a smaller gap on CVRP100 (0.32%) than on CVRP50 (0.45%), probably because LKH3's performance degrades faster than POMO's as the problem size grows.

Improvement-type neural approaches, such as L2I by Lu et al. [15], can produce better results than (single-run) LKH3, given long enough search time. To emphasize the differences between POMO and L2I (other than speed): POMO is a general RL tool that can be applied to many different CO problems in a purely data-driven way. On the other hand, L2I is a specialized routing-problem solver based on a handcrafted pool of improvement operators. Because POMO is a construction method, it can be combined with improvement methods to produce even better results.

Combinatorial Optimization by Graph Pointer Networks and Hierarchical Reinforcement Learning, 2021 AAAI

A smaller gap indicates a better result. The optimal solutions are obtained from the LKH algorithm.

We observe that for small-scale TSP instances, the GPN outperforms the Pointer Network, which demonstrates the usefulness of the graph embedding, but yields worse approximations than the Attention Model.

(Pattern: attribute the gain to a specific component, here the graph embedding.)

GLSEARCH, 2021 NeurIPS

As shown in Table 2, our model outperforms baselines in terms of size of extracted subgraphs on all medium-sized synthetic graph datasets.

(Pattern: the method beats all baselines on a given metric across a family of datasets.)

posted @ 2024-08-05 13:56  X1OO