Writing Reference - QIM Intro
- Owing to its NP-hard nature, the cooperative PDP, like other VRP variants, remains difficult to solve optimally with exact methods (Toth and Vigo 2002; Madsen, Fisher, and Jornsten 1997). Although numerous heuristic methods have been developed to compute near-optimal solutions, the solution generation process remains time-consuming, and there is still room to find better approximations. The recent development of deep reinforcement learning (DRL) has proven effective for many combinatorial optimization problems, including VRPs, and thus offers another perspective for solving PDP (Bello et al. 2016; Kool, van Hoof, and Welling 2019; Nazari et al. 2018; Chen and Tian 2019; Lu, Zhang, and Yang 2019a). By learning a parameterized model rather than relying on manually constructed rules to search for solutions, DRL has shown appealing performance on typical routing problems. Moreover, by separating the training phase from online inference, DRL can generate results with much faster computation. Motivated by both the high solution quality and the fast inference speed, constructing a DRL framework is a promising way to solve the cooperative PDP.
MAPDP: Cooperative Multi-Agent Reinforcement Learning to Solve Pickup and Delivery Problems
