Writing - Related Work

Combinatorial optimization and RL

Combinatorial optimization and reinforcement learning. In recent years, significant work has been invested in solving NP-hard combinatorial optimization problems using machine learning, notably by developing new architectures such as pointer networks [5] and graph convolutional networks [24]. Leveraging these architectures, reinforcement learning approaches have been developed for the TSP [6, 7] and some of its vehicle routing relatives [25], including the CVRP [12]. Crucially, the CVRP is significantly more challenging than the closely related TSP. While TSPs on tens of thousands of cities can be solved to optimality [26], CVRPs with more than a few hundred cities are very hard to solve exactly [27], often requiring cumbersome methods such as branch-and-cut-and-price, and motivating the search for alternative solution approaches. This work adopts a hybrid approach, casting a hard combinatorial problem (CVRP) as a sequence of easier combinatorial problems (PC-TSP) in an approximate dynamic programming setting.
(Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing, NeurIPS 2020)

Reinforcement learning (RL) methods: To overcome the drawbacks of supervised learning (SL), several groups adopted RL instead. For example, Bello et al. (2017) implemented an actor-critic RL architecture that uses the tour length as the reward signal to guide the search towards promising areas. Khalil et al. (2017) proposed a framework that maintains a partial tour and repeatedly calls an RL model to select the most relevant city to add to it, until a complete TSP tour is formed. Emami and Ranka (2018) also implemented an actor-critic neural network, using a Sinkhorn policy gradient to learn policies by approximating a doubly stochastic matrix. Concurrently, Deudon et al. (2018) and Kool, van Hoof, and Welling (2019) both proposed a graph attention network (GAN), which combines an attention mechanism with RL to auto-regressively improve the quality of the obtained solutions. Recently, Wu et al. (2020) presented an improvement-based learning framework that uses deep RL to automatically discover better improvement policies. In addition, several ML-based methods have recently been proposed for related problems, such as the decision TSP (Prates et al. 2019), the multiple TSP (Kaempfer and Wolf 2019), and the vehicle routing problem (Nazari et al. 2018; Chen and Tian 2019; Lu, Zhang, and Yang 2020). For comprehensive surveys, see Bengio, Lodi, and Prouvost (2018) and Guo et al. (2019).
(Generalize a Small Pre-trained Model to Arbitrarily Large TSP Instances, AAAI 2020)
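
As a rough illustration of the constructive scheme described for Khalil et al. (2017) (keep a partial tour, repeatedly ask a model which city to add next, stop once every city is in the tour), here is a minimal Python sketch. It is not their method: the learned RL model is replaced by a hypothetical scoring function, `nearest_city_score`, which is just a nearest-neighbour heuristic, and the chosen city is simply appended to the end of the partial tour (an assumption; other insertion rules are possible). `tour_length` computes the closed-tour length, the quantity that Bello et al. (2017) use as the reward signal.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def tour_length(coords: Sequence[Point], tour: Sequence[int]) -> float:
    """Length of the closed tour, including the edge back to the start city."""
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def nearest_city_score(coords: Sequence[Point], partial_tour: List[int], candidate: int) -> float:
    """Hypothetical stand-in for a learned policy / Q-network.

    Higher score = more relevant. Here a city is scored by how close it is
    to the last city of the partial tour (a nearest-neighbour heuristic).
    """
    return -math.dist(coords[partial_tour[-1]], coords[candidate])


def construct_tour(
    coords: Sequence[Point],
    score: Callable[[Sequence[Point], List[int], int], float] = nearest_city_score,
    start: int = 0,
) -> List[int]:
    """Grow a partial tour one city at a time until it visits every city."""
    tour = [start]
    remaining = set(range(len(coords))) - {start}
    while remaining:
        # Ask the (stand-in) model to rank every unvisited city; iterate in
        # sorted order so ties are broken deterministically by city index.
        best = max(sorted(remaining), key=lambda c: score(coords, tour, c))
        tour.append(best)
        remaining.remove(best)
    return tour


if __name__ == "__main__":
    square = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
    t = construct_tour(square)
    print(t, tour_length(square, t))  # [0, 1, 2, 3] 4.0
```

Swapping `nearest_city_score` for a trained network's scoring function is the point of the learned approach; the surrounding loop, which only ever extends a feasible partial tour, stays the same.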

posted @ 2023-04-25 00:19 X1OO