Abstract
In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem in which vehicles are initially located away from the depot and have different amounts of fuel. We assume that the vehicles aim to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems occur in a wide range of intelligent unmanned aircraft system (UAS) applications, where changes in the mission environment force the operation of multiple vehicles to deviate from the original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm in which the greedy rollout baseline is replaced by a local mini-batch baseline based on multiple, possibly non-duplicate sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when the latter is combined with the maximum entropy objective. The efficiency of our method is further demonstrated on two classical problems: the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). The experimental results show that our method enables models to develop more effective heuristics and performs competitively with state-of-the-art deep reinforcement learning methods.
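To make the idea of a local mini-batch baseline concrete, the following is a minimal sketch and not the paper's implementation. It assumes that the baseline for each training instance is the plain mean reward over the rollouts sampled for that instance; the abstract does not specify the exact form, nor how duplicate rollouts are handled. The function names `local_baseline_advantages` and `reinforce_surrogate_loss` and the nested-list data layout are hypothetical.

```python
from statistics import mean


def local_baseline_advantages(rewards):
    """Subtract a per-instance baseline from each sampled rollout's reward.

    rewards[i][k] is the total collected profit of the k-th sampled rollout
    for training instance i. ASSUMPTION: the baseline is the plain mean over
    the rollouts of the same instance.
    """
    advantages = []
    for instance_rewards in rewards:
        baseline = mean(instance_rewards)  # computed from this instance only
        advantages.append([r - baseline for r in instance_rewards])
    return advantages


def reinforce_surrogate_loss(rewards, log_probs):
    """REINFORCE surrogate loss for a profit-maximization objective.

    log_probs[i][k] is the log-probability of the k-th rollout under the
    current policy. Minimizing the returned value raises the probability of
    rollouts that beat their instance's baseline.
    """
    advantages = local_baseline_advantages(rewards)
    terms = [
        -adv * lp
        for adv_row, lp_row in zip(advantages, log_probs)
        for adv, lp in zip(adv_row, lp_row)
    ]
    return mean(terms)


# Toy data: two instances, two sampled rollouts each.
rewards = [[1.0, 3.0], [2.0, 2.0]]
log_probs = [[-1.0, -2.0], [-0.5, -0.5]]
loss = reinforce_surrogate_loss(rewards, log_probs)
```

In this sketch, no separate greedy rollout of a baseline policy is needed: the rollouts sampled for an instance serve both as gradient samples and as the baseline for one another. In an actual training loop, the log-probabilities would come from the policy network and the loss would be differentiated with an automatic differentiation framework.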
Funder
National Research Foundation of Korea
Korea Advanced Institute of Science and Technology
Publisher
Springer Science and Business Media LLC