Multi-start team orienteering problem for UAS mission re-planning with data-efficient deep reinforcement learning

Published:2024-03 Issue:6 Volume:54 Page:4467-4489
ISSN:0924-669X
Container-title:Applied Intelligence
language:en
Short-container-title:Appl Intell

Author:

Lee Dong Ho, Ahn Jaemyung

Abstract

In this paper, we study the Multi-Start Team Orienteering Problem (MSTOP), a mission re-planning problem in which vehicles are initially located away from the depot and have different amounts of fuel. We assume that the goal of the vehicles is to maximize the sum of collected profits under resource (e.g., time, fuel) consumption constraints. Such re-planning problems occur in a wide range of intelligent UAS applications where changes in the mission environment force the operation of multiple vehicles to deviate from the original plan. To solve this problem with deep reinforcement learning (RL), we develop a policy network with self-attention on each partial tour and encoder-decoder attention between the partial tour and the remaining nodes. We propose a modified REINFORCE algorithm in which the greedy rollout baseline is replaced by a local mini-batch baseline based on multiple, possibly non-duplicate sample rollouts. By drawing multiple samples per training instance, we can learn faster and obtain a stable policy gradient estimator with significantly fewer instances. The proposed training algorithm outperforms the conventional greedy rollout baseline, even when the latter is combined with the maximum entropy objective. The efficiency of our method is further demonstrated on two classical problems: the Traveling Salesman Problem (TSP) and the Capacitated Vehicle Routing Problem (CVRP). The experimental results show that our method enables models to develop more effective heuristics and performs competitively with state-of-the-art deep reinforcement learning methods.
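
The abstract does not spell out how the local mini-batch baseline is computed, so the following is only an illustrative sketch under an assumption: the baseline for each training instance is taken to be the mean reward of the sample rollouts drawn for that same instance. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import fmean


def local_baseline_advantages(rewards_per_instance):
    """Return per-rollout advantages, one inner list per instance.

    rewards_per_instance[i][k] is the reward (e.g. collected profit) of the
    k-th sampled rollout on instance i. ASSUMPTION: the baseline is the mean
    reward over the rollouts of the same instance.
    """
    advantages = []
    for rewards in rewards_per_instance:
        if len(rewards) < 2:
            raise ValueError("need at least two rollouts per instance")
        baseline = fmean(rewards)  # shared by all rollouts of this instance
        advantages.append([r - baseline for r in rewards])
    return advantages


def reinforce_surrogate_loss(log_probs_per_instance, rewards_per_instance):
    """Average of -(advantage * log-probability) over all rollouts.

    With an autodiff framework, the advantages would be treated as constants
    so that the gradient flows only through the log-probabilities.
    """
    advantages = local_baseline_advantages(rewards_per_instance)
    total, count = 0.0, 0
    for log_probs, advs in zip(log_probs_per_instance, advantages):
        for log_prob, adv in zip(log_probs, advs):
            total += -adv * log_prob
            count += 1
    return total / count


# Toy usage: two instances with 2 and 3 sampled rollouts respectively.
rewards = [[1.0, 3.0], [2.0, 2.0, 5.0]]
log_probs = [[-0.5, -1.0], [-1.0, -1.0, -1.0]]
advantages = local_baseline_advantages(rewards)
loss = reinforce_surrogate_loss(log_probs, rewards)
```

Because the baseline is computed from the samples themselves, no separate greedy rollout of a frozen baseline policy is needed in this sketch, which is one plausible reading of why fewer training instances would suffice.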

Funder

National Research Foundation of Korea

Korea Advanced Institute of Science and Technology

Publisher

Springer Science and Business Media LLC

Link

https://link.springer.com/content/pdf/10.1007/s10489-024-05367-4.pdf

Reference: 39 articles.

1. Coutinho WP, Battarra M, Fliege J (2018) The unmanned aerial vehicle routing and trajectory optimisation problem, a taxonomic review. Comput Ind Eng 120:116–28. https://doi.org/10.1016/j.cie.2018.04.037

2. Rojas Viloria D, Solano-Charris EL, Muñoz-Villamizar A, Montoya-Torres JR (2021) Unmanned aerial vehicles/drones in vehicle routing problems: a literature review. Int Trans Oper Res 28:1626–57. https://doi.org/10.1111/itor.12783

3. Kool W, van Hoof H, Welling M (2019) Attention, Learn to Solve Routing Problems! In: 2019 International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1803.08475

4. Kwon Y-D, Choo J, Kim B, Yoon I, Gwon Y, Min S (2020) POMO: Policy optimization with multiple optima for reinforcement learning. In: Advances in Neural Information Processing Systems (NeurIPS), 21188–98. https://doi.org/10.48550/arXiv.2010.16011

5. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser Ł, Polosukhin I (2017) Attention is all you need. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, 6000–10. Curran Associates Inc, Long Beach, California, USA. https://dl.acm.org/doi/10.5555/3295222.3295349

Cited by 3 articles.

1. A Learnheuristic Algorithm Based on Thompson Sampling for the Heterogeneous and Dynamic Team Orienteering Problem;Mathematics;2024-06-05

2. A Sim-Learnheuristic for the Team Orienteering Problem: Applications to Unmanned Aerial Vehicles;Algorithms;2024-05-08

3. Generation of Tourist Routes Considering Preferences and Public Transport Using Artificial Intelligence Planning Techniques;Lecture Notes in Computer Science;2024