Abstract
This paper considers the Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one per agent, that collectively visit all the cities while minimizing the length of the longest tour. Although the MTSP has been widely studied, obtaining near-optimal solutions for large-scale instances remains challenging due to its NP-hardness. Recent data-driven methods face two obstacles: the need for hard-to-obtain supervision, and high variance in gradient estimates, which leads to slow convergence and highly suboptimal solutions. We address these issues by reformulating the MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). Specifically, we introduce an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour among these TSP solutions then self-supervises the allocation network, yielding a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradients during optimization, we introduce a control-variate-based gradient estimation algorithm. Our experiments show that these designs enable our gradient estimator to converge 20% faster than an advanced reinforcement learning baseline and to find tours up to 80% shorter than those of the Google OR-Tools MTSP solver, especially on large-scale instances (e.g., 1000 cities and 15 agents).
URL
https://arxiv.org/abs/2405.00285