Abstract
This paper considers the Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one per agent, that collectively visit all the cities while minimizing the length of the longest tour. Although the MTSP has been widely studied, obtaining near-optimal solutions for large-scale instances remains challenging due to its NP-hardness. Recent data-driven methods face two obstacles: the need for hard-to-obtain supervision, and high variance in gradient estimates, which leads to slow convergence and highly suboptimal solutions. We address these issues by reformulating the MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). Specifically, we introduce an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour among these TSP solutions then self-supervises the allocation network, yielding a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradients during optimization, we introduce a control-variate-based gradient estimation algorithm. Our experiments show that these designs enable our gradient estimator to converge 20% faster than an advanced reinforcement learning baseline and to find tours up to 80% shorter than those of the Google OR-Tools MTSP solver, especially on large-scale instances (e.g., 1000 cities and 15 agents).
URL
https://arxiv.org/abs/2405.00285