Paper Reading AI Learner

iMTSP: Solving Min-Max Multiple Traveling Salesman Problem with Imperative Learning

2024-05-01 02:26:13
Yifan Guo, Zhongqiang Ren, Chen Wang

Abstract

This paper considers a Min-Max Multiple Traveling Salesman Problem (MTSP), where the goal is to find a set of tours, one for each agent, to collectively visit all the cities while minimizing the length of the longest tour. Though MTSP has been widely studied, obtaining near-optimal solutions for large-scale problems is still challenging due to its NP-hardness. Recent efforts in data-driven methods face challenges of the need for hard-to-obtain supervision and issues with high variance in gradient estimations, leading to slow convergence and highly suboptimal solutions. We address these issues by reformulating MTSP as a bilevel optimization problem, using the concept of imperative learning (IL). This involves introducing an allocation network that decomposes the MTSP into multiple single-agent traveling salesman problems (TSPs). The longest tour from these TSP solutions is then used to self-supervise the allocation network, resulting in a new self-supervised, bilevel, end-to-end learning framework, which we refer to as imperative MTSP (iMTSP). Additionally, to tackle the high-variance gradient issues during the optimization, we introduce a control variate-based gradient estimation algorithm. Our experiments showed that these innovative designs enable our gradient estimator to converge 20% faster than the advanced reinforcement learning baseline and find up to 80% shorter tour length compared with Google OR-Tools MTSP solver, especially in large-scale problems (e.g. 1000 cities and 15 agents).

Abstract (translated)

本文考虑了最小-最大多代理商旅行商问题(MTSP),其目标是找到一组路径,每个代理商都访问所有城市,同时最小化最长路径的长度。尽管MTSP已广泛研究,但在大规模问题中得到最优解仍然具有挑战性,因为其具有NP困难性。最近的数据驱动方法面临着需要获得困难监督以及高梯度估计问题,导致收敛缓慢和高次优解的问题。为了应对这些问题,我们将MTSP重新建模为双层优化问题,使用指令学习(IL)的概念。这包括引入一个分配网络,将MTSP分解为多个单代理商旅行商问题(TSPs)。然后使用这些TSP解决方案中的最长路径来自动监督分配网络,从而实现了一种新的自监督、双层、端到端学习框架,我们称之为指令MTSP(iMTSP)。此外,为了在优化过程中解决高方差梯度问题,我们引入了一种基于控制变量的梯度估计算法。我们的实验证明,这些创新设计使我们的梯度估计算法能够比高级强化学习基线快20%收敛,并且在大型问题(例如1000个城市和15个代理商)中找到缩短约80%的路径长度。

URL

https://arxiv.org/abs/2405.00285

PDF

https://arxiv.org/pdf/2405.00285.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot