Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models

2026-02-09 18:58:50
Zichen Jeff Cui, Omar Rayyan, Haritheja Etukuru, Bowen Tan, Zavier Andrianarivo, Zicheng Teng, Yihang Zhou, Krish Mehta, Nicholas Wojno, Kevin Yuanbo Wu, Manan H Anjaria, Ziyuan Wu, Manrong Mao, Guangxun Zhang, Binit Shah, Yejin Kim, Soumith Chintala, Lerrel Pinto, Nur Muhammad Mahi Shafiullah

Abstract

The prevalent paradigm in robot learning attempts to generalize across environments, embodiments, and tasks with language prompts at runtime. A fundamental tension limits this approach: language is often too abstract to guide the concrete physical understanding required for robust manipulation. In this work, we introduce Contact-Anchored Policies (CAP), which replace language conditioning with points of physical contact in space. Simultaneously, we structure CAP as a library of modular utility models rather than a monolithic generalist policy. This factorization allows us to implement a real-to-sim iteration cycle: we build EgoGym, a lightweight simulation benchmark, to rapidly identify failure modes and refine our models and datasets prior to real-world deployment. We show that by conditioning on contact and iterating via simulation, CAP generalizes to novel environments and embodiments out of the box on three fundamental manipulation skills while using only 23 hours of demonstration data, and outperforms large, state-of-the-art VLAs in zero-shot evaluations by 56%. All model checkpoints, codebase, hardware, simulation, and datasets will be open-sourced. Project page: this https URL

URL

https://arxiv.org/abs/2602.09017

PDF

https://arxiv.org/pdf/2602.09017.pdf

