Paper Reading AI Learner

AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception

2026-02-10 10:05:53
Ruoxuan Feng, Yuxuan Zhou, Siyu Mei, Dongzhan Zhou, Pengwei Wang, Shaowei Cui, Bin Fang, Guocai Yao, Di Hu

Abstract

Real-world contact-rich manipulation requires robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited. These resources primarily focus on object-level attributes (e.g., material) while largely overlooking fine-grained tactile temporal dynamics during physical interactions. We argue that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale hierarchical tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond scale, ToucHD establishes a comprehensive tactile dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on it, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames, while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks that cover static object properties and dynamic physical attributes, as well as real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities, from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistent and strong performance across sensors and tasks.


URL

https://arxiv.org/abs/2602.09617

PDF

https://arxiv.org/pdf/2602.09617.pdf

