Abstract
Real-world contact-rich manipulation requires robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. Although optical tactile sensors are uniquely capable of providing such rich information, existing tactile datasets and models remain limited: they focus primarily on object-level attributes (e.g., material) while largely overlooking fine-grained tactile temporal dynamics during physical interactions. We argue that advancing dynamic tactile perception requires a systematic hierarchy of dynamic perception capabilities to guide both data collection and model design. To address the lack of tactile data with rich dynamic information, we present ToucHD, a large-scale hierarchical tactile dataset spanning tactile atomic actions, real-world manipulations, and touch-force paired data. Beyond scale, ToucHD establishes a comprehensive tactile dynamic data ecosystem that explicitly supports hierarchical perception capabilities from the data perspective. Building on this dataset, we propose AnyTouch 2, a general tactile representation learning framework for diverse optical tactile sensors that unifies object-level understanding with fine-grained, force-aware dynamic perception. The framework captures both pixel-level and action-specific deformations across frames while explicitly modeling physical force dynamics, thereby learning multi-level dynamic perception capabilities from the model perspective. We evaluate our model on benchmarks covering static object properties and dynamic physical attributes, as well as real-world manipulation tasks spanning multiple tiers of dynamic perception capabilities, from basic object-level understanding to force-aware dexterous manipulation. Experimental results demonstrate consistent, strong performance across sensors and tasks.
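To make the architectural claim concrete, the following is a minimal, hypothetical sketch (not the authors' released code) of the kind of model the abstract describes: a shared temporal encoder over tactile frame sequences feeding both an object-level head (e.g., material classification) and a per-frame force-regression head. All module names, dimensions, and the force-label format are assumptions for illustration only.

```python
# Hypothetical sketch of a dual-head tactile sequence model, assuming PyTorch.
# Not the AnyTouch 2 implementation; shapes and heads are illustrative guesses.
import torch
import torch.nn as nn


class TactileSequenceEncoder(nn.Module):
    """Encodes a sequence of tactile images into per-frame tokens."""

    def __init__(self, embed_dim: int = 256, num_layers: int = 4):
        super().__init__()
        # Per-frame CNN backbone (assumed; the paper uses its own encoder).
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, embed_dim),
        )
        # Temporal transformer to model cross-frame deformation dynamics.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, channels, height, width)
        b, t = frames.shape[:2]
        tokens = self.frame_encoder(frames.flatten(0, 1)).view(b, t, -1)
        return self.temporal(tokens)  # (batch, time, embed_dim)


class DualHeadTactileModel(nn.Module):
    """Shared encoder with an object-level head and a force-aware head."""

    def __init__(self, embed_dim: int = 256, num_classes: int = 20):
        super().__init__()
        self.encoder = TactileSequenceEncoder(embed_dim)
        self.object_head = nn.Linear(embed_dim, num_classes)  # e.g. material
        self.force_head = nn.Linear(embed_dim, 3)  # assumed per-frame 3D force

    def forward(self, frames: torch.Tensor):
        tokens = self.encoder(frames)
        object_logits = self.object_head(tokens.mean(dim=1))  # pooled over time
        forces = self.force_head(tokens)                      # (b, t, 3)
        return object_logits, forces


if __name__ == "__main__":
    model = DualHeadTactileModel()
    clip = torch.randn(2, 8, 3, 64, 64)  # two clips of 8 tactile frames each
    logits, forces = model(clip)
    print(logits.shape, forces.shape)  # torch.Size([2, 20]) torch.Size([2, 8, 3])
```

The split into a time-pooled object head and a per-frame force head mirrors the abstract's distinction between static object-level understanding and fine-grained, force-aware dynamic perception; any real training setup (losses, sensor-specific preprocessing, touch-force pairing) would come from the paper itself.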
URL
https://arxiv.org/abs/2602.09617