Abstract
In this paper, we investigate the impact of integrating tactile feedback into video prediction models for physical robot interactions. Predicting the effect of robotic actions on the environment is a fundamental challenge in robotics. Current methods use visual and robot-action data to generate video predictions over a given time horizon, which can then be used to adjust robot actions. However, humans rely on both visual and tactile feedback to develop and maintain a mental model of their physical surroundings. We propose three multi-modal integration approaches and compare the performance of the resulting tactile-enhanced video prediction models. Additionally, we introduce two new robot-pushing datasets that use a magnetic-based tactile sensor for unsupervised learning. The first dataset contains visually identical objects with different physical properties, while the second mimics existing robot-pushing datasets of household object clusters. Our results demonstrate that incorporating tactile feedback into video prediction models improves scene prediction accuracy and enhances the agent's perception of physical interactions and its understanding of cause-effect relationships.
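The abstract does not specify the three multi-modal integration approaches. As a minimal, purely illustrative sketch, one common option is late fusion: concatenating a visual latent vector with a flattened tactile reading before the prediction step. Everything below (shapes, function names, the linear prediction stand-in) is an assumption for illustration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_features(visual_latent, tactile_reading):
    """Late fusion (assumed scheme): concatenate a visual latent vector
    with a flattened tactile sensor reading."""
    return np.concatenate([visual_latent, tactile_reading.ravel()])

def predict_next_latent(fused, W, b):
    """Toy linear stand-in for a learned video-prediction step
    operating on the fused visuo-tactile representation."""
    return np.tanh(W @ fused + b)

visual_latent = rng.standard_normal(128)          # e.g. per-frame encoder output
tactile_reading = rng.standard_normal((4, 4, 3))  # e.g. 4x4 taxels, 3-axis magnetic field

fused = fuse_features(visual_latent, tactile_reading)   # shape (176,)
W = rng.standard_normal((128, fused.size)) * 0.01
b = np.zeros(128)
next_latent = predict_next_latent(fused, W, b)          # shape (128,)
print(fused.shape, next_latent.shape)
```

In a real model the linear step would be replaced by a recurrent or convolutional video-prediction network, and the fusion point (early, late, or per-layer) is exactly the kind of design choice the paper's three approaches compare.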
URL
https://arxiv.org/abs/2304.11193