Abstract
In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason about the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensor configurations to optimize over the space of 6D poses using a Gradient Descent-based approach. We finally rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based sensor with several objects from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with actual object-sensor contacts in $87.5\%$ of cases while reaching an average positional error on the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
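The Gradient Descent-based pose optimization described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the object model is available as a point cloud, parameterizes the 6D pose as an axis-angle rotation plus a translation, and minimizes the mean squared distance from sensed contact points to the nearest model point with plain gradient descent using central-difference gradients. All function names and parameters are illustrative.

```python
import numpy as np

def rot_from_axis_angle(w):
    """Rodrigues' formula: 3x3 rotation matrix from an axis-angle vector w."""
    theta = np.linalg.norm(w)
    if theta < 1e-9:
        return np.eye(3)
    k = w / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def contact_loss(pose, model_pts, contact_pts):
    """Mean squared distance from each sensed contact point to the nearest
    point of the object model placed at `pose` = (axis-angle | translation)."""
    R, t = rot_from_axis_angle(pose[:3]), pose[3:]
    world = model_pts @ R.T + t
    d = np.linalg.norm(contact_pts[:, None, :] - world[None, :, :], axis=-1)
    return float(np.mean(d.min(axis=1) ** 2))

def refine_pose(model_pts, contact_pts, pose0, lr=0.3, iters=100, eps=1e-5):
    """Plain gradient descent on the 6D pose, with central-difference
    gradients so no analytic Jacobian of the loss is required."""
    pose = np.asarray(pose0, dtype=float).copy()
    for _ in range(iters):
        g = np.zeros(6)
        for i in range(6):
            dp = np.zeros(6)
            dp[i] = eps
            g[i] = (contact_loss(pose + dp, model_pts, contact_pts)
                    - contact_loss(pose - dp, model_pts, contact_pts)) / (2 * eps)
        pose -= lr * g
    return pose
```

In the paper, candidate sensor configurations selected by the CNN would supply the contact points, and the resulting poses would then be ranked by a collision penalty; here the optimizer alone is shown for clarity.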
URL
https://arxiv.org/abs/2301.13667