Collision-aware In-hand 6D Object Pose Estimation using Multiple Vision-based Tactile Sensors

2023-01-31 14:35:26
Gabriele M. Caddeo, Nicola A. Piga, Fabrizio Bottarel, Lorenzo Natale

Abstract

In this paper, we address the problem of estimating the in-hand 6D pose of an object in contact with multiple vision-based tactile sensors. We reason on the possible spatial configurations of the sensors along the object surface. Specifically, we filter contact hypotheses using geometric reasoning and a Convolutional Neural Network (CNN), trained on simulated object-agnostic images, to promote those that better comply with the actual tactile images from the sensors. We use the selected sensor configurations to optimize over the space of 6D poses using a Gradient Descent-based approach. Finally, we rank the obtained poses by penalizing those that are in collision with the sensors. We carry out experiments in simulation using the DIGIT vision-based sensor with several objects from the standard YCB model set. The results demonstrate that our approach estimates object poses that are compatible with actual object-sensor contacts in $87.5\%$ of cases while reaching an average positional error on the order of $2$ centimeters. Our analysis also includes qualitative results of experiments with a real DIGIT sensor.
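
The final ranking step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes, purely for illustration, that each sensor is approximated by a sphere, that the object is a set of surface points already transformed by the candidate pose, and that the score is the optimization error plus a weighted penetration depth. The names `penetration_depth`, `rank_poses`, and `weight` are hypothetical.

```python
import math

def penetration_depth(object_points, sensor_center, sensor_radius):
    """Deepest penetration of any object point into a spherical sensor proxy.

    Returns 0.0 when no point lies inside the sphere (assumed sensor shape).
    """
    deepest = 0.0
    for point in object_points:
        distance = math.dist(point, sensor_center)
        deepest = max(deepest, sensor_radius - distance)
    return deepest

def rank_poses(candidates, sensors, weight=10.0):
    """Order candidate poses from best to worst.

    candidates: list of (name, fit_error, object_points), where object_points
                are already expressed in the hand frame for that pose.
    sensors:    list of (center, radius) sphere proxies (an assumption).
    weight:     hypothetical trade-off between fit error and collision.
    """
    scored = []
    for name, fit_error, points in candidates:
        penalty = sum(penetration_depth(points, c, r) for c, r in sensors)
        scored.append((fit_error + weight * penalty, name))
    return [name for _, name in sorted(scored)]

# Toy example: pose A fits slightly better but sinks into the sensor,
# pose B fits slightly worse but stays outside it.
sensors = [((0.0, 0.0, 0.0), 0.01)]
candidates = [
    ("A", 0.001, [(0.005, 0.0, 0.0)]),
    ("B", 0.002, [(0.02, 0.0, 0.0)]),
]
ranking = rank_poses(candidates, sensors)
```

In this toy case the colliding pose A is ranked below pose B despite its lower fit error, which is the qualitative behavior the abstract describes.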

URL

https://arxiv.org/abs/2301.13667

PDF

https://arxiv.org/pdf/2301.13667.pdf

