Paper Reading AI Learner

6D Object Pose Estimation from Approximate 3D Models for Orbital Robotics

2023-03-23 13:18:05
Maximilian Ulmer, Maximilian Durner, Martin Sundermeyer, Manuel Stoiber, Rudolph Triebel

Abstract

We present a novel technique to estimate the 6D pose of objects from single images where the 3D geometry of the object is only given approximately and not as a precise 3D model. To achieve this, we employ a dense 2D-to-3D correspondence predictor that regresses 3D model coordinates for every pixel. In addition to the 3D coordinates, our model also estimates the pixel-wise coordinate error to discard correspondences that are likely wrong. This allows us to generate multiple 6D pose hypotheses of the object, which we then refine iteratively using a highly efficient region-based approach. We also introduce a novel pixel-wise posterior formulation by which we can estimate the probability for each hypothesis and select the most likely one. As we show in experiments, our approach is capable of dealing with extreme visual conditions including overexposure, high contrast, or low signal-to-noise ratio. This makes it a powerful technique for the particularly challenging task of estimating the pose of tumbling satellites for in-orbit robotic applications. Our method achieves state-of-the-art performance on the SPEED+ dataset and has won the SPEC2021 post-mortem competition.
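The abstract describes a pipeline of dense per-pixel 3D coordinate regression, error-gated correspondence filtering, and pose-hypothesis generation. The paper's predictor is learned and its refinement is region-based, neither of which is reproduced here; the sketch below only illustrates the classical hypothesis step under stated assumptions: correspondences whose predicted coordinate error exceeds a threshold are discarded (the `max_error` threshold and function names are hypothetical), and a pose hypothesis is recovered from the surviving 2D-to-3D matches with a plain DLT-style PnP solve using NumPy.

```python
import numpy as np

def filter_correspondences(coords_3d, pixels, errors, max_error=0.05):
    """Discard 2D-to-3D correspondences whose predicted per-pixel
    coordinate error exceeds a threshold (threshold is illustrative)."""
    mask = errors < max_error
    return coords_3d[mask], pixels[mask]

def pnp_dlt(pts3d, pts2d, K):
    """Recover a pose hypothesis (R, t) from >=6 2D-to-3D
    correspondences via the Direct Linear Transform."""
    # Normalize pixel coordinates with the camera intrinsics K.
    uv1 = np.hstack([pts2d, np.ones((len(pts2d), 1))])
    xn = (np.linalg.inv(K) @ uv1.T).T
    # Each correspondence contributes two rows to the DLT system.
    A = []
    for (X, Y, Z), (x, y, _) in zip(pts3d, xn):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    # The projection matrix is the null-space direction of A.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    R_raw, t = P[:, :3], P[:, 3]
    # Project the left 3x3 block onto SO(3); its mean singular
    # value recovers the unknown scale of P.
    U, S, Vt2 = np.linalg.svd(R_raw)
    R = U @ Vt2
    t = t / np.mean(S)
    # P is defined only up to sign; pick the sign yielding a rotation.
    if np.linalg.det(R) < 0:
        R, t = -R, -t
    return R, t
```

In the full method, many such hypotheses are generated, refined with the region-based optimizer, and ranked by the pixel-wise posterior; the DLT above stands in for a single hypothesis only.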

URL

https://arxiv.org/abs/2303.13241

PDF

https://arxiv.org/pdf/2303.13241.pdf
