Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation

2023-03-22 00:55:53
Heng Yang, Marco Pavone

Abstract

The two-stage object pose estimation paradigm first detects semantic keypoints on the image and then estimates the 6D pose by minimizing reprojection errors. Despite performing well on standard benchmarks, existing techniques offer no provable guarantees on the quality and uncertainty of the estimate. In this paper, we inject two fundamental changes, namely conformal keypoint detection and geometric uncertainty propagation, into the two-stage paradigm and propose the first pose estimator that endows an estimate with provable and computable worst-case error bounds. On the one hand, conformal keypoint detection applies the statistical machinery of inductive conformal prediction to convert heuristic keypoint detections into circular or elliptical prediction sets that cover the groundtruth keypoints with a user-specified marginal probability (e.g., 90%). On the other hand, geometric uncertainty propagation propagates the geometric constraints on the keypoints to the 6D object pose, leading to a Pose UnceRtainty SEt (PURSE) that guarantees coverage of the groundtruth pose with the same probability. The PURSE, however, is a nonconvex set that does not directly yield estimated poses and uncertainties. Therefore, we develop RANdom SAmple averaGing (RANSAG) to compute an average pose and apply semidefinite relaxation to upper-bound the worst-case errors between the average pose and the groundtruth. On the LineMOD Occlusion dataset we demonstrate that: (i) the PURSE covers the groundtruth with valid probabilities; (ii) the worst-case error bounds provide correct uncertainty quantification; and (iii) the average pose achieves accuracy better than or similar to that of representative methods based on sparse keypoints.
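
To make the conformal keypoint detection step more concrete, here is a minimal sketch of inductive (split) conformal prediction producing a circular prediction set for a single keypoint. It is an illustration under stated assumptions, not the authors' implementation: the nonconformity score is assumed to be the plain pixel distance between the detected and groundtruth keypoint (the paper's score function may differ), the detector is simulated with synthetic Gaussian noise, and the names `conformal_radius` and `in_prediction_set` are hypothetical.

```python
import math
import random


def conformal_radius(cal_scores, alpha):
    """Return the split-conformal quantile of the calibration scores.

    cal_scores: nonconformity scores on a held-out calibration set
                (assumed here: pixel distance, detection to groundtruth).
    alpha:      allowed miscoverage, e.g. 0.1 for 90% marginal coverage.
    """
    n = len(cal_scores)
    # Finite-sample-corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial set is valid.
        return math.inf
    return sorted(cal_scores)[k - 1]


def in_prediction_set(detected, candidate, radius):
    """True if `candidate` lies in the circular set centred on `detected`."""
    return math.dist(detected, candidate) <= radius


if __name__ == "__main__":
    rng = random.Random(0)

    def synthetic_pair():
        # Hypothetical detector: groundtruth plus isotropic Gaussian pixel noise.
        truth = (rng.uniform(0, 640), rng.uniform(0, 480))
        detected = (truth[0] + rng.gauss(0, 3), truth[1] + rng.gauss(0, 3))
        return truth, detected

    # Calibrate the radius on held-out (groundtruth, detection) pairs.
    calibration = [synthetic_pair() for _ in range(500)]
    scores = [math.dist(t, d) for t, d in calibration]
    radius = conformal_radius(scores, alpha=0.1)

    # Check how often the circle around a fresh detection contains the groundtruth.
    test = [synthetic_pair() for _ in range(2000)]
    covered = sum(in_prediction_set(d, t, radius) for t, d in test)
    print(f"radius = {radius:.2f} px, empirical coverage = {covered / len(test):.3f}")
```

Provided the calibration and test pairs are exchangeable, the circle of this radius around a new detection contains the groundtruth keypoint with probability at least 1 - alpha. The guarantee is marginal, that is, averaged over calibration sets and test points, which matches the "user-specified marginal probability" wording in the abstract.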

URL

https://arxiv.org/abs/2303.12246

PDF

https://arxiv.org/pdf/2303.12246.pdf

