U-ARE-ME: Uncertainty-Aware Rotation Estimation in Manhattan Environments

2024-03-22 19:14:28
Aalok Patwardhan, Callum Rhodes, Gwangbin Bae, Andrew J. Davison

Abstract

Camera rotation estimation from a single image is a challenging task, often requiring depth data and/or camera intrinsics, which are generally not available for in-the-wild videos. Although external sensors such as inertial measurement units (IMUs) can help, they often suffer from drift and are not applicable in non-inertial reference frames. We present U-ARE-ME, an algorithm that estimates camera rotation along with uncertainty from uncalibrated RGB images. Using a Manhattan World assumption, our method leverages the per-pixel geometric priors encoded in single-image surface normal predictions and performs optimisation over the SO(3) manifold. Given a sequence of images, we can use the per-frame rotation estimates and their uncertainty to perform multi-frame optimisation, achieving robustness and temporal consistency. Our experiments demonstrate that U-ARE-ME performs comparably to RGB-D methods and is more robust than sparse feature-based SLAM methods. We encourage the reader to view the accompanying video at this https URL for a visual overview of our method.
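To make the core idea concrete, here is a minimal illustrative sketch (not the paper's actual implementation, which performs uncertainty-weighted optimisation on the SO(3) manifold): under a Manhattan World assumption, the camera rotation can be recovered by finding the rotation that best aligns predicted per-pixel surface normals with the three canonical axes. The cost function and optimiser below are simplified stand-ins chosen for clarity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

# Canonical Manhattan directions (x, y, z); scene normals are assumed to
# lie along one of these axes, up to sign.
AXES = np.eye(3)

def manhattan_cost(rotvec, normals):
    """Misalignment between each rotated normal and its closest Manhattan
    axis (sign-invariant). A simplified, unweighted stand-in for the
    paper's uncertainty-aware objective."""
    R = Rotation.from_rotvec(rotvec).as_matrix()
    rotated = normals @ R.T            # (N, 3) normals in the world frame
    dots = np.abs(rotated @ AXES.T)    # |cos| to each of the three axes
    return np.sum(1.0 - dots.max(axis=1))

def estimate_rotation(normals, init=np.zeros(3)):
    """Recover the camera rotation from predicted surface normals by
    minimising the Manhattan alignment cost over a rotation vector."""
    res = minimize(manhattan_cost, init, args=(normals,),
                   method="Nelder-Mead")
    return Rotation.from_rotvec(res.x).as_matrix()

# Demo: synthesise noisy normals from a known camera rotation, then
# recover that rotation (hypothetical data, for illustration only).
rng = np.random.default_rng(0)
R_true = Rotation.from_euler("xyz", [5, -8, 3], degrees=True).as_matrix()
scene_axes = AXES[rng.integers(0, 3, size=500)]   # true Manhattan normals
normals = scene_axes @ R_true                     # express in camera frame
normals += 0.01 * rng.standard_normal(normals.shape)  # measurement noise
normals /= np.linalg.norm(normals, axis=1, keepdims=True)
R_est = estimate_rotation(normals)
print(manhattan_cost(Rotation.from_matrix(R_est).as_rotvec(), normals))
```

In this toy setup the recovered rotation matches the ground truth because the optimiser is initialised near identity; the true Manhattan alignment is 24-fold ambiguous, and the paper's multi-frame optimisation additionally uses per-frame uncertainty to keep the estimate temporally consistent, which this sketch omits.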

URL

https://arxiv.org/abs/2403.15583

PDF

https://arxiv.org/pdf/2403.15583.pdf
