Abstract
Camera rotation estimation from a single image is a challenging task, often requiring depth data and/or camera intrinsics, which are generally not available for in-the-wild videos. Although external sensors such as inertial measurement units (IMUs) can help, they often suffer from drift and are not applicable in non-inertial reference frames. We present U-ARE-ME, an algorithm that estimates camera rotation along with uncertainty from uncalibrated RGB images. Using a Manhattan World assumption, our method leverages the per-pixel geometric priors encoded in single-image surface normal predictions and performs optimisation over the SO(3) manifold. Given a sequence of images, we can use the per-frame rotation estimates and their uncertainty to perform multi-frame optimisation, achieving robustness and temporal consistency. Our experiments demonstrate that U-ARE-ME performs comparably to RGB-D methods and is more robust than sparse feature-based SLAM methods. We encourage the reader to view the accompanying video at this https URL for a visual overview of our method.
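The core idea of aligning per-pixel surface normals to a Manhattan frame by optimising over SO(3) can be illustrated with a minimal sketch. This is not the authors' implementation: U-ARE-ME performs uncertainty-weighted optimisation on the SO(3) manifold, whereas the toy version below simply parameterises rotation as a rotation vector and locally maximises the mean alignment of rotated normals with the nearest canonical axis (the `manhattan_cost` and `estimate_rotation` names are illustrative, not from the paper).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def manhattan_cost(rotvec, normals):
    """Negative mean alignment of rotated normals with the closest Manhattan axis.

    Under a Manhattan World assumption, every scene normal should point
    along one of the six canonical directions (+/- x, y, z) once rotated
    into the world frame, so |cos| to the nearest axis should be ~1.
    """
    R = Rotation.from_rotvec(rotvec).as_matrix()
    rotated = normals @ R.T                 # (N, 3): R applied to each normal
    alignment = np.max(np.abs(rotated), axis=1)  # |cos| to nearest axis
    return -np.mean(alignment)              # minimise the negative alignment

def estimate_rotation(normals, init=np.zeros(3)):
    """Locally optimise the camera rotation from predicted unit normals (N, 3)."""
    res = minimize(manhattan_cost, init, args=(normals,), method="Nelder-Mead")
    return Rotation.from_rotvec(res.x)
```

Note the rotation-vector chart is only a local parameterisation and the cost has the usual 24-fold Manhattan symmetry, so a local optimiser converges to the equivalent rotation nearest its initialisation; the paper's per-frame uncertainty and multi-frame smoothing are omitted here.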
URL
https://arxiv.org/abs/2403.15583