Paper Reading AI Learner

Princeton365: A Diverse Dataset with Accurate Camera Pose

2025-06-10 17:57:00
Karhan Kayan, Stamatis Alexandropoulos, Rishabh Jain, Yiming Zuo, Erich Liang, Jia Deng

Abstract

We introduce Princeton365, a large-scale diverse dataset of 365 videos with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground-truth collection framework that leverages calibration boards and a 360-camera. We collect indoor, outdoor, and object-scanning videos with synchronized monocular and stereo RGB video outputs as well as IMU. We further propose a new scene scale-aware evaluation metric for SLAM based on the optical flow induced by the camera pose estimation error. In contrast to existing metrics such as Average Trajectory Error (ATE), our new metric allows the performance of SLAM methods to be compared across scenes, enabling researchers to analyze the failure modes of their methods. We also propose a challenging Novel View Synthesis benchmark that covers cases not addressed by current NVS benchmarks, such as fully non-Lambertian scenes with 360-degree camera trajectories. Please visit this https URL for the dataset, code, videos, and submission.
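The abstract's key technical idea is measuring pose error by the optical flow it induces in the image, rather than by trajectory distance alone. The paper's exact formulation is not given here, so the following is only a minimal sketch of that idea under assumed names and conventions: project the same 3D points through the ground-truth pose and the estimated pose, and report the mean pixel displacement between the two projections.

```python
import numpy as np

# Hypothetical sketch of an "induced optical flow" pose-error metric.
# The function names, pose convention (world-to-camera R, t), and the
# choice of mean pixel displacement are assumptions, not the paper's code.

def project(K, R, t, pts):
    """Project Nx3 world points through pose (R, t) with intrinsics K."""
    cam = (R @ pts.T + t.reshape(3, 1)).T   # world -> camera coordinates
    uv = (K @ cam.T).T                      # camera -> homogeneous image coords
    return uv[:, :2] / uv[:, 2:3]           # perspective divide -> pixels

def induced_flow_error(K, pose_gt, pose_est, pts):
    """Mean pixel displacement induced by the pose estimation error."""
    R_gt, t_gt = pose_gt
    R_est, t_est = pose_est
    flow = project(K, R_est, t_est, pts) - project(K, R_gt, t_gt, pts)
    return np.linalg.norm(flow, axis=1).mean()

# Toy example: identity ground-truth pose vs. a 1 cm translation error,
# evaluated on random points 4-8 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pts = np.random.default_rng(0).uniform([-1, -1, 4], [1, 1, 8], (100, 3))
gt = (np.eye(3), np.zeros(3))
est = (np.eye(3), np.array([0.01, 0.0, 0.0]))
print(induced_flow_error(K, gt, est, pts))
```

Because the error is expressed in pixels of induced flow, it naturally accounts for scene scale and camera geometry: the same 1 cm translation error produces large flow against nearby objects and negligible flow in a distant outdoor scene, which is what makes cross-scene comparison plausible.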

Abstract (translated)

We introduce Princeton365, a large-scale diverse dataset of 365 videos, each with accurate camera pose. Our dataset bridges the gap between accuracy and data diversity in current SLAM benchmarks by introducing a novel ground-truth collection framework that leverages calibration boards and a 360-degree camera. We collect indoor, outdoor, and object-scanning videos, including synchronized monocular and stereo RGB video outputs as well as IMU (inertial measurement unit) data. We further propose a new scene scale-aware SLAM evaluation metric based on the optical flow induced by the camera pose estimation error. In contrast to existing metrics such as Average Trajectory Error (ATE), our new metric allows the performance of SLAM methods to be compared across scenes, enabling researchers to analyze the failure modes of their methods. We also propose a challenging Novel View Synthesis benchmark covering cases not addressed by current NVS (Novel View Synthesis) benchmarks, such as fully non-Lambertian scenes and 360-degree camera trajectories. Please visit this [link](https://example.com) for the dataset, code, videos, and submission guidelines.

URL

https://arxiv.org/abs/2506.09035

PDF

https://arxiv.org/pdf/2506.09035.pdf
