Charge: A Comprehensive Novel View Synthesis Benchmark and Dataset to Bind Them All

2025-12-15 18:33:08
Michal Nazarczuk, Thomas Tanay, Arthur Moreau, Zhensong Zhang, Eduardo Pérez-Pellitero

Abstract

This paper presents a new dataset for Novel View Synthesis, generated from a high-quality, animated film with stunning realism and intricate detail. Our dataset captures a variety of dynamic scenes, complete with detailed textures, lighting, and motion, making it ideal for training and evaluating cutting-edge 4D scene reconstruction and novel view generation models. In addition to high-fidelity RGB images, we provide multiple complementary modalities, including depth, surface normals, object segmentation and optical flow, enabling a deeper understanding of scene geometry and motion. The dataset is organised into three distinct benchmarking scenarios: a dense multi-view camera setup, a sparse camera arrangement, and monocular video sequences, enabling a wide range of experimentation and comparison across varying levels of data sparsity. With its combination of visual richness, high-quality annotations, and diverse experimental setups, this dataset offers a unique resource for pushing the boundaries of view synthesis and 3D vision.


URL

https://arxiv.org/abs/2512.13639

PDF

https://arxiv.org/pdf/2512.13639.pdf

