NeRF-XL: Scaling NeRFs with Multiple GPUs

2024-04-24 21:43:15
Ruilong Li, Sanja Fidler, Angjoo Kanazawa, Francis Williams

Abstract

We present NeRF-XL, a principled method for distributing Neural Radiance Fields (NeRFs) across multiple GPUs, thus enabling the training and rendering of NeRFs with an arbitrarily large capacity. We begin by revisiting existing multi-GPU approaches, which decompose large scenes into multiple independently trained NeRFs, and identify several fundamental issues with these methods that hinder improvements in reconstruction quality as additional computational resources (GPUs) are used in training. NeRF-XL remedies these issues and enables the training and rendering of NeRFs with an arbitrary number of parameters by simply using more hardware. At the core of our method lies a novel distributed training and rendering formulation, which is mathematically equivalent to the classic single-GPU case and minimizes communication between GPUs. By unlocking NeRFs with arbitrarily large parameter counts, our approach is the first to reveal multi-GPU scaling laws for NeRFs, showing improvements in reconstruction quality with larger parameter counts and speed improvements with more GPUs. We demonstrate the effectiveness of NeRF-XL on a wide variety of datasets, including the largest open-source dataset to date, MatrixCity, containing 258K images covering a 25km^2 city area.
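The central property the abstract appeals to, a distributed rendering formulation that stays mathematically equivalent to the single-GPU case, can be illustrated with the standard volume rendering equation: if the samples along a ray are split into consecutive segments (one per GPU), each segment's partial color and transmittance compose exactly into the full result. The sketch below is not the authors' implementation (NeRF-XL's actual code is not reproduced here); it is a minimal NumPy illustration with made-up helper names (render_segment, compose_segments) showing why such a split loses nothing.

```python
# Minimal sketch (assumption: illustrative only, not the NeRF-XL codebase).
# Each "GPU" renders a consecutive segment of samples along a ray and returns a
# partial color plus the segment's transmittance; composing the partials
# front-to-back reproduces the single-GPU volume rendering result exactly.
import numpy as np

def render_segment(alphas, colors):
    """Front-to-back compositing within one segment.
    alphas: (n,) opacities in [0, 1); colors: (n, 3) RGB samples."""
    # T_i within the segment: product of (1 - alpha_j) for samples in front of i
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alphas[:-1]]))
    partial_color = ((trans * alphas)[:, None] * colors).sum(axis=0)
    segment_transmittance = np.prod(1.0 - alphas)  # light passing through the segment
    return partial_color, segment_transmittance

def compose_segments(segments):
    """Merge per-segment (color, transmittance) pairs, ordered near-to-far."""
    color, upstream_T = np.zeros(3), 1.0
    for seg_color, seg_T in segments:
        color += upstream_T * seg_color  # attenuate by everything in front
        upstream_T *= seg_T
    return color

# Sanity check: splitting the ray's samples across two workers changes nothing.
rng = np.random.default_rng(0)
alphas, colors = rng.uniform(0.0, 0.9, 8), rng.uniform(size=(8, 3))
full_color, _ = render_segment(alphas, colors)
split_color = compose_segments([render_segment(alphas[:4], colors[:4]),
                                render_segment(alphas[4:], colors[4:])])
assert np.allclose(full_color, split_color)
```

Under a decomposition of this kind, each worker only needs to exchange a per-ray partial color and a scalar transmittance rather than per-sample values, which is one way a distributed formulation can remain exact while keeping inter-GPU communication small, consistent with what the abstract claims.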


URL

https://arxiv.org/abs/2404.16221

PDF

https://arxiv.org/pdf/2404.16221.pdf

