Paper Reading AI Learner

Fast 2DGS: Efficient Image Representation with Deep Gaussian Prior

2025-12-14 17:23:28
Hao Wang, Ashish Bastola, Chaoyi Zhou, Wenhui Zhu, Xiwen Chen, Xuanzhao Dong, Siyu Huang, Abolfazl Razi

Abstract

As generative models become increasingly capable of producing high-fidelity visual content, the demand for efficient, interpretable, and editable image representations has grown substantially. Recent advances in 2D Gaussian Splatting (2DGS) have emerged as a promising solution, offering explicit control, high interpretability, and real-time rendering capabilities (>1000 FPS). However, high-quality 2DGS typically requires post-optimization. Existing methods rely on random or heuristic initialization (e.g., gradient maps), which is often insensitive to image complexity and leads to slow convergence (>10 s). More recent approaches introduce learnable networks to predict initial Gaussian configurations, but at the cost of increased computational and architectural complexity. To bridge this gap, we present Fast-2DGS, a lightweight framework for efficient Gaussian image representation. Specifically, we introduce a Deep Gaussian Prior, implemented as a conditional network that captures the spatial distribution of Gaussian primitives under varying image complexity. In addition, we propose an attribute regression network to predict dense Gaussian properties. Experiments demonstrate that this disentangled architecture achieves high-quality reconstruction in a single forward pass, followed by minimal fine-tuning. More importantly, our approach significantly reduces computational cost without compromising visual quality, bringing 2DGS closer to industry-ready deployment.
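To make the underlying representation concrete: 2DGS models an image as an alpha-weighted sum of anisotropic 2D Gaussian primitives, each carrying a mean position, per-axis scales, a rotation angle, a color, and an opacity. The following is a minimal NumPy sketch of that rendering idea only; it is not the paper's implementation, and all function and parameter names here are illustrative assumptions.

```python
import numpy as np

def render_2dgs(means, scales, thetas, colors, opacities, H, W):
    """Render an H x W RGB image as a normalized, opacity-weighted sum
    of 2D Gaussians. A conceptual sketch of 2DGS rendering, not the
    Fast-2DGS implementation (which is optimized for >1000 FPS)."""
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys], axis=-1).astype(np.float64)  # (H, W, 2), (x, y)
    img = np.zeros((H, W, 3))
    weight = np.zeros((H, W, 1))
    for mu, s, th, c, o in zip(means, scales, thetas, colors, opacities):
        # Build the covariance from rotation and per-axis scales:
        # cov = R diag(s^2) R^T
        R = np.array([[np.cos(th), -np.sin(th)],
                      [np.sin(th),  np.cos(th)]])
        cov = R @ np.diag(np.square(s)) @ R.T
        d = pix - np.asarray(mu)                      # offsets to the mean
        sol = d @ np.linalg.inv(cov)                  # (H, W, 2)
        g = np.exp(-0.5 * np.sum(sol * d, axis=-1))   # Gaussian falloff
        w = (o * g)[..., None]                        # opacity-weighted mask
        img += w * np.asarray(c)
        weight += w
    # Normalize by accumulated weight; empty pixels stay black.
    return img / np.maximum(weight, 1e-8)

# Usage: two primitives on a 32x32 canvas (illustrative values).
img = render_2dgs(
    means=[(8.0, 8.0), (24.0, 24.0)],
    scales=[(3.0, 5.0), (4.0, 2.0)],
    thetas=[0.0, 0.7],
    colors=[(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    opacities=[1.0, 0.8],
    H=32, W=32,
)
```

In a full pipeline, the per-primitive attributes above are exactly what the paper's attribute regression network would predict, while the Deep Gaussian Prior governs where the means are placed; here they are simply hand-set for illustration.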


URL

https://arxiv.org/abs/2512.12774

PDF

https://arxiv.org/pdf/2512.12774.pdf
