CleAR: Robust Context-Guided Generative Lighting Estimation for Mobile Augmented Reality

2024-11-04 15:37:18
Yiqin Zhao, Mallesham Dasari, Tian Guo

Abstract

High-quality environment lighting is the foundation of immersive user experiences in mobile augmented reality (AR) applications. However, achieving visually coherent environment lighting estimation for mobile AR is challenging because of several key limitations in AR device sensing, including the limited field of view (FoV) and pixel dynamic range of device cameras. Recent advancements in generative AI, which can generate high-quality images from different types of prompts, including text and images, present a potential solution for high-quality lighting estimation. Still, to use generative image diffusion models effectively, we must address two of their key limitations: hallucinated content and slow inference. To do so, in this work, we design and implement a generative lighting estimation system called CleAR that can produce high-quality and diverse environment maps in the format of 360° images. Specifically, we design a two-step generation pipeline guided by AR environment context data to ensure the results follow the visual context and color appearance of the physical environment. To improve estimation robustness under different lighting conditions, we design a real-time refinement component that adjusts lighting estimation results on AR devices. To train and test our generative models, we curate a large-scale environment lighting estimation dataset with diverse lighting conditions. Through quantitative evaluation and a user study, we show that CleAR outperforms state-of-the-art lighting estimation methods in both estimation accuracy and robustness. Moreover, CleAR supports real-time refinement of lighting estimation results, ensuring robust and timely environment lighting updates for AR applications. Our end-to-end generative estimation completes in as little as 3.2 seconds, 110x faster than state-of-the-art methods.

URL

https://arxiv.org/abs/2411.02179

PDF

https://arxiv.org/pdf/2411.02179.pdf

