Abstract
High-quality environment lighting is the foundation of immersive user experiences in mobile augmented reality (AR) applications. However, achieving visually coherent environment lighting estimation for mobile AR is challenging because of the limited sensing capabilities of AR devices, notably the narrow field of view (FoV) and low pixel dynamic range of device cameras. Recent advances in generative AI, which can produce high-quality images from different types of prompts, including text and images, offer a potential solution for high-quality lighting estimation. Still, to use generative image diffusion models effectively, we must address two of their key limitations: hallucinated content and slow inference. To do so, in this work, we design and implement a generative lighting estimation system called CleAR that produces high-quality and diverse environment maps in the form of 360$^\circ$ images. Specifically, we design a two-step generation pipeline guided by AR environment context data to ensure that the results follow the visual context and color appearance of the physical environment. To improve estimation robustness under different lighting conditions, we design a real-time refinement component that adjusts lighting estimation results on AR devices. To train and test our generative models, we curate a large-scale environment lighting estimation dataset with diverse lighting conditions. Through quantitative evaluation and a user study, we show that CleAR outperforms state-of-the-art lighting estimation methods in both estimation accuracy and robustness. Moreover, CleAR supports real-time refinement of lighting estimation results, ensuring robust and timely environment lighting updates for AR applications. Our end-to-end generative estimation completes in as little as 3.2 seconds, 110x faster than state-of-the-art methods.
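To make the camera FoV limitation concrete, the following minimal sketch estimates what fraction of the full 360$^\circ$ environment a single camera frame can observe. It is an illustration only and is not part of CleAR: the function name `fov_sphere_fraction` and the example FoV values are assumptions, and the calculation treats the camera as an ideal rectangular pinhole frustum.

```python
import math


def fov_sphere_fraction(h_fov_deg: float, v_fov_deg: float) -> float:
    """Fraction of the full sphere covered by a rectangular camera frustum.

    Uses the solid angle of a rectangular pyramid,
        omega = 4 * asin(sin(h / 2) * sin(v / 2)),
    divided by the 4 * pi steradians of a full sphere.
    Hypothetical helper for illustration; assumes an ideal pinhole camera.
    """
    half_h = math.radians(h_fov_deg) / 2.0
    half_v = math.radians(v_fov_deg) / 2.0
    # Clamp against floating-point overshoot before calling asin.
    product = min(1.0, math.sin(half_h) * math.sin(half_v))
    omega = 4.0 * math.asin(product)
    return omega / (4.0 * math.pi)


if __name__ == "__main__":
    # Assumed example FoV values, not measurements of any specific device.
    for h_fov, v_fov in [(60.0, 45.0), (90.0, 90.0), (180.0, 180.0)]:
        frac = fov_sphere_fraction(h_fov, v_fov)
        print(f"{h_fov:.0f} x {v_fov:.0f} deg covers {frac:.1%} of the sphere")
```

A 90$^\circ$ by 90$^\circ$ frustum corresponds to one face of a cube map, so it covers exactly one sixth of the sphere; narrower FoVs cover less, which leaves most of the environment map to be inferred rather than observed.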
URL
https://arxiv.org/abs/2411.02179