Fovea Stacking: Imaging with Dynamic Localized Aberration Correction

2025-05-31 21:15:27
Shi Mao, Yogeshwar Mishra, Wolfgang Heidrich

Abstract

The desire for cameras with smaller form factors has recently led to a push for exploring computational imaging systems with reduced optical complexity, such as a smaller number of lens elements. Unfortunately, such simplified optical systems usually suffer from severe aberrations, especially in off-axis regions, which can be difficult to correct purely in software. In this paper, we introduce Fovea Stacking, a new type of imaging system that utilizes emerging dynamic optical components called deformable phase plates (DPPs) for localized aberration correction anywhere on the image sensor. By optimizing DPP deformations through a differentiable optical model, off-axis aberrations are corrected locally, producing a foveated image with enhanced sharpness at the fixation point, analogous to the eye's fovea. Stacking multiple such foveated images, each with a different fixation point, yields a composite image free from aberrations. To efficiently cover the entire field of view, we propose joint optimization of DPP deformations under imaging budget constraints. Due to the DPP device's non-linear behavior, we introduce a neural network-based control model for improved alignment between simulated and hardware performance. We further demonstrate that for extended depth-of-field imaging, fovea stacking outperforms traditional focus stacking in image quality. By integrating object detection or eye-tracking, the system can dynamically adjust the lens to track the object of interest, enabling real-time foveated video suitable for downstream applications such as surveillance or foveated virtual reality displays.
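
The stacking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the foveated captures are merged, so the nearest-fixation rule below is an assumption made purely for illustration, and the names `stack_foveated`, `images`, and `fixations` are hypothetical rather than taken from the paper. Each output pixel is copied from the capture whose fixation point lies closest to it.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def stack_foveated(images: Sequence[Image],
                   fixations: Sequence[Tuple[int, int]]) -> Image:
    """Compose one image by taking each pixel from the capture whose
    fixation point (row, col) is nearest to that pixel.

    Hypothetical nearest-fixation rule, assumed for illustration only;
    it is not the merging method of the paper.
    """
    if not images or len(images) != len(fixations):
        raise ValueError("need one fixation point per image")
    height, width = len(images[0]), len(images[0][0])
    out: Image = []
    for r in range(height):
        row = []
        for c in range(width):
            # Squared distance from this pixel to every fixation point.
            dists = [(r - fr) ** 2 + (c - fc) ** 2 for fr, fc in fixations]
            best = dists.index(min(dists))  # ties go to the earliest capture
            row.append(images[best][r][c])
        out.append(row)
    return out


if __name__ == "__main__":
    # Two 2x4 captures: one fixated at the left edge, one at the right edge.
    left = [[1.0] * 4 for _ in range(2)]
    right = [[2.0] * 4 for _ in range(2)]
    print(stack_foveated([left, right], [(0, 0), (0, 3)]))
```

In this toy case the left half of the composite comes from the first capture and the right half from the second. A practical system would more likely blend captures smoothly near region boundaries instead of switching abruptly, but that is likewise an assumption and not something the abstract specifies.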

URL

https://arxiv.org/abs/2506.00716

PDF

https://arxiv.org/pdf/2506.00716.pdf

