Paper Reading AI Learner

DIME-Net: Neural Network-Based Dynamic Intrinsic Parameter Rectification for Cameras with Optical Image Stabilization System

2023-03-20 17:45:12
Shu-Hao Yeh, Shuangyu Xie, Di Wang, Wei Yan, Dezhen Song

Abstract

The Optical Image Stabilization (OIS) system in mobile devices reduces image blurring by steering the lens to compensate for hand jitter. However, OIS dynamically changes the intrinsic camera parameters (i.e., the $\mathrm{K}$ matrix), which hinders accurate camera pose estimation and 3D reconstruction. Here we propose a novel neural network-based approach that estimates the $\mathrm{K}$ matrix in real time so that pose estimation or scene reconstruction can run at the camera's native resolution for the highest accuracy on mobile devices. Our network design takes the gratified projection model discrepancy feature and 3D point positions as inputs and employs a Multi-Layer Perceptron (MLP) to approximate the $f_{\mathrm{K}}$ manifold. We also design a unique training scheme for this network by introducing a back-propagated PnP (BPnP) layer so that the reprojection error can be adopted as the loss function. The training process utilizes precise calibration patterns to capture the $f_{\mathrm{K}}$ manifold accurately, but the trained network can be used anywhere. We name the proposed Dynamic Intrinsic Manifold Estimation network DIME-Net and have implemented and tested it on three different mobile devices. In all cases, DIME-Net reduces the reprojection error by at least $64\%$, indicating that our design is successful.
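The reprojection-error loss that the BPnP-based training scheme minimizes can be illustrated with a minimal NumPy sketch. All function names, the toy geometry, and the simulated OIS principal-point shift below are our own illustration, not from the paper:

```python
import numpy as np

def project(K, R, t, X):
    """Project Nx3 world points X into the image with intrinsics K and pose (R, t)."""
    Xc = X @ R.T + t             # camera-frame coordinates, shape (N, 3)
    x = Xc @ K.T                 # homogeneous image coordinates
    return x[:, :2] / x[:, 2:3]  # pixel coordinates, shape (N, 2)

def reprojection_error(K, R, t, X, uv_obs):
    """Mean L2 reprojection error (in pixels) against observed 2D points."""
    uv = project(K, R, t, X)
    return float(np.mean(np.linalg.norm(uv - uv_obs, axis=1)))

# Toy example: ground-truth intrinsics vs. a stale, OIS-perturbed estimate.
K_true = np.array([[1200.0,    0.0, 640.0],
                   [   0.0, 1200.0, 360.0],
                   [   0.0,    0.0,   1.0]])
K_bad = K_true.copy()
K_bad[0, 2] += 5.0  # simulate an OIS lens shift moving the principal point

R, t = np.eye(3), np.zeros(3)
X = np.array([[0.1, 0.2, 2.0], [-0.3, 0.1, 3.0], [0.2, -0.2, 2.5]])
uv_obs = project(K_true, R, t, X)  # "observed" pixels under the true K

print(reprojection_error(K_true, R, t, X, uv_obs))  # ~0: correct K
print(reprojection_error(K_bad,  R, t, X, uv_obs))  # ~5 px: stale K
```

This is the error a dynamic $\mathrm{K}$ estimate must drive down: even a few pixels of OIS-induced principal-point shift produces a matching reprojection error at native resolution, which is why the paper trains the network end to end against this quantity.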

URL

https://arxiv.org/abs/2303.11307

PDF

https://arxiv.org/pdf/2303.11307.pdf

