
UDC-VIT: A Real-World Video Dataset for Under-Display Cameras

2025-01-30 18:13:29
Kyusu Ahn, JiSoo Kim, Sangik Lee, HyunGyu Lee, Byeonghyun Ko, Chanwoo Park, Jaejin Lee

Abstract

An under-display camera (UDC) is an advanced imaging system that places a digital camera lens beneath a display panel, effectively concealing the camera. However, the display panel significantly degrades captured images and videos, introducing low transmittance, blur, noise, and flare. Addressing these issues is challenging because UDC degradation is complex and includes diverse flare patterns. Despite extensive research on UDC images and their restoration models, UDC video restoration remains largely unexplored. Two UDC video datasets exist, but they primarily contain unrealistic or synthetic UDC degradation rather than real-world UDC degradation. In this paper, we propose a real-world UDC video dataset called UDC-VIT. Unlike existing datasets, UDC-VIT is the only one that includes human motions targeting facial recognition. We propose a video-capturing system that simultaneously acquires non-degraded and UDC-degraded videos of the same scene. We then align each pair of captured videos frame by frame using the discrete Fourier transform (DFT). We compare UDC-VIT with six representative UDC still-image datasets and two existing UDC video datasets. Using six deep-learning models, we also compare UDC-VIT with an existing synthetic UDC video dataset. The results indicate that models trained on earlier synthetic UDC video datasets are ineffective, because those datasets do not reflect the actual characteristics of UDC-degraded videos. We also demonstrate the importance of effective UDC restoration by evaluating face recognition accuracy with respect to PSNR, SSIM, and LPIPS scores. UDC-VIT enables further exploration of UDC video restoration and offers better insight into the challenge. UDC-VIT is available at our project site.
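The abstract says only that paired frames are aligned using the DFT; it does not give the procedure. As an illustration, the sketch below uses phase correlation, a common DFT-based registration technique, to estimate an integer translation between two grayscale frames. The choice of phase correlation, the function names `estimate_shift` and `align_frame`, and the restriction to pure integer translation are all assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def estimate_shift(reference, degraded):
    """Estimate the integer (dy, dx) shift that maps `reference` onto `degraded`.

    Hypothetical helper: uses phase correlation, assuming the two frames
    differ only by a circular integer translation.
    """
    f_ref = np.fft.fft2(reference)
    f_deg = np.fft.fft2(degraded)
    # Normalised cross-power spectrum keeps only the phase difference.
    cross_power = f_deg * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12
    # Its inverse DFT peaks at the translation between the frames.
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(
        int(p) if p <= size // 2 else int(p) - size
        for p, size in zip(peak, correlation.shape)
    )

def align_frame(reference, degraded):
    """Shift `degraded` back so that it lines up with `reference`."""
    dy, dx = estimate_shift(reference, degraded)
    return np.roll(degraded, (-dy, -dx), axis=(0, 1))

# Synthetic check: shift a random frame by a known offset and recover it.
rng = np.random.default_rng(0)
frame = rng.random((64, 64))
shifted = np.roll(frame, (3, -5), axis=(0, 1))
recovered_shift = estimate_shift(frame, shifted)
realigned = align_frame(frame, shifted)
```

Real UDC pairs also differ in brightness, blur, and flare, so a practical pipeline would likely need sub-pixel refinement and handling of rotation or scale; this sketch covers only the translation case on a synthetic pair.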


URL

https://arxiv.org/abs/2501.18545

PDF

https://arxiv.org/pdf/2501.18545.pdf

