Paper Reading AI Learner

Efficient Flow-Guided Multi-frame De-fencing

2023-01-25 18:42:59
Stavros Tsogkas, Fengjia Zhang, Allan Jepson, Alex Levinshtein

Abstract

Taking photographs "in the wild" is often hindered by fence obstructions that stand between the camera user and the scene of interest, and which are hard or impossible to avoid. De-fencing is the algorithmic process of automatically removing such obstructions from images, revealing the invisible parts of the scene. While this problem can be formulated as a combination of fence segmentation and image inpainting, this often leads to implausible hallucinations of the occluded regions. Existing multi-frame approaches rely on propagating information to a selected keyframe from its temporal neighbors, but they are often inefficient and struggle to align severely obstructed images. In this work we draw inspiration from the video completion literature and develop a simplified framework for multi-frame de-fencing that computes high-quality flow maps directly from obstructed frames and uses them to accurately align frames. Our primary focus is efficiency and practicality in a real-world setting: the input to our algorithm is a short image burst (5 frames) - a data modality commonly available in modern smartphones - and the output is a single reconstructed keyframe with the fence removed. Our approach leverages simple yet effective CNN modules, trained on carefully generated synthetic data, and outperforms more complicated alternatives on real bursts, both quantitatively and qualitatively, while running in real time.
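The pipeline the abstract outlines - segment the fence, estimate flow directly between obstructed frames, warp the temporal neighbors onto the keyframe, and fuse the aligned frames - can be sketched compactly. Below is a minimal PyTorch sketch under stated assumptions: seg_net, flow_net, and fusion_net are hypothetical placeholder CNNs standing in for the paper's actual modules, and the channel counts and fusion scheme are illustrative, not the authors' exact design.

import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(frame, flow):
    # Backward-warp `frame` (B,C,H,W) with optical `flow` (B,2,H,W).
    b, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(frame.device)  # (2,H,W), (x,y) order
    coords = base.unsqueeze(0) + flow                             # displaced pixel coords
    # Normalize to [-1, 1] as required by grid_sample.
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)                          # (B,H,W,2)
    return F.grid_sample(frame, grid, align_corners=True)

class DefencingPipeline(nn.Module):
    # seg_net: fence mask from one frame; flow_net: flow from a frame pair;
    # fusion_net: reconstructs the keyframe from aligned frames + visibility masks.
    def __init__(self, seg_net, flow_net, fusion_net):
        super().__init__()
        self.seg_net, self.flow_net, self.fusion_net = seg_net, flow_net, fusion_net

    def forward(self, burst):                    # burst: (B,T,3,H,W), keyframe = T//2
        b, t, _, h, w = burst.shape
        key = burst[:, t // 2]
        aligned, visible = [], []
        for i in range(t):
            frame = burst[:, i]
            mask = torch.sigmoid(self.seg_net(frame))  # 1 where the fence occludes
            if i == t // 2:
                aligned.append(key)
                visible.append(1.0 - mask)
            else:
                # Flow is estimated directly from the obstructed pair.
                flow = self.flow_net(torch.cat((key, frame), dim=1))
                aligned.append(warp(frame, flow))
                visible.append(warp(1.0 - mask, flow))
        return self.fusion_net(torch.cat(aligned + visible, dim=1))

# Tiny stand-in convolutions so the sketch runs end to end.
model = DefencingPipeline(
    seg_net=nn.Conv2d(3, 1, 3, padding=1),
    flow_net=nn.Conv2d(6, 2, 3, padding=1),
    fusion_net=nn.Conv2d(5 * 3 + 5 * 1, 3, 3, padding=1),
)
out = model(torch.rand(1, 5, 3, 64, 64))         # (1,3,64,64) de-fenced keyframe

Per the abstract, the real modules are CNNs trained on carefully generated synthetic data; the stand-in single convolutions above exist only to make the sketch executable.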

URL

https://arxiv.org/abs/2301.10759

PDF

https://arxiv.org/pdf/2301.10759.pdf

