AURA: Automatic Mask Generator using Randomized Input Sampling for Object Removal

2023-05-13 07:51:35
Changsuk Oh, Dongseok Shim, H. Jin Kim

Abstract

The objective of image inpainting is to fill missing regions of an image in a visually plausible way. Recently, deep-learning-based image inpainting networks have produced outstanding results, and some works use these models as object removers by masking unwanted objects in an image. However, in trying to improve object removal with their networks, previous works pay little attention to the importance of the input mask. In this paper, we focus on generating an input mask that better removes objects with an off-the-shelf image inpainting network. We propose an automatic mask generator, inspired by explainable AI (XAI) methods, whose output removes objects better than a semantic segmentation mask. The proposed method generates an importance map from randomly sampled input masks and quantitatively estimated scores of the completed images obtained from those masks. The output mask is selected by a judge module from candidate masks generated from the importance map. We design the judge module to quantitatively estimate the quality of object removal results. In addition, we empirically find that the evaluation methods used in previous works reporting object removal results are not appropriate for estimating the performance of an object remover. We therefore propose new evaluation metrics (FID$^*$ and U-IDS$^*$) to properly evaluate the quality of object removers. Experiments confirm that our method removes target-class objects better than masks generated from semantic segmentation maps, and that the two proposed metrics make judgments consistent with those of humans.
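
The pipeline in the abstract (random masks, per-mask scores, importance map, candidate masks, judge) can be illustrated with a small toy sketch. This is not the authors' implementation: the 4x4 "image", the scoring function `toy_removal_score`, the judge `toy_judge`, the thresholds, and the choice to build candidates by thresholding the importance map are all assumptions made for illustration. In the paper's setting, the score would be estimated from the completed image produced by an inpainting network, which this sketch replaces with a stand-in.

```python
import random

# Hypothetical 4x4 toy image: the "object" occupies the central 2x2 block.
HEIGHT, WIDTH = 4, 4
OBJECT_PIXELS = {(1, 1), (1, 2), (2, 1), (2, 2)}


def sample_masks(n_masks, mask_prob, rng):
    """Sample random binary masks; 1 marks a pixel treated as missing."""
    return [
        [[1 if rng.random() < mask_prob else 0 for _ in range(WIDTH)]
         for _ in range(HEIGHT)]
        for _ in range(n_masks)
    ]


def toy_removal_score(mask):
    """Stand-in for the score of a completed image (assumption):
    the fraction of object pixels that the mask covers."""
    return sum(mask[y][x] for y, x in OBJECT_PIXELS) / len(OBJECT_PIXELS)


def importance_map(masks, scores):
    """For each pixel, average the scores of the masks that covered it."""
    total = [[0.0] * WIDTH for _ in range(HEIGHT)]
    count = [[0] * WIDTH for _ in range(HEIGHT)]
    for mask, score in zip(masks, scores):
        for y in range(HEIGHT):
            for x in range(WIDTH):
                if mask[y][x]:
                    total[y][x] += score
                    count[y][x] += 1
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0
         for x in range(WIDTH)]
        for y in range(HEIGHT)
    ]


def candidate_masks(importance, thresholds):
    """Build one candidate per threshold (thresholding is an assumption)."""
    return [
        [[1 if value >= t else 0 for value in row] for row in importance]
        for t in thresholds
    ]


def toy_judge(mask):
    """Stand-in judge (assumption): reward object coverage, penalize mask area."""
    area = sum(map(sum, mask)) / (HEIGHT * WIDTH)
    return toy_removal_score(mask) - 0.5 * area


def run_demo(n_masks=4000, seed=0):
    rng = random.Random(seed)
    masks = sample_masks(n_masks, 0.5, rng)
    scores = [toy_removal_score(m) for m in masks]
    importance = importance_map(masks, scores)
    candidates = candidate_masks(importance, thresholds=(0.45, 0.5625, 0.7))
    best = max(candidates, key=toy_judge)  # the judge selects the output mask
    return importance, best
```

Calling `run_demo()` returns the importance map and the candidate the toy judge prefers. In this toy setup, pixels on the object receive higher importance than background pixels, because masks that cover them tend to score higher. The toy score reads the object location directly, which a real system cannot do; it is only there to make the sketch self-contained.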

URL

https://arxiv.org/abs/2305.07857

PDF

https://arxiv.org/pdf/2305.07857.pdf

