
Dual-Context Aggregation for Universal Image Matting

2024-02-28 06:56:24
Qinglin Liu, Xiaoqian Lv, Wei Yu, Changyong Guo, Shengping Zhang

Abstract

Natural image matting aims to estimate the alpha matte of the foreground from a given image. Various approaches have been explored to address this problem, such as interactive matting methods that use guidance such as clicks or trimaps, and automatic matting methods tailored to specific objects. However, existing matting methods are designed for specific objects or guidance, neglecting the common requirement of aggregating global and local contexts in image matting. As a result, these methods often encounter challenges in accurately identifying the foreground and generating precise boundaries, which limits their effectiveness in unforeseen scenarios. In this paper, we propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM), which enables robust image matting with arbitrary guidance or without guidance. Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance. Then, we introduce a dual-context aggregation network that incorporates global object aggregators and local appearance aggregators to iteratively refine the extracted context features. By performing both global contour segmentation and local boundary refinement, DCAM exhibits robustness to diverse types of guidance and objects. Finally, we adopt a matting decoder network to fuse the low-level features and the refined context features for alpha matte estimation. Experimental results on five matting datasets demonstrate that the proposed DCAM outperforms state-of-the-art matting methods in both automatic matting and interactive matting tasks, which highlights the strong universality and high performance of DCAM. The source code is available at \url{this https URL}.
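
The abstract describes a three-stage pipeline: a semantic backbone extracts low-level and context features from the image plus optional guidance, a dual-context aggregation network iteratively refines the context features with global object aggregators and local appearance aggregators, and a matting decoder fuses the two feature streams into an alpha matte. The PyTorch sketch below illustrates that data flow only; every module design, channel width, iteration count, and the guidance encoding here are assumptions for illustration, not the authors' implementation (see the linked source code for that).

```python
# Minimal, hypothetical sketch of the DCAM-style pipeline from the abstract:
# backbone -> iterative dual-context aggregation -> matting decoder.
# All module internals below are assumptions, NOT the authors' architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalObjectAggregator(nn.Module):
    """Refines context features with an image-wide attention-pooled descriptor."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)
        self.proj = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, ctx):
        # Softmax over all spatial positions yields a global object descriptor.
        weights = self.attn(ctx).flatten(2).softmax(dim=-1)   # (B, 1, HW)
        global_desc = (ctx.flatten(2) * weights).sum(-1)      # (B, C)
        return ctx + self.proj(global_desc[..., None, None])  # broadcast add


class LocalAppearanceAggregator(nn.Module):
    """Refines context features with a local (windowed) depthwise convolution."""
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )

    def forward(self, ctx):
        return ctx + self.local(ctx)


class DCAMSketch(nn.Module):
    def __init__(self, channels=64, num_iters=3):
        super().__init__()
        # Assumption: guidance (trimap/click mask, or zeros when absent)
        # enters as a fourth input channel alongside RGB.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.aggregators = nn.ModuleList(
            nn.Sequential(GlobalObjectAggregator(channels),
                          LocalAppearanceAggregator(channels))
            for _ in range(num_iters)
        )
        self.decoder = nn.Conv2d(channels, 1, kernel_size=3, padding=1)

    def forward(self, image, guidance=None):
        if guidance is None:  # no-guidance (automatic matting) mode
            guidance = torch.zeros_like(image[:, :1])
        feats = self.backbone(torch.cat([image, guidance], dim=1))
        ctx = feats
        for agg in self.aggregators:  # iterative dual-context refinement
            ctx = agg(ctx)
        # Fuse low-level features with refined context for alpha estimation.
        alpha = torch.sigmoid(self.decoder(feats + ctx))
        return F.interpolate(alpha, size=image.shape[-2:], mode="bilinear",
                             align_corners=False)


if __name__ == "__main__":
    model = DCAMSketch()
    img = torch.rand(1, 3, 256, 256)
    print(model(img).shape)  # torch.Size([1, 1, 256, 256])
```

The key design point the paper emphasizes is running both aggregators in each refinement step, so the same network handles global contour segmentation (which guidance-free matting needs) and local boundary refinement (which precise alpha mattes need).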

Abstract (translated)

Natural image matting aims to estimate the alpha matte of the foreground from a given image. Many approaches have been explored to address this problem, such as interactive matting methods that use guidance like clicks or trimaps, and automatic matting methods tailored to specific objects. However, existing matting methods are designed for specific objects or guidance, neglecting the common requirement of aggregating global and local contexts in image matting. As a result, these methods often struggle to accurately identify the foreground and generate precise boundaries, which limits their effectiveness in unforeseen scenarios. In this paper, we propose a simple and universal matting framework, named Dual-Context Aggregation Matting (DCAM), which enables robust image matting with arbitrary guidance or without guidance. Specifically, DCAM first adopts a semantic backbone network to extract low-level features and context features from the input image and guidance. Then, we introduce a dual-context aggregation network that incorporates global object aggregators and local appearance aggregators to iteratively refine the extracted context features. By performing both global contour segmentation and local boundary refinement, DCAM is robust to diverse types of guidance and objects. Finally, we adopt a matting decoder network to fuse the low-level features and the refined context features for alpha matte estimation. Experimental results on five matting datasets show that DCAM outperforms state-of-the-art matting methods in both automatic and interactive matting tasks, which highlights its strong universality and high performance. The source code is available at \url{this https URL}.

URL

https://arxiv.org/abs/2402.18109

PDF

https://arxiv.org/pdf/2402.18109.pdf

