Paper Reading AI Learner

Rethinking Context Aggregation in Natural Image Matting

2023-04-03 17:40:30
Qinglin Liu, Shengping Zhang, Quanling Meng, Ru Li, Bineng Zhong, Liqiang Nie

Abstract

For natural image matting, context information plays a crucial role in estimating alpha mattes, especially when it is challenging to distinguish the foreground from the background. Existing deep learning-based methods exploit specifically designed context aggregation modules to refine encoder features. However, the effectiveness of these modules has not been thoroughly explored. In this paper, we conduct extensive experiments to reveal that context aggregation modules are actually not as effective as expected. We also demonstrate that, when trained on large image patches, basic encoder-decoder networks with a larger receptive field can effectively aggregate context to achieve better performance. Based on the above findings, we propose a simple yet effective matting network, named AEMatter, which enlarges the receptive field by incorporating an appearance-enhanced axis-wise learning block into the encoder and adopting a hybrid-transformer decoder. Experimental results on four datasets demonstrate that AEMatter significantly outperforms state-of-the-art matting methods (e.g., on the Adobe Composition-1K dataset, 25% and 40% reductions in SAD and MSE, respectively, compared against MatteFormer). The code and model are available at this https URL.
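The abstract attributes AEMatter's gains to a larger receptive field via an axis-wise learning block; the block's actual design is not specified here. As a rough illustration of the general axis-wise idea only (all names are hypothetical, not from the paper), the sketch below restricts self-attention to one spatial axis at a time, so stacking a height pass and a width pass gives every pixel a full-image receptive field at a cost of O(HW(H+W)) rather than O((HW)^2):

```python
import numpy as np

def _softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def axis_attention(x, axis):
    """Self-attention restricted to one spatial axis of a feature map x of shape (H, W, C).

    axis=0: each position attends only to its column; axis=1: only to its row.
    Queries, keys, and values are the features themselves (no learned projections,
    for brevity).
    """
    if axis == 0:
        x = x.transpose(1, 0, 2)  # treat columns as rows, attend along them
    scale = 1.0 / np.sqrt(x.shape[-1])
    scores = np.einsum('hqc,hkc->hqk', x, x) * scale      # per-row attention scores
    out = np.einsum('hqk,hkc->hqc', _softmax(scores), x)  # per-row weighted sum
    return out.transpose(1, 0, 2) if axis == 0 else out

def axial_block(x):
    # Height pass then width pass: information can flow between any two pixels.
    return axis_attention(axis_attention(x, axis=0), axis=1)
```

This is only a minimal sketch of axial attention in general; the paper's appearance-enhanced block presumably adds learned projections and appearance features on top of such a mechanism.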

Abstract (translated)

For natural image matting, context information plays a crucial role in estimating alpha mattes, especially when it is difficult to distinguish the foreground from the background. Existing deep learning-based methods exploit specifically designed context aggregation modules to refine encoder features. However, the effectiveness of these modules has not been thoroughly explored. In this paper, we conduct extensive experiments to reveal that context aggregation modules are actually not as effective as expected. We also demonstrate that, when trained on large image patches, basic encoder-decoder networks with a larger receptive field can effectively aggregate context to achieve better performance. Based on these findings, we propose a simple yet effective matting network, named AEMatter, which enlarges the receptive field by embedding an appearance-enhanced axis-wise learning block into the encoder and adopting a hybrid-transformer decoder. Experimental results on four datasets show that AEMatter significantly outperforms state-of-the-art matting methods (e.g., on the Adobe Composition-1K dataset, SAD and MSE are reduced by 25% and 40%, respectively, compared with the prior method). The code and model are available at this https URL.
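The reported improvements are in SAD and MSE. As a hedged sketch of how these matting metrics are commonly computed in the literature (function names and the optional unknown-region mask are assumptions here, not details taken from this paper):

```python
import numpy as np

def sad(pred, gt, mask=None):
    """Sum of Absolute Differences between alpha mattes in [0, 1].

    On benchmarks such as Composition-1K, SAD is typically accumulated over
    the unknown trimap region and reported divided by 1000.
    """
    if mask is None:
        mask = np.ones_like(pred, dtype=bool)
    return np.abs(pred - gt)[mask].sum() / 1000.0

def mse(pred, gt, mask=None):
    """Mean Squared Error over the evaluated region, commonly scaled by 1e3."""
    if mask is None:
        mask = np.ones_like(pred, dtype=bool)
    return ((pred - gt) ** 2)[mask].mean() * 1000.0
```

Both functions expect alpha mattes normalized to [0, 1]; `mask` would be `trimap == 128` under the usual unknown-region evaluation protocol.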

URL

https://arxiv.org/abs/2304.01171

PDF

https://arxiv.org/pdf/2304.01171.pdf

