Paper Reading AI Learner

In-Context Matting

2024-03-23 10:32:29
He Guo, Zixuan Ye, Zhiguo Cao, Hao Lu

Abstract

We introduce in-context matting, a novel task setting of image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting marries good performance in auxiliary input-based matting and ease of use in automatic matting, which finds a good trade-off between customization and automation. To overcome the key challenge of accurate foreground matching, we introduce IconMatting, an in-context matting model built upon a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, IconMatting can make full use of reference context to generate accurate target alpha mattes. To benchmark the task, we also introduce a novel testing dataset ICM-57, covering 57 groups of real-world images. Quantitative and qualitative results on the ICM-57 testing set show that IconMatting rivals the accuracy of trimap-based matting while retaining the automation level akin to automatic matting. Code is available at this https URL
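The reference-to-target matching described above can be pictured with a minimal sketch. The snippet below is a hypothetical illustration (not the authors' IconMatting implementation) of inter-similarity matching: each target location is scored by its cosine similarity to the mean feature of the reference foreground, which is one simple way to propagate a reference mask to a target image. The function name and the flattened `(H*W, C)` feature layout are assumptions for illustration.

```python
import numpy as np

def inter_similarity_map(ref_feats, ref_mask, tgt_feats):
    """Hypothetical sketch of inter-similarity matching.

    ref_feats: (H*W, C) reference feature map, flattened over space
    ref_mask:  (H*W,)   binary foreground mask for the reference image
    tgt_feats: (H*W, C) target feature map, flattened over space

    Returns a (H*W,) similarity map over the target image.
    """
    # Average the features inside the reference foreground
    # to obtain a single "context" vector (an assumption here;
    # the actual model conditions on richer similarity cues).
    fg = ref_feats[ref_mask.astype(bool)]
    ctx = fg.mean(axis=0)
    ctx = ctx / (np.linalg.norm(ctx) + 1e-8)

    # Cosine similarity of every target location to the context vector.
    tgt = tgt_feats / (np.linalg.norm(tgt_feats, axis=1, keepdims=True) + 1e-8)
    return tgt @ ctx  # values in [-1, 1]; high = likely foreground
```

In practice such a map would only localize the foreground coarsely; a matting head would still be needed to refine it into an alpha matte.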

Abstract (translated)

We introduce "in-context matting," a new task setting for image matting. Given a reference image of a certain foreground and guided priors such as points, scribbles, and masks, in-context matting enables automatic alpha estimation on a batch of target images of the same foreground category, without additional auxiliary input. This setting combines the strong performance of auxiliary-input-based matting with the ease of use of automatic matting, striking a good balance between customization and automation. To overcome the key challenge of accurate foreground matching, we introduce IconMatting, an in-context matting model built on a pre-trained text-to-image diffusion model. Conditioned on inter- and intra-similarity matching, IconMatting can make full use of the reference context to generate accurate target alpha mattes. To benchmark the task, we also introduce a new testing dataset, ICM-57, covering 57 groups of real-world images. Quantitative and qualitative results on the ICM-57 testing set show that IconMatting rivals the accuracy of trimap-based matting while retaining an automation level akin to automatic matting. Code is available at this https URL.

URL

https://arxiv.org/abs/2403.15789

PDF

https://arxiv.org/pdf/2403.15789.pdf

