Paper Reading AI Learner

Matte Anything: Interactive Natural Image Matting with Segment Anything Models

2023-06-07 03:31:39
Jingfeng Yao, Xinggang Wang, Lang Ye, Wenyu Liu

Abstract

Natural image matting algorithms aim to predict the transparency map (alpha matte) of an image with trimap guidance. However, producing trimaps often requires significant manual labor, which limits the large-scale application of matting algorithms. To address this issue, we propose the Matte Anything model (MatAny), an interactive natural image matting model that can produce high-quality alpha mattes from various simple hints. The key insight of MatAny is to generate pseudo trimaps automatically from contour and transparency predictions. We leverage task-specific vision models to enhance the performance of natural image matting. Specifically, we use the Segment Anything Model (SAM) to predict high-quality contours from user interaction and an open-vocabulary (OV) detector to predict the transparency of any object. A pretrained image matting model then generates alpha mattes from the pseudo trimaps. MatAny is the interactive matting algorithm with the most supported interaction methods and the best performance to date. It is composed of orthogonal vision models and requires no additional training. We evaluate MatAny against several current image matting algorithms, and the results demonstrate the significant potential of our approach.
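The pseudo-trimap step described above can be sketched in a few lines: given a binary object mask (e.g. from SAM) one erodes it to get confident foreground, dilates it to bound the object, and marks the band in between as unknown; if the OV detector flags the object as transparent, the whole (dilated) mask region becomes unknown instead. The erosion radius, value conventions (0/128/255), and the transparency handling below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def binary_dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Dilate a boolean mask with a (2r+1)x(2r+1) square structuring element."""
    out = mask.copy()
    h, w = mask.shape
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            shifted = np.zeros_like(mask)
            # destination and source slices for the (dy, dx) shift
            shifted[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
                mask[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
            out |= shifted
    return out

def binary_erode(mask: np.ndarray, r: int) -> np.ndarray:
    """Erosion is dilation of the complement."""
    return ~binary_dilate(~mask, r)

def make_pseudo_trimap(mask: np.ndarray, r: int = 2, transparent: bool = False) -> np.ndarray:
    """Build a trimap from a binary mask: 0 = background, 128 = unknown, 255 = foreground."""
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    if transparent:
        # transparent object: treat the entire dilated mask region as unknown
        trimap[binary_dilate(mask, r)] = 128
        return trimap
    fg = binary_erode(mask, r)                   # confident foreground core
    unknown = binary_dilate(mask, r) & ~fg       # boundary band around the contour
    trimap[fg] = 255
    trimap[unknown] = 128
    return trimap
```

A matting model such as the pretrained one mentioned in the abstract would then take the image and this trimap as input; only the unknown band needs to be resolved into fractional alpha values.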


URL

https://arxiv.org/abs/2306.04121

PDF

https://arxiv.org/pdf/2306.04121.pdf
