Abstract
In this paper, we propose the Matting Anything Model (MAM), an efficient and versatile framework for estimating the alpha matte of any instance in an image under flexible, interactive visual or linguistic user prompts. MAM offers several significant advantages over previous specialized image matting networks: (i) MAM handles various types of image matting, including semantic, instance, and referring image matting, with a single model; (ii) MAM leverages the feature maps from the Segment Anything Model (SAM) and adopts a lightweight Mask-to-Matte (M2M) module, with only 2.7 million trainable parameters, to predict the alpha matte through iterative refinement; (iii) by incorporating SAM, MAM reduces the user intervention required for interactive image matting from a trimap to a box, point, or text prompt. We evaluate MAM on various image matting benchmarks, and the experimental results demonstrate that MAM achieves performance comparable to state-of-the-art specialized image matting models under different metrics on each benchmark. Overall, MAM shows superior generalization ability and can effectively handle various image matting tasks with fewer parameters, making it a practical solution for unified image matting. Our code and models are open-sourced at this https URL.
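The abstract describes M2M as a lightweight module that takes SAM feature maps plus a coarse mask and iteratively refines the hard mask into a soft alpha matte. The paper does not give the module's internals here, so the following is only a minimal NumPy sketch of that idea: `box_blur` stands in for a learned convolution, `feat` for a (hypothetical, single-channel) SAM feature map, and the loop softens the mask boundary step by step. All names and the update rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def box_blur(x, k=3):
    # Stand-in for a learned conv layer: k x k mean filter with edge padding.
    pad = k // 2
    xp = np.pad(x, pad, mode="edge")
    out = np.zeros_like(x, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += xp[dy:dy + x.shape[0], dx:dx + x.shape[1]]
    return out / (k * k)

def m2m_refine(feat, coarse_mask, n_iters=3):
    """Sketch of mask-to-matte refinement (hypothetical, not the paper's M2M).

    feat:        (H, W) stand-in for a SAM feature map
    coarse_mask: (H, W) binary segmentation mask in {0, 1}
    returns:     (H, W) soft alpha matte in [0, 1]
    """
    alpha = coarse_mask.astype(np.float64)
    for _ in range(n_iters):
        smoothed = box_blur(alpha)
        # Per-pixel update weight derived from features (illustrative gating).
        w = 1.0 / (1.0 + np.exp(-feat))
        # Move the estimate toward the smoothed version, clamped to [0, 1]:
        # interior pixels stay hard, boundary pixels become fractional alpha.
        alpha = np.clip(alpha + w * (smoothed - alpha), 0.0, 1.0)
    return alpha

if __name__ == "__main__":
    mask = np.zeros((8, 8))
    mask[:, :4] = 1.0                       # left half foreground
    matte = m2m_refine(np.zeros((8, 8)), mask)
    print(matte[4])                         # soft transition around column 3-4
```

The point of the sketch is the shape of the computation: a binary mask from SAM carries no partial-opacity information at hair or fur boundaries, so the refinement loop must introduce fractional alpha values near edges while leaving confident interior regions untouched.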
URL
https://arxiv.org/abs/2306.05399