Abstract
Natural image matting algorithms aim to predict the transparency map (alpha matte) under trimap guidance. However, producing trimaps often requires significant labor, which limits the large-scale application of matting algorithms. To address this issue, we propose the Matte Anything model (MatAny), an interactive natural image matting model that can produce high-quality alpha mattes from various simple hints. The key insight of MatAny is to generate a pseudo trimap automatically from contour and transparency predictions. We leverage task-specific vision models to enhance the performance of natural image matting. Specifically, we use the Segment Anything Model (SAM) to predict high-quality contours from user interaction and an open-vocabulary (OV) detector to predict the transparency of any object. A pretrained image matting model then generates alpha mattes from the pseudo trimaps. MatAny supports more interaction methods than any existing interactive matting algorithm and achieves the best performance to date. It is composed of orthogonal vision models and requires no additional training. We evaluate MatAny against several current image matting algorithms, and the results demonstrate the significant potential of our approach.
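The abstract's pipeline turns a binary segmentation mask (e.g. from SAM) into a pseudo trimap before matting. A common way to do this is with morphological erosion/dilation: eroded pixels become definite foreground, a dilated band becomes the unknown region, and objects flagged as transparent by the detector get their whole region marked unknown. The sketch below illustrates that idea with plain NumPy; the function names, band width, and the exact rule for transparent objects are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def binary_dilate(mask: np.ndarray, iters: int) -> np.ndarray:
    """4-connected binary dilation, repeated `iters` times (pure NumPy)."""
    out = mask.astype(bool)
    for _ in range(iters):
        grown = out.copy()
        grown[1:, :] |= out[:-1, :]   # shift down
        grown[:-1, :] |= out[1:, :]   # shift up
        grown[:, 1:] |= out[:, :-1]   # shift right
        grown[:, :-1] |= out[:, 1:]   # shift left
        out = grown
    return out

def binary_erode(mask: np.ndarray, iters: int) -> np.ndarray:
    """Erosion as dilation of the complement."""
    return ~binary_dilate(~mask.astype(bool), iters)

def pseudo_trimap(mask: np.ndarray, band: int = 2,
                  transparent: bool = False) -> np.ndarray:
    """Build a pseudo trimap from a binary mask.

    255 = definite foreground, 0 = background, 128 = unknown band.
    If the OV detector flags the object as transparent, the whole object
    region is left unknown so the matting model resolves its opacity
    (an assumed rule, for illustration only).
    """
    fg = binary_erode(mask, band)          # shrink: confident foreground
    region = binary_dilate(mask, band)     # grow: object + uncertainty band
    trimap = np.zeros(mask.shape, dtype=np.uint8)
    trimap[region] = 128                   # unknown band around the contour
    if not transparent:
        trimap[fg] = 255                   # keep a confident core only if opaque
    return trimap
```

In practice the band width would be chosen relative to object size, and libraries like OpenCV (`cv2.erode`/`cv2.dilate`) would replace the hand-rolled morphology.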
URL
https://arxiv.org/abs/2306.04121