Abstract
In recent literature, image matting requires high-quality pixel-level human annotations to train a deep model. However, such annotation is costly and hard to scale, significantly holding back the development of the field. In this work, we make the first attempt to address this problem by proposing a self-supervised pre-training approach that can leverage a virtually unlimited supply of unlabelled data to boost matting performance. The pre-training task is designed to mirror image matting: a random trimap and alpha matte are generated so that the model learns an image disentanglement objective. The pre-trained model is then used as an initialisation of the downstream matting task for fine-tuning. Extensive experimental evaluations show that the proposed approach outperforms both state-of-the-art matting methods and alternative self-supervised initialisation approaches by a large margin. We also demonstrate the robustness of the proposed approach across different backbone architectures. The code and models will be made publicly available.
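To make the pre-training setup concrete, below is a minimal Python/NumPy sketch of how a self-supervised sample could be synthesised from two unlabelled images. Only the compositing equation I = alpha * F + (1 - alpha) * B is the standard matting relation; the function name, the low-resolution-noise alpha generation, and the trimap thresholds are illustrative assumptions, not the paper's exact recipe.

    import numpy as np

    def make_pretraining_sample(fg, bg, rng):
        """Compose a self-supervised matting sample from two unlabelled images.

        fg, bg: float arrays in [0, 1] of shape (H, W, 3).
        Returns (composite, trimap, alpha); the network would be trained to
        recover alpha from (composite, trimap) -- the disentanglement objective.
        """
        h, w = fg.shape[:2]
        # Random smooth alpha: coarse noise upsampled by Kronecker product,
        # then squashed so most pixels are pure FG/BG with a soft transition.
        # (Hypothetical choice; the paper's generation procedure may differ.)
        coarse = rng.random((h // 16 + 1, w // 16 + 1))
        alpha = np.kron(coarse, np.ones((16, 16)))[:h, :w]
        alpha = np.clip((alpha - 0.3) / 0.4, 0.0, 1.0)

        # Standard compositing equation: I = alpha * F + (1 - alpha) * B.
        comp = alpha[..., None] * fg + (1.0 - alpha[..., None]) * bg

        # Trimap derived from alpha: 1 = foreground, 0 = background,
        # 0.5 = unknown band around soft edges (thresholds are assumptions).
        trimap = np.full((h, w), 0.5, dtype=np.float32)
        trimap[alpha >= 0.95] = 1.0
        trimap[alpha <= 0.05] = 0.0
        return comp, trimap, alpha

    # Usage: any two unlabelled images serve as pseudo-foreground/background.
    rng = np.random.default_rng(0)
    fg = rng.random((256, 256, 3))
    bg = rng.random((256, 256, 3))
    comp, trimap, alpha = make_pretraining_sample(fg, bg, rng)

Because the alpha matte is generated rather than annotated, any pair of unlabelled images yields a training example, which is what allows the pre-training corpus to scale without human labelling.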
URL
https://arxiv.org/abs/2304.00784