Paper Reading AI Learner

Training Matting Models without Alpha Labels

2024-08-20 04:34:06
Wenze Liu, Zixuan Ye, Hao Lu, Zhiguo Cao, Xiangyu Yue

Abstract

The labelling difficulty has been a longstanding problem in deep image matting. To escape from fine labels, this work explores using rough annotations, such as trimaps coarsely indicating the foreground/background, as supervision. We show that the cooperation between semantics learned from the indicated known regions and properly assumed matting rules can help infer alpha values in transition areas. Inspired by the nonlocal principle in traditional image matting, we build a directional distance consistency loss (DDC loss) over each pixel neighborhood to constrain the alpha values conditioned on the input image. The DDC loss forces the distances of similar pairs on the alpha matte and on the corresponding image to be consistent. In this way, alpha values can be propagated from the learned known regions to unknown transition areas. With only images and trimaps, a matting model can be trained under the supervision of a known-region loss and the proposed DDC loss. Experiments on the AM-2K and P3M-10K datasets show that our paradigm achieves performance comparable to the fine-label-supervised baseline, while sometimes offering even more satisfying results than human-labelled ground truth. Code is available at: this https URL.
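The abstract only sketches the DDC loss at a high level; the exact formulation is in the paper. As a rough illustration of the core idea — pixel pairs that look similar in the image should have a similar distance on the predicted alpha matte — the following PyTorch sketch computes a neighborhood distance-consistency penalty. Everything here (the name ddc_loss_sketch, the kernel_size and tau parameters, the exponential similarity weighting) is an assumption for illustration, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def ddc_loss_sketch(image, alpha, kernel_size=3, tau=0.1):
    """Hypothetical sketch of a neighborhood distance-consistency loss.

    For every pixel, compare it with its k*k neighbours: pairs with a
    small colour distance in the image ("similar pairs") are weighted
    heavily and pushed to have a matching distance on the alpha matte.

    image: (B, 3, H, W) tensor in [0, 1]
    alpha: (B, 1, H, W) predicted matte in [0, 1]
    """
    pad = kernel_size // 2
    # Unfold gathers each k*k neighbourhood: (B, C * k*k, H*W)
    img_patches = F.unfold(image, kernel_size, padding=pad)
    alp_patches = F.unfold(alpha, kernel_size, padding=pad)

    B, _, N = img_patches.shape
    img_patches = img_patches.view(B, 3, kernel_size ** 2, N)
    alp_patches = alp_patches.view(B, 1, kernel_size ** 2, N)

    center = kernel_size ** 2 // 2
    # Colour distance between each pixel and every neighbour: (B, k*k, N)
    img_dist = (img_patches - img_patches[:, :, center:center + 1]).norm(dim=1)
    # Alpha distance for the same pixel pairs: (B, k*k, N)
    alp_dist = (alp_patches - alp_patches[:, :, center:center + 1]).abs().squeeze(1)

    # Similar pairs (small colour distance) get larger weight
    weight = torch.exp(-img_dist / tau)
    # Consistency: the alpha distance should track the colour distance
    # (scales are matched naively here; the paper's definition may differ)
    return (weight * (alp_dist - img_dist).abs()).mean()
```

In a trimap-only training loop, this term would presumably be combined with a regression loss on the known foreground/background pixels of the trimap — which appears to be the known-region loss the abstract refers to — so that semantics learned in known regions propagate into the unknown transition band.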

Abstract (translated)

Labelling difficulty has long been a problem in deep image matting. To avoid fine-grained labels, this work explores using coarse annotations, such as trimaps indicating the foreground/background, as supervision. We find that the cooperation between semantics learned from the indicated known regions and properly assumed matting rules can help infer alpha values in transition areas. Inspired by the nonlocal principle in traditional image matting, we build a directional distance consistency loss (DDC loss) over each pixel neighborhood to constrain the alpha values conditioned on the input image. The DDC loss forces the distances of similar pairs on the alpha matte and on the corresponding image to be consistent, so that alpha values can be propagated from the learned known regions to the unknown transition areas. Using only images and trimaps, a matting model can be trained under the supervision of a known-region loss and the proposed DDC loss. Experiments on the AM-2K and P3M-10K datasets show that our paradigm achieves performance comparable to the fine-label-supervised baseline, and sometimes yields even more satisfying results than human-labelled ground truth. Code is available at: this https URL.

URL

https://arxiv.org/abs/2408.10539

PDF

https://arxiv.org/pdf/2408.10539.pdf

