Paper Reading AI Learner

Efficient Portrait Matte Creation With Layer Diffusion and Connectivity Priors

2025-01-27 15:41:19
Zhiyuan Lu, Hao Lu, Hua Huang

Abstract

Learning effective deep portrait matting models requires training data of both high quality and large quantity. For portrait matting, however, neither requirement is easily met. Since the most accurate ground-truth portrait mattes are acquired in front of a green screen, it is almost impossible to harvest a large-scale portrait matting dataset in practice. This work shows that one can leverage text prompts and the recent Layer Diffusion model to generate high-quality portrait foregrounds and extract latent portrait mattes. These portrait mattes, however, cannot be used directly due to significant generation artifacts. Inspired by the connectivity prior observed in portrait images, namely that the border of a portrait foreground always appears connected, a connectivity-aware approach is introduced to refine the portrait mattes. Building on this, a large-scale portrait matting dataset is created, termed LD-Portrait-20K, with 20,051 portrait foregrounds and high-quality alpha mattes. Extensive experiments demonstrate the value of the LD-Portrait-20K dataset: models trained on it significantly outperform those trained on other datasets. In addition, comparisons with the chroma keying algorithm and an ablation study on dataset capacity further confirm the effectiveness of the proposed matte creation approach. Furthermore, the dataset also contributes to state-of-the-art video portrait matting, implemented with simple video segmentation and a trimap-based image matting model trained on this dataset.

Abstract (translated)

Learning effective deep portrait matting models requires training data that is both high in quality and large in quantity. For portrait matting, however, satisfying both requirements at once is very difficult. The most accurate portrait mattes are typically obtained by shooting in front of a green screen, so collecting a large-scale portrait matting dataset is hard in practice. This work shows that high-quality portrait foregrounds can be generated, and latent portrait mattes extracted, by leveraging text prompts and the recent Layer Diffusion model. Due to significant generation artifacts, however, these portrait mattes cannot be used directly. Inspired by the connectivity of portrait images, namely that the border of a portrait foreground is always connected, a connectivity-aware method is introduced to refine the portrait mattes. On this basis, a large-scale dataset named LD-Portrait-20K is created, containing 20,051 high-quality portrait foregrounds and alpha mattes. Extensive experiments demonstrate the value of the LD-Portrait-20K dataset: models trained on it significantly outperform models trained on other datasets. In addition, comparisons with chroma keying and an ablation study on dataset capacity further confirm the effectiveness of the proposed portrait matte creation approach. Furthermore, the dataset also contributes to state-of-the-art video portrait matting, implemented with simple video segmentation and a trimap-based image matting model trained on this dataset.
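The connectivity prior highlighted in the abstract (the border of a portrait foreground forms a single connected region) can be illustrated with a minimal post-processing sketch. The code below is not the paper's connectivity-aware refinement; it only demonstrates the general idea of suppressing isolated generation artifacts in an alpha matte, assuming SciPy is available and a single-subject matte with values in [0, 1]. The function name, threshold, and dilation parameters are hypothetical.

```python
# Illustrative sketch (not the paper's method): enforce a simple connectivity
# prior on a generated alpha matte by keeping only the alpha values attached
# to the largest connected foreground component.
import numpy as np
from scipy import ndimage

def refine_matte_with_connectivity(alpha: np.ndarray,
                                   fg_thresh: float = 0.5) -> np.ndarray:
    """alpha: float matte in [0, 1]; returns a matte with isolated artifacts suppressed."""
    # Binarize to find candidate foreground regions.
    fg_mask = alpha >= fg_thresh
    labels, num = ndimage.label(fg_mask)
    if num == 0:
        return np.zeros_like(alpha)
    # Keep only the largest connected component (assumed to be the portrait subject).
    sizes = ndimage.sum(fg_mask, labels, index=np.arange(1, num + 1))
    keep = labels == (np.argmax(sizes) + 1)
    # Dilate the kept region a little so soft boundary alpha (e.g. hair)
    # around the subject is preserved rather than clipped away.
    keep = ndimage.binary_dilation(keep, iterations=5)
    return np.where(keep, alpha, 0.0)
```

The dilation step here is only a crude stand-in for preserving soft boundaries such as hair; the paper's actual connectivity-aware refinement is described in the full text linked below.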

URL

https://arxiv.org/abs/2501.16147

PDF

https://arxiv.org/pdf/2501.16147.pdf

