Learning Trimaps via Clicks for Image Matting

2024-03-30 12:10:34
Chenyi Zhang, Yihan Hu, Henghui Ding, Humphrey Shi, Yao Zhao, Yunchao Wei

Abstract

Despite significant advancements in image matting, existing models depend heavily on manually drawn trimaps for accurate results in natural image scenarios. However, obtaining trimaps is time-consuming, and the process lacks user-friendliness and device compatibility. This reliance greatly limits the practical application of all trimap-based matting methods. To address this issue, we introduce Click2Trimap, an interactive model capable of predicting high-quality trimaps and alpha mattes from minimal user click inputs. By analyzing real users' behavioral logic and the characteristics of trimaps, we propose a powerful iterative three-class training strategy and a dedicated simulation function, which make Click2Trimap versatile across various scenarios. Quantitative and qualitative assessments on synthetic and real-world matting datasets demonstrate Click2Trimap's superior performance compared to all existing trimap-free matting methods. Notably, in the user study, Click2Trimap achieves high-quality trimap and matting predictions in an average of just 5 seconds per image, demonstrating its substantial practical value in real-world applications.

URL

https://arxiv.org/abs/2404.00335

PDF

https://arxiv.org/pdf/2404.00335.pdf

