Path-Adaptive Matting for Efficient Inference Under Various Computational Cost Constraints

2025-03-05 06:56:42
Qinglin Liu, Zonglin Li, Xiaoqian Lv, Xin Sun, Ru Li, Shengping Zhang

Abstract

In this paper, we explore a novel image matting task: achieving efficient inference under various computational cost constraints, specifically FLOP limits, using a single matting network. Existing matting methods, which have not explored scalable architectures or path-learning strategies, fail to tackle this challenge. To overcome these limitations, we introduce Path-Adaptive Matting (PAM), a framework that dynamically adjusts network paths based on image context and computational cost constraints. We formulate the training of the computational cost-constrained matting network as a bilevel optimization problem that jointly optimizes the matting network and the path estimator. Building on this formulation, we design a path-adaptive matting architecture that incorporates path selection layers and learnable connect layers to estimate optimal paths and perform efficient inference within a unified network. Furthermore, we propose a performance-aware path-learning strategy that generates path labels online by evaluating a few paths sampled from the prior distribution of optimal paths and from the network's own estimates, enabling robust and efficient online path learning. Experiments on five image matting datasets demonstrate that the proposed PAM framework achieves competitive performance across a range of computational cost constraints.
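
To make the idea of generating path labels under a FLOP budget concrete, here is a minimal sketch, not the authors' implementation. It assumes a path is a binary mask over network blocks (1 = run, 0 = skip), that each block has a known FLOP cost, and that a loss can be evaluated per path. The names `path_flops`, `sample_paths`, `select_path_label`, `toy_loss`, and all numeric values are hypothetical.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Path = Tuple[int, ...]  # hypothetical encoding: 1 = run the block, 0 = skip it


def path_flops(path: Path, block_flops: Sequence[float]) -> float:
    """Total cost of a path: sum of the FLOPs of the blocks it runs."""
    return sum(f for keep, f in zip(path, block_flops) if keep)


def sample_paths(keep_prob: Sequence[float], num_samples: int,
                 rng: random.Random) -> List[Path]:
    """Draw a few candidate paths from per-block keep probabilities
    (a stand-in for a prior or for the path estimator's output)."""
    return [tuple(int(rng.random() < p) for p in keep_prob)
            for _ in range(num_samples)]


def select_path_label(candidates: Sequence[Path],
                      block_flops: Sequence[float],
                      budget: float,
                      loss_fn: Callable[[Path], float]) -> Optional[Path]:
    """Return the lowest-loss candidate whose cost fits the budget,
    or None if no candidate is feasible."""
    feasible = [p for p in candidates if path_flops(p, block_flops) <= budget]
    if not feasible:
        return None
    return min(feasible, key=loss_fn)


# Toy setup (assumed values): three blocks with made-up costs and importances.
BLOCK_FLOPS = [4.0, 2.0, 1.0]
IMPORTANCE = [0.5, 0.3, 0.1]


def toy_loss(path: Path) -> float:
    """Stand-in for a matting loss: penalize each skipped block."""
    return sum(imp for keep, imp in zip(path, IMPORTANCE) if not keep)


CANDIDATES = [(1, 1, 1), (1, 1, 0), (1, 0, 1), (0, 1, 1)]

if __name__ == "__main__":
    for budget in (6.0, 5.0, 0.5):
        print(budget, select_path_label(CANDIDATES, BLOCK_FLOPS, budget, toy_loss))
```

In a real training loop the selected path would serve as a supervision target for the path estimator, while the matting network is trained along the sampled paths; how PAM couples these two steps is described in the paper, not in this sketch.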

URL

https://arxiv.org/abs/2503.03228

PDF

https://arxiv.org/pdf/2503.03228.pdf

