Paper Reading AI Learner

ASAM: Boosting Segment Anything Model with Adversarial Tuning

2024-05-01 00:13:05
Bo Li, Haoke Xiao, Lv Tang

Abstract

In the evolving landscape of computer vision, foundation models have emerged as pivotal tools, exhibiting exceptional adaptability to a myriad of tasks. Among these, the Segment Anything Model (SAM) by Meta AI has distinguished itself in image segmentation. However, SAM, like its counterparts, encounters limitations in specific niche applications, prompting a quest for enhancement strategies that do not compromise its inherent capabilities. This paper introduces ASAM, a novel methodology that amplifies SAM's performance through adversarial tuning. We harness the potential of natural adversarial examples, inspired by their successful implementation in natural language processing. By utilizing a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that are more representative of natural variations rather than conventional imperceptible perturbations. Our approach maintains the photorealism of adversarial examples and ensures alignment with original mask annotations, thereby preserving the integrity of the segmentation task. The fine-tuned ASAM demonstrates significant improvements across a diverse range of segmentation tasks without necessitating additional data or architectural modifications. The results of our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby contributing to the advancement of foundational models in computer vision. Our project page is at this https URL.

Abstract (translated)

In the ever-evolving field of computer vision, foundation models have become essential tools, showing remarkable adaptability across a wide range of tasks. Among these foundation models, Meta AI's Segment Anything Model (SAM) has stood out in image segmentation. However, like its counterparts, SAM encounters limitations in certain niche applications, motivating the search for enhancement strategies that do not sacrifice its inherent capabilities. This paper introduces ASAM, a novel method that improves SAM's performance through adversarial tuning. We exploit the potential of natural adversarial examples, inspired by their successful use in natural language processing. Using a stable diffusion model, we augment a subset (1%) of the SA-1B dataset, generating adversarial instances that better reflect natural variations than conventional imperceptible perturbations. Our approach preserves the photorealism of the adversarial examples and keeps them aligned with the original mask annotations, thereby maintaining the integrity of the segmentation task. The fine-tuned ASAM achieves significant improvements across a wide variety of segmentation tasks without requiring additional data or architectural modifications. Our extensive evaluations confirm that ASAM establishes new benchmarks in segmentation tasks, thereby advancing foundation models in computer vision. Our project page is at this link.
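
The abstract describes a two-stage recipe: first, optimize in a diffusion model's latent space to turn ordinary SA-1B images into natural-looking adversarial examples that still match their original masks; second, fine-tune SAM on those examples with the unchanged annotations. The sketch below, in plain PyTorch, is only an illustration of that control flow under loose assumptions, not the authors' implementation: ToyDecoder and ToySegmenter are hypothetical stand-ins for the Stable Diffusion decoder and SAM, and the latent-drift penalty and all hyperparameters are placeholders for the paper's actual photorealism and alignment constraints.

```python
# Minimal sketch of the adversarial-tuning recipe outlined in the abstract.
# Toy stand-in modules keep it runnable; they are NOT the real SAM/diffusion models.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyDecoder(nn.Module):
    """Hypothetical stand-in for a frozen diffusion decoder: latent -> image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(4, 3, kernel_size=3, padding=1)

    def forward(self, z):
        return torch.sigmoid(self.net(z))

class ToySegmenter(nn.Module):
    """Hypothetical stand-in for the segmentation model being tuned: image -> mask logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)

    def forward(self, x):
        return self.net(x)

def make_adversarial(decoder, segmenter, z0, mask, steps=10, lr=0.05):
    """Optimize a latent so the decoded image degrades the segmenter's prediction,
    while a drift penalty keeps it close to the original latent (a crude proxy for
    the paper's photorealism and mask-alignment constraints)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = decoder(z)
        seg_loss = F.binary_cross_entropy_with_logits(segmenter(img), mask)
        loss = -seg_loss + 0.1 * F.mse_loss(z, z0)  # maximize seg error, limit drift
        opt.zero_grad()
        loss.backward()
        opt.step()
    return decoder(z).detach()

def tune_on_adversarial(segmenter, decoder, latents, masks, epochs=1, lr=1e-4):
    """Fine-tune the segmenter on adversarial images paired with the ORIGINAL masks,
    so the supervision signal is unchanged."""
    opt = torch.optim.Adam(segmenter.parameters(), lr=lr)
    for _ in range(epochs):
        for z0, mask in zip(latents, masks):
            adv_img = make_adversarial(decoder, segmenter, z0, mask)
            loss = F.binary_cross_entropy_with_logits(segmenter(adv_img), mask)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return segmenter

if __name__ == "__main__":
    decoder, segmenter = ToyDecoder(), ToySegmenter()
    for p in decoder.parameters():          # the diffusion decoder stays frozen
        p.requires_grad_(False)
    latents = [torch.randn(1, 4, 32, 32)]                # toy "inverted" latents
    masks = [(torch.rand(1, 1, 32, 32) > 0.5).float()]   # toy ground-truth masks
    tune_on_adversarial(segmenter, decoder, latents, masks)
    print("done")
```

The one design point carried over from the abstract is that the original mask annotations are reused as supervision, so no re-annotation is needed; the losses, optimizers, and step counts above are purely illustrative.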

URL

https://arxiv.org/abs/2405.00256

PDF

https://arxiv.org/pdf/2405.00256.pdf

