Paper Reading AI Learner

BACS: Background Aware Continual Semantic Segmentation

2024-04-19 19:25:26
Mostafa ElAraby, Ali Harakeh, Liam Paull

Abstract

Semantic segmentation plays a crucial role in enabling comprehensive scene understanding for robotic systems. However, generating annotations is challenging, since every pixel in an image requires a label. In scenarios like autonomous driving, new classes must be incorporated progressively as the operating environment of the deployed agent becomes more complex. To improve annotation efficiency, only pixels belonging to new classes would ideally be annotated. This approach is known as Continual Semantic Segmentation (CSS). Besides the classical problem of catastrophic forgetting in the continual learning setting, CSS suffers from the inherent ambiguity of the background, a phenomenon we refer to as the "background shift", since pixels labeled as background could correspond to future classes (forward background shift) or previous classes (backward background shift). As a result, continual learning approaches tend to fail. This paper proposes a Backward Background Shift Detector (BACS) to detect previously observed classes based on their distance in the latent space from the foreground centroids of previous steps. Moreover, we propose a modified version of the cross-entropy loss function, incorporating the BACS detector to down-weight background pixels associated with formerly observed classes. To combat catastrophic forgetting, we employ masked feature distillation alongside dark experience replay. Additionally, our approach includes a transformer decoder capable of adjusting to new classes without necessitating an additional classification head. We validate BACS's superior performance over existing state-of-the-art methods on standard CSS benchmarks.
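The abstract describes the detector only at a high level: pixels labeled as background are compared against the foreground centroids from previous steps, and the cross-entropy loss down-weights those that appear to belong to old classes. The following is a minimal sketch of that idea, not the authors' implementation. It assumes Euclidean distance in the latent space, a single fixed distance `threshold`, label 0 as background, and a constant weight `bg_weight` for flagged pixels; all of these, and the function names, are hypothetical choices not stated in the abstract.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]
BACKGROUND = 0  # assumption: label 0 denotes background


def class_centroids(features: List[Vector], labels: List[int]) -> Dict[int, List[float]]:
    """Mean latent feature per foreground class from a previous step (background skipped)."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for f, y in zip(features, labels):
        if y == BACKGROUND:
            continue
        if y not in sums:
            sums[y] = [0.0] * len(f)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], f)]
        counts[y] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def is_backward_shift(feature: Vector, centroids: Dict[int, List[float]], threshold: float) -> bool:
    """Hypothetical detector: flag a pixel lying within `threshold` (Euclidean) of any old-class centroid."""
    return any(math.dist(feature, c) < threshold for c in centroids.values())


def weighted_cross_entropy(logits, labels, features, centroids, threshold, bg_weight=0.0):
    """Weighted mean of per-pixel cross-entropy.

    Background pixels flagged by the detector get weight `bg_weight`
    (a hypothetical constant); all other pixels get weight 1.
    """
    total, norm = 0.0, 0.0
    for z, y, f in zip(logits, labels, features):
        m = max(z)  # subtract the max logit for a numerically stable log-sum-exp
        log_sum = m + math.log(sum(math.exp(v - m) for v in z))
        nll = log_sum - z[y]
        flagged = y == BACKGROUND and is_backward_shift(f, centroids, threshold)
        w = bg_weight if flagged else 1.0
        total += w * nll
        norm += w
    return total / norm if norm > 0 else 0.0


# Usage: centroids come from step 1 (old class 1); at step 2, old-class pixels are labeled background.
old = class_centroids([[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]], [1, 1, 0])
loss = weighted_cross_entropy(
    logits=[[0.0, 2.0, 0.0], [0.0, 0.0, 0.0]],  # classes: background, old 1, new 2
    labels=[0, 0],
    features=[[2.0, 1.0], [10.0, 10.0]],  # first pixel sits on the old-class centroid
    centroids=old,
    threshold=0.5,
)
```

With `bg_weight=0.0`, the first background pixel (which the model confidently assigns to old class 1) is excluded from the loss, so it no longer pushes the model to forget class 1. A soft weight that decays with distance would be an equally plausible design; the abstract does not say which the authors use.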


URL

https://arxiv.org/abs/2404.13148

PDF

https://arxiv.org/pdf/2404.13148.pdf
