Paper Reading AI Learner

Self-Balanced R-CNN for Instance Segmentation

2024-04-25 14:22:44
Leonardo Rossi, Akbar Karimi, Andrea Prati

Abstract

Current state-of-the-art two-stage models for the instance segmentation task suffer from several types of imbalance. In this paper, we address the Intersection over Union (IoU) distribution imbalance of positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, introduces new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, which is caused by a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and gives more insight into the connection between the task to solve and the layers used. Moreover, our SBR-CNN model shows the same or even better improvements when adopted in conjunction with other state-of-the-art models. In fact, with a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% AP for object detection and 41.5% AP for instance segmentation, with 12 training epochs and without extra tricks. The code is available at this https URL
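To make the "IoU distribution imbalance of positive RoIs" concrete, here is a minimal NumPy sketch (not the authors' code) that computes each RoI's best IoU against the ground-truth boxes, keeps the positives, and counts them per IoU bin. The function names `box_iou` and `positive_iou_histogram`, the 0.5 positive threshold (a common convention, assumed here), the bin edges, and the toy boxes are all hypothetical. The toy RoIs are deliberately chosen so that most positives sit just above the threshold; a skew of that kind is the sort of imbalance the abstract refers to.

```python
import numpy as np


def box_iou(rois: np.ndarray, gts: np.ndarray) -> np.ndarray:
    """Pairwise IoU between rois (N, 4) and gts (M, 4); boxes are (x1, y1, x2, y2)."""
    # Corners of the intersection rectangles, broadcast to shape (N, M)
    x1 = np.maximum(rois[:, None, 0], gts[None, :, 0])
    y1 = np.maximum(rois[:, None, 1], gts[None, :, 1])
    x2 = np.minimum(rois[:, None, 2], gts[None, :, 2])
    y2 = np.minimum(rois[:, None, 3], gts[None, :, 3])
    # Non-overlapping pairs get zero intersection area
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_r = (rois[:, 2] - rois[:, 0]) * (rois[:, 3] - rois[:, 1])
    area_g = (gts[:, 2] - gts[:, 0]) * (gts[:, 3] - gts[:, 1])
    union = area_r[:, None] + area_g[None, :] - inter
    return inter / union


def positive_iou_histogram(rois, gts, pos_thr=0.5, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Count positive RoIs (best IoU >= pos_thr) per IoU bin.

    pos_thr and edges are illustrative assumptions, not values taken from SBR-CNN.
    """
    best_iou = box_iou(rois, gts).max(axis=1)  # each RoI's best-matching ground truth
    positives = best_iou[best_iou >= pos_thr]  # negatives are discarded
    counts, _ = np.histogram(positives, bins=edges)
    return counts


# Toy example: one 10x10 ground-truth box and hand-made RoIs (hypothetical data)
gts = np.array([[0.0, 0.0, 10.0, 10.0]])
rois = np.array([
    [0.0, 0.0, 10.0, 5.0],     # IoU 0.50
    [0.0, 0.0, 10.0, 5.5],     # IoU 0.55
    [0.0, 0.0, 10.0, 6.0],     # IoU 0.60
    [0.0, 0.0, 10.0, 6.5],     # IoU 0.65
    [0.0, 0.0, 10.0, 9.5],     # IoU 0.95
    [0.0, 0.0, 10.0, 3.0],     # IoU 0.30 -> negative
    [20.0, 20.0, 30.0, 30.0],  # IoU 0.00 -> negative
])
counts = positive_iou_histogram(rois, gts)
print(counts)  # [2 2 0 0 1]
```

Applied to real region proposals, the same counts would show whether positive samples concentrate near the threshold. The sketch only measures the distribution; it does not implement the SBR-CNN loop mechanism that is meant to rebalance it.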

URL

https://arxiv.org/abs/2404.16633

PDF

https://arxiv.org/pdf/2404.16633.pdf
