Abstract
Current state-of-the-art two-stage models for instance segmentation suffer from several types of imbalance. In this paper, we address the imbalance in the Intersection over Union (IoU) distribution of the positive input Regions of Interest (RoIs) during the training of the second stage. Our Self-Balanced R-CNN (SBR-CNN), an evolved version of the Hybrid Task Cascade (HTC) model, introduces new loop mechanisms for bounding box and mask refinement. With an improved Generic RoI Extraction (GRoIE), we also address the feature-level imbalance at the Feature Pyramid Network (FPN) level, which originates from a non-uniform integration of low- and high-level features from the backbone layers. In addition, redesigning the architecture heads toward a fully convolutional approach with FCC further reduces the number of parameters and gives more insight into the connection between the task being solved and the layers used. Moreover, our SBR-CNN model yields the same or even larger improvements when adopted in conjunction with other state-of-the-art models. With a lightweight ResNet-50 backbone, evaluated on the COCO minival 2017 dataset, our model reaches 45.3% AP for object detection and 41.5% AP for instance segmentation, trained for 12 epochs and without extra tricks. The code is available at this https URL.
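To make the "IoU distribution imbalance of positive RoIs" concrete, the sketch below (an illustration only, not the authors' SBR-CNN code) computes the IoU of each RoI against the ground-truth boxes, keeps the positives, and counts them per IoU bin. The 0.5 positive threshold, the five equal-width bins, and the function names `iou` and `positive_iou_histogram` are assumptions chosen for illustration. In two-stage detectors, low-IoU positives tend to outnumber high-IoU ones, so a histogram skewed toward the lowest bins is the kind of imbalance the abstract refers to.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2), axis-aligned.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so non-overlapping boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_iou_histogram(
    rois: Sequence[Box],
    gts: Sequence[Box],
    pos_thr: float = 0.5,  # assumed positive threshold
    n_bins: int = 5,       # assumed number of equal-width bins over [pos_thr, 1.0]
) -> List[int]:
    """Count positive RoIs (best IoU >= pos_thr) in each IoU bin."""
    counts = [0] * n_bins
    width = (1.0 - pos_thr) / n_bins
    for roi in rois:
        # Each RoI is matched to its best-overlapping ground-truth box.
        best = max((iou(roi, gt) for gt in gts), default=0.0)
        if best < pos_thr:
            continue  # negative RoI: not part of the positive distribution
        # IoU == 1.0 would index one past the end, so clamp to the last bin.
        idx = min(int((best - pos_thr) / width), n_bins - 1)
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    gt = [(0, 0, 10, 10)]
    rois = [(3, 0, 13, 10), (2, 0, 12, 10), (1, 1, 11, 11),
            (1, 0, 11, 10), (0, 0, 10, 10), (4, 0, 14, 10)]
    print(positive_iou_histogram(rois, gt))
```

Here the RoI shifted by 4 pixels (IoU of about 0.43) is discarded as negative, and the remaining five positives are spread over the bins [0.5, 0.6), [0.6, 0.7), [0.7, 0.8), [0.8, 0.9), [0.9, 1.0]. Running the same count on real proposals before and after a refinement stage is one way to observe how the positive IoU distribution shifts.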
URL
https://arxiv.org/abs/2404.16633