
Faster Training of Mask R-CNN by Focusing on Instance Boundaries

2018-09-19 08:54:18
Roland S. Zimmermann, Julien N. Siems

Abstract

We present an auxiliary task for Mask R-CNN, an instance segmentation network, that leads to faster training of the mask head. Our addition to Mask R-CNN is a new prediction head, the Edge Agreement Head, which is inspired by the way human annotators perform instance segmentation. Human annotators trace the contour of an object instance and thereby mark the area it occupies only indirectly. Hence, the edges of instance masks are particularly useful, as they characterize the instance well. The Edge Agreement Head therefore uses edge detection filters to encourage predicted masks to have image gradients similar to those of the ground-truth mask. We provide a detailed survey of loss combinations and show improvements on the MS COCO mask metrics compared to using no additional loss. Our approach marginally increases the model size and adds no trainable model variables. Although the computational cost increases slightly, the increase is negligible given the high computational cost of the Mask R-CNN architecture. As the additional network head is only relevant during training, inference speed remains unchanged compared to Mask R-CNN. In a default Mask R-CNN setup, we achieve a training speedup of 29% and an overall improvement of 8.1% on the MS COCO metrics compared to the baseline.
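The idea of comparing the image gradients of predicted and ground-truth masks can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: Sobel kernels stand in for the edge detection filters, mean squared error is used as the distance between edge maps, and the names filter2d, edge_maps and edge_agreement_loss are hypothetical. Because the filters are fixed, the sketch has no trainable variables, in line with the abstract's claim.

```python
import numpy as np

# Fixed (non-trainable) Sobel kernels; the choice of Sobel is an assumption.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter2d(image, kernel):
    """Cross-correlate a 2D image with a small kernel, zero-padded to keep the size."""
    image = np.asarray(image, dtype=np.float64)
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image)
    height, width = image.shape
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + height, j:j + width]
    return out


def edge_maps(mask):
    """Stack horizontal and vertical gradient responses of a mask: shape (2, H, W)."""
    return np.stack([filter2d(mask, SOBEL_X), filter2d(mask, SOBEL_Y)])


def edge_agreement_loss(pred_mask, gt_mask):
    """Mean squared difference between the edge maps of two masks (assumed distance)."""
    diff = edge_maps(pred_mask) - edge_maps(gt_mask)
    return float(np.mean(diff ** 2))


if __name__ == "__main__":
    gt = np.zeros((28, 28))
    gt[8:20, 8:20] = 1.0                 # square ground-truth instance
    pred = np.roll(gt, 3, axis=1)        # prediction shifted by three pixels
    print(edge_agreement_loss(gt, gt))   # identical masks give zero loss
    print(edge_agreement_loss(pred, gt)) # misaligned contours give a positive loss
```

In training, such a term would be added to the usual mask loss as an auxiliary objective and dropped at inference time, which is why inference speed is unaffected.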


URL

https://arxiv.org/abs/1809.07069

PDF

https://arxiv.org/pdf/1809.07069.pdf

