
SEMEDA: Enhancing Segmentation Precision with Semantic Edge Aware Loss

2019-05-06 09:14:01
Yifu Chen, Arnaud Dapogny, Matthieu Cord

Abstract

While deep neural networks nowadays achieve impressive performance on semantic segmentation tasks, they are usually trained by optimizing pixel-wise losses such as cross-entropy. As a result, the predictions produced by such networks often fail to accurately capture object boundaries and exhibit holes inside objects. In this paper, we propose a novel approach to improve the structure of the predicted segmentation masks. We introduce a novel semantic edge detection network, which allows matching the predicted and ground-truth segmentation masks. This Semantic Edge-Aware strategy (SEMEDA) can be combined with any backbone deep network in an end-to-end training framework. Through thorough experimental validation on the Pascal VOC 2012 and Cityscapes datasets, we show that the proposed SEMEDA approach enhances the structure of the predicted segmentation masks by enforcing sharp boundaries and avoiding discontinuities inside objects, thereby improving segmentation performance. In addition, our semantic edge-aware loss can be integrated into any popular segmentation network without requiring any additional annotation and with negligible computational load compared to the standard pixel-wise cross-entropy loss.
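
To make the idea of an edge-aware auxiliary loss concrete, below is a minimal PyTorch sketch. Note that the paper trains a dedicated semantic edge detection network to extract edges; the sketch instead substitutes a fixed depthwise Sobel operator as a stand-in, and everything in it (the EdgeAwareLoss class, the edge_weight balancing factor, the L2 mismatch term) is an illustrative assumption, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class EdgeAwareLoss(nn.Module):
    """Sketch of an edge-aware auxiliary loss for semantic segmentation.

    A frozen depthwise Sobel operator stands in for the paper's learned
    semantic edge detection network: it extracts soft edge maps from the
    softmaxed predictions and the one-hot ground truth, and an L2 term
    penalizes their mismatch on top of the usual cross-entropy.
    """

    def __init__(self, num_classes: int, edge_weight: float = 0.1):
        super().__init__()
        self.num_classes = num_classes
        self.edge_weight = edge_weight  # illustrative balancing factor
        sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                                [-2.0, 0.0, 2.0],
                                [-1.0, 0.0, 1.0]])
        sobel_y = sobel_x.t()
        # One (sobel_x, sobel_y) pair per class channel, applied depthwise.
        kernel = torch.stack([sobel_x, sobel_y]).unsqueeze(1)  # (2, 1, 3, 3)
        kernel = kernel.repeat(num_classes, 1, 1, 1)           # (2C, 1, 3, 3)
        self.register_buffer("kernel", kernel)

    def _edges(self, prob: torch.Tensor) -> torch.Tensor:
        # prob: (B, C, H, W) per-class probabilities.
        g = F.conv2d(prob, self.kernel, padding=1, groups=self.num_classes)
        gx, gy = g[:, 0::2], g[:, 1::2]          # (B, C, H, W) each
        return torch.sqrt(gx ** 2 + gy ** 2 + 1e-6)  # soft gradient magnitude

    def forward(self, logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # logits: (B, C, H, W) raw scores; target: (B, H, W) integer labels.
        # Assumes target contains no ignore label (all values in [0, C)).
        ce = F.cross_entropy(logits, target)
        prob = F.softmax(logits, dim=1)
        one_hot = F.one_hot(target, self.num_classes).permute(0, 3, 1, 2).float()
        edge_term = F.mse_loss(self._edges(prob), self._edges(one_hot))
        return ce + self.edge_weight * edge_term


# Usage sketch: plugs into any backbone exactly like plain cross-entropy.
# criterion = EdgeAwareLoss(num_classes=21)   # e.g. Pascal VOC 2012
# loss = criterion(model(images), labels)
# loss.backward()

Because the edge extractor only runs on the predicted and ground-truth masks (not on the input images), the extra cost per training step is small, which matches the paper's claim of negligible overhead relative to plain cross-entropy.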


URL

https://arxiv.org/abs/1905.01892

PDF

https://arxiv.org/pdf/1905.01892.pdf

