Paper Reading AI Learner

Weakly Supervised Object Detection with 2D and 3D Regression Neural Networks

2019-06-05 09:08:38
Florian Dubost, Hieab Adams, Pinar Yilmaz, Gerda Bortsova, Gijs van Tulder, M. Arfan Ikram, Wiro Niessen, Meike Vernooij, Marleen de Bruijne

Abstract

Weakly supervised detection methods can infer the location of target objects in an image without requiring location or appearance information during training. We propose a weakly supervised deep learning method for the detection of objects that appear at multiple locations in an image. The method computes attention maps using the last feature maps of an encoder-decoder network optimized only with global labels: the number of occurrences of the target object in an image. In contrast with previous approaches, attention maps are generated at full input resolution thanks to the decoder part. The proposed approach is compared to multiple state-of-the-art methods in two tasks: the detection of digits in MNIST-based datasets, and the real life application of detection of enlarged perivascular spaces -- a type of brain lesion -- in four brain regions in a dataset of 2202 3D brain MRI scans. In MNIST-based datasets, the proposed method outperforms the other methods. In the brain dataset, several weakly supervised detection methods come close to the human intrarater agreement in each region. The proposed method reaches the lowest number of false positive detections in all brain regions at the operating point, while its average sensitivity is similar to that of the other best methods.

Abstract (translated)

弱监督检测方法可以在训练过程中推断目标在图像中的位置,而不需要位置或外观信息。我们提出了一种弱监督的深度学习方法来检测图像中多个位置出现的物体。该方法使用仅使用全局标签优化的编码器-解码器网络的最后一个特征映射来计算注意力映射:图像中目标对象的出现次数。与以前的方法相比,由于解码器部分的存在,注意力地图是以完全的输入分辨率生成的。该方法在两个任务中与多个最先进的方法进行了比较:基于mnist的数据集中的数字检测,以及在2202个三维脑MRI扫描数据集中的四个脑区中检测扩大的血管周围空间(一种脑损伤)的实际应用。在基于mnist的数据集中,该方法优于其他方法。在大脑数据集中,几个弱监督的检测方法在每个区域接近人类内部评级器协议。该方法在操作点所有脑区的假阳性检出率最低,平均灵敏度与其他最佳方法相似。

URL

https://arxiv.org/abs/1906.01891

PDF

https://arxiv.org/pdf/1906.01891.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot