Paper Reading AI Learner

Bayesian Deep Learning for Affordance Segmentation in images

2023-03-02 00:01:13
Lorenzo Mur-Labadia, Ruben Martinez-Cantin, Jose J. Guerrero

Abstract

Affordances are a fundamental concept in robotics since they relate the actions available to an agent to its sensory-motor capabilities and its environment. We present a novel Bayesian deep network that detects affordances in images while also quantifying the spatial distribution of aleatoric and epistemic variance. We adapt the Mask-RCNN architecture to learn a probabilistic representation using Monte Carlo dropout. Our results outperform state-of-the-art deterministic networks. We attribute this improvement to a better probabilistic feature-space representation in the encoder and to the Bayesian variability induced during mask generation, which adapts better to object contours. We also introduce a new Probability-based Mask Quality measure that reveals the semantic and spatial differences in a probabilistic instance segmentation model. We modify the existing Probabilistic Detection Quality metric to compare binary masks rather than predicted bounding boxes, achieving a finer-grained evaluation of the probabilistic segmentation. We find aleatoric variance along object contours due to camera noise, while epistemic variance appears in visually challenging pixels.
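
The core mechanism the abstract describes is Monte Carlo dropout: dropout is kept active at inference, several stochastic forward passes are drawn, and per-pixel uncertainty is split into aleatoric and epistemic components. The sketch below illustrates that idea on a toy segmentation head; ToyMaskHead, the layer sizes, T=20 samples, and the entropy-based decomposition are illustrative assumptions, not the authors' implementation.

# A minimal sketch (not the authors' code): Monte Carlo dropout over a toy
# segmentation head, decomposing per-pixel uncertainty into aleatoric and
# epistemic terms. ToyMaskHead, T=20 and the entropy-based decomposition
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyMaskHead(nn.Module):
    """Tiny convolutional mask head with dropout kept active at test time."""

    def __init__(self, in_ch=3, n_classes=2, p_drop=0.5):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 16, 3, padding=1)
        self.drop = nn.Dropout2d(p_drop)            # re-sampled on every forward pass
        self.logits = nn.Conv2d(16, n_classes, 1)   # per-pixel class logits

    def forward(self, x):
        return self.logits(self.drop(F.relu(self.conv(x))))


def pixel_entropy(p):
    """Per-pixel entropy of class probabilities p with shape (B, C, H, W)."""
    return -(p * p.clamp_min(1e-8).log()).sum(dim=1)


@torch.no_grad()
def mc_dropout_uncertainty(model, image, T=20):
    """Run T stochastic passes; return the mean prediction and uncertainty maps."""
    model.train()  # keep dropout stochastic (here only dropout layers matter)
    probs = torch.stack([F.softmax(model(image), dim=1) for _ in range(T)])
    mean_p = probs.mean(dim=0)                           # (B, C, H, W)
    total = pixel_entropy(mean_p)                        # predictive entropy
    aleatoric = torch.stack([pixel_entropy(p) for p in probs]).mean(dim=0)
    epistemic = total - aleatoric                        # mutual information (BALD)
    return mean_p, aleatoric, epistemic


if __name__ == "__main__":
    model = ToyMaskHead()
    image = torch.rand(1, 3, 64, 64)                     # dummy RGB image
    mean_p, ale, epi = mc_dropout_uncertainty(model, image)
    print(mean_p.shape, ale.shape, epi.shape)            # (1, 2, 64, 64), (1, 64, 64), (1, 64, 64)

In the paper's setting the same sampling would be applied to the Mask-RCNN mask head, so the epistemic map concentrates on visually challenging pixels and the aleatoric map on noisy object contours, as reported in the abstract.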

URL

https://arxiv.org/abs/2303.00871

PDF

https://arxiv.org/pdf/2303.00871.pdf

