Abstract
Affordances are a fundamental concept in robotics, since they relate the actions available to an agent to its sensory-motor capabilities and its environment. We present a novel Bayesian deep network that detects affordances in images while quantifying the spatial distribution of aleatoric and epistemic variance. We adapt the Mask-RCNN architecture to learn a probabilistic representation using Monte Carlo dropout. Our results outperform state-of-the-art deterministic networks. We attribute this improvement to a better probabilistic feature-space representation in the encoder and to the Bayesian variability induced at mask generation, which adapts better to object contours. We also introduce the new Probability-based Mask Quality measure, which reveals the semantic and spatial differences of a probabilistic instance segmentation model. We modify the existing Probabilistic Detection Quality metric to compare binary masks rather than predicted bounding boxes, achieving a finer-grained evaluation of the probabilistic segmentation. We find aleatoric variance on object contours, caused by camera noise, while epistemic variance appears at visually challenging pixels.
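The Monte Carlo dropout procedure the abstract refers to can be sketched as follows. This is a minimal illustration, not the paper's implementation: dropout is kept active at test time, T stochastic forward passes yield per-pixel mask probabilities (here simulated with random numbers; all shapes and names are illustrative), and the predictive variance splits into an aleatoric term (expected Bernoulli variance) and an epistemic term (variance of the sampled probabilities).

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for T stochastic forward passes of the network with dropout
# active at test time: per-pixel foreground probabilities, shape (T, H, W).
T, H, W = 20, 4, 4
probs = rng.uniform(0.3, 0.7, size=(T, H, W))

# Predictive mean: average the sampled mask probabilities per pixel.
mean_p = probs.mean(axis=0)

# Aleatoric uncertainty: expected Bernoulli variance p * (1 - p)
# under the dropout samples (noise inherent to the observation).
aleatoric = (probs * (1.0 - probs)).mean(axis=0)

# Epistemic uncertainty: variance of the sampled probabilities
# themselves (model uncertainty, reducible with more data).
epistemic = probs.var(axis=0)

# Sanity check: the two terms sum to the total predictive
# Bernoulli variance of the mean prediction.
assert np.allclose(aleatoric + epistemic, mean_p * (1.0 - mean_p))
```

Thresholding `mean_p` gives the binary mask, while the two variance maps can be inspected per pixel, which is how contour-level aleatoric noise can be separated from epistemic uncertainty at ambiguous pixels.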
URL
https://arxiv.org/abs/2303.00871