We present a recurrent model for semantic instance segmentation that sequentially generates binary masks and their associated class probabilities for every object in an image. Our proposed system is trainable end-to-end from an input image to a sequence of labeled masks and, compared to methods relying on object proposals, does not require post-processing steps on its output. We study the suitability of our recurrent model on three different instance segmentation benchmarks, namely Pascal VOC 2012, CVPPP Plant Leaf Segmentation and Cityscapes. Further, we analyze the object sorting behavior of our model and observe that it learns a consistent ordering, which correlates with the activations learned in the encoder part of our network. Source code and models are available at https://imatge-upc.github.io/rsis/
https://arxiv.org/abs/1712.00617
This paper addresses the semantic instance segmentation task under open-set conditions, where input images can contain both known and unknown object classes. The training process of existing semantic instance segmentation methods requires annotation masks for all object instances, which is expensive to acquire, or even infeasible in realistic scenarios where the number of categories may grow without bound. In this paper, we present a novel open-set semantic instance segmentation approach capable of segmenting all known and unknown object classes in images, based on the output of an object detector trained on known object classes. We formulate the problem using a Bayesian framework, where the posterior distribution is approximated by simulated annealing optimization equipped with an efficient image partition sampler. We show empirically that our method is competitive with state-of-the-art supervised methods on known classes, and also performs well on unknown classes when compared with unsupervised methods.
https://arxiv.org/abs/1806.00911
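The posterior approximation described above follows the standard simulated-annealing pattern: propose a change to the current image partition, accept it if it lowers the energy, accept uphill moves with Boltzmann probability, and cool the temperature. The following sketch applies that pattern to a toy 1-D labeling problem; the energy function and single-pixel flip proposal are illustrative assumptions, not the paper's partition sampler.

```python
# Illustrative simulated annealing over a (toy, 1-D) image partition.
# Energy = unary disagreement with pixel intensity + pairwise smoothness.
import math
import random

random.seed(0)

def energy(labels, pixels):
    """Toy energy: unary term (label vs. pixel value) plus a pairwise
    penalty for label changes between neighbours."""
    unary = sum(abs(p - l) for p, l in zip(pixels, labels))
    pairwise = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return unary + 0.5 * pairwise

def anneal_partition(pixels, steps=2000, t0=1.0, cooling=0.995):
    labels = [0] * len(pixels)                 # initial partition
    e, t = energy(labels, pixels), t0
    for _ in range(steps):
        i = random.randrange(len(labels))      # propose one label flip
        cand = labels[:]
        cand[i] = 1 - cand[i]
        ce = energy(cand, pixels)
        # accept downhill always, uphill with Boltzmann probability
        if ce < e or random.random() < math.exp((e - ce) / t):
            labels, e = cand, ce
        t *= cooling                           # cool the temperature
    return labels

row = [0, 0, 0, 1, 1, 1, 1, 0]                # a 1-D "image" row
result = anneal_partition(row)
print(result)
```

The early high-temperature phase lets the sampler escape poor initial partitions, while the cooling schedule makes it settle into a low-energy (high-posterior) labeling.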
This paper goes beyond salient region detection in video by decomposing salient regions into semantically meaningful components: semantic salient instances. To address this video semantic salient instance segmentation task, we construct a new dataset, the Semantic Salient Instance Video (SESIV) dataset. Our SESIV dataset consists of 84 high-quality video sequences with pixel-wise, per-frame ground-truth labels annotated for different segmentation tasks. We also provide a baseline for this problem, called the Fork-Join Strategy (FJS). FJS is a two-stream network that leverages the advantages of two different segmentation tasks, i.e., semantic instance segmentation and salient object segmentation. In FJS, we introduce a sequential fusion that combines the outputs of the two streams to produce non-overlapping instances one by one. We also introduce a recurrent instance propagation to refine the shapes and semantic meanings of instances, and an identity tracking mechanism to maintain both the identity and the semantic meaning of an instance over the entire video. Experimental results demonstrate the effectiveness of our proposed FJS.
https://arxiv.org/abs/1807.01452
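The "sequential fusion" idea of producing non-overlapping instances one by one can be illustrated with a greedy merge: masks are processed in order, and each new instance may only claim pixels not yet assigned to an earlier one. This is a minimal sketch under assumed inputs (confidence-ranked binary masks as nested lists), not the paper's actual fusion module.

```python
# Sketch of sequential fusion: merge ranked instance masks so overlaps
# are resolved in favour of earlier (higher-confidence) instances.
def sequential_fusion(ranked_masks):
    """ranked_masks: 2-D binary masks, highest confidence first.
    Returns the same masks with already-claimed pixels removed."""
    taken = set()                              # pixels already assigned
    fused = []
    for mask in ranked_masks:
        kept = [[0] * len(mask[0]) for _ in mask]
        for r, row in enumerate(mask):
            for c, v in enumerate(row):
                if v and (r, c) not in taken:
                    kept[r][c] = 1             # claim this pixel
                    taken.add((r, c))
        fused.append(kept)
    return fused

a = [[1, 1, 0],
     [1, 1, 0]]
b = [[0, 1, 1],                                # overlaps a at column 1
     [0, 1, 1]]
fused = sequential_fusion([a, b])
print(fused[1])  # b keeps only the pixels a did not claim
```

After fusion, every pixel belongs to at most one instance, which is the invariant the sequential scheme guarantees by construction.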