Abstract
Open-world entity segmentation, an emerging computer vision task, aims to segment entities in images without being restricted to pre-defined classes, offering impressive generalization to unseen images and concepts. Despite its promise, existing entity segmentation methods like the Segment Anything Model (SAM) rely heavily on costly expert annotators. This work presents Self-supervised Open-world Hierarchical Entity Segmentation (SOHES), a novel approach that eliminates the need for human annotations. SOHES operates in three phases: self-exploration, self-instruction, and self-correction. Given a pre-trained self-supervised representation, we produce abundant high-quality pseudo-labels through visual feature clustering. We then train a segmentation model on the pseudo-labels and rectify the noise in the pseudo-labels via a teacher-student mutual-learning procedure. Beyond segmenting entities, SOHES also captures their constituent parts, providing a hierarchical understanding of visual entities. Using raw images as the sole training data, our method achieves unprecedented performance in self-supervised open-world segmentation, marking a significant milestone towards high-quality open-world entity segmentation in the absence of human-annotated masks. Project page: this https URL.
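The teacher-student mutual-learning procedure mentioned above is commonly realized with an exponential moving average (EMA) teacher: the student trains on pseudo-labels while the teacher's weights track a slow average of the student's, yielding more stable targets. The following is a minimal, hedged sketch of that idea; the function names, the momentum value, and the toy one-parameter "model" are illustrative assumptions, not details from the SOHES paper.

```python
# Illustrative sketch of an EMA teacher update in teacher-student
# mutual learning. All names and values here are assumptions for
# demonstration, not the authors' implementation.

def ema_update(teacher, student, momentum=0.999):
    """Blend teacher weights toward the student's weights.

    new_teacher = momentum * teacher + (1 - momentum) * student
    """
    return {k: momentum * teacher[k] + (1.0 - momentum) * student[k]
            for k in teacher}

# Toy "weights": a single scalar parameter.
teacher = {"w": 1.0}
student = {"w": 0.0}

# Each iteration, the student would take a gradient step on pseudo-labels
# (omitted here), and the teacher is updated as an EMA of the student.
for _ in range(3):
    teacher = ema_update(teacher, student)

print(teacher["w"])  # decays geometrically toward the student's value
```

With the student fixed at 0, three updates leave the teacher at 0.999³ of its initial value, showing how a high momentum makes the teacher change slowly and thus provide stable, denoised supervision targets.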
URL
https://arxiv.org/abs/2404.12386