Abstract
Deep clustering, an important branch of unsupervised representation learning, aims to embed semantically similar samples into the same feature space. This core goal has inspired explorations of contrastive learning and subspace clustering. However, these solutions rely on the basic assumption that sufficient, category-balanced samples are available for learning valid high-level representations. In practice, this assumption is too strict for many real-world applications. To overcome this challenge, a natural strategy is to use generative models to augment the data with a considerable number of instances. How to use these generated samples to effectively improve clustering performance, however, remains difficult and under-explored. In this paper, we propose a novel Generative Calibration Clustering (GCC) method that delicately incorporates feature learning and augmentation into the clustering procedure. First, we develop a discriminative feature alignment mechanism to discover the intrinsic relationships between real and generated samples. Second, we design a self-supervised metric learning scheme that produces more reliable cluster assignments to boost the conditional diffusion generation. Extensive experimental results on three benchmarks validate the effectiveness and advantages of our proposed method over state-of-the-art methods.
URL
https://arxiv.org/abs/2404.09115