Abstract
This work investigates an important phenomenon in centroid-based deep clustering (DC) algorithms: performance quickly saturates after a period of rapid early gains. Practitioners commonly address early saturation with periodic reclustering, which we demonstrate is insufficient to overcome performance plateaus. We call this phenomenon the "reclustering barrier" and empirically show when the reclustering barrier occurs, what its underlying mechanisms are, and how it is possible to Break the Reclustering Barrier with our algorithm BRB. BRB avoids early over-commitment to initial clusterings and enables continuous adaptation to reinitialized clustering targets while remaining conceptually simple. Applying our algorithm to widely used centroid-based DC algorithms, we show that (1) BRB consistently improves performance across a wide range of clustering benchmarks, (2) BRB enables training from scratch, and (3) BRB performs competitively against state-of-the-art DC algorithms when combined with a contrastive loss. We release our code and pre-trained models at this https URL.
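For readers unfamiliar with the baseline practice mentioned above, the following is a minimal sketch of periodic reclustering in a centroid-based training loop. It is not BRB and not the authors' code: the encoder is replaced by a single linear map, the clustering loss is a plain k-means objective, and all names (`kmeans`, `train_with_periodic_reclustering`, `recluster_every`) are hypothetical choices made for illustration.

```python
import numpy as np


def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid.
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)


def train_with_periodic_reclustering(data, k, epochs=6, recluster_every=2,
                                     lr=0.01, seed=0):
    """Toy loop: gradient steps on a k-means loss, with periodic reclustering."""
    rng = np.random.default_rng(seed)
    # Assumption: a linear map to 2 dimensions stands in for a deep encoder.
    weights = rng.normal(size=(data.shape[1], 2))
    centroids, labels = kmeans(data @ weights, k, seed=seed)
    reclustered_at = []
    for epoch in range(1, epochs + 1):
        # Gradient step on mean ||x W - c_label||^2 with respect to W.
        residual = data @ weights - centroids[labels]
        weights -= lr * 2.0 * data.T @ residual / len(data)
        if epoch % recluster_every == 0:
            # Periodic reclustering: rerun k-means on the current embeddings
            # to reset both the centroids and the cluster assignments.
            centroids, labels = kmeans(data @ weights, k, seed=seed + epoch)
            reclustered_at.append(epoch)
    return weights, centroids, labels, reclustered_at


# Synthetic data: three Gaussian blobs in five dimensions.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(loc=m, size=(50, 5)) for m in (-3.0, 0.0, 3.0)])
weights, centroids, labels, reclustered_at = train_with_periodic_reclustering(data, k=3)
```

In this sketch only the clustering targets are refreshed at each reclustering step; the encoder weights are left as they are. The abstract argues that reclustering alone does not overcome the plateau, so this should be read as the baseline being discussed rather than the proposed remedy.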
URL
https://arxiv.org/abs/2411.02275