Abstract
We propose a graph clustering formulation based on multicut (a.k.a. weighted correlation clustering) on the complete graph. Unlike the original sparse formulation of multicut, our formulation does not require the graph topology to be specified, making our approach simpler and potentially better performing. In contrast to unweighted correlation clustering, we allow for a more expressive weighted cost structure. In dense multicut, the clustering objective is given in factorized form as inner products of node feature vectors. This allows for efficient formulation and inference, in contrast to multicut/weighted correlation clustering, which has at least quadratic representation and computation complexity when working on the complete graph. We show how to rewrite classical greedy algorithms for multicut in our dense setting and how to modify them for greater efficiency and solution quality. In particular, our algorithms scale to graphs with tens of thousands of nodes. Empirical evidence on instance segmentation on Cityscapes and clustering of ImageNet datasets shows the merits of our approach.
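The factorized objective is what makes greedy contraction cheap in the dense setting: by bilinearity, the summed cost between two clusters A and B is sum_{i in A, j in B} <f_i, f_j> = <sum_A f_i, sum_B f_j>, so each cluster only needs to store the sum of its members' feature vectors. Below is a minimal sketch of greedy additive edge contraction written this way, assuming the convention that a positive inner product is attractive (the pair should be merged). The function names are hypothetical, and the pair search is the naive exhaustive one; the paper's faster variants are not reproduced here.

```python
from itertools import combinations


def dot(u, v):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    """Element-wise sum of two equal-length feature vectors."""
    return [a + b for a, b in zip(u, v)]


def dense_greedy_contraction(features):
    """Greedy additive edge contraction on the complete graph (sketch).

    Assumed convention: the edge cost between nodes i and j is <f_i, f_j>,
    and a positive cost means the two endpoints should share a cluster.
    Because the cost is bilinear, the total cost between clusters A and B
    equals <sum of A's features, sum of B's features>, so only one summed
    feature vector per cluster is kept instead of an explicit edge list.
    """
    clusters = {i: [i] for i in range(len(features))}
    sums = {i: list(f) for i, f in enumerate(features)}
    while len(clusters) > 1:
        # Naive exhaustive search over all cluster pairs: O(k^2 * d) per step.
        best_pair, best_cost = None, 0.0
        for a, b in combinations(sorted(clusters), 2):
            cost = dot(sums[a], sums[b])
            if cost > best_cost:
                best_pair, best_cost = (a, b), cost
        if best_pair is None:  # no attractive pair remains, so stop merging
            break
        a, b = best_pair
        # Contract b into a: merge member lists and add the feature sums.
        clusters[a].extend(clusters.pop(b))
        sums[a] = add(sums[a], sums.pop(b))
    return sorted(sorted(c) for c in clusters.values())


feats = [[1, 0], [0.9, 0.1], [-1, 0], [-0.8, -0.2]]
print(dense_greedy_contraction(feats))  # → [[0, 1], [2, 3]]
```

Storing one d-dimensional sum per cluster keeps memory linear in the number of nodes, whereas an explicit complete-graph edge list would grow quadratically, which is the contrast the abstract draws with classical multicut.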
URL
https://arxiv.org/abs/2301.12159