Abstract
Clustering is a fundamental problem, aiming to partition a set of elements, like agents or data points, into clusters such that elements in the same cluster are closer to each other than to those in other clusters. In this paper, we present a new framework for studying online non-centroid clustering with delays, where elements, that arrive one at a time as points in a finite metric space, should be assigned to clusters, but assignments need not be immediate. Specifically, upon arrival, each point's location is revealed, and an online algorithm has to irrevocably assign it to an existing cluster or create a new one containing, at this moment, only this point. However, we allow decisions to be postponed at a delay cost, instead of following the more common assumption of immediate decisions upon arrival. This poses a critical challenge: the goal is to minimize both the total distance costs between points in each cluster and the overall delay costs incurred by postponing assignments. In the classic worst-case arrival model, where points arrive in an arbitrary order, no algorithm has a competitive ratio better than sublogarithmic in the number of points. To overcome this strong impossibility, we focus on a stochastic arrival model, where points' locations are drawn independently across time from an unknown and fixed probability distribution over the finite metric space. We offer hope for beyond worst-case adversaries: we devise an algorithm that is constant competitive in the sense that, as the number of points grows, the ratio between the expected overall costs of the output clustering and an optimal offline clustering is bounded by a constant.
Abstract (translated)
聚类是一种基本问题,目标是将一组元素(如代理或数据点)划分为若干集群,使得同一集群内的元素彼此之间的距离更近,而非同集群的元素之间距离较远。本文介绍了一个新的框架来研究带有延迟的在线非中心聚类,在这种情况下,每个元素以单个时间点的形式在一个有限度量空间中依次到达,并应被分配到一个集群中,但是分配不一定需要即时完成。 具体来说,在每个元素到达时,其位置会被揭示出来,而一个在线算法必须立即决定将该点分配给现有的某个集群或创建一个新的只包含这个点的集群。然而,我们允许决策可以延后进行,并为此付出延迟成本,这不同于通常假设的即时决策模型。这一设定带来了关键挑战:目标是同时最小化每个集群内部元素之间的总距离成本和由于推迟分配所导致的整体延迟成本。 在经典最坏情况到达模式中,即点以任意顺序到达时,没有任何算法的竞争比(competitive ratio)能够优于关于点数量对数的次线性值。为了克服这种强烈的不可能性结果,我们关注于随机到达模型,在此模型下,每个元素的位置独立地从一个未知但固定的概率分布在有限度量空间中抽取而来。针对这一情况,我们提出了一种算法,它在某种意义上是常数竞争性的:随着点的数量增加,输出聚类的期望总成本与最优离线聚类的成本之比被限制在一个常数值内。这种结果为超越最坏情形对手提供了希望。
URL
https://arxiv.org/abs/2601.16091