Abstract
Accurately clustering high-dimensional measurements is vital for adequately analyzing scientific data. Deep learning machinery has remarkably improved clustering capabilities in recent years due to its ability to extract meaningful representations. In this work, we are given unlabeled samples from multiple source domains, and we aim to learn a shared classifier that assigns the examples to various clusters. Evaluation is done by using the classifier for predicting cluster assignments in a previously unseen domain. This setting generalizes the problem of unsupervised domain generalization to the case in which no supervised learning samples are given (completely unsupervised). Towards this goal, we present an end-to-end model and evaluate its capabilities on several multi-domain image datasets. Specifically, we demonstrate that our model is more accurate than schemes that require fine-tuning using samples from the target domain or some level of supervision.
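The evaluation protocol described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name a metric, so the sketch assumes the common clustering-accuracy score (best one-to-one matching between predicted clusters and ground-truth classes), and the helper names `leave_one_domain_out` and `clustering_accuracy` are hypothetical.

```python
from itertools import permutations


def leave_one_domain_out(domains):
    """Yield (source_domains, target_domain) pairs, holding out each domain once."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target


def clustering_accuracy(true_labels, cluster_ids, num_clusters):
    """Best accuracy over all one-to-one mappings from cluster ids to labels.

    Assumed metric (not stated in the abstract). Brute force over
    permutations, so only suitable for a small number of clusters.
    """
    # counts[c][y] = number of samples in cluster c whose true label is y
    counts = [[0] * num_clusters for _ in range(num_clusters)]
    for y, c in zip(true_labels, cluster_ids):
        counts[c][y] += 1
    best = max(
        sum(counts[c][perm[c]] for c in range(num_clusters))
        for perm in permutations(range(num_clusters))
    )
    return best / len(true_labels)


# Hypothetical domain names; the model would be trained on the unlabeled
# sources and then scored on the held-out target.
for sources, target in leave_one_domain_out(["A", "B", "C"]):
    print(f"train on {sources}, evaluate on {target}")
```

In this sketch, ground-truth labels enter only through `clustering_accuracy` at scoring time; they are never used for training, and no target-domain samples are used for fine-tuning, in keeping with the completely unsupervised setting.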
URL
https://arxiv.org/abs/2301.13530