Beyond Known Clusters: Probe New Prototypes for Efficient Generalized Class Discovery

2024-04-13 12:41:40
Ye Wang, Yaxiong Wang, Yujiao Wu, Bingchen Zhao, Xueming Qian

Abstract

Generalized Class Discovery (GCD) aims to dynamically assign labels to unlabelled data, drawing partly on knowledge learned from labelled data, where the unlabelled data may come from known or novel classes. The prevailing approach clusters all of the data and learns concepts through prototypical contrastive learning. However, existing methods hinge largely on the performance of the clustering algorithm and therefore inherit its limitations. First, the estimated number of clusters is often smaller than the ground truth, so existing methods lack enough prototypes for comprehensive concept learning. To address this issue, we propose an adaptive probing mechanism that introduces learnable potential prototypes to expand the set of cluster prototypes (centers). As there is no ground truth for the potential prototypes, we develop a self-supervised prototype learning framework that optimizes them in an end-to-end fashion. Second, clustering is computationally intensive, and the conventional strategy of clustering both labelled and unlabelled instances exacerbates this cost. To counteract this inefficiency, we cluster only the unlabelled instances and then expand the cluster prototypes with the introduced potential prototypes to explore novel classes quickly. Despite the simplicity of the proposed method, extensive empirical analysis on a wide range of datasets confirms that it consistently delivers state-of-the-art results. Specifically, our method surpasses the nearest competitor by 9.7% on the Stanford Cars dataset and achieves 12× higher clustering efficiency on the Herbarium 19 dataset. We will make the code and checkpoints publicly available at this https URL.
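
The prototype expansion described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`expand_prototypes`, `assign`, `proto_contrastive_loss`), the random initialisation of the potential prototypes, the cosine-similarity assignment, and the temperature value are all assumptions made for illustration, and the self-supervised optimisation of the potential prototypes is not shown. The sketch only shows how cluster centers obtained from the unlabelled data might be extended with extra prototypes, and how a standard prototypical contrastive loss could then be computed over the expanded set.

```python
import math
import random


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def expand_prototypes(cluster_centers, num_potential, dim, seed=0):
    """Append potential prototypes to the cluster centers.

    Assumption: potential prototypes start as random unit vectors; in a
    real system they would be learnable parameters updated by training.
    """
    rng = random.Random(seed)
    potential = [
        normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        for _ in range(num_potential)
    ]
    return [normalize(c) for c in cluster_centers] + potential


def assign(feature, prototypes):
    """Return the index of the prototype most similar to the feature."""
    sims = [cosine(feature, p) for p in prototypes]
    return max(range(len(prototypes)), key=sims.__getitem__)


def proto_contrastive_loss(feature, prototypes, target, temperature=0.1):
    """Cross-entropy over temperature-scaled cosine similarities.

    The target prototype is the positive; all other prototypes are negatives.
    """
    logits = [cosine(feature, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


if __name__ == "__main__":
    # Two centers stand in for clusters found on the unlabelled data only.
    centers = [[1.0, 0.0], [0.0, 1.0]]
    prototypes = expand_prototypes(centers, num_potential=2, dim=2)
    feature = [-0.9, 0.1]
    idx = assign(feature, prototypes)
    print(len(prototypes), idx, proto_contrastive_loss(feature, prototypes, idx))
```

If the clustering step underestimates the number of classes, the extra prototypes give features from an uncovered class something to attach to other than an existing center, which is the motivation the abstract gives for probing new prototypes.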

URL

https://arxiv.org/abs/2404.08995

PDF

https://arxiv.org/pdf/2404.08995.pdf

