Paper Reading AI Learner

Block-Diagonal Guided DBSCAN Clustering

2024-03-31 05:04:38
Zheng Xing, Weibing Zhao

Abstract

Cluster analysis plays a crucial role in database mining, and one of the most widely used algorithms in this field is DBSCAN. However, DBSCAN has several limitations, such as difficulty in handling high-dimensional large-scale data, sensitivity to input parameters, and lack of robustness in producing clustering results. This paper introduces an improved version of DBSCAN that leverages the block-diagonal property of the similarity graph to guide the clustering procedure of DBSCAN. The key idea is to construct a graph that measures the similarity between high-dimensional large-scale data points and has the potential to be transformed into a block-diagonal form through an unknown permutation, followed by a cluster-ordering procedure to generate the desired permutation. The clustering structure can be easily determined by identifying the diagonal blocks in the permuted graph. We propose a gradient descent-based method to solve the proposed problem. Additionally, we develop a DBSCAN-based points traversal algorithm that identifies clusters with high densities in the graph and generates an augmented ordering of clusters. The block-diagonal structure of the graph is then achieved through permutation based on the traversal order, providing a flexible foundation for both automatic and interactive cluster analysis. We introduce a split-and-refine algorithm to automatically search for all diagonal blocks in the permuted graph with theoretically optimal guarantees under specific cases. We extensively evaluate our proposed approach on twelve challenging real-world benchmark clustering datasets and demonstrate its superior performance compared to the state-of-the-art clustering method on every dataset.

Abstract (translated)

聚类分析在数据库挖掘中扮演着关键角色,该领域中最常用的算法是 DBSCAN。然而,DBSCAN 具有多个限制,例如处理高维大型数据集的困难性、对输入参数的敏感性以及产生聚类结果缺乏鲁棒性。本文介绍了一种改进的 DBSCAN 版本,它利用相似图的块状性质指导 DBSCAN 的聚类过程。关键思想是构建一个衡量高维大型数据点之间相似性的图形,具有将经过未知排列后可能变为块状的潜在可能性,然后通过聚类排序程序生成所需的排列。通过确定 permuted 图形中的块状行,可以轻松确定聚类结构。我们提出了一种基于梯度下降的方法来解决所提出的问题。此外,我们还开发了一种基于 DBSCAN 的点遍历算法,它可以在图中识别高密度聚类,并生成聚类的增强顺序。通过基于遍历顺序的排列,为自动和交互式聚类分析提供了灵活的基础。我们对特定情况下的理论最优保证下的所有 permuted 图形中的块状行进行自动搜索,并从每个数据集的大量挑战实例中进行全面评估。我们证明了所提出方法在所有数据集上的优越性能,与最先进的聚类方法相比。

URL

https://arxiv.org/abs/2404.01341

PDF

https://arxiv.org/pdf/2404.01341.pdf


Tags
3D Action Action_Localization Action_Recognition Activity Adversarial Agent Attention Autonomous Bert Boundary_Detection Caption Chat Classification CNN Compressive_Sensing Contour Contrastive_Learning Deep_Learning Denoising Detection Dialog Diffusion Drone Dynamic_Memory_Network Edge_Detection Embedding Embodied Emotion Enhancement Face Face_Detection Face_Recognition Facial_Landmark Few-Shot Gait_Recognition GAN Gaze_Estimation Gesture Gradient_Descent Handwriting Human_Parsing Image_Caption Image_Classification Image_Compression Image_Enhancement Image_Generation Image_Matting Image_Retrieval Inference Inpainting Intelligent_Chip Knowledge Knowledge_Graph Language_Model LLM Matching Medical Memory_Networks Multi_Modal Multi_Task NAS NMT Object_Detection Object_Tracking OCR Ontology Optical_Character Optical_Flow Optimization Person_Re-identification Point_Cloud Portrait_Generation Pose Pose_Estimation Prediction QA Quantitative Quantitative_Finance Quantization Re-identification Recognition Recommendation Reconstruction Regularization Reinforcement_Learning Relation Relation_Extraction Represenation Represenation_Learning Restoration Review RNN Robot Salient Scene_Classification Scene_Generation Scene_Parsing Scene_Text Segmentation Self-Supervised Semantic_Instance_Segmentation Semantic_Segmentation Semi_Global Semi_Supervised Sence_graph Sentiment Sentiment_Classification Sketch SLAM Sparse Speech Speech_Recognition Style_Transfer Summarization Super_Resolution Surveillance Survey Text_Classification Text_Generation Tracking Transfer_Learning Transformer Unsupervised Video_Caption Video_Classification Video_Indexing Video_Prediction Video_Retrieval Visual_Relation VQA Weakly_Supervised Zero-Shot