Determination of the optimal number of clusters in affinity propagation clustering algorithm

何选森; 何帆; 薄喜柱; 肖湘萍

doi:10.12178/1001-0548.2024308

Abstract

Affinity propagation (AP) clustering can automatically search for the number of clusters and the cluster centers, but the number of clusters it returns often differs considerably from the inherent cluster structure of the data. To address this, a method for determining the latent number of clusters in a dataset is proposed. A similarity matrix is formed from the squared Euclidean distances between all pairs of data points. Using the sample size and the median of the off-diagonal elements of the similarity matrix as parameters, a preference update formula is established to determine the number of clusters. Similarity and availability are summed to form an affinity matrix, and the main-diagonal elements of that matrix with positive values are taken as the cluster centroids, so that the number of clusters and the number of centroids can be verified against each other. Simulations on both random and real datasets are used to evaluate several performance metrics and the running time of the algorithm. The results show that the proposed method not only estimates the number of clusters accurately but also accelerates the convergence of the algorithm, making it suitable for big-data applications.
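The two inputs named in the abstract, the similarity matrix and the median of its off-diagonal elements, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the preference update formula itself is not given on this page, so the sketch stops at the median baseline. The negative sign on the squared distance follows the usual AP convention and is an assumption here, as are the function names `similarity_matrix`, `off_diagonal_median`, and `with_preference` and the toy data.

```python
from statistics import median


def similarity_matrix(points):
    """Pairwise similarities s(i, k) = -||x_i - x_k||^2.

    Assumption: the negative sign follows the usual AP convention
    (larger similarity means closer points); the abstract only says
    "squared Euclidean distance".
    """
    n = len(points)
    return [
        [-sum((a - b) ** 2 for a, b in zip(points[i], points[k]))
         for k in range(n)]
        for i in range(n)
    ]


def off_diagonal_median(sim):
    """Median of all entries s(i, k) with i != k."""
    n = len(sim)
    return median(sim[i][k] for i in range(n) for k in range(n) if i != k)


def with_preference(sim, preference):
    """Return a copy of sim whose diagonal holds the shared preference.

    In AP the diagonal entries s(k, k) act as preferences: larger values
    make more points likely to be chosen as exemplars.
    """
    out = [row[:] for row in sim]
    for k in range(len(out)):
        out[k][k] = preference
    return out


# Hypothetical toy data: two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
sim = similarity_matrix(points)
pref = off_diagonal_median(sim)        # baseline preference
sim_pref = with_preference(sim, pref)  # matrix that AP would iterate on
```

The method described in the abstract would replace `pref` with a value computed from this median and the sample size `len(points)`; how those two quantities are combined is left open here because the formula is not stated in the abstract.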
