Determination of the optimal number of clusters in affinity propagation clustering algorithm
-
Abstract
Affinity propagation (AP) clustering automatically determines both the number of clusters and their centers, but the number of clusters it returns often differs considerably from the inherent cluster structure of the dataset. This paper therefore proposes a method for determining the number of potential clusters in a dataset. The squared Euclidean distance between each pair of data points is used to form the similarity matrix, and a preference update formula, parameterized by the sample size and the median of the off-diagonal elements of the similarity matrix, is established to determine the number of clusters. Similarity and availability are summed to form an affinity matrix, and the data points whose main-diagonal elements in this matrix are positive are taken as cluster centroids, so that the number of clusters and the number of centroids verify each other. Simulations on both random and real datasets are used to evaluate several performance metrics and the running time of the algorithm. The results show that the proposed method not only estimates the number of clusters accurately but also accelerates convergence, making it suitable for big data applications.
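As a minimal sketch of the quantities the abstract names, the following Python fragment builds a similarity matrix from squared Euclidean distances, extracts the sample size and the median of the off-diagonal similarities, and reads candidate centroids off the positive main-diagonal entries of a matrix. It does not reproduce the preference update formula itself, which the abstract does not state. The negative sign on the squared distance follows the usual AP convention and is an assumption here, and the function names and the toy data are hypothetical.

```python
from statistics import median


def similarity_matrix(points):
    """Pairwise similarities as negative squared Euclidean distances.

    The negative sign is the usual AP convention (an assumption here).
    """
    n = len(points)
    return [
        [-sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]


def off_diagonal_median(sim):
    """Median of all entries of a square matrix, excluding the main diagonal."""
    n = len(sim)
    return median(sim[i][j] for i in range(n) for j in range(n) if i != j)


def positive_diagonal(matrix):
    """Indices whose main-diagonal entry is strictly positive.

    Applied to the affinity matrix, these indices are the candidate centroids.
    """
    return [k for k in range(len(matrix)) if matrix[k][k] > 0]


# Toy data (hypothetical): two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]

S = similarity_matrix(points)
n = len(points)             # sample size
m = off_diagonal_median(S)  # median of the off-diagonal similarities

print(n, m)
```

The values `n` and `m` are the two parameters a preference update would consume. `positive_diagonal` would be applied to the affinity matrix once message passing has produced it.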
-