Estimating the number of clusters in a dataset via consensus clustering
Abstract
In unsupervised learning, finding the appropriate number of clusters, usually denoted as k, is a challenging problem. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach to tackling this problem is to apply a given clustering algorithm with various cluster configurations and select the configuration that maximizes a certain internal validity measure. This approach is promising and computationally efficient, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four different indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on datasets with the number of clusters ranging from 2 to 7 show the clear advantages of weighted consensus clustering for correctly finding k in comparison to an individual clustering method (e.g., k-means) and simple consensus clustering. (C) 2019 Elsevier Ltd. All rights reserved.
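The following is a minimal sketch of the baseline estimation approach described above: run a clustering algorithm (k-means here) over a range of candidate k values and pick the k preferred by each internal validity index. It assumes scikit-learn and uses a synthetic dataset generated with make_blobs as a stand-in for the data studied in the paper; the weighted consensus combination and the Consensus (CI) index are not shown, as their exact definitions are not given in this abstract.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import (
    silhouette_score,
    calinski_harabasz_score,
    davies_bouldin_score,
)

# Synthetic stand-in data (hypothetical, not the paper's dataset).
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

candidate_ks = range(2, 8)  # candidate cluster counts, matching the 2-7 range mentioned
scores = {"SH": {}, "CH": {}, "DB": {}}

for k in candidate_ks:
    # Each run is independent, so this loop is trivially parallelizable.
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores["SH"][k] = silhouette_score(X, labels)          # higher is better
    scores["CH"][k] = calinski_harabasz_score(X, labels)   # higher is better
    scores["DB"][k] = davies_bouldin_score(X, labels)      # lower is better

# Each index casts a "vote" for its preferred k.
best_k = {
    "SH": max(scores["SH"], key=scores["SH"].get),
    "CH": max(scores["CH"], key=scores["CH"].get),
    "DB": min(scores["DB"], key=scores["DB"].get),
}
print(best_k)
```

A weighted consensus scheme, as proposed in the paper, would combine such per-index estimates rather than rely on any single index; the specific weighting used by the authors is not detailed in this abstract.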