Estimating the number of clusters in a dataset via consensus clustering

Unlu, Ramazan; Xanthopoulos, Petros

Access

info:eu-repo/semantics/closedAccess

Date

2019

Author

Unlu, Ramazan
Xanthopoulos, Petros

Abstract

In unsupervised learning, the problem of finding the appropriate number of clusters, usually denoted k, is very challenging. Its importance lies in the fact that k is a vital hyperparameter for most clustering algorithms. One algorithmic approach for tackling this problem is to apply a given clustering algorithm with various cluster configurations and select the one that maximizes a chosen internal validity measure. This is a promising and computationally efficient approach, since the independent runs are parallelizable. In this paper, we attempt to improve on this estimation approach by incorporating consensus clustering into the k-estimation scheme. The weighted consensus clustering scheme employs four different indices, namely the Silhouette (SH), Calinski-Harabasz (CH), Davies-Bouldin (DB), and Consensus (CI) indices, to estimate the correct number of clusters. Computational experiments on a dataset with the number of clusters ranging from 2 to 7 show the profound advantages of weighted consensus clustering for correctly finding k in comparison to an individual clustering method (e.g., k-means) and simple consensus clustering. © 2019 Elsevier Ltd. All rights reserved.
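
The baseline approach named in the abstract (run one clustering algorithm for several candidate values of k and keep the k that maximizes an internal validity measure) can be illustrated with a minimal sketch. This is not the authors' weighted consensus scheme: it assumes plain k-means as the clustering algorithm and the silhouette index as the only validity measure. The function names (`kmeans`, `silhouette`, `estimate_k`), the farthest-point initialisation, and the synthetic three-blob data are all illustrative assumptions, written in plain Python so the example stays self-contained.

```python
import math
import random


def kmeans(points, k, iters=50):
    """Lloyd's k-means with a deterministic farthest-point initialisation (assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def silhouette(points, labels):
    """Mean silhouette width; points in singleton clusters contribute 0."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for m, other in clusters.items() if m != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)


def estimate_k(points, k_values):
    """Score every candidate k independently and return the best one with all scores."""
    scores = {k: silhouette(points, kmeans(points, k)) for k in k_values}
    return max(scores, key=scores.get), scores


# Synthetic data (assumption): three well-separated Gaussian blobs in 2D.
rng = random.Random(0)
true_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [(cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5))
          for cx, cy in true_centers for _ in range(30)]

best_k, scores = estimate_k(points, range(2, 8))
print("estimated k:", best_k)
```

Each candidate k is scored independently of the others, which is why the abstract describes the runs as parallelizable; the dictionary comprehension in `estimate_k` could be distributed across workers without changing the result.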

Volume

125

Link

https://doi.org/10.1016/j.eswa.2019.01.074
https://hdl.handle.net/20.500.12440/3373

Collections

Scopus Indexed Publications Collection [2037]
WoS Indexed Publications Collection [1814]