Determining the number of clusters, before finding clusters, from the susceptibility of the similarity matrix

Lippiello, E.; Baccari, S.; Bountzis, P.

Clustering represents a fundamental procedure to provide users with meaningful insights from an original data set. The quality of the resulting clusters is largely dependent on the correct estimation of their number, K∗, which must be provided as an input parameter in many clustering algorithms. Only very few techniques provide an automatic detection of K∗ and are usually based on cluster validity indexes which are expensive with regard to computation time. Here, we present a new algorithm which allows one to obtain an accurate estimate of K∗, without partitioning data into the different clusters. This makes the algorithm particularly efficient in handling large-scale data sets from both the perspective of time and space complexity. The algorithm, indeed, highlights the block structure which is implicitly present in the similarity matrix, and associates K∗ to the number of blocks in the matrix. We test the algorithm on synthetic data sets with or without a hierarchical organization of elements. We explore a wide range of K∗ and show the effectiveness of the proposed algorithm to identify K∗, even more accurate than existing methods based on standard internal validity indexes, with a huge advantage in terms of computation time and memory storage. We also discuss the application of the novel algorithm to the de-clustering of instrumental earthquake catalogs, a procedure finalized to identify the level of background seismic activity useful for seismic hazard assessment.

Determining the number of clusters, before finding clusters, from the susceptibility of the similarity matrix

E. Lippiello;S. Baccari;P. Bountzis

2023

Abstract

Scheda breve

Scheda completa

Scheda completa (DC)

Citazioni

social impact

Determining the number of clusters, before finding clusters, from the susceptibility of the similarity matrix

E. Lippiello;S. Baccari;P. Bountzis

2023

Abstract

Scheda breve Scheda completa Scheda completa (DC)

Informazioni

Citazioni

social impact

Conferma cancellazione

Scheda breve

Scheda completa

Scheda completa (DC)