Dynamic clustering of histogram data based on adaptive squared Wasserstein distances

Irpino, Antonio; Verde, Rosanna; De Carvalho, Francisco

doi:http://dx.doi.org/10.1016/j.eswa.2013.12.001

This paper presents a Dynamic Clustering Algorithm for histogram data in which the variables are weighted automatically through adaptive distances. The Dynamic Clustering Algorithm is a k-means-like algorithm that partitions a set of objects into a predefined number of classes. Histogram data are realizations of particular set-valued descriptors defined in the context of Symbolic Data Analysis. We propose using the ℓ2 Wasserstein distance for clustering histogram data, together with two novel clustering schemes based on adaptive distances. The ℓ2 Wasserstein distance allows the variability of a set of histograms to be expressed as two components: the first related to the variability of their averages, and the second related to the variability of the histograms due to differences in size and shape. The weighting step takes into account global and local adaptive distances as well as these two components of variability. To evaluate the clustering results, we extend some classic partition quality indexes to the case where the proposed adaptive distances are used in the clustering criterion function. Examples on synthetic and real-world datasets corroborate the proposed clustering procedure.
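The two-component decomposition mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each histogram is given as bin edges plus positive bin weights with a uniform density inside each bin, and it approximates the integral over quantile levels with a midpoint rule. The function names `histogram_quantiles` and `squared_wasserstein_parts` and the parameter `n_levels` are hypothetical, chosen only for this illustration.

```python
import numpy as np


def histogram_quantiles(edges, weights, t):
    """Quantile function of a histogram evaluated at levels t in (0, 1).

    Assumes positive bin weights and a uniform density within each bin,
    so the quantile function is piecewise linear between bin edges.
    """
    edges = np.asarray(edges, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Cumulative relative frequencies at the bin edges: 0, ..., 1.
    cum = np.concatenate(([0.0], np.cumsum(w / w.sum())))
    return np.interp(t, cum, edges)


def squared_wasserstein_parts(edges_a, weights_a, edges_b, weights_b,
                              n_levels=1000):
    """Return (total, mean_part, shape_part) for two histograms.

    total      : approximate squared l2 Wasserstein distance
    mean_part  : squared difference between the two means
    shape_part : squared distance between the centered quantile functions
    """
    # Midpoint rule on the quantile levels (an assumption of this sketch).
    t = (np.arange(n_levels) + 0.5) / n_levels
    qa = histogram_quantiles(edges_a, weights_a, t)
    qb = histogram_quantiles(edges_b, weights_b, t)

    total = np.mean((qa - qb) ** 2)
    mean_part = (qa.mean() - qb.mean()) ** 2
    shape_part = np.mean(((qa - qa.mean()) - (qb - qb.mean())) ** 2)
    return total, mean_part, shape_part


# Usage: same location, different spread (illustrative data only).
total, mean_part, shape_part = squared_wasserstein_parts(
    [0.0, 1.0, 2.0], [0.5, 0.5],
    [-1.0, 1.0, 3.0], [0.5, 0.5],
)
```

On the discretized quantile grid, `total` equals `mean_part + shape_part` up to floating-point error, which is the split into a location component and a size-and-shape component that the adaptive weighting step is described as exploiting.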