Automatic variable and components weighting system for fuzzy c-means of distributional data
A. Irpino; R. Verde
2017
Abstract
A distributional variable describes an object by a 1-D probability or frequency density function. In standard clustering algorithms all variables contribute equally to the definition of the clusters. Subspace clustering instead aims to find a subspace, obtained as a linear combination of the original variables, in which the clusters are well represented. It does this by weighting the variables automatically, according to their capacity to discriminate between clusters. Using a decomposition of the squared L2 Wasserstein distance for distributional data and the notion of adaptive distance, we extend a fuzzy subspace clustering algorithm so that it automatically computes relevance weights for the variables and for their components. The weights can be computed either for the whole dataset or cluster-wise. An application shows the advantages of these algorithms.
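The abstract does not spell out the decomposition or the weighting rule, so the following is only a minimal sketch under stated assumptions. Each distribution is represented by its quantile function sampled on a grid of probability midpoints. The squared L2 Wasserstein distance is then split into a position (mean) part and a variability part computed on the centred quantile functions. We assume these two parts are the "components" that receive weights. The weight rule (a product-to-one constraint, common in adaptive-distance clustering) and the fuzzy c-means membership update are standard choices and may not match the authors' exact scheme. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: a distribution is stored as its quantile function Q(p)
# sampled at the n probability midpoints p = (t + 0.5) / n.
Quantiles = Sequence[float]


def mean_of(q: Quantiles) -> float:
    """Mean = integral of Q over [0, 1], approximated by the grid average."""
    return sum(q) / len(q)


def wasserstein2_sq(qa: Quantiles, qb: Quantiles) -> float:
    """Squared L2 Wasserstein distance: integral of (Qa(p) - Qb(p))^2 dp."""
    return sum((a - b) ** 2 for a, b in zip(qa, qb)) / len(qa)


def decompose(qa: Quantiles, qb: Quantiles) -> Tuple[float, float]:
    """Split the squared distance into (position, variability) parts.

    The cross term vanishes because each centred quantile function
    averages to zero, so position + variability == wasserstein2_sq.
    """
    ma, mb = mean_of(qa), mean_of(qb)
    position = (ma - mb) ** 2
    variability = sum(((a - ma) - (b - mb)) ** 2 for a, b in zip(qa, qb)) / len(qa)
    return position, variability


def relevance_weights(dispersions: Sequence[float]) -> List[float]:
    """Hypothetical weight update under a product-to-one constraint.

    dispersions[j] is the membership-weighted within-cluster dispersion of
    variable/component j. lambda_j = geometric_mean(D) / D_j, so compact
    (more discriminant) components get larger weights and prod(lambda) == 1.
    """
    if any(d <= 0 for d in dispersions):
        raise ValueError("dispersions must be strictly positive")
    log_geo = sum(math.log(d) for d in dispersions) / len(dispersions)
    return [math.exp(log_geo) / d for d in dispersions]


def memberships(sq_dists: Sequence[float], m: float = 2.0) -> List[float]:
    """Fuzzy c-means memberships of one object, given its (weighted)
    squared distances to each cluster prototype and fuzzifier m > 1."""
    zeros = [k for k, d in enumerate(sq_dists) if d == 0]
    if zeros:
        # The object coincides with one or more prototypes: share full membership.
        return [1.0 / len(zeros) if d == 0 else 0.0 for d in sq_dists]
    expo = 1.0 / (m - 1.0)
    return [1.0 / sum((dk / dh) ** expo for dh in sq_dists) for dk in sq_dists]


# Usage: uniform on [0, 1] versus uniform on [2, 4], on a 4-point grid.
n = 4
grid = [(t + 0.5) / n for t in range(n)]
qa = [p for p in grid]          # Q(p) = p
qb = [2 + 2 * p for p in grid]  # Q(p) = 2 + 2p
print(decompose(qa, qb))        # → (6.25, 0.078125)
print(wasserstein2_sq(qa, qb))  # → 6.328125
```

In a full algorithm, each object-to-prototype distance would be a weighted sum of these components over all variables. Weights and memberships would then be updated alternately with the prototypes until convergence. A cluster-wise variant would keep one weight vector per cluster.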