This paper deals with a batch self organizing map algorithm for data described by distributional-valued variables (DBSOM). Such variables are characterized to take as values probability or frequency distributions on numeric support. According to the nature of the data, the loss function is based on the L2 Wasserstein distance, that is one of the most used metrics to compare distributions in the context of distributional data analysis. Besides, to consider the different contributions of the variables, four adaptive versions of the DBSOM algorithm are proposed. Relevance weights are automatically learned, one for each distributional-valued variable, in an additional step of the algorithm. Since the L2 Wasserstein metric allows a decomposition of the distance into two components, one related to the means and one related to the size and shape of the distributions, relevance weights are automatically learned for each of the two components to emphasize the importance of the different characteristics, related to the moments of the distributions, on the distance value. The proposed algorithms are corroborated by applications on real distributional-valued data sets.
Batch Self-Organizing Maps for Distributional Data with an Automatic Weighting of Variables and Components
Irpino, Antonio;Verde, Rosanna;Balzanella, Antonio
2022
Abstract
This paper deals with a batch self organizing map algorithm for data described by distributional-valued variables (DBSOM). Such variables are characterized to take as values probability or frequency distributions on numeric support. According to the nature of the data, the loss function is based on the L2 Wasserstein distance, that is one of the most used metrics to compare distributions in the context of distributional data analysis. Besides, to consider the different contributions of the variables, four adaptive versions of the DBSOM algorithm are proposed. Relevance weights are automatically learned, one for each distributional-valued variable, in an additional step of the algorithm. Since the L2 Wasserstein metric allows a decomposition of the distance into two components, one related to the means and one related to the size and shape of the distributions, relevance weights are automatically learned for each of the two components to emphasize the importance of the different characteristics, related to the moments of the distributions, on the distance value. The proposed algorithms are corroborated by applications on real distributional-valued data sets.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.