A new compound indicator based on optimal transport
Balzanella, Antonio;
2025
Abstract
The construction of indicators that synthesize two or more input variables is a crucial topic in the statistical literature and in several data science applications. Besides being statistically sound, an appropriate indicator should properly describe not only the variability of the data but also the associated distribution. In addition, aggregated indicators should be interpretable in terms of the role played by each input variable. In this paper, we propose a method for obtaining a compound indicator using optimal transport from the perspective of the Wasserstein distance. Our proposal is particularly relevant when multiple (and potentially non-homogeneous) evaluations from different raters are available. Using probabilistic distance metrics, we can produce an aggregated indicator and also provide the associated confidence interval. Such a consensus rating allows us to exploit the distributional characteristics of the data and also complies with the explainability principle, enabling the end user to consistently compare the individual ratings and their contributions.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
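To make the idea of a Wasserstein-based consensus concrete, the following is a minimal sketch and not the authors' method, whose details the abstract does not give. It relies on a standard fact: for one-dimensional distributions, the 2-Wasserstein barycenter has a quantile function equal to the weighted average of the input quantile functions. The function names, the synthetic ratings, the equal rater weights, and the bootstrap interval for the consensus median are all assumptions made for illustration.

```python
import numpy as np


def wasserstein_barycenter_1d(samples, weights=None, n_quantiles=101):
    """Quantile function of the 1-D 2-Wasserstein barycenter on a uniform grid.

    samples: list of 1-D arrays, one per rater (hypothetical input layout).
    weights: optional rater weights; equal weights are assumed if omitted.
    """
    probs = np.linspace(0.0, 1.0, n_quantiles)
    if weights is None:
        weights = np.full(len(samples), 1.0 / len(samples))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # One row of empirical quantiles per rater, all on the same probability grid.
    quantiles = np.stack(
        [np.quantile(np.asarray(s, dtype=float), probs) for s in samples]
    )
    # In 1-D, averaging quantile functions gives the W2 barycenter.
    return probs, weights @ quantiles


def bootstrap_median_ci(samples, n_boot=500, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the barycenter median (an assumed choice)."""
    rng = np.random.default_rng(seed)
    samples = [np.asarray(s, dtype=float) for s in samples]
    medians = np.empty(n_boot)
    for b in range(n_boot):
        # Resample each rater's evaluations independently, with replacement.
        resampled = [rng.choice(s, size=len(s), replace=True) for s in samples]
        _, q = wasserstein_barycenter_1d(resampled)
        medians[b] = q[len(q) // 2]  # middle grid point is probability 0.5
    return np.quantile(medians, [alpha / 2, 1 - alpha / 2])


# Synthetic, non-homogeneous ratings from three hypothetical raters.
rng = np.random.default_rng(42)
raters = [
    rng.normal(60, 10, 200),
    rng.normal(70, 5, 150),
    rng.uniform(40, 90, 300),
]
probs, consensus_q = wasserstein_barycenter_1d(raters)
low, high = bootstrap_median_ci(raters)
print("consensus median:", consensus_q[len(consensus_q) // 2])
print("95% bootstrap interval:", low, high)
```

Because the consensus quantile function is a weighted sum of the raters' quantile functions, each rater's contribution at any probability level can be read off directly, which is one simple way to interpret the explainability property mentioned in the abstract.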


