This paper presents a novel approach for clustering probability distributions using the regularized tangent space distance in Wasserstein tangent spaces. We leverage optimal transport theory to map distributions to a linearized tangent space via the logarithmic map at the Wasserstein barycenter, enabling the application of standard statistical tools while preserving crucial geometric properties. Our method extends the classical k-means algorithm by incorporating a regularized tangent space distance that accounts for the covariance structure of distributional data, effectively weighting features based on their discriminative importance. Through simulation studies with anisotropic covariance structures, we demonstrate that our Wasserstein Tangent K-Means approach significantly outperforms existing methods, particularly when distributions differ in shape, scale, or orientation rather than just location. Our framework provides a mathematically rigorous yet computationally tractable solution to the distribution clustering problem with applications across numerous domains where analyzing full distributional patterns rather than summary statistics is essential. We are applying this method to analyze renewable and non-renewable electricity production patterns across Italian regions, accounting for the covariance structure among different energy sources.
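The pipeline described in the abstract can be illustrated with a minimal sketch, restricted to one-dimensional distributions, where the Wasserstein barycenter and the logarithmic map have closed forms in terms of quantile functions. The abstract does not define the regularized tangent space distance, so the sketch assumes a Mahalanobis-type form, (v − w)ᵀ(Σ + λI)⁻¹(v − w), with Σ the covariance of the tangent vectors and λ a ridge parameter. The function names (`tangent_embedding`, `whiten`, `kmeans`), the parameter `lam`, and the farthest-point initialization are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np

def tangent_embedding(samples, n_quantiles=50):
    """Tangent vectors of 1-D samples at their Wasserstein barycenter."""
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    # Row i: quantile function of distribution i on a common grid of levels.
    Q = np.stack([np.quantile(s, levels) for s in samples])
    # In 1-D, the W2 barycenter's quantile function is the mean quantile function.
    barycenter = Q.mean(axis=0)
    # Log map at the barycenter, written in quantile coordinates.
    return Q - barycenter

def whiten(V, lam=1e-2):
    """Map V so that Euclidean distance equals the assumed regularized
    distance (v - w)^T (Sigma + lam * I)^{-1} (v - w)."""
    cov = np.cov(V, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    evals = np.clip(evals, 0.0, None)          # guard against round-off
    return V @ (evecs / np.sqrt(evals + lam))  # scale each eigen-direction

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d2.argmax()])
    centers = np.stack(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = np.stack([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def wasserstein_tangent_kmeans(samples, k, lam=1e-2, n_quantiles=50):
    """Embed, whiten with the regularized covariance, then run k-means."""
    V = tangent_embedding(samples, n_quantiles)
    return kmeans(whiten(V, lam), k)
```

Calling `wasserstein_tangent_kmeans(samples, k=2)` on a list of one-dimensional sample arrays returns one cluster label per distribution. Running ordinary k-means on the whitened coordinates is equivalent to running it with the assumed regularized distance, and λ keeps directions with near-zero variance from being amplified. Multivariate distributions would need a numerical optimal transport solver for the barycenter and transport maps, which this sketch does not cover.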
Statistics for Innovation I
Mohammed Sabri; Rosanna Verde; Antonio Balzanella
2025


