Multiple factor analysis of distributional data

Rosanna Verde; Antonio Irpino
2017

Abstract

In the framework of Symbolic Data Analysis (SDA), distributional variables are a particular case of multi-valued variables: each unit is represented by a set of distributions (e.g., histograms, density functions, or quantile functions), one for each variable. Factor analysis (FA) methods are primary exploratory tools for dimension reduction and visualization. In the present work, we use a Multiple Factor Analysis (MFA) approach to analyze data described by distributional variables. Each distributional variable induces a set of new numeric variables related to the quantiles of each distribution. We call these new variables quantile variables, and the set of quantile variables derived from one distributional variable is treated as a block in the MFA approach. Thus, an MFA is performed on juxtaposed tables of quantile variables. We show that the criterion decomposed in the analysis approximates the variability measured by a suitable metric between distributions: the squared L2 Wasserstein distance. Applications to simulated and real distributional data corroborate the method. The results on the factorial planes are interpreted with new interpretative tools related to several characteristics of the distributions (location, scale, and shape).
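The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `quantile_block` and `mfa_on_blocks`, the choice of nine midpoint quantile levels, the simulated data, and the block weighting (centering each block and dividing it by its first singular value, as in standard MFA) are all assumptions made for illustration. The last lines show why quantile variables relate to the squared L2 Wasserstein distance: for one-dimensional distributions that distance is the integral of the squared difference between quantile functions, which the mean squared difference over equally spaced quantile levels approximates.

```python
import numpy as np

def quantile_block(samples, n_quantiles=9):
    """Build the quantile variables of one distributional variable.

    `samples` holds one 1-D array of observations per unit; the result has
    one row per unit and one column per quantile level.
    """
    # Midpoint levels in (0, 1); an illustrative choice, not taken from the paper.
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.vstack([np.quantile(s, levels) for s in samples])

def mfa_on_blocks(blocks, n_components=2):
    """Standard MFA weighting followed by a global PCA (assumed variant)."""
    weighted = []
    for block in blocks:
        centered = block - block.mean(axis=0)
        # First singular value of the centered block, used to balance blocks.
        first_sv = np.linalg.svd(centered, compute_uv=False)[0]
        weighted.append(centered / first_sv)
    table = np.hstack(weighted)  # juxtaposed tables of quantile variables
    u, s, _ = np.linalg.svd(table, full_matrices=False)
    return u[:, :n_components] * s[:n_components]  # unit coordinates

# Simulated data: 20 units described by two distributional variables.
rng = np.random.default_rng(0)
n_units = 20
var_a = [rng.normal(rng.uniform(0, 5), rng.uniform(0.5, 2), size=500)
         for _ in range(n_units)]
var_b = [rng.gamma(rng.uniform(1, 4), 1.0, size=500) for _ in range(n_units)]

blocks = [quantile_block(var_a), quantile_block(var_b)]
scores = mfa_on_blocks(blocks)

# Approximate squared L2 Wasserstein distance between units 0 and 1
# on the first variable: mean squared difference of their quantiles.
w2_sq_approx = np.mean((blocks[0][0] - blocks[0][1]) ** 2)
```

Each row of `scores` gives the coordinates of one unit on the first factorial plane; increasing `n_quantiles` refines the approximation of the Wasserstein distance at the cost of larger blocks.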

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/389434