Dimension reduction techniques for distributional symbolic data

IRPINO, Antonio; VERDE, Rosanna
2013

Abstract

Distributional data are multi-valued weighted descriptions of a collection of measurements, where each unit is described by an empirical distribution for a particular quantitative attribute. Symbolic Data Analysis (SDA) provides tools for the statistical treatment of multi-valued data. When the number of variables increases, dimension reduction techniques are useful for extracting patterns from data. The best-known dimension reduction techniques for quantitative data are Principal Component Analysis (PCA) and Multidimensional Scaling (MDS). In the SDA literature, several PCA techniques for histogram variables have been proposed. These PCAs do not directly consider association measures between histogram variables; instead, they rely on relationships between particular features of the histograms (the means, or only the vector of observed empirical frequencies). Starting from a new association measure for distributional variables based on the squared Wasserstein distance, we propose a new PCA and a new MDS for distributional data. In this way, we overcome the limitation of working with only partial information about distributional variables, and we provide tools for interpreting the results of the dimension reduction techniques.
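
The abstract does not state the distance formula, so the following is only an illustrative sketch of the standard squared L2 Wasserstein distance between two one-dimensional empirical distributions, computed as the mean squared difference of their quantile functions. The function name `squared_wasserstein_1d` and the grid size `n_grid` are assumptions introduced here; this is not the authors' association measure, PCA, or MDS procedure.

```python
import numpy as np


def squared_wasserstein_1d(x, y, n_grid=1000):
    """Approximate squared L2 Wasserstein distance between two 1-D samples.

    Hypothetical helper for illustration: it compares the empirical
    quantile functions of x and y on a uniform grid of probability levels.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Midpoints of n_grid equal-width cells covering (0, 1).
    t = (np.arange(n_grid) + 0.5) / n_grid
    # Empirical quantile functions evaluated at the same probability levels.
    qx = np.quantile(x, t)
    qy = np.quantile(y, t)
    # Midpoint-rule approximation of the integral of (Qx(t) - Qy(t))^2 over (0, 1).
    return float(np.mean((qx - qy) ** 2))
```

Under this sketch, comparing a sample with itself gives zero, and shifting a sample by a constant c gives c squared, because every quantile moves by exactly c. The distance uses the whole quantile function rather than only the means or the bin frequencies.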
ISBN: 978 88 343 2556 8

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/340431