Dimension reduction techniques for distributional symbolic data
IRPINO, Antonio;VERDE, Rosanna
2013
Abstract
Distributional data are multi-valued weighted descriptions of a collection of measurements, where each unit is described by an empirical distribution of a particular quantitative attribute. Symbolic Data Analysis (SDA) provides tools for the statistical treatment of multi-valued data. As the number of variables increases, dimension reduction techniques become useful for extracting patterns from data. The best-known dimension reduction techniques for quantitative data are Principal Component Analysis (PCA) and Multidimensional Scaling (MDS). Several PCA techniques for histogram variables have been proposed in the SDA literature. These PCAs do not directly consider association measures between histogram variables; instead, they rely on relationships between particular features of the histograms (the means, or only the vector of observed empirical frequencies). Starting from a new association measure for distributional variables based on the squared Wasserstein distance, we propose a new PCA and a new MDS for distributional data. In this way, we overcome the limitation of working with only partial information about distributional variables, and we provide tools for interpreting the results of the dimension reduction techniques.
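To make the distance named in the abstract concrete, the following is a minimal sketch of the squared L2 Wasserstein distance between two one-dimensional empirical distributions, computed as the mean squared difference of their quantile functions over a grid of probability levels. It is an illustration only and not the authors' association measure, PCA, or MDS; the function name `squared_wasserstein` and the parameter `n_grid` are assumptions introduced here.

```python
import numpy as np


def squared_wasserstein(sample_a, sample_b, n_grid=1000):
    """Approximate squared L2 Wasserstein distance between two 1-D samples.

    Hypothetical helper for illustration: the integral of the squared
    difference between the two quantile functions is approximated on a
    midpoint grid of n_grid probability levels.
    """
    # Midpoint probability levels strictly inside (0, 1).
    levels = (np.arange(n_grid) + 0.5) / n_grid
    # Empirical quantile functions evaluated on the common grid.
    q_a = np.quantile(np.asarray(sample_a, dtype=float), levels)
    q_b = np.quantile(np.asarray(sample_b, dtype=float), levels)
    # Mean of squared differences approximates the integral over (0, 1).
    return float(np.mean((q_a - q_b) ** 2))


# Two units described by the measurements observed for one attribute.
unit_1 = np.array([0.0, 1.0, 2.0, 3.0])
unit_2 = unit_1 + 2.0  # same shape, shifted location
distance = squared_wasserstein(unit_1, unit_2)
```

Because the second sample here is a pure shift of the first, the quantile functions differ by a constant, and the distance reduces to the squared difference of the means. Distributions that differ in spread or shape contribute additional terms, which is the information lost when only the means are compared.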