Optimal histogram representation of large data sets: Fisher vs piecewise linear approximations

IRPINO, Antonio; ROMANO, Elvira
2007

Abstract

Histogram representation of a large data set is a good way to summarize and visualize data, and is frequently used to optimize query estimation in DBMSs. In this paper, we show the performance and properties of two strategies for the optimal construction of histograms on a single real-valued descriptor, given a prior choice of the number of buckets. The first is based on the Fisher algorithm, while the second is based on a geometrical procedure for interpolating the empirical distribution function by a piecewise linear function. Goodness of fit is computed using the Wasserstein metric between distributions. We compare the performance of the proposed methods against some existing ones on artificial and real datasets.
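
As a rough illustration of the first strategy, the sketch below splits a sorted sample into a fixed number of contiguous buckets by dynamic programming, in the spirit of Fisher's exact optimization. The abstract does not state the criterion the authors optimize, so this sketch assumes the classical one (total within-bucket sum of squared deviations); the function name `fisher_breaks` and its signature are hypothetical and not taken from the paper.

```python
from itertools import accumulate


def fisher_breaks(values, k):
    """Split the sorted values into k contiguous groups.

    Assumption: the criterion is the total within-group sum of squared
    deviations, minimised exactly by dynamic programming.
    """
    x = sorted(values)
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums give the cost of any segment x[i:j] in O(1).
    s1 = [0.0] + list(accumulate(x))
    s2 = [0.0] + list(accumulate(v * v for v in x))

    def cost(i, j):
        # Sum of squared deviations from the mean of x[i:j], with j > i.
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[g][j]: minimal cost of splitting x[:j] into g groups.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for g in range(1, k + 1):
        for j in range(g, n + 1):
            for i in range(g - 1, j):
                c = best[g - 1][i] + cost(i, j)
                if c < best[g][j]:
                    best[g][j], cut[g][j] = c, i

    # Walk back through the stored cut points to recover the groups.
    groups, j = [], n
    for g in range(k, 0, -1):
        i = cut[g][j]
        groups.append(x[i:j])
        j = i
    return groups[::-1]
```

The triple loop costs O(k n^2) time, which is acceptable for a sketch but would need pre-aggregation or a faster recurrence on a genuinely large data set. Bucket boundaries for a histogram can then be read off the first and last element of each returned group.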

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/228465