Exploring the Faithfulness of Synthetic Data by Generative Models

Fiammetta Marulli (Methodology)
2023

Abstract

Evaluating the quality of synthetically generated data is an open problem. It has received far less attention than the question of how such data can be generated, and it remains far from effective standardization. The evaluation of artificial data produced, for example, by GANs or other generative AI models should address several questions, including: i) what reliability, faithfulness, and overall quality mean for artificial data with respect to genuine data; ii) which metrics and tools are appropriate for estimating the quality of synthetic data; iii) what dependencies exist among the quality of the artificial data, the generative models used to produce them, and the kind of native data used to train the generative processes. This work aims to contribute to this challenge with a reconnaissance study that lays the groundwork for a more structured analysis of the problem. More precisely, a particular kind of generative model, the Tabular GAN, is discussed in detail in order to focus on effective methods for analyzing and evaluating the quality of synthetically generated data.
ISBN: 979-8-3503-4534-6

Use this identifier to cite or link to this document: https://hdl.handle.net/11591/519972