SMOTE and ADASYN to Augment Neuroradiomic Data: A Comparative Analysis
Zamir, Bukhtawar; Pirozzi, Maria Agnese; Donisi, Leandro; Esposito, Fabrizio
2025
Abstract
Radiomics enables the extraction of quantitative features from medical images, supporting data-driven approaches to diagnosis and prognosis. However, the field is often constrained by limited sample sizes, owing to the high cost of imaging, strict patient eligibility criteria, and the difficulty of acquiring annotated medical data. This study addressed that limitation by applying two synthetic data augmentation techniques, the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), directly to structured first-order radiomic features derived from T1-weighted brain magnetic resonance imaging. Working with a small dataset of 55 healthy subjects and 18 first-order radiomic features, the study evaluated which method better preserves the original data distribution, and thereby yields higher-quality augmentation, and whether the two techniques agree with each other in terms of first-order radiomic features. The goal was to determine the more effective technique for generating stable and diverse synthetic data within the radiomic feature space. According to the preliminary results, there was no significant divergence between the real data and the augmented data generated by either approach. Both techniques might therefore be used for further analysis in radiomics studies.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.
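The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline, which this record does not describe. It assumes a SMOTE-style interpolation between nearest neighbours and a per-feature two-sample Kolmogorov-Smirnov statistic as the divergence measure; the function names `smote_like` and `ks_statistic`, the neighbour count `k=5`, and the random placeholder matrix are all hypothetical. ADASYN is not sketched because it weights samples by the local density of a second class, which needs class labels not given here.

```python
import numpy as np

def smote_like(X, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a randomly
    chosen real row and one of its k nearest neighbours (SMOTE-style)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances; a row is never its own neighbour.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    base = rng.integers(0, len(X), size=n_new)
    neigh = nn[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[neigh] - X[base])

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return float(np.max(np.abs(cdf_a - cdf_b)))

# Placeholder data with the study's shape (55 subjects x 18 features);
# these are random numbers, NOT the study's radiomic features.
rng = np.random.default_rng(42)
real = rng.normal(size=(55, 18))
synthetic = smote_like(real, n_new=55)

# One divergence value per feature; smaller means closer distributions.
ks = [ks_statistic(real[:, j], synthetic[:, j]) for j in range(real.shape[1])]
```

Because every synthetic row lies on a segment between two real rows, the synthetic values stay within the observed range of each feature, which is one reason interpolation-based augmentation tends to preserve the original distribution.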


