Use and Evaluation of GANs for Synthetic Data Generation in Pharmacogenetics.

Authors: Aeschbacher D; Meisner J; Miletic M; Sariyar M (all: Bern University of Applied Sciences, Switzerland).
Language: English
Source: Studies in health technology and informatics [Stud Health Technol Inform] 2024 Nov 22; Vol. 321, pp. 240-244.
DOI: 10.3233/SHTI241100
Abstract: Pharmacogenetics (PGx) explores the influence of genetic variability on drug efficacy and tolerability. Synthetic Data Generation (SDG) has emerged as a promising alternative to the labor-intensive process of collecting real-world PGx data, which is required for high-quality prediction models. This study investigates the performance of two Generative Adversarial Network (GAN) models, CTGAN and CTAB-GAN+, in generating synthetic PGx data. The benchmarking is based on utility metrics (Hellinger distance and Random Forest accuracy) and ϵ-identifiability. Results demonstrate that synthetic data generated by CTAB-GAN+ can surpass the original dataset in terms of utility. For instance, CTAB-GAN+ achieves higher Random Forest accuracy than the original data, indicating better predictive performance. These improvements suggest that synthetic data not only capture the essential patterns of the original data but also enhance model generalization and prediction capabilities, providing a more robust training ground for machine learning models. Consequently, SDG offers a promising solution to address data scarcity and imbalance in pharmacogenetic research.
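The abstract names Hellinger distance as one of its utility metrics but does not say how it was computed. As a minimal sketch, assuming the metric is applied per categorical column to the empirical category frequencies of the real and synthetic samples, it could look like the following. The function name `hellinger_distance` and the toy genotype values are hypothetical and are not taken from the study.

```python
import math
from collections import Counter


def hellinger_distance(real_values, synthetic_values):
    """Hellinger distance between the empirical category distributions
    of two samples. Returns 0.0 for identical distributions and 1.0 for
    distributions with no shared categories."""
    if not real_values or not synthetic_values:
        raise ValueError("both samples must be non-empty")

    real_counts = Counter(real_values)
    synth_counts = Counter(synthetic_values)
    # Compare over the union of categories so that a category missing
    # from one sample contributes a probability of zero on that side.
    categories = set(real_counts) | set(synth_counts)
    n_real, n_synth = len(real_values), len(synthetic_values)

    squared_diff_sum = 0.0
    for category in categories:
        p = real_counts[category] / n_real
        q = synth_counts[category] / n_synth
        squared_diff_sum += (math.sqrt(p) - math.sqrt(q)) ** 2
    return math.sqrt(squared_diff_sum / 2.0)


# Hypothetical toy column of genotype labels (illustration only).
real = ["*1/*1", "*1/*2", "*1/*1", "*2/*2"]
synthetic = ["*1/*1", "*1/*2", "*1/*2", "*2/*2"]
print(hellinger_distance(real, synthetic))
```

Values close to 0 would indicate that the synthetic column reproduces the real column's category frequencies; for a whole table, one plausible choice is to average the per-column distances, though the abstract does not state which aggregation was used.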
Database: MEDLINE