Precious2GPT: the combination of multiomics pretrained transformer and conditional diffusion for artificial multi-omics multi-species multi-tissue sample generation.

Authors: Sidorenko D; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Pushkov S; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Sakip A; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Leung GHD; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Lok SWY; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Urban A; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Zagirova D; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Veviorskiy A; Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE., Tihonova N; Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE., Kalashnikov A; Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE., Kozlova E; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Naumov V; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Pun FW; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China., Aliper A; Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE., Ren F; Insilico Medicine Shanghai Ltd., Suite 902, Tower C, Changtai Plaza, 2889 Jinke Road, Pudong, Shanghai, 201203, China., Zhavoronkov A; Insilico Medicine Hong Kong Ltd., Unit 310, 3/F, Building 8W Hong Kong Science and Technology Park, Hong Kong SAR, China. alex@insilico.com.; Insilico Medicine AI Limited, Level 6, Unit 08, Block A, IRENA HQ Building, Masdar City, Abu Dhabi, UAE. alex@insilico.com.; Buck Institute for Research on Aging, Novato, CA, 94945, USA. alex@insilico.com.
Language: English
Source: Npj aging [NPJ Aging] 2024 Aug 08; Vol. 10 (1), pp. 37. Date of Electronic Publication: 2024 Aug 08.
DOI: 10.1038/s41514-024-00163-3
Abstract: Synthetic data generation in omics mimics real-world biological data, providing alternatives for training and evaluation of genomic analysis tools, controlling differential expression, and exploring data architecture. We previously developed Precious1GPT, a multimodal transformer trained on transcriptomic and methylation data, along with metadata, for predicting biological age and identifying dual-purpose therapeutic targets potentially implicated in aging and age-associated diseases. In this study, we introduce Precious2GPT, a multimodal architecture that integrates Conditional Diffusion (CDiffusion) and decoder-only Multi-omics Pretrained Transformer (MoPT) models trained on gene expression and DNA methylation data. Precious2GPT excels in synthetic data generation, outperforming Conditional Generative Adversarial Networks (CGANs), CDiffusion, and MoPT. We demonstrate that Precious2GPT is capable of generating representative synthetic data that captures tissue- and age-specific information from real transcriptomics and methylomics data. Notably, Precious2GPT surpasses other models in age prediction accuracy using the generated data, and it can generate data beyond 120 years of age. Furthermore, we showcase the potential of using this model in identifying gene signatures and potential therapeutic targets in a colorectal cancer case study.
(© 2024. The Author(s).)
Database: MEDLINE