A foundational large language model for edible plant genomes.
Author: | Mendoza-Revilla J; InstaDeep, London, UK., Trop E; InstaDeep, London, UK., Gonzalez L; InstaDeep, London, UK., Roller M; InstaDeep, London, UK., Dalla-Torre H; InstaDeep, London, UK., de Almeida BP; InstaDeep, London, UK., Richard G; InstaDeep, London, UK., Caton J; Google DeepMind, London, UK., Lopez Carranza N; InstaDeep, London, UK., Skwark M; InstaDeep, London, UK., Laterre A; InstaDeep, London, UK., Beguir K; InstaDeep, London, UK., Pierrot T; InstaDeep, London, UK. t.pierrot@instadeep.com., Lopez M; InstaDeep, London, UK. m.lopez@instadeep.com. |
---|---|
Language: | English |
Source: | Communications biology [Commun Biol] 2024 Jul 09; Vol. 7 (1), pp. 835. Date of Electronic Publication: 2024 Jul 09. |
DOI: | 10.1038/s42003-024-06465-2 |
Abstract: | Significant progress has been made in the field of plant genomics, as demonstrated by the increased use of high-throughput methodologies that enable the characterization of multiple genome-wide molecular phenotypes. These findings have provided valuable insights into plant traits and their underlying genetic mechanisms, particularly in model plant species. Nonetheless, effectively leveraging them to make accurate predictions represents a critical step in crop genomic improvement. We present AgroNT, a foundational large language model trained on genomes from 48 plant species with a predominant focus on crop species. We show that AgroNT can obtain state-of-the-art predictions for regulatory annotations, promoter/terminator strength, tissue-specific gene expression, and prioritize functional variants. We conduct a large-scale in silico saturation mutagenesis analysis on cassava to evaluate the regulatory impact of over 10 million mutations and provide their predicted effects as a resource for variant characterization. Finally, we propose the use of the diverse datasets compiled here as the Plants Genomic Benchmark (PGB), providing a comprehensive benchmark for deep learning-based methods in plant genomic research. The pre-trained AgroNT model is publicly available on HuggingFace at https://huggingface.co/InstaDeepAI/agro-nucleotide-transformer-1b for future research purposes. (© 2024. The Author(s).) |
Database: | MEDLINE |
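
The abstract mentions an in silico saturation mutagenesis analysis. As a rough illustration of the general idea only, and not the authors' pipeline, the sketch below enumerates every single-nucleotide substitution of a short DNA string. The function name `saturation_mutants` and the example sequence are hypothetical, and scoring each mutant with a model such as AgroNT is deliberately left out.

```python
from typing import Iterator, Tuple

BASES = "ACGT"


def saturation_mutants(seq: str) -> Iterator[Tuple[int, str, str, str]]:
    """Yield (position, ref, alt, mutated_sequence) for each single-base substitution.

    Hypothetical helper for illustration; positions are 0-based.
    """
    seq = seq.upper()
    for pos, ref in enumerate(seq):
        if ref not in BASES:
            continue  # skip ambiguous bases such as N
        for alt in BASES:
            if alt != ref:
                # Replace exactly one base and leave the rest of the sequence unchanged.
                yield pos, ref, alt, seq[:pos] + alt + seq[pos + 1:]


if __name__ == "__main__":
    # Each unambiguous position contributes three alternative alleles.
    for pos, ref, alt, mutated in saturation_mutants("ACGT"):
        print(pos, ref, alt, mutated)
```

In a full analysis, each mutated sequence would presumably be passed to a predictive model and its output compared with the prediction for the reference sequence; that step depends on the model and task and is not shown here.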