Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development
Author: | Alicja Tadych, Raquel Marco-Ferreres, Ignacio E. Schor, Jian Zhou, Victoria Yao, Eileen E. M. Furlong, Chandra L. Theesfeld, Olga G. Troyanskaya |
---|---|
Language: | English |
Year of publication: | 2019 |
Subject: |
Cancer Research
Embryology ved/biology.organism_classification_rank.species Gene Expression Gene prediction QH426-470 Muscle tissue Genome Transcriptome Machine Learning purl.org/becyt/ford/1 [https] 0302 clinical medicine Gene expression Medicine and Health Sciences Musculoskeletal System Genetics (clinical) 0303 health sciences Drosophila Melanogaster Applied Mathematics Simulation and Modeling Muscles Gene Expression Regulation Developmental Eukaryota Genomics Animal Models Bioquímica y Biología Molecular Chromatin Insects Drosophila melanogaster Experimental Organism Systems Embryo Physical Sciences Pharyngeal Muscles Drosophila Anatomy Transcriptome analysis Transcriptome Analysis CIENCIAS NATURALES Y EXACTAS Algorithms Research Article Computer and Information Sciences Arthropoda In silico Muscle Tissue Embryonic Development Computational biology Biology Machine learning algorithms Research and Analysis Methods Ciencias Biológicas 03 medical and health sciences Machine Learning Algorithms Spatio-Temporal Analysis Model Organisms Artificial Intelligence Genetics Animals Computer Simulation Genes Developmental Model organism Gene Prediction purl.org/becyt/ford/1.6 [https] Molecular Biology Gene Ecology Evolution Behavior and Systematics 030304 developmental biology ved/biology Gene Expression Profiling Embryos Organisms Computational Biology Biology and Life Sciences Genome Analysis Invertebrates Biological Tissue Animal Studies 030217 neurology & neurosurgery Mathematics Forecasting Genome-Wide Association Study Developmental Biology |
Source: | CONICET Digital (CONICET) Consejo Nacional de Investigaciones Científicas y Técnicas instacron:CONICET PLoS Genetics PLoS Genetics, Vol 15, Iss 9, p e1008382 (2019) |
DOI: | 10.1371/journal.pgen.1008382 |
Description: | Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
Author summary: When and where a gene is expressed is fundamental information for understanding embryonic development. Current knowledge for such expression patterns is typically far from complete. Even for the long-standing model organism, Drosophila melanogaster, with large-scale in situ projects that have provided invaluable expression information for many genes, 40% of the genes still lack spatio-temporally resolved expression information. Such data is complemented by transcriptome datasets such as microarray and RNA-seq, which have whole-genome coverage and measure expression levels with greater dynamic range, but they typically lack precise spatio-temporal resolution. To bridge this gap, we developed a machine learning approach that combines the spatio-temporal resolution of in situ data with the accurate quantification and whole-genome coverage of genomic experiments, integrating information from 6,378 expression and chromatin profiling data sets. With this new approach, we present a genome-wide resource of spatio-temporal gene expression predictions for over 200 tissue-developmental stages during Drosophila embryogenesis. This resource is experimentally validated to have high-quality predictions, can guide the discovery of new tissue-specific genes, and provides a new tool to perform genome-wide analyses of spatio-temporal specificity. |
Database: | OpenAIRE |
External link: |