Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development

Authors: Alicja Tadych, Raquel Marco-Ferreres, Ignacio E. Schor, Jian Zhou, Victoria Yao, Eileen E. M. Furlong, Chandra L. Theesfeld, Olga G. Troyanskaya
Language: English
Year of publication: 2019
Subjects:
Cancer Research
Embryology
ved/biology.organism_classification_rank.species
Gene Expression
Gene prediction
QH426-470
Muscle tissue
Genome
Transcriptome
Machine Learning
purl.org/becyt/ford/1 [https]
0302 clinical medicine
Gene expression
Medicine and Health Sciences
Musculoskeletal System
Genetics (clinical)
0303 health sciences
Drosophila Melanogaster
Applied Mathematics
Simulation and Modeling
Muscles
Gene Expression Regulation, Developmental
Eukaryota
Genomics
Animal Models
Bioquímica y Biología Molecular
Chromatin
Insects
Drosophila melanogaster
Experimental Organism Systems
Embryo
Physical Sciences
Pharyngeal Muscles
Drosophila
Anatomy
Transcriptome analysis
Transcriptome Analysis
CIENCIAS NATURALES Y EXACTAS
Algorithms
Research Article
Computer and Information Sciences
Arthropoda
In silico
Muscle Tissue
Embryonic Development
Computational biology
Biology
Machine learning algorithms
Research and Analysis Methods
Ciencias Biológicas
03 medical and health sciences
Machine Learning Algorithms
Spatio-Temporal Analysis
Model Organisms
Artificial Intelligence
Genetics
Animals
Computer Simulation
Genes, Developmental
Model organism
Gene Prediction
purl.org/becyt/ford/1.6 [https]
Molecular Biology
Gene
Ecology, Evolution, Behavior and Systematics
030304 developmental biology
ved/biology
Gene Expression Profiling
Embryos
Organisms
Computational Biology
Biology and Life Sciences
Genome Analysis
Invertebrates
Biological Tissue
Animal Studies
030217 neurology & neurosurgery
Mathematics
Forecasting
Genome-Wide Association Study
Developmental Biology
Source: CONICET Digital (CONICET)
Consejo Nacional de Investigaciones Científicas y Técnicas
instacron:CONICET
PLoS Genetics
PLoS Genetics, Vol 15, Iss 9, p e1008382 (2019)
DOI: 10.1371/journal.pgen.1008382
Description: Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
Author summary: When and where a gene is expressed is fundamental information for understanding embryonic development. Current knowledge of such expression patterns is typically far from complete. Even for the long-standing model organism Drosophila melanogaster, for which large-scale in situ projects have provided invaluable expression information for many genes, 40% of the genes still lack spatio-temporally resolved expression information. Such data are complemented by transcriptome datasets such as microarray and RNA-seq, which have whole-genome coverage and measure expression levels with greater dynamic range, but typically lack precise spatio-temporal resolution. To bridge this gap, we developed a machine learning approach that combines the spatio-temporal resolution of in situ data with the accurate quantification and whole-genome coverage of genomic experiments, integrating information from 6,378 expression and chromatin profiling data sets. With this new approach, we present a genome-wide resource of spatio-temporal gene expression predictions for over 200 tissue-developmental stages during Drosophila embryogenesis. This resource is experimentally validated to have high-quality predictions, can guide the discovery of new tissue-specific genes, and provides a new tool to perform genome-wide analyses of spatio-temporal specificity.
Database: OpenAIRE