An improved catalogue of putative synaptic genes defined exclusively by temporal transcription profiles through an ensemble machine learning approach

Authors: Flavio Pazos Obregón, Gustavo Guerberoff, Rafael Cantera, Martin Palazzo, Pablo Soto, Patricio Yankilevich
Year of publication: 2019
Subject:
Source: BMC Genomics, Vol 20, Iss 1, Pp 1-8 (2019)
CONICET Digital (CONICET)
Consejo Nacional de Investigaciones Científicas y Técnicas
Description: Background: Assembly and function of neuronal synapses require the coordinated expression of an as yet undetermined set of genes. Previously, we trained an ensemble machine learning model to assign a probability of having synaptic function to every protein-coding gene in Drosophila melanogaster. This approach resulted in the publication of a catalogue of 893 genes, which we postulated to be highly enriched in genes with a still undocumented synaptic function. Since then, the scientific community has experimentally identified 79 new synaptic genes. Here we use these new empirical data to evaluate our original prediction. We also implement a series of changes to the training scheme of our model and, using the new data, demonstrate that these changes improve its predictive power. Finally, we add the new synaptic genes to the training set and train a new model, obtaining a new, enhanced catalogue of putative synaptic genes.
Results: The retrospective analysis demonstrates that our original catalogue was significantly enriched in new synaptic genes. When the changes to the training scheme were implemented using the original training set, we obtained even higher enrichment. Finally, applying the new training scheme with a training set that includes the 79 new synaptic genes resulted in an enhanced catalogue of putative synaptic genes. Here we present this new catalogue and announce that a regularly updated version will be available online at: Http://synapticgenes.bnd.edu.uy
Conclusions: We show that training an ensemble of machine learning classifiers solely with the whole-body temporal transcription profiles of known synaptic genes resulted in a catalogue with a significant enrichment in undiscovered synaptic genes. Using new empirical data provided by the scientific community, we validated our original approach, improved our model and obtained an arguably more precise prediction. This approach reduces the number of genes to be tested through hypothesis-driven experimentation and will facilitate our understanding of neuronal function.
Availability: Http://synapticgenes.bnd.edu.uy
Affiliation: Pazos Obregón, Flavio. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Affiliation: Palazzo, Martin. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
Affiliation: Soto, Pablo. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Affiliation: Guerberoff, Gustavo. Universidad de la República; Uruguay
Affiliation: Yankilevich, Patricio. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Parque Centenario. Instituto de Investigación en Biomedicina de Buenos Aires - Instituto Partner de la Sociedad Max Planck; Argentina
Affiliation: Cantera, Rafael. Instituto de Investigaciones Biológicas "Clemente Estable"; Uruguay
Database: OpenAIRE