Scalable Phylogenetic Profiling using MinHash Uncovers Likely Eukaryotic Sexual Reproduction Genes
Author: | Laurent Kilchoer, David Moi, Pablo S. Aguilar, Christophe Dessimoz |
---|---|
Contributors: | Ouzounis, Christos A. (ed.) |
Language: | English |
Publication year: | 2019 |
Subject: |
0301 basic medicine; Proteomics; Sexual Reproduction; Forests; Genome; Biochemistry; 0302 clinical medicine; Three-domain system; Fungal Evolution; Cluster Analysis; Biology (General); Kinetochores; Phylogeny; Data Management; Phylogenetic tree; Ecology; Reproduction; Eukaryota; Phylogenetic Analysis; Genomics; Terrestrial Environments; Phylogenetics; Computational Theory and Mathematics; Modeling and Simulation; Phylogenetic profiling; Protein Interaction Networks; Network Analysis; Research Article; Computer and Information Sciences; Protein family; QH301-705.5; Modes of Reproduction; Tree of life; Computational biology; MinHash; Mycology; Biology; Ecosystems; 03 medical and health sciences; Cellular and Molecular Neuroscience; Genetics; Evolutionary Systematics; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Computational Biology/methods; Eukaryota/classification; Eukaryota/genetics; Kinetochores/metabolism; Models, Statistical; Reproduction/genetics; Taxonomy; Evolutionary Biology; Ecology and Environmental Sciences; Organisms; Biology and Life Sciences; Computational Biology; 030104 developmental biology; 030217 neurology & neurosurgery; Developmental Biology |
Source: | PLoS Computational Biology, vol. 16, no. 7, e1007553 (2020) |
DOI: | 10.1101/852491 |
Description: | Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families which tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been used more widely with prokaryotes than with eukaryotes, because the method is thought to require many diverse genomes. There are now many eukaryotic genomes available, but these are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach entitled HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and query for interactors of the kinetochore complex as well as conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful for predicting biological pathways among the hundreds of thousands of eukaryotic species that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
Author summary: Genes that are involved in the same biological process tend to co-evolve. This property is exploited by the technique of phylogenetic profiling, which identifies co-evolving (and therefore likely functionally related) genes through patterns of correlated gene retention and loss in evolution and across species. However, conventional methods for computing and clustering these correlated genes do not scale with increasing numbers of genomes. HogProf is a novel phylogenetic profiling tool built on probabilistic data structures. It allows the user to construct searchable databases containing the evolutionary history of hundreds of thousands of protein families. Such fast detection of coevolution takes advantage of the rapidly increasing amount of publicly available genomic data, and can uncover unknown biological networks and guide in vivo research and experimentation. We have applied our tool to describe the biological networks underpinning sexual reproduction in eukaryotes. |
Database: | OpenAIRE |
External link: |
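
The description above states that HogProf relies on MinHash and locality-sensitive hashing to retrieve similar profiles without comparing every pair. As a rough illustration of the underlying idea only, and not of HogProf's actual code or API, the following sketch estimates the Jaccard similarity of two toy phylogenetic profiles from fixed-length MinHash signatures. The profile encoding (strings such as `presence:taxon12`), the function names, and the choice of 256 hash functions are all assumptions made for this example.

```python
import hashlib


def minhash_signature(items, num_perm=256):
    """Return a MinHash signature for a non-empty set of strings.

    For each of `num_perm` salted hash functions, keep the minimum hash
    value observed over all items in the set.
    """
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "little",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching signature slots."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(set_a, set_b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(set_a & set_b) / len(set_a | set_b)


# Hypothetical toy profiles: each element records the presence of a protein
# family in one (made-up) taxon. The two families share 60 of 100 taxa.
profile_a = {f"presence:taxon{i}" for i in range(0, 80)}
profile_b = {f"presence:taxon{i}" for i in range(20, 100)}

sig_a = minhash_signature(profile_a)
sig_b = minhash_signature(profile_b)

print("exact Jaccard:    ", exact_jaccard(profile_a, profile_b))
print("MinHash estimate: ", estimate_jaccard(sig_a, sig_b))
```

Because each signature has a fixed length regardless of how many taxa a profile covers, comparing two families costs the same however many genomes are included. A locality-sensitive hashing index would go one step further and bucket signatures so that only likely matches are compared at all; that step is not shown here.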