Scalable Phylogenetic Profiling using MinHash Uncovers Likely Eukaryotic Sexual Reproduction Genes

Authors: Laurent Kilchoer, David Moi, Pablo S. Aguilar, Christophe Dessimoz
Contributors: Ouzounis, Christos A. (ed.)
Language: English
Year of publication: 2019
Subject:
0301 basic medicine
Proteomics
Sexual Reproduction
Forests
Genome
Biochemistry
0302 clinical medicine
Three-domain system
Fungal Evolution
Cluster Analysis
Biology (General)
Kinetochores
Phylogeny
Data Management
Phylogenetic tree
Ecology
Reproduction
Eukaryota
Phylogenetic Analysis
Genomics
Terrestrial Environments
Phylogenetics
Computational Theory and Mathematics
Modeling and Simulation
Phylogenetic profiling
Protein Interaction Networks
Network Analysis
Research Article
Computer and Information Sciences
Protein family
QH301-705.5
Modes of Reproduction
Tree of life
Computational biology
MinHash
Mycology
Biology
Ecosystems
03 medical and health sciences
Cellular and Molecular Neuroscience
Genetics
Evolutionary Systematics
Molecular Biology
Ecology, Evolution, Behavior and Systematics
Computational Biology/methods
Eukaryota/classification
Eukaryota/genetics
Kinetochores/metabolism
Models, Statistical
Reproduction/genetics
Taxonomy
Evolutionary Biology
Ecology and Environmental Sciences
Organisms
Biology and Life Sciences
Computational Biology
030104 developmental biology
030217 neurology & neurosurgery
Developmental Biology
Source: PLoS Computational Biology
PLoS Computational Biology, Vol 16, Iss 7, p e1007553 (2020)
DOI: 10.1101/852491
Description: Phylogenetic profiling is a computational method to predict genes involved in the same biological process by identifying protein families that tend to be jointly lost or retained across the tree of life. Phylogenetic profiling has customarily been used more widely with prokaryotes than with eukaryotes, because the method is thought to require many diverse genomes. Many eukaryotic genomes are now available, but they are considerably larger, and typical phylogenetic profiling methods require at least quadratic time as a function of the number of genes. We introduce a fast, scalable phylogenetic profiling approach called HogProf, which leverages hierarchical orthologous groups for the construction of large profiles and locality-sensitive hashing for efficient retrieval of similar profiles. We show that the approach outperforms Enhanced Phylogenetic Tree, a phylogeny-based method, and use the tool to reconstruct networks and to query for interactors of the kinetochore complex as well as of conserved proteins involved in sexual reproduction: Hap2, Spo11 and Gex1. HogProf enables large-scale phylogenetic profiling across the three domains of life, and will be useful to predict biological pathways among the hundreds of thousands of eukaryotic genomes that will become available in the coming few years. HogProf is available at https://github.com/DessimozLab/HogProf.
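The retrieval step described above (MinHash sketches combined with locality-sensitive hashing) can be illustrated with a small, self-contained Python sketch. It is not HogProf's implementation: it assumes a profile is simply a set of taxon labels in which a family is retained, and the names `minhash`, `estimate_jaccard` and `BandedLSH`, the 128 hash functions and the 32-band, 4-row layout are hypothetical illustrative choices. MinHash turns each profile into a fixed-length signature whose slot-agreement rate estimates the Jaccard similarity of two profiles; banding the signatures lets a query collect candidate profiles from hash buckets instead of comparing against every stored profile.

```python
import hashlib
import random
from collections import defaultdict

# Mersenne prime 2**61 - 1, the modulus of the universal hash family used below.
PRIME = (1 << 61) - 1


def base_hash(token: str) -> int:
    """Stable 64-bit integer for a profile element (built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha1(token.encode("utf-8")).digest()[:8], "big")


def make_hash_params(num_perm: int, seed: int = 1) -> list[tuple[int, int]]:
    """Draw (a, b) pairs for h(x) = (a*x + b) mod PRIME, one pair per simulated permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_perm)]


def minhash(profile: set[str], params: list[tuple[int, int]]) -> list[int]:
    """MinHash signature: for each hash function, the minimum hash over the set's elements."""
    if not profile:
        raise ValueError("cannot sketch an empty profile")
    values = [base_hash(t) for t in profile]
    return [min((a * v + b) % PRIME for v in values) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of agreeing signature slots, an estimate of the Jaccard index of the sets."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


class BandedLSH:
    """Banding LSH: profiles agreeing on every row of at least one band become candidates."""

    def __init__(self, bands: int, rows: int):
        self.bands, self.rows = bands, rows
        self.tables = [defaultdict(set) for _ in range(bands)]

    def _band_keys(self, sig: list[int]):
        if len(sig) != self.bands * self.rows:
            raise ValueError("signature length must equal bands * rows")
        for i in range(self.bands):
            yield i, tuple(sig[i * self.rows:(i + 1) * self.rows])

    def insert(self, key: str, sig: list[int]) -> None:
        for i, band in self._band_keys(sig):
            self.tables[i][band].add(key)

    def query(self, sig: list[int]) -> set[str]:
        candidates: set[str] = set()
        for i, band in self._band_keys(sig):
            candidates |= self.tables[i].get(band, set())
        return candidates


# Usage with hypothetical profiles (sets of taxa in which a protein family is retained).
params = make_hash_params(128)
profiles = {
    "famA": {"taxon1", "taxon2", "taxon3", "taxon4"},
    "famB": {"taxon1", "taxon2", "taxon3", "taxon4"},  # same pattern as famA
    "famC": {"taxon1", "taxon2", "taxon3", "taxon5"},  # true Jaccard with famA = 3/5
    "famD": {"taxon7", "taxon8", "taxon9"},            # disjoint from famA
}
sigs = {name: minhash(taxa, params) for name, taxa in profiles.items()}
index = BandedLSH(bands=32, rows=4)
for name, sig in sigs.items():
    index.insert(name, sig)

hits = index.query(sigs["famA"])
print(sorted(hits))
print(estimate_jaccard(sigs["famA"], sigs["famC"]))
```

With b bands of r rows, two profiles with Jaccard similarity s share at least one bucket with probability 1 - (1 - s^r)^b, so the choice of b and r trades recall against the number of candidates that must be re-scored.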
Author summary: Genes that are involved in the same biological process tend to co-evolve. This property is exploited by the technique of phylogenetic profiling, which identifies co-evolving (and therefore likely functionally related) genes through patterns of correlated gene retention and loss in evolution and across species. However, conventional methods for computing and clustering these correlated genes do not scale with increasing numbers of genomes. HogProf is a novel phylogenetic profiling tool built on probabilistic data structures. It allows the user to construct searchable databases containing the evolutionary history of hundreds of thousands of protein families. Such fast detection of coevolution takes advantage of the rapidly increasing amount of publicly available genomic data, and can uncover unknown biological networks and guide in vivo research and experimentation. We have applied our tool to describe the biological networks underpinning sexual reproduction in eukaryotes.
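The idea of correlated retention and loss can be made concrete with an exact, exhaustive comparison. The sketch below is a hypothetical illustration, not code from HogProf: it treats each profile as the set of species in which a family is present, scores every pair of families with the Jaccard index, and therefore performs n(n-1)/2 comparisons for n families, the kind of quadratic cost that probabilistic data structures are meant to avoid. The functions `jaccard` and `all_pairs` and all family and species labels are placeholders.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index |A & B| / |A | B| of two presence profiles."""
    if not a and not b:
        return 1.0  # convention chosen here for two empty profiles (an assumption)
    return len(a & b) / len(a | b)


def all_pairs(profiles: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Score every pair of families: n*(n-1)/2 comparisons, i.e. quadratic in n."""
    return {
        (p, q): jaccard(profiles[p], profiles[q])
        for p, q in combinations(sorted(profiles), 2)
    }


# Hypothetical presence/absence profiles: the species in which each family is retained.
profiles = {
    "famX": {"sp1", "sp2", "sp3", "sp5"},
    "famY": {"sp1", "sp2", "sp3"},  # lost in sp5, otherwise co-retained with famX
    "famZ": {"sp4", "sp6"},         # present only where famX and famY are absent
}
scores = all_pairs(profiles)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Here famX and famY share three of the four species in which either occurs (Jaccard 0.75), while famZ shares none with either. Richer profiles, for example gain, loss and duplication events inferred from hierarchical orthologous groups, could in principle be encoded as sets in the same way.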
Database: OpenAIRE