tRNA functional signatures classify plastids as late-branching cyanobacteria.
Authors: | Lawrence TJ; Biosciences Division, Oak Ridge National Laboratory, P.O. Box 2008, Oak Ridge, TN, 37831, USA. lawrencetj@ornl.gov.; Quantitative and Systems Biology Program, University of California, Merced, 5200 North Lake Rd., Merced, CA, 95343, USA. lawrencetj@ornl.gov., Amrine KC; Quantitative and Systems Biology Program, University of California, Merced, 5200 North Lake Rd., Merced, CA, 95343, USA.; Insight Data Science, 500 3rd St., San Francisco, CA, 94107, USA., Swingley WD; Department of Biological Sciences, Northern Illinois University, 1425 Lincoln Hwy., DeKalb, IL, 60115, USA., Ardell DH; Quantitative and Systems Biology Program, University of California, Merced, 5200 North Lake Rd., Merced, CA, 95343, USA.; Molecular and Cell Biology, School of Natural Sciences, University of California, Merced, 5200 North Lake Rd., Merced, CA, 95343, USA. |
---|---|
Language: | English |
Source: | BMC evolutionary biology [BMC Evol Biol] 2019 Dec 09; Vol. 19 (1), pp. 224. Date of Electronic Publication: 2019 Dec 09. |
DOI: | 10.1186/s12862-019-1552-7 |
Abstract: | Background: Eukaryotes acquired the trait of oxygenic photosynthesis through endosymbiosis of the cyanobacterial progenitor of plastid organelles. Despite recent advances in the phylogenomics of Cyanobacteria, the phylogenetic root of plastids remains controversial. Although a single origin of plastids by endosymbiosis is broadly supported, recent phylogenomic studies are contradictory on whether plastids branch early or late within Cyanobacteria. One underlying cause may be poor fit of evolutionary models to complex phylogenomic data. Results: Using Posterior Predictive Analysis, we show that recently applied evolutionary models poorly fit three phylogenomic datasets curated from cyanobacteria and plastid genomes because of heterogeneities both in substitution processes across sites and in compositions across lineages. To circumvent these sources of bias, we developed CYANO-MLP, a machine learning algorithm that consistently and accurately phylogenetically classifies ("phyloclassifies") cyanobacterial genomes to their clade of origin based on bioinformatically predicted function-informative features in tRNA gene complements. Classification of cyanobacterial genomes with CYANO-MLP is accurate and robust to deletion of clades, unbalanced sampling, and compositional heterogeneity in input tRNA data. CYANO-MLP consistently classifies plastid genomes into a late-branching cyanobacterial sub-clade containing single-cell, starch-producing, nitrogen-fixing ecotypes, consistent with metabolic and gene transfer data. Conclusions: Phylogenomic data of cyanobacteria and plastids exhibit both site-process heterogeneities and compositional heterogeneities across lineages. These aspects of the data require careful modeling to avoid bias in phylogenomic estimation. Furthermore, we show that amino acid recoding strategies may be insufficient to mitigate bias from compositional heterogeneities. However, the combination of our novel tRNA-specific strategy with machine learning in CYANO-MLP appears robust to these sources of bias with high accuracy in phyloclassification of cyanobacterial genomes. CYANO-MLP consistently classifies plastids as late-branching Cyanobacteria, consistent with independent evidence from signature-based approaches and some previous phylogenetic studies. |
Database: | MEDLINE |