NGS-Indel Coder: A pipeline to code indel characters in phylogenomic data with an example of its application in milkweeds (Asclepias).

Authors: Boutte J; Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA., Fishbein M; Department of Plant Biology, Ecology and Evolution, Oklahoma State University, Stillwater, OK, USA., Liston A; Department of Botany and Plant Pathology, Oregon State University, Corvallis, OR, USA., Straub SCK; Department of Biology, Hobart and William Smith Colleges, Geneva, NY, USA. Electronic address: STRAUB@hws.edu.
Language: English
Source: Molecular phylogenetics and evolution [Mol Phylogenet Evol] 2019 Oct; Vol. 139, pp. 106534. Date of Electronic Publication: 2019 Jun 15.
DOI: 10.1016/j.ympev.2019.106534
Abstract: Targeted genome sequencing approaches allow characterization of evolutionary relationships using a considerable number of nuclear genes and informative characters. However, most phylogenomic analyses only utilize single nucleotide polymorphisms (SNPs). Studies at the species level, especially in groups that have recently radiated, often recover low amounts of phylogenetically informative variation in coding regions, and require non-coding sequences, which are richer in indels, to resolve gene trees. Here, NGS-Indel Coder, a pipeline to detect and omit false positive indels inferred from assemblies of short read sequence data, was developed to resolve the relationships among and within major clades of the American milkweeds (Asclepias), which are the result of a rapid and recent evolutionary radiation, and whose phylogeny has been difficult to resolve. This pipeline was applied to a Hyb-Seq data set of 768 loci including targeted exons and flanking intron regions from 33 milkweed species. Robust species tree inference was improved by excluding small alignment partitions (<100 bp) that increased gene tree ambiguity and incongruence. To further investigate the robustness of indel coding, data sets that included small and large indels were explored, and species trees derived from concatenated loci versus coalescent methods based on gene trees were compared. The phylogeny of Asclepias obtained using nuclear data was well resolved, and phylogenetic information from indels improved resolution of specific nodes. The Temperate North American, Mexican Highland, and Incarnatae clades were well supported as monophyletic. Asclepias coulteri, which has been considered part of the Sonoran Desert clade based on plastome analyses, was placed as sister to all the other milkweed species studied here, rather than as a member of that clade. Two groups within the Temperate North American and Mexican clades were not resolved, and the inferred relationships strongly conflicted when comparing results based on data sets that did or did not include indel characters. This new pipeline represents a step forward in making maximal use of the information content in phylogenomic data sets.
(Copyright © 2019 Elsevier Inc. All rights reserved.)
Database: MEDLINE