Topiary: Pruning the manual labor from ancestral sequence reconstruction.

Authors: Orlandi KN (Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA; Department of Biology, University of Oregon, Eugene, Oregon, USA), Phillips SR (Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA; Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA), Sailer ZR (Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA; Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA), Harman JL (Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA; Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA), Harms MJ (Institute of Molecular Biology, University of Oregon, Eugene, Oregon, USA; Department of Chemistry and Biochemistry, University of Oregon, Eugene, Oregon, USA)
Language: English
Source: Protein science : a publication of the Protein Society [Protein Sci] 2023 Feb; Vol. 32 (2), pp. e4551.
DOI: 10.1002/pro.4551
Abstract: Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships among protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: (1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; (2) Does taxonomically informed sequence quality control and redundancy reduction; (3) Constructs a multiple sequence alignment; (4) Generates a maximum-likelihood gene tree; (5) Reconciles the gene tree to the species tree; (6) Reconstructs ancestral amino acid sequences; and (7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
(© 2022 The Protein Society.)
Database: MEDLINE
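The abstract describes a workflow that starts from a small seed spreadsheet and then runs seven ordered stages. The Python sketch below only illustrates that shape. It is not topiary's API. The column names in `SEED_COLUMNS` and the helpers `write_seed_csv` and `run_pipeline` are hypothetical, and the topiary documentation should be checked for the exact seed format it expects. The driver does not call BLAST, Muscle5, RAxML-NG, GeneRax, or PastML. It just records the order in which the stages named in the abstract would run.

```python
import csv
import io

# Hypothetical seed-spreadsheet columns (an assumption, not topiary's documented format).
SEED_COLUMNS = ["name", "species", "sequence"]

# The seven stages listed in the abstract, in the order they run.
PIPELINE_STAGES = [
    "infer taxonomic scope and find sequences by BLAST",
    "taxonomically informed quality control and redundancy reduction",
    "multiple sequence alignment",
    "maximum-likelihood gene tree",
    "reconcile gene tree to species tree",
    "reconstruct ancestral amino acid sequences",
    "determine branch supports",
]


def write_seed_csv(rows):
    """Serialize seed sequences (a list of dicts) to CSV text using the assumed columns."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=SEED_COLUMNS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def run_pipeline(seed_csv, stages=PIPELINE_STAGES):
    """Toy driver: count the seed rows, then log each stage in order.

    A real run would invoke external phylogenetics tools and query online
    databases at each step; this sketch only tracks the stage order.
    """
    n_seeds = len(list(csv.DictReader(io.StringIO(seed_csv))))
    log = [f"loaded {n_seeds} seed sequences"]
    for i, stage in enumerate(stages, start=1):
        log.append(f"step {i}: {stage}")
    return log
```

A usage example with two toy seed rows (made-up placeholder sequences, not real data): `run_pipeline(write_seed_csv([{"name": "protA", "species": "Homo sapiens", "sequence": "MKTAYIAK"}, {"name": "protB", "species": "Mus musculus", "sequence": "MKTAYLAK"}]))` returns a log whose first entry reports two seed sequences, followed by one entry per stage.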