Topiary: Pruning the manual labor from ancestral sequence reconstruction
Authors: | Kona N. Orlandi, Sophia R. Phillips, Zachary R. Sailer, Joseph L. Harman, Michael J. Harms |
---|---|
Year of publication: | 2023 |
Subject: | |
Source: | Protein Science. 32 |
ISSN: | 1469-896X 0961-8368 |
DOI: | 10.1002/pro.4551 |
Description: | Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships between protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: 1) Infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; 2) Does taxonomically informed sequence quality control and redundancy reduction; 3) Constructs a multiple sequence alignment; 4) Generates a maximum-likelihood gene tree; 5) Reconciles the gene tree to the species tree; 6) Reconstructs ancestral amino acid sequences; and 7) Determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary. |
Database: | OpenAIRE |
External link: |
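
The workflow in the description (a small seed spreadsheet in, seven ordered stages out) can be sketched in Python as follows. This is an illustration only, not topiary's API. The column names in `SEED_COLUMNS`, the helper functions `write_seed_csv` and `describe_pipeline`, and the example row are hypothetical and chosen for this sketch; only the order and wording of the stages are taken from the description.

```python
import csv
import io

# The seven stages, in the order the description lists them.
PIPELINE_STAGES = [
    "infer taxonomic scope and find relevant sequences by BLAST",
    "taxonomically informed quality control and redundancy reduction",
    "construct a multiple sequence alignment",
    "generate a maximum-likelihood gene tree",
    "reconcile the gene tree to the species tree",
    "reconstruct ancestral amino acid sequences",
    "determine branch supports",
]

# ASSUMPTION: hypothetical column names for the seed spreadsheet.
# The description only says users supply "a handful of sequences".
SEED_COLUMNS = ["name", "species", "sequence"]


def write_seed_csv(rows):
    """Return CSV text for a list of seed-sequence dicts."""
    for row in rows:
        missing = [col for col in SEED_COLUMNS if col not in row]
        if missing:
            raise ValueError(f"seed row is missing columns: {missing}")
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=SEED_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def describe_pipeline():
    """Return the stages as numbered strings, mirroring the description."""
    return [f"{i}) {stage}" for i, stage in enumerate(PIPELINE_STAGES, start=1)]


if __name__ == "__main__":
    # Placeholder row: the sequence is a made-up string, not real data.
    print(write_seed_csv(
        [{"name": "exampleA", "species": "Homo sapiens", "sequence": "MAAA"}]
    ))
    for line in describe_pipeline():
        print(line)
```

The stages are held in a plain ordered list because each step in the description consumes the output of the one before it. The actual input format and commands are documented in the topiary repository linked in the description.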