Topiary: Pruning the manual labor from ancestral sequence reconstruction

Authors: Kona N. Orlandi, Sophia R. Phillips, Zachary R. Sailer, Joseph L. Harman, Michael J. Harms
Publication year: 2023
Subject:
Source: Protein Science. 32
ISSN: 1469-896X, 0961-8368
DOI: 10.1002/pro.4551
Description: Ancestral sequence reconstruction (ASR) is a powerful tool to study the evolution of proteins and thus gain deep insight into the relationships between protein sequence, structure, and function. A major barrier to its broad use is the complexity of the task: it requires multiple software packages, complex file manipulations, and expert phylogenetic knowledge. Here we introduce topiary, a software pipeline that aims to overcome this barrier. To use topiary, users prepare a spreadsheet with a handful of sequences. Topiary then: 1) infers the taxonomic scope for the ASR study and finds relevant sequences by BLAST; 2) performs taxonomically informed sequence quality control and redundancy reduction; 3) constructs a multiple sequence alignment; 4) generates a maximum-likelihood gene tree; 5) reconciles the gene tree to the species tree; 6) reconstructs ancestral amino acid sequences; and 7) determines branch supports. The pipeline returns annotated evolutionary trees, spreadsheets with sequences, and graphical summaries of ancestor quality. This is achieved by integrating modern phylogenetics software (Muscle5, RAxML-NG, GeneRax, and PastML) with online databases (NCBI and the Open Tree of Life). In this paper, we introduce non-expert readers to the steps required for ASR, describe the specific design choices made in topiary, provide a detailed protocol for users, and then validate the pipeline using datasets from a broad collection of protein families. Topiary is freely available for download: https://github.com/harmslab/topiary.
Database: OpenAIRE
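
Illustrative note: the description lists "reconstructs ancestral amino acid sequences" as one pipeline step. To give non-expert readers a feel for what that step means, the sketch below infers the candidate residues at the root of a tiny tree using Fitch parsimony. This is a deliberately simpler, swapped-in technique, not the maximum-likelihood approach the record attributes to topiary, and it does not use topiary's API. The tree shape, taxon names, sequences, and the function names `fitch_sets` and `reconstruct_root` are all hypothetical.

```python
from typing import Dict, List, Set, Tuple, Union

# A tree is either a leaf name (str) or a pair of subtrees (left, right).
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_sets(tree: Tree, leaf_states: Dict[str, str]) -> Set[str]:
    """Bottom-up Fitch pass: candidate residues at the root of `tree` for one column."""
    if isinstance(tree, str):
        # A leaf contributes exactly its observed residue.
        return {leaf_states[tree]}
    left = fitch_sets(tree[0], leaf_states)
    right = fitch_sets(tree[1], leaf_states)
    shared = left & right
    # Keep the intersection if the children agree on anything, else the union.
    return shared if shared else left | right


def reconstruct_root(tree: Tree, alignment: Dict[str, str]) -> List[Set[str]]:
    """Apply the Fitch pass independently to every alignment column."""
    length = len(next(iter(alignment.values())))
    return [
        fitch_sets(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(length)
    ]


# Hypothetical toy data: four aligned three-residue sequences (no gaps).
toy_tree: Tree = (("human", "mouse"), ("chicken", "frog"))
toy_alignment = {
    "human": "MKV",
    "mouse": "MKI",
    "chicken": "MRV",
    "frog": "MRV",
}

root_candidates = reconstruct_root(toy_tree, toy_alignment)
print(root_candidates)
```

In this toy case the first and third positions resolve to a single residue, while the second position remains ambiguous between K and R. Ambiguity of this kind is what a summary of ancestor quality would need to report, although a likelihood-based method expresses it as posterior probabilities rather than as a set of equally parsimonious residues.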