Link Your Sites (LYS) Scripts: Automated Search of Protein Structures and Mapping of Sites Under Positive Selection Detected by PAML

Authors: Rute R. da Fonseca, Lys Sanz Moreta
Year of publication: 2020
Subject:
Source: Evolutionary Biology. 47:240-245
ISSN: 1934-2845, 0071-3260
DOI: 10.1007/s11692-020-09507-9
Description: The visualization of the molecular context of an amino acid mutation in a protein structure is crucial for assessing its functional impact and understanding its evolutionary implications. Searches for fast-evolving amino acid positions using codon substitution models, such as those implemented in PAML [1], are currently run on almost complete proteomes, generating large numbers of candidate proteins that each require individual structural analysis. Here we present two Python wrapper scripts as the package Link Your Sites (LYS). The first script i) mines the RCSB database [10] with the BLAST alignment tool to find the best-matching homologous sequences, ii) fetches their domain positions using PROSITE [3,8,9], iii) parses the PAML output, extracting the positions of fast-evolving sites and transforming them into the coordinate system of the protein structure, and iv) outputs one file per gene listing the correspondence between its positions and those of its homologous sequence. The second script uses the output of the first to produce a graphical representation of the protein structure. LYS can therefore generate publication figures that highlight positively selected sites mapped onto regions of known functional relevance, and/or reduce the number of targets for further analysis by listing those proteins for which structural information can be retrieved.

Motivation: LYS automates the search for protein structures with which to assess the functional impact of sites found to be under positive selection by codeml, implemented in PAML [1], and builds publication-quality figures highlighting the sites on a protein structure model that fall within and outside functional domains. This reduces the workload of selecting proteins for which the functional impact of mutations can be assessed using a protein structure, which is especially relevant when analyzing almost complete proteomes, as is the case in large comparative genomic studies.

Software: The LYS scripts are run from the command line. They automatically search for homologous proteins in the RCSB database [10], determine the locations of functional domains, relate them to the positions identified by the M8 model [1], and output a data frame that PyMOL [7] can use as input to generate a visualization of the results.

Availability: The LYS scripts are easy to install and use, and are available at https://github.com/LysSanzMoreta/LYSAutomaticSearch
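The coordinate transformation in step iii, followed by a PyMOL highlight of the mapped sites, can be illustrated with a short Python sketch. This is not the LYS implementation. It assumes the positively selected positions have already been extracted from the codeml output as 1-based positions in the query protein, and that the best BLAST hit is available as a pair of gapped alignment strings with their 1-based start positions. The function names, the selection name `psg_sites`, and chain `A` are hypothetical. Subject positions here are indices into the hit's sequence, which often differ from the residue numbers (`resi`) used in PDB files, so a real pipeline would also need to account for that offset.

```python
from typing import Dict, Iterable, List


def map_query_to_subject(query_aln: str, subject_aln: str,
                         query_start: int, subject_start: int) -> Dict[int, int]:
    """Hypothetical helper: map 1-based query positions to 1-based subject
    positions by walking the columns of one pairwise alignment (e.g. a BLAST HSP).
    Query residues aligned to a gap in the subject are left unmapped."""
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned strings must have equal length")
    mapping: Dict[int, int] = {}
    q_pos, s_pos = query_start - 1, subject_start - 1
    for q_char, s_char in zip(query_aln, subject_aln):
        if q_char != "-":
            q_pos += 1
        if s_char != "-":
            s_pos += 1
        # Only columns where both sequences have a residue give a correspondence.
        if q_char != "-" and s_char != "-":
            mapping[q_pos] = s_pos
    return mapping


def pymol_commands(sites: Iterable[int], mapping: Dict[int, int],
                   chain: str = "A", name: str = "psg_sites") -> List[str]:
    """Hypothetical helper: build PyMOL command strings that select, show and
    colour the mapped sites. Sites without a structural counterpart are skipped.
    Assumes mapped positions equal the structure's resi numbers."""
    residues = sorted(mapping[s] for s in set(sites) if s in mapping)
    if not residues:
        return []
    resi = "+".join(str(r) for r in residues)
    return [
        f"select {name}, chain {chain} and resi {resi}",
        f"show sticks, {name}",
        f"color red, {name}",
    ]


if __name__ == "__main__":
    # Toy alignment: the subject (structure) lacks the query's 3rd residue,
    # and the query lacks one subject residue after its 4th residue.
    query_aln = "MKTA-YIA"
    subject_aln = "MK-AGYIA"
    mapping = map_query_to_subject(query_aln, subject_aln,
                                   query_start=1, subject_start=10)
    sites = [2, 3, 6]  # hypothetical positively selected positions from PAML
    for cmd in pymol_commands(sites, mapping):
        print(cmd)
```

Running it prints `select psg_sites, chain A and resi 11+15`, `show sticks, psg_sites` and `color red, psg_sites`. Site 3 is dropped because it is aligned to a gap in the structure's sequence, which shows why only some candidate sites may be mappable onto a given structure.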
Database: OpenAIRE