BASE: A novel workflow to integrate nonubiquitous genes in comparative genomics analyses for selection

Authors: Giovanni Piccinini, Giobbe Forni, Andrea Luchetti, Angelo Alberto Ruggieri
Contributors: Forni G., Ruggieri A.A., Piccinini G., Luchetti A.
Year of publication: 2021
Source: Ecology and Evolution, Vol. 11, Iss. 19, pp. 13029-13035 (2021)
ISSN: 2045-7758
Description: Inferring the selective forces that orthologous genes underwent across different lineages can help us understand the evolutionary processes that have shaped their extant diversity and the phenotypes they underlie. The most widely used metric for estimating the selection regimes of coding genes, across sites and phylogenies, is the ratio of the nonsynonymous to the synonymous substitution rate (dN/dS, also known as ω). Modern sequencing technologies and the large amount of sequence data already available now allow thousands of orthologous genes to be retrieved across large numbers of species. Nonetheless, the tools available to explore selection regimes are not designed to process all genes automatically, and in practice their use is often restricted to the single-copy genes found in all the species considered (i.e., ubiquitous genes). This limits the analysis to a fraction of the single-copy genes: ubiquitous genes can be as much as an order of magnitude fewer than those not consistently found in all the species considered (i.e., nonubiquitous genes). Here, we present BASE, a workflow that leverages the CodeML framework to ease the inference and interpretation of gene selection regimes in the context of comparative genomics. Although a number of bioinformatics tools have already been developed to facilitate this kind of analysis, BASE is the first designed specifically to allow the integration of nonubiquitous genes in a straightforward and reproducible manner. The workflow, along with all relevant documentation, is available at github.com/for-giobbe/BASE.
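As a minimal illustration of how ω is read, the Python sketch below divides a per-branch dN by dS and maps the result onto the conventional interpretation (ω < 1 purifying selection, ω ≈ 1 neutral evolution, ω > 1 positive selection). The `BranchEstimate` class and the `omega` and `regime` helpers are hypothetical and are not part of BASE or CodeML. In real analyses CodeML estimates these quantities by maximum likelihood, and significance is usually assessed with likelihood-ratio tests between nested models rather than by reading a single point estimate.

```python
from dataclasses import dataclass


@dataclass
class BranchEstimate:
    """Hypothetical per-branch estimates, e.g. values read from a CodeML output."""
    dn: float  # nonsynonymous substitutions per nonsynonymous site
    ds: float  # synonymous substitutions per synonymous site


def omega(est: BranchEstimate) -> float:
    """Return the dN/dS ratio (omega); it is undefined when dS is not positive."""
    if est.ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    return est.dn / est.ds


def regime(w: float, tol: float = 0.0) -> str:
    """Conventional point-estimate reading of omega.

    `tol` is a hypothetical margin around 1 treated as neutral.
    """
    if w < 1 - tol:
        return "purifying"
    if w > 1 + tol:
        return "positive"
    return "neutral"
```

For example, `omega(BranchEstimate(dn=0.05, ds=0.25))` gives 0.2, which `regime` reads as purifying selection.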
Comparative genomics analyses for selection are often restricted to the subset of genes shared by all the species considered (i.e., ubiquitous genes), while genes missing from some of them (i.e., nonubiquitous genes) are often overlooked because automated approaches to include them are lacking. Yet disregarding such a large portion of genes may conceal important evolutionary processes. For this reason, we developed BASE, a novel workflow that allows the integration of nonubiquitous genes in a straightforward and reproducible manner.
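The distinction between ubiquitous and nonubiquitous genes can be sketched as a filter over an orthogroup copy-number table. This is a hypothetical illustration, not BASE's actual code: the input format, the `classify_orthogroups` name, and the `min_species` threshold are all assumptions. Orthogroups with paralogs are dropped, single-copy orthogroups present in every species are ubiquitous, and the remaining ones are kept as nonubiquitous together with the species they occur in, which a per-gene analysis could use to restrict the species tree (again an assumption about how such genes might be handled).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical input format: orthogroup ID -> {species name -> number of gene copies}.
CopyTable = Dict[str, Dict[str, int]]


def classify_orthogroups(
    table: CopyTable, species: Set[str], min_species: int = 3
) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split single-copy orthogroups into ubiquitous and nonubiquitous ones.

    - Orthogroups with more than one copy in any species (paralogs) are dropped.
    - Single-copy orthogroups present in every species are ubiquitous.
    - The remaining single-copy orthogroups present in at least `min_species`
      species (hypothetical threshold) are nonubiquitous; for each one, the
      sorted list of species where it occurs is returned.
    """
    ubiquitous: List[str] = []
    nonubiquitous: Dict[str, List[str]] = {}
    for og, counts in sorted(table.items()):
        if any(n > 1 for n in counts.values()):
            continue  # multi-copy in at least one species: not single-copy
        present = {sp for sp, n in counts.items() if n == 1 and sp in species}
        if present == species:
            ubiquitous.append(og)
        elif len(present) >= min_species:
            nonubiquitous[og] = sorted(present)
    return ubiquitous, nonubiquitous
```

With four species A to D, an orthogroup found once in each species is classified as ubiquitous; one missing only from B is nonubiquitous (species A, C, D); one with two copies in A is dropped as multi-copy; and one found only in A and B is dropped for falling below the hypothetical three-species minimum.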
Database: OpenAIRE