Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-seq datasets.
Authors: | Riquier S; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Bessiere C; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Guibert B; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Bouge AL; SeqOne, 34000, Montpellier, France., Boureux A; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Ruffle F; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Audoux J; SeqOne, 34000, Montpellier, France., Gilbert N; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France., Xue H; Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Saclay, 91198, Gif sur Yvette, France., Gautheret D; Institute for Integrative Biology of the Cell, CEA, CNRS, Université Paris-Saclay, 91198, Gif sur Yvette, France., Commes T; IRMB, University of Montpellier, INSERM, 80 rue Augustin Fliche, 34295, Montpellier, France. |
---|---|
Language: | English |
Source: | NAR genomics and bioinformatics [NAR Genom Bioinform] 2021 Jun 23; Vol. 3 (3), pp. lqab058. Date of Electronic Publication: 2021 Jun 23 (Print Publication: 2021). |
DOI: | 10.1093/nargab/lqab058 |
Abstract: | The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that makes it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, and so do not scale easily to large datasets. K-mer decomposition constitutes a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately in a less resource-consuming way. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling the measurement of gene expression with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata, including library protocol, sample features or contaminations, from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications. (© The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.) |
Database: | MEDLINE |
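
The abstract's idea of a "specific k-mer signature" can be illustrated with a toy sketch. This is not the Kmerator implementation: the function names (`kmers`, `specific_kmers`, `count_signature`), the two made-up sequences and the choice of k = 4 are all assumptions for illustration. It also ignores reverse complements and checks specificity only against the other sequences supplied, whereas a real tool would check against a full reference.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k from seq, in order (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(target_name, sequences, k):
    """Return the k-mers of the target that occur in no other sequence.

    `sequences` maps a (hypothetical) gene name to its sequence. The result
    keeps first-occurrence order and contains no duplicates.
    """
    # Collect every k-mer seen in the non-target sequences.
    background = set()
    for name, seq in sequences.items():
        if name != target_name:
            background.update(kmers(seq, k))

    seen = set()
    signature = []
    for km in kmers(sequences[target_name], k):
        if km not in background and km not in seen:
            seen.add(km)
            signature.append(km)
    return signature


def count_signature(reads, signature, k):
    """Count how often each signature k-mer appears in a list of reads."""
    wanted = set(signature)
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in wanted:
                counts[km] += 1
    return counts


# Toy data (invented): two short "genes" sharing only the k-mer ACGT.
sequences = {"geneA": "ACGTACGGA", "geneB": "ACGTTTGGA"}
signature = specific_kmers("geneA", sequences, k=4)
counts = count_signature(["GTACGG", "ACGTTT"], signature, k=4)
```

In this sketch, the summed or averaged counts of a gene's signature k-mers would serve as a rough, alignment-free proxy for its expression in a sample; how the Kmerator Suite actually aggregates counts is not stated in the abstract.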