Kmerator Suite: design of specific k-mer signatures and automatic metadata discovery in large RNA-Seq datasets
Authors: | Nicolas Gilbert, Thérèse Commes, Chloé Bessière, Anthony Boureux, Jérôme Audoux, Haoliang Xue, Benoit Guibert, Florence Ruffle, Sébastien Riquier, Anne-Laure Bougé, Daniel Gautheret |
---|---|
Contributors: | Cellules Souches, Plasticité Cellulaire, Médecine Régénératrice et Immunothérapies (IRMB), Centre Hospitalier Régional Universitaire [Montpellier] (CHRU Montpellier)-Institut National de la Santé et de la Recherche Médicale (INSERM)-Université de Montpellier (UM), SeqOne [CHRU Montpellier], Centre Hospitalier Régional Universitaire [Montpellier] (CHRU Montpellier)-Hôpital Saint Eloi (CHRU Montpellier), Centre Hospitalier Régional Universitaire [Montpellier] (CHRU Montpellier), Institut de Biologie Intégrative de la Cellule (I2BC), Commissariat à l'énergie atomique et aux énergies alternatives (CEA)-Université Paris-Saclay-Centre National de la Recherche Scientifique (CNRS), Agence Nationale de la recherche [ANR-10-INBS-09], Canceropole Grand Ouest [2017-EM24], Region Occitanie [R19073FF], ANR-10-INBS-0009,France-Génomique,Organisation et montée en puissance d'une Infrastructure Nationale de Génomique(2010) |
Year of publication: | 2021 |
Subject: | 0303 health sciences; Computer science; Interface (computing); Suite; Computational biology; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Metadata discovery; Set (abstract data type); Metadata; 03 medical and health sciences; Identification (information); 0302 clinical medicine; k-mer; [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Human genome; 030217 neurology & neurosurgery; 030304 developmental biology |
Source: | NAR Genomics and Bioinformatics, 2021, 3 (3), pp.lqab058. ⟨10.1093/nargab/lqab058⟩ |
ISSN: | 2631-9268 |
DOI: | 10.1101/2021.05.20.444982 (bioRxiv preprint) |
Description: | The huge body of publicly available RNA-sequencing (RNA-seq) libraries is a treasure trove of functional information that makes it possible to quantify the expression of known or novel transcripts in tissues. However, transcript quantification commonly relies on alignment methods that require substantial computational resources and processing time, which does not scale easily to large datasets. K-mer decomposition offers a new way to process RNA-seq data for the identification of transcriptional signatures, as k-mers can be used to quantify gene expression accurately while consuming fewer resources. We present the Kmerator Suite, a set of three tools designed to extract specific k-mer signatures, quantify these k-mers in RNA-seq datasets and quickly visualize large dataset characteristics. The core tool, Kmerator, produces specific k-mers for 97% of human genes, enabling gene expression to be measured with high accuracy in simulated datasets. KmerExploR, a direct application of Kmerator, uses a set of predictor gene-specific k-mers to infer metadata, including library protocol, sample features or contamination, from RNA-seq datasets. KmerExploR results are visualized through a user-friendly interface. Moreover, we demonstrate that the Kmerator Suite can be used for advanced queries targeting known or new biomarkers such as mutations, gene fusions or long non-coding RNAs for human health applications. |
Database: | OpenAIRE |
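The abstract's central idea, that k-mers occurring in only one gene can serve as that gene's expression signature and be counted directly in reads without alignment, can be illustrated with a minimal Python sketch. This is not Kmerator's implementation. It assumes a small in-memory dictionary of reference sequences, treats a k-mer as "specific" when it occurs in exactly one reference sequence (forward strand only, no reverse complements, no genome-wide check), and estimates expression as the mean count of a gene's specific k-mers across the reads. The function names `kmers`, `specific_kmers` and `quantify` and the toy sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterator, List, Set


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq on the forward strand only."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def specific_kmers(references: Dict[str, str], k: int) -> Dict[str, Set[str]]:
    """Return, for each reference sequence, the k-mers found in no other reference.

    Simplifying assumption: specificity is judged only against the given
    references, not against a whole genome or transcriptome.
    """
    owners: Dict[str, Set[str]] = {}
    for name, seq in references.items():
        for km in kmers(seq, k):
            owners.setdefault(km, set()).add(name)
    signatures: Dict[str, Set[str]] = {name: set() for name in references}
    for km, names in owners.items():
        if len(names) == 1:  # the k-mer belongs to exactly one sequence
            signatures[next(iter(names))].add(km)
    return signatures


def quantify(signatures: Dict[str, Set[str]], reads: List[str], k: int) -> Dict[str, float]:
    """Estimate expression as the mean read count of each gene's specific k-mers."""
    counts: Counter = Counter()
    for read in reads:
        counts.update(kmers(read, k))  # count every k-mer seen in the reads
    return {
        name: (sum(counts[km] for km in sig) / len(sig) if sig else 0.0)
        for name, sig in signatures.items()
    }


if __name__ == "__main__":
    refs = {"geneA": "ACGTACGGTT", "geneB": "ACGTTTGCAA"}
    k = 4
    sigs = specific_kmers(refs, k)  # the shared 4-mer ACGT is dropped from both
    reads = ["ACGTACGG", "TACGGTT", "TTTGCAA"]
    print(quantify(sigs, reads, k))
```

On this toy input, each gene keeps six specific 4-mers; geneA's are seen 8 times in total (mean 8/6, about 1.33) and geneB's 4 times (mean 4/6, about 0.67). The mean is used only for simplicity; a real pipeline would likely also handle strandedness, sequencing errors and library-size normalization.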