Machine learning algorithm for precise prediction of 2'-O-methylation (Nm) sites from experimental RiboMethSeq datasets.
Author: | Pichot F; Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Mainz, Germany; Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France., Marchand V; Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France., Helm M; Institute of Pharmacy and Biochemistry, Johannes Gutenberg University Mainz, Mainz, Germany., Motorin Y; Université de Lorraine, CNRS, INSERM, UAR2008/US40 IBSLor, EpiRNA-Seq Core facility, Nancy F-54000, France; Université de Lorraine, CNRS, UMR7365 IMoPA, Nancy F-54000, France. Electronic address: motorine5@univ-lorraine.fr. |
---|---|
Language: | English |
Source: | Methods (San Diego, Calif.) [Methods] 2022 Jul; Vol. 203, pp. 311-321. Date of Electronic Publication: 2022 Mar 18. |
DOI: | 10.1016/j.ymeth.2022.03.007 |
Abstract: | Analysis of epitranscriptomic RNA modifications by deep sequencing-based approaches makes an essential contribution to general knowledge of their precise locations and relative stoichiometry in cellular RNAs. To reveal RNA modifications, several analytical approaches have been proposed, including antibody-driven enrichment, analysis of RT-signatures and specific chemical treatments. However, analysis and interpretation of these massive datasets, especially for low-abundance cellular RNAs (e.g. mRNA and lncRNA), is neither easy nor straightforward, since insufficient specificity and selectivity lead to many false-positive and false-negative identifications. The main issue in applying these methods lies in the subjective classification of potentially modified positions, mostly based on arbitrarily defined threshold values for different scores. Such an approach using pre-defined score values proved appropriate for datasets of limited complexity (tRNA and/or rRNA analysis), but application to longer reference sequences requires much better classification algorithms. In this work we applied a machine learning algorithm (Random Forest, RF) to create a predictive model for analysis of 2'-O-methylated sites in RNA using RiboMethSeq datasets. Model training was performed on a large collection of human rRNA datasets with well-known modification profiles, and prediction performance was assessed using experimentally defined profiles for other eukaryotic rRNAs (S. cerevisiae and A. thaliana). Application of this Random Forest prediction model to the detection of other RNA modifications and to more complex datasets is discussed. (Copyright © 2022 Elsevier Inc. All rights reserved.) |
Database: | MEDLINE |