Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features

Authors: Hui Ding, Xiaoling Li, Lei Xu, Lijun Dou, Huaikun Xiang
Publication year: 2020
Subject:
Source: Molecular Therapy. Nucleic Acids
Molecular Therapy: Nucleic Acids, Vol 21, Iss, Pp 332-342 (2020)
ISSN: 2162-2531
DOI: 10.1016/j.omtn.2020.06.004
Description: 5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensive. In addition, the number of RNA sequences has grown explosively in the post-genomic era. Thus, machine-learning-based methods are urgently needed to predict RNA m5C modifications quickly and with high accuracy. Here, we propose a novel support-vector-machine (SVM)-based tool, called iRNA-m5C_SVM, which combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight popular feature-extraction methods were first investigated systematically. Then, four well-performing features were incorporated to construct a comprehensive model: position-specific propensity (PSP) (PSNP, PSDP, and PSTP, associated with frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, di-nucleotide, and tri-nucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotide (PseEIIPs), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). Accuracies under 10-fold cross-validation and on independent tests reached 73.06% and 80.15%, respectively, the best predictive performance in A. thaliana among existing models. The proposed model is expected to be a promising alternative for further research on m5C modification sites in plants.
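As an illustration of the nucleotide composition features named in the abstract (NAC, DNC, and TNC), the following is a minimal sketch that computes normalized k-mer frequencies for k = 1, 2, 3 and concatenates them into one feature vector. It is not the authors' implementation: the function names `kmer_composition` and `composition_features` are hypothetical, the normalization by window count is an assumption, and the PSP, PseEIIP, and PC-PseDNC-general features are not covered.

```python
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; T is mapped to U below (an assumption)


def kmer_composition(seq, k):
    """Return frequencies of all 4**k k-mers in seq, in a fixed lexicographic order.

    Frequencies are divided by the number of sliding windows, len(seq) - k + 1.
    Windows containing characters outside ACGU (for example N) are not counted.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1
    if total <= 0:
        # Sequence shorter than k: no windows, so return an all-zero vector.
        return [0.0] * len(kmers)
    for i in range(total):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    return [counts[km] / total for km in kmers]


def composition_features(seq):
    """Concatenate NAC (k=1), DNC (k=2), and TNC (k=3) into one 84-dimensional vector."""
    return (
        kmer_composition(seq, 1)
        + kmer_composition(seq, 2)
        + kmer_composition(seq, 3)
    )
```

A vector of this kind would typically be combined with the other feature groups and passed to an SVM classifier; the kernel and hyperparameters used in the paper are not stated in this record.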
Graphical Abstract
5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays a significant role in various biological processes. Dou et al. built a novel SVM-based predictor, called iRNA-m5C_SVM, to identify RNA m5C modifications using multiple sequence features. Its performance was compared with that of other reported methods, showing it to be a competitive bioinformatic tool for predicting m5C sites.
Database: OpenAIRE