Prediction of m5C Modifications in RNA Sequences by Combining Multiple Sequence Features
Author: | Hui Ding, Xiaoling Li, Lei Xu, Lijun Dou, Huaikun Xiang |
---|---|
Year of publication: | 2020 |
Subject: |
0301 basic medicine
Computer science Computational biology Article Correlation 03 medical and health sciences chemistry.chemical_compound 0302 clinical medicine position-specific propensity Drug Discovery support vector machine Nucleotide 5-methylcytosine Sequence (medicine) chemistry.chemical_classification nucleotide composition lcsh:RM1-950 RNA Support vector machine electron-ion interaction pseudopotentials of trinucleotide 5-Methylcytosine lcsh:Therapeutics. Pharmacology 030104 developmental biology chemistry 030220 oncology & carcinogenesis Transfer RNA Nucleic acid PC-PseDNC-general Molecular Medicine |
Source: | Molecular Therapy: Nucleic Acids, Vol 21, Pp 332-342 (2020) |
ISSN: | 2162-2531 |
DOI: | 10.1016/j.omtn.2020.06.004 |
Description: | 5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays significant roles in biological processes such as RNA metabolism, tRNA recognition, and stress responses. Traditional high-throughput techniques for identifying m5C sites are usually time-consuming and expensive. In addition, the number of known RNA sequences has grown explosively in the post-genomic era. Machine-learning-based methods are therefore urgently needed to predict RNA m5C modifications quickly and with high accuracy. Here, we propose a novel support vector machine (SVM)-based tool, called iRNA-m5C_SVM, that combines multiple sequence features to identify m5C sites in Arabidopsis thaliana. Eight popular feature-extraction methods were first investigated systematically. Then, four well-performing feature types were combined to construct a comprehensive model: position-specific propensity (PSP; PSNP, PSDP, and PSTP, based on the frequencies of nucleotides, dinucleotides, and trinucleotides, respectively), nucleotide composition (nucleic acid, dinucleotide, and trinucleotide compositions; NAC, DNC, and TNC, respectively), electron-ion interaction pseudopotentials of trinucleotides (PseEIIP), and general parallel correlation pseudo-dinucleotide composition (PC-PseDNC-general). The model achieved accuracies of 73.06% in 10-fold cross-validation and 80.15% on the independent test set, the best predictive performance in A. thaliana among existing models. The proposed model is expected to be a promising alternative for further research on m5C modification sites in plants. Graphical Abstract: 5-Methylcytosine (m5C) is a well-known post-transcriptional modification that plays a significant role in various biological processes. Dou et al. built a novel SVM-based predictor, called iRNA-m5C_SVM, to identify RNA m5C modifications using multiple sequence features.
Its performance was compared with that of other reported methods, showing that it provides a competitive bioinformatic tool for predicting m5C sites. |
Database: | OpenAIRE |
External link: |
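Two of the feature families named in the abstract, nucleotide composition (NAC, DNC, TNC) and PseEIIP, can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, sequences are assumed to use the RNA alphabet ACGU (T is mapped to U), k-mers are counted with overlapping windows and normalized by the number of windows, and the per-nucleotide EIIP values (A 0.1260, C 0.1340, G 0.0806, U 0.1335) are those commonly used in the literature. PSP and PC-PseDNC-general are omitted because they depend on training-set statistics and physicochemical indices that this record does not specify.

```python
from itertools import product

# Assumption: standard per-nucleotide EIIP values from the literature (U uses T's value).
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "U": 0.1335}
ALPHABET = "ACGU"


def kmer_composition(seq: str, k: int) -> list[float]:
    """Hypothetical helper: normalized overlapping k-mer frequencies.

    k=1 gives NAC (4 dims), k=2 gives DNC (16 dims), k=3 gives TNC (64 dims).
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = len(seq) - k + 1  # number of overlapping windows
    for i in range(max(total, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing non-ACGU symbols are skipped
            counts[kmer] += 1
    return [counts[m] / total if total > 0 else 0.0 for m in kmers]


def pse_eiip(seq: str) -> list[float]:
    """Hypothetical helper: 64-dim PseEIIP vector.

    Each trinucleotide's EIIP is the sum of its three nucleotide EIIPs,
    multiplied by that trinucleotide's frequency in the sequence.
    """
    trinucs = ["".join(p) for p in product(ALPHABET, repeat=3)]
    freqs = kmer_composition(seq, 3)  # same product() ordering as trinucs
    return [sum(EIIP[n] for n in tri) * f for tri, f in zip(trinucs, freqs)]


def combined_features(seq: str) -> list[float]:
    """Concatenate NAC + DNC + TNC + PseEIIP into one 148-dim vector."""
    return (
        kmer_composition(seq, 1)
        + kmer_composition(seq, 2)
        + kmer_composition(seq, 3)
        + pse_eiip(seq)
    )


if __name__ == "__main__":
    vec = combined_features("ACGUACGUAC")
    print(len(vec))  # 4 + 16 + 64 + 64 = 148
```

The combined vector has 4 + 16 + 64 + 64 = 148 dimensions. In a full pipeline, vectors like this (together with the omitted PSP and PC-PseDNC-general features) would be passed to an SVM and evaluated with 10-fold cross-validation and an independent test set, as the abstract describes.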