iRNA-m5C_NB: A Novel Predictor to Identify RNA 5-Methylcytosine Sites Based on the Naive Bayes Classifier
Author: | Lei Xu, Xiaoling Li, Lijun Dou, Huaikun Xiang, Hui Ding |
---|---|
Publication year: | 2020 |
Subject: |
0303 health sciences; General Computer Science; Computer science; business.industry; Feature extraction; General Engineering; RNA; Pattern recognition; 03 medical and health sciences; Naive Bayes classifier; Bayes' theorem; 0302 clinical medicine; 030220 oncology & carcinogenesis; Jackknife test; General Materials Science; Artificial intelligence; business; Jackknife resampling; 030304 developmental biology |
Source: | IEEE Access. 8:84906-84917 |
ISSN: | 2169-3536 |
DOI: | 10.1109/access.2020.2991477 |
Description: | As one of the most widespread RNA post-transcriptional modifications (PTCMs), 5-methylcytosine (m5C) is vital to a better understanding of basic biological mechanisms and to the treatment of major diseases. Traditional high-throughput experimental approaches for finding m5C sites are usually expensive and laborious. Given the large number of RNA sequences, developing accurate computational methods to distinguish m5C from non-m5C sites is an efficient alternative. Here we introduce a novel predictor, called iRNA-m5C_NB, to identify m5C sites in Homo sapiens using the Naive Bayes (NB) algorithm. In this method, the unbalanced dataset Met935 is first processed with the hybrid-sampling strategy SMOTEENN. The top 57 features are then selected by ANOVA F-value from four well-performing feature extraction techniques: Bi-profile Bayes (BPB), enhanced Nucleic Acid Composition (ENAC), electron-ion interaction pseudopotentials (EIIP), and mMGap_1. In the jackknife test, recall on the unbalanced training dataset Met935 reaches 82.81% with an MCC of 0.63. On the independent dataset Test1157, the predictor still achieves a recall of 70.06% and an MCC of 0.34. It is the first m5C predictor constructed from an unbalanced dataset, and its recall is 19.82% and 59.23% higher than that of the latest tool, RNAm5CPred, in the jackknife and independent tests, respectively. We demonstrate that iRNA-m5C_NB outperforms other state-of-the-art models, and we hope it will be an efficient and reliable method for identifying m5C sites. |
Database: | OpenAIRE |
External link: |
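
The evaluation protocol named in the description (a Naive Bayes classifier scored by a jackknife test, with recall and MCC as metrics) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a plain per-position categorical Naive Bayes with add-one smoothing on raw nucleotides, and it omits the SMOTEENN resampling, the four feature encodings, and the ANOVA F-value selection. All function names and the toy sequences are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGU"  # RNA nucleotides; input sequences are assumed to use only these


def train_nb(seqs, labels):
    """Fit a categorical Naive Bayes model with add-one smoothing:
    one nucleotide distribution per (class, position)."""
    length = len(seqs[0])
    model = {}
    for c in sorted(set(labels)):
        members = [s for s, y in zip(seqs, labels) if y == c]
        log_prior = math.log(len(members) / len(seqs))
        total = len(members) + len(ALPHABET)
        log_like = []
        for pos in range(length):
            counts = Counter(s[pos] for s in members)
            log_like.append({n: math.log((counts[n] + 1) / total) for n in ALPHABET})
        model[c] = (log_prior, log_like)
    return model


def predict_nb(model, seq):
    """Return the class with the highest log posterior for one sequence."""
    def score(c):
        log_prior, log_like = model[c]
        return log_prior + sum(log_like[i][n] for i, n in enumerate(seq))
    return max(model, key=score)


def jackknife_predictions(seqs, labels):
    """Jackknife (leave-one-out) test: each sample is predicted by a model
    trained on all the other samples."""
    preds = []
    for i in range(len(seqs)):
        train_x = seqs[:i] + seqs[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        preds.append(predict_nb(train_nb(train_x, train_y), seqs[i]))
    return preds


def recall_and_mcc(y_true, y_pred):
    """Recall and Matthews correlation coefficient from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return recall, mcc


# Toy, made-up windows centred on a cytosine; NOT data from Met935 or Test1157.
toy_seqs = ["GGCAA", "GGCAU", "GGCGA", "GACAA", "AUCUU", "AUCUG", "UUCUU", "AUCCU"]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_preds = jackknife_predictions(toy_seqs, toy_labels)
toy_recall, toy_mcc = recall_and_mcc(toy_labels, toy_preds)
print(f"recall={toy_recall:.2f}  MCC={toy_mcc:.2f}")
```

Recall is reported alongside MCC because recall alone ignores false positives. On an unbalanced dataset such as the one described, MCC uses all four confusion-matrix cells and so penalizes a model that labels everything positive.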