Popis: |
Background: Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and human complex diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step for analyzing lncRNA functional mechanism. However, the experimental identification of lncRNAs is expensive and time-consuming. Results: In this study, we present an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporates three different input modalities (i.e. OFH modality, k-mer modality, and sequence modality), then a multimodal deep learning framework is built for learning the high-level abstract representations and predicting the probability whether a transcript is lncRNA or not. Conclusions: LncRNA_Mdeep achieves 98.73% prediction accuracy in 10-fold cross-validation test on human. Compared with other eight state-of-the-art methods, lncRNA_Mdeep shows 93.12% prediction accuracy independent test on human, which is 0.94%~15.41% higher than that of other eight methods. In addition, the results on 11 cross-species datasets show that lncRNA_Mdeep is a powerful predictor for identifying lncRNAs. The source code can be downloaded from https://github.com/NWPU-903PR/lncRNA_Mdeep. |