Bidirectional Long Short-Term Memory Network for Taxonomic Classification.

Authors: Soliman, Naglaa. F., Alhalem, Samia M. Abd, El-Shafai, Walid, Abdulrahman, Salah Eldin S. E., Ismaiel, N., El-Rabaie, El-Sayed M., Algarni, Abeer D., Algarni, Fatimah, El-Samie, Fathi E. Abd
Subject:
Source: Intelligent Automation & Soft Computing; 2022, Vol. 33 Issue 1, p103-116, 14p
Abstract: Identifying and classifying deoxyribonucleic acid (DNA) sequences and their functions are among the main challenges in bioinformatics. Advances in machine learning and deep learning (DL) techniques are expected to improve DNA sequence classification. Because DNA sequence classification depends on analyzing textual data, Bidirectional Long Short-Term Memory (BLSTM) algorithms are well suited to the task. In general, classifiers depend on the patterns to be processed and on the pre-processing method. This paper proposes a classification framework based on Frequency Chaos Game Representation (FCGR) followed by the Discrete Wavelet Transform (DWT) and a BLSTM. First, DNA strings are transformed into numerical matrices by FCGR. Then, the DWT is used in place of a pooling layer as a data-compression tool. The benefit of using the DWT is twofold: it preserves only the useful information, which makes the subsequent BLSTM training effective, and it adds important detail to the encoded sequences by finding effective features in the DNA fragments. Finally, the BLSTM model is trained to classify the DNA sequences. Evaluation metrics such as F1 score and accuracy show that the proposed framework outperforms state-of-the-art algorithms. Hence, it can be used in DNA classification applications. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
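
Note: The first two stages described in the abstract (FCGR encoding, then DWT compression in place of pooling) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the k-mer length, the corner layout of the four bases, the normalization to frequencies, the choice of a Haar wavelet, and the function names `fcgr` and `haar_approx` are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

# Assumed corner layout: each base contributes one (row_bit, col_bit) pair.
# Conventions differ between publications; this one is chosen for illustration.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k=3):
    """Return a 2^k x 2^k matrix of k-mer frequencies for a DNA string."""
    size = 2 ** k
    mat = np.zeros((size, size), dtype=float)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Skip k-mers containing ambiguous symbols such as N.
        if any(base not in CORNERS for base in kmer):
            continue
        row = col = 0
        for base in kmer:
            r, c = CORNERS[base]
            row = (row << 1) | r  # each base refines the cell by one bit
            col = (col << 1) | c
        mat[row, col] += 1
    total = mat.sum()
    return mat / total if total else mat


def haar_approx(mat):
    """One-level 2D Haar DWT approximation (LL) sub-band.

    Halves each dimension, which is how a DWT can stand in for a
    pooling layer. Assumes both dimensions are even.
    """
    a = mat[0::2, 0::2]
    b = mat[0::2, 1::2]
    c = mat[1::2, 0::2]
    d = mat[1::2, 1::2]
    return (a + b + c + d) / 2.0  # orthonormal Haar scaling


encoded = fcgr("ACGTACGT", k=2)      # 4 x 4 frequency matrix
compressed = haar_approx(encoded)    # 2 x 2 approximation sub-band
```

The compressed matrix (or its rows taken as a sequence) would then be the input to a bidirectional LSTM classifier. That stage is omitted here because the abstract gives no architectural details.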