Abstract: |
Within a gene, some nucleotides may be missing, extra copies of nucleotides may be present, or certain segments may be deleted; such alterations can cause diseases such as cardiovascular disease, Alzheimer's disease, and cancer. On the one hand, identifying a patient's disease from gene structure remains a challenge because associations between a given disease and a genetic alteration cannot readily be found. On the other hand, despite advances in biological research, the analysis of large datasets remains problematic. In view of these factors, we developed a new disease prediction algorithm that combines feature extraction with deep learning or machine learning techniques. To this end, we took into account the fact that the complementarity between micro ribonucleic acids (miRNAs) and messenger ribonucleic acid (mRNA) target sites is critical for gene regulation. We therefore use protein-coding genes whose expression is modified by abnormal miRNAs as preliminary data. The Frequency Chaos Game Representation (FCGR) of each gene, which gives the frequency of word occurrences in the DNA sequence, is then used to define the gene features. Afterwards, we apply and compare classifiers from both deep learning and machine learning to predict altered genes. For identifying altered genes, we demonstrate that a residual neural network (ResNet) with 152 layers outperforms both the Support Vector Machine (SVM) with a sigmoid kernel and the deep Convolutional Neural Network (CNN) with three convolution layers. Specifically, we achieved an accuracy of 98% with FCGR4 as an independent dataset, compared with accuracies of 90.69% and 96.88% for the SVM and the CNN, respectively. [ABSTRACT FROM AUTHOR] |
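The FCGR feature step mentioned in the abstract can be illustrated with a minimal Python sketch. It counts every word of length k (k-mer) in a DNA sequence and places its relative frequency in a 2^k x 2^k grid. The abstract does not specify the corner layout, the normalization, or the handling of ambiguous bases, so the choices below are assumptions, and the names `CORNERS`, `kmer_cell`, and `fcgr` are hypothetical, not taken from the authors' implementation.

```python
from collections import Counter

# Assumed corner layout as (x, y) bits; conventions vary between papers.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def kmer_cell(kmer):
    """Map a k-mer to its (row, col) cell in a 2^k x 2^k grid."""
    x = y = 0
    # In the chaos game, the most recent nucleotide selects the coarsest
    # quadrant, so later letters contribute the higher-order bits.
    for i, base in enumerate(kmer):
        bx, by = CORNERS[base]
        x |= bx << i
        y |= by << i
    return y, x


def fcgr(sequence, k=4):
    """Return the FCGR of `sequence` as a nested list of relative frequencies."""
    size = 2 ** k
    matrix = [[0.0] * size for _ in range(size)]
    seq = sequence.upper()

    # Count k-mers, skipping any window with a non-ACGT symbol (assumption).
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in CORNERS for base in kmer):
            counts[kmer] += 1

    total = sum(counts.values())
    if total == 0:
        return matrix  # no valid k-mer: all-zero grid

    # Normalize counts to relative frequencies (assumption).
    for kmer, n in counts.items():
        row, col = kmer_cell(kmer)
        matrix[row][col] = n / total
    return matrix
```

With k = 4, each gene yields a 16 x 16 grid that could be passed to a classifier as a small single-channel image or flattened into a 256-element feature vector; how the authors feed it to the SVM, CNN, and ResNet is not stated in the abstract.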