DNA Sequences Analysis of Single-Gene Disorders and Prediction Model Construction Based on Machine Learning and Convolutional Neural Networks
Author: | Min-Chen Lin, 林旻蓁 |
---|---|
Year of publication: | 2019 |
Document type: | Thesis (學位論文) |
Description: | 107 There are many types of single-gene disorders. They can affect many parts of the human body, causing heart disease, metabolic abnormalities, brain or neurological disorders, skin lesions, and other conditions, and can even lead to death. Machine learning and deep learning techniques can now assist physicians in clinical diagnosis by providing objective and accurate results. To prevent disease onset or progression, these techniques can be used to analyze human genes so that patients can receive early treatment or adjust their diet and lifestyle. In this study, DNA sequences are gathered from the NCBI GenBank database. These sequences are transformed into global data and local data, which serve as inputs, using multiple algorithms and tools. Convolutional Neural Networks, Naïve Bayes, Support Vector Machine, the C4.5 algorithm, and Random Forest are used to construct classification models for sequences of single-gene disorders. The performance of the models is compared using validation indexes derived from the confusion matrix. The experimental results show that using the global data as input yields better classification performance. Among all algorithms, Random Forest and Convolutional Neural Networks perform best, with accuracy over 97%. The other algorithms rank, from best to worst: Naïve Bayes > C4.5 algorithm > Support Vector Machine. In the analysis of local data, the 10-second segmented audio signal images give the best classification performance in the Convolutional Neural Networks model, with sensitivity 84.81%, F1 score 84.08%, MCC 82.64%, and accuracy 84.28%. This study proposes multiple classification models for single-gene disorders. The best-performing combination of algorithm and input data could be selected as a tool and direction for the diagnosis and screening of genetic disorders. 
This study expects that these classification models can assist physicians in the clinical diagnosis of single-gene disorders and serve as a research basis for bioinformatics. |
Database: | Networked Digital Library of Theses & Dissertations |
External link: |
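
The abstract reports sensitivity, F1 score, MCC, and accuracy as validation indexes derived from the confusion matrix. As a rough illustration of how such indexes are commonly computed, here is a minimal Python sketch for the binary case. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the thesis. The abstract does not say how the indexes were aggregated across disorder classes, so this sketch assumes a single two-class confusion matrix.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute common validation indexes from binary confusion-matrix counts.

    Hypothetical helper for illustration only; not the thesis's code.
    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    # Sensitivity (recall): fraction of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: fraction of predicted positives that were correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 score: harmonic mean of precision and sensitivity.
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # Matthews correlation coefficient (MCC); defined as 0 when any
    # marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "f1": f1,
        "mcc": mcc,
    }


# Made-up counts, chosen only to show the call.
example = binary_metrics(tp=40, fp=10, fn=10, tn=40)
```

For a multi-class problem, one common option is to compute these values one-vs-rest for each class and average them. Whether the thesis did so is not stated in the abstract.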