Determining the Presence of Metabolic Pathways using Machine Learning Approach
Author: | Fazilah Haron, Yara Saud Aljarbou |
---|---|
Year of publication: | 2020 |
Subject: |
General Computer Science
Computer science; Systems biology; Medical engineering; Decision tree; Metabolic network; Feature selection; Engineering and technology; Machine learning; Genome; Medical and health sciences; Naive Bayes classifier; BioCyc database collection; Developmental biology; Whole genome sequencing; Health sciences; Support vector machine; Metabolic pathway; ComputingMethodologies_PATTERNRECOGNITION; Artificial intelligence; Classifier (UML); Bioinformatics |
Source: | International Journal of Advanced Computer Science and Applications. 11 |
ISSN: | 2156-5570 2158-107X |
DOI: | 10.14569/ijacsa.2020.0110845 |
Description: | The reconstruction of the metabolic network of an organism based on its genome sequence is a key challenge in systems biology. One strategy for addressing this problem is to predict the presence or absence of a metabolic pathway from a reference database of known pathways. Although such models have been constructed manually, a manual approach cannot cover the thousands of genomes that have been sequenced. Therefore, more advanced techniques are needed for the computational representation of metabolic networks. In this research, we explore a machine learning approach to determine the presence or absence of a metabolic pathway in an organism based on the organism's annotated genome. We have built our own dataset of 4978 instances of pathways. The dataset consists of 1585 pathways, each having 20 different representations from 20 organisms. The pathways were obtained from the BioCyc Database Collection. The pathway dataset also includes 20 features used to describe each pathway. To identify a suitable classifier, we experimented with five machine learning algorithms, namely Decision Tree, Naive Bayes, Support Vector Machine, K-Nearest Neighbor and Logistic Regression, with and without applying feature selection methods. Our experiments show that Support Vector Machine is the best classifier, with an accuracy of 96.9%, whereas the maximum accuracy reported in previous work is 91.2%. Hence, adding more data to the pathway dataset can improve the performance of the machine learning classifiers. |
Database: | OpenAIRE |
External link: |
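
The comparison outlined in the description (five classifiers, each evaluated with and without feature selection) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: it uses scikit-learn, a synthetic 20-feature binary dataset in place of the BioCyc-derived pathway data, univariate `SelectKBest` as a stand-in for the unspecified feature selection methods, and 5-fold cross-validated accuracy as the metric. The names `CLASSIFIERS` and `compare_classifiers` and every hyperparameter are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: synthetic placeholder data with 20 features and a binary label
# (pathway present / absent). This is NOT the dataset described in the record.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=8, random_state=0
)

# The five algorithm families named in the description; all settings are
# illustrative defaults, not the values used in the paper.
CLASSIFIERS = {
    "Decision Tree": lambda: DecisionTreeClassifier(random_state=0),
    "Naive Bayes": lambda: GaussianNB(),
    "Support Vector Machine": lambda: SVC(),
    "K-Nearest Neighbor": lambda: KNeighborsClassifier(),
    "Logistic Regression": lambda: LogisticRegression(max_iter=1000),
}


def compare_classifiers(X, y, k_best=10, folds=5):
    """Return mean cross-validated accuracy per classifier, with and
    without a univariate feature selection step."""
    results = {}
    for name, make_clf in CLASSIFIERS.items():
        # Baseline: scale all 20 features, then classify.
        plain = make_pipeline(StandardScaler(), make_clf())
        # Variant: keep only the k_best features ranked by ANOVA F-score.
        selected = make_pipeline(
            StandardScaler(), SelectKBest(f_classif, k=k_best), make_clf()
        )
        results[name] = {
            "all_features": cross_val_score(plain, X, y, cv=folds).mean(),
            "selected_features": cross_val_score(selected, X, y, cv=folds).mean(),
        }
    return results


results = compare_classifiers(X, y)
for name, scores in results.items():
    print(
        f"{name}: all={scores['all_features']:.3f} "
        f"selected={scores['selected_features']:.3f}"
    )
```

Placing the selection step inside the pipeline means it is refit on each training fold, so the held-out fold does not influence which features are kept. The accuracies printed for the synthetic data say nothing about the figures reported in the description.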