Determining the Presence of Metabolic Pathways using Machine Learning Approach

Authors: Fazilah Haron, Yara Saud Aljarbou
Publication year: 2020
Subject:
Source: International Journal of Advanced Computer Science and Applications. 11
ISSN: 2156-5570
2158-107X
DOI: 10.14569/ijacsa.2020.0110845
Description: The reconstruction of the metabolic network of an organism based on its genome sequence is a key challenge in systems biology. One strategy for addressing this problem is to predict the presence or absence of a metabolic pathway from a reference database of known pathways. Although such models have been constructed manually, a manual method cannot cover the thousands of genomes that have been sequenced. Therefore, more advanced techniques are needed for the computational representation of metabolic networks. In this research, we explore a machine learning approach to determine the presence or absence of a metabolic pathway in an organism based on its annotated genome. We have built our own dataset of 4978 pathway instances. The dataset consists of 1585 pathways, each having 20 different representations from 20 organisms. The pathways were obtained from the BioCyc Database Collection. The dataset also includes 20 features used to describe each pathway. To identify a suitable classifier, we experimented with five machine learning algorithms, namely Decision Tree, Naive Bayes, Support Vector Machine, K-Nearest Neighbor and Logistic Regression, with and without applying feature selection methods. Our experiments show that Support Vector Machine is the best classifier, with an accuracy of 96.9%, while the maximum accuracy reached by previous work is 91.2%. Hence, adding more data to the pathway dataset can improve the performance of the machine learning classifiers.
Database: OpenAIRE