Improvement of Protein Model Scoring Using Grouping and Interpreter for Machine Learning

Author:	Catherine Spooner, Bogdan Czejdo, Sambit Bhattacharya
Year of publication:	2019
Subject:	0303 health sciences; Artificial neural network; Process (engineering); Computer science; business.industry; Feature vector; Machine learning; computer.software_genre; Random forest; Set (abstract data type); Data set; Support vector machine; 03 medical and health sciences; 0302 clinical medicine; Artificial intelligence; business; computer; 030217 neurology & neurosurgery; Interpreter; 030304 developmental biology
Source:	CCWC
DOI:	10.1109/ccwc.2019.8666524
Description:	In this paper, we describe our protein folding research, whose goal is to improve protein model scoring by grouping protein models and by using an interpreter for machine learning (ML). The traditional approach is to train a handful of popular ML algorithms, such as Support Vector Machines (SVM), Random Forest, and Neural Networks, on the whole set of models. Our approach is to group the protein models and train the ML algorithms on each group separately. The framework can be generalized to other applications of ML in which the data set is highly diverse. We compare the traditional approach with our grouping approach and show that some improvement in the scoring of protein models can be achieved. To improve the scoring further, we use an interpreter for machine learning based on the Local Interpretable Model-Agnostic Explanations (LIME) tool. Here the interpreter is used to determine a feature vector for each group of protein models. Different feature vectors are then used for ML training on different groups of protein models, which allows us to improve the ML algorithms. In addition, the ML interpreter could be used in the future to provide feedback for the process of protein model generation.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_________::6b31725d55e29f07822e8a9b2856ffda https://doi.org/10.1109/ccwc.2019.8666524
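
The grouping idea summarized in the abstract (one model trained on the whole set versus one model per group) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the two groups and their names are hypothetical, and a one-feature least-squares line stands in for the SVM, Random Forest, and Neural Network models the authors mention. The LIME-based feature selection step is not shown.

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (a stand-in for SVM/RF/NN)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def squared_errors(model, xs, ys):
    """Per-sample squared error of a fitted line."""
    a, b = model
    return [(a * x + b - y) ** 2 for x, y in zip(xs, ys)]


def make_group(rng, slope, intercept, n=200):
    """Synthetic (feature, score) pairs; hypothetical, not real protein data."""
    xs = [rng.uniform(0.0, 1.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, 0.05) for x in xs]
    return xs, ys


rng = random.Random(0)
# Two hypothetical groups whose feature-to-score relationship differs.
groups = {
    "group_a": make_group(rng, slope=1.0, intercept=0.0),
    "group_b": make_group(rng, slope=-1.0, intercept=1.0),
}

# Traditional approach: a single model trained on the whole set.
all_x = [x for xs, _ in groups.values() for x in xs]
all_y = [y for _, ys in groups.values() for y in ys]
global_model = fit_line(all_x, all_y)
global_mse = mean(squared_errors(global_model, all_x, all_y))

# Grouping approach: one model per group, each scored on its own group.
group_models = {name: fit_line(xs, ys) for name, (xs, ys) in groups.items()}
grouped_mse = mean(
    err
    for name, (xs, ys) in groups.items()
    for err in squared_errors(group_models[name], xs, ys)
)

# Errors are measured on the training data for brevity; a real comparison
# would use held-out models.
print(f"global MSE:  {global_mse:.4f}")
print(f"grouped MSE: {grouped_mse:.4f}")
```

In this toy setting the per-group models fit much better because the two groups follow opposite trends that a single line cannot capture. Whether grouping helps on real protein models depends on how diverse the data are, which is the question the paper examines.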