Improvement of Protein Model Scoring Using Grouping and Interpreter for Machine Learning

Authors: Catherine Spooner, Bogdan Czejdo, Sambit Bhattacharya
Year of publication: 2019
Subject:
Source: CCWC
DOI: 10.1109/ccwc.2019.8666524
Description: In this paper, we describe our protein folding research, whose goal is to improve protein model scoring by grouping protein models and using an interpreter for machine learning (ML). The traditional approach is to use a handful of popular ML algorithms, such as Support Vector Machines (SVM), Random Forest, and Neural Networks, trained on the whole set of models. Our approach is to group the protein models and train the ML algorithms on each group separately. The framework can be generalized to other applications of ML in which the data set is highly diverse. We compare the traditional approach with our grouping approach and show that some improvement in the scoring of protein models can be achieved. To further improve the scoring, we use an interpreter for machine learning based on the Local Interpretable Model-Agnostic Explanations (LIME) tool. Here it is used to determine a feature vector for each group of protein models. Different feature vectors are then used for ML training on different groups of protein models, allowing us to improve the ML algorithms. In addition, the ML interpreter can be used in the future to provide feedback for the process of protein model generation.
Database: OpenAIRE
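
The grouping idea in the abstract (train one model on the whole set versus one model per group) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' method: the data are synthetic, the group labels "A" and "B" and the single feature are hypothetical, and a closed-form least-squares line stands in for the SVM, Random Forest, and Neural Network learners named in the abstract so that the example needs no external libraries.

```python
from collections import defaultdict


def fit_line(points):
    """Ordinary least squares for y = a*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx if sxx else 0.0  # fall back to a flat line if x is constant
    return a, mean_y - a * mean_x


def mse(model, points):
    """Mean squared error of a fitted (a, b) line on (x, y) pairs."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in points) / len(points)


# Synthetic records: (group label, feature value, quality score).
# Hypothetical data: the two groups follow different trends on purpose.
records = [
    ("A", 0.0, 1.0), ("A", 1.0, 3.0), ("A", 2.0, 5.0),
    ("B", 0.0, 4.0), ("B", 1.0, 3.0), ("B", 2.0, 2.0),
]

# Traditional setup: one model trained on the whole set.
all_points = [(x, y) for _, x, y in records]
global_model = fit_line(all_points)
global_error = mse(global_model, all_points)

# Grouped setup: split by group label and train one model per group.
by_group = defaultdict(list)
for group, x, y in records:
    by_group[group].append((x, y))
group_models = {group: fit_line(pts) for group, pts in by_group.items()}

# Size-weighted average of the per-group errors, comparable to global_error.
grouped_error = sum(
    mse(group_models[group], pts) * len(pts) for group, pts in by_group.items()
) / len(records)

print("global error:", global_error)
print("grouped error:", grouped_error)
```

On data constructed this way the per-group models fit better than the single global model, because the groups were built to follow opposite trends. Whether and how much grouping helps on real protein models depends on how the groups are chosen, which this sketch does not address.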