Purely Structural Protein Scoring Functions Using Support Vector Machine and Ensemble Learning

Authors:	Silvia Crivelli, Shokoufeh Mirzaei, Tomer Sidi, Chen Keasar
Year of publication:	2019
Subject:	Support Vector Machine; Computer science; Active learning (machine learning); 0206 medical engineering; Stability (learning theory); 02 engineering and technology; Machine learning; computer.software_genre; Machine Learning; Relevance vector machine; Genetics; Instance-based learning; Databases, Protein; Structured support vector machine; business.industry; Applied Mathematics; Computational Biology; Proteins; Online machine learning; Ensemble learning; Computational learning theory; Artificial intelligence; Data mining; business; computer; Algorithms; 020602 bioinformatics; Biotechnology
Source:	IEEE/ACM Transactions on Computational Biology and Bioinformatics. 16:1515-1523
ISSN:	2374-0043 1545-5963
DOI:	10.1109/tcbb.2016.2602269
Description:	The function of a protein is determined by its structure, which creates a need for efficient methods of protein structure determination to advance scientific and medical research. Because current experimental structure determination methods are expensive, computational predictions are highly desirable. Given a protein sequence, computational methods produce numerous 3D structures known as decoys. Selecting the best-quality decoys is both challenging and essential, as end users can handle only a few. Scoring functions are therefore central to decoy selection: they combine measurable features into a single-number indicator of decoy quality. Unfortunately, current scoring functions do not consistently select the best decoys. Machine learning techniques offer great potential to improve decoy scoring. This paper presents two machine-learning-based scoring functions that predict the quality of protein structures, i.e., the similarity between the predicted structure and the experimental one, without knowing the latter. We use different metrics to compare these scoring functions against three state-of-the-art scores. This is a first attempt at comparing different scoring functions using the same non-redundant dataset for training and testing and the same features. The results show that adding informative features may be more significant than the method used.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::85000f71dea043cb6a36fc46a6d04430 https://doi.org/10.1109/tcbb.2016.2602269