An evaluation of different classification algorithms for protein sequence-based reverse vaccinology prediction.

Authors: Heinson AI (Faculty of Medicine, University of Southampton, Southampton, United Kingdom); Ewing RM (Department of Biological Sciences, University of Southampton, Southampton, United Kingdom); Holloway JW (Faculty of Medicine, University of Southampton, Southampton, United Kingdom); Woelk CH (Merck Exploratory Science Center, Cambridge, United States of America); Niranjan M (Department of Electronics and Computer Science, University of Southampton, Southampton, United Kingdom).
Language: English
Source: PLoS One 2019 Dec 13; Vol. 14 (12), pp. e0226256. Date of Electronic Publication: 2019 Dec 13 (Print Publication: 2019).
DOI: 10.1371/journal.pone.0226256
Abstract: Previous work has shown that proteins with the potential to be vaccine candidates can be predicted from features derived from their amino acid (AA) sequences. In this work, we make an empirical comparison of various machine learning classifiers on this sequence-based inference problem. Using systematic cross-validation on a dataset of 200 known vaccine candidates and 200 negative examples, with a set of 525 features derived from the AA sequences and feature selection applied through greedy backward elimination, we show that simple classification algorithms often perform as well as more complex kernel support vector machines. The work also introduces a novel cross-validation scheme applied across bacterial species, i.e. the validation proteins all come from a specific species of bacterium not represented in the training set. We termed this type of validation Leave One Bacteria Out Validation (LOBOV).
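The LOBOV scheme described above is a leave-one-group-out split in which the group is the bacterial species. A minimal sketch, assuming each protein is tagged with the species it was taken from; the function name `lobov_splits` and the toy species labels are hypothetical illustrations, not the authors' code. scikit-learn's `LeaveOneGroupOut` produces the same kind of split when the species labels are passed as groups.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def lobov_splits(species: Sequence[str]) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_species, train_idx, test_idx) once per species.

    Hypothetical helper: every protein of the held-out species goes into the
    test fold, so the classifier never sees that bacterium during training.
    """
    by_species: Dict[str, List[int]] = defaultdict(list)
    for idx, name in enumerate(species):
        by_species[name].append(idx)
    # Iterate species in sorted order so the folds are deterministic.
    for held_out in sorted(by_species):
        test_idx = by_species[held_out]
        train_idx = [i for i, name in enumerate(species) if name != held_out]
        yield held_out, train_idx, test_idx


if __name__ == "__main__":
    # Hypothetical toy labels: the species each protein comes from.
    proteins_species = ["E. coli", "N. meningitidis", "E. coli", "S. aureus", "N. meningitidis"]
    for held_out, train_idx, test_idx in lobov_splits(proteins_species):
        print(held_out, "train:", train_idx, "test:", test_idx)
```

Unlike random k-fold cross-validation, the number of folds here equals the number of species, and fold sizes vary with how many proteins each species contributes.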
Competing Interests: The fact that CHW is now employed by Merck Research Laboratories does not alter our adherence to PLOS ONE policies and we have no other competing interests to declare either.
Database: MEDLINE