Authors: |
Scherr TF; Atticus Labs, Baltimore, MD 21212, USA., Douglas CE; Diagnostic Systems Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD 21702, USA., Schaecher KE; Virology Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD 21702, USA., Schoepp RJ; Diagnostic Systems Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD 21702, USA., Ricks KM; Diagnostic Systems Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD 21702, USA., Shoemaker CJ; Diagnostic Systems Division, U.S. Army Medical Research Institute of Infectious Diseases, Fort Detrick, MD 21702, USA. |
Abstract: |
In recent years, infectious disease diagnosis has increasingly turned to host-centered approaches as a complement to pathogen-directed ones. Host-centered approaches, however, typically require interpreting complex multi-biomarker datasets to arrive at an informative diagnostic outcome. This report describes a machine learning (ML)-based classification workflow intended as a template for researchers seeking to apply ML approaches to the development of host-based infectious disease biomarker classifiers. As an example, we built a classification model that could accurately distinguish between three disease etiology classes (bacterial, viral, and normal) in human sera using host protein biomarkers of known diagnostic utility. After collecting protein data from known disease samples, we trained a series of increasingly complex Auto-ML models until arriving at an optimized classifier that could differentiate viral, bacterial, and non-disease samples. Even with a relatively small training set, the model had robust diagnostic characteristics and performed well when faced with a blinded sample set. We present here a flexible approach for applying an Auto-ML-based workflow to identify host biomarker classifiers with diagnostic utility for infectious disease, one that can readily be adapted to multiple biomarker classes and disease states. |