Author: |
Galea D; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Inglese P; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Cammack L; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Strittmatter N; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Rebec M; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Mirnezami R; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Laponogov I; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Kinross J; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Nicholson J; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Takats Z; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom., Veselkov KA; Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London, London, United Kingdom. kirill.veselkov04@imperial.ac.uk. |
Abstract: |
Hierarchical classification (HC) stratifies and classifies data from broad classes into more specific classes. Unlike commonly used data classification strategies, HC enables the probabilistic prediction of unknown classes at different levels of the hierarchy, minimizing the burden of incomplete databases. Despite these advantages, its translational application in biomedical sciences has been limited. We describe and demonstrate the implementation of an HC approach for "omics-driven" classification of 15 bacterial species at various taxonomic levels, achieving 90-100% accuracy, and of 9 cancer types into morphological types and 35 subtypes, with 99% and 76% accuracy, respectively. Unknown bacterial species were probabilistically assigned with 100% accuracy to their respective genus or family using mass spectra (n = 284). Cancer types were predicted from mRNA data (n = 1960), with 95-100% accuracy for most subtypes. This is highly relevant in clinical practice, where complete datasets are difficult to compile given the continuous evolution of diseases and the emergence of new strains, yet prediction of unknown classes, such as bacterial species, at upper hierarchy levels may be sufficient to initiate antimicrobial therapy. The algorithms presented here can be directly translated into clinical use with any quantitative data and have broad application potential, from unlabeled sample identification to hierarchical feature selection and discovery of new taxonomic variants. |
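The central idea of the abstract, descending a class hierarchy and stopping at the last level that can be assigned with confidence, can be illustrated with a minimal sketch. This is not the authors' algorithm: it swaps in a nearest-centroid classifier with a softmax over distances, and the taxonomy (`TREE`), the toy two-feature samples, and the `threshold` value are all hypothetical choices made for illustration.

```python
import math

# Hypothetical toy taxonomy: each internal node maps to its children.
TREE = {
    "root": ["GenusA", "GenusB"],
    "GenusA": ["A1", "A2"],
    "GenusB": ["B1", "B2"],
}

# Hypothetical training data: (feature_vector, species_label).
SAMPLES = [
    ([0.0, 0.0], "A1"), ([1.0, 0.0], "A1"),
    ([4.0, 0.0], "A2"), ([5.0, 0.0], "A2"),
    ([20.0, 0.0], "B1"), ([21.0, 0.0], "B1"),
    ([24.0, 0.0], "B2"), ([25.0, 0.0], "B2"),
]

def leaves_under(node):
    """Return all leaf labels at or below a node."""
    if node not in TREE:
        return [node]
    out = []
    for child in TREE[node]:
        out.extend(leaves_under(child))
    return out

def fit_centroids(samples):
    """Compute one centroid per non-root node from all samples beneath it."""
    centroids = {}
    for children in TREE.values():
        for node in children:
            members = set(leaves_under(node))
            rows = [x for x, y in samples if y in members]
            centroids[node] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def child_probabilities(x, children, centroids):
    """Softmax over negative Euclidean distances to each child's centroid."""
    scores = [-math.dist(x, centroids[c]) for c in children]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {c: e / total for c, e in zip(children, exps)}

def predict(x, centroids, threshold=0.8):
    """Descend from the root; stop at the last confidently assigned node."""
    node, path = "root", []
    while node in TREE:
        probs = child_probabilities(x, TREE[node], centroids)
        best = max(probs, key=probs.get)
        if probs[best] < threshold:
            break  # too ambiguous at this level: report the parent instead
        path.append((best, probs[best]))
        node = best
    return node, path

CENTROIDS = fit_centroids(SAMPLES)
known_label, known_path = predict([0.5, 0.0], CENTROIDS)      # resolves to "A1"
unknown_label, unknown_path = predict([2.5, 0.0], CENTROIDS)  # stops at "GenusA"
```

In this toy setup, the second query lies midway between the two known species of `GenusA`, so the species-level probabilities are tied and the prediction is reported at genus level only. This mirrors the situation described above, where a species missing from the database can still be assigned to its genus or family.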