Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox

Authors: Jakob Wirbel, Konrad Zych, Morgan Essex, Nicolai Karcher, Ece Kartal, Guillem Salazar, Peer Bork, Shinichi Sunagawa, Georg Zeller
Language: English
Year of publication: 2021
Source: Genome Biology, Vol 22, Iss 1, Pp 1-27 (2021)
Document type: article
ISSN: 1474-760X
DOI: 10.1186/s13059-021-02306-1
Description: Abstract. The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers using machine learning (ML). However, metagenomics-specific software is scarce, and overoptimistic evaluation and limited cross-study generalization are prevailing issues. To address these, we developed SIAMCAT, a versatile R toolbox for ML-based comparative metagenomics. We demonstrate its capabilities in a meta-analysis of fecal metagenomic studies (10,803 samples). When naively transferred across studies, ML models lost accuracy and disease specificity, which could, however, be resolved by a novel training set augmentation strategy. This reveals some biomarkers to be disease-specific, with others shared across multiple conditions. SIAMCAT is freely available from siamcat.embl.de.
Database: Directory of Open Access Journals