High-sensitivity pattern discovery in large, paired multi-omic datasets

Authors: Emma Schwager, Sucipto K, Gholamali Rahnavard, George Weingart, Levi Waldron, Yo Sup Moon, Lauren J. McIver, Xochitl C. Morgan, Curtis Huttenhower, Ghazi Ar, Eric A. Franzosa, Jason Lloyd-Price
Publication year: 2021
Subject:
Description: Modern biological screens yield enormous numbers of measurements, and identifying and interpreting statistically significant associations among features is essential. Here, we present a novel hierarchical framework, HAllA (Hierarchical All-against-All association testing), for structured association discovery between paired high-dimensional datasets. HAllA efficiently integrates hierarchical hypothesis testing with false discovery rate correction to reveal significant linear and non-linear block-wise relationships among continuous and/or categorical data. We optimized and evaluated HAllA using heterogeneous synthetic datasets of known association structure, where HAllA outperformed all-against-all and other block testing approaches across a range of common similarity measures. We then applied HAllA to a series of real-world multi-omics datasets, revealing new associations between gene expression and host immune activity, the microbiome and host transcriptome, metabolomic profiling, and human health phenotypes. An open-source implementation of HAllA is freely available at http://huttenhower.sph.harvard.edu/halla along with documentation, demo datasets, and a user group. Author Summary: Modern scientific datasets increasingly include multiple measurements of many complementary data types. Here, we present HAllA, a method and implementation that overcomes the statistical challenges presented by data of this type by using feature similarity within each dataset to find statistically significant groups of associated features between them. We applied HAllA to simulated and real datasets, showing that HAllA outperformed existing procedures and identified compelling biological relationships. HAllA is widely applicable to diverse data structures and presents the user with grouped results that are easier to interpret than those of traditional methods.
Database: OpenAIRE
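
Note: The description compares HAllA against plain all-against-all testing with false discovery rate correction. As a rough illustration of that baseline only, the following minimal sketch tests every feature pair between two paired datasets and applies the Benjamini-Hochberg procedure. It is not HAllA's implementation and omits the hierarchical, block-wise step entirely. The function names, the choice of Pearson correlation, and the permutation test are assumptions made for illustration; the record does not specify them.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(x, y, n_perm=999, rng=None):
    """Two-sided permutation p-value for |Pearson r| (illustrative choice of test)."""
    if rng is None:
        rng = random.Random(0)
    observed = abs(pearson(x, y))
    shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties.
        if abs(pearson(x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms keep the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvals, alpha=0.05):
    """Return one boolean per p-value: True if rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            cutoff = rank  # largest rank that passes its threshold
    rejected = [False] * m
    for i in order[:cutoff]:
        rejected[i] = True
    return rejected


def all_against_all(X, Y, alpha=0.05, n_perm=999, seed=0):
    """Test every feature of X against every feature of Y.

    X and Y are lists of features, each feature a list of values measured
    on the same samples in the same order. Returns (i, j, p) for each pair
    that survives Benjamini-Hochberg correction.
    """
    rng = random.Random(seed)
    pairs = [(i, j) for i in range(len(X)) for j in range(len(Y))]
    pvals = [permutation_pvalue(X[i], Y[j], n_perm, rng) for i, j in pairs]
    keep = benjamini_hochberg(pvals, alpha)
    return [(i, j, p) for (i, j), p, k in zip(pairs, pvals, keep) if k]
```

With p features in one dataset and q in the other, this baseline runs p × q tests, so the multiple-testing penalty grows with the product of the dimensions. According to the description, HAllA instead uses feature similarity within each dataset to test groups of features.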