Popis: |
Extracting associations that recur across multiple studies while controlling the false discovery rate is a fundamental challenge. Here, we consider an extension of Efron's single-study two-groups model that allows joint analysis of multiple studies. We assume that, given a set of p-values obtained from each study, the researcher is interested in associations that recur in at least $k>1$ studies. We propose new algorithms that differ in how dependencies between the studies are modeled, and we compared them with existing methods across various simulated scenarios. The top-performing algorithm, SCREEN (Scalable Cluster-based REplicability ENhancement), is a new three-stage algorithm: (1) clustering an estimated correlation network of the studies, (2) learning replicability (e.g., of genes) within each cluster, and (3) merging the results across clusters using dynamic programming. We applied SCREEN to two real datasets and showed that it greatly outperforms standard meta-analysis. First, on a collection of 29 large-scale case-control cancer gene expression studies, we detected a large up-regulated module of genes related to proliferation and cell-cycle regulation. These genes are consistently up-regulated across many cancer studies and are well connected in known gene networks. Second, on a recent pan-cancer study that examined the expression profiles of patients with or without mutations in the HLA complex, we detected an active module of up-regulated genes related to immune responses. Owing to our ability to quantify the false discovery rate, we detected three times as many genes as the original study. Our module contains most of the genes reported in the original study, as well as many new ones. Interestingly, the newly discovered genes are needed to establish the connectivity of the module. |
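To illustrate what a dynamic-programming merge across clusters can look like, here is a minimal sketch, not the authors' implementation. It assumes (hypothetically) that stage (2) yields, for one gene, a probability distribution over how many studies in each cluster are non-null for that gene, and that clusters are independent. Under that assumption the total count is the convolution of the per-cluster distributions, built up one cluster at a time. The probability that the gene is non-null in fewer than $k$ studies then acts as a local false discovery rate for the "at least $k$ studies" hypothesis. The function names `merge_cluster_counts` and `replicability_lfdr` are hypothetical.

```python
def merge_cluster_counts(cluster_pmfs):
    """Combine per-cluster count distributions by dynamic programming.

    cluster_pmfs[c][j] is assumed to be the probability that the gene is
    non-null in exactly j studies of cluster c. Clusters are assumed
    independent, so the total count distribution is their convolution.
    """
    total = [1.0]  # before any cluster is added: count 0 with probability 1
    for pmf in cluster_pmfs:
        new = [0.0] * (len(total) + len(pmf) - 1)
        # DP step: P(total = a + b) accumulates P(previous total = a) * P(cluster count = b)
        for a, p_a in enumerate(total):
            for b, p_b in enumerate(pmf):
                new[a + b] += p_a * p_b
        total = new
    return total


def replicability_lfdr(cluster_pmfs, k):
    """Probability that the gene is non-null in fewer than k studies overall."""
    total = merge_cluster_counts(cluster_pmfs)
    return sum(total[:k])


# Hypothetical example: one cluster of 2 studies and one cluster of 1 study.
pmfs = [[0.1, 0.3, 0.6], [0.2, 0.8]]
print(merge_cluster_counts(pmfs))   # approximately [0.02, 0.14, 0.36, 0.48]
print(replicability_lfdr(pmfs, 2))  # approximately 0.16
```

The DP keeps the cost linear in the number of clusters (times the square of the running count range), rather than enumerating every joint configuration of non-null studies.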