A flexible and general semi-supervised approach to multiple hypothesis testing

Authors: Freestone, Jack; Noble, William Stafford; Keich, Uri
Year of publication: 2024
Subject:
Document type: Working Paper
Description: Standard multiple testing procedures are designed to report a list of discoveries, or suspected false null hypotheses, given the hypotheses' p-values or test scores. Recently there has been growing interest in enhancing such procedures by combining additional information with the primary p-value or score. Specifically, such so-called "side information" can be leveraged to improve the separation between true and false nulls along additional "dimensions", thereby increasing the overall sensitivity. In line with this idea, we develop RESET (REScoring via Estimating and Training), which uses a unique data-splitting protocol that subsequently allows any semi-supervised learning approach to factor in the available side information while maintaining finite-sample error rate control. Our practical implementation, RESET Ensemble, selects from an ensemble of classification algorithms so that it is compatible with a range of multiple testing scenarios without requiring the user to select the appropriate one. We apply RESET to both p-value and competition-based multiple testing problems and show that RESET is (1) power-wise competitive, (2) fast compared to most tools, and (3) uniquely able to achieve finite-sample FDR or FDP control, depending on the user's preference.
Database: arXiv