Drug discovery under covariate shift with domain-informed prior distributions over functions

Authors: Klarner, L, Rudner, T, Reutlinger, M, Schindler, T, Morris, G, Deane, CM, Yeh, YW
Contributors: Krause, A, Brunskill, E, Cho, K, Engelhardt, B, Sabato, S, Scarlett, J
Language: English
Year of publication: 2023
Description: Accelerating the discovery of novel and more effective therapeutics is an important pharmaceutical problem in which deep learning is playing an increasingly significant role. However, real-world drug discovery tasks are often characterized by a scarcity of labeled data and significant covariate shift, a setting that poses a challenge to standard deep learning methods. In this paper, we present Q-SAVI, a probabilistic model able to address these challenges by encoding explicit prior knowledge of the data-generating process into a prior distribution over functions, presenting researchers with a transparent and probabilistically principled way to encode data-driven modeling preferences. Building on a novel, gold-standard bioactivity dataset that facilitates a meaningful comparison of models in an extrapolative regime, we explore different approaches to induce data shift and construct a challenging evaluation setup. We then demonstrate that using Q-SAVI to integrate contextualized prior knowledge of drug-like chemical space into the modeling process affords substantial gains in predictive accuracy and calibration, outperforming a broad range of state-of-the-art self-supervised pre-training and domain adaptation techniques.
Database: OpenAIRE