ProfhEX: AI-based platform for small molecules liability profiling

Authors: Filippo Lunghini, Anna Fava, Vincenzo Pisapia, Francesco Sacco, Daniela Iaconis, Andrea Rosario Beccari
Publication year: 2022
DOI: 10.5281/zenodo.7925872
Description: Background: Off-target drug interactions are one of the main reasons for candidate failure in the drug discovery process. Anticipating a drug's potential adverse effects in the early stages is necessary to minimize health risks to patients, animal testing, and economic costs. With the constantly increasing size of virtual screening libraries, AI-driven methods can be exploited as first-tier screening tools providing liability estimation for drug candidates. Objectives: We present ProfhEX, an AI-driven suite of 46 OECD-compliant machine learning models able to profile small molecules on 7 relevant liability groups, namely cardiovascular, central nervous system, gastrointestinal, endocrine disruption, renal, pulmonary and immune response toxicities. Methods: Experimental affinity data were collected from public and commercial data sources. The entire chemical space comprised 289,202 activity data points for a total of 210,116 unique compounds, spanning 46 targets with dataset sizes ranging from 819 to 18,896. Gradient boosting and random forest algorithms were initially employed and ensembled for the selection of a champion model. Models were validated according to the OECD principles, including robust internal (cross-validation, bootstrap, y-scrambling) and external validation. Results: Champion models achieved an average Pearson correlation coefficient of 0.81 (SD of 0.06) and a root mean squared error of 0.75 (SD of 0.09). All liability groups showed good hit-retrieval power, with an average enrichment factor (at 5%) of 13.1 (SD of 3.0) and AUC of 0.92 (SD of 0.05). Conclusion: ProfhEX would be a useful tool for large-scale liability profiling of small molecules. This suite will be further expanded with the inclusion of new targets and with complementary modelling approaches, including docking and pharmacophore-based models.
Database: OpenAIRE
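
Note on the enrichment factor: the "enrichment factor (at 5%)" quoted in the description can be illustrated with a minimal Python sketch. It assumes the common virtual-screening definition, that is, the hit rate among the top-scored 5% of compounds divided by the hit rate in the whole library. The record does not state the exact definition, the activity threshold, or the ranking scheme the authors used, so the function name `enrichment_factor` and its parameters are hypothetical and serve only as an illustration.

```python
def enrichment_factor(scores, is_active, fraction=0.05):
    """Enrichment factor at a given fraction of a ranked library.

    Assumed definition (not confirmed by the record):
        EF = (actives in top fraction / size of top fraction)
             / (actives in library / size of library)

    scores    -- predicted values, higher meaning "more likely a hit"
    is_active -- 0/1 flags marking the experimentally active compounds
    fraction  -- top share of the ranked library to inspect (0.05 = 5%)
    """
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equally long")

    n_total = len(scores)
    n_actives = sum(is_active)
    if n_actives == 0:
        raise ValueError("at least one active compound is required")

    # Size of the top slice; rounding to the nearest integer is an assumption.
    n_top = max(1, round(fraction * n_total))

    # Rank compounds from highest to lowest predicted score.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(flag for _, flag in ranked[:n_top])

    return (hits_in_top / n_top) / (n_actives / n_total)
```

Under this definition a random ranking gives a value near 1, and the largest attainable value at 5% is 20, reached when the top slice contains only actives. The reported average of 13.1 can be read against that scale.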