Rethinking the applicability domain analysis in QSAR models.

Authors: Mora JR; Departamento de Ingeniería Química, Universidad San Francisco de Quito (USFQ), Instituto de Simulación Computacional (ISC-USFQ), Diego de Robles y Vía Interoceánica, Quito, 170901, Ecuador., Marquez EA; Grupo de Investigaciones en Química Y Biología, Departamento de Química Y Biología, Facultad de Ciencias Básicas, Universidad del Norte, Carrera 51B, Km 5, vía Puerto Colombia, Barranquilla, 081007, Colombia.; Departamento de Ciencias de la Computación, Centro de Investigación Científica y de Educación Superior de Ensenada (CICESE), Cátedras Conacyt, Ensenada, Baja California, México., Pérez-Pérez N; Colegio de Ciencias e Ingenierías 'El Politécnico', Universidad San Francisco de Quito (USFQ), Quito, Ecuador., Contreras-Torres E; Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador., Perez-Castillo Y; Bio-Chemoinformatics Research Group, Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito, 170504, Ecuador., Agüero-Chapin G; CIIMAR - Interdisciplinary Centre of Marine and Environmental Research, University of Porto, Terminal de Cruzeiros do Porto de Leixões, Av. General Norton de Matos s/n, Porto, 4450-208, Portugal.; Department of Biology, Faculty of Sciences, University of Porto, Rua do Campo Alegre, Porto, 4169-007, Portugal., Martinez-Rios F; Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México., Marrero-Ponce Y; Grupo de Medicina Molecular y Traslacional (MeM&T), Universidad San Francisco de Quito, Escuela de Medicina, Colegio de Ciencias de la Salud (COCSA), Av. Interoceánica Km 12 1/2 y Av. Florencia, 17, Quito, 1200-841, Ecuador.; Facultad de Ingeniería, Universidad Panamericana, CDMX, Augusto Rodin No. 498, Insurgentes Mixcoac, Benito Juárez, Ciudad de México, 03920, México.; Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research International Network (CAMD-BIR IN), Cumbayá, Quito, Ecuador., Barigye SJ; Departamento de Química Física Aplicada, Facultad de Ciencias, Universidad Autónoma de Madrid (UAM), Madrid, 28049, Spain. sjbarigye@gmail.com.
Language: English
Source: Journal of computer-aided molecular design [J Comput Aided Mol Des] 2024 Feb 14; Vol. 38 (1), pp. 9. Date of Electronic Publication: 2024 Feb 14.
DOI: 10.1007/s10822-024-00550-8
Abstract: Notwithstanding the wide adoption of the OECD principles (or best practices) for QSAR modeling, disparities between in silico predictions and experimental results are frequent, suggesting that model predictions are often too optimistic. Of these OECD principles, the applicability domain (AD) estimation has been recognized in several reports in the literature to be one of the most challenging, implying that the actual reliability measures of model predictions are often unreliable. Applying tree-based error analysis workflows on 5 QSAR models reported in the literature and available in the QsarDB repository, i.e., androgen receptor bioactivity (agonists, antagonists, and binders, respectively) and membrane permeability (highest membrane permeability and the intrinsic permeability), we demonstrate that predictions erroneously tagged as reliable (AD prediction errors) overwhelmingly correspond to instances in subspaces (cohorts) with the highest prediction error rates, highlighting the inhomogeneity of the AD space. In this sense, we call for more stringent AD analysis guidelines which require the incorporation of model error analysis schemes, to provide critical insight on the reliability of underlying AD algorithms. Additionally, any selected AD method should be rigorously validated to demonstrate its suitability for the model space over which it is applied. These steps will ultimately contribute to more accurate estimations of the reliability of model predictions. Finally, error analysis may also be useful in "rational" model refinement in that data expansion efforts and model retraining are focused on cohorts with the highest error rates.
(© 2024. The Author(s), under exclusive licence to Springer Nature Switzerland AG.)
Database: MEDLINE
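
Illustrative note: the abstract describes tree-based error analysis that partitions the model space into cohorts and checks where "AD prediction errors" (predictions tagged as reliable but wrong) fall. The record does not give the authors' workflow, so the following is only a minimal sketch of the general idea, assuming a shallow Gini-based tree fitted on a 0/1 misprediction flag. The toy data, the descriptors, the AD flags, and the names `gini`, `best_split`, and `error_cohorts` are all hypothetical.

```python
def gini(flags):
    """Gini impurity of a list of 0/1 misprediction flags."""
    if not flags:
        return 0.0
    p = sum(flags) / len(flags)
    return 2.0 * p * (1.0 - p)

def best_split(X, err, idx, min_leaf):
    """Return (feature, threshold, left, right) minimizing weighted Gini, or None."""
    best = None
    best_score = gini([err[i] for i in idx])
    for f in range(len(X[0])):
        values = sorted({X[i][f] for i in idx})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = (len(left) * gini([err[i] for i in left])
                     + len(right) * gini([err[i] for i in right])) / len(idx)
            if score < best_score - 1e-12:
                best, best_score = (f, t, left, right), score
    return best

def error_cohorts(X, err, idx=None, depth=2, min_leaf=5):
    """Return the leaves (lists of row indices) of a shallow error tree."""
    if idx is None:
        idx = list(range(len(X)))
    split = best_split(X, err, idx, min_leaf) if depth > 0 else None
    if split is None:
        return [idx]
    _, _, left, right = split
    return (error_cohorts(X, err, left, depth - 1, min_leaf)
            + error_cohorts(X, err, right, depth - 1, min_leaf))

# Hypothetical toy data: 40 compounds described by two made-up descriptors.
X = [[i / 10.0, (i * 7) % 10] for i in range(40)]
wrong = {3, 17, 31, 32, 33, 34, 36, 37, 38, 39}   # assumed mispredicted compounds
err = [1 if i in wrong else 0 for i in range(40)]
inside_ad = [i < 38 for i in range(40)]           # assumed output of some AD method

# Partition the space into cohorts and rank them by error rate.
cohorts = error_cohorts(X, err, depth=1, min_leaf=5)
rates = [sum(err[i] for i in c) / len(c) for c in cohorts]
top_rate = max(rates)
top = cohorts[rates.index(top_rate)]

# AD prediction errors: tagged as inside the domain, yet mispredicted.
ad_errors = [i for i in range(40) if inside_ad[i] and err[i]]
share = sum(1 for i in ad_errors if i in top) / len(ad_errors)
print(f"top cohort error rate: {top_rate:.2f}; share of AD errors in it: {share:.2f}")
```

In this toy setting most AD prediction errors land in the highest-error cohort, which is the kind of pattern the abstract reports. With real models, the misprediction flags would come from held-out predictions and the AD flags from whichever AD method is being validated.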