Explainable machine learning models for predicting the acute toxicity of pesticides to sheepshead minnow (Cyprinodon variegatus).

Authors: Sun T; School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China., Wei C; School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China., Liu Y; School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China., Ren Y; School of Environmental and Municipal Engineering, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China; Ministry of Education Engineering Research Center of Water Resource Comprehensive Utilization in Cold and Arid Regions, Lanzhou Jiaotong University, 88 Anning West Rd., Lanzhou 730070, Gansu, PR China. Electronic address: renyueying@mail.lzjtu.cn.
Language: English
Source: The Science of the total environment [Sci Total Environ] 2024 Dec 20; Vol. 957, pp. 177399. Date of Electronic Publication: 2024 Nov 16.
DOI: 10.1016/j.scitotenv.2024.177399
Abstract: A quantitative structure-activity relationship (QSAR) study was conducted on 313 pesticides to predict their acute toxicity to sheepshead minnow (Cyprinodon variegatus) using DRAGON descriptors. The requirements for a reliable model were considered carefully, with full attention to the OECD (Organisation for Economic Co-operation and Development) principles for regulatory acceptance of QSAR models throughout model construction and assessment. Nine variables were selected by forward stepwise regression and used as inputs to construct both linear and nonlinear models. The resulting models were validated internally and externally. In general, the machine learning methods, namely support vector machine (SVM), random forest (RF), and projection pursuit regression (PPR), performed better than the multiple linear regression (MLR) model. The statistical results (R² = 0.682-0.933, Q²_LOO = 0.604-0.659, Q²_F1 = 0.740-0.796, CCC = 0.861-0.882) indicate that the developed models are robust, reliable, reproducible, accurate and predictive. Of these, the RF model performed best, giving a predictive correlation coefficient Q² of 0.814, a root mean squared error (RMSE) of 0.658 and a mean absolute error (MAE) of 0.534 for the test set. The RF model (as well as the SVM and PPR models) was visualized and explained using SHapley Additive exPlanations (SHAP) analysis to enhance its transparency and credibility. In addition, the applicability domain (AD) of the RF model was characterized with a Williams plot, and the tree manifold approximation and projection (TMAP) technique was used to illustrate the similarity and diversity of the entire data space and to assist in the analysis of outliers. Activity cliffs were investigated using Arithmetic Residuals in K-groups Analysis (ARKA) descriptors. No pesticide was identified as an activity cliff in the training set or as a potential prediction cliff in the test set. The RF model therefore fulfills each of the OECD principles for regulatory QSAR models. This work will aid in silico QSAR prediction of acute toxicity to sheepshead minnow (Cyprinodon variegatus) for new and untested pesticides, and the approach can be extended to other studies.
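The external validation statistics quoted in the abstract (Q²_F1, CCC, RMSE, MAE) and the leverage values behind a Williams plot can be computed roughly as sketched below. This is an illustrative sketch using the definitions commonly found in the QSAR literature, not the authors' code: all function names are hypothetical, the leverage calculation assumes an intercept column is added to the descriptor matrix, and the warning threshold h* = 3(p + 1)/n is a common convention that the abstract does not state.

```python
import numpy as np

def q2_f1(y_test, y_pred, y_train_mean):
    """External Q2_F1: 1 - PRESS / SS around the TRAINING-set mean."""
    y_test = np.asarray(y_test, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    press = np.sum((y_test - y_pred) ** 2)
    ss = np.sum((y_test - y_train_mean) ** 2)
    return 1.0 - press / ss

def ccc(y_obs, y_pred):
    """Lin's concordance correlation coefficient."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mo, mp = y_obs.mean(), y_pred.mean()
    cov = np.sum((y_obs - mo) * (y_pred - mp))
    denom = (np.sum((y_obs - mo) ** 2) + np.sum((y_pred - mp) ** 2)
             + y_obs.size * (mo - mp) ** 2)
    return 2.0 * cov / denom

def rmse(y_obs, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(y_obs, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_obs, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))

def leverages(X_train, X_query):
    """Leverage h_i = x_i^T (X^T X)^-1 x_i for each query compound.

    Assumption: an intercept column is prepended to both matrices.
    """
    Xt = np.asarray(X_train, dtype=float)
    Xq = np.asarray(X_query, dtype=float)
    Xt = np.column_stack([np.ones(len(Xt)), Xt])
    Xq = np.column_stack([np.ones(len(Xq)), Xq])
    xtx_inv = np.linalg.pinv(Xt.T @ Xt)
    return np.einsum("ij,jk,ik->i", Xq, xtx_inv, Xq)

def warning_leverage(n_train, n_descriptors):
    """Common Williams-plot threshold h* = 3(p + 1)/n (assumed convention)."""
    return 3.0 * (n_descriptors + 1) / n_train
```

In a Williams plot, each compound's standardized residual is plotted against its leverage; compounds with leverage above h* are usually treated as lying outside the applicability domain, and those with large standardized residuals as response outliers.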
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 Elsevier B.V. All rights reserved.)
Database: MEDLINE