Prediction of pesticide acute toxicity using two-dimensional chemical descriptors and target species classification.

Authors: Martin TM (US Environmental Protection Agency, Office of Research and Development, Sustainable Technology Division, Cincinnati, USA); Lilavois CR (US Environmental Protection Agency, Office of Research and Development, Gulf Ecology Division, Gulf Breeze, USA); Barron MG (US Environmental Protection Agency, Office of Research and Development, Gulf Ecology Division, Gulf Breeze, USA).
Language: English
Source: SAR and QSAR in environmental research [SAR QSAR Environ Res] 2017 Jun; Vol. 28 (6), pp. 525-539. Date of Electronic Publication: 2017 Jul 13.
DOI: 10.1080/1062936X.2017.1343204
Abstract: Previous modelling of the median lethal dose (oral rat LD50) has indicated that local class-based models yield better correlations than global models. We evaluated the hypothesis that dividing the dataset by pesticidal mechanism would improve prediction accuracy. A linear discriminant analysis (LDA)-based approach was used to assign indicators such as the pesticide target species, the mode of action, or the combination of target species and mode of action. The LDA models predicted these indicators with about 87% accuracy. Toxicity was then predicted using the QSAR model fitted to the chemicals sharing that indicator. Toxicity was also predicted using a global hierarchical clustering (HC) approach, which divides the dataset into clusters based on molecular similarity. At a comparable prediction coverage (~94%), the global HC method yielded slightly higher prediction accuracy (r² = 0.50) than the LDA method (r² ≈ 0.47). A single model fitted to the entire training set yielded the poorest results (r² = 0.38), indicating that there is an advantage to clustering the dataset when predicting acute toxicity. Finally, this study shows that while dividing the training set into subsets (i.e. clusters) improves prediction accuracy, it may not matter which method (expert-based or purely machine learning) is used to divide the dataset.
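The two-step local scheme described in the abstract (assign an indicator class to a chemical, then predict its toxicity with the model fitted to that class) can be illustrated with a small sketch, contrasted with a single global model. This is not the authors' implementation, and everything in it is an assumption for illustration: the descriptors, class labels and data are hypothetical; class assignment uses a nearest-centroid rule, which is the special case of LDA with equal class priors and a shared spherical covariance; each local QSAR model is assumed to be an ordinary least-squares linear fit; and r² is computed as 1 - SS_res/SS_tot.

```python
from collections import defaultdict
from statistics import fmean  # Python 3.8+


def centroids(X, labels):
    """Mean descriptor vector for each indicator class."""
    groups = defaultdict(list)
    for x, c in zip(X, labels):
        groups[c].append(x)
    return {c: [fmean(col) for col in zip(*rows)] for c, rows in groups.items()}


def assign(x, cents):
    """Nearest-centroid rule: equals LDA with equal priors and covariance sigma^2 * I."""
    return min(cents, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, cents[c])))


def solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for k in range(i, n + 1):
                M[r][k] -= f * M[i][k]
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        w[i] = (M[i][n] - sum(M[i][k] * w[k] for k in range(i + 1, n))) / M[i][i]
    return w


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Z = [[1.0] + list(x) for x in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)


def predict(w, x):
    """Linear prediction: intercept plus weighted descriptors."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_local_models(X, y, labels):
    """Fit one linear model per indicator class (hypothetical local QSAR models)."""
    models = {}
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        models[c] = fit_ols([X[i] for i in idx], [y[i] for i in idx])
    return models


def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    m = fmean(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy data: two descriptors, two indicator classes.
    X = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
    labels = ["A"] * 4 + ["B"] * 4
    y = [1, 3, 0, 2, 0, -1, 0.5, -0.5]
    cents = centroids(X, labels)
    models = fit_local_models(X, y, labels)
    local = [predict(models[assign(x, cents)], x) for x in X]
    glob = [predict(fit_ols(X, y), x) for x in X]
    print("local r2:", r2(y, local), "global r2:", r2(y, glob))
```

In this toy data the two classes have opposite slopes with respect to the first descriptor, so the per-class models fit their own data exactly, while a single global plane cannot, and its r² is lower. Real QSAR work would use many more descriptors, a proper LDA with an estimated covariance matrix, and external test sets.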
Database: MEDLINE