A systematic DNN-based QSPR modeling methodology for rapid and reliable prediction on flashpoints of chemicals

Authors: Zihao Wang, Mario R. Eden, Saimeng Jin, Weifeng Shen, Yang Su, Huaqiang Wen, Jingzheng Ren
Publication year: 2021
Subject:
DOI: 10.22541/au.162206662.29993062/v1
Description: Quantitative structure-property relationship (QSPR) studies based on deep neural networks (DNN) are receiving increasing attention due to their excellent performance. A systematic methodology coupling multiple machine learning technologies is proposed to address two vital problems in DNN-based QSPRs: applicability domain and prediction uncertainty. Key features are rapidly extracted from plentiful but chaotic descriptors by principal component analysis (PCA) and kernel PCA. Then, a detailed applicability domain (AD) is defined by the K-means algorithm to avoid unreliable predictions and to reveal its potential impact on uncertainty. Moreover, prediction uncertainty is analyzed with a dropout-embedded DNN through thousands of independent tests to assess the reliability of predictions. The prediction of flashpoint temperature is employed as a case study, demonstrating that the model accuracy is remarkably improved compared with the reference model. More importantly, the proposed methodology overcomes difficulties in analyzing the uncertainty of DNN-based QSPRs and presents an AD correlated with the uncertainty.
Database: OpenAIRE
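
The abstract names two ideas that a short sketch may make more concrete: a cluster-based applicability domain and uncertainty estimated from repeated dropout passes. The Python below is only an illustration under stated assumptions, not the authors' implementation. The record does not say how the K-means clusters are turned into a domain boundary, so the "within a radius of the nearest centroid" rule is an assumption, as are all names (`in_domain`, `mc_dropout_predict`), the one-hidden-layer network, and every centroid, radius, and weight value. In a real workflow the centroids would come from K-means on the PCA or kernel PCA features and the weights from training.

```python
import math
import random
import statistics


def in_domain(x, centroids, radii):
    """Assumed AD rule: x is in-domain if it lies within the radius
    of its nearest cluster centroid (centroids would come from K-means)."""
    dists = [math.dist(x, c) for c in centroids]
    nearest = min(range(len(centroids)), key=dists.__getitem__)
    return dists[nearest] <= radii[nearest]


def forward_with_dropout(x, w_hidden, w_out, p_drop, rng):
    """One stochastic forward pass of a one-hidden-layer ReLU network."""
    hidden = []
    for weights in w_hidden:
        h = max(0.0, sum(w * xi for w, xi in zip(weights, x)))
        # Inverted dropout: drop a unit with probability p_drop,
        # rescale kept units so the expected activation is unchanged.
        keep = rng.random() >= p_drop
        hidden.append(h / (1.0 - p_drop) if keep else 0.0)
    return sum(w * h for w, h in zip(w_out, hidden))


def mc_dropout_predict(x, w_hidden, w_out, p_drop=0.2, n_passes=1000, seed=0):
    """Repeat stochastic passes; return (mean, sample standard deviation)."""
    rng = random.Random(seed)
    samples = [
        forward_with_dropout(x, w_hidden, w_out, p_drop, rng)
        for _ in range(n_passes)
    ]
    return statistics.fmean(samples), statistics.stdev(samples)


# Hypothetical values, for illustration only.
CENTROIDS = [(0.0, 0.0), (5.0, 5.0)]
RADII = [1.5, 1.0]
W_HIDDEN = [(0.5, -0.2), (0.3, 0.8), (-0.6, 0.4)]
W_OUT = (1.0, -0.5, 0.7)

if __name__ == "__main__":
    query = (0.5, 0.5)
    if in_domain(query, CENTROIDS, RADII):
        mean, std = mc_dropout_predict(query, W_HIDDEN, W_OUT)
        print(f"prediction = {mean:.3f}, uncertainty (std) = {std:.3f}")
    else:
        print("query is outside the applicability domain")
```

The spread of the repeated passes serves as the uncertainty estimate, and the domain check runs first so that out-of-domain queries are flagged rather than predicted.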