In Need of Bias Control: Evaluating Chemical Data for Machine Learning in Structure-Based Virtual Screening
Authors: | Jochen Sieg, Florian Flachsenberg, Matthias Rarey |
---|---|
Year of publication: | 2019 |
Subject: | Databases, Factual; Computer science; General Chemical Engineering; Context (language use); Library and Information Sciences; Ligands; Machine learning; computer.software_genre; 01 natural sciences; Convolutional neural network; Machine Learning; Set (abstract data type); Structure-Activity Relationship; Bias; 0103 physical sciences; Control (linguistics); Retrospective Studies; Structure (mathematical logic); Virtual screening; Molecular Structure; 010304 chemical physics; Basis (linear algebra); business.industry; Chemical data; General Chemistry; 0104 chemical sciences; Computer Science Applications; Molecular Docking Simulation; Benchmarking; 010404 medicinal & biomolecular chemistry; Artificial intelligence; business; computer |
Source: | Journal of Chemical Information and Modeling. 59:947-961 |
ISSN: | 1549-960X 1549-9596 |
DOI: | 10.1021/acs.jcim.8b00712 |
Description: | Reports of successful applications of machine learning (ML) methods in structure-based virtual screening (SBVS) are increasing. ML methods such as convolutional neural networks show promising results and often outperform traditional methods such as empirical scoring functions in retrospective validation. However, trained ML models are often treated as black boxes and are not straightforwardly interpretable. In most cases, it is unknown which features in the data are decisive and whether a model's predictions are right for the right reason. Hence, we re-evaluated three widely used benchmark data sets in the context of ML methods and came to the conclusion that not every benchmark data set is suitable. Moreover, we demonstrate on two examples from current literature that bias is learned implicitly and unnoticed from standard benchmarks. On the basis of these results, we conclude that there is a need for eligible validation experiments and benchmark data sets suited to ML for more bias-controlled validation in ML-based SBVS. Therefore, we provide guidelines for setting up validation experiments and give a perspective on how new data sets could be generated. |
Database: | OpenAIRE |