Comparing Fingerprints for Ligand-Based Virtual Screening: A Fast and Scalable Approach for Unbiased Evaluation
Authors: | Lewis J. Martin, Michael T. Bowen |
---|---|
Year of publication: | 2020 |
Subject: | Virtual screening, Computer science, General Chemical Engineering, Fingerprint (computing), General Chemistry, Replicate, Library and Information Sciences, Ligands, Computer Science Applications, Identification (information), Scalability, Code (cryptography), Embedding, Data mining, Cluster analysis |
Source: | Journal of Chemical Information and Modeling. 60:4536-4545 |
ISSN: | 1549-960X, 1549-9596 |
Description: | Ligand-based virtual screening is a useful tool for drug and probe discovery due to its high accessibility and scalability. The recent identification of bias in many data sets that were used in performance evaluation, quantified by the asymmetric validation embedding (AVE) score, has prompted the reanalysis of models to determine which performs best. Based on the understanding that ligand data are made up of blocks of highly correlated instances, we introduce a technique that quickly generates splits with AVE distributed close to zero using a combination of clustering and removal of the most biased data. We used our technique to compare the performance of the Morgan and CATS fingerprints and show that, after debiasing, the implementation of the CATS fingerprint performs significantly better. The code to replicate these results and perform low-bias splits is available at https://github.com/ljmartin/fp_low_ave. |
Database: | OpenAIRE |
External link: |
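To make the AVE score mentioned in the description more concrete, here is a minimal, hedged sketch of how an AVE-style bias is commonly computed. It is not the authors' fast method and not their clustering-and-removal debiasing procedure (for those, see the linked repository). Several choices here are assumptions for illustration: fingerprints are modeled as Python sets of "on" bit indices, distance is Tanimoto distance (1 minus Tanimoto similarity), and the threshold grid runs from 0 to 1 in steps of 0.01. All function names are hypothetical. The score compares how close validation actives and inactives sit to same-class versus other-class training molecules. Values near zero indicate a split where nearest-neighbor memorization gives little advantage.

```python
# Hypothetical sketch of an AVE-style bias score; fingerprints are assumed to be
# sets of on-bit indices, and distances are assumed to be Tanimoto distances.

def tanimoto_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a & b| / |a | b|; two empty fingerprints are treated as identical."""
    union = len(a | b)
    if union == 0:
        return 0.0
    return 1.0 - len(a & b) / union


def nearest_neighbor_distances(valid, train):
    """For each validation fingerprint, the distance to its closest training fingerprint."""
    return [min(tanimoto_distance(v, t) for t in train) for v in valid]


def mean_nn_function(valid, train, thresholds):
    """H(V, T): fraction of V whose nearest-neighbor distance to T is below d,
    averaged over every threshold d in the (assumed) grid."""
    dists = nearest_neighbor_distances(valid, train)
    total = 0.0
    for d in thresholds:
        total += sum(1 for x in dists if x < d) / len(dists)
    return total / len(thresholds)


def ave_bias(train_act, train_inact, valid_act, valid_inact, n_steps=100):
    """AVE = (AA - AI) + (II - IA), where e.g. AI = H(validation actives, training inactives)."""
    thresholds = [i / n_steps for i in range(n_steps + 1)]  # 0.00, 0.01, ..., 1.00
    aa = mean_nn_function(valid_act, train_act, thresholds)
    ai = mean_nn_function(valid_act, train_inact, thresholds)
    ii = mean_nn_function(valid_inact, train_inact, thresholds)
    ia = mean_nn_function(valid_inact, train_act, thresholds)
    return (aa - ai) + (ii - ia)


if __name__ == "__main__":
    actives = [frozenset({1, 2, 3})]
    inactives = [frozenset({7, 8, 9})]
    # Validation molecules duplicate their own class in training: strongly positive bias.
    print(ave_bias(actives, inactives, actives, inactives))
```

In the toy run, each validation molecule has an exact copy of its own class in the training set and shares no bits with the other class, so the score is strongly positive (200/101, about 1.98). Swapping the classes in the validation set drives it equally negative. A brute-force nearest-neighbor search like this scales quadratically with data set size, which is one reason a faster approach matters for large benchmarks.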