Chemoinformatic Classification Methods and their Applicability Domain

Authors:	Knut Baumann, Waldemar Klingspohn, Miriam Mathea
Year of publication:	2015
Subject:	0301 basic medicine; Computer science; Databases, Pharmaceutical; Word error rate; Quantitative Structure-Activity Relationship; computer.software_genre; 01 natural sciences; Novelty detection; 03 medical and health sciences; Structural Biology; Molecular descriptor; Drug Discovery; Categorical variable; Molecular Structure; business.industry; Organic Chemistry; Pattern recognition; 0104 chemical sciences; Computer Science Applications; 010404 medicinal & biomolecular chemistry; 030104 developmental biology; Models, Chemical; Pharmaceutical Preparations; Cheminformatics; Outlier; Molecular Medicine; Data mining; Artificial intelligence; business; computer; Classifier (UML); Algorithms; Applicability domain
Source:	Molecular informatics. 35(5)
ISSN:	1868-1751
Description:	Classification rules are often used in chemoinformatics to predict categorical properties of drug candidates related to bioactivity from explanatory variables that encode the respective molecular structures (i.e., molecular descriptors). To avoid predictions with an unduly large error probability, the domain the classifier is applied to should be restricted to the domain covered by the training set objects. This latter domain is commonly referred to as the applicability domain in chemoinformatics. Conceptually, the applicability domain defines the region in space where the "normal" objects are located. Defining the border of the applicability domain may then be viewed as detecting anomalous or novel objects, or as detecting outliers. Currently, two different types of measures are in use. The first type defines the applicability domain solely in terms of the molecular descriptor space, which is referred to as novelty detection. The second type defines the applicability domain in terms of the expected reliability of the predictions, which is referred to as confidence estimation. Both types are systematically differentiated here and the most popular measures are reviewed. It will be shown that all common chemoinformatic classifiers have built-in confidence scores. Since confidence estimation uses the class labels to compute the confidence scores, it is expected to be more efficient in reducing the error rate than novelty detection, which uses only the information in the explanatory variables.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::5fc80004c11c4c20368fff70f05bb798 https://pubmed.ncbi.nlm.nih.gov/27492083
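
The two types of applicability-domain measure named in the description can be illustrated with a minimal Python sketch. This is not the authors' method: the choice of a k-nearest-neighbour distance as the novelty score, the neighbour vote fraction as the confidence score, the function names, and the toy descriptor data are all assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_nearest(query, X_train, k):
    """Return (distance, index) pairs of the k training objects closest to query."""
    return sorted((euclidean(query, x), i) for i, x in enumerate(X_train))[:k]

def novelty_score(query, X_train, k=3):
    """Novelty detection (assumed measure): mean distance to the k nearest
    training objects. Only descriptors are used, never class labels."""
    nn = k_nearest(query, X_train, k)
    return sum(d for d, _ in nn) / len(nn)

def knn_confidence(query, X_train, y_train, k=3):
    """Confidence estimation (assumed measure): predicted class and the
    fraction of the k nearest neighbours that vote for it. Uses class labels."""
    votes = {}
    for _, i in k_nearest(query, X_train, k):
        votes[y_train[i]] = votes.get(y_train[i], 0) + 1
    label = max(votes, key=votes.get)
    return label, votes[label] / k

# Hypothetical toy data: two descriptors per molecule, binary activity label.
X_train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (4.0, 4.0), (4.0, 5.0), (5.0, 4.0)]
y_train = ["inactive", "inactive", "inactive", "active", "active", "active"]

inside = (0.5, 0.5)    # lies within the "inactive" cluster
far = (20.0, 20.0)     # lies far from all training objects
between = (2.0, 2.5)   # lies between the two clusters

for name, q in [("inside", inside), ("far", far), ("between", between)]:
    label, conf = knn_confidence(q, X_train, y_train)
    print(f"{name}: novelty={novelty_score(q, X_train):.2f}, "
          f"predicted={label}, confidence={conf:.2f}")
```

In practice a cut-off on either score would define the border of the applicability domain. How that cut-off is chosen is left open here.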