On Combining Recursive Partitioning and Simulated Annealing To Detect Groups of Biologically Active Compounds

Authors: Paul E. Blower, Jeffrey Bjoraker, Michael A. Fligner, Joseph S. Verducci
Publication year: 2002
Source: Journal of Chemical Information and Computer Sciences. 42:393-404
ISSN: 0095-2338
DOI: 10.1021/ci0101049
Description: Statistical data mining methods have proven to be powerful tools for investigating correlations between molecular structure and biological activity. Recursive partitioning (RP), in particular, offers several advantages in mining large, diverse data sets resulting from high-throughput screening. When used with binary molecular descriptors, the standard implementation of RP splits on single descriptors. We use simulated annealing (SA) to find combinations of molecular descriptors whose simultaneous presence best separates off the most active, chemically similar group of compounds. The search is incorporated into a recursive partitioning design to produce a regression tree for biological activity on the space of structural fingerprints. Each node is characterized by a specific combination of structural features, and the terminal nodes with high average activities correspond, roughly, to different classes of compounds. Using LeadScope structural features as descriptors to mine a database from the National Cancer Institute, the merging of RP and SA consistently identifies structurally homogeneous classes of highly potent anticancer agents.
Database: OpenAIRE
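
Illustration: the description above outlines a simulated annealing search for a combination of binary descriptors whose simultaneous presence splits off a highly active group of compounds. The following is a minimal sketch of that idea, not the authors' implementation. The scoring function (`split_score`, a size-weighted gain in mean activity), the toggle-one-descriptor move, the geometric cooling schedule, and all parameter names and defaults are assumptions chosen for illustration; the record does not specify any of them.

```python
import math
import random


def split_score(X, y, combo, min_size=3):
    """Assumed criterion: sqrt(group size) times the gain in mean activity
    for compounds that contain every descriptor in `combo`."""
    inside = [yi for row, yi in zip(X, y) if all(row[j] for j in combo)]
    if len(inside) < min_size or len(inside) == len(y):
        return 0.0  # group too small, or no split at all
    overall = sum(y) / len(y)
    return math.sqrt(len(inside)) * (sum(inside) / len(inside) - overall)


def anneal_split(X, y, max_features=3, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over descriptor subsets of size 1..max_features."""
    rng = random.Random(seed)
    n_desc = len(X[0])
    current = frozenset([rng.randrange(n_desc)])
    cur_score = split_score(X, y, current)
    best, best_score = current, cur_score
    temp = t0
    for _ in range(steps):
        # Move: toggle one randomly chosen descriptor in or out of the combination.
        candidate = current.symmetric_difference({rng.randrange(n_desc)})
        if 0 < len(candidate) <= max_features:
            cand_score = split_score(X, y, candidate)
            delta = cand_score - cur_score
            # Metropolis rule: always accept improvements, sometimes accept losses.
            if delta >= 0 or rng.random() < math.exp(delta / temp):
                current, cur_score = candidate, cand_score
                if cur_score > best_score:
                    best, best_score = current, cur_score
        temp *= cooling  # geometric cooling (assumed schedule)
    return best, best_score


# Toy data (hypothetical): 64 compounds, 6 binary descriptors;
# a compound is highly active only when descriptors 0 and 1 are both present.
X = [[(i >> b) & 1 for b in range(6)] for i in range(64)]
y = [10.0 if row[0] and row[1] else 1.0 for row in X]

best, score = anneal_split(X, y)
print(sorted(best), score)
```

On this toy data the pair of descriptors 0 and 1 is, by construction, the highest-scoring combination under the assumed criterion. In a recursive partitioning setting, one plausible design would rerun such a search on the compounds remaining at each node, splitting off one descriptor combination per level of the tree.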