Molecular function recognition by supervised projection pursuit machine learning
Author: | Donald J. Jacobs, Chris S. Avery, John D. Patterson, Tyler Grear |
---|---|
Language: | English |
Publication year: | 2021 |
Subject: | 0301 basic medicine; Multivariate statistics; Mathematics and computing; Computer science; Science; Feature extraction; Biophysics; Feature selection; Context (language use); Machine learning; 01 natural sciences; Article; 03 medical and health sciences; 0103 physical sciences; Cluster analysis; Multidisciplinary; 010304 chemical physics; Drug discovery; Data stream mining; Computational biology and bioinformatics; 030104 developmental biology; Recurrent neural network; Categorization; Projection pursuit; Medicine; Artificial intelligence |
Source: | Scientific Reports, Vol 11, Iss 1, Pp 1-15 (2021) |
ISSN: | 2045-2322 |
Description: | Identifying mechanisms that control molecular function is a significant challenge in pharmaceutical science and molecular engineering. Here, we present a novel projection pursuit recurrent neural network to identify functional mechanisms in the context of iterative supervised machine learning for discovery-based design optimization. Molecular function recognition is achieved by pairing experiments that categorize systems with digital twin molecular dynamics simulations to generate working hypotheses. Feature extraction decomposes emergent properties of a system into a complete set of basis vectors. Feature selection requires signal-to-noise, statistical significance, and clustering quality to concurrently surpass acceptance levels. Formulated as a multivariate description of differences and similarities between systems, the data-driven working hypothesis is refined by analyzing new systems prioritized by a discovery-likelihood. Utility and generality are demonstrated on several benchmarks, including the elucidation of antibiotic resistance in TEM-52 beta-lactamase. The software is freely available, enabling turnkey analysis of massive data streams found in computational biology and material science. |
Database: | OpenAIRE |
External link: |