Behavior of Linear and Nonlinear Dimensionality Reduction for Collective Variable Identification of Small Molecule Solution-Phase Reactions

Author: Bala Krishnamoorthy, Aurora E. Clark, Nathan May, Hung M. Le, Ernesto Martinez-Baez, Ravishankar Sundararaman, Sushant Kumar
Year of publication: 2021
Subject:
DOI: 10.33774/chemrxiv-2021-ln16w
Description: Identifying collective variables for chemical reactions is essential to reduce the $3N$-dimensional energy landscape into lower-dimensional basins and barriers of interest. However, in condensed-phase processes, the non-meaningful motions of bulk solvent often overpower the ability of dimensionality reduction methods to identify the correlated motions that underpin collective variables. Yet solvent can play important indirect or direct roles in reactivity, and much can be lost through treatments that remove or dampen solvent motion. This has been amply demonstrated for principal component analysis (PCA), although less is known about the behavior of nonlinear dimensionality reduction methods, e.g., UMAP, that have become more popular recently. The latter present an interesting alternative to linear methods, though often at the expense of interpretability. This work presents distance-attenuated projection methods of atomic coordinates that facilitate the application of both PCA and UMAP to identify collective variables in solution and, further, the specific identity of the solvent molecules that participate in chemical reactions. The performance of both methods is examined in detail for two reactions in which the explicit solvent plays very different roles within the collective variables. The first reaction consists of the dynamic exchange of a cation about a polyhydroxy anion, facilitated by waters of solvation, while the second consists of a nucleophilic attack of water upon ethylene to initiate cis/trans isomerization. When applied to raw data, both PCA and UMAP representations are dominated by bulk solvent motions. When applied to data preprocessed by the attenuated projection methods, by contrast, both PCA and UMAP identify the appropriate collective variables in solution.
Database: OpenAIRE
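
Note: The record gives no formula for the distance-attenuated projection, so the following is only a minimal sketch of the general idea described in the abstract, not the authors' method. It assumes a trajectory array of shape (frames, atoms, 3), a hypothetical smooth switching function with illustrative parameters `r0` and `width`, and the solute centroid as the reference point. Coordinates are damped according to their distance from the solute, and a plain SVD-based PCA is then applied. The function names `attenuate` and `pca` are invented for this illustration.

```python
import numpy as np


def attenuate(coords, solute_idx, r0=4.0, width=0.5):
    """Damp atomic coordinates by distance from the solute (illustrative only).

    coords     : array of shape (n_frames, n_atoms, 3)
    solute_idx : indices of the solute atoms
    r0, width  : assumed switching distance and softness (hypothetical values)
    Returns a feature matrix of shape (n_frames, n_atoms * 3).
    """
    coords = np.asarray(coords, dtype=float)
    # Reference point per frame: centroid of the solute atoms.
    center = coords[:, solute_idx, :].mean(axis=1, keepdims=True)
    rel = coords - center                    # positions relative to the solute
    dist = np.linalg.norm(rel, axis=2)       # shape (n_frames, n_atoms)
    # Logistic switching function written with tanh to avoid overflow:
    # close to 1 near the solute, close to 0 in the bulk.
    weights = 0.5 * (1.0 - np.tanh((dist - r0) / (2.0 * width)))
    return (rel * weights[:, :, None]).reshape(coords.shape[0], -1)


def pca(features, n_components=2):
    """Project mean-centered features onto their leading principal components."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)          # fraction of variance per component
    scores = centered @ vt[:n_components].T
    return scores, explained[:n_components]


# Usage on synthetic data (random numbers standing in for a trajectory).
rng = np.random.default_rng(0)
traj = rng.normal(scale=5.0, size=(100, 30, 3))
features = attenuate(traj, solute_idx=[0, 1, 2])
scores, explained = pca(features, n_components=2)
```

The same attenuated feature matrix could be passed to a nonlinear method instead, for example `umap.UMAP(n_components=2).fit_transform(features)` from the umap-learn package. The choice of switching function and cutoff here is an assumption and would need to be checked against the paper itself.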