Comparison of dimensionality reduction techniques for the visualisation of chemical space in organometallic catalysis

Authors:	Mario Villares, Carla M. Saunders, Natalie Fey
Language:	English
Year of publication:	2024
Subjects:	Computational chemistry; Organometallic catalysis; Data science; Dimensionality reduction; Transition metal complexes; Chemistry (QD1-999); Electronic computers. Computer science (QA75.5-76.95)
Source:	Artificial Intelligence Chemistry, Vol 2, Iss 1, Pp 100055- (2024)
Document type:	article
ISSN:	2949-7477
DOI:	10.1016/j.aichem.2024.100055
Abstract:	We have used a Ligand Knowledge Base for bidentate P,P-donor ligands of potential interest to homogeneous catalysis to compare three dimensionality reduction techniques, namely Principal Component Analysis (PCA), Uniform Manifold Approximation and Projection (UMAP) and t-distributed Stochastic Neighbor Embedding (t-SNE). While our previous work on Ligand Knowledge Bases has focused on PCA, here we compare this approach with more recently published approaches and assess the information retention, visualization, clustering and interpretability which can be achieved for each approach. We find that potential advantages of t-SNE are not realized with a database of the current size (275 entries), and that there is a degree of complementarity between PCA and UMAP. The statistics underlying PCA rely on linear relationships, making interpretation of the resulting plots comparatively straightforward. Since much of chemistry relies on linear structure-property relationships and low-dimensional visualization, the explainability and information retention achieved with PCA are attractive. UMAP proved more challenging to interpret, but achieved clear clustering which was often chemically meaningful, and it would be a useful approach for ensuring that distinct subsets of compounds are sampled in a machine-learning context. This analysis also highlighted that the tunability of catalysis achieved through ligand exchange maps well onto some areas of chemical space where closely related ligands cluster, while others represent outliers; these arise from different combinations of steric and electronic effects which chemists will find intuitive.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/01e7247ef7574b6f99c3b8ce8bd8663a
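The abstract attributes the interpretability of PCA to its linear statistics. As a minimal sketch of why, assuming a numeric descriptor table with one row per ligand and one column per descriptor, the Python/NumPy code below computes PCA through a singular value decomposition. The explained-variance ratio quantifies how much information a two-dimensional map retains, and the loadings state how strongly each original descriptor contributes to each axis. The `pca` helper, the column standardisation step and the 275 × 20 random matrix are hypothetical placeholders, not the authors' data or pipeline.

```python
import numpy as np


def pca(X, n_components=2):
    """Project the rows of X onto their leading principal components.

    Returns (scores, loadings, explained_variance_ratio).
    """
    # Standardise each descriptor column (zero mean, unit variance) so that
    # descriptors measured in different units contribute comparably.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Thin SVD of the standardised matrix: Z = U @ diag(S) @ Vt.
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    # The variance captured by each component is proportional to S**2.
    ratio = S**2 / np.sum(S**2)
    loadings = Vt[:n_components].T   # shape (n_descriptors, n_components)
    scores = Z @ loadings            # shape (n_ligands, n_components)
    return scores, loadings, ratio[:n_components]


# Hypothetical placeholder data: 275 "ligands" x 20 random descriptors,
# standing in for a real descriptor table.
rng = np.random.default_rng(0)
X = rng.normal(size=(275, 20))
scores, loadings, ratio = pca(X, n_components=2)
print(scores.shape, loadings.shape)  # (275, 2) (20, 2)
print("explained variance ratio:", ratio.round(3))
# Descriptors with the largest absolute loading on PC1 drive that axis.
print("top PC1 descriptors:", np.argsort(-np.abs(loadings[:, 0]))[:3])
```

UMAP and t-SNE, by contrast, learn nonlinear embeddings that provide no comparable linear loading matrix, which is consistent with the abstract's observation that UMAP plots were harder to interpret.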