Exploring the chemical subspace of RPLC: A data driven approach.

Authors: van Herwerden D; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands. Electronic address: d.vanherwerden@uva.nl., Nikolopoulos A; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands., Barron LP; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; MRC Centre for Environment and Health, Environmental Research Group, School of Public Health, Faculty of Medicine, Imperial College London, London, W12 0BZ, United Kingdom., O'Brien JW; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane, QLD, 4102, Australia., Pirok BWJ; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands., Thomas KV; Queensland Alliance for Environmental Health Sciences (QAEHS), The University of Queensland, Brisbane, QLD, 4102, Australia., Samanipour S; Van 't Hoff Institute for Molecular Sciences (HIMS), University of Amsterdam, Amsterdam, 1098 XH, the Netherlands; UvA Data Science Center, University of Amsterdam, Amsterdam, 1012 WP, the Netherlands. Electronic address: s.samanipour@uva.nl.
Language: English
Source: Analytica chimica acta [Anal Chim Acta] 2024 Aug 15; Vol. 1317, pp. 342869. Date of Electronic Publication: 2024 Jun 20.
DOI: 10.1016/j.aca.2024.342869
Abstract: Background: The chemical space consists of a vast number of possible structures, of which an unknown portion makes up the human and environmental exposome. Such samples are frequently analyzed by non-targeted analysis using liquid chromatography (LC) coupled to high-resolution mass spectrometry, often with a reversed-phase (RP) column. However, the contents of these samples are unknown before analysis and may include thousands of known and unknown chemical constituents. Moreover, it is unknown which part of the chemical space is sufficiently retained and eluted by RPLC.
Results: We present a generic framework that uses a data-driven approach to predict whether molecules fall 'inside', 'maybe' inside, or 'outside' the RPLC subspace. First, three random forest (RF) regression models for retention indices were constructed, showing that molecular fingerprints can predict RPLC retention behavior. Second, these models were used to set up the dataset for building an RF classification model of the RPLC subspace. This classification model correctly predicted whether a chemical belonged to the RPLC subspace with an accuracy of 92% on the test set. Finally, applying the model to the 91 737 small molecules (i.e., ≤1 000 Da) in NORMAN SusDat showed that 19.1% fall 'outside' the RPLC subspace.
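The abstract does not state how the 'inside', 'maybe' and 'outside' labels are derived from the classification model. Purely as a hypothetical illustration, and not the authors' method, a three-level label can be obtained from a classifier's probability of subspace membership by applying two cut-offs. The function names and the thresholds 0.3 and 0.7 below are assumptions chosen for the sketch. The final line only converts the reported 19.1% into an approximate molecule count.

```python
from collections import Counter
from typing import Iterable

# Hypothetical cut-offs on P(molecule lies inside the RPLC subspace).
# These values are assumptions for illustration, NOT taken from the paper.
LOWER, UPPER = 0.3, 0.7


def assign_subspace(p_inside: float, lower: float = LOWER, upper: float = UPPER) -> str:
    """Map a membership probability to 'inside', 'maybe' or 'outside'."""
    if not 0.0 <= p_inside <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_inside >= upper:
        return "inside"
    if p_inside <= lower:
        return "outside"
    return "maybe"  # uncertain region between the two cut-offs


def fraction_outside(probs: Iterable[float]) -> float:
    """Share of molecules labelled 'outside' under the hypothetical cut-offs."""
    labels = [assign_subspace(p) for p in probs]
    counts = Counter(labels)
    return counts["outside"] / len(labels)


if __name__ == "__main__":
    example = [0.95, 0.55, 0.10, 0.80, 0.25]  # made-up probabilities
    print([assign_subspace(p) for p in example])
    # ['inside', 'maybe', 'outside', 'inside', 'outside']
    print(fraction_outside(example))  # 0.4
    # Scale of the reported result: 19.1% of 91 737 molecules
    print(round(0.191 * 91737))  # 17522, i.e. roughly 17 500 molecules
```

Widening the gap between the two cut-offs moves more molecules into 'maybe', trading coverage of confident labels for fewer misclassifications.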
Significance and Novelty: The RPLC chemical space model is a major step towards mapping the chemical space. It can assess whether chemicals can potentially be measured with an RPLC method (though not necessarily with every RPLC method) or whether a different selectivity should be considered. Moreover, knowing which chemicals lie outside the RPLC subspace can help narrow the list of candidates for library searching and avoid screening for chemicals that will not be present in RPLC data.
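The candidate-reduction use case can be sketched as a simple filter over a suspect list. This is a minimal sketch under stated assumptions: the function `prune_candidates`, the candidate names and their labels are hypothetical, and candidates without a prediction are conservatively kept rather than discarded.

```python
from typing import Dict, List


def prune_candidates(candidates: List[str], predicted: Dict[str, str]) -> List[str]:
    """Drop candidates predicted 'outside' the RPLC subspace.

    'inside' and 'maybe' candidates are kept; candidates with no
    prediction default to 'maybe' so they are not discarded.
    """
    return [c for c in candidates if predicted.get(c, "maybe") != "outside"]


if __name__ == "__main__":
    # Hypothetical library hits and model labels, for illustration only
    hits = ["candidate_A", "candidate_B", "candidate_C", "candidate_D"]
    labels = {"candidate_A": "inside", "candidate_B": "outside", "candidate_C": "maybe"}
    print(prune_candidates(hits, labels))
    # ['candidate_A', 'candidate_C', 'candidate_D']
```

Keeping 'maybe' candidates avoids discarding true hits when the model is uncertain, at the cost of a smaller reduction in the candidate list.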
Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
(Copyright © 2024 The Authors. Published by Elsevier B.V. All rights reserved.)
Database: MEDLINE