Enhanced LC-MS/MS spectra matching through multitask neural networks and molecular fingerprints

Authors: Valsecchi, C, Baccolo, G, Caserta, M, Barbagallo, M, Gosetti, F, Consonni, V, Ballabio, D, Todeschini, R
Contributors: Valsecchi, C, Baccolo, G, Caserta, M, Barbagallo, M, Gosetti, F, Consonni, V, Ballabio, D, Todeschini, R
Language: English
Publication year: 2021
Subject:
Description: Liquid chromatography tandem mass spectrometry (LC-MS/MS) is routinely used in many clinical applications, including toxicology, drug monitoring, endocrinology, microbiology, and proteomics, thanks to its versatility and effectiveness in the determination of small molecules. The standard approach to analyzing spectral data is spectral matching, which relies on a library of annotated spectra against which an individual spectrum can be searched. Building one's own spectral library is time-consuming and dependent on the LC-MS/MS instrumentation used, whereas freely available libraries cover only a limited number of molecules. In many cases, and especially with newly synthesized substances, the spectrum cannot be found in any spectral database. In-silico fingerprints are binary vectors that encode features of molecules through a hashing algorithm. Because they are easy to compute and independent of the instrument, in-silico fingerprint databases are larger and more up to date than spectral ones. Predicting molecular fingerprints from LC-MS/MS spectra would therefore help match any target compound under investigation, since the search could draw on the larger fingerprint databases. In this study, we developed a multi-task neural network able to predict molecular fingerprints from LC-MS/MS spectra. We initially collected and pruned around 70,000 MS spectra from available sources (MassBank of North America). For each compound, fingerprints were calculated (MACCS167 keys and 512-bit Dragon ECFP), and multi-task feedforward neural networks were then trained to predict the binary bits of the molecular fingerprints. The models were validated through specific validation protocols and showed suitable predictive accuracy (>85% of bits correctly predicted).
Thus, the proposed networks represent a potential approach to developing a reliable method for matching MS spectral data against larger molecular databases. Moreover, compared with recent studies, the proposed model is simple and works with spectra obtained under different experimental conditions.
Database: OpenAIRE
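
The matching step outlined in the description (predict fingerprint bits from a spectrum, then search a fingerprint database) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the record does not say how predicted fingerprints are compared with database entries, so the Tanimoto similarity, the 0.5 binarization threshold, the function names, and the 8-bit toy fingerprints are all assumptions made for the example. The network itself is not shown; its per-bit outputs are taken as given.

```python
from typing import Dict, List, Tuple

# A fingerprint is a binary vector (the study uses MACCS and 512-bit ECFP;
# the toy data below uses 8 bits purely for readability).
Fingerprint = List[int]


def binarize(probabilities: List[float], threshold: float = 0.5) -> Fingerprint:
    """Turn per-bit network outputs into a binary fingerprint.

    The 0.5 threshold is an assumption, not a value from the study.
    """
    return [1 if p >= threshold else 0 for p in probabilities]


def bit_accuracy(predicted: Fingerprint, reference: Fingerprint) -> float:
    """Fraction of bits predicted correctly."""
    if len(predicted) != len(reference):
        raise ValueError("fingerprints must have the same length")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared on-bits divided by on-bits in either vector."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(x & y for x, y in zip(a, b))
    either = sum(x | y for x, y in zip(a, b))
    # Two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def rank_candidates(
    query: Fingerprint, database: Dict[str, Fingerprint]
) -> List[Tuple[str, float]]:
    """Return database entries sorted by decreasing similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical per-bit outputs of a trained network for one spectrum.
network_output = [0.9, 0.2, 0.8, 0.1, 0.7, 0.4, 0.05, 0.6]
predicted = binarize(network_output)

# Hypothetical fingerprint database keyed by compound name.
toy_database = {
    "compound_A": [1, 0, 1, 0, 1, 0, 0, 1],
    "compound_B": [1, 1, 0, 0, 1, 0, 0, 0],
    "compound_C": [0, 0, 0, 1, 0, 1, 1, 0],
}

for name, score in rank_candidates(predicted, toy_database):
    print(f"{name}: {score:.2f}")
```

Ranking by similarity rather than requiring an exact match tolerates some mispredicted bits, which matters when a predicted fingerprint is only partly correct.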