The METLIN small molecule dataset for machine learning-based retention time prediction
Author: | Emily I. Chen, J. Rafael Montenegro-Burke, Aries E. Aisporna, Elizabeth Billings, H. Paul Benton, Winnie Uritboonthai, Gary Siuzdak, Xavier Domingo-Almenara, Carlos Guijas |
---|---|
Year of publication: | 2019 |
Subject: |
0301 basic medicine; Time Factors; Computer science; Science education; Datasets as Topic; General Physics and Astronomy; Machine learning; 01 natural sciences; Article; General Biochemistry, Genetics and Molecular Biology; Small Molecule Libraries; 03 medical and health sciences; Deep learning; METLIN; Chromatography, Reverse-Phase; Multidisciplinary; Mass spectrometry; Extramural; Cheminformatics; 010401 analytical chemistry; Experimental data; Scientific data; General Chemistry; Small molecule; 0104 chemical sciences; 030104 developmental biology; Models, Chemical; Artificial intelligence; Retention time |
Source: | Nature Communications, Vol 10, Iss 1, Pp 1-9 (2019) |
ISSN: | 2041-1723 |
Description: | Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes, including mass spectrometry fragmentation and chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction. Using machine learning to identify small molecules from their predicted retention times has been challenging so far. Here the authors combine a large database of liquid chromatography retention times with a deep learning approach to enable accurate metabolite identification. |
Database: | OpenAIRE |
External link: |