Linking in silico MS/MS spectra with chemistry data to improve identification of unknowns.

Authors: McEachran AD; Oak Ridge Institute for Science and Education (ORISE) Research Participation Program, United States Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA. admceachran@gmail.com.; National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA. admceachran@gmail.com., Balabin I; CSRA Inc., 109 T.W. Alexander Drive, Research Triangle Park, Durham, NC, 27711, USA., Cathey T; GDIT, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA., Transue TR; GDIT, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA., Al-Ghoul H; Oak Ridge Associated Universities (ORAU), 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA., Grulke C; National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA., Sobus JR; National Exposure Research Laboratory, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA., Williams AJ; National Center for Computational Toxicology, Office of Research and Development, U.S. Environmental Protection Agency, 109 T.W. Alexander Dr., Research Triangle Park, Durham, NC, 27711, USA. Williams.antony@epa.gov.
Language: English
Source: Scientific data [Sci Data] 2019 Aug 02; Vol. 6 (1), pp. 141. Date of Electronic Publication: 2019 Aug 02.
DOI: 10.1038/s41597-019-0145-z
Abstract: Confident identification of unknown chemicals in high-resolution mass spectrometry (HRMS) screening studies requires cohesive workflows and complementary data, tools, and software. Chemistry databases, screening libraries, and chemical metadata have become fixtures in identification workflows. To increase confidence in compound identifications, the use of structural fragmentation data collected via tandem mass spectrometry (MS/MS or MS²) is vital. However, the availability of empirically collected MS/MS data for identification of unknowns is limited. Researchers have therefore turned to in silico generation of MS/MS data for use in HRMS-based screening studies. This paper describes the generation en masse of predicted MS/MS spectra for the entirety of the US EPA's DSSTox database using competitive fragmentation modelling and a freely available open source tool, CFM-ID. The generated dataset comprises predicted MS/MS spectra for ~700,000 structures, and mappings between predicted spectra, structures, associated substances, and chemical metadata. Together, these resources facilitate improved compound identifications in HRMS screening studies. These data are accessible via an SQL database, a comma-separated export file (.csv), and EPA's CompTox Chemicals Dashboard.
Database: MEDLINE