Metabolite reporting in large-scale studies within different metabolomics communities: DO WE SPEAK THE SAME LANGUAGE?

Autor: Hajjar, Ghina, Benaben, David, Paulhe, Nils, Duperier, Christophe, Filangi, Olivier, Giacomoni, Franck, Comte, Blandine, Pujos-Guillot, Estelle
Přispěvatelé: Plateforme Exploration du Métabolisme (PFEM), Institut National de la Recherche Agronomique (INRA)-Université Blaise Pascal - Clermont-Ferrand 2 (UBP)-MetaboHUB-Clermont, MetaboHUB-MetaboHUB, Biologie du fruit et pathologie (BFP), Université de Bordeaux (UB)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE), Institut de Génétique, Environnement et Protection des Plantes (IGEPP), Université de Rennes (UR)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut Agro Rennes Angers, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Hajjar, Ghina
Jazyk: angličtina
Rok vydání: 2022
Předmět:
Zdroj: Analytics 2022
Analytics 2022, Sep 2022, Nantes, France
Popis: International audience; Since the emergence of high throughput metabolomics, there has been a growing number of scientific communities performing metabolomic studies. Therefore, it has become crucial to standardize reporting and sharing of metabolites. Although minimum reporting standards for analytical practices and data processing are available, there are no established standards for metabolite reporting. In this context, our objective was to review the existing practices in terms of metabolite reporting in different scientific communities both in published results and across databases.In this context, we considered plasma metabolites reported in human large-scale studies from different communities, namely analytical chemistry, medicine and epidemiology. We focused only on metabolites reported as level 1 identification according to the Metabolomics Standard Initiative. We applied a data curation workflow on the list of annotated metabolites given by the authors. First, we performed a manual curation that included the addition of missing identifiers and the editing of some incoherent metadata. Second, we applied an automatic query algorithm in order to obtain additional information from available databases such as the compact hash code of the IUPAC International Chemical Identifier “InChIKey”. Identified metabolites were then compared between the selected studies using either the names given by the authors or the InChIKeys added after data curation. Regular inconsistencies were observed in metabolite reporting both in published results and across different databases. In the former, incoherence was observed in the metabolite information (identifiers not referring to the same isomer, metabolite name not corresponding to the molecular formula). Besides, isomers were listed with their corresponding retention times, yet without any indication of the isomers’ identity. On the other hand, cross-linking provided across databases presented some incoherent information regarding nomenclatures, optical isomerism, stereochemistry of asymmetric carbons, and molecular structure (acid/base; zwitterionic or canonical forms, molecules with a permanent charge) in addition to a mismatch between two structurally different compounds. The evaluation of metabolite reporting across different databases for instance HMDB, PubChem and ChEBI was performed with the help of the Metabolomics Semantic DataLake (MSD) team. Information was calculated from latest public versions of the aforementioned databases, under a Big Data infrastructure (Apache Spark) and Scala programming language. Based on the InChIKey, we were able to identify all incorrect metabolite matches in HMDB, PubChem and ChEBI and to categorize them into “structurally different compounds”, “optical isomerism” or “structural isomerism”.Although not yet required, the InChIKey was found to be the most suitable identifier for comparing reported metabolites between studies and across databases. It is therefore recommended either to use this identifier or to perform a deep data curation when reporting identified metabolites. This work will allow providing guidelines for a more effective and reproducible metabolomics data sharing.
Databáze: OpenAIRE