Predicting the Pathway Involvement of Metabolites Based on Combined Metabolite and Pathway Features.

Authors: Huckvale ED; Markey Cancer Center, University of Kentucky, Lexington, KY 40506, USA., Moseley HNB; Markey Cancer Center, University of Kentucky, Lexington, KY 40506, USA.; Superfund Research Center, University of Kentucky, Lexington, KY 40506, USA.; Department of Toxicology and Cancer Biology, University of Kentucky, Lexington, KY 40536, USA.; Department of Molecular and Cellular Biochemistry, University of Kentucky, Lexington, KY 40506, USA.; Institute for Biomedical Informatics, University of Kentucky, Lexington, KY 40506, USA.
Language: English
Source: BioRxiv : the preprint server for biology [bioRxiv] 2024 Apr 02. Date of Electronic Publication: 2024 Apr 02.
DOI: 10.1101/2024.04.01.587582
Abstract: A major limitation of most metabolomics datasets is the sparsity of pathway annotations for detected metabolites. It is common for fewer than half of the identified metabolites in these datasets to have known metabolic pathway involvement. To address this limitation, machine learning models have been developed to predict the association of a metabolite with a "pathway category", as defined by a metabolic knowledgebase such as the Kyoto Encyclopedia of Genes and Genomes (KEGG). Most of these models are implemented as a binary classifier specific to a single pathway category, so generating predictions for multiple pathway categories requires a whole set of binary classifiers. This one-classifier-per-category approach both multiplies the computational resources needed for training and dilutes the positive entries in the gold-standard datasets used for training. To overcome the limitations of training separate classifiers, we propose a generalization of the metabolic pathway prediction problem that uses a single binary classifier: it accepts both features representing a metabolite and features representing a generic pathway category, and then predicts whether the given metabolite is involved in the corresponding pathway category. We demonstrate that this metabolite-pathway features-pair approach is not only competitive with the combined performance of separately trained binary classifiers but also outperforms the previous benchmark models.
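The metabolite-pathway features-pair formulation can be sketched as a dataset-construction step: every (metabolite, pathway category) pair becomes one row whose features are the metabolite's vector concatenated with the pathway category's vector, labeled 1 if the metabolite is known to be involved in that category. This is a minimal illustration under stated assumptions, not the authors' implementation. The helper `build_pair_dataset`, the metabolite and pathway names, and all feature values are hypothetical, and this record does not specify how the actual metabolite or pathway features are computed.

```python
from itertools import product


def build_pair_dataset(metabolite_features, pathway_features, known_pairs):
    """Build one training set for a single binary classifier (hypothetical helper).

    Each row concatenates a metabolite's feature vector with a pathway
    category's feature vector. The label is 1 if (metabolite, pathway) is a
    known involvement pair, else 0.
    """
    rows, labels, keys = [], [], []
    # Enumerate every metabolite-pathway combination in a deterministic order.
    for met, pw in product(sorted(metabolite_features), sorted(pathway_features)):
        rows.append(list(metabolite_features[met]) + list(pathway_features[pw]))
        labels.append(1 if (met, pw) in known_pairs else 0)
        keys.append((met, pw))
    return rows, labels, keys


# Hypothetical toy data: 3 features per metabolite, 2 features per pathway category.
metabolite_features = {
    "m1": [1, 0, 1],
    "m2": [0, 1, 1],
    "m3": [1, 1, 0],
}
pathway_features = {
    "carbohydrate": [0.5, 0.1],
    "lipid": [0.2, 0.9],
}
known_pairs = {("m1", "carbohydrate"), ("m2", "lipid"), ("m3", "carbohydrate")}

X, y, keys = build_pair_dataset(metabolite_features, pathway_features, known_pairs)
print(len(X), len(X[0]), sum(y))  # 6 5 3

# For contrast: separate per-category classifiers would each see only their own
# positives (here 2 for "carbohydrate" and 1 for "lipid"), whereas the single
# pair-based training set pools all 3 positives.
per_category_positives = {
    pw: sum(label for (m, p), label in zip(keys, y) if p == pw)
    for pw in pathway_features
}
print(per_category_positives)  # {'carbohydrate': 2, 'lipid': 1}
```

The resulting `X` and `y` could be passed to any binary classifier that accepts a feature matrix and label vector. To predict pathway involvement for a new metabolite, one would pair its feature vector with each pathway category's vector and score every resulting row with the same trained model.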
Competing Interests: The authors declare no conflicts of interest.
Database: MEDLINE