A Multi-Label Learning Framework for Predicting Chemical Classes and Biological Activities of Natural Products from Biosynthetic Gene Clusters.

Author: Mei, Suyu
Subject:
Source: Journal of Chemical Ecology; Dec 2023, Vol. 49, Issue 11/12, pp. 681-695, 15 pp.
Abstract: Natural products (NPs), or secondary metabolites, are small molecules naturally synthesized by enzymes or enzyme complexes encoded by chromosomally clustered biosynthesis genes (also called biosynthetic gene clusters, BGCs). They mediate the bioecological interactions between host and microbiota and provide a natural reservoir for screening drug-like therapeutics. In this work, we propose a multi-label learning framework to functionally annotate natural products solely from the biosynthetic gene clusters that catalyse their synthesis, without experimentally resolving NP structures. All chemical classes and bioactivities constitute the label space, and the sequence domains of the biosynthetic gene clusters that catalyse the biosynthesis of natural products constitute the feature space. In this framework, a joint representation of features (BGC domains) and labels (natural product annotations) is efficiently learnt in an integral, low-dimensional space so as to accurately define inter-class boundaries and to scale to learning problems with many imbalanced labels. Computational results on experimental data show that the proposed framework achieves satisfactory multi-label learning performance, and that the learnt patterns of BGC domains are transferable across bacteria, or even across kingdoms, for instance from bacteria to Arabidopsis thaliana. Lastly, taking Arabidopsis thaliana and its rhizosphere microbiome as an example, we propose a pipeline that combines existing BGC identification tools with the proposed framework to find and functionally annotate novel natural products for downstream bioecological studies of plant-microbiota-soil interactions and plant environmental adaptation. [ABSTRACT FROM AUTHOR]
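The feature space and label space described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the BGC identifiers, domain names, and annotation labels below are hypothetical toy data, and the helper names (`build_vocab`, `multi_hot`, `encode`) are assumptions introduced only for illustration. The sketch covers the encoding step alone, turning each BGC into a binary vector over protein domains and a binary vector over chemical classes and bioactivities; the joint low-dimensional representation mentioned in the abstract is not reproduced here.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy records (not from the paper):
# BGC id -> (sequence domains found in the cluster, annotation labels).
TOY_BGCS: Dict[str, Tuple[List[str], List[str]]] = {
    "bgc_1": (["ketoacyl-synt", "Acyl_transf_1", "PP-binding"],
              ["polyketide", "antibacterial"]),
    "bgc_2": (["Condensation", "AMP-binding", "PP-binding"],
              ["NRP", "antibacterial", "antifungal"]),
    "bgc_3": (["Terpene_synth_C"],
              ["terpene"]),
}


def build_vocab(groups: Iterable[Iterable[str]]) -> Dict[str, int]:
    """Map every distinct name to a column index (sorted for determinism)."""
    names = sorted({name for group in groups for name in group})
    return {name: idx for idx, name in enumerate(names)}


def multi_hot(names: Iterable[str], vocab: Dict[str, int]) -> List[int]:
    """Binary vector with a 1 in the column of every name present."""
    row = [0] * len(vocab)
    for name in names:
        if name in vocab:  # names outside the vocabulary are ignored
            row[vocab[name]] = 1
    return row


def encode(records: Dict[str, Tuple[List[str], List[str]]]):
    """Build the feature matrix X (BGC x domain) and label matrix Y (BGC x label)."""
    ids = sorted(records)
    feature_vocab = build_vocab(records[i][0] for i in ids)
    label_vocab = build_vocab(records[i][1] for i in ids)
    X = [multi_hot(records[i][0], feature_vocab) for i in ids]
    Y = [multi_hot(records[i][1], label_vocab) for i in ids]
    return ids, feature_vocab, label_vocab, X, Y


ids, feature_vocab, label_vocab, X, Y = encode(TOY_BGCS)

# Per-label positive counts make label imbalance visible.
label_counts = {label: sum(row[col] for row in Y)
                for label, col in label_vocab.items()}
```

In this toy setting each row of `Y` can carry several positive entries at once, which is what makes the task multi-label rather than multi-class, and `label_counts` shows how unevenly the labels are represented.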
Database: Complementary Index