mb-PHENIX: Diffusion and Supervised Uniform Manifold Approximation for denoising microbiota data

Author: Padron-Manrique Cristian, Vázquez-Jiménez Aarón, Esquivel-Hernandez Diego Armando, Martinez Lopez Yoscelina Estrella, Neri-Rosario Daniel, Sánchez-Castañeda Jean Paul, Giron-Villalobos David, Resendis-Antonio Osbaldo
Year of publication: 2022
DOI: 10.1101/2022.06.23.497285
Description: Motivation: Microbiota data suffer from technical noise (reflected as an excess of zeros in the count matrix) and the curse of dimensionality. This complicates downstream data analysis and compromises the reliability of scientific findings. Data sparsity makes it difficult to obtain a well-clustered structure and distorts the abundance distributions. There is currently a growing need for new algorithms that are better able to reduce noise and recover missing information. Results: We present mb-PHENIX, an open-source algorithm developed in Python that recovers taxa abundances from noisy and sparse microbiota data. Our method deals with sparsity in the count matrix (in 16S microbiota and shotgun studies) by applying imputation via diffusion onto the supervised Uniform Manifold Approximation Projection (sUMAP) space. This hybrid machine learning approach allows the user to denoise microbiota data. As a result, the differential abundance of microbes among study groups is estimated more accurately in cases where abundance analysis otherwise fails. Availability: The mb-PHENIX algorithm is available at https://github.com/resendislab/mb-PHENIX. An easy-to-use implementation is available on Google Colab (see GitHub). Contact: Oresendis@inmegen.gob.mx. Supplementary information: Supplementary data are available at Bioinformatics online.
Database: OpenAIRE
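
The description mentions imputation via diffusion onto an sUMAP space. The following is a minimal sketch of that general idea, not the mb-PHENIX implementation: it assumes a low-dimensional embedding of the samples has already been computed (for example with a supervised UMAP), builds a Gaussian affinity graph on it, and smooths the count matrix with a few steps of a Markov diffusion. The function name `diffusion_impute`, the parameters `knn` and `t`, the adaptive-bandwidth kernel, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def diffusion_impute(counts, embedding, knn=5, t=3):
    """Smooth `counts` (samples x taxa) over a graph built on `embedding`.

    Hypothetical helper for illustration; `knn` and `t` are assumed names.
    """
    n = embedding.shape[0]
    # Pairwise Euclidean distances between samples in the embedding space.
    diff = embedding[:, None, :] - embedding[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Adaptive bandwidth: distance to each sample's knn-th nearest neighbour
    # (column 0 of the sorted distances is the sample itself).
    sigma = np.sort(dist, axis=1)[:, min(knn, n - 1)]
    sigma = np.maximum(sigma, 1e-12)
    # Gaussian affinities, symmetrised so the graph is undirected.
    affinity = np.exp(-((dist / sigma[:, None]) ** 2))
    affinity = (affinity + affinity.T) / 2.0
    # Row-normalise into a Markov transition matrix.
    markov = affinity / affinity.sum(axis=1, keepdims=True)
    # Diffuse for t steps, then average counts over the diffused neighbourhoods.
    return np.linalg.matrix_power(markov, t) @ counts


# Synthetic stand-in for an embedding with two study groups (assumption).
rng = np.random.default_rng(0)
embedding = np.vstack([
    rng.normal(0.0, 0.1, size=(10, 2)),
    rng.normal(5.0, 0.1, size=(10, 2)),
])
# Synthetic sparse count matrix: roughly half the entries are set to zero.
counts = rng.poisson(5, size=(20, 4)).astype(float)
counts[rng.random(counts.shape) < 0.5] = 0.0

imputed = diffusion_impute(counts, embedding)
```

Because each row of the transition matrix sums to one, every imputed value is a weighted average of the observed counts of nearby samples, so zeros are filled in from neighbours in the same region of the embedding while samples in distant groups contribute almost nothing.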