Overview of data preprocessing for machine learning applications in human microbiome research.
Authors: | Ibrahimi E; Department of Biology, Faculty of Natural Sciences, University of Tirana, Tirana, Albania., Lopes MB; Department of Mathematics, Center for Mathematics and Applications (NOVA Math), NOVA School of Science and Technology, Caparica, Portugal.; UNIDEMI, Department of Mechanical and Industrial Engineering, NOVA School of Science and Technology, Caparica, Portugal., Dhamo X; Department of Applied Mathematics, Faculty of Natural Sciences, University of Tirana, Tirana, Albania., Simeon A; BioSense Institute, University of Novi Sad, Novi Sad, Serbia., Shigdel R; Department of Clinical Science, University of Bergen, Bergen, Norway., Hron K; Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University Olomouc, Olomouc, Czechia., Stres B; Department of Catalysis and Chemical Reaction Engineering, National Institute of Chemistry, Ljubljana, Slovenia.; Faculty of Civil and Geodetic Engineering, Institute of Sanitary Engineering, Ljubljana, Slovenia.; Department of Automation, Biocybernetics and Robotics, Jožef Stefan Institute, Ljubljana, Slovenia.; Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia., D'Elia D; Department of Biomedical Sciences, National Research Council, Institute for Biomedical Technologies, Bari, Italy., Berland M; INRAE, MetaGenoPolis, Université Paris-Saclay, Jouy-en-Josas, France., Marcos-Zambrano LJ; Computational Biology Group, Precision Nutrition and Cancer Research Program, IMDEA Food Institute, Madrid, Spain. |
---|---|
Language: | English |
Source: | Frontiers in microbiology [Front Microbiol] 2023 Oct 05; Vol. 14, pp. 1250909. Date of Electronic Publication: 2023 Oct 05 (Print Publication: 2023). |
DOI: | 10.3389/fmicb.2023.1250909 |
Abstract: | Although metagenomic sequencing is now the preferred technique to study microbiome-host interactions, analyzing and interpreting microbiome sequencing data presents challenges primarily attributed to the statistical specificities of the data (e.g., sparse, over-dispersed, compositional, inter-variable dependency). This mini review explores preprocessing and transformation methods applied in recent human microbiome studies to address microbiome data analysis challenges. Our results indicate a limited adoption of transformation methods targeting the statistical characteristics of microbiome sequencing data. Instead, there is a prevalent usage of relative and normalization-based transformations that do not specifically account for the specific attributes of microbiome data. The information on preprocessing and transformations applied to the data before analysis was incomplete or missing in many publications, leading to reproducibility concerns, comparability issues, and questionable results. We hope this mini review will provide researchers and newcomers to the field of human microbiome research with an up-to-date point of reference for various data transformation tools and assist them in choosing the most suitable transformation method based on their research questions, objectives, and data characteristics. Competing Interests: The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. (Copyright © 2023 Ibrahimi, Lopes, Dhamo, Simeon, Shigdel, Hron, Stres, D’Elia, Berland and Marcos-Zambrano.) |
Database: | MEDLINE |