Log-ratio analysis of microbiome data with many zeroes is library size dependent.

Authors: Te Beest DE (Biometris, Wageningen University & Research, Wageningen, The Netherlands); Nijhuis EH (Biointeractions and Plant Health, Wageningen University & Research, Wageningen, The Netherlands); Möhlmann TWR (Laboratory of Entomology, Wageningen University & Research, Wageningen, The Netherlands); Ter Braak CJF (Biometris, Wageningen University & Research, Wageningen, The Netherlands)
Language: English
Source: Molecular ecology resources [Mol Ecol Resour] 2021 Aug; Vol. 21 (6), pp. 1866-1874. Date of Electronic Publication: 2021 May 03.
DOI: 10.1111/1755-0998.13391
Abstract: Microbiome composition data collected through amplicon sequencing are count data on taxa in which the total count per sample (the library size) is an artefact of the sequencing platform, and as a result, such data are compositional. To avoid library size dependency, one common way of analysing multivariate compositional data is to perform a principal component analysis (PCA) on data transformed with the centred log-ratio, hereafter called a log-ratio PCA. Two aspects typical of amplicon sequencing data are the large differences in library size and the large number of zeroes. In this study, we show on real data and by simulation that, applied to data that combine these two aspects, log-ratio PCA is nevertheless heavily dependent on the library size. This leads to a reduction in power when testing against any explanatory variable in log-ratio redundancy analysis. If there is additionally a correlation between the library size and the explanatory variable, then the type 1 error becomes inflated. We explore putative solutions to this problem.
(© 2021 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.)
Database: MEDLINE
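
Illustrative note: the effect described in the abstract can be reproduced in a few lines. The sketch below is not the authors' simulation. It assumes a pseudocount of 1 to handle zeroes (the abstract does not say which zero treatment was used), a single Dirichlet-distributed composition shared by all samples, and two hypothetical library sizes. Because every sample is drawn from the same composition, any structure that the log-ratio PCA finds can only come from sequencing depth.

```python
import numpy as np


def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of a samples x taxa count matrix.

    The pseudocount is an assumption of this sketch: it is one common way to
    make log(0) defined, not necessarily the zero handling used in the paper.
    """
    logged = np.log(np.asarray(counts, dtype=float) + pseudocount)
    # Subtract each sample's mean log value (the log of its geometric mean).
    return logged - logged.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)

# One shared composition for every sample: a few abundant taxa, many rare ones.
n_taxa = 200
proportions = rng.dirichlet(np.full(n_taxa, 0.1))

# Hypothetical library sizes: 20 shallow and 20 deep samples.
library_sizes = np.repeat([1_000, 100_000], 20)
counts = np.vstack([rng.multinomial(n, proportions) for n in library_sizes])

# Fraction of zero cells in the count table.
zero_fraction = (counts == 0).mean()

# Log-ratio PCA: SVD of the column-centred CLR matrix.
z = clr(counts)
z_centred = z - z.mean(axis=0)
u, s, _ = np.linalg.svd(z_centred, full_matrices=False)
pc1 = u[:, 0] * s[0]  # sample scores on the first principal component

# Correlation between the first component and log library size.
r = np.corrcoef(pc1, np.log(library_sizes))[0, 1]
print(f"zero fraction: {zero_fraction:.2f}")
print(f"|corr(PC1, log library size)|: {abs(r):.2f}")
```

With no zeroes, multiplying a sample's counts by a constant would leave its centred log-ratio values unchanged. The pseudocount breaks that invariance for taxa that are observed in deep samples but missing in shallow ones, which is why the first component in this sketch tracks library size.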