Large-scale simultaneous hypothesis testing in monitoring carbon content from French soil database - A semi-parametric mixture approach
Authors: | Dominique Arrouays, Didier Chauveau, Thomas G. Orton, Nicolas Saby, Christian Walter, Blandine Lemercier |
---|---|
Contributors: | Mathématiques - Analyse, Probabilités, Modélisation - Orléans (MAPMO), Centre National de la Recherche Scientifique (CNRS)-Université d'Orléans (UO), InfoSol (InfoSol), Institut National de la Recherche Agronomique (INRA), Sol Agro et hydrosystème Spatialisation (SAS), Institut National de la Recherche Agronomique (INRA)-AGROCAMPUS OUEST, Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro)-Institut national d'enseignement supérieur pour l'agriculture, l'alimentation et l'environnement (Institut Agro), Université d'Orléans (UO)-Centre National de la Recherche Scientifique (CNRS), Unité INFOSOL (ORLEANS INFOSOL) |
Language: | English |
Publication year: | 2014 |
Subject: | False discovery rate; Soil test; Soil Science; EM algorithms; 01 natural sciences; FDR; 010104 statistics & probability; Statistics; Econometrics; 0101 mathematics; Parametric statistics; Statistical hypothesis testing; Mathematics; [STAT.AP]Statistics [stat]/Applications [stat.AP]; semi-parametric mixtures; Carbon content; 04 agricultural and veterinary sciences; finite mixture; Mixture model; Semiparametric model; 13. Climate action; soil monitoring; Multiple comparisons problem; 040103 agronomy & agriculture; 0401 agriculture, forestry and fisheries; France; Null hypothesis; [STAT.ME]Statistics [stat]/Methodology [stat.ME] |
Source: | Geoderma, Elsevier, 2014, 219-220, pp.117-124. ⟨10.1016/j.geoderma.2013.12.016⟩ |
ISSN: | 0016-7061, 1872-6259 |
DOI: | 10.1016/j.geoderma.2013.12.016 |
Description: | Investigating the information in the French National Soil Tests database for soil monitoring produces multiple hypothesis testing problems, with hundreds or thousands of test responses to consider simultaneously. A widely used concept of error control in such multiple testing is the expected proportion of falsely rejected hypotheses, or False Discovery Rate (FDR). A related notion, the local FDR (lFDR), can be appropriately represented by considering that the observed p-values come from a two-component mixture model in which the component corresponding to the null hypothesis is known. In this work, we explore different solutions for FDR estimation. In particular, we introduce a specific version of a semi-parametric Expectation-Maximization (EM) algorithm for lFDR estimation, and compare it to classical lFDR estimation using parametric mixtures and to conventional FDR approaches. The performance of the different models for estimating the FDR and related criteria is first illustrated on the results of simulated multiple comparison tests. These approaches are then applied to soil carbon content monitoring on our database. The results show that ignoring FDR estimation can lead to overestimating the number of cantons (locations) subject to a significant change. However, we have detected large numbers of significant changes in the database that occurred during the time period of this study. Globally, losses in organic carbon (OC) are observed in Northern France, along the Atlantic coastal regions, and to a lesser extent in the data collected over the North-Eastern regions. The OC increases are more scattered over the territory. We also use the data to estimate the minimum number of samples needed at each period to detect a given change. |
Database: | OpenAIRE |
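
The description refers to conventional FDR control and to lFDR estimation from a two-component mixture of p-values with a known null component. The following is a minimal sketch of both ideas, not the authors' semi-parametric algorithm: it assumes the Benjamini-Hochberg step-up rule as the conventional approach, and a purely parametric mixture in which the null p-values are Uniform(0,1) and the alternative is Beta(a, 1). The Beta(a, 1) choice and the names `benjamini_hochberg` and `local_fdr_em` are illustrative assumptions.

```python
import math
import random


def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up rule at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0  # largest rank whose p-value is at or below its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    return sorted(order[:k])


def _posterior_null(logp, pi0, a):
    """E-step: posterior probability that each p-value comes from the null."""
    out = []
    for lp in logp:
        # Null density is 1 on (0, 1); assumed alternative density is a * p**(a - 1).
        alt = (1.0 - pi0) * a * math.exp((a - 1.0) * lp)
        out.append(pi0 / (pi0 + alt))
    return out


def local_fdr_em(pvalues, n_iter=200, pi0=0.5, a=0.5):
    """EM for p ~ pi0 * Uniform(0, 1) + (1 - pi0) * Beta(a, 1).

    Returns (pi0, a, lfdr), where lfdr[i] estimates the local FDR of test i.
    The Beta(a, 1) alternative is an assumption made for this sketch.
    """
    eps = 1e-12
    logp = [math.log(min(max(x, eps), 1.0)) for x in pvalues]
    for _ in range(n_iter):
        lfdr = _posterior_null(logp, pi0, a)
        # M-step: update the null proportion, then the weighted MLE of a.
        pi0 = sum(lfdr) / len(lfdr)
        weights = [1.0 - t for t in lfdr]
        denom = sum(w * lp for w, lp in zip(weights, logp))
        if denom >= 0.0:  # degenerate case: no weight left on the alternative
            break
        a = -sum(weights) / denom
    return pi0, a, _posterior_null(logp, pi0, a)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated data: 1600 true nulls, 400 alternatives drawn from Beta(0.2, 1).
    pvals = [rng.random() for _ in range(1600)]
    pvals += [rng.random() ** (1 / 0.2) for _ in range(400)]
    pi0_hat, a_hat, lfdr = local_fdr_em(pvals)
    print("estimated null proportion:", round(pi0_hat, 3))
    print("estimated Beta shape a:", round(a_hat, 3))
    print("BH rejections at 5%:", len(benjamini_hochberg(pvals)))
    print("tests with lFDR < 0.2:", sum(t < 0.2 for t in lfdr))
```

Under this assumed model the M-step has a closed form, because the weighted log-likelihood of Beta(a, 1) is maximized at a = -Σw / Σ(w log p). A semi-parametric variant, as described in the abstract, would replace the parametric alternative with a nonparametric density estimate.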