A central limit theorem for Lp transportation cost on the real line with application to fairness assessment in machine learning

Authors: Paula Gordaliza, Jean-Michel Loubes, Eustasio del Barrio
Contributors: Institut de Mathématiques de Toulouse UMR5219 (IMT), Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Institut National des Sciences Appliquées (INSA)-Université Toulouse 1 Capitole (UT1), Université Fédérale Toulouse Midi-Pyrénées-Université Fédérale Toulouse Midi-Pyrénées-Université Toulouse - Jean Jaurès (UT2J)-Université Toulouse III - Paul Sabatier (UT3), Université Fédérale Toulouse Midi-Pyrénées-Centre National de la Recherche Scientifique (CNRS), ANR-19-P3IA-0004,ANITI,Artificial and Natural Intelligence Toulouse Institute(2019), Université Toulouse Capitole (UT Capitole), Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National des Sciences Appliquées - Toulouse (INSA Toulouse), Institut National des Sciences Appliquées (INSA)-Université de Toulouse (UT)-Institut National des Sciences Appliquées (INSA)-Université Toulouse - Jean Jaurès (UT2J), Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3), Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)
Language: English
Year of publication: 2019
Subject:
Source: Information and Inference
Information and Inference, Oxford University Press (OUP), 2019, 8 (4), pp.817-849. ⟨10.1093/imaiai/iaz016⟩
ISSN: 2049-8764
2049-8772
DOI: 10.1093/imaiai/iaz016
Description: We provide a central limit theorem for the Monge–Kantorovich distance between two empirical distributions with sizes $n$ and $m$, $\mathcal{W}_p(P_n,Q_m), \ p\geqslant 1,$ for observations on the real line. In the case $p>1$ our assumptions are sharp in terms of moments and smoothness. We prove results dealing with the choice of centring constants. We provide a consistent estimate of the asymptotic variance, which makes it possible to build two-sample tests and confidence intervals to certify the similarity between two distributions. These are then used to assess a new criterion of data set fairness in classification.
Database: OpenAIRE
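
Illustration: the statistic named in the abstract, $\mathcal{W}_p(P_n,Q_m)$ for samples on the real line, can be computed exactly from the two sorted samples, because in one dimension the optimal transport cost equals the integral over $(0,1)$ of the $p$-th power of the gap between the empirical quantile functions. The sketch below is not taken from the paper; the function name `empirical_wasserstein` and its arguments are hypothetical, and it covers only the distance itself, not the variance estimate, tests, or confidence intervals the authors derive.

```python
import numpy as np

def empirical_wasserstein(x, y, p=2.0):
    """W_p between the empirical distributions of two real-valued samples.

    Hypothetical helper for illustration; assumes p >= 1 and non-empty samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    n, m = len(x), len(y)

    # The empirical quantile functions are piecewise constant, with jumps
    # at i/n and j/m respectively; merge both sets of breakpoints.
    grid = np.unique(np.concatenate([np.arange(n + 1) / n,
                                     np.arange(m + 1) / m]))
    widths = np.diff(grid)
    mid = (grid[:-1] + grid[1:]) / 2

    # On the interval containing t, the quantile of x is x[floor(t * n)].
    ix = np.minimum((mid * n).astype(int), n - 1)
    iy = np.minimum((mid * m).astype(int), m - 1)

    # Integrate |F_n^{-1}(t) - G_m^{-1}(t)|^p over (0, 1), then take the p-th root.
    cost = np.sum(widths * np.abs(x[ix] - y[iy]) ** p)
    return cost ** (1.0 / p)
```

For example, `empirical_wasserstein([0.0, 2.0], [1.0], p=1)` moves each half of the mass a distance of 1, so the result is 1. The samples may have different sizes, matching the $n \neq m$ setting of the abstract.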