A Framework for Statistically-Sound Customer Segment Search Authors' Copy
Author: | Amer-Yahia, Sihem; Berti-Equille, Laure; Chibah, Abdelouahab |
---|---|
Contributors: | Laboratoire d'Informatique de Grenoble (LIG), Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), UMR 228 Espace-Dev, Espace pour le développement, Institut de Recherche pour le Développement (IRD)-Université de Perpignan Via Domitia (UPVD)-Avignon Université (AU)-Université de La Réunion (UR)-Université de Montpellier (UM)-Université de Guyane (UG)-Université des Antilles (UA), ANR-19-P3IA-0003, MIAI, MIAI @ Grenoble Alpes (2019) |
Language: | English |
Publication year: | 2021 |
Subject: | |
Source: | The 8th IEEE International Conference on Data Science and Advanced Analytics, Oct 2021, Porto (virtual), Portugal. ⟨10.1109/DSAA53316.2021.9564199⟩ |
DOI: | 10.1109/DSAA53316.2021.9564199 |
Description: | We develop S4, a Statistically-Sound Segment Search framework that combines principled data partitioning with sound statistical testing to verify common hypotheses in retail data and return interpretable customer data segments. The framework accommodates one-sample, two-sample, and multiple-sample testing, providing various aggregations and comparisons of customer transactions. To control the proportion of false discoveries in multiple hypothesis testing, we enforce an FDR-controlling procedure and formulate a unified optimization problem that returns customer data segments that satisfy the test at a given significance level, maximize coverage of the input data, and stay within a given risk capital. We develop a greedy algorithm that explores different data partitions and tests multiple hypotheses in a sound manner. Extensive experiments on four retail data sets examine the interaction between significance, risk, and coverage, and demonstrate the expressivity, usefulness, and scalability of S4 in practice. |
Database: | OpenAIRE |
External link: |