A Pre-Screening Approach for Faster Bayesian Network Structure Learning

Author:	Rahier, Thibaud; Marié, Sylvain; Forbes, Florence
Contributors:	Criteo AI Lab, Criteo [Paris], Schneider Electric (SE), Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension (STATIFY), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)
Language:	English
Publication year:	2022
Subject:	Structure learning; Bayesian networks; Information theory; Determinism; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; Screening; Conditional entropy; Functional relations
Source:	ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2022, Grenoble, France. pp. 1-16
Description:	International audience. Learning the structure of Bayesian networks from data is an NP-hard problem that involves optimization over a super-exponentially large search space. Still, in many real-life datasets, a number of the arcs in the final structure correspond to strongly related pairs of variables and can be identified efficiently with information-theoretic metrics. In this work, we propose a meta-algorithm to accelerate any existing Bayesian network structure learning method. It adds an arc pre-screening step that narrows the structure learning task down to a subset of the original variables, thus reducing the overall problem size. We conduct extensive experiments on both public benchmarks and private industrial datasets, showing that this approach yields a significant decrease in computational time and graph complexity with little to no decrease in performance score.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=od_______165::24bb505197fe742100f3582be98494f2 https://hal.science/hal-03873684v2/document
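
Illustrative note: the description mentions identifying strongly related pairs of variables with information-theoretic metrics, and the subject keywords list conditional entropy and determinism. The following is a minimal sketch of one way such a pre-screening pass could look for discrete data, flagging an arc when the empirical conditional entropy of one variable given another falls at or below a threshold. The function names, the `epsilon` parameter, and the toy data are assumptions made for illustration and are not taken from the paper; the sketch covers only the arc-flagging step, not the reduction of the learning task or the downstream structure learner.

```python
from collections import Counter
from itertools import permutations
from math import log2


def conditional_entropy(xs, ys):
    """Empirical H(X | Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))   # counts of (x, y) pairs
    marginal_y = Counter(ys)       # counts of y values
    # H(X|Y) = -sum_{x,y} p(x,y) * log2(p(x,y) / p(y))
    return -sum((c / n) * log2(c / marginal_y[y]) for (_, y), c in joint.items())


def prescreen_arcs(data, epsilon=0.0):
    """Return (parent, child) pairs with H(child | parent) <= epsilon.

    `epsilon` is a hypothetical tolerance: 0.0 keeps only exact functional
    relations in the sample, larger values admit nearly deterministic ones.
    """
    arcs = []
    for parent, child in permutations(data, 2):
        if conditional_entropy(data[child], data[parent]) <= epsilon:
            arcs.append((parent, child))
    return arcs


# Toy dataset (made up): `zip` determines `region`, `noise` is unrelated.
data = {
    "zip":    ["A", "B", "C", "D", "A", "B"],
    "region": ["north", "north", "south", "south", "north", "north"],
    "noise":  [0, 1, 0, 1, 1, 0],
}

print(prescreen_arcs(data))  # [('zip', 'region')]
```

In this toy sample only the arc from `zip` to `region` is flagged, because `region` is a function of `zip` but not the other way around.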