A Pre-Screening Approach for Faster Bayesian Network Structure Learning

Authors: Rahier, Thibaud; Marié, Sylvain; Forbes, Florence
Contributors: Criteo AI Lab, Criteo [Paris], Schneider Electric (SE), Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension (STATIFY), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)
Language: English
Year of publication: 2022
Source: ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2022, Grenoble, France, pp. 1-16
Description: Learning the structure of Bayesian networks from data is an NP-hard problem that involves optimization over a super-exponentially large search space. Still, in many real-life datasets, a number of the arcs contained in the final structure correspond to strongly related pairs of variables and can be identified efficiently with information-theoretic metrics. In this work, we propose a meta-algorithm to accelerate any existing Bayesian network structure learning method. It adds an arc pre-screening step that narrows the structure learning task down to a subset of the original variables, thus reducing the overall problem size. We conduct extensive experiments on both public benchmarks and private industrial datasets, showing that this approach significantly decreases computational time and graph complexity with little to no decrease in performance score.
Database: OpenAIRE
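The pre-screening idea in the description can be illustrated with a minimal sketch. It assumes discrete variables and uses the empirical conditional entropy H(Y | X) = H(X, Y) - H(X) as the information-theoretic metric: when H(Y | X) is at or below a small threshold, Y is treated as (quasi-)determined by X, the arc X -> Y is fixed up front, and Y is removed from the set handed to the structure learner. The metric choice, the threshold `eps`, the greedy ordering, and the names `entropy`, `conditional_entropy`, and `prescreen` are all hypothetical assumptions of this sketch, not the authors' exact criterion or implementation.

```python
from collections import Counter
from itertools import permutations
from math import log2


def entropy(*columns):
    """Empirical joint entropy (bits) of one or more equal-length discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(y, x):
    """H(Y | X) = H(X, Y) - H(X), estimated from the empirical distribution."""
    return entropy(x, y) - entropy(x)


def prescreen(data, eps=1e-9):
    """Hypothetical pre-screening step (assumption: conditional-entropy criterion).

    data: dict mapping variable name -> list of discrete values.
    Returns (arcs, remaining): arcs X -> Y fixed before structure learning
    because H(Y | X) <= eps, and the variables still passed to the learner.
    """
    scored = sorted(
        (conditional_entropy(data[y], data[x]), x, y)
        for x, y in permutations(data, 2)
    )
    arcs, screened, parents = [], set(), set()
    for h, x, y in scored:
        if h > eps:
            break  # scores are sorted, so no later pair passes the threshold
        # Keep prescreened arcs bipartite (kept parent -> screened child);
        # such a graph cannot contain a cycle.
        if x in screened or y in screened or y in parents:
            continue
        arcs.append((x, y))
        screened.add(y)
        parents.add(x)
    remaining = [v for v in data if v not in screened]
    return arcs, remaining


# Toy example (made-up data): B = A mod 2 is fully determined by A.
data = {
    "A": [0, 1, 2, 0, 1, 2, 0, 1],
    "B": [0, 1, 0, 0, 1, 0, 0, 1],
    "C": [0, 0, 1, 1, 0, 1, 1, 0],
}
arcs, remaining = prescreen(data)
print(arcs)       # [('A', 'B')]
print(remaining)  # ['A', 'C']
```

In this sketch, any structure learning method would then run only on `remaining`, and the arcs in `arcs` would be added back to the learned graph. Restricting prescreened arcs to go from kept variables to screened ones is one simple way to guarantee the combined graph stays acyclic, since screened variables never act as parents in the prescreened set and are absent from the learned part.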