A Pre-Screening Approach for Faster Bayesian Network Structure Learning
Authors: | Rahier, Thibaud; Marié, Sylvain; Forbes, Florence |
---|---|
Contributors: | Criteo AI Lab, Criteo [Paris], Schneider Electric (SE), Modèles statistiques bayésiens et des valeurs extrêmes pour données structurées et de grande dimension (STATIFY), Inria Grenoble - Rhône-Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA) |
Language: | English |
Publication year: | 2022 |
Subject: | |
Source: | ECML-PKDD 2022 - European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Sep 2022, Grenoble, France. pp.1-16 |
Description: | International audience; Learning the structure of Bayesian networks from data is an NP-hard problem that involves optimization over a super-exponentially sized space. Still, in many real-life datasets, a number of the arcs contained in the final structure correspond to strongly related pairs of variables and can be identified efficiently with information-theoretic metrics. In this work, we propose a meta-algorithm to accelerate any existing Bayesian network structure learning method. It adds an arc pre-screening step that narrows the structure learning task down to a subset of the original variables, thus reducing the overall problem size. We conduct extensive experiments on both public benchmarks and private industrial datasets, showing that this approach enables a significant decrease in computational time and graph complexity for little to no decrease in performance score. |
Database: | OpenAIRE |
External link: |
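
The pre-screening idea summarized in the description can be illustrated with a minimal sketch. The record does not say which information-theoretic metric the authors use, so the normalized conditional entropy H(X|Y)/H(X), the threshold `eps`, and the function names below are all assumptions made for illustration, not the paper's method. A variable that is almost fully determined by another one gets a pre-screened arc and is dropped from the set passed on to the structure learner.

```python
from collections import Counter
from math import log


def entropy(values):
    """Empirical Shannon entropy (in nats) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Empirical H(X | Y), computed as H(X, Y) - H(Y)."""
    return entropy(list(zip(x, y))) - entropy(y)


def prescreen(data, eps=0.05):
    """Hypothetical pre-screening step (illustrative, not the paper's algorithm).

    data: dict mapping a variable name to a list of discrete observations.
    Returns (arcs, remaining): the pre-screened arcs as (parent, child)
    pairs and the variables left for the downstream structure learner.
    """
    arcs = []
    remaining = list(data)
    for child in list(data):
        h_child = entropy(data[child])
        if h_child == 0:
            continue  # a constant variable carries no information to explain
        for parent in remaining:
            if parent == child:
                continue
            # Assumed criterion: the parent explains almost all of the
            # child's entropy, i.e. H(child | parent) / H(child) <= eps.
            if conditional_entropy(data[child], data[parent]) / h_child <= eps:
                arcs.append((parent, child))
                remaining.remove(child)  # child leaves the learning problem
                break
    return arcs, remaining
```

In this sketch, parents are only chosen among variables that have not been screened out, so the pre-screened arcs cannot form a cycle. Any existing structure learning method would then be run on `remaining` only, and the pre-screened arcs added to its output afterwards.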