Simple Pattern Minimality Problems: Integer Linear Programming Formulations and Covering-Based Heuristic Solving Approaches
Autor: | Claudio Sterle, Antonio Sforza, Maurizio Boccia |
---|---|
Přispěvatelé: | Boccia, M., Sforza, A., Sterle, C. |
Rok vydání: | 2020 |
Předmět: | |
Zdroj: | INFORMS Journal on Computing. |
ISSN: | 1526-5528 1091-9856 |
DOI: | 10.1287/ijoc.2019.0940 |
Popis: | The simple pattern minimality problem (SPMP) represents a central problem in the logical analysis of data and association rules mining, and it finds applications in several fields as logic synthesis, reliability analysis, and automated reasoning. It consists of determining the minimum number of patterns explaining all the observations of a data set, that is, a Boolean logic formula that is true for all the elements of the data set and false for all the unseen observations. We refer to this problem as covering SPMP (C-SPMP), because each observation can be explained (covered) by more than one pattern. Starting from a real industrial application, we also define a new version of the problem, and we refer to it as partitioning SPMP (P-SPMP), because each observation has to be covered just once. Given a propositional formula or a truth table, C-SPMP and P-SPMP coincide exactly with the problem of determining the minimum disjunctive and minimum exclusive disjunctive normal form, respectively. Both problems are known to be NP-hard and have been generally tackled by heuristic methods. In this context, the contribution of this work is twofold. On one side, it provides two original integer linear programming formulations for the two variants of the SPMP. These formulations exploit the concept of Boolean hypercube to build a graph representation of the problems and allow to exactly solve instances with more than 1,000 observations by using an MIP solver. On the other side, two effective and fast heuristics are proposed to solve relevant size instances taken from literature (SeattleSNPs) and from the industrial database. The proposed methods do not suffer from the same dimensional drawbacks of the methods present in the literature and outperform either existing commercial and freeware logic tools or the available industrial solutions in the number of generated patterns and/or in the computational burden. |
Databáze: | OpenAIRE |
Externí odkaz: |