Exact and approximate Boolean matrix decomposition with column-use condition
Autor: | Tsunehiko Kameda, Yuan Sun, Shiwei Ye, Yi Sun |
---|---|
Rok vydání: | 2016 |
Předmět: |
Discrete mathematics
Applied Mathematics Operator (physics) 0102 computer and information sciences 02 engineering and technology 01 natural sciences Measure (mathematics) Computer Science Applications Matrix decomposition Combinatorics Matrix (mathematics) Computational Theory and Mathematics 010201 computation theory & mathematics 020204 information systems Modeling and Simulation 0202 electrical engineering electronic engineering information engineering Ideal (order theory) Logical matrix Element (category theory) Heuristics Information Systems Mathematics |
Zdroj: | International Journal of Data Science and Analytics. 1:199-214 |
ISSN: | 2364-4168 2364-415X |
DOI: | 10.1007/s41060-016-0012-3 |
Popis: | An arbitrary $$m\times n$$ Boolean matrix M can be decomposed exactly as $$M =U\circ V$$ , where U (resp. V) is an $$m\times k$$ (resp. $$k\times n$$ ) Boolean matrix and $$\circ $$ denotes the Boolean matrix multiplication operator. The minimum k is called the Boolean rank of M, and it is known to be NP-hard to find it. With the interpretability issue in data mining applications in mind, we impose the column-use condition that the columns of U form a subset of the columns of the given M, and employ commonly used heuristics to find as small a k as possible.To this end, we first derive an exact closed-form formula, $$J=\overline{\overline{M}^\mathrm{T}\circ M}$$ , such that $$M =M\circ J^\mathrm{T}$$ holds, where J is maximal in the sense that if any 0 element in J is changed to a 1; then, this equality no longer holds. We measure the performance (in minimizing k) of our algorithms on several real benchmark datasets. The results demonstrate that one of our proposed algorithms performs as well or better on all but one of them than other representative heuristic algorithms, which do not impose the column-use condition and thus theoretically should find a smaller k.Boolean matrix decomposition with the column-use condition has wide applications. In educational databases, for example, the “ideal item response matrix” R, the “knowledge state matrix” A, and the “Q-matrix” Q play important roles. As they are related exactly by $$\overline{R}=\overline{A}\circ Q^\mathrm{T}$$ , given R, we can find A and Q with a small number (k) of interpretable “knowledge states,” using our heuristics. |
Databáze: | OpenAIRE |
Externí odkaz: |