Exact and approximate Boolean matrix decomposition with column-use condition

Autor:	Tsunehiko Kameda, Yuan Sun, Shiwei Ye, Yi Sun
Rok vydání:	2016
Předmět:	Discrete mathematics Applied Mathematics Operator (physics) 0102 computer and information sciences 02 engineering and technology 01 natural sciences Measure (mathematics) Computer Science Applications Matrix decomposition Combinatorics Matrix (mathematics) Computational Theory and Mathematics 010201 computation theory & mathematics 020204 information systems Modeling and Simulation 0202 electrical engineering electronic engineering information engineering Ideal (order theory) Logical matrix Element (category theory) Heuristics Information Systems Mathematics
Zdroj:	International Journal of Data Science and Analytics. 1:199-214
ISSN:	2364-4168 2364-415X
DOI:	10.1007/s41060-016-0012-3
Popis:	An arbitrary $$m\times n$$ Boolean matrix M can be decomposed exactly as $$M =U\circ V$$ , where U (resp. V) is an $$m\times k$$ (resp. $$k\times n$$ ) Boolean matrix and $$\circ $$ denotes the Boolean matrix multiplication operator. The minimum k is called the Boolean rank of M, and it is known to be NP-hard to find it. With the interpretability issue in data mining applications in mind, we impose the column-use condition that the columns of U form a subset of the columns of the given M, and employ commonly used heuristics to find as small a k as possible.To this end, we first derive an exact closed-form formula, $$J=\overline{\overline{M}^\mathrm{T}\circ M}$$ , such that $$M =M\circ J^\mathrm{T}$$ holds, where J is maximal in the sense that if any 0 element in J is changed to a 1; then, this equality no longer holds. We measure the performance (in minimizing k) of our algorithms on several real benchmark datasets. The results demonstrate that one of our proposed algorithms performs as well or better on all but one of them than other representative heuristic algorithms, which do not impose the column-use condition and thus theoretically should find a smaller k.Boolean matrix decomposition with the column-use condition has wide applications. In educational databases, for example, the “ideal item response matrix” R, the “knowledge state matrix” A, and the “Q-matrix” Q play important roles. As they are related exactly by $$\overline{R}=\overline{A}\circ Q^\mathrm{T}$$ , given R, we can find A and Q with a small number (k) of interpretable “knowledge states,” using our heuristics.
Databáze:	OpenAIRE
Externí odkaz:	https://explore.openaire.eu/search/publication?articleId=doi_________::1ffa9c4c66c9285ee019ea2ce3c9a608 https://doi.org/10.1007/s41060-016-0012-3 Zobrazit plný text záznamu Full text from SpringerLink