Exact and approximate Boolean matrix decomposition with column-use condition

Autor: Tsunehiko Kameda, Yuan Sun, Shiwei Ye, Yi Sun
Rok vydání: 2016
Předmět:
Zdroj: International Journal of Data Science and Analytics. 1:199-214
ISSN: 2364-4168
2364-415X
DOI: 10.1007/s41060-016-0012-3
Popis: An arbitrary $$m\times n$$ Boolean matrix M can be decomposed exactly as $$M =U\circ V$$ , where U (resp. V) is an $$m\times k$$ (resp. $$k\times n$$ ) Boolean matrix and $$\circ $$ denotes the Boolean matrix multiplication operator. The minimum k is called the Boolean rank of M, and it is known to be NP-hard to find it. With the interpretability issue in data mining applications in mind, we impose the column-use condition that the columns of U form a subset of the columns of the given M, and employ commonly used heuristics to find as small a k as possible.To this end, we first derive an exact closed-form formula, $$J=\overline{\overline{M}^\mathrm{T}\circ M}$$ , such that $$M =M\circ J^\mathrm{T}$$ holds, where J is maximal in the sense that if any 0 element in J is changed to a 1; then, this equality no longer holds. We measure the performance (in minimizing k) of our algorithms on several real benchmark datasets. The results demonstrate that one of our proposed algorithms performs as well or better on all but one of them than other representative heuristic algorithms, which do not impose the column-use condition and thus theoretically should find a smaller k.Boolean matrix decomposition with the column-use condition has wide applications. In educational databases, for example, the “ideal item response matrix” R, the “knowledge state matrix” A, and the “Q-matrix” Q play important roles. As they are related exactly by $$\overline{R}=\overline{A}\circ Q^\mathrm{T}$$ , given R, we can find A and Q with a small number (k) of interpretable “knowledge states,” using our heuristics.
Databáze: OpenAIRE