FastRCA-Seq: An efficient approach for extracting hierarchies of multilevel closed partially-ordered patterns
Author: | Cristina Nica, Adrian Groza, Victor-Petru Almăşan |
---|---|
Year of publication: | 2020 |
Subject: | Hierarchy; Information Systems and Management; Sequence database; Computer science; 02 engineering and technology; computer.software_genre; Management Information Systems; TheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGES; Artificial Intelligence; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Formal concept analysis; Benchmark (computing); Key (cryptography); 020201 artificial intelligence & image processing; State (computer science); Data mining; Sequential Pattern Mining; computer; Software; Hardware_LOGICDESIGN; Interpretability |
Source: | Knowledge-Based Systems, 210:106533 |
ISSN: | 0950-7051 |
DOI: | 10.1016/j.knosys.2020.106533 |
Description: | Discovering concise representations of sequential patterns in sequential data is a well-established data mining task. Recently, Nica et al. put forward an original approach, RCA-Seq, for directly extracting a hierarchy of multilevel closed partially-ordered patterns (MCPO-patterns) from a sequence database within the Relational Concept Analysis (RCA) framework. RCA-Seq has been applied successfully to small (∼1,000 sequences) but interesting real hydro-ecological datasets. However, RCA-Seq focuses on providing comprehensible results at the expense of performance. To improve the performance of RCA-Seq, we propose a new approach, FastRCA-Seq, which stems from RCA-Seq and whose contributions benefit two fields: Formal Concept Analysis (namely its RCA extension) and sequential pattern mining. FastRCA-Seq comprises two key steps: the exploration of sequential data based on RCA, and the extraction of MCPO-patterns by navigating the RCA result. First, our approach introduces an effective RCA implementation based on bit-array representations, bitwise operations, parallel computing, and several new properties of RCA that may avoid expensive computations. In addition, we identify the bottleneck of RCA. Second, FastRCA-Seq is a self-contained approach for directly and efficiently mining hierarchies of MCPO-patterns from sequential data. We assess FastRCA-Seq on several benchmark datasets, namely Gazelle, Kosarak, and FIFA. The results show that FastRCA-Seq outperforms RCA-Seq in execution time (on average ∼169 times faster) and memory usage (on average ∼42% less), while preserving the interpretability and usability of the results for stakeholders. |
Database: | OpenAIRE |
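The abstract credits part of the speed-up to bit-array representations and bitwise operations. The following is a minimal sketch of that general idea in plain Formal Concept Analysis. It is not the FastRCA-Seq implementation. It covers a single formal context only and does not model RCA's relational scaling or its parallelism. The toy context, the object and attribute names, and the helper functions are all hypothetical.

Each object's attributes are stored as an integer bit array, and each attribute's objects are stored the same way. Both derivation operators then reduce to bitwise ANDs, and applying one after the other yields the closure that defines a formal concept.

```python
from typing import List, Set, Tuple

# Hypothetical toy formal context (not from the paper): objects x attributes.
ATTRIBUTES: List[str] = ["a", "b", "c", "d"]
OBJECTS: List[str] = ["s1", "s2", "s3"]
INCIDENCE = {"s1": {"a", "b"}, "s2": {"a", "c"}, "s3": {"a", "b", "d"}}

# Bit arrays as Python ints. Bit i of ROWS[j] is set if object j has attribute i.
# Bit j of COLS[i] is set if attribute i belongs to object j.
# Summing distinct powers of two is equivalent to OR-ing them.
ROWS = [sum(1 << ATTRIBUTES.index(x) for x in INCIDENCE[o]) for o in OBJECTS]
COLS = [sum(1 << j for j, o in enumerate(OBJECTS) if a in INCIDENCE[o]) for a in ATTRIBUTES]
ALL_OBJECTS = (1 << len(OBJECTS)) - 1
ALL_ATTRIBUTES = (1 << len(ATTRIBUTES)) - 1


def extent(attr_mask: int) -> int:
    """Objects having every attribute in attr_mask: AND of attribute columns."""
    result = ALL_OBJECTS  # identity for AND: the empty attribute set is shared by all objects
    for i, col in enumerate(COLS):
        if attr_mask >> i & 1:
            result &= col
    return result


def intent(obj_mask: int) -> int:
    """Attributes shared by every object in obj_mask: AND of object rows."""
    result = ALL_ATTRIBUTES  # identity for AND: the empty object set shares all attributes
    for j, row in enumerate(ROWS):
        if obj_mask >> j & 1:
            result &= row
    return result


def decode(mask: int, names: List[str]) -> Set[str]:
    """Turn a bit array back into a set of names."""
    return {n for i, n in enumerate(names) if mask >> i & 1}


def all_concepts() -> Set[Tuple[int, int]]:
    """Naive enumeration: close every attribute subset (only viable for tiny contexts)."""
    concepts = set()
    for mask in range(ALL_ATTRIBUTES + 1):
        ext = extent(mask)
        concepts.add((ext, intent(ext)))
    return concepts


if __name__ == "__main__":
    for ext, itt in sorted(all_concepts(), key=lambda c: bin(c[1]).count("1")):
        print(sorted(decode(ext, OBJECTS)), sorted(decode(itt, ATTRIBUTES)))
```

In this toy context, closing {b} gives {a, b}, because every object with b also has a. The brute-force enumeration is exponential in the number of attributes. Practical concept miners use dedicated algorithms such as NextClosure or the In-Close family. This sketch only shows why bitwise operations make each derivation step cheap.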