Efficiently mining frequent itemsets applied for textual aggregation

Authors: Mustapha Bouakkaz, Youcef Ouinten, Philippe Fournier-Viger, Sabine Loudcher
Contributors: Université Amar Telidji - Laghouat, Entrepôts, Représentation et Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Université de Moncton
Language: English
Year of publication: 2017
Subject:
Source: Applied Intelligence
Applied Intelligence, Springer Verlag (Germany), 2017, ⟨10.1007/s10489-017-1050-9⟩
ISSN: 0924-669X
1573-7497
DOI: 10.1007/s10489-017-1050-9
Description: International audience; Abstract: Text mining approaches are commonly used to discover relevant information and relationships in huge amounts of text data. The term data mining refers to methods for analyzing data with the objective of finding patterns that aggregate the main properties of the data. Combining data mining approaches with on-line analytical processing (OLAP) tools allows us to refine the techniques used in textual aggregation. In this paper, we propose a novel aggregation function for textual data based on the discovery of frequent closed patterns in a generated documents/keywords matrix. Our contribution uses a data mining technique, specifically a closed pattern mining algorithm, to aggregate keywords. An experimental study on a real corpus of more than 700 scientific papers collected from Microsoft Academic Search shows that the proposed algorithm largely outperforms four state-of-the-art textual aggregation methods in terms of recall, precision, F-measure and runtime.
Database: OpenAIRE
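
Illustrative note: the abstract describes aggregating keywords through frequent closed patterns found in a documents/keywords matrix. The sketch below only illustrates that general idea and is not the authors' algorithm. It uses a brute-force enumeration that is practical only for tiny inputs, and the function name `closed_frequent_itemsets`, the `min_support` parameter, and the toy documents are assumptions made for this example. An itemset is treated as closed when no proper superset has the same support.

```python
from itertools import combinations


def closed_frequent_itemsets(docs, min_support):
    """Return {itemset: support} for closed frequent keyword sets.

    docs        -- list of sets of keywords (rows of a documents/keywords matrix)
    min_support -- minimum number of documents an itemset must appear in
    Hypothetical helper for illustration; brute force, small inputs only.
    """
    keywords = sorted(set().union(*docs))
    frequent = {}
    for size in range(1, len(keywords) + 1):
        found_any = False
        for combo in combinations(keywords, size):
            items = frozenset(combo)
            support = sum(1 for doc in docs if items <= doc)
            if support >= min_support:
                frequent[items] = support
                found_any = True
        if not found_any:
            # Anti-monotonicity: if no itemset of this size is frequent,
            # no larger itemset can be frequent either.
            break
    # Keep only itemsets with no proper superset of equal support.
    return {
        items: support
        for items, support in frequent.items()
        if not any(
            items < other and support == other_support
            for other, other_support in frequent.items()
        )
    }


if __name__ == "__main__":
    # Toy corpus (assumed data): each document is reduced to its keyword set.
    docs = [
        {"olap", "text", "mining"},
        {"olap", "text"},
        {"text", "mining"},
        {"olap", "text", "cube"},
    ]
    result = closed_frequent_itemsets(docs, min_support=2)
    for items, support in sorted(result.items(), key=lambda kv: -kv[1]):
        print(sorted(items), support)
    # ['text'] 4
    # ['olap', 'text'] 3
    # ['mining', 'text'] 2
```

In this toy run, `{olap}` and `{mining}` are frequent but not closed, because `{olap, text}` and `{mining, text}` occur in exactly the same documents. The closed sets therefore give a more compact keyword summary than the full list of frequent itemsets.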