Efficiently mining frequent itemsets applied for textual aggregation
Authors: | Mustapha Bouakkaz, Youcef Ouinten, Philippe Fournier-Viger, Sabine Loudcher |
---|---|
Contributors: | Université Amar Telidji - Laghouat, Entrepôts, Représentation et Ingénierie des Connaissances (ERIC), Université Lumière - Lyon 2 (UL2)-Université Claude Bernard Lyon 1 (UCBL), Université de Lyon-Université de Lyon, Université de Moncton |
Language: | English |
Publication year: | 2017 |
Subject: |
Information retrieval
OLAP; [SHS.STAT]Humanities and Social Sciences/Methods and statistics; business.industry; Computer science; Data stream mining; media_common.quotation_subject; Online analytical processing; Aggregate (data warehouse); Closed keywords; 02 engineering and technology; computer.software_genre; Term (time); Textual aggregation; Text mining; Artificial Intelligence; 020204 information systems; 0202 electrical engineering electronic engineering information engineering; 020201 artificial intelligence & image processing; Data mining; business; Function (engineering); computer; media_common |
Source: | Applied Intelligence, Springer Verlag (Germany), 2017, ⟨10.1007/s10489-017-1050-9⟩ |
ISSN: | 0924-669X 1573-7497 |
DOI: | 10.1007/s10489-017-1050-9 |
Description: | Text mining approaches are commonly used to discover relevant information and relationships in huge amounts of text data. The term data mining refers to methods for analyzing data with the objective of finding patterns that aggregate the main properties of the data. The merger between data mining approaches and on-line analytical processing (OLAP) tools allows us to refine techniques used in textual aggregation. In this paper, we propose a novel aggregation function for textual data based on the discovery of frequent closed patterns in a generated documents/keywords matrix. Our contribution aims at using a data mining technique, mainly a closed pattern mining algorithm, to aggregate keywords. An experimental study on a real corpus of more than 700 scientific papers collected from Microsoft Academic Search shows that the proposed algorithm largely outperforms four state-of-the-art textual aggregation methods in terms of recall, precision, F-measure and runtime. |
Database: | OpenAIRE |
External link: |