Key Concept Identification: A Comprehensive Analysis of Frequency and Topical Graph-Based Approaches

Authors:	Said Jadid Abdul Kadir, Israr Ullah, M.M. Aman, Abas Md Said
Language:	English
Year of publication:	2018
Subject:	keyphrase extraction; key concept extraction; information retrieval; empirical analysis; text mining; Computer science; Engineering and technology; Machine learning; Frequency; Electrical engineering, electronic engineering, information engineering; Concept extraction; Ontology learning; lcsh:T58.5-58.64; lcsh:Information technology; Social sciences; Graph based; Digital library; Information extraction; Graph (abstract data type); Artificial intelligence & image processing; Artificial intelligence; Other social sciences; Sources of error; Information & library sciences; Information Systems
Source:	Information, Vol 9, Iss 5, p 128 (2018)
ISSN:	2078-2489
Description:	Automatic key concept extraction from text is a major challenge in information extraction, information retrieval, digital libraries, ontology learning, and text analysis. Statistical frequency-based ranking and topical graph-based ranking are the two leading kinds of unsupervised approaches devised to address this problem. To exploit the potential of these approaches and improve key concept identification, a comprehensive analysis of their performance on datasets from different domains is needed. This paper presents a comprehensive empirical analysis of selected frequency-based and topical graph-based algorithms for key concept extraction on three different datasets, with the aim of identifying the major sources of error in these approaches. TF-IDF, KP-Miner, and TopicRank are selected for the experimental analysis. Three major sources of error are identified (frequency errors, syntactical errors, and semantical errors), together with the factors that contribute to them. Analysis of the results reveals that these errors significantly degrade the performance of the selected approaches. These findings can inform the future development of an intelligent solution for key concept extraction.
Database:	OpenAIRE
External link:	https://explore.openaire.eu/search/publication?articleId=doi_dedup___::3cfe74f394050d1b133b6753878b2083 http://www.mdpi.com/2078-2489/9/5/128