Mining Keywords from Short Text Based on LDA-Based Hierarchical Semantic Graph Model

Author:	Zhengtao Yu, Chen Wei, Yonghua Wen, Wang Zhenhan, Yantuan Xian
Year of publication:	2020
Subjects:	Information Systems and Management; Sorting algorithm; Computer science; Strategy and Management; Association (object-oriented programming); 02 engineering and technology; Management Science and Operations Research; computer.software_genre; Management Information Systems; Ranking (information retrieval); Set (abstract data type); 0202 electrical engineering, electronic engineering, information engineering; 0501 psychology and cognitive sciences; Word2vec; Layer (object-oriented design); 050107 human factors; business.industry; 05 social sciences; ComputingMethodologies_DOCUMENTANDTEXTPROCESSING; Key (cryptography); Graph (abstract data type); 020201 artificial intelligence & image processing; Artificial intelligence; business; computer; Natural language processing; Information Systems
Source:	International Journal of Information Systems in the Service Sector, vol. 12, pp. 76-87
ISSN:	1935-5696, 1935-5688
DOI:	10.4018/ijisss.2020040106
Abstract:	Extracting keywords from a collection of texts is an important task, yet most previous studies extract keywords from a single text. Previous methods for extracting keywords from text collections have made little use of the key topics in the collection, of the cross-text associations between topics, or of the cross-text associations between words. To improve the accuracy of keyword extraction from text collections, by exploiting the semantic relationships between topics and highlighting the semantic relationships between words under the key topics, this article proposes an unsupervised method for mining keywords from short-text collections. The method uses a two-level semantic association model that links the semantic relations between topics with the semantic relations between words and extracts keywords from their combined effect. First, the texts are represented with LDA; the authors use word2vec to compute the semantic association between pairs of topics and build a semantic relation graph over topics (the upper-level graph), on which a graph ranking algorithm computes a score for each topic. In the lower level, a word graph is constructed in which the semantic association between pairs of words is computed from the topic scores and the topic relationships in the upper-level network. A graph ranking algorithm then ranks the words in the short-text collection to determine the keywords. The experimental results show that the method performs better at extracting keywords from text collections, especially collections of short texts. Taken together, the important topics, the relationships between topics, and the correlations between words improve the accuracy of keyword extraction from text collections.
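The abstract names the building blocks (LDA topics, word2vec similarity, an upper-level topic graph, a lower-level word graph, graph ranking) but not the exact formulas. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It takes an already-fitted LDA topic-word matrix `phi` and word vectors `word_vecs` as plain inputs (toy values here, not trained models); it assumes each topic is embedded as the p(w|z)-weighted sum of its word vectors, that weighted PageRank serves as the graph ranking algorithm, and that the word-word weight is a hypothetical sum over topic pairs of topic score × topic relatedness × p(w_i|z_k) × p(w_j|z_l). All function names are illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def weighted_pagerank(weights, damping=0.85, iters=200, tol=1e-12):
    """Weighted PageRank by power iteration; all-zero rows are dangling nodes
    whose mass is spread uniformly, so the scores always sum to 1."""
    n = len(weights)
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iters):
        dangling = sum(scores[j] for j in range(n) if out_sum[j] == 0)
        new = []
        for i in range(n):
            incoming = sum(weights[j][i] / out_sum[j] * scores[j]
                           for j in range(n) if out_sum[j] > 0)
            new.append((1 - damping) / n + damping * (incoming + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new, scores)) < tol
        scores = new
        if converged:
            break
    return scores


def topic_vector(phi_k, vocab, word_vecs):
    """Hypothetical topic embedding: p(w|z)-weighted sum of the word vectors."""
    vec = [0.0] * len(word_vecs[vocab[0]])
    for w, p in zip(vocab, phi_k):
        for d, x in enumerate(word_vecs[w]):
            vec[d] += p * x
    return vec


def extract_keywords(phi, vocab, word_vecs, top_n=3):
    """Two-level ranking: topics first (upper graph), then words (lower graph)."""
    kc = len(phi)
    # Upper level: topic-topic graph weighted by non-negative cosine similarity.
    tvecs = [topic_vector(phi[k], vocab, word_vecs) for k in range(kc)]
    rel = [[0.0 if k == l else max(cosine(tvecs[k], tvecs[l]), 0.0)
            for l in range(kc)] for k in range(kc)]
    topic_scores = weighted_pagerank(rel)
    # Lower level (assumed form): words are linked through the topics they load on,
    # scaled by topic scores and topic relatedness (a topic relates to itself with 1.0).
    n = len(vocab)
    word_w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                word_w[i][j] = sum(
                    topic_scores[k] * topic_scores[l]
                    * (1.0 if k == l else rel[k][l])
                    * phi[k][i] * phi[l][j]
                    for k in range(kc) for l in range(kc))
    word_scores = weighted_pagerank(word_w)
    ranked = sorted(range(n), key=lambda i: word_scores[i], reverse=True)
    return [(vocab[i], word_scores[i]) for i in ranked[:top_n]]


# Toy inputs standing in for a fitted LDA model and trained word2vec vectors.
vocab = ["graph", "ranking", "topic", "lda", "service", "sector"]
phi = [
    [0.35, 0.35, 0.10, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.40, 0.40, 0.05, 0.05],
    [0.02, 0.03, 0.05, 0.05, 0.45, 0.40],
]
word_vecs = {
    "graph": [0.9, 0.1, 0.0], "ranking": [0.8, 0.2, 0.1],
    "topic": [0.2, 0.9, 0.1], "lda": [0.1, 0.9, 0.2],
    "service": [0.0, 0.2, 0.9], "sector": [0.1, 0.1, 0.9],
}
print(extract_keywords(phi, vocab, word_vecs, top_n=3))
```

Because the topic relatedness matrix is symmetric, the assumed word-word weights are symmetric too, so the lower level behaves like an undirected TextRank-style graph. Trying a different ranking algorithm or word-weight formula only requires changing `weighted_pagerank` or the inner sum in `extract_keywords`.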
Database:	OpenAIRE
External links:	https://explore.openaire.eu/search/publication?articleId=doi_________::d85c603b27c54d193b2e02366fc2c621 https://doi.org/10.4018/ijisss.2020040106