Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words

Author: Shaobin Huang, Li Rongsheng, Xiangke Mao, Linshan Shen
Language: English
Publication year: 2020
Subject:
Source: IEEE Access, Vol 8, Pp 117528-117538 (2020)
ISSN: 2169-3536
Description: Automatic keyword extraction identifies the words or phrases in a document that best express its main idea. In this paper, we propose an unsupervised keyword extraction framework for individual documents that improves extraction in two respects. In the candidate selection step, we apply stopword removal, regular expression matching, and length filtering to reduce the number of candidate keywords while improving their quality. In the word scoring step, we weight the edges of the graph model using word co-occurrence, semantic relationships (WordNet, word embeddings, Normalized Google Distance), and three ways of combining co-occurrence with semantic relationships. In experiments on two datasets, we compare all of the proposed keyword extraction methods with strong baseline methods, using precision, recall, and F1-measure as evaluation criteria. The results show that methods under the proposed framework perform well. Methods that use both word co-occurrence and semantic relationships extract keywords more effectively than methods that use either one alone. We also find that, for keyword extraction from individual documents, word co-occurrence is more effective than semantic relationships.
Database: OpenAIRE
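
Illustrative sketch: The two steps named in the description (candidate selection, then scoring on a word graph whose edge weights mix co-occurrence with semantic similarity) might look roughly like the following. This is not the authors' implementation. The stopword list, the window size, the linear mixing weight `alpha`, the PageRank-style ranking, and the `toy_similarity` placeholder (a character-bigram overlap standing in for WordNet, word embeddings, or Normalized Google Distance) are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (assumption; a real list would be far larger).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for",
             "on", "that", "which", "can", "from", "or"}


def candidates(text, min_len=3):
    """Candidate selection: regex tokenization, stopword removal, length filter."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_len]


def cooccurrence(tokens, window=3):
    """Count unordered word pairs that appear within `window` consecutive tokens."""
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                counts[frozenset((w, v))] += 1.0
    return counts


def toy_similarity(u, v):
    """Hypothetical stand-in for a semantic measure: character-bigram Jaccard."""
    a = {u[i:i + 2] for i in range(len(u) - 1)}
    b = {v[i:i + 2] for i in range(len(v) - 1)}
    return len(a & b) / len(a | b) if (a | b) else 0.0


def build_graph(tokens, similarity=toy_similarity, alpha=0.5, window=3):
    """Edge weight = alpha * normalized co-occurrence + (1 - alpha) * similarity.

    The linear mix is one assumed way to combine the two signals.
    """
    co = cooccurrence(tokens, window)
    max_co = max(co.values(), default=1.0)
    graph = defaultdict(dict)
    for pair, count in co.items():
        u, v = tuple(pair)
        weight = alpha * (count / max_co) + (1 - alpha) * similarity(u, v)
        graph[u][v] = weight
        graph[v][u] = weight
    return graph


def rank(graph, damping=0.85, iterations=50):
    """Weighted PageRank-style scoring over the undirected word graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            incoming = 0.0
            for neighbor, weight in graph[node].items():
                total = sum(graph[neighbor].values())
                if total > 0:
                    incoming += weight / total * scores[neighbor]
            updated[node] = (1 - damping) + damping * incoming
        scores = updated
    return scores


def extract_keywords(text, top_k=5, **graph_options):
    """Return the `top_k` highest-scoring candidate words."""
    graph = build_graph(candidates(text), **graph_options)
    scores = rank(graph)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

Setting `alpha=1.0` reduces the sketch to a co-occurrence-only graph and `alpha=0.0` to a similarity-only graph, which mirrors the comparison the description reports between combined and single-signal methods.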