Automatic Keywords Extraction Based on Co-Occurrence and Semantic Relationships Between Words
Author: | Shaobin Huang, Li Rongsheng, Xiangke Mao, Linshan Shen |
---|---|
Language: | English |
Year of publication: | 2020 |
Subject: |
semantic similarity; Word embedding; General Computer Science; Matching (graph theory); Recall; business.industry; Computer science; General Engineering; Co-occurrence; WordNet; computer.software_genre; word co-occurrence; Automatic keywords extraction; Selection (linguistics); General Materials Science; graph model; Artificial intelligence; TextRank; lcsh:Electrical engineering. Electronics. Nuclear engineering; Normalized Google distance; business; computer; lcsh:TK1-9971; Word (computer architecture); Natural language processing |
Source: | IEEE Access, Vol. 8, pp. 117528-117538 (2020) |
ISSN: | 2169-3536 |
Description: | Automatic keyword extraction is the task of extracting from a document the words or phrases that express its main idea. In this paper, we propose an unsupervised keyword extraction framework for individual documents that improves keyword extraction in two respects. In the candidate keyword selection step, we remove stopwords and apply regular-expression matching and length filtering, which reduces the number of candidate keywords while improving their quality. In the word scoring step, we weight the edges of the graph model using word co-occurrence, semantic relationships (WordNet, word embedding, Normalized Google Distance), and three ways of combining co-occurrence with semantic relationships. In experiments on two datasets, we use precision, recall, and F1-measure as evaluation criteria to compare all of our proposed keyword extraction methods with other strong baseline methods. The experimental results show that the methods under our proposed framework perform well. We verify that methods using both word co-occurrence and semantic relationships extract keywords more effectively than methods using co-occurrence or semantic relationships alone. We also find that, for keyword extraction from individual documents, word co-occurrence is more effective than semantic relationships. |
Database: | OpenAIRE |
External link: |
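The description outlines a graph-based pipeline: candidate filtering, graph edges weighted by co-occurrence and semantic similarity, TextRank-style ranking, and evaluation by precision, recall, and F1. The Python sketch below illustrates that general technique under stated assumptions; it is not the authors' implementation. The stopword list, the regex, the window size, the linear mixing weight `alpha`, and `toy_similarity` (a hypothetical character-bigram stand-in for WordNet, word-embedding, or Normalized Google Distance similarity) are all illustrative choices. Linear interpolation is only one possible way to combine the two signals and is not necessarily one of the paper's three combinations.

```python
import re
from collections import defaultdict

# ASSUMPTION: tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "for", "to", "is",
             "are", "we", "from", "with", "that", "which", "this", "by", "than"}
# ASSUMPTION: "regular matching" modeled as a regex keeping alphabetic tokens.
WORD_RE = re.compile(r"^[a-z][a-z-]*$")


def candidates(text, min_len=3):
    """Candidate selection: stopword removal, regex matching, length filtering."""
    tokens = re.findall(r"[a-zA-Z-]+", text.lower())
    return [t for t in tokens
            if t not in STOPWORDS and WORD_RE.match(t) and len(t) >= min_len]


def cooccurrence(tokens, window=2):
    """Count pairs of distinct candidates inside a sliding window of `window` tokens."""
    counts = defaultdict(float)
    for i, w in enumerate(tokens):
        for v in tokens[i + 1:i + window]:
            if v != w:
                counts[tuple(sorted((w, v)))] += 1.0  # undirected edge key
    return dict(counts)


def toy_similarity(a, b):
    """HYPOTHETICAL stand-in for WordNet / word-embedding / NGD similarity:
    Jaccard overlap of character bigrams, in [0, 1]."""
    ga = {a[i:i + 2] for i in range(len(a) - 1)}
    gb = {b[i:i + 2] for i in range(len(b) - 1)}
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def combined_weights(tokens, similarity, alpha=0.5, window=2):
    """ASSUMPTION: linear mix of normalized co-occurrence and semantic similarity."""
    co = cooccurrence(tokens, window)
    if not co:
        return {}
    max_co = max(co.values())
    return {(a, b): alpha * (c / max_co) + (1 - alpha) * similarity(a, b)
            for (a, b), c in co.items()}


def textrank(weights, d=0.85, iters=50):
    """Weighted PageRank over an undirected graph (TextRank-style scoring)."""
    nbrs = defaultdict(dict)
    for (a, b), w in weights.items():
        nbrs[a][b] = w
        nbrs[b][a] = w
    out_sum = {n: sum(ws.values()) for n, ws in nbrs.items()}
    score = {n: 1.0 for n in nbrs}
    for _ in range(iters):
        # The comprehension reads the previous iteration's scores throughout.
        score = {n: (1 - d) + d * sum(w / out_sum[m] * score[m]
                                      for m, w in nbrs[n].items() if out_sum[m] > 0)
                 for n in nbrs}
    return score


def extract_keywords(text, k=5, alpha=0.5, window=2, similarity=toy_similarity):
    scores = textrank(combined_weights(candidates(text), similarity, alpha, window))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def precision_recall_f1(predicted, gold):
    """Set-based precision, recall and F1 against gold-standard keywords."""
    p, g = set(predicted), set(gold)
    tp = len(p & g)
    precision = tp / len(p) if p else 0.0
    recall = tp / len(g) if g else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    doc = ("Keyword extraction ranks candidate words in a graph model. "
           "Edges in the graph model combine word co-occurrence with semantic similarity.")
    top = extract_keywords(doc, k=3)
    print(top)
    print(precision_recall_f1(top, ["graph", "keyword", "co-occurrence"]))
```

Setting `alpha=1.0` reduces the edge weights to pure co-occurrence and `alpha=0.0` to pure (placeholder) semantic similarity, which mirrors the kind of comparison the description reports; swapping `toy_similarity` for a real WordNet or embedding similarity is left as an assumption for the reader.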