Ranking in keyphrase extraction problem: is it suitable to use statistics of words occurrences?

Authors: S. V. Popova, I. A. Khodyrev
Language: English, Russian
Year of publication: 2018
Subject:
Source: Труды Института системного программирования РАН, Vol 26, Iss 4, Pp 123-136 (2018)
Document type: article
ISSN: 2079-8156, 2220-6426
DOI: 10.15514/ISPRAS-2014-26(4)-10
Description: The paper addresses the keyphrase extraction problem for single documents, e.g. scientific abstracts. Keyphrase extraction is an important task, and its results can be used in a variety of applications: data indexing, clustering and classification of documents, meta-information extraction, automatic ontology creation, etc. We discuss an approach to keyphrase extraction whose first step is to build candidate phrases, which are then ranked, with the best selected as keyphrases. The paper focuses on evaluating approaches to weighting candidate phrases in unsupervised extraction methods. A number of in-phrase word weighting procedures are evaluated. Unsuitable weighting approaches are identified. Testing shows that some approaches are equivalent when applied to keyphrase extraction. A feature is proposed that improves the quality of extracted keyphrases and yields better results than the state of the art. Experiments are based on the Inspec dataset.
Database: Directory of Open Access Journals