Estimating the Quality of a Selection of Scientific Papers Using a Collection of Short Texts.

Authors: Mikhaylov, D. V., Emelyanov, G. M.
Source: Pattern Recognition & Image Analysis; Sep 2023, Vol. 33 Issue 3, p568-575, 8p
Abstract: The selection of papers on a specified topic requires not only analysis of the vocabulary of every paper on the topic of interest to an end user for relevance, but also taking into account the ultimate goal of the user as such (i.e., what exactly is the problem for which this selection is made to find a solution?). The formation of such a selection on the basis of the occurrence of words in the text of the analyzed paper from the abstracts of papers selected by an expert makes it possible to take into account not only the terms, but also the narration language, which is important, in particular, for the preparation of e-learning courses. The submitted paper is devoted to the relevant problem of estimating the quality of the selection thus formed and the explanation of the solutions obtained. For a document added to the selection, it is proposed to estimate the sentences incorporated into it cumulatively by the share of words with nonzero TF measure values (term frequency), the length of a sentence, and the sum of nonzero values for the term frequency of words incorporated into it. The sentences selected by the criterion of a maximum for the mentioned parameters are used to construct an essay, which is further evaluated for originality with respect to each abstract from the collection formed by an expert and for semantic connectedness in the sentences incorporated into it (with the use of neural network models of the BERT family). Based on the totality of these two estimations, a conclusion on the significance is made for each abstract selected by an expert when including the paper analyzed into the selection. In this case, the generated essay is also used to explain the obtained solution. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
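
Note: The sentence-scoring step described in the abstract (share of words with nonzero TF, sentence length, sum of nonzero TF values) might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. Several points are assumptions, since the abstract does not specify them: TF is computed here over a single expert-selected abstract as count divided by total tokens, the tokenizer is a simple lowercase regular expression, and the three parameters are combined by summing each one after dividing by its maximum over all sentences. The function names (tokenize, term_frequencies, sentence_features, rank_sentences) are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a naive lowercase alphabetic tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(reference_text):
    # Assumption: TF = occurrences of a word / total tokens in the reference abstract.
    tokens = tokenize(reference_text)
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()} if total else {}


def sentence_features(sentence, tf):
    # Returns (share of words with nonzero TF, sentence length, sum of nonzero TF values).
    tokens = tokenize(sentence)
    if not tokens:
        return (0.0, 0, 0.0)
    hits = [tf[word] for word in tokens if word in tf]
    return (len(hits) / len(tokens), len(tokens), sum(hits))


def rank_sentences(sentences, tf):
    # Assumption: the "cumulative" estimate is the sum of the three features,
    # each divided by its maximum over all sentences (the abstract gives no formula).
    if not sentences:
        return []
    feats = [sentence_features(s, tf) for s in sentences]
    maxima = [max(f[i] for f in feats) or 1 for i in range(3)]
    scored = [
        (sum(f[i] / maxima[i] for i in range(3)), s)
        for f, s in zip(feats, sentences)
    ]
    # Highest combined score first; these sentences would feed the essay.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Under these assumptions, the top-ranked sentences would be the candidates for the generated essay. The later originality and BERT-based connectedness checks are not covered by this sketch.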