Term Frequency and Estimating the Closeness of Short Texts to the Semantic Standard.

Authors: Mikhaylov, D. V., Emelyanov, G. M.
Source: Pattern Recognition & Image Analysis; Mar 2023, Vol. 33, Issue 1, pp. 22-27, 6 pp.
Abstract: This work deals with two interrelated problems: assessing the closeness of a text to the most rational (reference) form of conveying its sense, and forming the reference text collection against which that assessment is performed. The texts analyzed for closeness to the semantic standard are abstracts of scientific articles together with their titles. The solution is based on comparing values of the 5th percentile of the empirical distribution corresponding to an array of fractions of nonzero term frequency (TF) values for separate phrases within each abstract, relative to each document under consideration for inclusion in the reference collection. A variant is offered for numerically estimating the significance of the abstract when calculating this percentile for candidate documents, with maximum precision in selecting the most significant ones for the reference collection. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
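
The abstract gives the percentile-based comparison only in outline, so the following is a minimal sketch of one possible reading, not the authors' implementation. It assumes that TF is a relative word frequency within a candidate document, that the "fraction of nonzero TF values" for a phrase is the share of its words that occur in that document, and that the per-abstract score is the 5th percentile of these fractions over all phrases. All function names, the tokenizer, and the linear-interpolation percentile are hypothetical choices made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_frequencies(document):
    """Relative frequency of each token in the document (assumed TF definition)."""
    tokens = tokenize(document)
    counts = Counter(tokens)
    total = len(tokens)
    return {t: c / total for t, c in counts.items()} if total else {}


def nonzero_tf_fraction(phrase, tf):
    """Share of the phrase's words whose TF in the document is nonzero."""
    words = tokenize(phrase)
    if not words:
        return 0.0
    return sum(1 for w in words if tf.get(w, 0.0) > 0) / len(words)


def percentile(values, q):
    """q-th percentile (0..100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def closeness_score(abstract_phrases, document, q=5.0):
    """5th percentile (by default) of per-phrase nonzero-TF fractions.

    Assumption: a higher value means that even the worst-covered phrases
    of the abstract are well represented in the candidate document.
    """
    tf = term_frequencies(document)
    fractions = [nonzero_tf_fraction(p, tf) for p in abstract_phrases]
    return percentile(fractions, q)
```

Under these assumptions, candidate documents for the reference collection could be compared by calling `closeness_score` with the same list of abstract phrases and each candidate's text in turn. A low percentile is used so that the score reflects the weakest-matching phrases rather than the average match.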