Hierarchization of Topical Texts Based on the Estimate of Proximity to the Semantic Pattern without Paraphrasing.

Authors: Mikhaylov, D. V., Emelyanov, G. M.
Source: Pattern Recognition & Image Analysis; Jul 2020, Vol. 30, Issue 3, pp. 440-449 (10 pages)
Abstract: The paper addresses the problem of numerically estimating the mutual semantic dependence of topical texts with respect to the most rational (i.e., standard) variants for describing the knowledge fragments they represent. The proximity of a text to the standard is evaluated without searching for paraphrases. The problem is relevant when determining the significance of information sources for the tasks a user performs; one example is finding the optimal order in which to work through primary sources when forming a student's individual educational trajectory. In the proposed solution, the proximity of a text to the standard is assessed by dividing the words of each of its phrases into classes according to their TF-IDF values relative to the texts of a corpus previously formed by an expert. The analyzed texts are the abstracts of scientific articles together with their titles. The paper considers principles for ranking and then hierarchizing the texts of an original collection, based on variants of the assessment relative to the title and to the phrase closest to the standard. The semantic images of the texts closest to the standard are determined by the words with the highest TF-IDF values; when such words are adjacent in the linear sequence of a phrase, they are most likely related in meaning and, together with words whose values are close to the average of the measure, form key combinations. An analysis of how the words with the highest TF-IDF values occur across different texts of the collection is used to assess the relationship between their standards, which in turn serves as the basis for assessing the complementarity of the texts in meaning. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
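
Illustrative note: the abstract describes dividing the words of a phrase into classes by their TF-IDF values relative to an expert-formed corpus. The record does not give the authors' actual class boundaries or TF-IDF variant, so the following is only a minimal sketch under stated assumptions: a plain TF-IDF with a natural-log IDF, three classes, and a hypothetical `band` parameter that defines what counts as "close to the average". The names `tf_idf`, `classify_phrase`, and `band` are introduced here for illustration and do not come from the paper.

```python
import math


def tf_idf(word, doc, corpus):
    """Plain TF-IDF of `word` in `doc` relative to `corpus` (lists of tokens).

    Assumption: unsmoothed IDF with a natural logarithm; the paper may differ.
    """
    tf = doc.count(word) / len(doc)
    df = sum(1 for d in corpus if word in d)  # documents containing the word
    return tf * math.log(len(corpus) / df) if df else 0.0


def classify_phrase(phrase, doc, corpus, band=0.25):
    """Split the distinct words of `phrase` into high / middle / low classes.

    Assumption: a word is "middle" when its TF-IDF lies within `band`
    (as a fraction) of the phrase mean; this threshold is hypothetical.
    """
    classes = {"high": [], "middle": [], "low": []}
    if not phrase:
        return classes
    scores = {w: tf_idf(w, doc, corpus) for w in set(phrase)}
    mean = sum(scores.values()) / len(scores)
    for word in sorted(scores):  # sorted for a deterministic order
        if scores[word] > mean * (1 + band):
            classes["high"].append(word)
        elif scores[word] < mean * (1 - band):
            classes["low"].append(word)
        else:
            classes["middle"].append(word)
    return classes


if __name__ == "__main__":
    # Toy corpus of tokenized texts (made-up data, for illustration only).
    corpus = [
        ["semantic", "pattern", "text", "ranking"],
        ["text", "image", "analysis"],
        ["text", "corpus", "expert"],
    ]
    print(classify_phrase(["semantic", "text"], corpus[0], corpus))
```

In this toy corpus, a word that appears in every text gets an IDF of zero and falls into the low class, while a word confined to one text rises to the high class. That matches the abstract's general idea that the highest-TF-IDF words carry the semantic image of a text, but it is not a reconstruction of the published method.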