Selection of Tools for Preprocessing and Thematic Modeling of Scientific Articles from the Data Lake.

Authors: Gayanova, M. M., Sazonova, E. Yu., Smetanina, O. N., Sulejmanov, A. K.
Source: Pattern Recognition & Image Analysis; Sep2023, Vol. 33 Issue 3, p313-323, 11p
Abstract: The article addresses the choice of models and methods for preprocessing the texts of scientific articles that are automatically extracted from open sources and loaded into a data lake, and for their thematic (topic) modeling. The selected models and methods are built into a software solution that collects scientific publications, stores them in the data lake, preprocesses the text, and performs thematic modeling with visualization of the results. The article covers the current state of the problem, the problem statement and the approach to its solution, the experimental research, and the development of the software solution based on the selected models and methods. [ABSTRACT FROM AUTHOR]
Database: Complementary Index