Abstract: |
Over the past few decades, the number of ongoing studies of varying complexity and focus has grown steadily, driven by significant progress across many fields of science. The number of scientific papers published in journals and in scientific and educational publications has grown in step with this research activity. However, this volume of papers makes it difficult to find and select materials that are potentially useful for research, particularly for young scientists and for scientists working in new areas of science. To address this problem, this work presents the implementation of a custom system for natural language processing of scientific publications. The system groups the input documents into main categories by extracting features from the texts using a term matrix, reducing dimensionality with correlation-based feature selection, and clustering with the k-means algorithm. The result is a structured data cascade consisting of the main topics, each containing the research papers from the input data set that belong to it. [ABSTRACT FROM AUTHOR] |
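
The three stages named in the abstract (term matrix, correlation-based reduction, k-means clustering) can be illustrated with a minimal standard-library sketch. This is not the authors' implementation. The function names `term_matrix`, `drop_correlated`, and `kmeans`, the `threshold` parameter, the farthest-point initialization, and the sample documents are all assumptions made for illustration. In particular, `drop_correlated` is a simplified, unsupervised stand-in that only removes terms highly correlated with an already kept term; it may differ from the correlation-based feature selection the authors actually used.

```python
import math
import re
from collections import Counter


def term_matrix(docs):
    """Build a document-term count matrix. Returns (vocabulary, rows)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    vocab = sorted({t for toks in tokenized for t in toks})
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([float(counts[t]) for t in vocab])
    return vocab, rows


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def drop_correlated(vocab, rows, threshold=0.9):
    """Assumed simplification: greedily keep a term only if its absolute
    correlation with every previously kept term is at most `threshold`."""
    cols = [[r[j] for r in rows] for j in range(len(vocab))]
    kept = []
    for j in range(len(vocab)):
        if all(abs(pearson(cols[j], cols[k])) <= threshold for k in kept):
            kept.append(j)
    return [vocab[j] for j in kept], [[r[j] for j in kept] for r in rows]


def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(rows, k, iters=20):
    """Lloyd's algorithm with a deterministic farthest-point initialization."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(range(len(rows)),
                  key=lambda i: min(dist2(rows[i], c) for c in centroids))
        centroids.append(list(rows[far]))
    labels = [0] * len(rows)
    for _ in range(iters):
        # Assignment step: nearest centroid for every document.
        labels = [min(range(k), key=lambda c: dist2(r, centroids[c]))
                  for r in rows]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy corpus, two topics interleaved.
docs = [
    "neural network training deep learning",
    "protein gene expression cell",
    "deep neural network learning model",
    "gene protein cell sequencing",
]
vocab, rows = term_matrix(docs)
kept_vocab, reduced = drop_correlated(vocab, rows, threshold=0.9)
labels = kmeans(reduced, k=2)
print(len(vocab), "terms reduced to", len(kept_vocab))
print(labels)
```

On a real corpus the term matrix would typically be weighted (for example with TF-IDF) and the number of clusters chosen empirically; the sketch fixes `k=2` only to keep the example small.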