A fast algorithm for analyzing natural-language word forms using a three-level model of the base-form dictionary

Language: Russian
Year of publication: 2016
Subject:
Source: Cloud of Science.
Description: An approach to identifying word forms in natural languages with postfix inflection is considered. A representation of the inflection rules by means of opposing prefix trees is proposed. Measurements of the speed of word form identification are reported.
In natural language processing, word form analysis, that is, determining the part of speech and grammatical information for each word of the input text, is usually the very first stage of text processing (or immediately follows splitting the text into words, when that task is non-trivial). Approaches that speed up word form analysis are therefore of significant interest. In this work, building on [1] and following the work done in [3], we present an approach to the analysis of word forms for natural languages with postfix inflection. We propose a way of representing the postfix inflection rules of a natural language and an algorithm for word form analysis based on that representation. In conclusion, we provide benchmark data indicating the increase in speed compared to known analysis methods.
Database: OpenAIRE
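
The record gives no detail on how the inflection rules are stored, so the following is only a generic illustration of suffix-driven word form analysis, not the paper's algorithm. It assumes a trie built over reversed endings, so that a word is scanned from its last letter, and every candidate stem is then checked against a plain set of known stems. The names `EndingTrie`, `analyses`, `known_stems`, and the toy English endings are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)
    # Grammatical tags of the endings that terminate at this node.
    tags: list = field(default_factory=list)


class EndingTrie:
    """Trie over reversed endings; lookup walks a word from its last letter."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, ending, tag):
        node = self.root
        for ch in reversed(ending):
            node = node.children.setdefault(ch, TrieNode())
        node.tags.append(tag)

    def analyses(self, word):
        """Return (stem, ending, tag) for every known ending the word ends with."""
        # The empty ending, if registered, matches any word.
        results = [(word, "", tag) for tag in self.root.tags]
        node = self.root
        for i in range(len(word) - 1, -1, -1):
            node = node.children.get(word[i])
            if node is None:
                break
            results.extend((word[:i], word[i:], tag) for tag in node.tags)
        return results


# Toy data, assumed for illustration only.
trie = EndingTrie()
for ending, tag in [("", "base"), ("s", "plural"), ("es", "plural"), ("ed", "past")]:
    trie.add(ending, tag)

known_stems = {"box", "walk"}


def analyze(word):
    # Keep only the splits whose stem is in the (hypothetical) stem dictionary.
    return [a for a in trie.analyses(word) if a[0] in known_stems]


print(analyze("boxes"))   # [('box', 'es', 'plural')]
print(analyze("walked"))  # [('walk', 'ed', 'past')]
```

In this sketch the cost of a lookup is bounded by the length of the longest ending rather than by the number of rules, which is the usual reason for storing endings in a trie.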