KazNLP: A Pipeline for Automated Processing of Texts Written in Kazakh Language

Authors: Zhanibek Kozhirbayev, Aibek Makazhanov, Zhandos Yessenbayev
Publication year: 2020
Subject:
Source: Speech and Computer (SPECOM), ISBN: 9783030602758
DOI: 10.1007/978-3-030-60276-5_63
Description: We present the current results of our ongoing work on developing tools and algorithms for processing the Kazakh language within the KazNLP project. The project is motivated by the need for accessible, easy-to-use, cross-platform, and well-documented automated text processing tools for Kazakh, particularly for user-generated text, which contains transliteration, code switching, and other artifacts of language-specific raw data that require pre-processing. Thus, apart from a basic tokenization-tagging-parsing pipeline and downstream applications such as named entity recognition and spell checking, KazNLP offers pre-processing tools such as text normalization and language identification. All KazNLP tools are released under a Creative Commons license. Since detailed descriptions of the methods and algorithms used in KazNLP are published or are to be published in various venues, references to which are given in the corresponding sections, this work provides only an overview of the tools and their performance.
Database: OpenAIRE