KazNLP: A Pipeline for Automated Processing of Texts Written in Kazakh Language
Authors: | Zhanibek Kozhirbayev, Aibek Makazhanov, Zhandos Yessenbayev |
---|---|
Year of publication: | 2020 |
Subject: |
050101 languages & linguistics; Information retrieval; Language identification; Computer science; 05 social sciences; 02 engineering and technology; Pipeline (software); Text processing; Named-entity recognition; 0202 electrical engineering, electronic engineering, information engineering; Text normalization; Transliteration; 020201 artificial intelligence & image processing; 0501 psychology and cognitive sciences; Computational linguistics; Raw data |
Source: | Speech and Computer (SPECOM), ISBN: 9783030602758 |
DOI: | 10.1007/978-3-030-60276-5_63 |
Description: | We present the current results of our ongoing work on developing tools and algorithms for processing the Kazakh language within the framework of the KazNLP project. The project is motivated by the need for accessible, easy-to-use, cross-platform, and well-documented automated text processing tools for Kazakh, particularly for user-generated text, which contains transliteration, code switching, and other artifacts of language-specific raw data that need pre-processing. Thus, apart from a basic tokenization-tagging-parsing pipeline and downstream applications such as named entity recognition and spell checking, KazNLP offers pre-processing tools such as text normalization and language identification. All of the KazNLP tools are released under a Creative Commons license. Since detailed descriptions of the methods and algorithms used in KazNLP are published or to be published in various venues, references to which are given in the corresponding sections, this work provides only an overview of the tools and their performance level. |
Database: | OpenAIRE |