A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Authors:	Shaheen Ullah, Riaz Ahmad, Abdallah Namoun, Siraj Muhammad, Khalil Ullah, Ibrar Hussain, Isa Ali Ibrahim
Language:	English
Publication year:	2024
Subject:	Artificial intelligence; document image analysis; handwritten text; natural language processing; optical character recognition; speech recognition; Electrical engineering. Electronics. Nuclear engineering; TK1-9971
Source:	IEEE Access, Vol 12, Pp 86355-86364 (2024)
Document type:	article
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3412175
Description:	Part-of-speech (PoS) tagging is a fundamental task in natural language processing (NLP) and is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. It is a sequence labeling task in which a tagger assigns each word its appropriate part-of-speech label. In NLP, PoS tagging is often considered a language-specific task, and Pashto is a language that has not yet been explored in this respect. This research therefore focuses on PoS tagging for the Pashto language and provides a baseline accuracy. The contribution is twofold. First, it introduces a Pashto tag set that contains 281,205 words of the Pashto language, all tagged with 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recursive Neural Networks (RNN) and Bidirectional Long Short-Term Memory Networks (BLSTM). The results show promising performance when the models are used with word embeddings. The proposed approach achieved a baseline accuracy of 98.82% on the test dataset using the BLSTM model with word embeddings.
Database:	Directory of Open Access Journals
External link:	https://doaj.org/article/434fd3c1c15042deb80c5bb27c390bd4
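
The description frames PoS tagging as sequence labeling and reports accuracy on a test set. As a minimal sketch of how such a score could be computed, the Python below assumes token-level accuracy (correctly tagged words divided by all words); the record does not state the exact metric definition. The function name `token_accuracy`, the placeholder words `w1`..`w5`, and the tag names are hypothetical and are not taken from the paper's 17-tag set.

```python
from typing import List, Sequence, Tuple

# A tagged sentence is a list of (word, tag) pairs.
TaggedSentence = List[Tuple[str, str]]


def token_accuracy(gold: Sequence[TaggedSentence],
                   pred: Sequence[TaggedSentence]) -> float:
    """Return the fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    correct = 0
    total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        if len(gold_sent) != len(pred_sent):
            raise ValueError("each sentence pair must have the same length")
        for (_, gold_tag), (_, pred_tag) in zip(gold_sent, pred_sent):
            correct += int(gold_tag == pred_tag)
            total += 1
    if total == 0:
        raise ValueError("no tokens to score")
    return correct / total


# Toy data with placeholder words and hypothetical tags (not real corpus data).
gold = [
    [("w1", "NOUN"), ("w2", "VERB")],
    [("w3", "ADJ"), ("w4", "NOUN"), ("w5", "VERB")],
]
pred = [
    [("w1", "NOUN"), ("w2", "VERB")],
    [("w3", "NOUN"), ("w4", "NOUN"), ("w5", "VERB")],  # one wrong tag on w3
]

accuracy = token_accuracy(gold, pred)
print(f"{accuracy:.2%}")  # 4 of 5 tokens correct: prints 80.00%
```

Under this assumed definition, every word counts equally regardless of sentence length, which is the usual convention for reporting PoS tagging results.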