A Deep Learning-Based Approach for Part of Speech (PoS) Tagging in the Pashto Language

Autor: Shaheen Ullah, Riaz Ahmad, Abdallah Namoun, Siraj Muhammad, Khalil Ullah, Ibrar Hussain, Isa Ali Ibrahim
Jazyk: angličtina
Rok vydání: 2024
Předmět:
Zdroj: IEEE Access, Vol 12, Pp 86355-86364 (2024)
Druh dokumentu: article
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2024.3412175
Popis: A fundamental task in natural language processing (NLP) is part of speech (PoS) tagging. PoS tagging is crucial to many NLP applications, including question answering, machine translation, syntactic parsing, speech recognition, and semantic parsing. PoS tagging is a task for labeling sequences in which a tagger/system tags each word with its appropriate part of speech label. In NLP, PoS tagging is often considered as a language-specific task. Similarly, Pashto is a language that has not been explored regarding PoS tagging. Therefore, this research focuses on the PoS tagging considering the Pashto language and provides a baseline accuracy. The research has twofold benefits. First, it introduces a Pashto tag set that contains 2,81,205 words of the Pashto language. All these words are tagged with 17 unique PoS tags. Second, it proposes a deep learning-based model by examining classic Recursive Neural Networks (RNN) and Bidirectional Long Short Term Memory Networks (BLSTM). The results show promising performances when used with the word embedding technique. The proposed approach achieved 98.82% accuracy as a baseline on the test dataset by using the BLSTM model along with word embedding.
Databáze: Directory of Open Access Journals