KAROMA: Karonese Morphological Analyzer Based on Graph Theory.

Author: Karo, Ichwanul Muslim Karo; Md. Fudzee, Mohd Farhan; Kasim, Shahreen; Ramli, Azizul Azhar
Source: Journal of Soft Computing & Data Mining (JSCDM); 2024, Vol. 5 Issue 1, p91-103, 13p
Abstract: A morphological analyzer is essential for improving the quality of natural language processing (NLP) research on national and local languages. Karonese is a local language of the Karo ethnic group of North Sumatra, Indonesia. Karonese terms have a distinctive phonology: the same term can vary in spelling and pronunciation while retaining the same meaning and time. NLP studies on Karonese have had limited access to Karonese morphological analyzers. This study proposes a Karonese morphological analyzer based on graph theory (KAROMA). KAROMA adopts a word-based morphological approach in which Karonese terms are expressed as a complete graph. The resulting set of complete graphs forms the Karonese WordNet, which is compiled for use as KAROMA. The study also provides two KAROMA evaluators: one based on member checking and one based on text similarity, computed with a modified cosine similarity. The evaluation uses synthetic Karonese sentences to calculate text similarity. The results show that KAROMA can detect the distinctive variation in Karonese terms and normalize them. KAROMA scores 99% under member checking and 97.16% under the text similarity-based evaluation. KAROMA could therefore be applied as a Karonese stemming and lemmatization technique for a variety of NLP tasks. The evaluators are a further contribution that other researchers can apply. [ABSTRACT FROM AUTHOR]
Database: Complementary Index
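Note: The abstract describes representing spelling variants of a term as a complete graph, normalizing variants to one form, and scoring the result by text similarity. The following is a minimal illustrative sketch of that idea, not the authors' implementation. The variant groups are hypothetical placeholder tokens (not real Karonese data), the function names are invented for illustration, the canonical form is chosen arbitrarily, and plain cosine similarity over token counts stands in for the paper's modified cosine similarity, which the abstract does not specify.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical variant groups: placeholder tokens, NOT real Karonese words.
VARIANT_GROUPS = [
    {"alpha1", "alpha2", "alpha3"},
    {"beta1", "beta2"},
]


def build_complete_graphs(groups):
    """For each group, return the edge set of its complete graph:
    every pair of variants in the group is connected."""
    return [
        {frozenset(pair) for pair in combinations(sorted(group), 2)}
        for group in groups
    ]


def build_lookup(groups):
    """Map each variant to one canonical form.
    Assumption: the alphabetically first variant is used as the canonical form."""
    lookup = {}
    for group in groups:
        canonical = min(group)
        for word in group:
            lookup[word] = canonical
    return lookup


def normalize(sentence, lookup):
    """Lowercase, split on whitespace, and replace known variants."""
    return [lookup.get(tok, tok) for tok in sentence.lower().split()]


def cosine_similarity(tokens_a, tokens_b):
    """Plain (unmodified) cosine similarity over token counts."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


graphs = build_complete_graphs(VARIANT_GROUPS)
lookup = build_lookup(VARIANT_GROUPS)

s1 = "alpha1 x beta2"
s2 = "alpha3 x beta1"
raw_score = cosine_similarity(s1.split(), s2.split())
normalized_score = cosine_similarity(normalize(s1, lookup), normalize(s2, lookup))
```

In this sketch, two sentences that differ only in spelling variants share just one token before normalization, so their similarity is low; after normalization their token lists are identical and the similarity rises to 1. A complete graph over n variants has n(n-1)/2 edges, so the three-variant group yields three edges and the two-variant group yields one.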