Text-based Language Identifier using Multinomial Naïve Bayes Algorithm

Authors: Sunita Rawat, Lakshita Werulkar, Sagarika Jaywant
Year of publication: 2023
Source: International Journal of Next-Generation Computing.
ISSN: 0976-5034; 2229-4678
Description: Language identification is a crucial step in any NLP-based application. Text-based documents and webpages are multiplying rapidly on the modern Internet, and documents written in languages from all over the world are available with a single click. A language identifier is therefore necessary to help the user interpret the content. Language identification research has so far concentrated mostly on European languages and remains rather limited for traditional Indian languages. Many researchers have become more interested in language identification for similar languages among popular ones. In this paper, the Multinomial Naïve Bayes algorithm is used to detect three languages written in the Devanagari script (Marathi, Sanskrit, and Hindi) and three European languages (French, Italian, and English). An experiment on datasets for each language produced satisfactorily accurate results after training and testing the model.
Database: OpenAIRE
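
Illustrative note: the record names the Multinomial Naïve Bayes algorithm but gives no implementation details. The following is a minimal sketch of how such a classifier could separate the six languages. It is not the authors' code. The character-bigram features, the Laplace smoothing value `alpha`, the class name `NaiveBayesLanguageId`, and the tiny training sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams (assumed feature choice)."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NaiveBayesLanguageId:
    """Multinomial Naive Bayes over character n-gram counts (hypothetical helper)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(char_ngrams(text))
        self.vocab = {g for counts in self.feature_counts.values() for g in counts}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        # n-grams never seen in training are ignored, as a count vectorizer would do
        grams = [g for g in char_ngrams(text) if g in self.vocab]
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + self.alpha * vocab_size
            # log prior + sum of smoothed log likelihoods
            score = math.log(doc_count / n_docs)
            for g in grams:
                score += math.log((counts[g] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for this sketch (not the datasets used in the paper).
train = [
    ("this is a book", "English"),
    ("the weather is nice today", "English"),
    ("ceci est un livre", "French"),
    ("il fait beau aujourd'hui", "French"),
    ("questo è un libro", "Italian"),
    ("oggi il tempo è bello", "Italian"),
    ("यह एक किताब है", "Hindi"),
    ("आज मौसम अच्छा है", "Hindi"),
    ("हे एक पुस्तक आहे", "Marathi"),
    ("आज हवामान छान आहे", "Marathi"),
    ("एतत् पुस्तकम् अस्ति", "Sanskrit"),
    ("अद्य वातावरणं शोभनम् अस्ति", "Sanskrit"),
]
texts, labels = zip(*train)
model = NaiveBayesLanguageId().fit(texts, labels)

print(model.predict("the book is nice"))
print(model.predict("हे पुस्तक आहे"))
```

With only two sentences per language, this sketch shows the mechanics and nothing more. A realistic setup would need much larger corpora and a held-out test set, particularly for the three Devanagari languages, which share many characters.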