NPABT: Naming Pattern Analysis Of Bengali Text To Detect Various Community Using Machine Learning Approach

Authors: Jannatul Ferdous Ani, Nushrat Jahan Ria, Mirajul Islam, Sheikh Abujar, Tanupriya Choudhury, Abu Kaisar Mohammad Masum
Publication year: 2021
Subject:
Source: ICCCNT
DOI: 10.1109/icccnt51525.2021.9580046
Description: Natural language processing is an important part of artificial intelligence that enriches languages and creates a bridge for communication between humans and machines. In this paper, we propose a method that automatically predicts the community a person may belong to from their name. To the best of our knowledge, based on the literature available online, this is the first research on this problem. We collected more than 8,000 names, both male and female, from the four major communities of Bangladesh. The data were then preprocessed with natural language processing techniques to clean the text. Six of the most popular machine learning classifiers were trained and tested on this data. Human names usually carry a prefix or suffix associated with their community; from that prefix or suffix, the method determines the community of the person or of their predecessors. The approaches worked well on this data: classifier accuracy ranged from 65.12% to 78.25%, with the Random Forest (RF) classifier achieving the highest accuracy at 78.25%.
Database: OpenAIRE
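
The description states that the method relies on the prefix or suffix of a name to infer a community. A minimal sketch of that idea follows, under several assumptions: the function names (`affix_features`, `train`, `predict`), the `pre:`/`suf:` feature encoding, and the vote-counting classifier are all hypothetical and are not taken from the paper, which reports using six machine learning classifiers including Random Forest. The toy names and labels are invented placeholders, not data from the study.

```python
from collections import Counter, defaultdict


def affix_features(name, max_len=3):
    """Return prefix features of the first token and suffix features of the last."""
    tokens = name.strip().lower().split()
    if not tokens:
        return []
    first, last = tokens[0], tokens[-1]
    feats = []
    for n in range(1, max_len + 1):
        feats.append("pre:" + first[:n])   # leading n characters
        feats.append("suf:" + last[-n:])   # trailing n characters
    return feats


def train(examples):
    """Count how often each affix feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for name, label in examples:
        for feat in affix_features(name):
            counts[feat][label] += 1
    return counts


def predict(counts, name):
    """Pick the label with the most votes across the name's affix features."""
    votes = Counter()
    for feat in affix_features(name):
        votes.update(counts.get(feat, {}))
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical toy data: invented tokens and placeholder labels (assumption).
toy = [
    ("zor alem", "group_a"),
    ("zor binu", "group_a"),
    ("kel dasq", "group_b"),
    ("kel rasq", "group_b"),
]
model = train(toy)
print(predict(model, "zor talu"))
```

In practice the same affix features could be fed to a standard classifier instead of the simple vote count used here; the sketch only illustrates how prefixes and suffixes can carry the signal the description refers to.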