Bangla Document Classification using Character Level Deep Learning
Authors: | Rifat Sadik, Al Amin Biswas, Md. Mahbubur Rahman |
---|---|
Publication year: | 2020 |
Subject: |
Computer science; Character (computing); Deep learning; Document classification; Search engine indexing; Convolutional neural network; Task (project management); Data set; Bengali language; Artificial intelligence; Natural language processing |
Source: | 2020 4th International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). |
DOI: | 10.1109/ismsit50672.2020.9254416 |
Description: | Over the last few decades, the availability and accessibility of Bangla documents and their content have increased rapidly owing to technological advancement. Intensive research on Bangla documents is needed because of the diversity of the language and the sentiment associated with it. Document classification is one of the fundamental problems of Natural Language Processing. To reduce misclassification and to support convenient indexing and searching of Bangla documents on the web, researchers are now exploring different fields of computer science to classify Bangla documents. In this paper, Deep Learning based approaches are implemented to classify Bangla text documents. A Convolutional Neural Network (CNN) and a Long Short-Term Memory (LSTM) network are used for the classification task. We implement an advanced technique that encodes the documents at the character level. Documents from three different data sources are used to validate and test the models. The highest classification accuracy, 95.42%, is achieved on the Prothom Alo data set using the LSTM. Furthermore, we present a comparison between the two models and explain how well the classification task can be carried out with high accuracy using our character-level approach. |
Database: | OpenAIRE |
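The description mentions encoding documents at the character level before they reach a CNN or LSTM. The record does not give the details of that encoding, so the following is only a minimal sketch of one common way to do it, not the authors' method. The names `build_char_vocab`, `encode`, `PAD`, `UNK`, and the `max_len` value are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

PAD = 0  # assumed index reserved for padding short documents
UNK = 1  # assumed index for characters absent from the vocabulary


def build_char_vocab(texts: List[str]) -> Dict[str, int]:
    """Map each distinct character in the corpus to an integer index (2, 3, ...)."""
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Turn a document into a fixed-length list of character indices."""
    # Truncate long documents, then look up each character.
    ids = [vocab.get(ch, UNK) for ch in text[:max_len]]
    # Right-pad short documents so every output has length max_len.
    return ids + [PAD] * (max_len - len(ids))


# Two toy Bangla strings stand in for real documents.
docs = ["বাংলা", "খেলা"]
vocab = build_char_vocab(docs)
encoded = [encode(d, vocab, max_len=8) for d in docs]
```

Each row of `encoded` could then be fed to an embedding layer in front of a CNN or LSTM. Note that Python iterates over Unicode code points, so Bangla vowel signs such as `া` become separate symbols; whether the paper treats them that way is not stated in this record.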