Bangla Documents Classification using Transformer Based Deep Learning Models

Authors: Md. Aktaruzzaman Pramanik, Rifat Sadik, Mahbubur Rahman, Monikrishna Roy, Partha Chakraborty
Year of publication: 2020
Subject:
Source: 2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI).
DOI: 10.1109/sti50764.2020.9350394
Description: Document classification, or categorization, assigns documents to predefined domain categories. Document classification techniques have improved noticeably worldwide in recent years, and the many transformer-based models introduced for different languages have shown significant improvements in this area of Natural Language Processing. In this paper, we classify Bangla text documents with recent transformer (attention-mechanism-based) models. We apply the BERT (Bidirectional Encoder Representations from Transformers) and ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) models to Bangla text classification. Both are pre-trained text encoders, and we apply a fine-tuning approach for the downstream (classification) task. We use three different Bangla text datasets in our experiments. Both models provide outstanding performance on two of the three datasets.
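To illustrate the fine-tuning approach the abstract describes, below is a minimal Python sketch using the Hugging Face Transformers library. It is not the authors' code: the checkpoint name (sagorsarker/bangla-bert-base), the number of labels, and the hyperparameters are illustrative assumptions. The same recipe applies to ELECTRA by swapping in an ELECTRA checkpoint.

# Minimal sketch (not the paper's exact setup) of fine-tuning a pre-trained
# BERT encoder for Bangla document classification with Hugging Face
# Transformers. Checkpoint name, label count, and hyperparameters are
# illustrative assumptions.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "sagorsarker/bangla-bert-base"  # assumed Bangla BERT checkpoint
NUM_LABELS = 5                               # assumed number of categories

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=NUM_LABELS
)

# Toy training step: one Bangla document with its category index.
texts = ["বাংলাদেশ দক্ষিণ এশিয়ার একটি দেশ।"]
labels = torch.tensor([2])
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
outputs = model(**batch, labels=labels)  # cross-entropy loss on the [CLS] head
outputs.loss.backward()
optimizer.step()

# Inference: predicted category for a new document.
model.eval()
with torch.no_grad():
    logits = model(**tokenizer("নতুন নথি", return_tensors="pt")).logits
print(logits.argmax(dim=-1).item())

In practice the training step above would run over mini-batches of the full dataset for several epochs; only the single classification head on top of the encoder is new, while all encoder weights are updated during fine-tuning.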
Database: OpenAIRE