Author:
Md. Aktaruzzaman Pramanik, Rifat Sadik, Mahbubur Rahman, Monikrishna Roy, Partha Chakraborty
Year of publication:
2020
Subject:
Source:
2020 2nd International Conference on Sustainable Technologies for Industry 4.0 (STI).
DOI:
10.1109/sti50764.2020.9350394
Description:
Document classification, or categorization, assigns documents to predefined domain categories. Document classification techniques have improved noticeably in recent years: transformer-based models have been introduced for many languages and show significant gains in this area of Natural Language Processing. In this paper, we classify Bangla text documents with recent transformer (attention-mechanism-based) models. We apply BERT (Bidirectional Encoder Representations from Transformers) and ELECTRA (Efficiently Learning an Encoder that Classifies Token Replacements Accurately) to Bangla text classification. Both are pre-trained text encoders, and we fine-tune them for the downstream (classification) task. We use three different Bangla text datasets in our experiments; both models deliver outstanding performance on two of the three datasets.
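As a concrete illustration of the fine-tuning setup the abstract describes (a minimal sketch, not the authors' code), the snippet below fine-tunes a pre-trained encoder for sequence classification with the Hugging Face transformers library. The checkpoint name, label count, and sample texts are assumptions for illustration; the paper does not specify which pre-trained weights were used.

# A minimal fine-tuning sketch, not the authors' implementation. Assumptions:
# the "bert-base-multilingual-cased" checkpoint (the paper does not name its
# exact pre-trained weights) and a toy 3-class label set.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "bert-base-multilingual-cased"  # assumption; an ELECTRA checkpoint plugs in the same way
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=3)

# Hypothetical mini-batch of Bangla documents and their category ids.
texts = ["উদাহরণ নথি এক", "উদাহরণ নথি দুই"]
labels = torch.tensor([0, 2])

batch = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
out = model(**batch, labels=labels)  # classification head on the encoder; returns cross-entropy loss
out.loss.backward()                  # one gradient step of fine-tuning
optimizer.step()
optimizer.zero_grad()

In the fine-tuning approach, all encoder weights are updated together with the newly added classification head, rather than training a classifier on frozen encoder features.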
Database:
OpenAIRE
External link: