Abstract: |
Text classification plays a significant role in business applications. In news classification, detecting the subject of a text is an important task that can help identify news trends and junk news. Various deep learning algorithms can be applied to text classification. In this paper, several such algorithms are implemented and compared for identifying the subject of texts in a Persian news corpus. The best results are obtained by the method that combines BiGRU with the attention mechanism and CapsNet (BiGRUACaps). The GRU network outperforms LSTM because it has fewer gates and therefore fewer parameters. The GRU controls information flow without a separate memory unit, and it has been shown to perform better when less data is available. Moreover, because news texts contain long sentences, the attention mechanism gives more weight to important words and addresses the problem of long sequences. The most significant problem in classifying Persian texts was the lack of a suitable dataset. One contribution of this work is a scraped dataset: 20,726 records collected from Persian news websites, which is the best Persian news dataset with category labels. Another problem in this research was the lack of appropriate pre-trained Persian models, together with combining various neural networks with these models and determining the optimal model for identifying the subject of Persian text. The use of CapsNet on Persian data was also investigated and produced encouraging results. The comparison shows an improvement in classification performance on Persian texts. The best result was obtained by BiGRUACaps, with an F-measure of 0.8608. |