Author: |
Habbat, Nassera; Anoun, Houda; Hassouni, Larbi; Nouri, Hicham |
Subject: |
|
Source: |
AIP Conference Proceedings; 2023, Vol. 2814 Issue 1, p1-8, 8p |
Abstract: |
Topic modeling uses unsupervised machine learning to uncover hidden topics in a large collection of documents. A topic model helps organize, understand, and summarize large volumes of text and discover the hidden topics that vary across the texts in a corpus. A model's coherence can be improved by adding more contextual knowledge. Neural topic models have recently become available, and their quality has improved with BERT-based text representations. In this work, we propose a model to identify topics in news posted on francophone Moroccan Facebook pages. Our approach combines the neural topic model ProdLDA with the French pre-trained BERT model CamemBERT. With a topic coherence of up to 0.75, the proposed method generates more consistent and expressive topics than Doc2vec, in a comparison using several topic modeling algorithms (NMF, LDA, and BERTopic) on two French datasets. [ABSTRACT FROM AUTHOR] |
Database: |
Complementary Index |