Topic Analysis of Indonesian Comment Text Using the Latent Dirichlet Allocation

Authors: Muhammad Abdul Aziz, Muhamad Nur Gunawan, Syopiansyah Jaya Putra
Publication year: 2021
Subject:
Source: 2021 9th International Conference on Cyber and IT Service Management (CITSM).
DOI: 10.1109/citsm52892.2021.9588870
Description: The ideas people express in social media comment sections are so diverse and numerous that the topics under discussion are difficult to identify. This work applies Latent Dirichlet Allocation (LDA) to analyze conversational topics in Indonesian-language comments on the social media platform YouTube. The research procedure comprised data collection, preprocessing, construction of a document-term matrix, topic modeling, and categorization. The dataset consists of Indonesian YouTube comments about COVID-19, totaling 142,790 comments from 371 videos between March and September 2020. The analysis yielded six topic categories: “Hope - Prayer,” “Policy - Regulation,” “Pandemic Impact,” “News - Information,” “Health Protocol,” and “Development of Cases.” The model achieved an Adjusted Rand Index (ARI) of 0.84 and a coherence value of 0.572. These results indicate that LDA is suitable for short Indonesian comment texts with broad data coverage. The main contribution of this study is to identify the themes that users discuss most frequently in comment text, which gives insight into the opinions and key themes emerging in the community.
Database: OpenAIRE
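
Note on the Adjusted Rand Index reported in the description: the ARI measures how well two labelings of the same items agree, corrected for chance, with 1.0 meaning identical groupings. The record does not say which two labelings the authors compared or which tool they used, so the following is only a minimal sketch under stated assumptions. The function name `adjusted_rand_index` and the toy labels are hypothetical and are not data from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the labelings agree trivially

    # Contingency counts: how many items share each (label_a, label_b) pair.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / total_pairs  # agreement expected by chance
    maximum = (sum_a + sum_b) / 2           # agreement of a perfect match
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single group
    return (sum_joint - expected) / (maximum - expected)


# Hypothetical toy data: hand-assigned categories for six comments versus
# the dominant topic id a topic model might assign to each of them.
manual = ["hope", "hope", "policy", "policy", "impact", "impact"]
model = [2, 2, 0, 0, 1, 0]
toy_score = adjusted_rand_index(manual, model)
print(f"ARI on toy data: {toy_score:.3f}")
```

Because the ARI depends only on which items are grouped together, the label names themselves do not matter: a topic id of 2 can correspond to the manual category "hope" without any explicit mapping.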