Pengelompokan Jurnal Ilmiah Berdasarkan Judul Menggunakan LDA [Grouping Scientific Journals by Title Using LDA]

Author: R Setiawan Aji Nugroho, Yosefina Oktaviani Santoso
Year of publication: 2021
Subject:
Source: Proxies: Jurnal Informatika. 3:32-42
ISSN: 2301-9220
Description: Scientific journals are growing rapidly along with the development of science. According to labs.semanticscholar.org/corpus, the number of scientific journals has reached over 39 million. This large number makes grouping scientific journals challenging. Grouping becomes more difficult because each scientific journal can have more than one topic. Therefore, special methods are needed to group scientific journals. One of the well-known topic modeling methods is Latent Dirichlet Allocation (LDA). This research implements the LDA algorithm to perform topic modeling on scientific journals. The topic modeling in this study uses journal titles as the corpus. The titles are converted into a bag of words during pre-processing so that they can be used in the distribution stage. The results of the distribution stage are then used for sampling with the Gibbs sampling method. During the sampling process, testing can also be performed to determine the optimal parameters. The testing in this study used perplexity to find the optimal number of iterations and topics. The results show that the LDA algorithm successfully performs topic modeling on scientific journals, generating a list of keywords for each topic and grouping documents under each topic. Based on the perplexity comparison, the optimal parameters are 3 topics and 500 iterations.
Database: OpenAIRE
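
The pipeline summarized in the abstract (bag of words from titles, a random initial topic assignment, Gibbs sampling, and a perplexity check) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the sample titles, the tokenizer, the hyperparameters alpha and beta, the function names, and the short iteration count are all assumptions made for the example, and the perplexity here is computed on the training titles rather than on held-out data.

```python
import math
import random
import re

def tokenize(title):
    # Crude stand-in for real pre-processing: lowercase, alphabetic tokens of 3+ letters.
    return [w for w in re.findall(r"[a-z]+", title.lower()) if len(w) >= 3]

def lda_gibbs(titles, n_topics=3, n_iter=500, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; alpha and beta are assumed values."""
    rng = random.Random(seed)
    tokens = [tokenize(t) for t in titles]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    docs = [[word_id[w] for w in doc] for doc in tokens]  # bag of words as id lists
    V, K = len(vocab), n_topics

    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # topic assignment of every token

    # Initial "distribution" step: give every token a random topic.
    for d, doc in enumerate(docs):
        assignments = []
        for w in doc:
            k = rng.randrange(K)
            assignments.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(assignments)

    # Gibbs sweeps: resample each token's topic given all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Point estimates of document-topic (theta) and topic-word (phi) distributions.
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
           for k in range(K)]
    return docs, vocab, theta, phi

def perplexity(docs, theta, phi):
    # exp of the negative mean per-token log-likelihood; lower is better.
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            log_lik += math.log(sum(theta[d][k] * phi[k][w] for k in range(len(phi))))
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def top_words(phi, vocab, k, n=3):
    # Keywords for topic k: the n words with the highest probability under phi[k].
    ranked = sorted(range(len(vocab)), key=lambda w: phi[k][w], reverse=True)
    return [vocab[w] for w in ranked[:n]]

# Hypothetical titles, invented purely for illustration.
titles = [
    "Deep learning for image classification",
    "Image segmentation with convolutional networks",
    "Topic modeling of scientific journal titles",
    "Latent topic discovery in document collections",
    "Protein structure prediction from sequence data",
    "Gene expression analysis in cancer cells",
]
docs, vocab, theta, phi = lda_gibbs(titles, n_topics=3, n_iter=50)
ppl = perplexity(docs, theta, phi)
doc_topic = [max(range(3), key=lambda k: row[k]) for row in theta]  # grouping per title
keywords = [top_words(phi, vocab, k) for k in range(3)]
```

To mirror the parameter search described in the abstract, one could call `lda_gibbs` over a grid of `n_topics` and `n_iter` values and compare the resulting `perplexity` scores, keeping the combination with the lowest value.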