Surrogate-based optimization of learning strategies for additively regularized topic models.

Author: Khodorchenko, Maria; Butakov, Nikolay; Sokhin, Timur; Teryoshkin, Sergey
Source: Logic Journal of the IGPL; Apr 2023, Vol. 31 Issue 2, p287-299, 13p
Abstract: Topic modelling is a popular unsupervised method for text processing that provides interpretable document representations. One of the more advanced approaches is additively regularized topic models (ARTM). This method achieves better quality than other methods thanks to its flexibility and advanced regularization abilities. However, it is challenging to find a learning strategy that yields high-quality topics, because a user needs to select the regularizers with their coefficient values and determine the order in which they are applied. Moreover, the search may require many real runs of model training, which makes the task time-consuming. At present, there is a lack of research on parameter optimization for ARTM-based models. Our work proposes an approach that formalizes the learning strategy as a vector of parameters which can be optimized with an evolutionary algorithm. We also propose a surrogate-based modification that uses machine learning methods to make the parameter search time-efficient. We investigate different optimization algorithms (evolutionary and Bayesian) and their surrogate-assisted modifications applied to topic model optimization using the proposed learning strategy approach. An experimental study conducted on English and Russian datasets indicates that the proposed approaches are able to find high-quality parameter solutions for ARTM and substantially reduce the execution time of the search. [ABSTRACT FROM AUTHOR]
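
To illustrate the general idea described in the abstract (not the authors' actual implementation), the sketch below shows a surrogate-assisted evolutionary search over a learning strategy encoded as a flat parameter vector. The function `train_and_score`, the vector dimensionality `DIM`, the bounds, and the use of a random forest as the surrogate model are all assumptions made for this example; in practice the expensive step would wrap real ARTM training (e.g. via BigARTM) and return a topic-quality score such as coherence.

```python
# Minimal sketch of surrogate-assisted evolutionary parameter search.
# All names and settings here are illustrative assumptions, not the paper's method.
import random
import numpy as np
from sklearn.ensemble import RandomForestRegressor

DIM = 6                 # hypothetical: regularizer coefficients for several training stages
BOUNDS = (-10.0, 10.0)  # hypothetical search range for each coefficient

def train_and_score(params: np.ndarray) -> float:
    """Placeholder for the expensive step: train an ARTM model with the
    regularization schedule encoded in `params` and return its quality."""
    # Stand-in objective so the sketch runs end to end without a topic model.
    return -float(np.sum((params - 1.0) ** 2))

def random_individual() -> np.ndarray:
    return np.random.uniform(*BOUNDS, size=DIM)

def mutate(ind: np.ndarray, sigma: float = 0.5) -> np.ndarray:
    child = ind + np.random.normal(0.0, sigma, size=DIM)
    return np.clip(child, *BOUNDS)

# Initial population, evaluated with the real (expensive) fitness function.
population = [random_individual() for _ in range(10)]
archive_x, archive_y = [], []
for ind in population:
    archive_x.append(ind)
    archive_y.append(train_and_score(ind))

for generation in range(20):
    # Fit the surrogate on all real evaluations collected so far.
    surrogate = RandomForestRegressor(n_estimators=100)
    surrogate.fit(np.array(archive_x), np.array(archive_y))

    # Generate many cheap candidates, rank them with the surrogate,
    # and spend real training runs only on the most promising ones.
    candidates = [mutate(random.choice(population)) for _ in range(50)]
    predicted = surrogate.predict(np.array(candidates))
    promising = [candidates[i] for i in np.argsort(predicted)[-5:]]

    for ind in promising:
        archive_x.append(ind)
        archive_y.append(train_and_score(ind))

    # Keep the best individuals seen so far as the next population.
    best_idx = np.argsort(archive_y)[-10:]
    population = [archive_x[i] for i in best_idx]

best = archive_x[int(np.argmax(archive_y))]
print("best strategy vector:", best, "score:", max(archive_y))
```

The design point the sketch tries to convey is the one highlighted in the abstract: the surrogate model filters candidate strategies cheaply, so the number of full topic-model training runs needed during the search drops substantially.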
Database: Complementary Index