Combined document embedding and hierarchical topic model for social media texts analysis

Authors: Amir Uteuov, Anna V. Kalyuzhnaya
Publication year: 2018
Subject:
Source: Procedia Computer Science. 136:293-303
ISSN: 1877-0509
Description: Exploring customer interests from open-source information has become a significant issue. On the one hand, consumers deepen their engagement with brands whose values matter to them. On the other hand, annoying marketing calls and polls do not reflect customers' real needs and wants. This article applies topic modeling to social media analysis. We obtained interpretable topics related to users' preferences. The datasets consist of crawled post texts and texts extracted from images by optical character recognition. Focusing on two approaches, probabilistic (LDA, ARTM) and neural-network-based (doc2vec, word2vec), we propose the combined model deARTM. The hierarchical ARTM model yields relations between texts at several levels of abstraction, which we use as a vector representation. To reduce sensitivity to misspellings, our model includes a document embedding. In the experimental part, we show that our model can improve the results of topic modeling on social media datasets.
Database: OpenAIRE
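
Illustrative sketch: the description says that topic relations at several hierarchy levels serve as a vector representation and that a document embedding is added to it. The record does not say how the two parts are joined, so the Python below is only one plausible reading, not the authors' deARTM implementation. It assumes the per-level topic distributions and the document embedding are already computed elsewhere (for example by a hierarchical topic model and by doc2vec), normalizes each part, and concatenates them. The function names and all numeric values are hypothetical.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0.0 else [x / norm for x in vector]


def combine_representation(topic_levels, doc_embedding):
    """Concatenate per-level topic vectors with a document embedding.

    topic_levels: list of lists, one topic distribution per hierarchy level
                  (assumed to come from a hierarchical topic model).
    doc_embedding: list of floats (assumed to come from a model such as doc2vec).
    Each part is L2-normalized first so that no block dominates by scale alone.
    This concatenation is an assumption, not a detail stated in the abstract.
    """
    combined = []
    for level in topic_levels:
        combined.extend(l2_normalize(level))
    combined.extend(l2_normalize(doc_embedding))
    return combined


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy, made-up values for two posts: a 2-topic level, a 4-topic level,
# and a 3-dimensional embedding.
post_a = combine_representation(
    [[0.9, 0.1], [0.7, 0.2, 0.1, 0.0]],
    [0.5, -0.2, 0.1],
)
post_b = combine_representation(
    [[0.8, 0.2], [0.6, 0.3, 0.1, 0.0]],
    [0.4, -0.1, 0.2],
)
similarity = cosine_similarity(post_a, post_b)
```

Under this reading, two posts that share topics at both levels still compare as similar when the embedding part differs slightly, which is how an embedding could soften the effect of misspelled tokens on the topic part.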