Combined document embedding and hierarchical topic model for social media texts analysis
Authors: | Amir Uteuov, Anna V. Kalyuzhnaya |
---|---|
Year of publication: | 2018 |
Subject: | Topic model; Information retrieval; Computer science; Probabilistic logic; 02 engineering and technology; Optical character recognition; 010501 environmental sciences; 01 natural sciences; 0202 electrical engineering, electronic engineering, information engineering; General Earth and Planetary Sciences; Embedding; 020201 artificial intelligence & image processing; Social media; Word2vec; Representation (mathematics); 0105 earth and related environmental sciences; General Environmental Science; Abstraction (linguistics) |
Source: | Procedia Computer Science. 136:293-303 |
ISSN: | 1877-0509 |
Description: | Exploring customer interests from open source information has become a significant issue. On the one hand, consumers deepen their engagement with brands whose values matter to them. On the other hand, annoying marketing calls and polls do not reflect customers' real needs and wants. This article considers topic modeling applied to social media analysis. We obtained interpretable topics related to users' preferences. The datasets consisted of crawled post texts and texts extracted from images by optical character recognition. Focusing on two approaches, probabilistic (LDA, ARTM) and neural-network-based (doc2vec, word2vec), we propose the combined model deARTM. The hierarchical ARTM model lets us obtain relations between texts at several abstraction levels, which we use as a vector representation. To reduce sensitivity to misspellings, our model includes document embedding. In the experimental part, we show that our model can improve the results of topic modeling on social media datasets. |
Database: | OpenAIRE |
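
The abstract describes using topic relations at several abstraction levels as a vector representation and combining them with a document embedding. The record does not say how deARTM joins the two, so the following is only a minimal sketch under one assumption: each part is L2-normalized and the parts are concatenated. The function names and the toy numbers are hypothetical and are not taken from the paper; in practice the per-level topic weights would come from a hierarchical topic model and the dense vector from a doc2vec-style model.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors stay zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return [0.0] * len(vec)
    return [x / norm for x in vec]


def combine_representations(
    level_topic_weights: Sequence[Sequence[float]],
    doc_embedding: Sequence[float],
) -> List[float]:
    """Concatenate normalized per-level topic weights with a normalized embedding.

    ASSUMPTION: plain concatenation is an illustrative choice, not the
    combination rule documented for deARTM.
    """
    combined: List[float] = []
    for weights in level_topic_weights:  # one vector per hierarchy level
        combined.extend(l2_normalize(weights))
    combined.extend(l2_normalize(doc_embedding))
    return combined


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy document: 2 topics at the coarse level, 4 at the fine level,
# plus a 2-dimensional stand-in for a dense document embedding.
doc_vector = combine_representations(
    level_topic_weights=[[1.0, 0.0], [0.5, 0.5, 0.0, 0.0]],
    doc_embedding=[3.0, 4.0],
)
```

Normalizing each part before concatenation keeps any single level or the embedding from dominating a later cosine comparison between documents; weighting the parts differently would be an equally plausible design.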