An approach to the use of word embeddings in an opinion classification task

Authors: Enríquez de Salamanca Ros, Fernando; Troyano Jiménez, José Antonio; López Solaz, Tomás
Contributors: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos; Junta de Andalucía
Publication year: 2016
Source: idUS. Depósito de Investigación de la Universidad de Sevilla
Universidad de Sevilla (US)
Description: In this paper we show how a vector-based word representation obtained via word2vec can help to improve the results of a document classifier based on bags of words. Both models allow obtaining numeric representations from texts, but they do it very differently. The bag of words model can represent documents by means of widely dispersed vectors in which the indices are words or groups of words. word2vec generates word-level representations, building vectors that are much more compact, where indices implicitly contain information about the context of word occurrences. Bags of words are very effective for document classification, and in our experiments no representation using only word2vec vectors is able to improve their results. However, this does not mean that the information provided by word2vec is not useful for the classification task. When this information is used in combination with the bags of words, the results are improved, showing its complementarity and its contribution to the task. We have also performed cross-domain experiments in which word2vec has shown much more stable behavior than bag of words models. Junta de Andalucía P11-TIC-7684 MO
Database: OpenAIRE
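
The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how word vectors are aggregated per document, so averaging is an assumption here, and the toy embeddings, the vocabulary, and the function names (`bag_of_words`, `mean_embedding`, `combined_features`) are hypothetical stand-ins for a trained word2vec model and a real feature pipeline.

```python
from collections import Counter

# Hypothetical toy embeddings standing in for trained word2vec vectors.
TOY_EMBEDDINGS = {
    "good": [0.9, 0.1],
    "great": [0.8, 0.2],
    "bad": [-0.9, 0.1],
    "movie": [0.0, 1.0],
}


def bag_of_words(tokens, vocabulary):
    """Sparse-style count vector: one position per vocabulary word."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Compact document vector: average of the known word vectors.

    Averaging is an assumption made for this sketch only.
    """
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combined_features(tokens, vocabulary, embeddings, dim):
    """Concatenate both representations into one feature vector."""
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


vocabulary = ["bad", "good", "movie"]  # hypothetical vocabulary
tokens = "good good movie".split()
features = combined_features(tokens, vocabulary, TOY_EMBEDDINGS, dim=2)
print(features)
```

The resulting vector has one count per vocabulary word followed by the averaged embedding dimensions, so any standard classifier can consume both sources of information at once.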