PAWE

Authors: David M. J. Tax, Taygun Kekec, Laurens van der Maaten
Year of publication: 2018
Subject:
Source: ICISDM
DOI: 10.1145/3206098.3206101
Description: Word embedding models learn a distributed vector representation for words, which can serve as the basis for (deep) learning models that solve a variety of natural language processing tasks. One of the main disadvantages of current word embedding models is that they learn a single representation for each word in a metric space, which prevents them from appropriately modeling polysemous words. In this work, we develop a new word embedding model that can accurately represent such words by automatically learning multiple representations for each word, while remaining computationally efficient. Without any supervision, our model learns multiple, complementary embeddings that each capture different semantic structure. We demonstrate the potential merits of our model by training it on large text corpora and evaluating it on word similarity tasks. The proposed embedding model is competitive with the state of the art and, owing to its computational simplicity, easily scales to large corpora.
Database: OpenAIRE
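
A minimal sketch of the multi-sense idea from the description above: each word owns several complementary sense vectors, and a word pair is scored by its best-matching pair of senses (the MaxSim protocol commonly used to evaluate multi-sense embeddings). All names, shapes, and the random toy vectors below are illustrative assumptions, not the paper's implementation.

    import numpy as np

    # Hypothetical toy setup: each word gets K sense vectors of dimension d,
    # stored as a (K, d) array. Random vectors stand in for trained embeddings.
    rng = np.random.default_rng(0)
    K, d = 3, 50

    vocab = ["bank", "river", "money"]
    senses = {w: rng.normal(size=(K, d)) for w in vocab}

    def cosine(u, v):
        """Cosine similarity between two vectors."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def max_sim(w1, w2):
        """Score a word pair by its best-matching pair of sense vectors
        (MaxSim); with trained senses, the sense of 'bank' closest to
        'river' can differ from the sense closest to 'money'."""
        return max(cosine(u, v) for u in senses[w1] for v in senses[w2])

    print(max_sim("bank", "money"))
    print(max_sim("bank", "river"))

With trained multi-sense vectors, this per-sense maximum lets a polysemous word match different neighbors through different senses, which a single-vector model cannot do.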