Improving Document Ranking with Dual Word Embeddings
Authors: | Nick Craswell, Rich Caruana, Eric Nalisnick, Bhaskar Mitra |
---|---|
Year of publication: | 2016 |
Subject: | Information retrieval; Word embedding; Computer science; 02 engineering and technology; Term (time); Ranking (information retrieval); Ranking; 020204 information systems; 0202 electrical engineering, electronic engineering, information engineering; Embedding; 020201 artificial intelligence & image processing; Word2vec; Relevance (information retrieval); tf–idf; Word (computer architecture) |
Source: | WWW (Companion Volume) |
Description: | This paper investigates the popular neural word embedding method word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term-frequency-based approach. |
Database: | OpenAIRE |
External link: |
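The scoring step in the description (query words in the input space, document words in the output space, cosine similarities aggregated over all query-document word pairs) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes pre-trained word2vec input (IN) and output (OUT) embedding tables are already available as plain Python dicts, and it assumes the pairwise similarities are aggregated by their arithmetic mean, because the description does not say how they are aggregated. The function names `cosine` and `desm_score` and the two-dimensional toy vectors are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def desm_score(query: List[str], doc: List[str],
               in_emb: Dict[str, Vector], out_emb: Dict[str, Vector]) -> float:
    """Hypothetical DESM-style score: mean cosine similarity over all
    (query word, document word) pairs, with query words looked up in the
    IN table and document words in the OUT table.

    Assumption: mean aggregation; out-of-vocabulary words are skipped, and
    0.0 is returned if no pair can be formed.
    """
    q_vecs = [in_emb[w] for w in query if w in in_emb]
    d_vecs = [out_emb[w] for w in doc if w in out_emb]
    if not q_vecs or not d_vecs:
        return 0.0
    total = sum(cosine(q, d) for q in q_vecs for d in d_vecs)
    return total / (len(q_vecs) * len(d_vecs))


# Hypothetical toy embeddings (hand-made, not trained word2vec vectors).
in_emb = {"university": [1.0, 0.0]}
out_emb = {
    "college": [1.0, 0.0],
    "campus": [0.0, 1.0],
    "giraffe": [-1.0, 0.0],
}

if __name__ == "__main__":
    query = ["university"]
    print(desm_score(query, ["college", "campus"], in_emb, out_emb))  # 0.5
    print(desm_score(query, ["giraffe"], in_emb, out_emb))            # -1.0
```

Passing the IN table for both embedding arguments (with document words added to it) gives an IN-IN variant for comparison; the description proposes the IN-OUT combination for ranking, and frames the resulting score as evidence that complements, rather than replaces, term-frequency-based features.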