Inferring Search Queries from Web Documents via a Graph-Augmented Sequence to Attention Network
Authors: | Yu Xu, Di Niu, Fred X. Han, Yancheng He, Weidong Guo, Kunfeng Lai |
---|---|
Publication year: | 2019 |
Subject: |
Information retrieval; Web search query; Natural language user interface; Computer science; Keyword extraction; Inference; Search engine; Generative model; Recurrent neural network; Ranking; Transformer (machine learning model); 01 natural sciences; 0105 earth and related environmental sciences; 010501 environmental sciences; 02 engineering and technology; 0202 electrical engineering, electronic engineering, information engineering; 020201 artificial intelligence & image processing |
Source: | WWW |
DOI: | 10.1145/3308558.3313746 |
Description: | We study the problem of search query inference from web documents, where a short, comprehensive natural language query is inferred from a long article. Search query generation or inference is of great value to search engines and recommenders in terms of locating potential target users and ranking content. Although the task is closely related to other NLP tasks such as abstract generation and keyword extraction, we point out that search query inference is, in fact, a new problem: the generated natural language query, which consists of only a few words, is expected to be comprehensive enough to lead to the click-through of the corresponding document. Query generation therefore requires accurate inference of query words as well as a deeper understanding of document semantic structures. To this end, we propose a novel generative model called the Graph-augmented Sequence to Attention (G-S2A) network. Adopting an Encoder-Decoder architecture, G-S2A incorporates a sentence-level Graph Convolutional Network (GCN), a keyword-level GCN, and a hierarchical recurrent neural network (RNN) into the encoder to generate structural document representations. An attentional Transformer decoder is then applied to combine the different types of encoded features and generate a target query. On a query-document dataset from a real-world search engine, our model outperforms several neural generative models on a wide range of metrics. |
Database: | OpenAIRE |
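
Note: the description above mentions sentence-level and keyword-level Graph Convolutional Networks in the encoder but does not give their formulation. As an illustration only, the sketch below shows one graph-convolution step in the commonly used symmetric-normalized form, ReLU(D^-1/2 (A + I) D^-1/2 H W). Whether G-S2A uses this exact variant is an assumption; the function name `gcn_layer` and its parameters are hypothetical and not taken from the paper.

```python
import math


def gcn_layer(adj, feats, weight):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adj    -- n x n adjacency matrix (e.g. sentence or keyword graph), assumed form
    feats  -- n x d node feature matrix H
    weight -- d x o weight matrix W
    """
    n = len(adj)
    d_in = len(feats[0])
    d_out = len(weight[0])

    # Add self-loops so every node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]

    # Symmetric degree normalization: D^-1/2 (A + I) D^-1/2.
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    norm = [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]

    # Aggregate neighbor features: norm @ feats.
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]

    # Linear map followed by ReLU: max(0, agg @ weight).
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(d_in)))
             for o in range(d_out)]
            for i in range(n)]


# Two connected nodes with one-hot features and an identity weight matrix.
adj = [[0.0, 1.0], [1.0, 0.0]]
feats = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
hidden = gcn_layer(adj, feats, identity)
```

In this toy case each node ends up with the average of its own and its neighbor's features. In a full encoder, node representations of this kind would be combined with the RNN features before decoding, as the description outlines.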