Finding similar research papers using language models

Author: Hurtado Martín, G., Schockaert, S., Cornelis, C., Naessens, H.
Language: English
Year of publication: 2011
Subject:
Source: Proceedings of the 2nd Workshop on Semantic Personalized Information Management: Retrieval and Recommendation
Scopus-Elsevier
Description: The task of assessing the similarity of research papers is of interest in a variety of application contexts. It is a challenging task, however, as the full text of the papers is often not available, and similarity needs to be determined based on the papers' abstracts and on additional metadata such as the authors, keywords, and journal. Our work explores the possibility of adapting language modeling techniques to this end. The basic strategy we pursue is to augment the information contained in the abstract by interpolating the corresponding language model with language models for the authors, keywords and journal of the paper. This strategy is then extended by finding topics and additionally interpolating with the resulting topic models. These topics are found using an adaptation of Latent Dirichlet Allocation (LDA), in which the keywords provided by the authors are used to guide the process.
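The basic interpolation strategy described above can be illustrated with a small sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: unigram models are maximum-likelihood estimates over whitespace tokens; the author, keyword, and journal models are built from the abstracts of all corpus papers sharing that author, keyword, or journal; the interpolation weights (0.5, 0.2, 0.2, 0.1) are arbitrary; and papers are ranked by KL divergence, with the candidate model smoothed against a corpus-wide background model. The helper names (`paper_lm`, `field_lm`, `most_similar`) and the toy corpus are hypothetical. The keyword-guided LDA extension is not sketched.

```python
import math
from collections import Counter

def unigram_lm(texts):
    """Maximum-likelihood unigram model over the whitespace tokens of the given texts."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w); weights should sum to 1."""
    vocab = set().union(*models)
    return {w: sum(lam * m.get(w, 0.0) for m, lam in zip(models, weights)) for w in vocab}

def _values(paper, field):
    """Treat single-valued fields (e.g. journal) and list fields uniformly."""
    v = paper[field]
    return v if isinstance(v, list) else [v]

def field_lm(corpus, paper, field):
    """Hypothetical helper (assumption): model built from the abstracts of all
    corpus papers that share at least one value of `field` with `paper`."""
    wanted = set(_values(paper, field))
    return unigram_lm([p["abstract"] for p in corpus if wanted & set(_values(p, field))])

def paper_lm(corpus, paper, weights=(0.5, 0.2, 0.2, 0.1)):
    """Abstract model interpolated with author, keyword, and journal models."""
    models = [unigram_lm([paper["abstract"]]),
              field_lm(corpus, paper, "authors"),
              field_lm(corpus, paper, "keywords"),
              field_lm(corpus, paper, "journal")]
    return interpolate(models, weights)

def kl_divergence(p, q, background, eps=0.1):
    """KL(p || q), with q smoothed against a background model so it is nonzero
    on p's support (every word in p must occur in the background corpus)."""
    kl = 0.0
    for w, pw in p.items():
        if pw == 0.0:
            continue
        qw = (1 - eps) * q.get(w, 0.0) + eps * background[w]
        kl += pw * math.log(pw / qw)
    return kl

def most_similar(corpus, query, weights=(0.5, 0.2, 0.2, 0.1)):
    """Rank the other corpus papers by ascending KL divergence from the query's model."""
    background = unigram_lm([p["abstract"] for p in corpus])
    q_lm = paper_lm(corpus, query, weights)
    scored = [(kl_divergence(q_lm, paper_lm(corpus, p, weights), background), p["id"])
              for p in corpus if p["id"] != query["id"]]
    return [pid for _, pid in sorted(scored)]

# Toy corpus (invented for illustration only).
papers = [
    {"id": "A", "abstract": "topic models for document similarity",
     "authors": ["Smith"], "keywords": ["lda"], "journal": "IR Journal"},
    {"id": "B", "abstract": "latent topic models of scientific documents",
     "authors": ["Smith", "Lee"], "keywords": ["lda"], "journal": "IR Journal"},
    {"id": "C", "abstract": "protein folding with molecular dynamics",
     "authors": ["Kim"], "keywords": ["biology"], "journal": "Bio Letters"},
]

print(most_similar(papers, papers[0]))  # ['B', 'C']
```

Paper B shares an author, a keyword, and the journal with A, so its interpolated model places mass on A's vocabulary and it ranks first. A topic-model component could, under the same assumptions, be added as one more entry in `models` with its own weight.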
Database: OpenAIRE