Context Resolution Strategies for Automatic Wikipedia Learning

Author: Granitzer, Michael; Seifert, Christin; Zechner, Mario
Language: English
Year of publication: 2008
Subject:
Source: ISSUE=7;TITLE=7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008
Description: Automatic linking of Wikipedia pages mostly follows one of two strategies: (i) a content-based strategy that relies on word similarities, or (ii) a structural strategy that exploits link characteristics. Our approach focuses on the content-based strategy: anchors are found by matching the titles of candidate Wikipedia pages, and matching links are resolved by taking the context of the link anchor, i.e. its surrounding text, into account. Best entry points are estimated from a combination of title-based and content-based similarity. Our goal was to evaluate syntactic title matching properties and the influence of the context around anchors on disambiguation and best-entry-point detection. Results show that the whole Wikipedia page provides the best context for resolving links and that simple inverse document frequency based scoring of anchor texts is also capable of achieving high accuracy.
Database: OpenAIRE
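
The abstract names three ingredients: title-based anchor detection, inverse document frequency (IDF) scoring of anchor texts, and context-based resolution of ambiguous links. The following is a minimal sketch of how these could fit together on a toy corpus. It is not the authors' implementation: the function names (`tokenize`, `idf_table`, `find_anchors`, `anchor_score`, `context_overlap`), the tokenization, the plain `log(N / df)` weighting, and the overlap-based context score are all assumptions made for illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_table(docs):
    """Map each token to log(N / df) over a dict of title -> page text."""
    n = len(docs)
    df = {}
    for text in docs.values():
        for tok in set(tokenize(text)):
            df[tok] = df.get(tok, 0) + 1
    return {tok: math.log(n / count) for tok, count in df.items()}


def find_anchors(source_text, titles):
    """Return the titles whose token sequence occurs in the source text."""
    src = tokenize(source_text)
    hits = []
    for title in titles:
        t = tokenize(title)
        if not t:
            continue
        last_start = len(src) - len(t)
        if any(src[i:i + len(t)] == t for i in range(last_start + 1)):
            hits.append(title)
    return hits


def anchor_score(title, idf):
    """Score an anchor text by summing the IDF of its tokens (assumed scheme)."""
    return sum(idf.get(tok, 0.0) for tok in tokenize(title))


def context_overlap(context_text, candidate_text, idf):
    """Sum the IDF of tokens shared by the anchor context and a candidate page."""
    shared = set(tokenize(context_text)) & set(tokenize(candidate_text))
    return sum(idf.get(tok, 0.0) for tok in shared)


# Toy corpus: hypothetical page titles and texts, not real Wikipedia data.
docs = {
    "Jaguar": "the jaguar is a large cat native to the americas",
    "Jaguar Cars": "jaguar cars is a british maker of luxury cars",
    "Cat": "the cat is a small domesticated animal",
}
source = "A jaguar is a cat that hunts at night"

idf = idf_table(docs)
for title in find_anchors(source, docs):
    print(title, anchor_score(title, idf), context_overlap(source, docs[title], idf))
```

Here the whole source text serves as the context, which mirrors the abstract's finding that the whole page works best; a narrower window around the anchor would be passed as `context_text` in the same way.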