Author: |
Granitzer, Michael; Seifert, Christin; Zechner, Mario |
Language: |
English |
Year of publication: |
2008 |
Subject: |
|
Source: |
ISSUE=7;TITLE=7th International Workshop of the Initiative for the Evaluation of XML Retrieval, INEX 2008 |
Description: |
Automatic linking of Wikipedia pages mostly follows one of two strategies: (i) a content-based strategy that relies on word similarities, or (ii) a structural strategy that exploits link characteristics. Our approach focuses on the content-based strategy: we find anchors using the titles of candidate Wikipedia pages and resolve matching links by taking the context of the link anchor, i.e. its surrounding text, into account. Best entry points are estimated from a combination of title- and content-based similarity. Our goal was to evaluate syntactic title-matching properties and the influence of the context around anchors on disambiguation and best-entry-point detection. Results show that the whole Wikipedia page provides the best context for resolving links and that simple scoring of anchor texts based on inverse document frequency can also achieve high accuracy. |
Database: |
OpenAIRE |
External link: |
|