Dynamic Element Retrieval in a Semi-structured Collection

Authors: Murthy Ganapathibhotla, Vishal Bakshi, Carolyn J. Crouch, Donald B. Crouch
Publication year: 2007
Subject:
Source: Comparative Evaluation of XML Information Retrieval Systems ISBN: 9783540738879
INEX
DOI: 10.1007/978-3-540-73888-6_9
Description: This paper describes our methodology for the dynamic retrieval of XML elements, gives an overview of its implementation in a structured environment, and discusses the challenges introduced by applying it to the INEX Wikipedia [4] collection, which is more aptly described as semi-structured. Our system is based on the vector space model [9], and its basic functions are performed using the Smart experimental retrieval system [8]. A major change in the system this year is the incorporation of a method for the dynamic computation of query term weights [6], which are correlated with the dynamically generated and weighted element vectors. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (in this case, the paragraph). It returns a rank-ordered list of elements equivalent to that produced by running the same query against an all-element index of the collection. (A detailed description of this method appears in [1].) As we move from a well-structured collection, such as the INEX IEEE documents, to Wikipedia, changes in the structure of the articles must be accommodated.
Database: OpenAIRE
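
Illustrative sketch (not part of the original record): the description above says that only paragraphs are indexed and that vectors for larger elements are generated at query time. A minimal Python sketch of that idea follows. It is not the authors' implementation and does not use the Smart system; the function names, the raw term-frequency weighting, and the parent-to-children tree layout are all assumptions made for illustration.

```python
import math
from collections import Counter


def tf_vector(text):
    # Assumption: raw term frequency stands in for the real term weighting.
    return Counter(text.lower().split())


def element_vectors(root, children, para_vectors):
    """Build a vector for every element by summing its paragraph descendants.

    children: hypothetical dict mapping an element id to its child ids.
    para_vectors: dict mapping paragraph ids (the leaves) to term vectors.
    """
    vectors = {}

    def visit(node):
        if node in para_vectors:          # leaf: the indexed paragraph
            vectors[node] = para_vectors[node]
            return vectors[node]
        total = Counter()
        for child in children.get(node, []):
            total.update(visit(child))    # add the child's term counts
        vectors[node] = total
        return total

    visit(root)
    return vectors


def cosine(a, b):
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_elements(query, root, children, paragraphs):
    # Only paragraphs are indexed; all other vectors are built on demand.
    para_vectors = {pid: tf_vector(text) for pid, text in paragraphs.items()}
    vectors = element_vectors(root, children, para_vectors)
    q = tf_vector(query)
    scored = [(cosine(q, v), eid) for eid, v in vectors.items()]
    # Highest score first; ties broken by element id for a stable order.
    return sorted(scored, key=lambda s: (-s[0], s[1]))
```

Calling `rank_elements` with a query, a root element id, the child map, and the paragraph texts returns (score, element id) pairs for paragraphs, sections, and the whole article in one ranked list, which is the kind of all-element ranking the description refers to.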