SIRIUS: A Lightweight XML Indexing and Approximate Search System at INEX 2005
Author: | Eugen Popovici, Gildas Ménier, Pierre-François Marteau |
---|---|
Year of publication: | 2006 |
Subject: | Document Structure Description; Matching (statistics); Information retrieval; Computer science; Information Systems: Information Storage and Retrieval; Search engine indexing; Inverted index; Set (abstract data type); Computing Methodologies: Document and Text Processing; Object model; Document retrieval; XML |
Source: | Lecture Notes in Computer Science (INEX), ISBN 9783540349624 (ISBN-10: 3540349626) |
Description: | This paper reports on SIRIUS, a lightweight indexing and search engine for XML documents. The retrieval approach it implements is document-oriented and relies on approximate matching of both the structure and the textual content. Instead of matching whole DOM trees, SIRIUS splits the document object model into a set of paths; accordingly, a request is a path-like expression with conditions on attribute values. In this paper, we first present the main functionalities and characteristics of this XML IR system, and second, we report on our experience adapting and using it for the INEX 2005 ad-hoc retrieval task. Finally, we present and analyze the retrieval performance SIRIUS obtained in the INEX 2005 evaluation campaign and show that, despite its lightweight design, SIRIUS retrieved highly relevant, non-overlapping XML elements and achieved fairly good precision at low recall levels. |
Database: | OpenAIRE |
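
The description states that SIRIUS splits each document's object model into a set of paths and matches a path-like request, with conditions on attribute values, approximately against both structure and text. The Python sketch below only illustrates that general idea under stated assumptions: the longest-common-subsequence structural score, the keyword-overlap content score, the equal weighting `alpha`, and every function and variable name are hypothetical choices made for this example, not the scoring actually used by SIRIUS. It also does not perform the removal of overlapping elements mentioned in the abstract.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# Hypothetical index unit: (tag path, per-step attributes, element text).
PathEntry = Tuple[Tuple[str, ...], Tuple[Dict[str, str], ...], str]
# Hypothetical query step: (tag, required attribute values).
Step = Tuple[str, Dict[str, str]]


def split_into_paths(xml_text: str) -> List[PathEntry]:
    """Flatten the element tree into root-to-element paths for text-bearing elements."""
    root = ET.fromstring(xml_text)
    entries: List[PathEntry] = []

    def walk(elem, tags, attrs):
        tags = tags + (elem.tag,)
        attrs = attrs + (dict(elem.attrib),)
        text = (elem.text or "").strip()
        if text:
            entries.append((tags, attrs, text))
        for child in elem:
            walk(child, tags, attrs)

    walk(root, (), ())
    return entries


def step_matches(step: Step, tag: str, attrs: Dict[str, str]) -> bool:
    """A query step matches a path step if the tag and all attribute conditions agree."""
    want_tag, conds = step
    return want_tag == tag and all(attrs.get(k) == v for k, v in conds.items())


def structural_score(query: List[Step], tags, attrs) -> float:
    """Assumed approximate structure match: LCS of query steps vs. path, normalized by query length."""
    n, m = len(query), len(tags)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if step_matches(query[i - 1], tags[j - 1], attrs[j - 1]):
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m] / n if n else 0.0


def content_score(terms: List[str], text: str) -> float:
    """Assumed content match: fraction of query terms present in the element text."""
    words = set(text.lower().split())
    return sum(t.lower() in words for t in terms) / len(terms) if terms else 0.0


def render(tags, attrs) -> str:
    """Print a path with attribute predicates, e.g. /article/sec[@lang='en']/p."""
    return "/" + "/".join(
        t + "".join(f"[@{k}='{v}']" for k, v in a.items()) for t, a in zip(tags, attrs)
    )


def search(entries: List[PathEntry], query: List[Step], terms: List[str], alpha: float = 0.5):
    """Rank indexed paths by a weighted mix of structural and content scores."""
    results = []
    for tags, attrs, text in entries:
        score = alpha * structural_score(query, tags, attrs) + (1 - alpha) * content_score(terms, text)
        if score > 0:
            results.append((render(tags, attrs), score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


DOC = """<article><fm><tig><atl>XML retrieval</atl></tig></fm>
<bdy><sec lang="en"><st>Indexing</st><p>Paths index XML elements</p></sec>
<sec lang="fr"><p>Indexation XML</p></sec></bdy></article>"""
QUERY: List[Step] = [("article", {}), ("sec", {"lang": "en"}), ("p", {})]
TERMS = ["xml", "index"]

if __name__ == "__main__":
    for path, score in search(split_into_paths(DOC), QUERY, TERMS):
        print(f"{score:.3f}  {path}")
```

Because the query is matched as a subsequence rather than an exact path, the French paragraph still scores partially even though its `lang` condition fails; this is the kind of tolerance that approximate path matching is meant to provide.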