Encoding XML in Vector Spaces.

Authors: Losada, David E., Fernández-Luna, Juan M., Kakade, Vinay, Raghavan, Prabhakar
Source: Advances in Information Retrieval (9783540252955); 2005, p96-111, 16p
Abstract: We develop a framework for representing XML documents and queries in vector spaces and build indexes for processing text-centric semi-structured queries that support a proximity measure between XML documents. The idea of using vector spaces for XML retrieval is not new. In this paper we (i) unify prior approaches into a single framework; (ii) develop techniques to eliminate special purpose auxiliary computations (outside the vector space) used previously; (iii) give experimental evidence on benchmark queries that our approach is competitive in its retrieval quality and (iv) as an immediate consequence of the framework, are able to classify and cluster XML documents. [ABSTRACT FROM AUTHOR]
Database: Supplemental Index