Compression and an IR Approach to XML Retrieval
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. of the First Workshop of the Initiative for
the Evaluation of XML Retrieval (INEX),
Dagstuhl, Germany, December 2002, 99-104.
Abstract
A two-phase evaluation scheme is proposed for XML retrieval. In the
first phase, a modified vector space model is employed to obtain
similarity scores for the textual nodes of XML trees.
In the second phase, the scores are propagated upward in the XML trees:
the scores of the textual nodes are modified, and scores are generated
for the remaining nodes. As a result, although a vector space ranking
is used, the final scores do not truly reflect the vector space scores.
In addition to the two-phase evaluation, an integrated compressed
file system is proposed for both storing and retrieving XML documents.
This leads to an efficient representation of XML repositories.
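
The two-phase evaluation described above can be illustrated with a minimal sketch. The abstract does not state the propagation rule, so everything specific here is an assumption made for illustration only: the `Node` class, the `propagate` function, the `decay` parameter, and the rule itself (a node's final score is its own phase-one score plus a damped sum of its children's final scores). The phase-one scores are simply given as numbers rather than computed by a vector space model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical XML tree node used only for this illustration."""
    tag: str
    # Phase-one similarity score; None marks a node with no text of its own.
    text_score: Optional[float] = None
    children: List["Node"] = field(default_factory=list)
    # Final score, filled in by phase two.
    score: float = 0.0


def propagate(node: Node, decay: float = 0.5) -> float:
    """Phase two (assumed rule): push scores upward through the tree.

    final(node) = own phase-one score + decay * sum(final(child)).
    The decay value is an arbitrary illustrative choice.
    """
    child_total = sum(propagate(child, decay) for child in node.children)
    own = node.text_score if node.text_score is not None else 0.0
    node.score = own + decay * child_total
    return node.score


# Phase one is assumed to have produced these scores for the textual nodes.
para = Node("p", text_score=0.4)
section = Node("sec", text_score=0.8, children=[para])
article = Node("article", children=[section])  # no text of its own

propagate(article)
for n in (article, section, para):
    print(n.tag, round(n.score, 3))
```

Under this assumed rule, the textual node `sec` ends with a score different from its phase-one value, and the non-textual node `article` receives a score it did not have before, which matches the behaviour the abstract describes for the second phase.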