Homepage Finding and Topic Distillation using
a Common Retrieval Strategy
Vo Ngoc Anh
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
To appear in the 2002 TREC Notebook Proceedings.
Abstract
For the TREC-2002 Web track, the University of Melbourne
experimented with a system designed primarily for topic relevance
tasks, and applied it directly to the homepage finding and topic
distillation tasks.
Our intention was to process all queries in the same way, regardless
of their task classification, since this information may be
unavailable in practice.
We employed an integral weighting scheme reported in earlier work,
modified to take account of anchor text and many of the metadata
fields, but not of URL text or link-structure information.
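The abstract does not spell out the weighting scheme. Purely as an illustration, and assuming that "integral" refers to integer-valued term weights, the Python sketch below shows one generic way to fold several document fields into a single term-count table while ignoring URL text, and then to quantize log-scaled term frequencies into a small range of integers. The field names, the `FIELD_MULTIPLIER` values, the `1 + ln(f)` transformation and the uniform quantization into `levels` buckets are all hypothetical choices, not the scheme from the earlier work.

```python
import math
from collections import Counter

# HYPOTHETICAL field multipliers: how much each occurrence of a term in a field
# counts towards that term's frequency. "url" is intentionally absent, so URL
# text contributes nothing (the abstract says URL text was not used).
FIELD_MULTIPLIER = {"body": 1, "title": 2, "anchor": 2, "meta": 1}


def field_term_counts(fields):
    """Merge per-field token lists into one term-count table.

    Fields with no entry in FIELD_MULTIPLIER (such as "url") are skipped.
    """
    counts = Counter()
    for name, tokens in fields.items():
        multiplier = FIELD_MULTIPLIER.get(name, 0)
        if multiplier == 0:
            continue
        for token in tokens:
            counts[token] += multiplier
    return counts


def integer_weights(counts, levels=8):
    """Quantize 1 + ln(f) for each term into the integers 1..levels.

    Uniform bucketing between the smallest and largest value in this document;
    an assumed scheme, used only to show what integer-valued weights look like.
    """
    if not counts:
        return {}
    raw = {term: 1 + math.log(f) for term, f in counts.items()}
    lo, hi = min(raw.values()), max(raw.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return {
        term: 1 + min(levels - 1, int(levels * (w - lo) / span))
        for term, w in raw.items()
    }


doc = {
    "title": ["web", "track"],
    "body": ["web", "retrieval", "web"],
    "url": ["www"],
}
counts = field_term_counts(doc)
weights = integer_weights(counts)
print(counts)  # web: 4, track: 2, retrieval: 1; "www" is ignored
print(weights)
```

One attraction of small integer weights is that each can be stored in a few bits per index posting; the details of the actual scheme should be taken from the earlier work the abstract refers to.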
Our experiments were carried out using a distributed retrieval
system, with data spread across a sixteen-node cluster.
Indexing and query processing are fast, and the total index size is
small.
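The abstract gives no details of how work was divided among the sixteen nodes. Assuming a document-partitioned layout, in which each node indexes a disjoint subset of the collection and scores are comparable across nodes (for example because collection-wide statistics are shared), a query can be answered by having every node return its local top-k list and having a broker merge those lists, as in this Python sketch. The function names and the dictionary-of-scores input are hypothetical.

```python
import heapq


def local_top_k(shard_scores, k):
    """Return a node's k highest-scoring (score, doc_id) pairs.

    shard_scores maps doc_id -> score for the documents held on that node.
    """
    return heapq.nlargest(k, ((score, doc) for doc, score in shard_scores.items()))


def merge_top_k(per_node_results, k):
    """Merge the nodes' local lists into a global top-k list.

    Because the shards are disjoint, each document in the global top k is
    also in its own node's local top k, so the union of local lists suffices.
    """
    return heapq.nlargest(k, (pair for result in per_node_results for pair in result))


# Three toy shards standing in for the cluster's sixteen nodes.
shards = [
    {"d1": 5.0, "d2": 1.0},
    {"d3": 4.0, "d4": 3.0},
    {"d5": 2.0},
]
k = 3
answers = [local_top_k(s, k) for s in shards]
print(merge_top_k(answers, k))  # [(5.0, 'd1'), (4.0, 'd3'), (3.0, 'd4')]
```

Under these assumptions each node ships only k pairs per query, so the broker's work grows with the number of nodes times k rather than with the size of the collection.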