Performance and Cost Tradeoffs in Web Search
Nick Craswell
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.
Francis Crimmins
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.
David Hawking
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 15th Australasian Database Conference,
Dunedin, New Zealand, January 2004, pages 161-169.
Abstract
Web search engines crawl the web to fetch the data that they index.
In this paper we re-examine that need, evaluating both the network costs
associated with data acquisition and alternative ways in which a
search service might be supported.
As a concrete example, we make use of the Research Finder search
service provided at http://rf.panopticsearch.com,
and information derived from its crawl and query logs.
Based upon an analysis of the Research Finder system, we introduce a
hybrid arrangement in which queries are evaluated partially by
reference to a centrally maintained index representing a subset of
the collection, and partially by referring them on to the local
search services maintained by the sites that make up the balance of
the collection.
We also examine various ways in which crawling costs can be reduced.
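
The hybrid arrangement described in the abstract can be illustrated with a
minimal Python sketch. It is not the Research Finder implementation: the
names CentralIndex, LocalSearch and hybrid_search, the toy inverted index,
and the simple append-and-deduplicate merge are all assumptions made for
illustration. A real service would also need to rank the merged results and
to handle slow or unavailable local search services.

```python
from typing import Callable, Dict, List, Set

# Hypothetical type: a local search service is modelled as any callable
# that maps a query string to a list of result URLs.
LocalSearch = Callable[[str], List[str]]


class CentralIndex:
    """Toy inverted index over the centrally crawled subset (assumed design)."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = {}

    def add(self, url: str, text: str) -> None:
        # Record every whitespace-separated term of the page against its URL.
        for term in text.lower().split():
            self.postings.setdefault(term, set()).add(url)

    def search(self, query: str) -> List[str]:
        # Conjunctive (AND) matching: a page must contain every query term.
        terms = query.lower().split()
        if not terms:
            return []
        hits = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            hits &= self.postings.get(term, set())
        return sorted(hits)


def hybrid_search(
    query: str,
    central: CentralIndex,
    local_services: Dict[str, LocalSearch],
) -> List[str]:
    """Answer a query from the central index, then refer it on to the
    local search services of the sites that were not crawled centrally."""
    results = list(central.search(query))
    seen = set(results)
    for site in sorted(local_services):
        for url in local_services[site](query):
            if url not in seen:  # drop duplicates across sources
                seen.add(url)
                results.append(url)
    return results
```

In this sketch only the sites covered by CentralIndex incur crawling
traffic; the remaining sites incur one forwarded request per query instead,
which is the kind of network cost tradeoff the paper sets out to evaluate.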