Performance and Cost Tradeoffs in Web Search


Nick Craswell
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.

Francis Crimmins
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.

David Hawking
CSIRO -- ICT Center, GPO Box 664, Canberra, ACT 2601, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 15th Australasian Database Conference, Dunedin, New Zealand, January 2004, pages 161-169.

Abstract

Web search engines crawl the web to fetch the data that they index. In this paper we re-examine that need, evaluate the network costs associated with data acquisition, and consider alternative ways in which a search service might be supported. As a concrete example, we use the Research Finder search service provided at http://rf.panopticsearch.com, together with information derived from its crawl and query logs. Based upon an analysis of the Research Finder system, we introduce a hybrid arrangement in which queries are evaluated partly against a centrally maintained index representing a subset of the collection, and partly by forwarding them to the local search services maintained for the remainder of the collection. We also examine various ways in which crawling costs can be reduced.