Needles and Haystacks:
A Search Engine for Personal Information Collections
Owen de Kretser
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Parkville 3052, Australia.
Status
Proc. 23rd Australasian Computer Science Conference,
Canberra, Australia, February 2000, 58-65.
Abstract
Information retrieval systems can be partitioned into two main classes:
large-scale systems that make use of an inverted index or some other
auxiliary data structure, intended for massive volumes of data;
and the small-scale systems based upon sequential pattern matching
that most computer users employ when hunting for missing email
and news items.
In this paper we describe a hybrid approach that offers the ranked queries
and similarity matching
of a genuine information retrieval system,
but does so without any need for an index to be precomputed.
This software tool, which we call seft, matches conventional
information retrieval systems in retrieval effectiveness.
In resource efficiency it is considerably slower than grep-like tools,
but fast enough to be useful
on hundreds of megabytes of text.
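
To make the idea of ranked querying without a precomputed index concrete, here is a minimal Python sketch. It is not the seft algorithm, whose similarity measure the abstract does not specify. The function name rank_passages, the fixed-size word windows, and the scoring rule (distinct query terms matched, then total occurrences) are all assumptions chosen for illustration.

```python
import heapq
import re
from collections import Counter


def rank_passages(text, query, window=30, top_k=3):
    """Rank fixed-size word windows of `text` against `query` in one sequential pass.

    Hypothetical illustration only: no index is built, and the score is a
    simple (distinct terms matched, total matches) pair, not seft's measure.
    """
    terms = set(query.lower().split())
    words = re.findall(r"\w+", text.lower())
    scored = []
    # Scan the text once, window by window, as a grep-like tool would scan lines.
    for start in range(0, len(words), window):
        chunk = words[start:start + window]
        counts = Counter(w for w in chunk if w in terms)
        if counts:
            score = (len(counts), sum(counts.values()))
            scored.append((score, start))
    # Keep only the best-scoring windows, highest score first.
    best = heapq.nlargest(top_k, scored)
    return [(start, score) for score, start in best]


if __name__ == "__main__":
    sample = ("alpha beta gamma delta needle in haystack here "
              "nothing to see now needle needle foo bar")
    for start, score in rank_passages(sample, "needle haystack", window=4):
        print(start, score)
```

Each result pairs the word offset where a window begins with its score, so windows that match more distinct query terms are listed first. Because every query rescans the whole text, the cost grows with collection size, which is the trade-off the abstract describes.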
Software
The seft software is available here.