SIGIR'98 demonstrations: Cafe: An indexed approach to searching genomic databases
Hugh Williams
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Abstract
Genomic databases assist molecular biologists in understanding the
biochemical function, chemical structure, and evolutionary history of
organisms.
Popular systems for searching genomic databases match queries to answers by
comparing a query to each of the sequences in the database.
Efficiency in such exhaustive systems is crucial, since some servers process
over 40,000 queries per day, and resolution of each query can require
comparison to over one gigabyte of genomic sequence data.
While exhaustive systems are practical at present, they may become
prohibitively expensive as genomic databases continue to double in size
yearly, and as user numbers and query rates grow.
We demonstrate our successful, novel indexing and retrieval techniques
for querying genomic databases, which are embodied in a full-scale prototype
retrieval system, Cafe.
The principal features of Cafe are the incorporation of novel and efficient
data structures for query resolution and the demonstration that, despite
earlier negative results, indexing can be successfully applied to genomic
databases.
We demonstrate Cafe's search capabilities and show that the system has the
requisite properties of scalability and efficiency in space and speed, as
well as accuracy comparable to that of existing search systems.
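
The contrast between exhaustive and indexed query resolution can be
illustrated with a minimal sketch. The abstract does not describe the data
structures Cafe uses, so everything below is an assumption made for
illustration: the toy inverted index keyed on fixed-length substrings, the
function names, the substring length k, and the shared-substring score are
all hypothetical and are not taken from Cafe.

```python
from collections import defaultdict


def intervals(seq, k):
    """Yield every overlapping length-k substring of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def exhaustive_search(query, database, k=3):
    """Compare the query to every sequence in the database.

    The score is the number of distinct length-k substrings a sequence
    shares with the query (a stand-in for a real similarity measure).
    """
    query_intervals = set(intervals(query, k))
    scores = {}
    for seq_id, seq in database.items():
        shared = len(query_intervals & set(intervals(seq, k)))
        if shared:
            scores[seq_id] = shared
    return scores


def build_index(database, k=3):
    """Build a toy inverted index: substring -> set of sequence ids."""
    index = defaultdict(set)
    for seq_id, seq in database.items():
        for interval in intervals(seq, k):
            index[interval].add(seq_id)
    return index


def indexed_search(query, index, k=3):
    """Score only the sequences that the index lists for the query's substrings."""
    scores = defaultdict(int)
    for interval in set(intervals(query, k)):
        for seq_id in index.get(interval, ()):
            scores[seq_id] += 1
    return dict(scores)
```

In this sketch both functions return the same scores, but the indexed
version reads only the index entries for the query's substrings, whereas
the exhaustive version scans every sequence for every query. The index is
built once and reused across queries.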
SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.