SIGIR'98 demonstrations: Cafe: An indexed approach to searching genomic databases


Hugh Williams
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.


Abstract

Genomic databases help molecular biologists understand the biochemical function, chemical structure, and evolutionary history of organisms. Popular systems for searching genomic databases resolve a query by comparing it against every sequence in the database. Efficiency in such exhaustive systems is crucial: some servers process over 40,000 queries per day, and resolving a single query can require comparison against more than one gigabyte of genomic sequence data. While exhaustive systems are practical at present, they may become prohibitively expensive as genomic databases continue to double in size each year and as user numbers and query rates grow.
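To make the cost of exhaustive search concrete, the following minimal sketch scores a query against every sequence in a toy database. It assumes Smith-Waterman local alignment with a simple linear scoring scheme (match +2, mismatch -1, gap -1); these parameters and the function names are illustrative assumptions, not taken from any particular search system.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Best local-alignment score of a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def exhaustive_search(query: str, database: dict[str, str], top: int = 3) -> list[tuple[str, int]]:
    """Score the query against EVERY sequence (hypothetical helper).

    Work is proportional to len(query) times the total database size,
    so it grows directly with the database.
    """
    scores = [(name, smith_waterman(query, seq)) for name, seq in database.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top]


db = {"seq1": "ACGTACGTGACC", "seq2": "TTTTGGGGCCCC", "seq3": "ACGTTCGTGA"}
print(exhaustive_search("ACGTACGT", db))
```

Every query touches every residue in the database. At the figures quoted above, 40,000 queries a day against one gigabyte each amounts to scanning on the order of 40 terabytes of sequence data per day, and that figure doubles as the database doubles.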

We demonstrate novel indexing and retrieval techniques for querying genomic databases, embodied in Cafe, a full-scale prototype retrieval system. Cafe's principal contributions are new, efficient data structures for query resolution and a demonstration that, despite earlier negative results, indexing can be applied successfully to genomic databases. We demonstrate Cafe's search capabilities and show that it is scalable and efficient in both space and speed, with accuracy comparable to that of existing search systems.
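The abstract does not describe Cafe's data structures, so the following is only a generic sketch of how an index can avoid a full scan. It assumes an inverted index over fixed-length overlapping substrings (here called "intervals", length k = 4), and ranks sequences by how many distinct query intervals they share. All names and parameters are hypothetical and should not be read as Cafe's actual design.

```python
from collections import Counter, defaultdict


def intervals(seq: str, k: int) -> set[str]:
    """Distinct overlapping substrings of length k (hypothetical 'intervals')."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict[str, str], k: int = 4) -> dict[str, list[str]]:
    """Inverted index: map each interval to the names of sequences containing it."""
    index: dict[str, list[str]] = defaultdict(list)
    for name, seq in database.items():
        for term in intervals(seq, k):
            index[term].append(name)
    return dict(index)


def indexed_search(query: str, index: dict[str, list[str]], k: int = 4, top: int = 3) -> list[tuple[str, int]]:
    """Rank sequences by the number of distinct query intervals they share.

    Only the posting lists for the query's own intervals are read;
    sequences sharing no interval with the query are never touched.
    """
    counts: Counter[str] = Counter()
    for term in intervals(query, k):
        counts.update(index.get(term, []))
    return counts.most_common(top)


db = {"seq1": "ACGTACGTGACC", "seq2": "TTTTGGGGCCCC", "seq3": "ACGTTCGTGA"}
idx = build_index(db)
print(indexed_search("ACGTACGT", idx))
```

Here the query's four distinct intervals (ACGT, CGTA, GTAC, TACG) retrieve only seq1 and seq3, and seq2 is never examined. One plausible design, assumed here rather than taken from the source, is to pass only the few top-ranked candidates to a fine-grained alignment, so that the expensive comparison runs on a small fraction of the database.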


SIGIR'98
24-28 August 1998
Melbourne, Australia.
sigir98@cs.mu.oz.au.