Efficient Retrieval of Partial Documents
Justin Zobel
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Alistair Moffat
Department of Computer Science,
The University of Melbourne,
Parkville 3052, Australia.
Ross Wilkinson
Department of Computer Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Ron Sacks-Davis
Faculty of Applied Science,
RMIT, GPO Box 2476V,
Melbourne 3001, Australia.
Status
Information Processing and Management, 31(3):361-377, 1995.
Abstract
Management and retrieval of large volumes of text can be expensive in
both space and time. Moreover, the wide range of document sizes in a large
collection such as TREC presents difficulties both for the retrieval
mechanism and for the user. We consider dividing documents into parts
as a solution to the problem of varying document size, and show
that, for databases with long documents, the use of document parts can
improve the quality of the information presented to the user. We also
describe the compressed text database system we use to manage the
TREC collection; the compressed inverted files with which it is
indexed; and the techniques we use to evaluate the TREC queries,
both on whole documents and on document parts.