Efficient Retrieval of Partial Documents


Justin Zobel
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Alistair Moffat
Department of Computer Science, The University of Melbourne, Parkville 3052, Australia.

Ross Wilkinson
Department of Computer Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.

Ron Sacks-Davis
Faculty of Applied Science, RMIT, GPO Box 2476V, Melbourne 3001, Australia.


Status

Information Processing and Management, 31(3):361-377, 1995.

Abstract

Management and retrieval of large volumes of text can be expensive in both space and time. Moreover, the range of document sizes in a large collection such as TREC presents difficulties for both the retrieval mechanism and the user. We consider division of documents into parts as a solution to the problem of the range of document sizes and show that, for databases with long documents, use of document parts can improve the quality of the information presented to the user. We also describe the compressed text database system we use to manage the TREC collection; the compressed inverted files with which it is indexed; and the techniques we use to evaluate the TREC queries, both on whole documents and on document parts.