SIGIR'98 Tutorial T7

21st Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval

Melbourne, Australia

August 24 - 28, 1998

TUTORIAL T7

Practical Digital Libraries



Presenters

Ian H. Witten and Rodger J. McNab,
Waikato University, New Zealand.

Time

Monday 24 August, 2:00pm--5:30pm.

Description

This tutorial will follow the structure of Practical Digital Libraries, a recent book by Mike Lesk (Morgan Kaufmann, 1997). In twelve chapters, this widely acclaimed book covers the major issues surrounding digital libraries, with an emphasis on practical technologies that are available today for building such libraries. The material in the book, although practical, is not presented in the context of an actual digital library implementation. The tutorial will fill that gap by describing an existing, operational, comprehensive, economical digital library system, the New Zealand Digital Library, and relating it to the structure of the book, thus providing a concrete demonstration of the application of Lesk's ideas. Attendees will gain both an overview of the field of digital libraries and a detailed look at a particular, freely accessible implementation that demonstrates the concepts.

Topics to be covered include the evolution of a digital library and the hardware technology required to support it; the problems of accommodating different text formats, including the extraction of text from PostScript; experience with large-scale OCR, including the use of language models to improve the results; the use of images extracted from PostScript and text extracted from Adobe Acrobat image formats; textual image compression and document structure extraction; audio collections and the indexing of music; the software architecture for a digital library, from requirements to implementation; rudimentary textual processing such as stemming and case-folding; phrase extraction and browsing; distribution of digital library collections via Internet and CD-ROM; user interfaces for query specification and for visualizing sequences of queries; a description of the New Zealand Digital Library project, including the resources available and the scope of the collections; and a discussion of scalability and the limits to growth.

Audience

This tutorial should appeal to people seeking practical know-how concerning the construction of a particular digital library system. No special background is required of attendees other than general familiarity with the World-Wide Web.

Biographies of presenters

Ian H. Witten received degrees in Mathematics from Cambridge University, Computer Science from the University of Calgary, and Electrical Engineering from Essex University, England. A Lecturer at Essex from 1970, he returned to Calgary in 1980 and in 1992 moved to New Zealand to take up a position as Professor of Computer Science at Waikato University.

The underlying theme of his research is the exploitation of information about the past to expedite interaction in the future. He has worked in machine learning, which seeks ways to summarize, restructure, and generalize past experience; adaptive text compression, which uses information about past text to encode upcoming characters; and user modeling, or the general area of characterizing user behavior. He is director of two large research projects at Waikato: one on machine learning, the other on digital libraries, with activities in the areas of document compression, indexing, and retrieval.

He has published around 200 refereed papers on machine learning, speech synthesis and signal processing, text compression, hypertext, and computer typography. He has written six books, the latest being Managing Gigabytes: Compressing and Indexing Documents and Images (Van Nostrand Reinhold, 1994, co-authored with A. Moffat and T. Bell).

Rodger J. McNab received Bachelor's and Master's degrees in Computer Science from Waikato University, New Zealand. His Master's thesis was on music transcription and its applications. In 1996 he took up a research programming position with the New Zealand Digital Library, where he is pursuing work on digital libraries and information retrieval. He is particularly interested in the architecture of digital libraries, indexing, and digital music libraries.

More information

Much of the tutorial is illustrated with examples from the New Zealand Digital Library.

Cost

The charge for registration is $A150 per tutorial. Registrants will receive a copy of the notes for the tutorial, and morning/afternoon tea. All tutorials are offered on an only-if-demand-warrants basis, and full refunds will be given for tutorials that are cancelled because of low enrolments. Tutorial notes will also be available for sale individually at the conference registration desk.
sigir98@cs.mu.oz.au,
20 April 1998.