SIGIR'98 Tutorial T2

21st Annual International ACM SIGIR Conference on
Research and Development in Information Retrieval

Melbourne, Australia

August 24 - 28, 1998

TUTORIAL T2

Models in Information Retrieval



Presenter

Fredric C. Gey,
UC Berkeley

Time

Monday 24 August, 9:00am - 12:30pm.

Location

Melbourne Town Hall, Swanston Street, Melbourne.

Description

The three major theoretical models in information retrieval are Boolean/logic, vector space, and probabilistic. This tutorial will explain the characteristics and problems unique to each model and how each has evolved along different lines. Modern variants of the basic models will also be explained. Attendees will gain a basic understanding of the major theoretical models on which modern text retrieval software is based. The tutorial should give each participant a starting point for further self-education.

Schedule

15 min

  • Background and historical development
  • Luhn and statistical text characteristics
  • Statistical weights and the IDF concept

45 min

  • Boolean set and logic models
  • Fuzzy logic (RUBRIC/TOPIC)
  • Weighted Boolean and P-Norm (INQUERY)
  • Recent logic models

45 min

  • Vector space and geometric models
  • Basic vector similarity measures
  • Generalized vector space model
  • Latent Semantic Indexing
  • Pivoted normalization similarity

45 min

  • Probabilistic models
  • Probabilistic indexing and querying
  • 2-Poisson and OKAPI
  • Relevance weights and relevance feedback
  • Inference nets and neural network approaches
  • Regression models

15 min

  • Performance measurement and analysis
  • Recall, precision, fallout measures
  • Limitations to performance assessment: interjudge consistency, completeness
  • Statistical significance tests

Materials

110 course overheads will be provided, as well as a 4-page bibliography of the references covered.

Audience

This course is designed to provide a fast-paced yet rigorous introduction to the basic models of Information Retrieval for computer scientists in academic and industrial research and development whose background lies outside the Information Retrieval area.

Biography of presenter

Fredric Gey's research specializes in probabilistic document retrieval using logistic regression techniques. He directs the UC Berkeley entries to the TREC conferences and will be the General Chairman for SIGIR'99, to be held at the University of California, Berkeley during the summer of 1999. He holds a PhD in Information Science from UC Berkeley.

Cost

The charge for registration is $A150 per tutorial. Registrants will receive a copy of the notes for the tutorial, and morning/afternoon tea. All tutorials are offered on an only-if-demand-warrants basis, and full refunds will be given for tutorials that are cancelled because of low enrolments. Tutorial notes will also be available for sale on an individual basis at the conference registration desk.

sigir98@cs.mu.oz.au,
27 April 1998.