The tutorial will start with basic research concepts and their application in IR evaluation. Approaches adopted in various classic retrieval experiments will be presented, and their limitations will be discussed. More recent evaluation studies conducted at Oregon Health Sciences University, City University London, and Rutgers University, as well as in TREC, will be used to illustrate efforts towards more user-centered evaluation. The final discussion will summarize the issues and consider future directions for accommodating both system- and user-oriented evaluation in IR.
An outline of the tutorial is as follows:
1. Overview of basic research concepts (30 min)
a. The operationalization of research questions
b. Experimental design
c. Truth vs. error - causes of error: bias vs. chance
d. Types of bias - selection, measurement, confounding
e. Chance - role of statistics
f. Validity - internal, external
2. Approaches to IR evaluation (30 min)
a. Laboratory vs. operational experiments
b. Batch mode vs. interactive searching
c. Black-box vs. diagnostic analysis
d. Test collections
e. Introduction to relevance
f. Recall and precision measures
3. Limitations of current approaches (30 min)
a. External validity of IR evaluations
b. Problems with batch evaluation
c. Limitations of recall and precision
d. Relevance - topical vs. situational
e. Alternatives to recall and precision
BREAK - (30 min)
4. IR evaluation in medical settings at Oregon Health Sciences University (25 min)
a. Some specific problems and motivations in medical IR - language, access speed
b. Review of end-user IR studies in medical settings
c. Development of new approaches to evaluation in medical settings
5. Evaluation experiments from other institutions (25 min)
a. The Okapi experiments: an interactive IR evaluation facility at City University London
b. User-oriented evaluation studies from Rutgers University
6. The interactive evaluation at TREC (25 min)
a. Dilemmas in experimental design
b. Description of experiments and results from TREC-6
c. Future plans
7. Discussion and future directions for IR evaluation (25 min)
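The recall and precision measures listed in item 2f can be illustrated with a short sketch. It assumes a single query, a ranked result list of unique document identifiers, and a set of documents judged relevant, as a test collection would supply. The function name and document ids are hypothetical. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Precision and recall over the top-k documents of one ranked list.

    ranked_ids: document ids in the order the system returned them
                (assumed unique).
    relevant_ids: set of ids judged relevant for the query
                  (hypothetical relevance judgments).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Treat the top-k documents as the retrieved set.
    retrieved = ranked_ids[:k]
    # Count retrieved documents that were judged relevant.
    hits = sum(1 for doc in retrieved if doc in relevant_ids)
    # Precision: relevant retrieved / all retrieved.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: relevant retrieved / all relevant.
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall
```

For the ranking d3, d7, d1, d9, d4 with relevant set {d1, d3, d8}, the top 5 documents give precision 0.4 and recall of about 0.67. Cutting off at the top 2 gives precision 0.5 and recall of about 0.33. This shows the usual trade-off between the two measures as the cutoff changes.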