Efficient Extended Boolean Retrieval
Stefan Pohl
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Justin Zobel
Department of Computer Science and Software Engineering,
The University of Melbourne,
Victoria 3010, Australia.
Status
IEEE Trans. Knowledge and Data Engineering,
24(6):1014-1024, June 2012,
published online 10 March 2011.
Abstract
Extended Boolean retrieval (EBR) models were proposed nearly three
decades ago, but have had little practical impact, despite their
significant advantages compared to either ranked keyword or pure
Boolean retrieval.
In particular, EBR models produce meaningful rankings; their query
model allows the representation of complex concepts in an and--or
format; and they are scrutable, in that the score assigned to a
document depends solely on the content of that document, unaffected
by any collection statistics or other external factors.
These characteristics make EBR models attractive in domains typified
by medical and legal searching, where the emphasis is on iterative
development of reproducible complex queries of dozens or even
hundreds of terms.
However, EBR is much more computationally expensive than the
alternatives.
We consider the implementation of the p-norm approach to EBR,
and demonstrate that ideas used in the maxscore and
wand exact optimization techniques for ranked keyword retrieval
can be adapted to allow selective bypass of documents via a low-cost
screening process for this and similar retrieval models.
We also propose term-independent bounds that are able to further
reduce the number of score calculations for short, simple queries
under the extended Boolean retrieval model.
Together, these methods yield an overall saving of 50% to 80% of
the evaluation cost on test queries drawn from biomedical search.
Full text
http://dx.doi.org/10.1109/TKDE.2011.63