Efficient and Effective Higher Order Proximity Modeling
Xiaolu Lu
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Shane Culpepper
School of Computer Science and Information Technology,
RMIT University,
Victoria 3001, Australia.
Status
Proc. ACM Int. Conf. on Theory of Information Retrieval,
Newark, Delaware, September 2016, pages 21-30.
Abstract
Bag-of-words retrieval models are widely used, and provide a robust
trade-off between efficiency and effectiveness.
These models often make simplifying assumptions about relations
between query terms, and treat term statistics independently.
However, query terms are rarely independent, and previous work has
repeatedly shown that term dependencies can be critical to improving
the effectiveness of ranked retrieval results.
Among all term-dependency models, the Markov Random Field (MRF)
model [Metzler and Croft, SIGIR, 2005] has received the most
attention in recent years.
Despite clear effectiveness improvements, these models are not
deployed in performance-critical applications because of the
potentially high computational costs.
As a result, bigram models are generally considered to be the best
compromise between full term dependence and term-independent models
such as BM25.
Here we provide further evidence that term-dependency features not
captured by bag-of-words models can reliably improve retrieval
effectiveness.
We also present a new variation on the highly effective MRF model
that relies on a BM25-derived potential.
The benefit of this approach is that it is built from feature
functions which require no higher-order global statistics.
We empirically show that our new model reduces retrieval costs by up
to 60%, with no loss in effectiveness compared to previous
approaches.
Full text
http://dx.doi.org/10.1145/2970398.2970404