On Identifying Phrases Using Collection Statistics
Simon Gog
Institute of Theoretical Informatics,
Karlsruhe Institute of Technology, Germany;
and
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Alistair Moffat
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Matthias Petri
Department of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
"On Identifying Phrases Using Collection Statistics",
Gog, Petri, Moffat,
Proc. 37th European Conf. Information Retrieval,
Vienna, April 2015, pages 278-283.
Abstract
The use of phrases as part of similarity computations can enhance
search effectiveness.
But the gain comes at a cost, either in terms of index size, if all
word-tuples are treated as queryable objects; or in terms of
processing time, if postings lists for phrases are constructed at
query time.
There is also a lack of clarity as to which phrases are
"interesting", in the sense of capturing useful information.
Here we explore several techniques for recognizing phrases using
statistics of large-scale collections, and evaluate their quality.
Full text
http://dx.doi.org/10.1007/978-3-319-16354-3_30