On Identifying Phrases Using Collection Statistics


Simon Gog
Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Germany; and Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.

Matthias Petri
Department of Computing and Information Systems, The University of Melbourne, Victoria 3010, Australia.


Status

"On Identifying Phrases Using Collection Statistics", Gog, Petri, Moffat, Proc. 37th European Conf. Information Retrieval, Vienna, April 2015, pages 278-283.

Abstract

The use of phrases as part of similarity computations can enhance search effectiveness. But the gain comes at a cost, either in terms of index size, if all word-tuples are treated as queryable objects; or in terms of processing time, if postings lists for phrases are constructed at query time. There is also a lack of clarity as to which phrases are "interesting", in the sense of capturing useful information. Here we explore several techniques for recognizing phrases using statistics of large-scale collections, and evaluate their quality.

Full text

http://dx.doi.org/10.1007/978-3-319-16354-3_30