ANS-Based Index Compression
Alistair Moffat
School of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Matthias Petri
School of Computing and Information Systems,
The University of Melbourne,
Victoria 3010, Australia.
Status
Proc. 26th ACM CIKM Int. Conf. on Information and
Knowledge Management
Singapore, November 2017,
to appear.
Abstract
Techniques for effectively representing the postings lists associated
with inverted indexes have been studied for many years.
Here we combine the recently developed "asymmetric numeral systems"
(ANS) approach to entropy coding and a range of previous index
compression methods, including VByte, Simple, and Packed.
The ANS mechanism allows each of them to provide markedly improved
compression effectiveness, at the cost of slower decoding rates.
Using the 426 GOV2 collection, we show that the
combination of blocking and ANS-based entropy-coding against a set
of 16 magnitude-based probability models yields compression
effectiveness superior to most previous mechanisms, while still
providing reasonable decoding speed.
Full text
http://doi.acm.org/10.1145/3132847.3132888.
(Author version (PDF)).