Enhanced Word-Based Block-Sorting Text Compression


R. Yugo Kartono Isal
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alwin C. H. Ngai
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 25th Australasian Computer Science Conference, Melbourne, January 2002, pages 129-138.

Abstract

The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing several further recency rank transformations, and considering also the role of the entropy coder. By combining the best of the new recency transformations with an entropy coder that conditions ranks upon gross characteristics of previous ones, we are able to obtain improved compression on typical text files.