This paper describes work on the block sorting algorithm, especially establishing its relation to other compressors and attempting to improve its compression performance. It is already known to be a form of statistical compressor with unbounded contexts; we show that the contexts are so completely restructured by the sorting that many standard file compression techniques are no longer appropriate. Various approaches are investigated in an attempt to improve the compression, most of which involve a hierarchy of coding models to allow fast adaptation to local contexts. The better coding techniques include one derived from work of Shannon in 1951 in establishing the entropy of English text, while the best employs a novel model especially designed for skewed probability distributions.
The work in this paper confirms block-sorting as a viable text compression technique, with a compression approaching that of the currently best compressors while being much faster than many other compressors of comparable performance.