Effective Compression for the Web: Exploiting Document Linkages


Raymond Wan
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.

Alistair Moffat
Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia.


Status

Proc. 12th Australasian Database Conference, Gold Coast, Australia, February 2001, 68-75.

Abstract

Providing the infrastructure that supports the World-Wide Web is expensive. The costs incurred in setting up a web site include those associated with the content being served; those associated with the hardware necessary to support the site; and the network costs incurred in transmitting that content to the end consumers. In this work we examine mechanisms for compressing web content so as to reduce the third of these three costs, and describe a scheme that exploits the known connectivities between web pages to derive improved transmission cost savings compared to the obvious approach of simply compressing each page on the site using a standard tool such as GZip. Experiments on a medium-sized web site confirm our claims that considerable reductions in network bandwidth requirements can be achieved.
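
As a rough illustration of the contrast drawn in the abstract, the sketch below compares compressing a page on its own with compressing it relative to a page that links to it. This is not the scheme described in the paper, whose details are not given here. It assumes one simple stand-in technique: using the linking page as a preset dictionary for zlib's DEFLATE. The page contents and the helper names (compress_alone, compress_with_context, decompress_with_context) are hypothetical.

```python
import zlib

# Hypothetical shared markup that two pages on the same site might carry.
TEMPLATE = (
    b"<html><head><title>Example Site</title>"
    b"<link rel='stylesheet' href='/style/main.css'></head><body>"
    b"<div id='nav'><a href='/index.html'>Home</a> | "
    b"<a href='/research.html'>Research</a> | "
    b"<a href='/people.html'>People</a> | "
    b"<a href='/contact.html'>Contact</a></div>"
)

# Hypothetical pages: index_page links to research_page.
index_page = TEMPLATE + b"<p>Welcome to the group home page.</p></body></html>"
research_page = TEMPLATE + b"<p>Our projects on text compression.</p></body></html>"


def compress_alone(page: bytes) -> bytes:
    """Baseline: compress a single page with no outside context."""
    return zlib.compress(page, 9)


def compress_with_context(page: bytes, linked_from: bytes) -> bytes:
    """Assumed technique: prime DEFLATE with the page that links here."""
    compressor = zlib.compressobj(level=9, zdict=linked_from)
    return compressor.compress(page) + compressor.flush()


def decompress_with_context(data: bytes, linked_from: bytes) -> bytes:
    """The receiver must already hold the same linking page to decode."""
    decompressor = zlib.decompressobj(zdict=linked_from)
    return decompressor.decompress(data) + decompressor.flush()


alone = compress_alone(research_page)
primed = compress_with_context(research_page, index_page)
restored = decompress_with_context(primed, index_page)

print(len(research_page), len(alone), len(primed))
```

The sketch relies on the client already holding the linking page when it follows the link, so markup shared between the two pages need not be sent again.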