W3C HTTP Performance

Simple Test of Compressing HTML Using ZLib

These are some very simple results showing how the letter case of HTML markup affects how well the zlib compression algorithm compresses a page. The test was done on the microscape top page, which was part of the test performed in the paper "Network Performance Effects of HTTP/1.1, CSS1, and PNG".

We changed the case of the HTML tag names and of their attribute names, for example "align" and "width". Where the value on the right-hand side of a name=value attribute was not in quotation marks, that value was changed as well.
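The transformation described above can be sketched in Python. This is a minimal illustration, not the script used for the test: the function name `force_case` and both regular expressions are assumptions made here. It changes tag names, attribute names, and unquoted attribute values, and leaves quoted values and the text between tags alone. It does not special-case comments or script content.

```python
import re

# A start or end tag: "<", optional "/", a tag name, then attribute text in
# which quoted strings may contain ">". (Hypothetical pattern for this sketch.)
TAG_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)((?:[^>\"']|\"[^\"]*\"|'[^']*')*)>")

# Splits attribute text into quoted strings and everything else.
PART_RE = re.compile(r"\"[^\"]*\"|'[^']*'|[^\"']+")


def force_case(html: str, upper: bool = False) -> str:
    """Force tag names, attribute names and unquoted values to one case."""
    convert = str.upper if upper else str.lower

    def fix_part(match):
        part = match.group(0)
        # Quoted attribute values are kept exactly as written.
        if part[0] in "\"'":
            return part
        return convert(part)

    def fix_tag(match):
        slash, name, attrs = match.groups()
        return "<" + slash + convert(name) + PART_RE.sub(fix_part, attrs) + ">"

    return TAG_RE.sub(fix_tag, html)
```

For example, `force_case('<IMG SRC="Logo.GIF" Align=LEFT>')` gives `<img src="Logo.GIF" align=left>`: the quoted file name keeps its case, while the unquoted value `LEFT` is lowercased along with the names.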

The original size of the microscape top page is 42919 bytes, and none of the case alterations changes this.

Compression Results
                                 Default Compression     Compressed for Speed    Compressed for Size
                                 Deflated Size  Factor   Deflated Size  Factor   Deflated Size  Factor
Lowercase Only                   11468          0.27     13332          0.31     11440          0.27
Uppercase Only                   11579          0.27     13416          0.31     11556          0.27
Alternating Upper and Lowercase  12203          0.28     14145          0.33     12190          0.28
Binary Mixed Casing              15203          0.35     17822          0.42     15199          0.35
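The Factor values are consistent with the deflated size divided by the original size of 42919, rounded to two decimals. The sketch below shows that arithmetic and how the three settings could be reproduced with Python's `zlib` module. The mapping of "Default", "Speed" and "Size" to `Z_DEFAULT_COMPRESSION`, `Z_BEST_SPEED` and `Z_BEST_COMPRESSION` is an assumption, and the names `factor` and `deflate_report` are illustrative. Sizes from a current zlib build need not match the figures reported here byte for byte.

```python
import zlib

# Assumed mapping from the table's column names to zlib levels.
LEVELS = {
    "default": zlib.Z_DEFAULT_COMPRESSION,
    "speed": zlib.Z_BEST_SPEED,
    "size": zlib.Z_BEST_COMPRESSION,
}


def factor(deflated_size: int, original_size: int) -> float:
    """Compression factor: deflated size over original size, two decimals."""
    return round(deflated_size / original_size, 2)


def deflate_report(data: bytes) -> dict:
    """Return {setting: (deflated size in bytes, factor)} for each level."""
    report = {}
    for name, level in LEVELS.items():
        deflated_size = len(zlib.compress(data, level))
        report[name] = (deflated_size, factor(deflated_size, len(data)))
    return report
```

Checking one table cell by hand: `factor(11468, 42919)` is 0.27, which matches the lowercase entry under default compression.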

Summary

From this very small test, it looks like lowercase canonicalization of HTML tags gives the best compression. This is not surprising, as most of the actual text in the document is lowercase, and hence the probability that lowercase HTML tag names can be reused in the dictionary is higher than with uppercase tags. Optimizing the compression algorithm for size does not have a significant impact compared to the default compression: it saves at most 28 bytes (11440 versus 11468 for lowercase).

Take this result with a grain of salt: much more testing is needed to get solid data, but we believe that the figures above at least show a trend.

Other Things to Test

Things that we haven't done but would be of interest:

Tools

htmlCase.pl
A Perl script written by Eric Prud'hommeaux that filters HTML files, forcing a specific case on HTML tags, reading from stdin and writing to stdout.
deflate
A simple C program that calls the zlib library to deflate from stdin to stdout.
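As a rough Python counterpart to the deflate tool, a stream-to-stream filter might look like the following. This is a sketch under assumptions, not a port of the C program: the name `deflate_stream` and the chunk size are made up for illustration, and the output uses the zlib container format that `zlib.compressobj` produces by default, which may differ from what the original tool wrote.

```python
import zlib


def deflate_stream(src, dst, level=zlib.Z_DEFAULT_COMPRESSION, chunk_size=8192):
    """Deflate everything readable from src into dst; return bytes written.

    src and dst are binary file-like objects (hypothetical interface).
    """
    compressor = zlib.compressobj(level)
    written = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        out = compressor.compress(chunk)
        dst.write(out)
        written += len(out)
    # flush() emits whatever the compressor is still buffering.
    tail = compressor.flush()
    dst.write(tail)
    return written + len(tail)
```

To use it as a stdin-to-stdout filter, one would call `deflate_stream(sys.stdin.buffer, sys.stdout.buffer)`. The returned byte count corresponds to a "Deflated Size" figure for that input.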


Henrik Frystyk Nielsen and Eric Prud'hommeaux,
@(#) $Id: HTMLCanon.html,v 1.3 1997/08/09 17:55:48 fillault Exp $