These are some very simple results on how the letter case of HTML markup affects the zlib compression algorithm. The test was done on the Microscape top page, which was part of the test performed in the paper "Network Performance Effects of HTTP/1.1, CSS1, and PNG".
We changed the case of the HTML tag names and of their attribute names, for example "align" and "width". Where the right-hand side of an attribute's name-value pair was not enclosed in quotation marks, that value was also changed.
The original size of the Microscape top page is 42919 bytes, and none of the case alterations changes this.
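The case changes described above can be sketched roughly as follows. This is only an illustration, not the tool used for the test: the names `recase_tags` and `alternate` are hypothetical, the regular expressions are a simplification rather than a real HTML parser, and per-character alternation is just one assumed reading of "alternating upper and lowercase".

```python
import re

# Simplified patterns (assumption: tags are reasonably well formed).
TAG = re.compile(r"""<(/?)([A-Za-z][A-Za-z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")
ATTR = re.compile(r"""([A-Za-z_:][-A-Za-z0-9_:.]*)(?:(\s*=\s*)("[^"]*"|'[^']*'|[^\s"'>]+))?""")

def recase_tags(html, convert):
    """Apply `convert` to tag names, attribute names, and unquoted values."""
    def fix_attr(m):
        name, eq, value = m.group(1), m.group(2), m.group(3)
        if value is None:                 # attribute without a value
            return convert(name)
        if value[0] not in "\"'":         # unquoted value: change it too
            value = convert(value)
        return convert(name) + eq + value

    def fix_tag(m):
        slash, name, rest = m.groups()
        return "<" + slash + convert(name) + ATTR.sub(fix_attr, rest) + ">"

    return TAG.sub(fix_tag, html)

def alternate(s):
    """Assumed interpretation: alternate case character by character."""
    return "".join(c.upper() if i % 2 == 0 else c.lower() for i, c in enumerate(s))

sample = '<TD Align=Center WIDTH="100%">Hi</td>'
print(recase_tags(sample, str.lower))
print(recase_tags(sample, str.upper))
print(recase_tags(sample, alternate))
```

Text between tags and quoted attribute values are left untouched, so the document size stays the same under every variant.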
Casing | Default Compression: Deflated Size (bytes) | Default Compression: Factor | Compressed for Speed: Deflated Size (bytes) | Compressed for Speed: Factor | Compressed for Size: Deflated Size (bytes) | Compressed for Size: Factor
---|---|---|---|---|---|---
Lowercase Only | 11468 | 0.27 | 13332 | 0.31 | 11440 | 0.27
Uppercase Only | 11579 | 0.27 | 13416 | 0.31 | 11556 | 0.27
Alternating Upper and Lowercase | 12203 | 0.28 | 14145 | 0.33 | 12190 | 0.28
Binary Mixed Casing | 15203 | 0.35 | 17822 | 0.42 | 15199 | 0.35
From this very small test, it looks like lowercase canonicalization of HTML tags gives the best compression. This is not surprising: most of the actual text in the document is lowercase, so lowercase HTML tag names are more likely than uppercase ones to match strings already in the dictionary. Optimizing the compression algorithm for size does not have a significant impact compared to the default compression (11440 versus 11468 bytes for the lowercase page).
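A measurement of this kind can be reproduced along the following lines. This is a minimal sketch using Python's `zlib` module; the helper name `deflate_report` is hypothetical, and mapping "compressed for speed" and "compressed for size" to `Z_BEST_SPEED` and `Z_BEST_COMPRESSION` is an assumption about how the original test was configured. `zlib.compress` also adds a small zlib header and checksum, so absolute sizes need not match the table.

```python
import zlib

# Assumed mapping of the three table columns to zlib compression levels.
LEVELS = {
    "default": zlib.Z_DEFAULT_COMPRESSION,
    "speed": zlib.Z_BEST_SPEED,
    "size": zlib.Z_BEST_COMPRESSION,
}

def deflate_report(data):
    """Return {label: (deflated size in bytes, deflated/original factor)}."""
    report = {}
    for label, level in LEVELS.items():
        size = len(zlib.compress(data, level))
        report[label] = (size, size / len(data))
    return report

# Tiny synthetic page (not the page used in the test): lowercase body text
# with the same markup in lowercase and in uppercase.
body = b"some lowercase text about network performance and compression\n"
lower = (b"<p align=center>" + body + b"</p>\n") * 50
upper = (b"<P ALIGN=CENTER>" + body + b"</P>\n") * 50

for name, page in (("lowercase", lower), ("uppercase", upper)):
    for label, (size, factor) in deflate_report(page).items():
        print(f"{name:9s} {label:7s} {size:6d} bytes  factor {factor:.2f}")
```

Running the same function over each case variant of a real page gives one table row per variant.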
Take this result with a grain of salt, as much more testing is needed to get more solid data, but we believe that we can at least see a trend in the figures obtained above.
Things that we haven't done but would be of interest: