This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by it. A list of current NOTEs can be found at: /TR/
Since NOTEs are subject to frequent change, you are advised to reference the above URL, rather than the URLs for NOTEs themselves. The results contained in this note are preliminary - as we perform further experiments, the note will continue to evolve. When this work is complete and the results are considered by us to be "final", the status of this Note will be updated to reflect its completion. In particular, further experimentation with range requests is planned soon. Please check back again later for further results.
The results here are provided for community interest, though they have not been rigorously validated and should not alone be used to make commercial decisions. In addition, the exact results are obviously a function of the tests performed; your mileage will vary.
Submitted to ACM SIGCOMM '97.
We describe our investigation of the effect of persistent connections, pipelining and link level document compression on our client and server HTTP implementations. A simple test setup is used to verify HTTP/1.1's design and understand HTTP/1.1 implementation strategies. We present TCP and real time performance data between the libwww robot and both the Jigsaw and Apache HTTP servers using HTTP/1.0, HTTP/1.1 with persistent connections, HTTP/1.1 with pipelined requests, and HTTP/1.1 with pipelined requests and deflate data compression [22]. We also investigate whether the TCP Nagle algorithm has an effect on HTTP/1.1 performance. While somewhat artificial and possibly overstating the benefits of HTTP/1.1, we believe the tests and results approximate some common behavior seen in browsers. The results confirm that HTTP/1.1 is meeting its major design goals. Our experience has been that implementation details are very important to achieve all of the benefits of HTTP/1.1.
For all our tests, a pipelined HTTP/1.1 implementation outperformed HTTP/1.0, even when the HTTP/1.0 implementation used multiple connections in parallel, under all network environments tested. The savings were at least a factor of two, and sometimes as much as a factor of ten, in terms of packets transmitted. Elapsed time improvement is less dramatic, and strongly depends on your network connection.
Note that the savings in network traffic and performance shown in this document are solely due to the effects of pipelining, persistent connections, and transport compression. Some data is presented showing the further savings made possible by the use of CSS1 style sheets [10] and the more compact PNG [20] image representation, both enabled by recent recommendations at higher levels than the base protocol. Time did not allow full end-to-end data collection on these cases. The results show that HTTP/1.1 and changes in Web content will have dramatic effects on Internet and Web performance as HTTP/1.1 and related technologies deploy over the near future. Universal use of style sheets, even without deployment of HTTP/1.1, would cause a very significant reduction in network traffic.
This paper does not investigate further performance and network savings enabled by the improved caching facilities provided by the HTTP/1.1 protocol, or by sophisticated use of range requests.
The intent of this paper is to present some of the thought processes that we used to test and optimize our implementations in the hopes it may guide others through their own implementation efforts, rather than just present final polished results, which would not serve as guidance to others.
HTTP/1.1 [4] is an upward compatible protocol to HTTP/1.0 [3]. Both HTTP/1.0 and HTTP/1.1 use the TCP protocol [12] for data transport. The effects of HTTP/1.0's use of TCP on the Internet have resulted in major problems caused by congestion and unnecessary overhead [6].
Major HTTP/1.1 goals include:
HTTP/1.1 includes a number of new elements that together should have a major effect on Internet traffic. These include:
HTTP must become a good network citizen to overcome the current Internet congestion problems. The current "World Wide Wait" can only be solved if both HTTP and the content are significantly changed to reduce unneeded overhead. If end-user performance does not improve, it is unlikely that HTTP/1.1 will be deployed; improved end-user performance is therefore vital to its success.
Protocol elements often interact and have multiple uses; for example, range requests may be very useful to retrieve the remainder of cached images after a communications failure or user-interrupted transfer, avoiding retransmission of data already successfully transferred. They may also be used to avoid excessive serialization of requests behind a large transfer (see the range requests and validation section). Finally, as an example of another use of range requests, the directory of an Adobe PDF document might be retrieved from the end of a document.
HTTP/1.1 does not attempt to solve some commonly seen problems, such as hot spots, or "flash crowds" at popular web sites, but will at least help with these problems.
Simultaneously with the deployment of HTTP/1.1, the Web will see the deployment of style sheets and new image and animation formats, which will also change the nature of the content that HTTP/1.1 will transport. This paper presents measured results of the consequences of HTTP/1.1 protocol additions, and some computed data on the effects that the deployment of new content is likely to have in addition, as well as some speculations on how these changes to the Web will affect Internet behavior.
To test the effects of some of these changes, we took data on two tests that simulate contrasting behavior of clients: visiting a site for the first time, where nothing is in a client cache, and revalidating cached items when a site is revisited. We do so in three network environments we believe span common situations of web use: a local Ethernet, a wide area network connected to a local Ethernet, and a dialup PPP connection using a 28.8Kbaud modem.
While both the first-time and revalidation tests likely simulate common client behavior seen on the web, we have no idea which is more common, and the performance of HTTP/1.1 will likely change client and user behavior to such a large degree that it is impossible to extrapolate from these tests any numeric results on the Internet.
A number of analyses of HTTP/1.0 and proposals influenced HTTP/1.1's design and the work described in this paper.
We synthesized a test web site by combining data (HTML and GIF image data) from two very heavily used home pages (Netscape and Microsoft) into one, hereafter called "Microscape". The initial layout of the Microscape web site was a single page containing typical HTML totalling 42K with 41 GIF inlined images totalling 125K.
The first time test is equivalent to a browser visiting a site for the first time, e.g. its cache is empty and it has to retrieve the top page and all the embedded objects. In HTTP, this is equivalent to 42 GET requests.
This test is equivalent to revisiting a home page where the contents are already available in a local cache. The initial page and all embedded objects are validated, resulting in no actual transfer of the HTML or the embedded objects. In HTTP, this is equivalent to 42 Conditional GET requests. HTTP/1.1 supports two mechanisms for cache validation: entity tags (etags) and date stamps, whereas HTTP/1.0 only supports the latter. This test is roughly equivalent to pressing "reload" on a browser, and, depending on configuration, what a browser may do when revisiting a site which has not changed.
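As a concrete illustration of what such a revalidation request looks like on the wire (the URL, entity tag, and date below are invented placeholders, not values from the test data), a conditional GET carries only headers; if the cached copy is still valid, the server answers "304 Not Modified" with no body:

```c
/* Illustration only: URL, ETag and date are invented placeholders.
 * A conditional GET sends the cached copy's validators; if the resource
 * is unchanged, the server replies "304 Not Modified" with no body, so
 * only headers cross the wire. */
static const char conditional_get[] =
    "GET /images/logo.gif HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "If-None-Match: \"etag-123\"\r\n"                      /* opaque validator (HTTP/1.1)   */
    "If-Modified-Since: Fri, 07 Mar 1997 12:00:00 GMT\r\n" /* date validator (HTTP/1.0/1.1) */
    "\r\n";
```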
HTTP/1.0 support was provided by an old version of libwww (version 4.1D) which supported plain HTTP/1.0 with multiple simultaneous connections between two peers and no persistent cache. In this case we simulated the cache validation behavior by issuing HEAD requests on the images instead of Conditional GET requests. The profile of the HTTP/1.0 revalidation requests therefore was a total of 42 associated with the top page, with one GET (HTML) and 41 HEAD requests (images), in the initial tests. The HTTP/1.1 implementation of libwww (version 5.1) differs from the HTTP/1.0 implementation. It uses a full HTTP/1.1 compliant persistent cache generating 42 Conditional GET requests with appropriate cache validation headers to make the test more similar to likely browser behavior. Therefore the numbers of packets reported below for HTTP/1.0 are higher than those of the correct cache validation data reported for HTTP/1.1. We believe cache validation is a very common operation, and will become even more common, given HTTP/1.1's dramatic improvement in semantics and performance in this area.
In order to measure the performance in commonly used different network environments found in today's Internet, we used the following three combinations of bandwidth and latency:
| Environment | Connection |
|---|---|
| High bandwidth, low latency | LAN - 10Mbit Ethernet |
| High bandwidth, high latency | WAN - Massachusetts (MIT/LCS) to California (LBL) |
| Low bandwidth, high latency | PPP - 28.8k modem line using a telephone switch simulator |
Several platforms were used in the initial stage of the experiments for running the HTTP servers. However, we ended up using relatively fast machines to try to prevent unforeseen bottlenecks in the applications used. Jigsaw is written entirely in Java and relies on specific network features for controlling TCP provided only by JDK 1.1.
| Component | Type and Version |
|---|---|
| Server Hardware | www26.w3.org (Sparc Ultra-1, Solaris 2.5) |
| LAN Client Hardware | zorch.w3.org (Digital AlphaStation 400 4/233, Digital UNIX 4.0a) |
| WAN Client Hardware | turn.ee.lbl.gov (Digital AlphaStation 3000, Digital UNIX 4.0) |
| PPP Client Hardware | big.w3.org (Pentium Pro PC, Windows NT Server 4.0) |
| HTTP Server Software | Jigsaw 1.05 and Apache 1.2b7 + patches |
| HTTP Client Software | libwww robot and Netscape Communicator 4.0 beta 1 on Windows NT |
None of the machines were under significant load while the tests were run. The server is identical throughout our final tests - only the client changes connectivity and behavior. Both Jigsaw and libwww are currently available with HTTP/1.1 implementations, though without support for the features described in this paper, and Apache is in beta release. During the experiments changes were made to all three applications. These changes will be made available through the normal release procedures for each of the applications.
In order to get tcpdumps of the PPP packets over a modem line, we had to route the packets through a UNIX system where we could obtain tcpdumps. We set up a Linux system with both a PPP interface and an Ethernet interface and connected the Windows NT system to the network interface and the PPP interface to the telephone system and changed the routes accordingly. Due to wet weather we constantly had problems with the public telephone lines. We therefore set up our own four port telephone switch simulator for handling PPP data internally.
The HTTP/1.0 robot was set to use plain HTTP/1.0 requests with one TCP connection per request. We set the maximum number of simultaneous connections to 6 (by default, existing HTTP/1.0 applications like Netscape Navigator often use 4 simultaneous connections between client and server; many users, however, raise this to a larger number). Using 6 instead of 4 gives parallel connections an edge (at least on higher speed networks) over the default behavior exhibited by many browsers.
After testing HTTP/1.0, we then ran the robot as a simple HTTP/1.1 client using persistent connections. That is, the request / response sequence looks identical to HTTP/1.0 but all communication happens on the same TCP connection instead of 6, hence serializing all requests. The results, as seen in the table below, were a significant saving in TCP packets using HTTP/1.1, but at the cost of increased elapsed time.
As a means to lower the elapsed time and improve the efficiency, we introduced pipelining into libwww. That is, instead of waiting for a response to arrive before issuing new requests, as many requests as possible are issued at once. The responses are still serialized and no changes were made to the HTTP messages; only the timing has changed, as the robot has multiple outstanding requests on the same connection, as illustrated in the figure below:
The robot generates quite small HTTP requests - our library implementation is very careful not to generate unnecessary headers and not to waste bytes on white space. The result is an average request size of around 190 bytes, which is smaller than many existing HTTP implementations.
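As a rough illustration (this is not the libwww code; the host and paths are placeholders), a pipelined client can issue several such small requests without waiting for responses, appending them to one output buffer so that, as described next, multiple requests can share TCP segments:

```c
/* Sketch only, not the libwww implementation: several small GET requests
 * are appended to one buffer and sent together, so they can share TCP
 * segments; the responses are then read back in order. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static size_t add_request(char *buf, size_t off, size_t cap, const char *path)
{
    int n = snprintf(buf + off, cap - off,
                     "GET %s HTTP/1.1\r\n"
                     "Host: www.example.com\r\n"   /* placeholder host */
                     "\r\n", path);
    return off + (n > 0 ? (size_t)n : 0);
}

void send_pipelined(int fd)
{
    char   buf[4096];
    size_t len = 0;

    len = add_request(buf, len, sizeof buf, "/");           /* the HTML page  */
    len = add_request(buf, len, sizeof buf, "/img/a.gif");  /* inlined images */
    len = add_request(buf, len, sizeof buf, "/img/b.gif");

    write(fd, buf, len);   /* all three requests leave in as few segments
                              as TCP needs, rather than one per request  */
    /* ... read and parse the three responses, in order ... */
}
```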
The requests are buffered before transmission so that multiple HTTP requests can be sent with the same TCP segment. This has a significant impact on the number of packets required to transmit the payload and lowers CPU usage by both client and server. A consequence of the output buffering is that we need a mechanism to flush the output buffer. First we implemented a version with two mechanisms:
| Raw tcpdumps | |||
| Max simultaneous sockets | |||
| Total number of sockets used | |||
| Packets from client to server | |||
| Packets from server to client | |||
| Total number of packets | |||
| Total elapsed time [secs] |
We were simultaneously very happy and quite disappointed with the initial results above, taken late at night on a quiet Ethernet. Elapsed time performance of HTTP/1.1 with pipelining was worse than HTTP/1.0 in this initial implementation, though the number of packets used was dramatically better. We scratched our heads for a day, then convinced ourselves that on a local Ethernet, there was no reason that HTTP/1.1 should ever perform more slowly than HTTP/1.0 (since the network overhead is so much lower and the local Ethernet cannot suffer from fairness problems that might give multiple connections a performance edge in a long haul network), so we dug into our implementation further.
After study, we realized that the application (the robot) has much more knowledge about the requests than libwww, and by introducing an explicit flush mechanism in the application, we could get significantly better performance. We modified the robot to force a flush after issuing the first request on the HTML document and then buffer the following requests on the inlined images. While an HTTP library can be arranged to flush its buffers automatically after some timeout (when the data above was taken, the timeout was set to 1 second), taking advantage of knowledge in the application can result in a considerably faster implementation than relying on such a timeout. The final results show HTTP/1.1 elapsed time performance significantly faster than HTTP/1.0 on even a local network.
Based on the experience of one of the authors, we expected that a pipelined implementation of HTTP might encounter the Nagle algorithm [2][5] in TCP. The Nagle algorithm was introduced in TCP as a means of reducing the number of small TCP segments by delaying their transmission in hopes of further data becoming available, as commonly occurs in telnet or rlogin traffic. As our implementation can generate data asynchronously without waiting for a response, the Nagle algorithm could be a bottleneck. In order to test this we turned the Nagle algorithm off in both the client and the server. This was the first change to the server - all other changes were made in the client. In our initial tests, we did not observe significant problems introduced by Nagle's algorithm, though with hindsight, this was the result of our pipelined implementation and the specific test cases chosen, since with effective buffering, the segment sizes are large, avoiding Nagle's algorithm. In later experiments in which the buffering behavior of the implementations was changed, we did observe significant (sometimes dramatic) transmission delays due to Nagle; we recommend therefore that HTTP/1.1 implementations that buffer output disable Nagle's algorithm (set the TCP_NODELAY socket option). This confirms the experiences of Touch [7].
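On systems with BSD-style sockets, disabling Nagle's algorithm is a single socket option; a minimal sketch:

```c
/* Disable the Nagle algorithm (set TCP_NODELAY) on a connected socket so
 * that buffers the application has already coalesced and flushed are not
 * held back by TCP waiting for outstanding ACKs. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

int disable_nagle(int fd)
{
    int on = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &on, sizeof on);
}
```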
We then improved Jigsaw's output buffering behavior. For each connection, it maintains a response buffer that it flushes either when full, when there are no more requests coming in on that connection, or before it goes idle. This allows aggregating responses (for example, cache validation responses) into fewer packets even on a high-speed network, saving CPU time for the server.
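The policy can be sketched as follows (this is our own C rendering of the behavior just described, not Jigsaw's Java code; the buffer size and names are illustrative):

```c
/* Sketch of the response buffering policy described above: aggregate
 * responses and flush when the buffer fills, or when no further pipelined
 * requests are pending on the connection.  Not Jigsaw's actual code. */
#include <errno.h>
#include <string.h>
#include <unistd.h>

typedef struct {
    int    fd;
    size_t used;
    char   buf[8192];
} resp_buffer;

static void flush_responses(resp_buffer *rb)
{
    size_t off = 0;
    while (off < rb->used) {
        ssize_t n = write(rb->fd, rb->buf + off, rb->used - off);
        if (n < 0) { if (errno == EINTR) continue; break; }
        off += (size_t)n;
    }
    rb->used = 0;
}

static void queue_response(resp_buffer *rb, const char *data, size_t len,
                           int more_requests_pending)
{
    if (rb->used + len > sizeof rb->buf)
        flush_responses(rb);              /* buffer full: flush what we have   */
    if (len > sizeof rb->buf) {
        write(rb->fd, data, len);         /* oversized response: send directly */
    } else {
        memcpy(rb->buf + rb->used, data, len);
        rb->used += len;
    }
    if (!more_requests_pending)
        flush_responses(rb);              /* connection about to go idle       */
}
```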
We also performed some tests against the Apache 1.2b2 server, which also supports HTTP/1.1, and observed essentially similar results to Jigsaw. Its output buffering in that initial beta test release was not yet as good as our revised version of Jigsaw, and in that release it processed at most five requests before terminating a TCP connection. When using pipelining, the number of HTTP requests served is often a poor indicator for when to close the connection. We discussed these results with Dean Gaudet and others of the Apache group and similar changes were made to the Apache server; our final results below use a version of Apache 1.2b7 plus patches provided by Dean Gaudet.
With the modified applications, we took a complete set of data, both the first time retrieval and cache validation, in all three network environments. At the same time, to make the tests closer to a real implementation, we took the opportunity to change the HTTP/1.1 version of the robot to issue full HTTP/1.1 cache validation requests which use the If-None-Match header and opaque validators, rather than the HEAD requests used in our HTTP/1.0 version of the robot.
It was easiest to implement this functionality by enabling persistent caching in libwww, but this had unexpected consequences; an initial performance run resulted in worse performance than our first set of tests. Further analysis showed that libwww's implementation of persistent caching on disk is written for ease of porting and implementation rather than performance. Each cached object is stored in two independent files: one containing the cacheable message headers and the other containing the message body. This is an area that one would optimize carefully in a product implementation; the overhead in our implementation became a performance bottleneck in our HTTP/1.1 tests. Time and resources did not permit optimizing this code. Our final measurements use correct HTTP/1.1 cache validation requests, and are run with a persistent cache on a memory file system to reduce the disk performance problems that we observed.
The measurements below therefore represent a second round of data collection, against servers (both Jigsaw and Apache) which have optimized output buffering. While Jigsaw had outperformed Apache in the first round of tests, Apache now outperforms Jigsaw.
After having determined that HTTP/1.1 outperforms HTTP/1.0 we decided to try other means of optimizing the performance. We therefore investigated how much we would gain by using data compression of the HTTP message body. That is, we do not compress the HTTP headers, but only the body, using the "Content-Encoding" header to describe the encoding mechanism. We use the zlib compression library version 1.04, a freely available C-based library. It has a stream based interface which interacts nicely with the libwww stream model. Note that the PNG library uses zlib, so common implementations will share the same data compression code. Implementation took at most a day or two.
The client indicates that it is capable of handling the "deflate" content coding by sending an "Accept-Encoding: deflate" header in the requests. In our test, the server does not perform on-the-fly compression but sends out a precomputed deflated version of the Microscape page. The client performs on-the-fly inflation and parses the inflated HTML using its normal HTML parser.
The zlib library has several flags for tuning the compression algorithm; however, we used the default values for both deflating and inflating. In our case this compressed the Microscape HTML page by more than a factor of three, from 42K to 11K. We believe this is a typical gain for this algorithm on HTML files.
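A minimal sketch of the deflate step using zlib's streaming interface with default settings follows; it is illustrative only (not the exact libwww or server code), and as noted above the server in our tests served a precomputed result rather than compressing on the fly:

```c
/* Illustrative one-shot deflate of a message body using zlib's streaming
 * interface with default settings; error handling reduced to essentials. */
#include <string.h>
#include <zlib.h>

/* Returns the compressed length written to dst, or 0 on error. */
unsigned long deflate_body(const unsigned char *src, unsigned long srclen,
                           unsigned char *dst, unsigned long dstcap)
{
    z_stream zs;
    int rc;

    memset(&zs, 0, sizeof zs);            /* default allocators (Z_NULL)  */
    if (deflateInit(&zs, Z_DEFAULT_COMPRESSION) != Z_OK)
        return 0;

    zs.next_in   = (Bytef *)src;
    zs.avail_in  = (uInt)srclen;
    zs.next_out  = dst;
    zs.avail_out = (uInt)dstcap;

    rc = deflate(&zs, Z_FINISH);          /* all input supplied at once   */
    deflateEnd(&zs);
    return (rc == Z_STREAM_END) ? zs.total_out : 0;
}
```

The compressed body is sent with a "Content-Encoding: deflate" header, and the client runs the corresponding inflate calls on receipt before handing the data to its HTML parser.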
The result was positive for all three types of connections: LAN, WAN, and PPP. On a LAN the number of saved TCP packets amounted to 28, a 17% gain; on a WAN we saw a gain of 14%, and also a gain in the time to handle the data transfer.
The results shown in these tables are a summary of the more detailed data acquisition overview. In all cases, the traces were taken on the client side, as this is where the interesting delays are. Each run was repeated 5 times in order to compensate for network fluctuations.
As a means to compare the various experiments, we define the "efficiency" as the ratio of the number of bytes in the payload to the total number of bytes transmitted:
Efficiency = Bytes in Payload / Total number of bytes
| | First time retrieval | | | | Cache validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Packets | bytes | time [sec] | efficiency | Packets | bytes | time [sec] | efficiency |
| HTTP/1.0 | 455.2 | 187525.6 | | | 362.8 | 58993.0 | | |
| HTTP/1.1 | 234.4 | 189938.0 | | | 88.4 | 16878.0 | | |
| HTTP/1.1 Pipelined | 168.0 | 189646.0 | | | 27.6 | 16878.0 | | |
| HTTP/1.1 Pipelined and compression | 140.4 | 158460.0 | | | 27.2 | 16873.0 | | |
| | First time retrieval | | | | Cache validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Packets | bytes | time [sec] | efficiency | Packets | bytes | time [sec] | efficiency |
| HTTP/1.0 | 455.4 | 191808.2 | | | 339.6 | 60745.0 | | |
| HTTP/1.1 | 254.4 | 190965.2 | | | 90.0 | 16916.4 | | |
| HTTP/1.1 Pipelined | 210.6 | 190635.8 | | | 26.8 | 17170.0 | | |
| HTTP/1.1 Pipelined and compression | 181.0 | 159032.4 | | | 27.8 | 16873.0 | | |
| | First time retrieval | | | | Cache validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Packets | bytes | time [sec] | efficiency | Packets | bytes | time [sec] | efficiency |
| HTTP/1.0 **) | 489 | 235027 | | | - | - | | |
| HTTP/1.1 | 349.6 | 189458.0 | | | 129.0 | 16800.0 | | |
| HTTP/1.1 Pipelined | 286.0 | 190383.2 | | | 32.0 | 16868.0 | | |
| | First time retrieval | | | | Cache validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Packets | bytes | time [sec] | efficiency | Packets | bytes | time [sec] | efficiency |
| HTTP/1.0 | 449.8 | 188237.4 | | | 339.4 | 59008.0 | | |
| HTTP/1.1 | 232.8 | 187618.0 | | | 88.0 | 13731.0 | | |
| HTTP/1.1 Pipelined | 163.2 | 187618.0 | | | 24.4 | 13731.0 | | |
| | First time retrieval | | | | Cache validation | | | |
|---|---|---|---|---|---|---|---|---|
| | Packets | bytes | time [sec] | efficiency | Packets | bytes | time [sec] | efficiency |
| HTTP/1.0 | 473.6 | 191385.4 | | | 340.6 | 59008.0 | | |
| HTTP/1.1 | 252.0 | 188786.0 | | | 88.8 | 13755.2 | | |
| HTTP/1.1 Pipelined | 204.0 | 188811.2 | | | 25.2 | 13731.0 | | |
**) These measurements were performed using Netscape Communicator 4.0 beta 1 with max 4 simultaneous connections and HTTP/1.0 keep-alive connections. The Netscape HTTP client implementation uses the HTTP/1.0 Keep-Alive mechanism to allow for multiple HTTP messages to be transmitted on the same TCP connection. It therefore used 8 connections compared to 42 for the libwww HTTP/1.0 implementation, in which this feature was disabled.
Implementations need to close connections carefully. HTTP/1.0 implementations often naively close both halves of the TCP connection simultaneously when finishing the processing of a request. A pipelined HTTP/1.1 implementation can cause major problems if it does so.
The scenario is as follows: an HTTP/1.1 client talking to an HTTP/1.1 server starts pipelining a batch of requests, for example 15, on an open TCP connection. The server decides that it will not serve more than 5 requests per connection and closes the TCP connection in both directions after it has successfully served the first five requests. The remaining 10 requests that have already been sent from the client will, along with client-generated TCP ACK packets, arrive on a closed port on the server. This "extra" data causes the server's TCP to issue a reset, which makes the client TCP stack pass the last ACK'ed packet to the client application and discard all other packets. This means that HTTP responses that are either being received or have already been received successfully but haven't been ACK'ed will be dropped by the client TCP. In this situation the client does not have any means of finding out which HTTP messages were successful or even why the server closed the connection. The server may have generated a "Connection: Close" header in the 5th response but the header may have been lost due to the TCP reset. Servers must therefore close each half of the connection independently.
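On BSD-style socket APIs, closing each half independently amounts to shutting down only the sending side first and then draining whatever the client has already pipelined; a sketch of the idiom (timeouts and error handling omitted):

```c
/* Sketch: close the connection without provoking a TCP reset against
 * requests the client has already pipelined.  Half-close the sending
 * side, drain remaining input, then close. */
#include <sys/socket.h>
#include <unistd.h>

void graceful_close(int fd)
{
    char junk[4096];

    shutdown(fd, SHUT_WR);                   /* FIN: we will send no more    */
    while (read(fd, junk, sizeof junk) > 0)  /* absorb in-flight requests so */
        ;                                    /* they are ACKed, not reset    */
    close(fd);
}
```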
TCP's congestion control algorithms [11] work best when there are enough packets in a connection that TCP can determine the optimal rate at which to insert packets into the Internet, and the performance of TCP is also best once a connection is beyond the Slow Start algorithm. Observed packet trains in the Internet have been dropping [13], almost certainly due to HTTP/1.0's behavior, as demonstrated in the data above, where a single connection rarely involves more than 10 packets, including TCP open and close. Some IP switch technology exploits packet trains to enable faster IP routing. In the tests above, the packet trains are longer, but not as much longer as one might first expect, since many fewer, larger packets are transmitted due to pipelining. While the HTTP/1.1 proposed standard specification does permit two connections to be established between a client/server pair, it is clear that dividing the mean length of packet trains down by a factor of two would diminish the benefits to the Internet (and possibly to the end user due to slow start) substantially. Range requests need to be exploited to enable good interactive feel in Web browsers while using a single connection. Connections should be maintained as long as it makes reasonable engineering sense, to pick up users' "click ahead" while following links.
We believe the content transported by HTTP will be changing significantly over the next several years, with the introduction of style sheets and incremental improvements in image and animation formats. This section explores some of the changes we may see if these facilities are exploited significantly.
The web has lacked most of the facilities that graphics designers have used to control presentation and layout; as a result, many pages seen on the web have been painfully synthesized using straight HTML and a plethora of small images. Many graphical elements in the Microscape page could easily be expressed using style sheets. The table below lists all images that appear in the "Microscape" test page and gives an estimate of which images might be replaced by more compact HTML+CSS1 code.
| File name | GIF size | PNG size | HTML+CSS estimated size (content) | HTML+CSS estimated size (markup) | Comments |
|---|---|---|---|---|---|
| about1 | 1403 | 653 | 20 | 250 | |
| action | 788 | 770 | 15 | 50 | |
| enter | 157 | 166 | 10 | 50 | |
| prod | 165 | 158 | 10 | 40 | |
| search | 151 | 165 | 10 | 30 | |
| shop | 128 | 131 | 10 | 30 | |
| solutions | 682 | 638 | 10 | 60 | |
| support | 156 | 161 | 10 | 30 | |
| 1ptrans | 44 | 83 | 0 | 30 | the CSS property will be set on an existing element |
| spacer | 70 | 70 | 0 | 70 | a new element should probably be added |
| spacer1 | 40 | 70 | 0 | 70 | ditto |
| spacer2 | 69 | 73 | 0 | 70 | ditto |
| vrule | 62 | 74 | 0 | 70 | ditto |
| nav_home | 1664 | 1355 | 150 | 250 | it's possible to make button bars with floating textual elements |
| navigation_bar | 1698 | 1457 | 150 | 250 | ditto |
| Sum: 15/40 images | 7277 | 6024 | 395 | 1350 | |
| File name | GIF size | PNG size | HTML+CSS estimated size (content) | HTML+CSS estimated size (markup) | Comments |
|---|---|---|---|---|---|
| arrowbl | 69 | 113 | 4 | 60 | A similar arrow glyph exists in Unicode |
| arrowgr | 75 | 125 | 4 | 60 | A similar arrow glyph exists in Unicode |
| arrowr | 75 | 125 | 4 | 60 | A similar arrow glyph exists in Unicode |
| comdex_6 | 799 | 743 | 20 | 200 | the two-colored pointer complicates matters |
| Sum: 4/40 images | 1018 | 1106 | 32 | 380 | |
| File name | GIF size | PNG size | HTML+CSS estimated size (content) | HTML+CSS estimated size (markup) | Comments |
|---|---|---|---|---|---|
| tagline | 2718 | 2581 | 30 | 300 | the shadow effects require negative margins |
| worldwide | 1698 | 1583 | 20 | 300 | the shadow effects require negative margins |
| h_microsoft | 2080 | 1877 | 20 | 300 | |
| Sum: 3/40 images | 6496 | 6041 | 70 | 900 | |
| File name | GIF size | PNG size | HTML+CSS estimated size (content) | HTML+CSS estimated size (markup) | Comments |
|---|---|---|---|---|---|
| Pacman1 | 4378 | 3841 | 50 | 300 | probably can be reduced by 50% |
| one_sm | 2917 | 2534 | 30 | 150 | probably can be reduced by 50% |
| home_on | 246 | 215 | 10 | 80 | the "house" can't be found in Unicode |
| Sum: 3/40 images | 7541 | 6590 | 90 | 520 | |
| File name | GIF size | PNG size | Comments |
|---|---|---|---|
| idc | 2428 | 1545 | |
| comdex | 1102 | 1117 | |
| pointcast_small | 786 | 730 | |
| Sum: 3/40 images | 4316 | 3392 |
| File name | GIF size | PNG size | Comments |
|---|---|---|---|
| msft | 366 | 422 | Microsoft logo, very textual but with two different "o" glyphs |
| tbi | 4012 | 3511 | |
| netnow3 | 1884 | 1294 | rotated text not possible in CSS |
| home_igloo | 40095 | 35933 | |
| n | 1435 | 1107 | |
| Sum: 5/40 images | 47792 | 42237 |
| File name | GIF size | PNG size | Comments |
|---|---|---|---|
| clinton | 2095 | 1912 | |
| appfoundry | 4853 | 4461 | |
| commish | 7540 | 7115 | |
| inbox_img | 5076 | 4668 | |
| rolodex | 3340 | 3145 | |
| sports | 3544 | 3231 | |
| woofer | 2411 | 2174 | |
| Sum: 7/40 images | 28859 | 26706 |
| File name | GIF size | PNG size | Comments |
|---|---|---|---|
| ie_animated | 9132 | - | 20 frame animation |
| msinternet | 15856 | - | 16 frame animation |
| Sum: 2/2 animations | 24988 | | |
The observations here are very preliminary, but indicate that style sheets may make a very significant impact on bandwidth (and end user delays) of the web. Savings from PNG in this data however are modest.
Notes: PNG files listed here additionally contain gamma information, so that they display the same on all platforms; this adds 16 bytes per image. The GIF images do not contain this information.
The conversion of images to PNG was not optimal. The GIFs were clearly optimized by experts. PNG does not perform as well on the very low bit depth images in the sub-200 byte category because its checksums and other information make the file a bit bigger even though the actual image data is often smaller.
The HTML+CSS1 sizes are estimates. At the date of this writing, no CSS browser is able to render all the replacements in HTML+CSS.
Pipelining implementation details can make a very significant difference on network traffic, and bear some careful thought, understanding, and testing. To take full advantage of pipelining in applications may require explicit interfaces to flush buffers and other minor changes to applications.
To get optimal performance over a single connection in HTTP/1.1 implementations, the read buffer size of an implementation, and the details of how urgently data is read from the operating system, can be very significant. If too much data accumulates in a socket buffer, TCP may delay ACKs by 200 ms. Opening multiple connections in HTTP/1.0 resulted in more socket buffers in the operating system, which as a result imposed lower speed requirements on the application, while keeping the network busy.
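A sketch of the reading side of this advice follows (the buffer size is an arbitrary illustrative value, and a real client would interleave this with its parser and event loop rather than looping to end of stream):

```c
/* Sketch: read eagerly with a reasonably large application buffer so that
 * response data does not sit in the kernel socket buffer longer than
 * necessary.  Buffer size is illustrative only. */
#include <unistd.h>

ssize_t drain_responses(int fd, void (*deliver)(const char *data, size_t len))
{
    char    buf[32 * 1024];
    ssize_t n, total = 0;

    while ((n = read(fd, buf, sizeof buf)) > 0) {
        deliver(buf, (size_t)n);     /* hand data to the HTTP parser at once */
        total += n;
    }
    return total;                    /* 0 on orderly close, -1 on error */
}
```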
We estimate that the work reported here took two people about two months, starting from working HTTP/1.0 implementations. We expect others leveraging the experience reported here might accomplish the same result in much less time, though of course we may be more expert than many due to our involvement in the HTTP/1.1 design.
Our principal data gathering tool is the widely available tcpdump program [14]. Some vendors do not ship tcpdump; others ship older versions of it. We found it necessary to install a current version (version 3.3) on all platforms we used, as we observed that the last FIN TCP packet was otherwise often missing. We used both the NetMon program and IP forwarding on NT to gather the data for the PPP connection. The output is incompatible with tcpdump, but a conversion made it possible to use our tcpdump tools for handling the data.
We also used Tim Shepard's xplot program [8] to graphically plot the dumps; this was very useful to find a number of problems in our implementation not visible in the raw dumps. We looked at data going over both directions of the TCP connections. In the detailed data summary, there are direct links to all dumps in xplot formats. The tcpshow program [21] was very useful when we needed to see the contents of packets to understand what was happening.
In addition to these generic TCP analysis tools, we produced a set of dedicated tools for handling the large amount of data taken:
We have thought about realistic uses of HTTP/1.1 by browsers. Browsers need to know most urgently if an object has changed in a way that requires reformatting the page. Additionally, if an embedded image is large and no size is specified, a browser will want to be able to lay out the page completely before finishing retrieval of embedded objects. HTTP/1.1 defines (and most current HTTP/1.0 servers implement) byte range facilities that allow a client to perform partial retrieval of objects. We believe therefore that the natural revalidation request for HTTP/1.1 will combine both cache validation headers and an If-Range request header, to prevent large objects from monopolizing the connection to the server. The range requested should be large enough to usually return any embedded metadata for the object for the common data types. This capability of HTTP/1.1 is implicit in its caching and range request design.
When a browser revisits a page, it has a very good idea what the type of any embedded object is likely to be, and can therefore issue a cache validation request and simultaneously request the metadata of the embedded object (to detect any change). This information is much more valuable than the embedded image data itself. Subsequently, the browser might generate requests for the rest of the object, or for enough of each object to allow progressive display of image data types (e.g. progressive PNG, GIF or JPEG images), or to multiplex between multiple large images on the page. We call this style of use of HTTP/1.1 "poor man's multiplexing". Further work is underway [9] to experiment with a multiplexing transport to provide a better way to multiplex the connection than this crude (and relatively high overhead) way provided by HTTP/1.1.
We therefore believe cache validation combined with range requests will likely become a very common idiom of HTTP/1.1.
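Such a combined request might look like the following (the URL, entity tag, and byte range are invented placeholders; this is our reading of the idiom, not a form prescribed by the specification):

```c
/* Illustration only: URL, ETag and byte range are invented placeholders.
 * The request revalidates the cached copy and, at the same time, bounds
 * how much of the object may come back, so one large response cannot
 * monopolize the pipelined connection. */
static const char revalidate_with_range[] =
    "GET /img/photo.jpg HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "If-None-Match: \"etag-456\"\r\n"   /* cache validator: unchanged => 304    */
    "If-Range: \"etag-456\"\r\n"        /* honor the range only if still valid  */
    "Range: bytes=0-2047\r\n"           /* enough to return per-object metadata */
    "\r\n";
```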
The HTTP metadata (as opposed to metadata stored inside the object itself) can become a significant amount of overhead for the "poor man's multiplexing" that browsers are likely to want to perform (first to get the metadata stored in an object, and then to perform progressive display of embedded images). A naive implementation (as in the Jigsaw implementation used for the initial tests) might resend all of the HTTP metadata headers in a 206 (Partial Content) response. The HTTP/1.1 proposed standard specification is silent on which header fields are sent on a 206 response. We believe implementations will likely want to be careful on what headers are sent with partial content, and the HTTP/1.1 specification may want clarification in this area.
We believe the CPU time savings of HTTP/1.1 are very substantial, due to the great reduction in TCP opens and closes and the savings in packet overhead, and could now be quantified for Apache (currently the most popular Web server on the Internet). HTTP/1.1 will increase the importance of reducing the parsing and data transport overhead of the very verbose HTTP protocol, which, for many operations, has been swamped by the TCP open and close overhead required by HTTP/1.0. Optimal server implementations for HTTP/1.1 will likely be significantly different from current servers.
Connection management bears significant further experimentation and modeling. Padmanabhan [1] gives some guidance on how long connections should be kept open, but this work needs updating to reflect current content and usage of the Web, which have changed significantly since completion of the work.
Persistent connections, pipelining, and transport compression, as well as the widespread adoption of style sheets (e.g. CSS1) and more compact image representations (e.g. PNG), will increase the relative overhead of the very verbose, text-based HTTP protocol, particularly for high latency and low bandwidth environments such as cellular telephones and other wireless situations. A binary encoding or tokenized compression of HTTP, and/or a replacement for HTTP, will become more urgent given these changes in the infrastructure of the Web.
We have not investigated perceived time to render (our browser has not yet been converted to use HTTP/1.1), but with the range request techniques outlined above, we believe HTTP/1.1 can perform well over a single connection. PNG also provides significant time to render benefits relative to GIF. The best strategies to optimize time to render are clearly significantly different from those used by HTTP/1.1.
We did not have time to perform a test that would show the relative benefits of deflate compression relative to the data compression provided by current modems.
Future work worth investigating here includes the use of compression dictionaries optimized for HTML and CSS1 text.
For HTTP/1.1 to outperform HTTP/1.0 in elapsed time, an implementation must implement pipelining. Properly buffered pipelined implementations will gain additional performance and reduce network traffic further.
HTTP/1.1 implemented with pipelining outperformed HTTP/1.0, even when the HTTP/1.0 implementation uses multiple connections in parallel, under all circumstances tested. In terms of packets transmitted, the savings are typically at least a factor of two, and often much more, for our tests. Elapsed time improvement is less dramatic, but significant, and all HTTP/1.1 tests using pipelining and a single connection outperformed HTTP/1.0 tests using multiple connections.
Since bandwidth savings due to HTTP/1.1 and associated techniques are modest (between 2% and 35% depending on the techniques used in our tests), it is clear that the HTTP/1.1 work on caching is as important as the improvements reported in this paper to conserving total bandwidth on the Internet. Hotspots on the network also strongly argue for good caching systems. The addition of transport compression in HTTP/1.1 provided the largest bandwidth savings. The savings of HTTP/1.1 in terms of number of packets, however, are truly dramatic.
HTTP/1.1 will significantly change the character of traffic on the Internet (given HTTP's dominant fraction of internet traffic), with significantly larger mean packet sizes, more packets per TCP connection, and drastically fewer packets that are not subject to flow control (by elimination of most packets due to TCP open and close).
HTTP/1.1 changes dramatically the "cost" and performance of HTTP, particularly for revalidating cached items. As a result, we expect that applications will significantly change their behavior. For example, caching proxies intended to enable disconnected operation may find it feasible to perform much more extensive cache validation than was feasible with HTTP/1.0. Researchers and product developers should be very careful when extrapolating future web or Internet traffic from current Internet and HTTP server log data, and should plan to rework any simulations as these improvements to web infrastructure deploy.
Changes in web content enabled by the deployment of style sheets and more compact image, graphics, and animation representations will also significantly improve network and perceived performance during the period that HTTP/1.1 is being deployed. To our surprise, style sheets promise the biggest opportunity for major network performance improvements, whether deployed with HTTP/1.0 or HTTP/1.1, by significantly reducing the need for inlined images to provide graphic elements, and the resulting network traffic. Heavy use of style sheets whenever possible will result in the greatest observed improvements in downloading new web pages, without sacrificing sophisticated graphic design.
Modest, careful implementations can achieve all of the goals set out for HTTP/1.1.
[1] V.N. Padmanabhan, J. Mogul, "Improving HTTP Latency", Computer Networks and ISDN Systems, v.28, pp. 25-35, Dec. 1995. Slightly revised version of paper in Proc. 2nd International WWW Conference '94: Mosaic and the Web, Oct. 1994
[2] J. Nagle, "Congestion Control in IP/TCP Internetworks," RFC 896, Ford Aerospace and Communications Corporation, January 1984.
[3] T. Berners-Lee, R. Fielding, H. Frystyk. "Informational RFC 1945 - Hypertext Transfer Protocol -- HTTP/1.0," MIT/LCS, UC Irvine, May 1996
[4] R. Fielding, J. Gettys, J.C. Mogul, H. Frystyk, T. Berners-Lee, "RFC 2068 - Hypertext Transfer Protocol -- HTTP/1.1," UC Irvine, Digital Equipment Corporation, MIT
[5] J. Touch, J. Heidemann, K. Obraczka, "Analysis of HTTP Performance," USC/Information Sciences Institute, June, 1996.
[6] S. Spero. "Analysis of HTTP Performance Problems," July 1994
[7] J. Heidemann, "Performance Interactions Between P-HTTP and TCP Implementation," USC/Information Sciences Institute, Submitted for publication to ACM Computer Communication Review.
[8] T. Shepard, Source for this very useful program is available at ftp://mercury.lcs.mit.edu/pub/shep. S.M. thesis "TCP Packet Trace Analysis" for David Clark at the MIT Laboratory for Computer Science. The thesis can be ordered from MIT/LCS Publications. Ordering information can be obtained from +1 617 253 5851 or send mail to publications@lcs.mit.edu. Ask for MIT/LCS/TR-494.
[9] J. Gettys, "Simple MUX Protocol Specification," World Wide Web Consortium.
[10] H. Lie, B. Bos, "Cascading Style Sheets, level 1," W3C Recommendation, World Wide Web Consortium, 17 Dec 1996.
[11] Van Jacobson. "Congestion Avoidance and Control". In Proc. SIGCOMM '88 Symposium on Communications Architectures and Protocols, pages 314-329. Stanford, CA, August 1988.
[12] Jon B. Postel. "Transmission Control Protocol," RFC 793, Network Information Center, SRI International, September, 1981.
[13] V. Paxson, "Growth Trends in Wide-Area TCP Connections," IEEE Network, Vol. 8 No. 4, pp. 8-17, July 1994.
[14] V. Jacobson, C. Leres, and S. McCanne, tcpdump, available at ftp://ftp.ee.lbl.gov/tcpdump.tar.Z
[15] R. W. Scheifler, J. Gettys, "The X Window System," ACM Transactions on Graphics # 63, Special Issue on User Interface Software.
[16] Mark S. Manasse and Greg Nelson, "Trestle Reference Manual," Digital Systems Research Center Research Report # 68, December 1991.
[17] Braden, R., "Extending TCP for Transactions -- Concepts," RFC-1379, USC/ISI, November 1992.
[18] Braden, R., "T/TCP -- TCP Extensions for Transactions: Functional Specification," RFC-1644, USC/ISI, July 1994.
[19] Touch, J., "TCP Control Block Interdependence," (work in progress), USC/ISI, June 1996.
[20] T. Boutell, T. Lane et al., "PNG (Portable Network Graphics) Specification", W3C Recommendation, October 1996; RFC 2083, Boutell.Com Inc., January 1997. General PNG information can be found at /Graphics/PNG.
[21] M. Ryan, tcpshow, I.T. NetworX Ltd., 67 Merrion Square, Dublin 2, Ireland, June, 1996.
[22] P. Deutsch, "DEFLATE Compressed Data Format Specification version 1.3," RFC 1951, Aladdin Enterprises, May 1996.
[23] L. Peter Deutsch, Jean-Loup Gailly, "ZLIB Compressed Data Format Specification version 3.3," RFC 1950, Aladdin Enterprises, Info-ZIP, May 1996.
Jeff Mogul of Digital's Western Research Laboratory has been instrumental in making the case for both persistent connections and pipelining in HTTP. We are very happy to be able to produce data with a real implementation confirming his and V.N. Padmanabhan's results and for his discussions with us about several implementation strategies to try.
Our thanks to Sally Floyd, Van Jacobson, and Craig Leres for use of a machine at Lawrence Berkeley Labs for the high bandwidth/high latency test.
Our thanks to Dean Gaudet of the Apache group for his timely cooperation to optimize Apache's HTTP/1.1 implementation given our feedback.
The World Wide Web Consortium supported this work.
Henrik Frystyk Nielsen
W3 Consortium
MIT Laboratory for Computer Science
545 Technology Square Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
Email: frystyk@w3.org
Jim Gettys
W3 Consortium
MIT Laboratory for Computer Science
545 Technology Square
Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
Email: jg@w3.org
Anselm Baird-Smith
W3 Consortium
Institut National de Recherche en Informatique et en Automatique
2004, route des Lucioles
BP 93 06902 Sophia Antipolis Cedex
France
Email: anselm@w3.org
Eric Prud'hommeaux
W3 Consortium
MIT Laboratory for Computer Science
545 Technology Square Cambridge, MA 02139, USA
Fax: +1 (617) 258 8682
Email: eric@w3.org
Håkon W. Lie
W3 Consortium
Institut National de Recherche en Informatique et en Automatique
2004, route des Lucioles
BP 93 06902 Sophia Antipolis Cedex
France
Fax: +33 (0) 493657765
Email: howcome@w3.org
Chris Lilley
World Wide Web Consortium
INRIA, Projet W3C
2004, Route des Lucioles - B.P. 93
06902 Sophia Antipolis Cedex
France
Fax: +33 93 65 77 65
Email: chris@w3.org
Corresponding Author: Jim Gettys