W3C Recommendations Reduce 'World Wide Wait'

Tired of having to make coffee while you wait for a home page to download?

The World Wide Web Consortium (W3C) has been coordinating the efforts of its research groups to address the increasingly urgent issue of Web slowdown, dubbed by pundits "The World Wide Wait." An early HTTP/1.1 study by the Consortium entitled "Network Performance Effects of HTTP/1.1, CSS1, and PNG" (presented at ACM SIGCOMM '97) found that, when used in tandem, innovations in the HTTP Protocol, Cascading Style Sheets, and Portable Network Graphics can dramatically reduce page download times and ease the load on the global Internet. The study explains how the proper integration of several new technologies can make page downloads 20%-400% faster and reduce Web-generated Internet traffic by as much as 50%.

Soon, you may be down to one cup a day!

Causes of slow Web performance today

So why does it take so long for a Web page to download? Let's look at what goes on during a typical download to isolate the potential performance bottlenecks.

Resolving the URI

Every resource on the Web may be referred to by an address called a Uniform Resource Identifier, or URI. When you click on a hyperlink in your Web browser or type in a Web address seen in a magazine article, you are requesting the resource designated by the URI of the hyperlink (e.g., a Web page). URIs such as http://www.w3.org/RDF were designed to be easy to remember, but for this convenience we pay a price -- human-readable information must be translated for use by a computer. Each URI contains three pieces of information: the name of the Web resource to be retrieved, the name of the computer that houses it, and a description of how to transfer it over the Internet. So http://www.w3.org/RDF means "Transfer the resource named RDF from the computer named www.w3.org using http."
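
As a minimal sketch of that three-way split, the Python standard library can take the example URI apart. The function name describe_uri and the dictionary keys are ours, chosen only for illustration.

```python
from urllib.parse import urlsplit

def describe_uri(uri):
    """Split a URI into the three pieces of information described above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,               # how to transfer it (e.g. http)
        "host": parts.hostname,               # the computer that houses it
        "resource": parts.path.lstrip("/"),   # the name of the resource
    }

info = describe_uri("http://www.w3.org/RDF")
print(f"Transfer the resource named {info['resource']} "
      f"from the computer named {info['host']} using {info['scheme']}.")
```

Only the host part needs translating into numbers; the scheme and the resource name are used as they are.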

The first bottleneck can occur during the translation of the computer's name (called the domain name) into a sequence of numbers (called the IP address). This process is called domain name resolution, and although it takes time, www.w3.org is so much easier to remember than 18.23.0.22 that we choose to live with it. However, improper resolution strategies can cause slow Web performance.
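
One resolution strategy that avoids repeated work is to remember answers already obtained. The sketch below is an illustration under our own assumptions, not a description of any particular browser: the class CachingResolver and the stand-in table FAKE_DNS are hypothetical, and a real resolver would also expire entries after a time limit.

```python
import socket

class CachingResolver:
    """Remember name-to-address answers so repeat lookups cost nothing."""

    def __init__(self, lookup=socket.gethostbyname):
        self._lookup = lookup   # function that performs the real (slow) lookup
        self._cache = {}        # domain name -> IP address
        self.lookups = 0        # number of slow lookups actually performed

    def resolve(self, name):
        key = name.lower()      # domain names are case-insensitive
        if key not in self._cache:
            self.lookups += 1
            self._cache[key] = self._lookup(key)
        return self._cache[key]

# A stand-in lookup table so the example runs without network access.
FAKE_DNS = {"www.w3.org": "18.23.0.22"}
resolver = CachingResolver(lookup=FAKE_DNS.__getitem__)
for _ in range(10):
    address = resolver.resolve("www.w3.org")
print(address, resolver.lookups)
```

Ten requests for the same name trigger a single slow lookup; the other nine are answered from memory.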

Connecting to the Web Server

Once the domain name has been resolved, the information it points to must be retrieved. To do this, the user's machine (the client) attempts to connect with the destination computer (the server). Before the two computers can exchange any data, they must agree to speak to one another, just as two people exchange "hellos" at the beginning of a telephone call. We call this time-consuming round of introductions handshaking. The greater the distance between the computers, the longer the delay in communication, the slower the handshake. If the server is busy, expect even slower response times and a coffee refill.

There are ways to speed up client-server connections. For instance, caching information - storing snapshots of the resource at strategic places in the network or in a company's proxy servers - speeds up response times since repeated requests for the same resource travel shorter distances. To reap the full benefits of a caching mechanism, users and content providers must be able to tailor it to their particular needs (e.g., have the cache copy validated at different times).
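
To make the idea of validating a cached copy concrete, here is a small model under stated assumptions. Origin and ValidatingCache are hypothetical classes, the "tag" stands for whatever validator the server hands out, and time is passed in by hand so the example is repeatable.

```python
class Origin:
    """Hypothetical origin server: maps a URI to (validator tag, body)."""

    def __init__(self, resources):
        self.resources = dict(resources)
        self.full_transfers = 0

    def fetch(self, uri, known_tag=None):
        tag, body = self.resources[uri]
        if known_tag == tag:
            return tag, None        # "not modified": no body is sent
        self.full_transfers += 1
        return tag, body


class ValidatingCache:
    def __init__(self, origin, max_age):
        self.origin = origin
        self.max_age = max_age      # seconds a copy is trusted unchecked
        self.entries = {}           # uri -> (tag, body, time stored)

    def get(self, uri, now):
        entry = self.entries.get(uri)
        if entry and now - entry[2] < self.max_age:
            return entry[1]         # fresh copy: no trip to the origin
        known_tag = entry[0] if entry else None
        tag, body = self.origin.fetch(uri, known_tag)
        if body is None:            # copy validated, keep the old body
            body = entry[1]
        self.entries[uri] = (tag, body, now)
        return body


origin = Origin({"/RDF": ("v1", "<html>RDF</html>")})
cache = ValidatingCache(origin, max_age=60)
cache.get("/RDF", now=0)     # first request: full transfer
cache.get("/RDF", now=30)    # still fresh: served from the cache
cache.get("/RDF", now=90)    # stale: validated, body not sent again
print(origin.full_transfers)
```

Three requests cost one full transfer. The max_age parameter is the knob a user or content provider would tune to decide how often the copy is checked.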

Requesting the Web Resource

Now that the client and server have agreed to speak, the real download can commence. The user's browser requests a resource on the server, which returns a snapshot of it. If the resource is an HTML document, the browser tries to lay out the text, tables, forms, etc. as the document comes in. However, images, sound bites, external style sheets and some other parts of the document are independent resources that the browser has to fetch through additional requests. This can slow down the page layout process and lead to coffee consumption.
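
To see how quickly the extra requests add up, the following sketch scans a made-up page for resources a browser would have to fetch separately. The class name, the sample page, and the short list of tags are our own simplifications; real browsers look at many more elements.

```python
from html.parser import HTMLParser

class EmbeddedResourceFinder(HTMLParser):
    """Collect URIs that need additional requests beyond the page itself."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script", "embed") and attrs.get("src"):
            self.resources.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only style sheets are fetched automatically, not every link.
            if (attrs.get("rel") or "").lower() == "stylesheet":
                self.resources.append(attrs["href"])

PAGE = """
<html><head><link rel="stylesheet" href="site.css"></head>
<body><h1>Hello</h1><img src="logo.png"><img src="photo.png">
<a href="other.html">a link, fetched only if clicked</a></body></html>
"""

finder = EmbeddedResourceFinder()
finder.feed(PAGE)
print(len(finder.resources) + 1, "requests:", ["page"] + finder.resources)
```

Even this tiny page needs four requests: one for the document, one for the style sheet, and one per image.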

Of course, the content of the page has the greatest impact on download time. Images are huge with respect to text -- a picture really is worth a thousand words. This doesn't mean that Web designers should avoid useful images, but designers can speed up downloads by avoiding unnecessary images (e.g., those representing text, those representing characters that may be displayed using a particular font, etc.) and by using terse image formats. Below we will see how new Web technologies facilitate both of these tasks.

How HTTP, CSS, and PNG can alleviate Web slowdown

One of the goals of W3C Activities is to "save the Internet from the Web" -- to alleviate Web-induced bottlenecks that might otherwise stifle further Web growth. To appreciate how some bottlenecks have been addressed, we must delve into the HTTP protocol, Cascading Style Sheets, and Portable Network Graphics.

HTTP/1.1

The Hypertext Transfer Protocol (HTTP) is the language spoken by client and server computers during the transfer of a Web resource. W3C and IETF researchers have analyzed the limitations of the previous version of the protocol (HTTP/1.0) exposed by an explosively growing Web and designed the latest version (HTTP/1.1) to tackle them. Most notably, researchers focused on HTTP's reliance on the supporting protocol suites TCP and IP. All information that traverses the Web does so stowed in "boxcars" called IP packets. TCP controls how these boxcars travel from station to station and prevents communication from failing when a boxcar derails, i.e., an error occurs. HTTP relies entirely on the TCP/IP infrastructure. To function efficiently, HTTP must take advantage of TCP/IP's strengths and avoid its weaknesses, something that HTTP/1.0 doesn't do very well. For example, in HTTP/1.0, every URI that appears in an HTML document initiates a new request by a client to connect to a server; such requests are part of TCP. Even if ten URIs in a row refer to the same server, it is not possible to "keep the TCP door open" for more than one request. Since TCP's strengths manifest themselves the longer the door remains open, it makes sense to eliminate unnecessary TCP open requests. After all, why spend all this time establishing a connection to a Web server only to close it and open another? HTTP/1.1 addresses this issue with an open door policy known as persistent connections.
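
The "ten URIs in a row" case can be counted directly. This is an idealized tally, assuming one persistent connection per distinct server and the default port 80; the function names are ours, and real browsers may open several connections in parallel.

```python
from urllib.parse import urlsplit

def tcp_opens_http10(uris):
    """HTTP/1.0 as described above: one TCP connection per URI."""
    return len(uris)

def tcp_opens_persistent(uris):
    """Idealized HTTP/1.1: one persistent connection per distinct server."""
    servers = set()
    for uri in uris:
        parts = urlsplit(uri)
        servers.add((parts.hostname, parts.port or 80))
    return len(servers)

uris = [f"http://www.w3.org/image{i}.png" for i in range(10)]
print(tcp_opens_http10(uris), tcp_opens_persistent(uris))
```

For ten resources on one server, the count drops from ten TCP opens to one.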

Once a TCP connection has been opened, as much data should be sent through it as possible. Since the IP packets that carry this data may vary in size but cost the same whether large or small, it makes sense to send the largest IP packets possible over an open TCP connection. On the "Information Super Tollway," all boxcars pay the same toll, so larger boxcars conserve Internet resources. When a browser using HTTP/1.0 encounters a URI, it employs a wastefully small IP packet for the TCP open request. HTTP/1.1 saves up (buffers, in the lingo) TCP open requests until it fills up a large packet. By generating fewer but larger IP packets, HTTP/1.1 reduces network traffic and perceived download times.
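
A rough way to picture the effect of buffering is to count packets for a batch of small requests written to an already open connection. Everything here is a sketch: ASSUMED_PAYLOAD is our assumption for how much data fits in one packet (it is not a figure from the study), and the helper names are hypothetical.

```python
import math

ASSUMED_PAYLOAD = 1460   # bytes of data per packet; an assumption for illustration

def build_request(path, host="www.w3.org"):
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("ascii")

def packets_unbuffered(requests):
    """Each request is sent on its own, so each needs at least one packet."""
    return sum(math.ceil(len(r) / ASSUMED_PAYLOAD) for r in requests)

def packets_buffered(requests):
    """Requests are saved up first, then cut into packets that are as full as possible."""
    total = sum(len(r) for r in requests)
    return math.ceil(total / ASSUMED_PAYLOAD)

requests = [build_request(f"/image{i}.png") for i in range(10)]
print(packets_unbuffered(requests), packets_buffered(requests))
```

Ten requests of a few dozen bytes each travel in ten nearly empty boxcars when sent one by one, or in a single fuller one when buffered.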

In HTTP/1.0, after every IP packet sent, the client waits for a reply from the server before sending another packet. This is like sending the first of one hundred boxcars of cows from Kansas City to Chicago and waiting for a reply from the Chicago station manager before sending the second. In HTTP/1.1, the client can send an entire train of boxcars without waiting for a TCP acknowledgment. Pipelining, as this is called, reduces the total elapsed time between the initial request and the final reply without loss in the serial nature of the requests.
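
The boxcar analogy translates into simple arithmetic. The model below is deliberately crude and rests on our own assumptions: every exchange costs one fixed round trip, transmission time is ignored, and send_gap_ms is a hypothetical delay between consecutive sends.

```python
def serial_time_ms(n_requests, round_trip_ms):
    """Send one request, wait for its reply, then send the next."""
    return n_requests * round_trip_ms

def pipelined_time_ms(n_requests, round_trip_ms, send_gap_ms=0):
    """Send all requests back to back; replies still arrive in order."""
    if n_requests == 0:
        return 0
    # The last request leaves after (n - 1) gaps and takes one round trip.
    return round_trip_ms + (n_requests - 1) * send_gap_ms

print(serial_time_ms(100, 100), pipelined_time_ms(100, 100, send_gap_ms=1))
```

With one hundred requests and a 100 ms round trip, waiting for each reply costs 10,000 ms in this model, while sending the whole train at once costs 199 ms.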

HTTP/1.1 boasts another feature that doesn't affect real-time Web performance directly, but will help save the Internet from the Web. The rapid growth of the Web has produced a feeding frenzy for domain names like mycompany.com, often as important for corporate recognition as a logo. Domain names may be infinite in number, but the IP addresses they translate into are not, and IP address depletion has become a serious concern. The host headers of HTTP/1.1 allow Internet Service Providers to assign a host name to a company without using up a single IP address.
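
Here is a sketch of how a server might use the Host header to tell apart several sites that share one IP address. The SITES table and its directory names are hypothetical, and the parser ignores details such as port numbers in the header.

```python
def host_of(raw_request):
    """Return the Host header value of a raw HTTP request, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    for line in head.split("\r\n")[1:]:       # skip the request line
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":    # header names are case-insensitive
            return value.strip().lower()
    return None

# Hypothetical sites sharing a single IP address.
SITES = {"mycompany.com": "/srv/mycompany", "example.org": "/srv/example"}

def document_root(raw_request):
    """Pick the directory to serve from, based only on the Host header."""
    return SITES.get(host_of(raw_request))

request = b"GET /index.html HTTP/1.1\r\nHost: mycompany.com\r\n\r\n"
print(document_root(request))
```

An HTTP/1.0 request without a Host header gives the server nothing to choose by, which is why each site once needed an address of its own.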

HTTP/1.1 became an IETF Draft Standard in June 1999 as RFC 2616. More information about HTTP/1.1 and other Protocol Activity at W3C is available online.

Cascading Style Sheets

Another way to speed up downloads is to download less information. Many Web pages make inefficient use of resources by using images to lay out a page or to represent text. Cascading Style Sheets (CSS) can prevent all that. To understand CSS, it's best to separate the SS part from the C part.

Style sheets allow HTML page designers to separate a document's structure from its presentation. Presentation issues include how text or images should be laid out on the screen, how the same pages should be printed, and even how they may be rendered by devices other than desktop graphical browsers (e.g., speech synthesizers, which benefit users who are blind, have low vision, or work in an eyes-busy environment).

While CSS gives page designers and readers much greater control of page presentation, it has the added value of speeding up page downloads. First of all, modularity in style sheets means that the same style sheet may apply to many documents, thus reducing the need to send redundant presentation information over the network. Second, CSS can eliminate small images used to represent symbols (such as bullets, arrows, spacers, etc.). With CSS, images that represent text can be replaced by real text and CSS colors and fonts. Third, CSS gives designers the ability to control the position of page elements. This is meant to eliminate the practice of using costly, invisible images for layout purposes. Images may now do what they were meant to do -- be seen and not waited for.

These are just a few of the reasons that, in addition to facilitating document design, CSS actually improves overall Web performance. Although style sheets involve a slight layout overhead, the overall savings in TCP packets make them indispensable to smart page design.

More information about CSS, XSL, and other Style Activity at W3C is available online.

Portable Network Graphics

W3C has also addressed the size and scalability of images on the Web. Digital photos and video will always take up lots of space, but the transmission of smaller images may be improved by using Portable Network Graphics (PNG). Like GIF, PNG is an image formatting language, but it has several advantages over GIF. First, PNG images render more quickly on the screen and produce higher quality images. In some instances, PNG images are also smaller than GIF images. Finally, the PNG format is not subject to the same patent restrictions as GIF, a key issue for the W3C, whose mission includes making Web technology available to the widest possible audience.

More information about PNG, Scalable Vector Graphics (SVG), WebCGM Profile, and other Graphics Activity at W3C is available online.

Putting it all together

Advances in HTTP, CSS, and PNG (and others, including HTML compression at the link layer and range requests) have been shown to alleviate some of the traditional Web bottlenecks. To sum up:

Resolving the URI. Thanks to persistent connections, HTTP/1.1 requires fewer name resolutions for the same number of HTTP requests. Also, HTTP/1.1 can correctly cache and reuse URIs that have already been resolved.
Connecting to the Web Server. HTTP/1.1 reduces the number of slow TCP open requests. Improved caching mechanisms in HTTP/1.1 significantly reduce the number of packets necessary to verify whether a cached resource on a proxy server must be refreshed.
Requesting the Web Resource. Pipelining with buffering makes more efficient use of TCP packets. Pipelining also leads to faster validation of cached information and fewer TCP packets sent. By using CSS, authors can eliminate unnecessary images and reuse style sheets with several documents. PNG means better, often smaller, images.

HTTP/1.1, CSS, PNG, and other work going on in W3C Activities demonstrate how W3C is developing common protocols that promote the Web's evolution and ensure its interoperability.

About the World Wide Web Consortium (W3C)

W3C is an international industry consortium jointly run by the MIT Laboratory for Computer Science (MIT LCS) in the USA, the National Institute for Research in Computer Science and Control (INRIA) in France and Keio University in Japan. Services provided by the Consortium include: a repository of information about the World Wide Web for developers and users, reference code implementations to embody and promote standards, and various prototype and sample applications to demonstrate use of new technology.