Ramón Cáceres, Balachander Krishnamurthy, and Jennifer Rexford, AT&T Labs-Research, 180 Park Avenue, Florham Park, NJ 07932 USA {ramon, bala, jrex}@research.att.com
Virtually all Web performance evaluation work has focused on server logs, proxy logs, or packet traces based on HTTP 1.0 traffic. HTTP 1.1 [1] introduces several new features that may substantially change the characteristics of Web traffic in the coming years. However, there is very little end-to-end HTTP 1.1 traffic in the Internet today. This has led to a dependence on HTTP 1.0 logs and synthetic load generators to postulate improvements to HTTP 1.1, and to evaluate new proxy and server policies. We believe that Web performance studies should use more realistic logs that take into account changes to the HTTP protocol. In particular, we suggest techniques for converting an HTTP 1.0 log into a semi-synthetic HTTP 1.1 log, based on information extracted from packet-level traces and our knowledge of the HTTP 1.1 protocol. As part of this study, we plan to collect detailed packet-level server traces at AT&T's Easy World Wide Web (EW3) platform [2], the Web-hosting part of AT&T WorldNet.
The changes in the HTTP protocol address a number of key areas, including caching, hierarchical proxies, persistent TCP connections, and virtual hosts. We focus on specific new features that are likely to alter the workload characteristics (as summarized in Table 1):
HTTP 1.1 Feature | Implication |
---|---|
Persistent connections | Lowers number of connection set-ups |
Pipelining | Shortens request interarrival times |
Expires | Lowers number of validations |
Entity tags | Lowers frequency of validations |
Max-age, max-stale, etc. | Changes frequency of validations |
Range request | Lowers bytes transferred |
Chunked encoding | Lowers user perceived latency |
Expect/Continue | Lowers error responses and wasted bandwidth |
Host header | Reduces proliferation of IP addresses |
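As a rough illustration of the first row of Table 1, the sketch below estimates how many TCP connection set-ups a log would incur with and without persistent connections. The record layout (a client identifier and a timestamp in seconds), the function name, and the 15-second idle timeout are assumptions made for this example, not values taken from any measurement. It also ignores clients that open several connections in parallel.

```python
from collections import defaultdict

def count_connection_setups(requests, idle_timeout=15.0):
    """Return (http10_setups, http11_setups) for a list of (client, timestamp).

    HTTP 1.0 is modeled as one connection per request. HTTP 1.1 is modeled as
    one persistent connection per client, reused until it has been idle for
    more than idle_timeout seconds (a hypothetical server setting).
    """
    by_client = defaultdict(list)
    for client, ts in requests:
        by_client[client].append(ts)

    http10_setups = len(requests)
    http11_setups = 0
    for times in by_client.values():
        times.sort()
        last = None
        for ts in times:
            # Open a new connection on the first request, or after a long idle gap.
            if last is None or ts - last > idle_timeout:
                http11_setups += 1
            last = ts
    return http10_setups, http11_setups

# Hypothetical log: client "a" issues a burst, goes idle, then returns.
sample = [("a", 0.0), ("a", 1.0), ("a", 2.5), ("b", 3.0), ("a", 40.0)]
print(count_connection_setups(sample))  # (5, 3)
```

The gap between the two counts is the number of set-ups that a semi-synthetic HTTP 1.1 log would omit under these assumptions.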
Research on Internet workload characterization has typically focused on creating generative models based on packet traces of various applications [5,6]. These models range from capturing basic traffic properties, like interarrival and duration distributions, to representing application-level characteristics. Synthetic workload generators based on these models can drive a wide range of simulation experiments, allowing researchers to perform accurate experiments without incurring the extensive overhead of packet trace collection. A synthetic modeling approach has also been applied to develop workload generators for Web traffic [7,8]. Although these synthetic models of HTTP 1.0 traffic are clearly valuable, it may be difficult to project how these synthetic workloads would change under the new features in HTTP 1.1.
In contrast to Internet packet traces, many sites do maintain Web proxy or server logs. Having a way to convert these HTTP 1.0 logs to representative HTTP 1.1 logs would allow these sites to evaluate the potential impact of various changes to the protocol. These semi-synthetic HTTP 1.1 traces could also be converted into synthetic workload models that capture the characteristics of HTTP 1.1. The process of converting HTTP 1.0 logs to representative HTTP 1.1 logs requires insight into the components of delay in responding to user requests, as well as other information that is not typically available in logs. A packet trace, collected at the Web proxy or server site, can provide important information not available in server logs.
The value of packet traces has been demonstrated in recent studies on the impact of TCP dynamics on the performance of Web proxies and servers [9,10]. Similarly, a complete collection of packet traces of both request and response traffic at a Web server would provide a unique opportunity to gauge how a change to HTTP 1.1 would affect the workload.
For example, the packet trace could be used to estimate the latency reductions under persistent connections by measuring the delay involved in closing and reopening a TCP connection between a client and the server for consecutive transfers. As a more complicated example, consider the potential use of range requests in HTTP 1.1 to fetch the missing portion of an aborted response message. If a client aborts a request during the transmission of the response, the client (or proxy, if one exists) may receive only a subset of the response. Abort operations can be detected in a packet trace by noting the client RST packet, whereas the server log may or may not include an entry for the request/response. The packet trace would also indicate how much of the transfer completed before the abort reached the server. If the client initiates a second request for the resource, the HTTP 1.0 server would transfer the entire contents again. However, an HTTP 1.1 client (or proxy) could initiate a range request to transfer only the missing portion of the resource. The HTTP 1.0 packet traces would enable us to recognize the client's second request, and model the corresponding range request in HTTP 1.1, assuming the partially downloaded contents are still in the cache.
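The range-request case can be sketched as follows. This is a minimal illustration, assuming the trace yields the resource length and the number of body bytes delivered before the abort; the function name and its return layout are hypothetical, and header bytes are ignored.

```python
def model_range_request(resource_length, bytes_received, still_cached=True):
    """Model the retry of an aborted transfer under HTTP 1.0 and HTTP 1.1.

    Returns (range_value, http10_bytes, http11_bytes). range_value is the
    value of the Range header an HTTP 1.1 client could send, or None when
    the whole resource must be fetched again.
    """
    if not 0 <= bytes_received < resource_length:
        raise ValueError("an aborted transfer must be shorter than the resource")

    # HTTP 1.0 always retransmits the full body on the second request.
    http10_bytes = resource_length

    if not still_cached or bytes_received == 0:
        # Nothing usable in the cache: HTTP 1.1 falls back to a full transfer.
        return None, http10_bytes, resource_length

    # Byte offsets start at zero, so the first missing byte is bytes_received.
    range_value = f"bytes={bytes_received}-"
    return range_value, http10_bytes, resource_length - bytes_received

# Hypothetical abort after 6000 of 10000 bytes.
print(model_range_request(10000, 6000))  # ('bytes=6000-', 10000, 4000)
```

Applying such a rule to every abort-then-retry pair found in the trace gives one estimate of the bytes a range request would save, under the stated caching assumption.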
During the past year and a half, AT&T Labs has built and deployed two high-performance packet monitors at strategic locations inside AT&T WorldNet. Traces from these PacketScopes have been used for a number of research studies [10,11]. For the purposes of this study, we are constructing a third PacketScope to be installed at AT&T's EW3 Web-hosting complex.
This third packet monitor consists of a dedicated 500-MHz Alpha workstation attached to two FDDI rings that together carry all traffic to and from the EW3 server farm. The monitor runs the tcpdump utility [12], which has been extended to process HTTP packet headers and keep only the information relevant to our study [13]. The monitor stores the resulting data first to a 10-gigabyte array of striped magnetic disks, then to a 140-gigabyte magnetic tape robot. We ensure that the monitor is passive by running a modified FDDI driver that can receive but not send packets, and by not assigning an IP address to the FDDI interface. We control the monitor by connecting to it over an AT&T-internal network that does not carry customer traffic. We make our traces anonymous by encrypting IP addresses as soon as packets come off the FDDI link, before writing any packet data to stable storage. Our experience with an identical monitor elsewhere in WorldNet indicates that these instruments can capture more than 150 million packets per day with less than 0.3% packet loss.
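To illustrate the idea of making addresses anonymous before anything reaches stable storage, the sketch below maps each IP address to a fixed pseudonym. It uses a keyed hash (HMAC-SHA256) as a stand-in, which is an assumption of this example and not a description of the scheme used in the monitor. Unlike encryption, a keyed hash cannot be reversed even by the key holder. The function name, key, and 16-character truncation are likewise hypothetical.

```python
import hashlib
import hmac
import ipaddress

def anonymize_ip(address, key):
    """Map an IP address string to a stable pseudonym under a secret key.

    The same address always yields the same pseudonym, so per-client
    analysis remains possible without storing the real address.
    """
    packed = ipaddress.ip_address(address).packed  # validates and normalizes
    digest = hmac.new(key, packed, hashlib.sha256).hexdigest()
    return digest[:16]  # truncated for compact trace records (assumption)

key = b"example-secret-key"  # hypothetical; a real key would be kept private
first = anonymize_ip("192.0.2.1", key)
second = anonymize_ip("192.0.2.1", key)
other = anonymize_ip("192.0.2.2", key)
print(first == second, first != other)  # True True
```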
In addition to collecting packet traces, we plan to extend the server logging procedures in EW3 to record additional timing information. A server could log the time it (i) starts processing the client request; (ii) starts writing data into the TCP send socket; and (iii) finishes writing data into the TCP send socket. Typically, servers log just one of the three (often (ii)). But logging all three would allow us to isolate the components of delay at the server. For example, the first two timestamps would allow us to determine the latency in processing client requests (e.g., due to disk I/O, or the generation of dynamic content). The packet traces, coupled with the extended server logs, provide a detailed timeline of the steps involved in satisfying a client request, with limited interference at the server (to log the additional time fields).
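Given the three timestamps, the server-side delay components follow by simple subtraction. The sketch below assumes a hypothetical log record holding the three times in seconds; the field and function names are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class ServerLogEntry:
    start_processing: float  # (i) server starts processing the request
    start_write: float       # (ii) server starts writing to the TCP send socket
    end_write: float         # (iii) server finishes writing to the socket

def delay_components(entry):
    """Return (processing_delay, write_duration) in seconds for one request."""
    if not entry.start_processing <= entry.start_write <= entry.end_write:
        raise ValueError("timestamps are out of order")
    # (ii) - (i): request processing, e.g. disk I/O or dynamic content generation.
    processing_delay = entry.start_write - entry.start_processing
    # (iii) - (ii): time spent handing the response to the TCP send socket.
    write_duration = entry.end_write - entry.start_write
    return processing_delay, write_duration

entry = ServerLogEntry(start_processing=10.0, start_write=10.25, end_write=11.0)
print(delay_components(entry))  # (0.25, 0.75)
```

Matching these intervals against packet timestamps for the same request would then separate server-side delay from delay in the network.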
Although our initial study will focus on the server packet traces and the augmented server logs, future work could consider additional measurements at (a limited subset of) the client sites. For example, a packet monitor is already installed at one of the main access points for WorldNet modem customers; this data was used in a recent study of Web proxy caching [10]. This data set would provide a detailed view of the Web traffic for the (admittedly small) subset of EW3 requests that stems from these WorldNet modem customers. By measuring Web transfers at multiple locations, and through multiple measurement techniques, we hope to create a clearer picture of how both the network and the server affect Web performance.
Acknowledgments: We thank Dave Kristol for his clarifications on some of the aspects of HTTP 1.1.
ftp://ftp.ietf.org/internet-drafts/draft-ietf-http-v11-spec-rev-05.txt
http://www.ipservices.att.com/wss/hosting
http://www.acm.org/sigcomm/sigcomm95/papers/mogul.html
http://www.inria.fr/rodeo/sigcomm97/program.html
http://www.research.att.com/~ramon/papers/sigcomm91.ps.gz
http://www.nlanr.net/Flowsresearch/Flowspaper/flows.html
http://cs-www.bu.edu/faculty/crovella/paper-archive/sigm98-surge.ps
http://www.ca.sandia.gov/~bmah/Papers/Http-Infocom.ps
http://http.cs.berkeley.edu/~padmanab/index.html
http://www.cs.wisc.edu/~cao/WISP98.html
http://www.acm.org/sigcomm/sigcomm98/tp/abs_04.html
ftp://ftp.ee.lbl.gov