Motivation
Virtually all Web performance evaluation work has focused on server logs,
proxy logs, or packet traces based on HTTP 1.0 traffic.
HTTP 1.1 introduces several new features
that may substantially change the characteristics of Web traffic in the coming
years. However, there is very little end-to-end HTTP 1.1 traffic in the Internet
today. This has led to a dependence on HTTP 1.0 logs and synthetic load
generators to postulate improvements to HTTP 1.1, and to evaluate new proxy
and server policies. We believe that Web performance studies should use more
realistic logs that take into account changes to the HTTP protocol. In
particular, we suggest techniques for converting an HTTP 1.0 log into a
semi-synthetic HTTP 1.1 log, based on information extracted from packet-level
traces and our knowledge of the HTTP 1.1 protocol. As part of this study,
we plan to collect detailed packet-level server traces at AT&T's Easy
World Wide Web (EW3) platform [2], the Web-hosting part of AT&T WorldNet.
The changes in the HTTP protocol address a number of key areas, including
caching, hierarchical proxies, persistent TCP connections, and virtual hosts.
We focus on specific new features that are likely to alter the workload
characteristics (as summarized in Table 1).
Using Packet Traces to Model HTTP 1.1
Research on Internet workload characterization has typically focused on creating
generative models based on packet traces of various applications
[5,6]. These
models range from capturing basic traffic properties, like interarrival and
duration distributions, to representing application-level characteristics.
Synthetic workload generators based on these models can drive a wide range
of simulation experiments, allowing researchers to perform accurate experiments
without incurring the extensive overhead of packet trace collection. A synthetic
modeling approach has also been applied to develop workload generators for
Web
traffic [7,8].
Although these synthetic models of HTTP 1.0 traffic are clearly valuable,
it may be difficult to project how these synthetic workloads would change
under the new features in HTTP 1.1.
In contrast to Internet packet traces, many sites do maintain Web proxy or
server logs. Having a way to convert these HTTP 1.0 logs to representative
HTTP 1.1 logs would allow these sites to evaluate the potential impact of
various changes to the protocol. These semi-synthetic HTTP 1.1 traces could
also be converted into synthetic workload models that capture the characteristics
of HTTP 1.1. The process of converting HTTP 1.0 logs to representative HTTP
1.1 logs requires insight into the components of delay in responding to user
requests, as well as other information that is not typically available in
logs. A packet trace, collected at the Web proxy or server site, can provide
important information that is not available in server logs.
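As a rough illustration, the sketch below shows the kind of per-request record one might reconstruct from a packet trace and join against an ordinary server log entry; the field names and structure are our own assumptions, not an existing log format.

from dataclasses import dataclass

@dataclass
class TraceRecord:
    """Per-request information reconstructable from a packet trace.
    Field names are illustrative only."""
    client_id: str            # anonymized client address
    url: str
    syn_time: float           # client SYN seen at the server
    request_time: float       # first byte of the HTTP request
    first_data_time: float    # first byte of the HTTP response
    last_data_time: float     # last response byte acked by the client
    bytes_acked: int          # response bytes known to have reached the client
    response_length: int      # full response length, from the server log
    client_rst: bool          # did the client abort the transfer with a RST?
    reused_connection: bool   # did the request arrive on an already-open connection?

def connection_setup_delay(rec: TraceRecord) -> float:
    """TCP setup delay paid by this request; zero if the connection was reused."""
    return 0.0 if rec.reused_connection else rec.request_time - rec.syn_time

Fields such as client_rst and reused_connection are precisely the pieces of information that a conventional server log entry cannot supply.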
The value of packet traces has been demonstrated in recent studies on the
impact of TCP dynamics on the performance of Web proxies and
servers [9,10].
Similarly, a complete collection of packet traces of both request and response
traffic at a Web server would provide a unique opportunity to gauge how a
change to HTTP 1.1 would affect the workload.
For example, the packet trace could be used to estimate the latency reductions
under persistent connections by measuring the delay involved in closing and
reopening a TCP connection between a client and the server for consecutive
transfers.
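A minimal sketch of such an estimate, reusing the hypothetical TraceRecord fields and connection_setup_delay helper from the earlier sketch (the 15-second idle timeout and the per-client grouping are simplifying assumptions of ours):

from collections import defaultdict

def estimate_persistent_connection_savings(records, idle_timeout=15.0):
    """Rough upper bound on the setup delay that HTTP 1.1 persistent
    connections could remove: for each client, a new connection opened within
    idle_timeout seconds of that client's previous response is assumed to
    have been avoidable under HTTP 1.1."""
    by_client = defaultdict(list)
    for rec in records:
        by_client[rec.client_id].append(rec)

    savings = 0.0
    for recs in by_client.values():
        recs.sort(key=lambda r: r.syn_time)
        for prev, curr in zip(recs, recs[1:]):
            gap = curr.syn_time - prev.last_data_time
            if not curr.reused_connection and 0 <= gap <= idle_timeout:
                savings += connection_setup_delay(curr)
    return savings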
As a more complicated example, consider the potential use of range requests in HTTP 1.1 to fetch partial contents of an aborted response message.
If a client aborts a request during the transmission of the response, the
client (or proxy, if one exists) may receive only a subset of the response.
Abort operations can be detected in a packet trace by noting the client RST
packet, whereas the server log may or may not contain an entry for the aborted
request/response. The packet trace would also indicate how much of
the transfer completed before the abort reached the server. If the client
initiates a second request for the resource, the HTTP 1.0 server would transfer
the entire contents again. However, an HTTP 1.1 client (or proxy) could initiate
a range request to transfer only the missing portion of the resource. The
HTTP 1.0 packet traces would enable us to recognize the client's second request,
and model the corresponding range request in HTTP 1.1, assuming the
partially-downloaded contents are still in the cache.
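Continuing with the same hypothetical record format, one way to sketch this rewriting step is shown below; the cache-retention assumption and the synthetic log tuple (time, URL, method, Range header or None, bytes transferred) are ours, not part of any existing tool.

def rewrite_aborted_retransfers(records):
    """Scan one client's requests in time order; when an aborted transfer of a
    resource is followed by a second request for the same URL, emit a synthetic
    HTTP 1.1 entry that fetches only the missing suffix via a Range request.
    Assumes the partially downloaded prefix is still cached at the client."""
    synthetic = []
    last_partial = {}   # url -> bytes already received before the abort
    for rec in sorted(records, key=lambda r: r.request_time):
        if rec.client_rst and rec.bytes_acked < rec.response_length:
            last_partial[rec.url] = rec.bytes_acked
            synthetic.append((rec.request_time, rec.url, "GET", None, rec.bytes_acked))
        elif rec.url in last_partial:
            start = last_partial.pop(rec.url)
            range_header = "bytes=%d-" % start
            transferred = rec.response_length - start
            synthetic.append((rec.request_time, rec.url, "GET", range_header, transferred))
        else:
            synthetic.append((rec.request_time, rec.url, "GET", None, rec.response_length))
    return synthetic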
Implementation
Server Packet Traces
During the past year and a half, AT&T Labs has built and deployed two
high-performance packet monitors at strategic locations inside AT&T WorldNet.
Traces from these PacketScopes have been used for a number of research
studies
[10,11].
For the purposes of this study, we are constructing a third PacketScope to
be installed at AT&T's EW3 Web-hosting complex.
This third packet monitor consists of a dedicated 500-MHz Alpha workstation
attached to two FDDI rings that together carry all traffic to and from the
EW3 server farm. The monitor runs the tcpdump
utility [12], which has been
extended to process HTTP packet headers and keep only the information relevant
to our study [13]. The monitor
stores the resulting data first to a 10-gigabyte array of striped magnetic
disks, then to a 140-gigabyte magnetic tape robot. We ensure that the monitor
is passive by running a modified FDDI driver that can receive but not send
packets, and by not assigning an IP address to the FDDI interface. We control
the monitor by connecting to it over an AT&T-internal network that does
not carry customer traffic. We make our traces anonymous by encrypting IP
addresses as soon as packets come off the FDDI link, before writing any packet
data to stable storage. Our experience with an identical monitor elsewhere
in WorldNet indicates that these instruments can capture more than 150 million
packets per day with less than 0.3% packet loss.
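The details of the encryption are not central here; purely as an illustration of the idea, a keyed one-way mapping along the following lines yields stable pseudonyms without ever writing real addresses to stable storage (the scheme actually used on the PacketScope may differ):

import hashlib
import hmac

def anonymize_ip(addr: str, key: bytes) -> str:
    """Map an IP address to a stable pseudonym using a keyed hash, so the same
    client keeps the same identifier across the trace while the real address
    never reaches disk or tape. Illustrative only."""
    digest = hmac.new(key, addr.encode("ascii"), hashlib.sha256).hexdigest()
    return "ip-" + digest[:16]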
Augmented Server Logs
In addition to collecting packet traces, we plan to extend the server logging
procedures in EW3 to record additional timing information. A server could
log the time it (i) starts processing the client request; (ii) starts writing
data into the TCP send socket; and (iii) finishes writing data into the TCP
send socket. Typically, servers log just one of the three (often (ii)). But
logging all three would allow us to isolate the components of delay at the
server. For example, the first two timestamps would allow us to determine
the latency in processing client requests (e.g., due to disk I/O, or the
generation of dynamic content). The packet traces, coupled with the extended
server logs, provide a detailed timeline of the steps involved in satisfying
a client request, with limited interference at the server (to log the additional
time fields).
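For instance, with the three proposed timestamps represented as hypothetical fields t_request_start, t_first_write, and t_last_write, the server-side delay separates as follows:

def server_delay_components(t_request_start, t_first_write, t_last_write):
    """Split server-side delay using the three proposed timestamps:
    (i) request processing begins, (ii) first write to the TCP send socket,
    (iii) last write to the TCP send socket."""
    processing = t_first_write - t_request_start   # e.g., disk I/O or dynamic content generation
    transmission = t_last_write - t_first_write    # time spent feeding the response into TCP
    return processing, transmission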
Client Packet Traces
Although our initial study will focus on the server packet traces and the
augmented server logs, future work could consider additional measurements
at (a limited subset of) the client sites. For example, a packet monitor
is already installed at one of the main access points for WorldNet modem
customers; this data was used in a recent study of Web proxy
caching [10]. This data set would
provide a detailed view of the Web traffic for the (admittedly small) subset
of EW3 requests that stems from these WorldNet modem customers. By measuring
Web transfers at multiple locations, and through multiple measurement techniques,
we hope to create a clearer picture of how both the network and the server
affect Web performance.
Acknowledgments: We thank Dave Kristol for his clarifications on several
aspects of HTTP 1.1.
Table 1: Differences Between HTTP 1.0 and HTTP 1.1

  HTTP 1.1 feature            Expected effect on the workload
  Pipelining                  Shortens interarrival of requests
  Expires                     Lowers number of validations
  Entity tags                 Lowers frequency of validations
  Max-age, max-stale, etc.    Changes frequency of validations
  Range request               Lowers bytes transferred
  Chunked encoding            Lowers user-perceived latency
  Expect/Continue             Lowers error responses/bandwidth
  Host header                 Reduces proliferation of IP addresses

References
[1] T. Berners-Lee, "Hypertext Transfer Protocol - HTTP/1.1," September 11, 1998. ftp://ftp.ietf.org/internet-drafts/draft-ietf-http-v11-spec-rev-05.txt
[2] AT&T Easy World Wide Web (EW3) hosting service. http://www.ipservices.att.com/wss/hosting
[3] J. C. Mogul, "The case for persistent-connection HTTP," in Proc. ACM SIGCOMM, pp. 299-313, August/September 1995. http://www.acm.org/sigcomm/sigcomm95/papers/mogul.html
[4] H. F. Nielsen, J. Gettys, A. Baird-Smith, E. Prud'hommeaux, H. W. Lie, and C. Lilley, "Network performance effects of HTTP/1.1, CSS1, and PNG," in Proc. ACM SIGCOMM, pp. 155-166, August 1997. http://www.inria.fr/rodeo/sigcomm97/program.html
[5] R. Caceres, P. B. Danzig, S. Jamin, and D. J. Mitzel, "Characteristics of wide-area TCP/IP conversations," in Proc. ACM SIGCOMM, pp. 101-112, September 1991. http://www.research.att.com/~ramon/papers/sigcomm91.ps.gz
[6] K. C. Claffy, H.-W. Braun, and G. C. Polyzos, "A parameterizable methodology for Internet traffic flow profiling," IEEE Journal on Selected Areas in Communications, vol. 13, pp. 1481-1494, October 1995. http://www.nlanr.net/Flowsresearch/Flowspaper/flows.html
[7] P. Barford and M. Crovella, "Generating representative Web workloads for network and server performance evaluation," in Proc. ACM SIGMETRICS, June 1998. http://cs-www.bu.edu/faculty/crovella/paper-archive/sigm98-surge.ps
[8] B. A. Mah, "An empirical model of HTTP network traffic," in Proc. IEEE INFOCOM, April 1997. http://www.ca.sandia.gov/~bmah/Papers/Http-Infocom.ps
[9] H. Balakrishnan, V. N. Padmanabhan, S. Seshan, M. Stemm, and R. H. Katz, "TCP behavior of a busy Internet server: Analysis and improvements," in Proc. IEEE INFOCOM, April 1998. http://http.cs.berkeley.edu/~padmanab/index.html
[10] R. Caceres, F. Douglis, A. Feldmann, G. Glass, and M. Rabinovich, "Web proxy caching: The devil is in the details," in Proc. ACM SIGMETRICS Workshop on Internet Server Performance, June 1998. http://www.cs.wisc.edu/~cao/WISP98.html
[11] A. Feldmann, A. C. Gilbert, and W. Willinger, "Explaining the multifractal nature of Internet WAN traffic," in Proc. ACM SIGCOMM, pp. 42-55, September 1998. http://www.acm.org/sigcomm/sigcomm98/tp/abs_04.html
[12] V. Jacobson, C. Leres, and S. McCanne, the tcpdump packet capture utility.
[13] "...traces," October 1998. In submission to the W3C Workload Characterization Workshop.