Pei Cao
Department of Computer Sciences
University of Wisconsin, Madison
1210 West Dayton Street
Madison, WI 53706 USA
cao@cs.wisc.edu
The WisWeb research group at the University of Wisconsin-Madison is focusing on two aspects of Web characterization: studying Web proxy traffic and building a Web proxy benchmark. This paper reports our progress and current plans.
Using six traces from proxies at academic institutions, corporations, and ISPs, we have studied a range of characteristics of the requests seen by the proxies. The traces include a 26-day proxy log from DEC, a 19-day trace from UC Berkeley, a three-month trace from the CS Department of the Università di Pisa, Italy, a 7-day trace from Questnet (which operates parent proxies serving child proxies in Australia), a one-day log from NLANR's proxies, and a 10-day log from FUNET, a regional ISP for academic and research communities in Finland. Our main findings are reported in detail in our paper ``Web Caching and Zipf-like Distributions: Evidence and Implications'', available at http://www.cs.wisc.edu/~cao/papers/zipf-implications.html. Due to space limitations we do not elaborate further here.
We developed a simple proxy benchmark, the Wisconsin Proxy Benchmark (WPB) 1.0, in fall 1997, and used it to compare a variety of commercial and freeware proxy software [1]. The benchmark has also been used by others to measure proxy performance and to project the performance benefits of proxy caching. It emulates server delays and models temporal locality in the request stream. However, use of the benchmark also exposed its weaknesses, including overhead at the client end, failure to model persistent connections and HTTP 1.1, and failure to capture spatial locality and URL path length.
We are in the process of developing Wisconsin Proxy Benchmark (WPB) 2.0. It uses the core engine of httperf [3], a very lightweight Web server benchmarking and measurement tool. The benchmark already supports persistent connections and HTTP 1.1, and supports trace replay with as much accuracy as possible at the user level. We are now adding temporal locality, spatial locality, and a variety of other features described below.
From our experience using WPB 1.0 to compare proxy products, we find that a proxy benchmark should at least reflect the following characteristics of real-life proxy traffic:
Some proxy benchmarks issue requests directly over the Internet to real Web servers; however, doing so makes results dependent on prevailing network conditions and hard to reproduce. Short of that, a proxy benchmark must include a pseudo-server module that delays its responses. For portability, the pseudo-server module is most likely implemented at the user level, which means that it can at best emulate packet arrival delay (by issuing sends with some delay), but cannot emulate features such as delays in connection establishment and network loss.
WPB 1.0 includes a pseudo-server module that delays sending back each reply by a configurable duration. WPB 2.0 will include a more elaborate delay mechanism, which amortizes the delay over the packets in a reply.
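One way to amortize a configured delay over a reply is to split the reply into packet-sized chunks and pause for an equal share of the total delay before each send. The following sketch illustrates the idea only; the function name and the 1460-byte packet size are assumptions, not WPB's actual interface.

```python
def packet_delay_schedule(reply_size, total_delay_ms, packet_size=1460):
    """Split a reply into packet-size chunks and spread the configured
    latency evenly across them (illustrative sketch, not WPB code)."""
    n_packets = max(1, -(-reply_size // packet_size))  # ceiling division
    per_packet_ms = total_delay_ms / n_packets
    return [per_packet_ms] * n_packets

# A 4000-byte reply with 300 ms of total delay becomes three packets,
# each preceded by a 100 ms pause before its send() is issued.
schedule = packet_delay_schedule(4000, 300.0)
```

A real pseudo-server would sleep for each entry in the schedule before writing the corresponding chunk to the socket, so the client observes the delay spread across the body rather than concentrated before the first byte.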
For each persistent connection, the benchmark must generate a proper distribution of the number of requests served. Here again there is no trace data. We hope that information on the distribution of the number of embedded objects in HTML pages can help, since Internet Explorer fetches all embedded objects over one connection.
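Under that assumption, the number of requests on a connection would be one container page plus a draw from the embedded-object count distribution. The sketch below uses a placeholder distribution, since, as noted above, no trace data is available; the counts and weights are invented for illustration only.

```python
import random

def requests_per_connection(rng, embedded_counts, weights):
    """Draw the number of requests for one persistent connection:
    the HTML page itself plus its embedded objects, assuming the
    browser fetches all embedded objects over the same connection.
    The caller supplies the distribution; no measured data is implied."""
    embedded = rng.choices(embedded_counts, weights=weights, k=1)[0]
    return 1 + embedded

rng = random.Random(42)
# Placeholder distribution: pages with 0, 1, 2, 5, or 10 embedded objects.
n = requests_per_connection(rng, [0, 1, 2, 5, 10], [30, 25, 20, 15, 10])
```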
While we have a relatively good understanding of temporal locality and access distributions in proxy requests, we do not understand spatial locality very well. We are working on this problem.
The benchmark should also generate the proper percentage of requests carrying cookies and of responses carrying cookies. Cookies are ubiquitous today, and different proxies process them differently.
Our results on the correlation between response size and object popularity aid the benchmark design here. Since the correlation is very low, one can decouple the code that generates the reference URL from the code that calculates the size of the object. We are also looking into the correlation between response latency and object popularity.
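Because the correlation is so low, the two generators can simply be independent samplers, as in the sketch below. The Zipf exponent and the lognormal size model are illustrative assumptions, not parameters taken from the traces.

```python
import math
import random

def zipf_url_index(rng, n_docs, alpha=0.8):
    """Sample a document index from a Zipf-like popularity distribution
    by inverse-CDF over the weights 1/i^alpha (alpha is an assumption)."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights, start=1):
        acc += w
        if r <= acc:
            return i
    return n_docs

def response_size(rng, median=4000, sigma=1.0):
    """Sample a response size independently of popularity (the lognormal
    shape is an illustrative choice, not a claim about the traces)."""
    return int(median * math.exp(rng.gauss(0.0, sigma)))

rng = random.Random(1)
url_index = zipf_url_index(rng, 1000)   # which document is requested
size = response_size(rng)               # how large its reply is
```

Since neither sampler consults the other's output, the generated stream has zero built-in popularity-size correlation, matching the low correlation observed in the traces.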
Finally, the benchmark should measure not only client latency, outgoing traffic, and errors, but also the fairness of the proxy. We have seen that process-based proxies can introduce significant unfairness in client latency, whereas event-driven proxies such as Squid treat requests much more fairly.
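One candidate metric for summarizing such unfairness is Jain's fairness index over the per-client mean latencies; the source does not specify which metric WPB 2.0 will report, so this is offered only as a possibility.

```python
def jains_fairness(latencies):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Returns 1.0 when all clients see equal latency, and approaches
    1/n when one client absorbs nearly all the delay. Offered as one
    possible fairness metric, not necessarily the one WPB 2.0 uses."""
    n = len(latencies)
    return sum(latencies) ** 2 / (n * sum(x * x for x in latencies))

fair = jains_fairness([100.0, 100.0, 100.0])   # equal service → 1.0
skewed = jains_fairness([100.0, 300.0])        # unequal service → below 1.0
```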
We are now constructing WPB 2.0, which consists of client-side and server-side code. All request generation and distribution fitting is done in the client-side code. In other words, the client code generates a request, sets its URL, and then generates its response status code, type, size, and latency. The server part of the benchmark is a simple pseudo-server that generates the specified number of random bytes with the specified status code, document type, and size, and emulates packet delays based on the specified latency.
Our client and server code is built through modifications of the httperf tool. httperf is extremely lightweight: it uses no threads or processes, but rather an event-driven architecture. It handles various scalability bottlenecks on the client side, including the limited number of ephemeral ports (see [3]). It supports persistent connections and HTTP 1.1 range requests. The original httperf implements only the client part; we have changed httperf extensively to provide a server counterpart.
Our benchmark can replay proxy logs faithfully. The client-side code reads the trace and generates each request carrying a specification of size, latency, etc.; the server-side code then responds accordingly.
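A simple way to carry that specification is to encode it in the request URL itself, so the pseudo-server needs no copy of the trace. The wire format below is a hypothetical illustration; WPB 2.0's actual encoding is not described in this paper.

```python
def encode_request_url(doc_id, size, latency_ms):
    """Client side: embed the desired response specification in the URL
    path (hypothetical wire format, not WPB 2.0's actual encoding)."""
    return "/replay/%d/size=%d/latency=%d" % (doc_id, size, latency_ms)

def decode_request_url(path):
    """Server side: recover (doc_id, size, latency_ms) from the path so
    the pseudo-server can build and delay the matching reply."""
    parts = path.strip("/").split("/")
    doc_id = int(parts[1])
    size = int(parts[2].split("=")[1])
    latency_ms = int(parts[3].split("=")[1])
    return doc_id, size, latency_ms

spec = decode_request_url(encode_request_url(17, 2048, 120))
```

With this round trip in place, the pseudo-server can reproduce each logged response's size and latency without sharing any state with the client beyond the requests themselves.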
The trace replay tool offers a valuable service to any institution that wants to evaluate the benefit of a caching proxy: the institution can replay a portion of its own log and immediately obtain numbers such as user latency reduction and Internet traffic reduction.
We are now working on the modeling part of the client-side code, hoping to incorporate all of the items listed above.
We have described our current results from analyzing six Web proxy traces and our plans for building the next version of the Wisconsin Proxy Benchmark. A few data items needed to build a realistic benchmark are still missing, including the average URL path component length, the average number of requests serviced by a persistent connection, the percentage of persistent connections, etc. New traces that can provide such information would be highly appreciated.
Characterization of Web Proxy Traffic and Wisconsin Proxy Benchmark 2.0
(Position Paper for W3C Web Characterization Workshop)
The translation was initiated by Pei Cao on 10/12/1998