WWW Workload Characterization Work at IBM Research

Erich M. Nahum

IBM T.J. Watson Research Center

Yorktown Heights, NY 10598

The phenomenal growth of the World-Wide Web, in both the volume of information on it and the number of users desiring access to it, is dramatically increasing the performance requirements for large-scale information servers. WWW server performance is a central issue in providing ubiquitous, reliable, and efficient information access.

Understanding the WWW traffic workload is crucial to evaluating server performance accurately. For example, different performance optimizations have different benefits depending on the size of the file requested. Accurately capturing workload characteristics, such as the distribution of file sizes, is necessary to quantify the aggregate benefit of a particular server optimization.
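As a rough illustration of why the file-size distribution matters, the sketch below weights a per-size benefit by the share of requests in each size bucket. The bucket boundaries, request fractions, and benefit figures are hypothetical values chosen for illustration; they are not measurements from any server or log discussed here.

```python
import math


def aggregate_benefit(size_fractions, benefit_by_bucket):
    """Return the request-weighted average benefit across file-size buckets.

    size_fractions: share of requests falling in each bucket (must sum to 1).
    benefit_by_bucket: fractional reduction in per-request service time
    that the optimization yields for files in that bucket.
    """
    total = sum(size_fractions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError("request fractions must sum to 1")
    return sum(fraction * benefit_by_bucket[bucket]
               for bucket, fraction in size_fractions.items())


# Hypothetical workload: fraction of requests per file-size bucket.
size_fractions = {"<1KB": 0.35, "1-10KB": 0.50, "10-100KB": 0.13, ">100KB": 0.02}

# Hypothetical optimization that helps small files most and large files not at all.
benefit_by_bucket = {"<1KB": 0.40, "1-10KB": 0.20, "10-100KB": 0.05, ">100KB": 0.00}

print(f"aggregate benefit: {aggregate_benefit(size_fractions, benefit_by_bucket):.3f}")
```

With a different size distribution, the same optimization would yield a different aggregate figure, which is why the distribution has to be captured accurately.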

My research studies server performance on AIX, IBM's BSD-derived UNIX. Servers I have used include Apache, Zeus, and Flash, a research server developed at Rice University. Workload generators I have worked with include WebStone [9], SpecWeb96 [8], and SURGE [4].

Two issues I am currently examining are:

1) 304 (HTTP_NOT_MODIFIED) responses: According to logs taken from IBM's corporate WWW server (www.ibm.com), almost 30 percent of client requests are GET If-Modified-Since queries for documents whose copies cached at the client are still valid. These queries therefore return 304 responses, which transfer no document data. No current benchmark that I am aware of captures this behavior, so I am adapting SURGE to generate the appropriate ratio of GET If-Modified-Since queries.

2) WAN Characteristics: Thus far, all WWW server performance evaluation has been done on high-speed LANs, which have very different performance characteristics from WANs. Many WWW servers are used for wide-area information dissemination on the global Internet, which has widely varying bandwidths, round-trip times, and packet loss characteristics. I am currently using packet traces from the 1996 Olympics to parameterize a software layer in BSD that statistically re-creates the WAN environment in a LAN setting.
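For the first issue, a request generator needs to mix conditional and unconditional GETs in a fixed ratio. The following is a minimal sketch of that idea, not the SURGE modification itself: the function names, the host `www.example.com`, the paths, the cached timestamp, and the rounding of "almost 30 percent" to 0.30 are all assumptions made for illustration.

```python
import random
from email.utils import formatdate

IMS_FRACTION = 0.30  # assumption: "almost 30 percent", rounded for illustration


def build_request(path, host, cached_mtime=None):
    """Build a GET request; add If-Modified-Since when a cached copy exists."""
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}"]
    if cached_mtime is not None:
        # HTTP dates are expressed in GMT.
        lines.append("If-Modified-Since: " + formatdate(cached_mtime, usegmt=True))
    return "\r\n".join(lines) + "\r\n\r\n"


def generate_requests(paths, count, host="www.example.com",
                      ims_fraction=IMS_FRACTION, cached_mtime=900000000, seed=0):
    """Return `count` requests, roughly `ims_fraction` of them conditional.

    cached_mtime is a hypothetical timestamp (seconds since the epoch) that
    stands in for the time at which the client cached its copy.
    """
    rng = random.Random(seed)
    requests = []
    for _ in range(count):
        path = rng.choice(paths)
        if rng.random() < ims_fraction:
            requests.append(build_request(path, host, cached_mtime))
        else:
            requests.append(build_request(path, host))
    return requests


requests = generate_requests(["/index.html", "/logo.gif"], count=10000)
conditional = sum("If-Modified-Since:" in r for r in requests)
print(f"conditional share: {conditional / len(requests):.3f}")
```

A server whose copy of the file has not changed since the cached timestamp would answer each conditional request with a 304 and no body, so the share of conditional requests directly controls how much of the load transfers no document data.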
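For the second issue, the general idea of statistically re-creating WAN behavior can be sketched at user level: each packet is either dropped with some probability or delayed by a value drawn from an empirical sample. This is an illustrative model only, not the BSD software layer described above, and the class name, delay samples, and loss rate are hypothetical rather than values taken from the Olympics traces.

```python
import random


class WanModel:
    """Draw a per-packet fate (drop or delay) from simple empirical parameters."""

    def __init__(self, delay_samples_ms, loss_rate, seed=0):
        if not delay_samples_ms:
            raise ValueError("need at least one delay sample")
        if not 0.0 <= loss_rate <= 1.0:
            raise ValueError("loss_rate must be between 0 and 1")
        self.delay_samples_ms = list(delay_samples_ms)
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)

    def next_packet(self):
        """Return None if the packet is dropped, else its delay in milliseconds."""
        if self.rng.random() < self.loss_rate:
            return None
        # Sampling uniformly from observed delays reproduces their empirical
        # distribution without assuming any particular parametric form.
        return self.rng.choice(self.delay_samples_ms)


# Hypothetical parameters; a real study would derive them from packet traces.
model = WanModel(delay_samples_ms=[12.0, 45.0, 80.0, 250.0], loss_rate=0.05)
fates = [model.next_packet() for _ in range(10000)]
dropped = sum(fate is None for fate in fates)
print(f"dropped {dropped} of {len(fates)} packets")
```

This model treats packets as independent. Real WAN loss and delay are often bursty and correlated, so a trace-driven layer would likely need to capture more than per-packet marginals.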

I look forward to discussing these and other issues with fellow participants at the W3C Workshop on Web Characterization.

---

[1] Martin F. Arlitt and Carey L. Williamson. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking, 5(5):631-646, Oct 1997.

[2] Gaurav Banga and Peter Druschel. Measuring the capacity of a web server. Proceedings of the USENIX Symposium on Internet Technologies and Systems (USITS), Monterey, CA, Dec 1997.

[3] Gaurav Banga and Jeffrey C. Mogul. Scalable kernel performance for Internet servers under realistic loads. USENIX Annual Technical Conference, New Orleans, LA, June 1998.

[4] Paul Barford and Mark Crovella. Generating representative web workloads for network and server performance evaluation. Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, Madison, WI, June 1998.

[5] Mark Crovella and Azer Bestavros. Self-similarity in world wide web traffic: Evidence and possible causes. Proceedings of the ACM Sigmetrics Conference on Measurement and Modeling of Computer Systems, Philadelphia, PA, May 1996.

[6] David Mosberger and Tai Jin. httperf: a tool for measuring web server performance. Proceedings of the 1998 Workshop on Internet Server Performance (WISP), Madison, WI, June 1998.

[7] Erich Nahum, Tsipora Barzilai, and Dilip Kandlur. Performance Issues in WWW Servers. Submitted for publication, Oct. 1998.

[8] The Standard Performance Evaluation Corporation. SpecWeb96. http://www.spec.org/osg/web96.

[9] Gene Trent and Mark Sake. WebStone: The first generation in HTTP server benchmarking. http://www.sgi.com/Products/WebFORCE/WebStone.