Network-Centric Measurement
of Caching
Position Paper
Solom Heddaya
InfoLibria, Inc.
1998.10.20
Network managers
deploy caching in order to improve their networks, not to achieve hit
ratios, or op/s, or any of the other commonly used caching metrics.
They seek to improve their network's bandwidth capacity, web service
capacity and response time, without adversely affecting network
latency, or network availability and reliability.
Very little current
research work addresses these issues of pressing concern and value to network
managers. Most of the current benchmarking and performance characterization
research focuses on the behavior of the network cache as a server.
While this approach can be useful in optimizing certain aspects of network
caches, it does not accurately reflect the impact of caching on the network.
Typical cache performance metrics, such hit ratio (or rate), mystify network
managers. They would much rather see metrics that quantify the promised
network capacity expansion, response time speedup (to the end-user), and
availability enhancement. The reliability impact of caching ranks high
on their list, too.
The network-centric
point of view impacts workload characterization. For example, request and
response routing information needs to be included in workload characteristics,
in order to address such problems as network cache placement.
In this paper, we
argue that the repertoire of research in the field should widen to include
the point of view of the network, and we provide an initial attempt at
clarifying how this might be done using research from the field of performability.
Network Bottlenecks
Oscillate
As the Internet grows
at a historic pace, doubling in aggregate traffic rate every three to six
months, it suffers from bottlenecks that frustrate its users. These bottlenecks
oscillate between the two major constituents of the Internet: the client/server
complex, and the network itself. Recently the Internet was stressed to
deliver the Starr report [W98, K98b].
On this particular occasion, the bottleneck was the server(s). However,
ordinary traffic conditions give rise to an unacceptably low 40 kilobit/s
average transfer rate per TCP connection through the backbone [K98a].
This latter measurement reflects transfer rates delivered, not over modem
lines, but over a dedicated T1-class last hop.
Network caching applies
server-like functionality to solve the network congestion problem. So,
is a network cache to be judged on how well it functions as a server, or
on the extent to which it improves the network? On the one hand, a network
cache looks like a high performance server, whose performance can be characterized
via the traditional throughput and response time [MA98].
On the other hand, a network cache expands bandwidth and speeds up response
time. These benefits are the true goals of network caching. With only one
exception, measurements of network cache performance continue to focus
on the server aspect of network caching.
Server-Centric Performance
Characterization
Network caches, as implemented
most commonly today, originated from work on high performance web servers.
From the point of view of the network, servers are hosts, while network
caches are more like routers or switches. The dominance of the server point
of view in characterizing cache workload and cache performance can be
seen by noting the following:
-
Cache performance parameters
reflect throughput (commonly in op/s) and response time of the cache itself,
augmented with cache hit ratio. These are the same parameters used to characterize
servers.
-
Cache hit ratio is a
poor indicator of bandwidth savings, for several reasons: First, hit ratio
ignores the network distance saved [HMY97]. A packet
transmitted over 30 hops uses up more network capacity (bandwidth) than
one that traverses only 10 hops. Second, network capacity is often determined
by the bandwidth across (relatively) few links. Packets that consume
otherwise idle bandwidth have zero impact on network capacity. Third, even
if we focus on a single bottleneck link, the hit ratio that matters will
be the hit ratio at peak link utilization, not the average hit ratio.
Fourth, misses are double (or more) counted when they traverse multiple
caches. This understates the aggregate hit ratio of a collective cache.
-
Cache workloads typically
capture only server-centric workload characteristics, such as request arrival
time, requested object name, file size, etc. This is insufficient to predict
a cache's impact on the network. For example, a trace that shows three
requests A1, A2, A3 for the same object can yield a hit ratio of
zero, 1/3, or 2/3, depending on where caches are placed in the network.
If the three requests reach different caches, the hit ratio will be zero.
Lack of route information in traces or models restricts their validity
to situations where a cache is to be deployed at the exact same spot in
the network at which the load was traced or modeled.
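The placement-dependence of the hit ratio can be made concrete with a small
sketch. The code below is purely illustrative (the trace, cache names, and
routing map are hypothetical, not drawn from any real measurement): given a
trace of requests and a mapping from each request to the cache its route
passes through, it computes the aggregate hit ratio, showing that the same
three-request trace yields 0, 1/3, or 2/3 depending on placement.

```python
def aggregate_hit_ratio(trace, placement):
    """trace: list of (request_id, object_name) pairs in arrival order.
    placement: maps request_id -> cache_id, i.e., which cache the
    request's route happens to pass through (hypothetical routing)."""
    contents = {}  # cache_id -> set of objects that cache has seen
    hits = 0
    for req, obj in trace:
        cache = contents.setdefault(placement[req], set())
        if obj in cache:
            hits += 1       # object already cached along this route
        else:
            cache.add(obj)  # miss: the cache stores the object
    return hits / len(trace)

# Three requests, A1..A3, all for the same object (as in the text).
trace = [("A1", "obj"), ("A2", "obj"), ("A3", "obj")]

# All three routes share one cache: one miss, two hits -> 2/3.
print(aggregate_hit_ratio(trace, {"A1": "c1", "A2": "c1", "A3": "c1"}))
# Two routes share a cache: 1/3.
print(aggregate_hit_ratio(trace, {"A1": "c1", "A2": "c1", "A3": "c2"}))
# Every route reaches a different cache: 0.
print(aggregate_hit_ratio(trace, {"A1": "c1", "A2": "c2", "A3": "c3"}))
```

A trace without route information cannot distinguish these three cases,
which is precisely why its predictive validity is limited to the spot
where it was collected.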
Network-Centric
Performance Evaluation
A number of network-related
factors must be taken into account for network caching to be evaluated
in the proper context. These range from network topology, to capacity enhancement,
to the effect of cascaded caches on each other's performance. Furthermore,
network and web content availability need to be suitably defined and quantified.
The requirements for such a network-centric performance model include:
-
Workload characteristics
should include route information for requests and responses. The risks
of not doing so have been discussed earlier.
-
Other network phenomena
should be modelled as well, such as slow client connections (e.g.,
modems [AC98]), and backbone congestion leading to
low TCP connection bandwidth to the content-provider's HTTP server.
-
Throughput, hit ratio,
and response time should all be modelled in the aggregate. Or, conversely,
the behavior of each cache should not be reported in isolation. For example,
the aggregate hit ratio of a caching system that intercepts the same (miss)
request multiple times along the request path would be higher when a miss
is counted only once. Similarly, the aggregate throughput of the caching
system would be lower than the sum of the constituent caches' throughputs,
if a miss is counted as a single operation, even though it may trigger
multiple operation executions at different caches.
-
Benefits to the network
should be the end result of performance evaluation. This means that
average bandwidth savings, for example, should be replaced with bandwidth
savings at peak link, or network, utilization. For example, a cache that
thrashes during such times would provide zero bandwidth expansion, no matter
what the average traffic reduction it delivers may be.
-
Cache performance metrics
should include the impact of the caching system on network service availability
and reliability [HHE97]. Availability is
the probability that a request submitted to the network will lead to the
successful initiation of service. Availability depends on the length
of the downtime caused by a cache failure, until any fail-safety mechanism
kicks in to restore service.
-
Reliability is the probability
that a service, once successfully initiated, runs to completion. The typical
metric for reliability is the mean time between failures (MTBF).
When cached files are small, and hence transaction lifetimes are short,
reliability can be safely ignored. This is especially true if availability
is high. The converse is not true. Large files, such as the Starr report
when served as a single object, resulted in an 89% failure rate at the house.gov
web server [K98b]. Many of these failures occurred
after the download started.
-
A single figure of merit
that combines performance (including overhead), availability and reliability,
is performability. It turns out that such a composite metric can
be defined simply yet meaningfully (see [HHE97]).
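As a rough illustration of how such a composite figure might behave, the
sketch below discounts a cache's bandwidth saving by the probability that
service starts (availability) and then runs to completion (reliability).
The multiplicative combination and the sample numbers are assumptions made
here for illustration; [HHE97] gives its own, more careful, definition. The
availability estimate uses the standard steady-state formula
MTBF / (MTBF + MTTR).

```python
def availability(mtbf_hours, mttr_hours):
    """Standard steady-state availability estimate: the fraction of time
    the caching path can successfully initiate service, where MTTR is
    the downtime until a fail-safety mechanism restores service."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def performability(bandwidth_saving, avail, reliability):
    """Illustrative composite: expected useful bandwidth saving, counting
    only transfers that start and run to completion. (Hypothetical
    combination, not the definition from [HHE97].)"""
    return bandwidth_saving * avail * reliability

# A cache failing once every 1000 hours, restored in 1 hour: ~0.999 available.
a = availability(1000.0, 1.0)
# With a 30% peak-link bandwidth saving and a 95% chance each transfer
# completes, the composite figure is a bit over 28%.
print(performability(0.30, a, 0.95))
```

The point of the single figure is that a cache with impressive average
savings but poor availability or reliability scores low, matching the
network manager's view of its actual benefit.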
Long
Term vs. Short Term
Aside from the tactical
effects of network caching, which can be quantified reasonably well using
the approach we outlined above, we should not ignore the strategic impact
on the network. For example, network scalability can be dramatically enhanced
(or hampered) by caching. If the network scales by upgrading individual
links, then a parallel computing solution would be suitable, but if the
network grows primarily by adding new links and nodes, then a distributed
computing approach to caching would be preferable.
References
[AC98]
J. Almeida and P. Cao, "Wisconsin
Proxy Benchmark 1.0", Univ. of Wisconsin (as of Oct. 20, 1998).
[HHE97]
A. Heddaya, A. Helal, and A. Elmagarmid, "Recovery-Enhanced Reliability,
Dependability and Performability," Chapter 4 in Recovery
Mechanisms In Database Systems (V. Kumar and M. Hsu, eds.) Prentice-Hall,
Dec. 1997.
[HMY97]
A. Heddaya, S. Mirdad and D. Yates, "Diffusion-based Caching Along Routing
Paths", Proc. 2nd Web Caching Workshop, Boulder, Colorado, June 9-10, 1997.
[K98a]
Keynote Systems, Inc., "Top
10 Discoveries about the Internet", (as of Oct. 20, 1998).
[K98b]
Keynote Systems, Inc., "Clinton/Lewinsky
Scandal : Effect on Internet Performance", Oct. 6, 1998.
[MA98]
D.A. Menasce, V.A.F. Almeida, "Capacity Planning for Web Performance: Metrics,
Models, & Methods," Prentice-Hall, 1998.
[W98]
D. Wessels, "Report
on the effect of the Independent Council Report on the NLANR Web Caches",
NLANR, Sep 23, 1998.