Web Service Scalability and Performance with Optimising Intermediaries

Mark Nottingham <mnot@akamai.com>


Experiences with the Web and the Internet at large have taught us that an important attribute for any application or system is scalability; indeed, it has become a cliché to say that something won't work because it "doesn't scale". Tightly bound to scalability is performance, as expressed by end-user perceived latency and other metrics. This paper outlines one approach to scaling Web Services and proposes further work that uses XML Protocol's features to improve their scalability and performance.


Scalability in network-available services, which might be defined as the ability of an application to handle growth efficiently, is typically achieved by deploying them on multiple devices. While a single server can be enlarged to a certain degree, this approach soon reaches a point where the cost of scaling outweighs the benefits. For example, one may add more processors, but the cost of such specialized hardware quickly exceeds that of two commodity servers. Additionally, using more than one server improves performance, reliability and flexibility, and creates opportunities for efficiencies that cannot be realised in a single-server deployment.

To function on a number of servers, an application needs to do two things: direct requests to an appropriate server, and enable that server to process them and provide an appropriate response.

In this model, Web Service request messages are sent to a Service's URI, but some mechanism (either in the message or external to it) routes them to another, intermediary, device. That device may or may not act as an intermediary in other aspects (for example, it may or may not be an XML Protocol intermediary, or an HTTP intermediary). For purposes of this paper, we will call such devices service intermediaries.

There are many methods of directing requests to a service intermediary, depending on the nature of the deployment and the service's requirements. XML Protocol, or an XMLP Module, may provide a mechanism for routing messages to the device. Alternatively, if a number of service intermediaries are located near each other in network terms, a "Layer-3+" load-balancing switch may be used to distribute the load between them without explicit in-message routing. In a more distributed deployment, a "global" load-balancing product or service may be used to direct clients to the appropriate service intermediary based on a number of criteria, using any of several techniques that act at various layers.
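As a rough illustration of the "global" load-balancing case, the following Python sketch picks a service intermediary for a client. The selection policy (prefer a healthy intermediary in the client's region, then the least-loaded one) and the names `Intermediary` and `choose_intermediary` are assumptions made for this example; real products weigh many more criteria.

```python
from dataclasses import dataclass


@dataclass
class Intermediary:
    """A hypothetical record describing one service intermediary."""
    name: str
    region: str
    healthy: bool = True
    active_requests: int = 0


def choose_intermediary(client_region, intermediaries):
    """Direct a client to a service intermediary (illustrative policy only)."""
    healthy = [i for i in intermediaries if i.healthy]
    if not healthy:
        raise RuntimeError("no healthy service intermediary available")
    # Prefer intermediaries close to the client; fall back to any healthy one.
    local = [i for i in healthy if i.region == client_region]
    candidates = local or healthy
    # Among the candidates, send the request to the least-loaded intermediary.
    return min(candidates, key=lambda i: i.active_requests)


pool = [
    Intermediary("edge-syd", "apac", active_requests=3),
    Intermediary("edge-sin", "apac", active_requests=1),
    Intermediary("edge-lon", "emea"),
]
print(choose_intermediary("apac", pool).name)  # edge-sin
print(choose_intermediary("amer", pool).name)  # edge-lon (no local intermediary)
```

Because the choice is made before any message is sent, the same policy could sit behind DNS, an HTTP redirect, or an in-message routing header. The sketch makes no assumption about which layer is used.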

Once a request arrives at a service intermediary, the device needs to be capable of satisfying it. There are two basic approaches to this problem: executing service-specific code to provide the service, and using mechanisms that leverage common service behaviours to introduce efficiencies. These roughly align with the functional/optimising axis used in the middlebox taxonomy draft [MIDTAX].

Functional service intermediaries typically execute service-specific code. Because these devices interpret or create messages in the process of providing services, they cannot be removed from the message chain. They also require distribution infrastructure and a means of describing the code's requirements and interfaces. Examples of functional mechanisms include resource-embedded languages which are interpreted by the intermediary and service environments which process the message in some way when providing the service [OPES, ICAP, PCR].

Optimising service intermediaries provide efficiencies by exploiting common service behaviours. If these devices are removed, a service is still available (assuming that requests are routed appropriately), but it will not benefit from the performance increase they introduce. Generally, such techniques optimise by managing the use of resources such as processing, network connections and bandwidth, guided by advisory hints from the service.

In this paper, we concentrate on the techniques which could enable optimising service intermediaries, attempt to identify requirements for them, and explore their use cases.

Requirements for Optimisation Techniques

A standardized set of optimisation mechanisms to allow scalability of Web services needs to fulfil a number of requirements:

They must identify and leverage patterns in Web Service models.
To allow intermediaries to provide efficiencies, the mechanisms will need to find the appropriate behaviours to exploit. In doing so, they must strike a balance between being too application-specific and too general; either extreme would make standardizing them pointless.
They should be applicable to messages at a number of granularities.
Because Web Service payloads are based on XML, mechanisms have the opportunity to operate on individual XML elements, as well as the entire message. This offers far greater flexibility when transferring service semantics to an intermediary.
They must be easy for service developers to understand and use.
Automatically generating hints for optimisation is difficult; it requires knowledge about the service semantics and underlying application. Because of this, it must be possible for developers to understand and easily use optimisation mechanisms.
They must be able to be invoked explicitly or implicitly.
While some services will be able to communicate hints in-message, others will need to be capable of external hinting. Similarly, some services will be able to invoke efficiency mechanisms by use of an XML Protocol Module, but others will require out-of-band invocation.
They may assume a trust model between the service intermediary and service provider, but should not require it.
Experience with HTTP caching shows that useful intermediary services often need a trust relationship with the content provider. However, there may be situations where this relationship is not essential.

Optimisation Techniques

There is a rich history of optimisation techniques in protocol design, and in computer science generally, that we can draw from. Here, we attempt to separate them into general mechanisms that may be combined to give services more powerful and precise control over how service intermediaries handle their messages. This list draws primarily from techniques used in HTTP, which in turn benefited from experience with distributed filesystems [DFSScale].

Caching

Caching is a technique that has long been used to scale distributed systems, whether in filesystem design or the World Wide Web. Allowing clients to keep and reuse copies of entities yields efficiencies either by avoiding data transfer or by avoiding a round-trip to the server altogether. Caching relies on locality in usage patterns; that is, the likelihood that messages, or portions of them, will be reused.

To be able to reuse an entity, a caching service intermediary must understand the conditions under which it is appropriate to do so. Cache indexing defines the profile of request semantics in which a particular response may be reused. The most obvious way to index a cache is by Service URI, as HTTP does; this provides a namespace in which cache lookups can be performed.

For more complex applications, it may be necessary to modify the cache index depending on other attributes. For example, HTTP's 'Vary' response header specifies which additional request headers should be used to index the cache, allowing responses in different languages to be stored under the same URI. This content negotiation mechanism is crude in HTTP, but could be far more expressive using XML.
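A minimal sketch of a cache indexed by Service URI, with an optional list of request attributes that select between stored variants in the spirit of HTTP's 'Vary' header. The class and method names are hypothetical. Request attributes are modelled as a plain dictionary, and for simplicity the variant list is recorded per URI (HTTP records it per response).

```python
class ServiceCache:
    """Responses indexed by Service URI plus any 'varying' request attributes."""

    def __init__(self):
        self._vary = {}     # URI -> names of request attributes that select a variant
        self._entries = {}  # (URI, variant key) -> stored response

    def _key(self, uri, request_attrs):
        names = self._vary.get(uri, ())
        return (uri, tuple((name, request_attrs.get(name)) for name in names))

    def store(self, uri, request_attrs, response, vary=()):
        # Record which attributes matter for this URI, then file the response
        # under the URI combined with those attributes' values.
        self._vary[uri] = tuple(sorted(vary))
        self._entries[self._key(uri, request_attrs)] = response

    def lookup(self, uri, request_attrs):
        if uri not in self._vary:
            return None  # nothing has ever been cached for this Service
        return self._entries.get(self._key(uri, request_attrs))


cache = ServiceCache()
cache.store("urn:example:news", {"lang": "en"}, "<news>English</news>", vary=["lang"])
cache.store("urn:example:news", {"lang": "fr"}, "<news>French</news>", vary=["lang"])
print(cache.lookup("urn:example:news", {"lang": "fr"}))  # <news>French</news>
```

A richer, XML-based negotiation scheme could replace the flat attribute dictionary with, for example, values selected from the request message itself.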

Conversely, there may be situations where a Service URI-based cache index is too restrictive. It may then be useful to widen the scope of the index to span multiple resources, so that entities can be reused across services. To accommodate these situations, it should be possible to declare a 'virtual' cache index that several resources can share.
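One way to picture a 'virtual' cache index is as a name that several Service URIs are mapped onto, so that an entity stored via one Service can be found via another. The following Python sketch assumes that model. `VirtualIndexCache`, `declare` and the example URIs are hypothetical.

```python
class VirtualIndexCache:
    """Entities are filed under an index name that several Services can share."""

    def __init__(self):
        self._index_for = {}  # Service URI -> name of the (virtual) index it uses
        self._entries = {}    # (index name, entity key) -> entity

    def declare(self, index_name, service_uris):
        for uri in service_uris:
            self._index_for[uri] = index_name

    def _index(self, uri):
        # Services without a declared virtual index fall back to their own URI.
        return self._index_for.get(uri, uri)

    def store(self, uri, entity_key, entity):
        self._entries[(self._index(uri), entity_key)] = entity

    def lookup(self, uri, entity_key):
        return self._entries.get((self._index(uri), entity_key))


cache = VirtualIndexCache()
cache.declare("quotes", ["http://a.example/quote", "http://b.example/portfolio"])
cache.store("http://a.example/quote", "ACME", "<price>10.25</price>")
# The entity stored via one Service is reused by another sharing the index.
print(cache.lookup("http://b.example/portfolio", "ACME"))  # <price>10.25</price>
```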

Furthermore, a Service must have some control over the entities stored in a caching service intermediary. Cache coherence mechanisms provide this, typically through validation (actively checking whether an entity may be reused) and invalidation (marking the content as 'stale' in response to some trigger event). Additionally, partial content techniques allow services to express the delta of a changed entity, giving greater efficiencies for large objects with relatively small changing parts [Delta].
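The interplay of expiry, validation and invalidation can be sketched as follows. The Python code assumes a simple version number as the validator and a `validate` callable standing in for the round-trip to the Service. These names are hypothetical, and delta (partial content) updates are not shown.

```python
import time


class CoherentCache:
    """A sketch of validation and invalidation in a caching service intermediary."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so that expiry can be tested
        self._entries = {}   # key -> dict(entity, version, expires, stale)

    def store(self, key, entity, version, ttl):
        self._entries[key] = {"entity": entity, "version": version,
                              "expires": self._clock() + ttl, "stale": False}

    def invalidate(self, key):
        # Invalidation: a trigger event marks the entry stale. It is kept so
        # that it can still be validated, and reused if it turns out unchanged.
        if key in self._entries:
            self._entries[key]["stale"] = True

    def get(self, key, validate):
        """Return a usable entity for key, or None on a miss.

        validate(key, version) stands in for a round-trip to the Service. It
        returns (entity, version, ttl), where entity is None if the cached
        version is still current.
        """
        entry = self._entries.get(key)
        if entry is None:
            return None
        if not entry["stale"] and self._clock() < entry["expires"]:
            return entry["entity"]  # fresh: no round-trip at all
        new_entity, version, ttl = validate(key, entry["version"])
        if new_entity is not None:
            entry["entity"] = new_entity  # changed: transfer the new entity
        entry.update(version=version, expires=self._clock() + ttl, stale=False)
        return entry["entity"]
```

A fresh entry avoids the round-trip entirely. A stale or expired entry that validates as unchanged still costs a round-trip, but it avoids transferring the entity again, which matches the two kinds of saving described under caching.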

Store-and-Forward

Some Services consist of the submission of a message as the request, and a brief acknowledgement as a response, in a manner similar to SMTP's store-and-forward pattern. Standardization of an acknowledgement message would allow intermediaries to take responsibility for handling requests whilst immediately acknowledging them. In combination with caching and other techniques, store-and-forward allows intermediaries to improve service reliability substantially, by making it possible to have multiple, redundant points of contact for message submission, with the possibility for performance improvement through client/intermediary locality.
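The store-and-forward pattern can be sketched as an intermediary that queues each submitted message, acknowledges it at once, and delivers it to the Service later. In this Python sketch the acknowledgement format, the class name and the `forward` callable are all assumptions, since no standard acknowledgement message exists yet.

```python
import uuid
from collections import deque


class StoreAndForwardIntermediary:
    """Accepts messages on a Service's behalf and forwards them later."""

    def __init__(self, forward):
        # forward(message) -> bool stands in for delivery to the origin Service;
        # it returns True once the Service has accepted the message.
        self._forward = forward
        self._queue = deque()

    def submit(self, message):
        """Take responsibility for a message and acknowledge it immediately."""
        message_id = str(uuid.uuid4())
        self._queue.append((message_id, message))
        return {"status": "accepted", "id": message_id}  # hypothetical ack format

    def flush(self):
        """Try to deliver every queued message once; keep the ones that fail."""
        delivered = 0
        for _ in range(len(self._queue)):
            message_id, message = self._queue.popleft()
            if self._forward(message):
                delivered += 1
            else:
                self._queue.append((message_id, message))  # retry on a later flush
        return delivered

    def pending(self):
        return len(self._queue)
```

Because the client receives its acknowledgement from the intermediary, a temporarily unavailable origin Service does not cause the submission to fail. The queued message is simply retried on the next `flush`.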

Aggregation and Piggybacking

In some situations, intermediaries need to send or receive a number of separate messages to or from a particular device. Although some transport bindings may make it possible to reuse a network connection for these messages, further processing efficiencies might be realised by combining them into a single message. For example, it might be desirable to send all store-and-forward messages for a Service at once, wrapping them in a master message that uses an encryption module to protect them. Over an HTTP binding, this approach avoids the overhead of encrypting each message separately, submitting each one, and waiting for a response to indicate its success.
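Aggregation can be as simple as wrapping several serialized messages in one master element and unwrapping them at the other end. This Python sketch uses the standard library's `xml.etree.ElementTree`. The `batch` element, its `count` attribute and the namespace URI are hypothetical, and the encryption step mentioned above is only indicated by a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the "master" message that carries a batch.
BATCH_TAG = "{urn:example:aggregate}batch"


def aggregate(messages):
    """Wrap several serialized XML messages in one master message."""
    batch = ET.Element(BATCH_TAG, count=str(len(messages)))
    for text in messages:
        batch.append(ET.fromstring(text))
    # A single protection step (e.g. an encryption module) could now be applied
    # to this one document instead of to each message; that step is not shown.
    return ET.tostring(batch, encoding="unicode")


def disaggregate(batch_text):
    """Recover the individual messages from a master message."""
    batch = ET.fromstring(batch_text)
    return [ET.tostring(child, encoding="unicode") for child in batch]


orders = ['<order id="1"/>', '<order id="2"/>']
master = aggregate(orders)
print(len(disaggregate(master)))  # 2
```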

Similarly, it may sometimes be advantageous to 'piggyback' additional information for the intermediary on responses. Piggyback techniques for cache coherency have previously been examined for HTTP [Piggyback], and such techniques could also be used with service intermediaries to pre-fill the cache, bundle invalidations, and perform other tasks.
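As an illustration of bundling invalidations, the following Python sketch assumes that a response may carry `invalidate` elements in a hypothetical namespace, each naming a cached resource in a `ref` attribute. The intermediary applies them to its cache, modelled here as a plain dictionary, and strips them before passing the response on.

```python
import xml.etree.ElementTree as ET

PIGGY_NS = "urn:example:piggyback"  # hypothetical namespace for piggybacked hints


def apply_piggyback(response_text, cache):
    """Apply invalidations piggybacked on a response, then strip them out."""
    root = ET.fromstring(response_text)
    # findall() returns a list, so removing children inside the loop is safe.
    for hint in root.findall(f"{{{PIGGY_NS}}}invalidate"):
        cache.pop(hint.get("ref"), None)
        root.remove(hint)
    return ET.tostring(root, encoding="unicode")


cache = {"quote/ACME": "<price>10</price>", "quote/XYZ": "<price>7</price>"}
response = ('<response xmlns:pb="urn:example:piggyback">'
            '<price>10.25</price><pb:invalidate ref="quote/XYZ"/></response>')
print(apply_piggyback(response, cache))
print(sorted(cache))  # ['quote/ACME']
```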

Application of Optimisation Techniques

To use these techniques, the manner in which they are to be applied needs to be described to the service intermediaries. Generally speaking, this has two aspects: when to apply them, and to what they should be applied.

The caching and store-and-forward techniques both require triggers: the cache needs to know when to validate or invalidate an entity, and store-and-forward needs to declare under what conditions a message should be forwarded. To accommodate this, a variety of trigger mechanisms could be defined, such as a relative (delta) time, an absolute time, or the arrival of a particular message.
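Three kinds of trigger that the use cases in this paper rely on (a delta time after storage, an absolute time, and the arrival of a particular message) can be modelled as small objects that answer whether they have fired. The class names, the `fires` method and the `action` field of a message are assumptions made for this Python sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DeltaTrigger:
    """Fires a fixed interval after the entity was stored."""
    stored_at: datetime
    delta: timedelta

    def fires(self, now, message=None):
        return now >= self.stored_at + self.delta


@dataclass
class AbsoluteTrigger:
    """Fires at a known point in time, e.g. a scheduled publication."""
    at: datetime

    def fires(self, now, message=None):
        return now >= self.at


@dataclass
class MessageTrigger:
    """Fires when a message with a given (hypothetical) action arrives."""
    action: str

    def fires(self, now, message=None):
        return message is not None and message.get("action") == self.action


def should_act(triggers, now, message=None):
    """True if any trigger says the entity should be (in)validated or forwarded."""
    return any(t.fires(now, message) for t in triggers)


stored = datetime(2001, 3, 12, 9, 0)
triggers = [DeltaTrigger(stored, timedelta(minutes=5)),
            MessageTrigger("changePassword")]
print(should_act(triggers, stored + timedelta(minutes=1)))  # False
print(should_act(triggers, stored + timedelta(minutes=1),
                 {"action": "changePassword"}))             # True
print(should_act(triggers, stored + timedelta(minutes=5)))  # True
```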

XML offers an ideal way to control the scope of application to portions of a message, because there are a number of ways to associate hints with a particular XML element or hierarchical group of elements (up to the scope of the entire message).

The most obvious means is through use of attributes in a separate XML Namespace. For example, if an element 'foo' and its children are cacheable, it could be expressed as

<foo cache:invalidate="yes" cache:delta="5m"> ... </foo>
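An intermediary could discover such attribute-based hints by walking the message tree. This Python sketch binds the `cache` prefix to a hypothetical namespace URI and treats the `delta` value as a compact duration such as `5m`. Both the URI and the duration format are assumptions, since the example above does not define them. Each hint applies to the annotated element and its subtree.

```python
import re
import xml.etree.ElementTree as ET

CACHE_NS = "urn:example:cache"  # hypothetical URI bound to the 'cache' prefix
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_delta(value):
    """Convert a hypothetical compact duration such as '5m' into seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", value)
    if match is None:
        raise ValueError(f"unrecognised delta: {value!r}")
    return int(match.group(1)) * UNITS[match.group(2)]


def collect_hints(document):
    """List each element carrying cache hints, together with those hints."""
    root = ET.fromstring(document)
    hints = []
    for element in root.iter():  # includes the root element itself
        invalidate = element.get(f"{{{CACHE_NS}}}invalidate")
        delta = element.get(f"{{{CACHE_NS}}}delta")
        if invalidate is None and delta is None:
            continue
        hints.append({
            "element": element.tag,
            "invalidate": invalidate == "yes",
            "delta_seconds": parse_delta(delta) if delta else None,
        })
    return hints


doc = ('<quote xmlns:cache="urn:example:cache">'
       '<foo cache:invalidate="yes" cache:delta="5m"><price>10</price></foo>'
       '<name>ACME</name></quote>')
print(collect_hints(doc))
```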

Alternatively, if the document has an associated XML Schema, it would be possible to encapsulate optimisation hints in the schema itself.

Finally, a separate XML description using XPath, XPointer or similar technology could be used to describe optimisation hints. This could be located out-of-band, in the intermediaries' configuration, or in an XML Protocol Header.
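A separate hint description might map path expressions to cache lifetimes, leaving the message itself untouched. In this Python sketch the description is a dictionary, as it might appear in an intermediary's configuration. Its element and attribute names are hypothetical. The paths use the limited XPath subset supported by the standard library's `ElementTree.findall`, and a real deployment could use a full XPath or XPointer engine instead.

```python
import xml.etree.ElementTree as ET

# A hypothetical out-of-band hint description: path expressions mapped to
# cache lifetimes in seconds.
HINTS = {
    ".//quote[@type='realtime']": 15,
    ".//quote[@type='fundamentals']": 86400,
}


def apply_hints(document, hints):
    """Return (element, lifetime) pairs for every element a hint selects."""
    root = ET.fromstring(document)
    selected = []
    for path, lifetime in hints.items():
        for element in root.findall(path):
            selected.append((element, lifetime))
    return selected


response = ('<response>'
            '<quote type="realtime" symbol="ACME">10.25</quote>'
            '<quote type="fundamentals" symbol="ACME">PE=12</quote>'
            '</response>')
for element, lifetime in apply_hints(response, HINTS):
    print(element.get("type"), lifetime)
```

This arrangement lets rapidly changing and slowly changing parts of the same response carry different lifetimes, as in the StockQuote use case.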

In both of these dimensions, particular care should be taken to ensure that optimisation techniques can be applied in the most flexible and intuitive manner possible.

Optimisation Use Cases

Although tentative, these use cases help illustrate the potential of optimising service intermediaries and their effect on Web Services.

StockQuote Service
By caching response elements containing rapidly-changing financial data for a delta-time before invalidation, a Stock Quote Service could offer enhanced end-user perceived performance whilst reducing load on centralized servers. Furthermore, slowly-changing data could be given separate, longer-term cacheability (with absolute-time invalidation if the release schedule is known, or message-triggered invalidation if not).
News (RSS) Service
An XML-based news 'channel' Service can take advantage of a regular publication schedule to cache article summaries with an absolute time until validation (with the possibility of using partial content updates). Additionally, because some servers may provide many such services, channel requests and responses may be aggregated into a single message interchange for efficiency.
Distributed Authentication Service
A centralized Web site user authentication Service can exploit geographic locality in client behaviour by allowing a distributed group of caches to keep authentication state, rendered as XML, at the 'edge' of the network. If the user changes their password, the request for change can act as a trigger for invalidating the cached entry.
File Store Service
An 'Internet hard drive' Service, where users write to and read from a Service as if it were a network-available disk, could be distributed to a number of 'edge' servers to improve end-user perceived latency by exploiting locality in their access patterns. This could be achieved through a combination of store-and-forward into a cache (i.e., write caching), reading from the cache, and invalidation events to stimulate synchronisation with a centralized server.
Order Queue Service
With store-and-forward techniques, service intermediaries can provide higher availability for a service than a centralized server alone, whilst offering the potential to manage load on the central server by aggregating messages to it.
Voting, Poll and Auction Services
'Interactive' Services can take advantage of 'best-guess' information in cache whilst updating critical information through message triggers and element invalidations.

Further Work

This paper has outlined areas of research regarding optimising mechanisms in service intermediaries; they are intended only as a discussion point. Hopefully, they will generate interest in the standardization of such techniques, the development of a framework for their use, and their integration into Web Service toolkits and products.

References

MIDTAX - B. Carpenter. "Middle boxes: taxonomy and issues". January 2001.

OPES - IETF "Open Pluggable Extensible Services" Birds-of-a-Feather.

ICAP - J. Elson et al. "Internet Content Adaptation Protocol". (See also the ICAP web site.)

DFSScale - M. Satyanarayanan. "The Influence of Scale on Distributed File System Design". In IEEE Transactions on Software Engineering, January 1992.

Piggyback - Balachander Krishnamurthy and Craig E. Wills. "Piggyback Server Invalidation for Proxy Cache Coherency". In Proceedings of the Seventh International World Wide Web Conference, Brisbane, Australia, April 1998.

Delta - J. Mogul et al. "Delta encoding in HTTP". October 2000.

PCR - M. Bech et al. "Enabling Full Service Surrogates Using the Portable Channel Representation". March 2001.

Version: 1.01 - March 12, 2001