ESI Invalidation Protocol 1.0

W3C Note 04 August 2001

This version: http://www.w3.org/TR/2001/NOTE-esi-invp-20010804
Latest version: http://www.w3.org/TR/esi-invp
Authors: Larry Jacobs, Oracle; Gary Ling, Oracle; Xiang Liu, Oracle

Abstract

This specification defines the ESI Invalidation Protocol, which allows for tight coherence between origin servers and surrogates (also known as "Reverse Proxies").

Status of this document

This document is part of a submission to the World Wide Web Consortium (see Submission Request, W3C Staff Comment) that outlines an approach to scaling the Web infrastructure. Comments to the authors are welcome, but you are also encouraged to share your views on the W3C publicly archived www-talk mailing list <www-talk@w3.org>. For a full list of all acknowledged Submissions, please see Acknowledged Submissions to W3C.

This document is a NOTE made available by the W3C for discussion only. Publication of this Note by W3C indicates no endorsement by W3C or the W3C Team, or any W3C Members. No W3C resources were or are allocated to the issues addressed by the NOTE. W3C has had no editorial control over the preparation of this NOTE.

A list of current W3C technical documents can be found at the Technical Reports page.

Introduction

This document describes the protocol for ESI invalidation of cached page documents. An invalidation request is an XML document sent over HTTP using the HTTP/1.1 POST method. The syntax of an invalidation request is dictated by WCSinvalidation.dtd, the DTD (Document Type Definition) file defined later in this document.

There is no official HTTP port reserved for invalidation yet, but an effort should be made to petition for one. One implementation, for example, uses port 4001.

Invalidation request authentication is done by using HTTP/1.1 basic authentication. An invalidation account should be created for this purpose. In one implementation, all invalidation is done by using the identity of a special account called "invalidator".

Invalidation request. The body of the invalidation request is a valid XML document, in which a list of one or more invalidation objects is given. An invalidation object consists of a selector-action pair, which means that the action should be applied to all page documents that match the selector. An invalidation object completes successfully if the action is successfully applied to all the documents selected. An entire invalidation request is successful if all of its invalidation objects complete successfully.

If no page document is selected in an invalidation object, the invalidation object is still considered successful in the sense that it is over an empty set of page documents.
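The success rules above can be sketched in a few lines of Python. The function names and the list-of-booleans representation (one boolean per selected page document) are illustrative assumptions, not part of the protocol.

```python
def object_succeeds(action_results):
    """An invalidation object succeeds if the action worked on every selected document."""
    # all() over an empty list is True, which matches the empty-selection rule.
    return all(action_results)


def request_succeeds(objects):
    """A request succeeds only if every one of its invalidation objects succeeds."""
    return all(object_succeeds(results) for results in objects)
```

For instance, a request whose second object selects no documents still succeeds as long as the first object's action was applied everywhere.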

Invalidation response. The invalidation response is sent over HTTP as well. If the invalidation request is successful, the HTTP response code is 200 and the body of the invalidation response is a valid XML document that lists the invalidation object results, one for each corresponding invalidation object in the invalidation request. An invalidation object result consists of a selector-result pair, in which the selector is the same as that of the corresponding invalidation object and the result details the status of invalidation for that invalidation object.

If the invalidation fails, a non-200 HTTP response code is returned, and the body of the invalidation response is simply a message that describes the error further.

Invalidation DTD for Requests and Responses

One of the goals in designing the invalidation DTD is to make it as general as possible, so that it can accommodate a wide range of requirements and future development.

A valid invalidation message has to be a valid XML instance in the first place. Therefore, the first line of the document has to be

<?xml version="1.0"?>

Note that no white space is allowed before "<".

Throughout the document, any literal use of ampersand "&", less-than "<", greater-than ">", double-quote """, and apostrophe "'" has to be escaped with "&amp;", "&lt;", "&gt;", "&quot;" and "&apos;" respectively. For more information, please refer to the XML standards.
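As a minimal sketch of this escaping rule, the Python standard library function xml.sax.saxutils.escape can produce the five entity references. The helper name escape_xml_text and the sample URI are illustrative assumptions.

```python
from xml.sax.saxutils import escape

# Extra replacements beyond the "&", "<" and ">" that escape() handles by default.
_QUOTES = {'"': "&quot;", "'": "&apos;"}


def escape_xml_text(value: str) -> str:
    """Return value with the five XML special characters replaced by entity references."""
    return escape(value, _QUOTES)


# Hypothetical URI containing a literal ampersand and angle brackets.
print(escape_xml_text("/search?q=a&b=<c>"))
```

The ampersand is replaced first, so the entity references introduced for the other characters are not escaped a second time.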

The following shows the entire content of WCSinvalidation.dtd.

<!-- WCSinvalidation.dtd is the DTD file that describes a valid
     Webcache invalidation XML message for request and response.
-->
<!ELEMENT INVALIDATION (SYSTEM?,OBJECT+)>
<!ATTLIST INVALIDATION
    VERSION CDATA #REQUIRED
>
<!ELEMENT SYSTEM (SYSTEMINFO+)>
<!ELEMENT SYSTEMINFO EMPTY>
<!ATTLIST SYSTEMINFO
    NAME CDATA #REQUIRED
    VALUE CDATA #IMPLIED
>
<!ELEMENT OBJECT ((BASICSELECTOR|ADVANCEDSELECTOR), ACTION)>
<!ELEMENT BASICSELECTOR EMPTY>
<!ATTLIST BASICSELECTOR
    URI CDATA #REQUIRED
>
<!ELEMENT ADVANCEDSELECTOR (COOKIE|HEADER|OTHER)*>
<!ATTLIST ADVANCEDSELECTOR
    URIPREFIX CDATA #REQUIRED
    HOST CDATA #IMPLIED
    URIEXP CDATA #IMPLIED
    METHOD CDATA #IMPLIED
    BODYEXP CDATA #IMPLIED
>
<!ELEMENT COOKIE EMPTY>
<!ATTLIST COOKIE
    NAME CDATA #REQUIRED
    VALUE CDATA #IMPLIED
>
<!ELEMENT HEADER EMPTY>
<!ATTLIST HEADER
    NAME CDATA #REQUIRED
    VALUE CDATA #IMPLIED
>
<!ELEMENT OTHER EMPTY>
<!ATTLIST OTHER
    TYPE CDATA #REQUIRED
    NAME CDATA #REQUIRED
    VALUE CDATA #IMPLIED
>
<!ELEMENT ACTION EMPTY>
<!ATTLIST ACTION
    REMOVALTTL CDATA #IMPLIED
>
<!ELEMENT INVALIDATIONRESULT (SYSTEM?, OBJECTRESULT+)>
<!ATTLIST INVALIDATIONRESULT
    VERSION CDATA #REQUIRED
>
<!ELEMENT OBJECTRESULT ((BASICSELECTOR|ADVANCEDSELECTOR), RESULT)>
<!ELEMENT RESULT EMPTY>
<!ATTLIST RESULT
    ID CDATA #REQUIRED
    STATUS CDATA #REQUIRED
    NUMINV CDATA #REQUIRED
>

The body of a valid invalidation request should begin with

<?xml version="1.0" ?>

<!DOCTYPE INVALIDATION SYSTEM "internal:///WCSinvalidation.dtd">

Therefore the root element for an invalidation request must be INVALIDATION, which contains a list of one or more OBJECT elements.
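A request body of this shape could be assembled as in the following sketch, which uses Python's xml.etree.ElementTree. The function name build_invalidation_request is hypothetical, the default VERSION value "WCS-1.0" is borrowed from the sample transcript in this document, and only basic selectors with an empty ACTION are produced.

```python
import xml.etree.ElementTree as ET

# Prolog taken from the text above; the DTD location is implementation-specific.
PROLOG = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE INVALIDATION SYSTEM "internal:///WCSinvalidation.dtd">\n'
)


def build_invalidation_request(uris, version="WCS-1.0"):
    """Build a request body with one OBJECT (BASICSELECTOR plus ACTION) per URI."""
    if not uris:
        raise ValueError("the DTD requires at least one OBJECT element")
    root = ET.Element("INVALIDATION", {"VERSION": version})
    for uri in uris:
        obj = ET.SubElement(root, "OBJECT")
        ET.SubElement(obj, "BASICSELECTOR", {"URI": uri})
        ET.SubElement(obj, "ACTION")
    # ElementTree escapes attribute values, so "&" in a URI becomes "&amp;".
    return PROLOG + ET.tostring(root, encoding="unicode")
```

Building the tree with a library rather than by string concatenation means the escaping of special characters in URIs is handled automatically.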

And the body of the invalidation response begins with

<?xml version="1.0"?>

<!DOCTYPE INVALIDATIONRESULT SYSTEM "internal:///WCSinvalidation.dtd">

Likewise, the root element for an invalidation response must be INVALIDATIONRESULT, which contains a list of one or more OBJECTRESULT elements, each one of which matches one OBJECT element in the invalidation request.

Both INVALIDATION and INVALIDATIONRESULT take a VERSION attribute, which denotes the version of WCSinvalidation.dtd being used as the XML document type.

Also note that the SYSTEM element is optional. It is intended for sending system information and may be used by any implementation that supports WCSinvalidation.dtd; each implementation can choose to use it or ignore it.

The meanings for all the elements and attributes are discussed below.

Invalidation Selectors

An invalidation selector defines a set of page documents upon which the invalidation action is to be performed. There are two types of selectors: basic selectors and advanced selectors.

Basic. A basic selector contains an exact HTTP URI, which specifies the page document to be invalidated. The definitions of URI and URI comparison are found in the HTTP/1.1 specification. It follows that a basic selector selects either no page object or exactly one. One implementation chooses to ignore the host name in the URI and keeps only the abs_path. For the definition of abs_path, also refer to the HTTP/1.1 specification.
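Dropping the host name from a basic selector URI might look like the sketch below, which uses urllib.parse.urlsplit from the Python standard library. The function name basic_selector_key and the example host are hypothetical. Whether the query string is kept is not stated in the text; keeping it here is an assumption.

```python
from urllib.parse import urlsplit


def basic_selector_key(uri):
    """Return the path (plus query, if any) of uri, discarding scheme and host."""
    parts = urlsplit(uri)
    path = parts.path or "/"  # a bare host maps to the root path
    # Assumption: the query string is part of the lookup key.
    return f"{path}?{parts.query}" if parts.query else path
```

With this normalization, "http://www.example.com/cache.htm" and "/cache.htm" map to the same key.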

Advanced. An advanced selector is defined based on the following attributes and elements:

URIPREFIX (required)	The URIPREFIX attribute specifies a URI path prefix, and therefore it must begin and end with a slash '/'. As with the URI of a basic selector, if a host name is present in URIPREFIX, it is extracted (and may be ignored in some implementations). This required field specifies the common path prefix shared by all page documents in the set that the current selector defines. Any subsequent attributes and elements are used in conjunction with URIPREFIX to further limit the set of matching page documents. If no further attribute or element is specified, invalidation simply means the expiration of all page documents with that path prefix.
URIEXP (optional)	The URIEXP attribute is a regular expression whose scope is the abs_path part of the URI; please refer to the HTTP specification for the definition. Therefore, if the URI literally contains any of the following regular expression special characters, they need to be escaped. Reserved regular expression characters include period ".", question mark "?", asterisk "*", bar "|", parentheses "()", brackets "[]", curly braces "{}", caret "^", dollar sign "$", and backslash "\". Caveat: many people confuse Unix shell filename shorthand with regular expressions. For example, when you type "ls -l *.c" in a shell, the last part, "*.c", is really not a regular expression. Please refer to a regular expression specification for more. Since the URIEXP field is used in conjunction with URIPREFIX, regular expression matching is applied only to the page objects that have the URIPREFIX path prefix. Users should therefore be careful not to supply a URIEXP field that contradicts the URIPREFIX field, especially when URIEXP uses "^". A contradiction yields an empty set of page objects to be invalidated, which is always successful.
HOST (optional)	The HOST attribute specifies the host name. It should contain the same information as the HTTP/1.1 Host header. Alternatively, the host name can be extracted from URI, URIPREFIX and/or URIEXP if any of them contains one. These values should not conflict.
METHOD (optional)	The METHOD attribute describes which HTTP method is used. Currently it is either GET or POST. When this field is blank, it defaults to GET.
BODYEXP (optional)	The BODYEXP attribute is a regular expression that is matched against the HTTP request body. It is only meaningful when the METHOD field is POST. As with URIEXP, the BODYEXP field cannot contain the special regular expression characters literally; they have to be escaped. When using "^", users also need to be careful not to define an empty set of invalidation objects by accident.
COOKIE (optional)	The COOKIE element contains NAME and VALUE attributes to describe an HTTP cookie. One advanced selector can have zero or more cookies. When a cookie is given, its name cannot be blank. The use of the value depends on what kind of cookie it is. If the cookie is a session cookie, the value is ignored, whatever it is. Otherwise, the value is used literally to match against the cookies that the page objects have. Only objects that have the common URIPREFIX path prefix are examined with the COOKIE field.
HEADER (optional)	The HEADER element contains NAME and VALUE attributes to describe an HTTP/1.1 header. One advanced selector can have zero or more headers. When a header is given, its name cannot be blank. Only objects that have the common URIPREFIX path prefix are examined with the HEADER field.
OTHER (optional)	The OTHER element is designed to support any aspect of a page document other than COOKIE and HEADER. It has three attributes, TYPE, NAME and VALUE, which describe that aspect. Each implementation is free to use this element as it sees fit.

The advanced selector is more sophisticated than the basic selector. Its ability to describe URLs is as powerful as regular expressions themselves. In fact, a basic selector can be expressed in the form of an advanced selector.

For example, suppose the URI in a basic selector is "/p1/p2/p3/file.htm". Then in the equivalent advanced selector, the URIPREFIX is "/p1/p2/p3/" and the URIEXP is "^/p1/p2/p3/file\.htm$" (the period is escaped because it is a reserved regular expression character). Specifying "/" for the URIPREFIX also works. However, since the regular expression is matched against the set of page objects that share the URIPREFIX path prefix, the smaller that set, the more efficient the invalidation.
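The interplay of URIPREFIX and URIEXP can be illustrated with a short Python sketch. It assumes the host name has already been removed from both the path and the prefix, that URIEXP follows the syntax of Python's re module, and that the expression is searched within abs_path (so that "^" and "$" anchor it). The text fixes none of these details, and the function name advanced_selector_matches is hypothetical.

```python
import re


def advanced_selector_matches(path, uriprefix, uriexp=None):
    """Return True if path has the prefix and, when given, matches the expression."""
    if not (uriprefix.startswith("/") and uriprefix.endswith("/")):
        raise ValueError("URIPREFIX must begin and end with '/'")
    # The prefix test runs first, so URIEXP only sees documents under URIPREFIX.
    if not path.startswith(uriprefix):
        return False
    # Assumption: URIEXP is searched within abs_path rather than fully matched.
    return uriexp is None or re.search(uriexp, path) is not None
```

Under these assumptions, the path "/p1/p2/p3/file.htm" is selected by the expression "^/p1/p2/p3/file\.htm$" with either "/p1/p2/p3/" or "/" as the prefix, while a prefix of "/p1/" combined with the expression "^/p2/" selects nothing.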

Invalidation Actions

An invalidation action consists of an implicit immediate expiration and an optional removal TTL (time-to-live). The removal TTL is a non-negative number of seconds. If the removal TTL is missing, the selected page documents are removed on demand in the next appropriate cycle.

Since one invalidation XML message can contain more than one invalidation object, some page documents may be affected by more than one action if they are selected by more than one invalidation object. In this situation, the earliest removal time always prevails.
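The "earliest removal time prevails" rule can be sketched as follows. The sketch assumes all actions in one message take effect at the same moment, so the smallest TTL gives the earliest removal time. A missing REMOVALTTL is represented as None and left out of the comparison, because the text does not say how on-demand removal ranks against an explicit TTL. The function name is hypothetical.

```python
def effective_removal_ttl(ttls):
    """Return the smallest explicit REMOVALTTL in seconds, or None if none was given."""
    explicit = [ttl for ttl in ttls if ttl is not None]
    if any(ttl < 0 for ttl in explicit):
        raise ValueError("REMOVALTTL must be non-negative")
    # Assumption: None (no REMOVALTTL) imposes no explicit deadline.
    return min(explicit) if explicit else None
```

A page document selected by three objects with TTLs of 300, 60 and none would therefore be scheduled for removal after 60 seconds.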

Invalidation Results

An invalidation result contains 3 fields and they are all of string type.

ID

ID is a sequence number to disambiguate the selectors.

It is legitimate, though not common, for invalidation objects in an invalidation request to contain the same selector, on purpose or by accident. The ID field is used to tell them apart in the invalidation response.

STATUS

STATUS is a string that describes the outcome of the action that Webcache was asked to apply to the page documents the selector selects.

Examples of STATUS value can be "SUCCESS", "URI NOT FOUND", "URI NOT CACHEABLE", etc.

NUMINV

NUMINV is the number of page documents that were successfully invalidated.

Invalidation Clients

Any client that is capable of sending and receiving HTTP messages can act as an invalidation client. The simplest client is Telnet. Here is a walk-through of the transcript of an invalidation request and response made with the Telnet program against one invalidation implementation.

1	POST /x-invalidate HTTP/1.0
2	Authorization: Basic aW52YWxpZGF0b3I6aW52YWxpZGF0b3I=
3	Content-Length: 217
4
5	<?xml version="1.0" ?>
6	<!DOCTYPE INVALIDATION SYSTEM "invalidation.dtd">
7	<INVALIDATION VERSION="WCS-1.0">
8	<OBJECT>
9	<BASICSELECTOR URI="/cache.htm" />
10	<ACTION />
11	</OBJECT>
12	</INVALIDATION>

Line 1 uses the HTTP/1.0 POST method to request "/x-invalidate". Every invalidation request must contain the URI that the implementation chooses to use.

Line 2 specifies HTTP basic authentication; "aW52YWxpZGF0b3I6aW52YWxpZGF0b3I=" is the base64 encoding of "invalidator:invalidator".

Line 3 specifies the content length.

Line 4 is the separator line.

Line 5 declares that the HTTP body that follows contains an XML document.

Line 6 specifies the root element of the XML document to be INVALIDATION and its definition of document type is given by WCSinvalidation.dtd.

Lines 7 to 12 contain one invalidation object, which is intended to invalidate "/cache.htm" if it is in the cache.
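Instead of typing the request into Telnet, a client could assemble the same bytes programmatically. The sketch below only builds the request and does not send it. The function name build_http_request is hypothetical, and the default path "/x-invalidate" is simply the one used by the implementation in the transcript above.

```python
import base64


def build_http_request(body, user, password, path="/x-invalidate"):
    """Return the raw bytes of an HTTP/1.0 POST carrying an invalidation body."""
    payload = body.encode("utf-8")
    # Basic authentication: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Authorization: Basic {token}\r\n"
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"  # blank separator line between headers and body
    )
    return head.encode("ascii") + payload
```

Computing Content-Length from the encoded body avoids the hand-counting that a Telnet session requires.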

Here is the response for a successful invalidation of "/cache.htm".

1	HTTP/1.1 200 OK
2	Date: Sun, 22 Apr 2001 07:54:09 GMT
3	Allow: GET, HEAD
4	Server: Webserver/2.0.0.0.0
5	Content-Type: text/html
6	Content-Length: 284
7
8	<?xml version="1.0"?>
9	<!DOCTYPE INVALIDATIONRESULT SYSTEM "invalidation.dtd">
10	<INVALIDATIONRESULT VERSION="WCS-1.0">
11	<OBJECTRESULT>
12	<BASICSELECTOR URI="/cache.htm" />
13	<RESULT ID="1" STATUS="SUCCESS " NUMINV="1"/>
14	</OBJECTRESULT>
15	</INVALIDATIONRESULT>

Line 1 contains the 200 HTTP response code, which denotes a successful invalidation.

Lines 2 to 6 are other HTTP headers. Please refer to the HTTP and ESI standards for more details.

Line 7 is the separator line between headers and body.

Line 8 declares that the HTTP body is an XML document.

Line 9 specifies the root element of the XML document to be INVALIDATIONRESULT and its definition of document type is also given by WCSinvalidation.dtd.

Lines 10 to 15 contain one invalidation object result, which says that the action specified for the selection was successful and that there was 1 match for the selection in the cache.
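A client might read such a response along the lines of this sketch, which uses Python's xml.etree.ElementTree. The function name parse_invalidation_response and the tuple layout are illustrative assumptions. The STATUS value is stripped because the sample above shows a trailing space.

```python
import xml.etree.ElementTree as ET


def parse_invalidation_response(status_code, body):
    """Return a list of (ID, STATUS, NUMINV) tuples from a successful response."""
    if status_code != 200:
        # On failure the body is an error message, not an XML result document.
        raise RuntimeError(f"invalidation failed with HTTP {status_code}: {body}")
    root = ET.fromstring(body)
    if root.tag != "INVALIDATIONRESULT":
        raise ValueError(f"unexpected root element {root.tag!r}")
    results = []
    for obj in root.findall("OBJECTRESULT"):
        result = obj.find("RESULT")
        results.append(
            (result.get("ID"), result.get("STATUS").strip(), int(result.get("NUMINV")))
        )
    return results
```

Applied to the response body shown above with status code 200, the function yields a single tuple with ID "1", STATUS "SUCCESS" and NUMINV 1.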