Editing the Web

Detecting the Lost Update Problem Using Unreserved Checkout

W3C Note May 10 1999

This version: http://www.w3.org/1999/04/Editing/01
Latest version: http://www.w3.org/1999/04/Editing/
Authors: Henrik Frystyk Nielsen, <frystyk@w3.org>, W3C
Daniel LaLiberte, <laliberte@w3.org>, W3C

Copyright © 1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.

Status of this document

This document is a W3C Note describing how HTTP/1.1 can be used to detect the lost update problem using unreserved checkout, and hence to prevent edits from being lost when multiple users are editing documents remotely on the Web. This document is a NOTE made available by the W3C for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by this NOTE.

Although not a work item of the IETF WebDAV working group, Jim Whitehead has authorized that comments regarding this document be sent to the IETF WebDAV <w3c-dist-auth@w3.org> mailing list (archive). Information on how to subscribe and unsubscribe can be found at the subscription request page.

Abstract

Avoiding the lost update problem has been a notorious challenge when editing documents remotely on the Web using HTTP/1.0. While WebDAV provides an extended set of services for editing the Web, HTTP/1.1 provides a minimal set of hooks for avoiding the lost update problem: they detect when a document has changed, so that changes aren't lost in the editing process. While simple, these hooks are fundamental to editing the Web using HTTP/1.1 and are needed in WebDAV as well.

This Note explains a) how to use HTTP/1.1 to detect the lost update problem using preconditions and strong etags, and b) how to avoid problems with HTTP/1.0 clients that do not know about these features and only use plain HTTP PUT requests. Neither a) nor b) requires any changes to HTTP/1.1; both can be achieved using existing features.

The mechanism has been implemented in Web Commander and Amaya (both using libwww), and Jigsaw - all W3C Open Source software freely available to all interested parties.

Detection is only one of several ways to avoid the lost update problem, and this document discusses the pros and cons of various other mechanisms, including exclusive locks and immutable revisions.

1. Introduction

2. HTTP Etags and Friends

3. Protocol Interactions

3.1 Saving a Document Not Known to Exist
3.2 Saving a Document Known to Exist
3.3 Handling Conflicts

Appendix A

A.1 Trace of Saving a Document Not Known to Exist - without etag mismatch
A.2 Trace of Saving a Document Not Known to Exist - with etag mismatch
A.3 Trace of Saving a Document Known to Exist - without etag mismatch
A.4 Trace of Saving a Document Known to Exist - with etag mismatch

1. Introduction

The lost update problem has been present in most distributed authoring environments using rudimentary HTTP/1.0 features. The lost update problem can be illustrated as follows:

1) Ron accesses document X using HTTP GET and starts editing it.
2) Shirley also accesses document X using HTTP GET and starts editing it.
3) Ron saves his edits.
4) Shirley saves her edits. Because she never saw Ron's edits, her save overwrites them and they are lost.
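To make the failure concrete, the four steps can be replayed against a toy server. The following Python sketch is hypothetical: `store`, `get` and `put` are illustrative stand-ins for a server that, like a plain HTTP/1.0 server, accepts an unconditional PUT without checking which version it replaces.

```python
# Hypothetical toy server: maps a URI to the current document body.
store = {"/X": "original text"}

def get(uri: str) -> str:
    return store[uri]

def put(uri: str, body: str) -> None:
    # Unconditional PUT: the server never checks which version is replaced.
    store[uri] = body

ron_copy = get("/X")                            # 1) Ron fetches X
shirley_copy = get("/X")                        # 2) Shirley fetches X
put("/X", ron_copy + " + Ron's edit")           # 3) Ron saves
put("/X", shirley_copy + " + Shirley's edit")   # 4) Shirley saves

print(store["/X"])  # original text + Shirley's edit
```

Nothing in the exchange signals an error: the lost update is silent, which is exactly what the mechanisms below try to address.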

There are different ways to solve the lost update problem, each with a different level of complexity. Some commonly used solutions, which are not necessarily mutually exclusive, include:

Out-of-band communications and social agreements (OBC): This solution doesn't involve any explicit protocol mechanism for handling editing in a multi-user environment but relies on human policies and agreements. This obviously doesn't scale very well and often leads to misunderstandings in how to handle conflicts when they occur. This is really the only supported mode in HTTP/1.0.
Unreserved Checkout with Automatic Detection and manual resolution of conflicts at checkin (UCAD): With this solution, when Shirley tries to save her edits in step 4), potentially overwriting Ron's edits, the server refuses the operation and returns an error message instead. Typically, Shirley will then merge the changes manually and redo the save operation when done. This is the default behavior of CVS, for example, where the merge is performed semi-automatically, leaving conflicts to be resolved by the user.
Early warning of potential future conflicts (EW): Early warning can be implemented as "watch flags", as is the case in later versions of CVS, for example, or as shared locks in WebDAV. While watch flags or shared locks ensure that updates don't happen unnoticed, they do not by themselves prevent the lost update problem, as holders of a shared lock or watch flag can still step on each other's edits. Therefore, watch flags are often used in connection with UCAD.
Reserved Checkout (RC): The principle behind exclusive resource locking, or reserved checkout, is to allow only one person to edit a resource at a time. If Ron had used an exclusive lock, for example, Shirley might still have been able to get the document for reading, but if she had decided to start editing it, she would have failed to get a lock on the document, as it was already held by Ron. In order for Shirley to edit the document, Ron would first have to release the lock.

People accustomed to unreserved checkouts often find reserved checkouts too heavyweight and restrictive. People used to reserved checkouts, on the other hand, are often unwilling to live with the uncertainty of unreserved checkouts. In practice, the most suitable mechanism depends on the content being edited and the circumstances under which it is being edited. Some considerations include:

Is the content mergeable?: If the content is not mergeable then attempting to merge edits can be a very tedious operation and it may be easier to use reserved checkout rather than unreserved checkout with merging.
Is the editing expected to be localized to isolated points in the document or spread throughout the whole document?: If the edits are spread throughout the document (for example, pretty-printing an HTML document), then a merge may be difficult to perform, but if the edits are localized, then merging may be straightforward.
Is the content being edited while the user is offline?: Requesting a lock on a resource requires contacting the server responsible for granting the lock. This means that the user must be online when deciding which resources to request locks for. In a multi-user environment, locks taken before going offline may unnecessarily block other users, who have to wait for the locks to be released.

In many environments, exclusive locks are not really exclusive: often there are provisions for breaking a lock under certain conditions, for example by timeouts or for administrative reasons. Some reasons for this are that people tend to forget to release locks on documents that they decided not to edit after all, or that the client can lose connectivity and not be able to release the lock. Because locks cannot always be expected to be exclusive, being able to detect the lost update problem is still necessary. This is, for example, the case with exclusive locks in WebDAV, which require the mechanism described in this document for detecting the lost update problem.

This document shows how HTTP/1.1 can be used to automatically detect conflicts and leave the resolution to be handled manually by the user. It is by no means the intent to imply that this is the only solution ever needed, but its main advantage is that it is very lightweight and doesn't prevent more complex merging or versioning operations from being implemented on top of it.

Editing is often used in connection with version control systems that allow previous as well as concurrent revisions of a document to be exported as first class objects. The IETF Delta-V working group is addressing this problem, along with a model for how to merge parallel branches. One way to extend the mechanism described in this document to support versioning would be to generalize etags to be first class objects (make them URIs). (We believe it is in fact a bug in the design of etags that they are not first class objects.)

2. HTTP Etags and Friends

In order to achieve our goal of detecting an update conflict, we use HTTP/1.1 features including the persistent cache, strong etags, the various If-* precondition header fields, and HEAD requests. Note that we do not attempt to perform any merge operation, only detection. Here is a short description of how we use these features:

Strong Etag: A strong etag is a unique identifier for a particular representation (bag of bytes) of a resource, relative to the URI from which that representation was returned in a response. That is, an etag is only unique within the scope of that URI. The client uses etags in a set of preconditions that must be met in order for the operation to succeed.
Preconditions: Preconditions are the family of If-* header fields, including If-Match, If-None-Match, etc. The semantics of preconditions are that the request, treated atomically, cannot succeed if the precondition is not met. Preconditions are often used in GET cache validation requests, but here we use preconditions together with strong etags in order to ensure that we detect changes and don't lose edits. A precondition can, for example, be: only do this if your current etag matches the one I send you in this request.
Persistent Cache: We use a persistent cache on the client side to store etags for as long as we need to know anything about the document, so that we can always send the etag back when saving the changed document.
HEAD Request: HEAD requests are used to check whether or not the resource already exists. If it does, then we can expect some sort of 2xx response; if not, it should be 404 (Not Found). The OPTIONS method could also be used here, but we prefer the HEAD request as it is more likely to be supported by HTTP/1.0 servers.
PUT Request: Of course, we use a PUT request to save or create the document on the server.
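The following Python sketch shows how a server might evaluate If-Match and If-None-Match on a PUT using strong comparison. It is a minimal illustration under stated assumptions, not how Jigsaw implements it: the etag is derived from a SHA-1 hash of the stored bytes, the etag-list parser ignores commas inside quoted tags, `Store` is a hypothetical in-memory server, and a successful replacement is answered with 200 (OK).

```python
import hashlib
from typing import Dict, List, Optional, Tuple

def make_etag(body: bytes) -> str:
    # Assumption: a strong etag derived from a hash of the exact bytes stored.
    return '"' + hashlib.sha1(body).hexdigest() + '"'

def parse_etags(value: str) -> List[str]:
    # Simplified parser for '"a", "b"' style lists (ignores commas inside quotes).
    return [tag.strip() for tag in value.split(",") if tag.strip()]

def strong_match(a: str, b: str) -> bool:
    # Strong comparison: neither tag may be weak (W/...) and they must be identical.
    return not a.startswith("W/") and not b.startswith("W/") and a == b

def check_put_preconditions(current_etag: Optional[str],
                            headers: Dict[str, str]) -> Optional[int]:
    """Return 412 if a precondition fails, or None if the PUT may proceed."""
    h = {k.lower(): v for k, v in headers.items()}  # names are case-insensitive
    if "if-match" in h:
        value = h["if-match"].strip()
        if value == "*":
            if current_etag is None:  # "*" requires that the resource exists
                return 412
        elif current_etag is None or not any(
                strong_match(tag, current_etag) for tag in parse_etags(value)):
            return 412
    if "if-none-match" in h:
        value = h["if-none-match"].strip()
        if value == "*":
            if current_etag is not None:  # "*" requires that nothing exists yet
                return 412
        elif current_etag is not None and any(
                strong_match(tag, current_etag) for tag in parse_etags(value)):
            return 412
    return None

class Store:
    """Hypothetical in-memory server holding one representation per URI."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[bytes, str]] = {}  # uri -> (body, etag)

    def put(self, uri: str, body: bytes,
            headers: Dict[str, str]) -> Tuple[int, Optional[str]]:
        current = self.docs.get(uri)
        current_etag = current[1] if current else None
        status = check_put_preconditions(current_etag, headers)
        if status is not None:
            return status, current_etag  # refused; report the current etag
        etag = make_etag(body)
        self.docs[uri] = (body, etag)
        return (200 if current else 201), etag
```

Because the check and the write happen inside one `put` call, the precondition and the update are treated as a single step, mirroring the atomicity requirement described above.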

3. Protocol Interactions

In the following we will look at two scenarios:

The user saves a document for the first time. This document may or may not already exist - the user doesn't know which.
The user saves a document for a second or subsequent time. Now the user knows that the document exists but doesn't know whether somebody else has been editing it.

Because this document is meant as a "hands-on" guide on how to do this, we show on-the-wire HTTP/1.1 messages illustrating the solution at the protocol level.
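Before looking at the individual scenarios, it may help to see the general shape of such a message. The Python helper below is a hypothetical sketch that assembles the raw bytes of a conditional PUT; the host, path and body in the usage lines are made up, and the output is not a reproduction of the libwww traces.

```python
from typing import Dict

def format_put(host: str, path: str, body: bytes,
               precondition: Dict[str, str],
               content_type: str = "text/html") -> bytes:
    """Build the bytes of an HTTP/1.1 PUT request carrying a precondition."""
    lines = [
        f"PUT {path} HTTP/1.1",
        f"Host: {host}",  # mandatory in HTTP/1.1 requests
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",
    ]
    # e.g. {"If-None-Match": "*"} or {"If-Match": '"<etag>"'}
    lines += [f"{name}: {value}" for name, value in precondition.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the header section
    return head.encode("ascii") + body

if __name__ == "__main__":
    request = format_put("www.example.org", "/doc.html", b"<p>hi</p>",
                         {"If-None-Match": "*"})
    print(request.decode("ascii"))
```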

3.1 Saving a Document Not Known to Exist

If the user wants to save a document not known to exist, the client first has to verify whether the document really does not exist. This is done by issuing a HEAD request before doing the actual save using a PUT request. The HEAD request is only necessary if the client does not know whether it is talking to an HTTP/1.0 or an HTTP/1.1 server; only the latter understands etags.
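What the client learns from the HEAD response can be sketched as a small pure function. This is a hypothetical Python helper, not libwww code; it takes the status code, the server's protocol version encoded as 10 or 11 (the convention used by `http.client.HTTPResponse.version` in the Python standard library), and the response headers.

```python
from typing import Mapping, NamedTuple, Optional

class HeadResult(NamedTuple):
    exists: bool             # 2xx: the resource is already there
    server_is_http11: bool   # only HTTP/1.1 servers understand etags
    etag: Optional[str]      # kept for a possible If-Match override later

def interpret_head(status: int, version: int,
                   headers: Mapping[str, str]) -> HeadResult:
    """Classify a HEAD response: 2xx means the resource exists, 404 means it does not."""
    lowered = {name.lower(): value for name, value in headers.items()}
    if 200 <= status < 300:
        exists = True
    elif status == 404:
        exists = False
    else:
        # Assumption: any other status (401, 5xx, ...) is left to the caller.
        raise ValueError(f"unexpected HEAD status: {status}")
    return HeadResult(exists, version >= 11, lowered.get("etag"))
```

With `http.client`, the arguments would come from `response.status`, `response.version` and `dict(response.getheaders())` of the HEAD response.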

There are two situations that have to be handled:

The document does not already exist: The first trace shows a situation where the document does not already exist and the PUT request is executed normally.
The document already exists: The second trace shows a situation where the document already exists. The user is given the option to download the existing version or to override it as described in section 3.3. In the trace, the user chooses to override the existing version which is done by replacing the If-None-Match: * precondition in a second PUT request with an If-Match with the etag of the existing resource. That way, we know exactly which revision we are replacing and avoid any race condition between the HEAD request and the following PUT request.
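The two situations can be combined into one client-side sequence. In the Python sketch below, `put` is a hypothetical callable that sends the PUT with the given precondition headers and returns the status code and the etag from the response, and `confirm_override` stands in for asking the user; both are illustrative assumptions.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: put(headers) sends the PUT for the document being
# saved and returns (status, etag from the response or None).
PutFn = Callable[[Dict[str, str]], Tuple[int, Optional[str]]]

def save_not_known(put: PutFn, head_etag: Optional[str],
                   confirm_override: Callable[[], bool]) -> Tuple[int, Optional[str]]:
    # First attempt: create the document only if nothing exists at the URI.
    status, etag = put({"If-None-Match": "*"})
    if status != 412:
        return status, etag  # e.g. 201 (Created) with the new etag
    # Conflict: ask the user whether to override (otherwise: download).
    if head_etag is None or not confirm_override():
        # No override: either the user declined, or HEAD gave no etag to
        # match against (the document appeared after the HEAD request).
        return status, None
    # Override exactly the revision that HEAD reported; if it changed again
    # after the HEAD request, this PUT fails too instead of losing edits.
    return put({"If-Match": head_etag})
```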

3.2 Saving a Document Known to Exist

When a new document has been created on the server, the server responds with a 201 (Created) response including the etag of the created resource. This etag is stored in the client's persistent HTTP/1.1 cache. Once the etag is known, it is used on all subsequent PUT requests in an If-Match header field. If the document has not changed on the server, the etags will match and the PUT can proceed. However, if the document has changed, the etags will not match and the PUT cannot proceed.
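A minimal sketch of such a cache, assuming a JSON file as the persistent store. libwww keeps etags in its own HTTP cache; the file format and the `EtagCache` class here are hypothetical. The sketch also shows how the cached etag, or its absence, selects the precondition for the next PUT.

```python
import json
import os
from typing import Dict, Optional

class EtagCache:
    """Hypothetical persistent map from URI to the last etag seen for it."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.etags: Dict[str, str] = {}
        if os.path.exists(path):  # survive client restarts
            with open(path, encoding="utf-8") as f:
                self.etags = json.load(f)

    def get(self, uri: str) -> Optional[str]:
        return self.etags.get(uri)

    def update(self, uri: str, etag: str) -> None:
        # Replace the old etag: it no longer identifies the server's version.
        self.etags[uri] = etag
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(self.etags, f)

    def precondition(self, uri: str) -> Dict[str, str]:
        # Known etag: only replace that exact version. Unknown: only create.
        etag = self.get(uri)
        return {"If-Match": etag} if etag else {"If-None-Match": "*"}
```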

Again, there are two situations that have to be dealt with:

The document has not been changed: This trace shows a PUT request with an If-Match header field. In this situation the etag matches and the PUT request is executed as normal.
The document has been changed: The last trace shows a situation where someone else updated the document, so the etags no longer match. The PUT fails and the user is presented with the option of downloading the new version or overriding the existing version as described in section 3.3. If the user chooses to override the existing version, a second PUT request is issued with an If-None-Match header field carrying the same etag that was sent in the If-Match header field of the first PUT request. In both cases, the new etag returned in the response from the server is saved in the persistent cache and the old etag is deleted.
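The save-and-override sequence for a document known to exist can be sketched in the same way. As with the other client sketches, `put` and `confirm_override` are hypothetical stand-ins, and the plain dictionary `cache` plays the role of the persistent etag cache.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport: put(headers) -> (status, etag from the response).
PutFn = Callable[[Dict[str, str]], Tuple[int, Optional[str]]]

def save_known(put: PutFn, cache: Dict[str, str], uri: str,
               confirm_override: Callable[[], bool]) -> int:
    old = cache[uri]                        # etag stored after the last save
    status, etag = put({"If-Match": old})   # succeeds only if nobody else saved
    if status == 412 and confirm_override():
        # Override: succeed only if the server's version is no longer the one
        # we started from, i.e. it is the version someone else saved.
        status, etag = put({"If-None-Match": old})
    if 200 <= status < 300 and etag is not None:
        cache[uri] = etag                   # the new etag replaces the old one
    return status
```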

3.3 Handling Conflicts

The current implementation in the libwww Web Commander is very simple: When a conflict is detected, either because a precondition fails or a HEAD request indicates that a resource already exists, the user is presented with two choices:

download the latest revision from the server so that the user can perform a merge using some independent mechanism; or
override the existing version on the server with the one that the client has.

If the user wants to override the existing revision on the server, a second PUT request is issued. Depending on whether the document initially was known to exist or not, the client may either

If known to exist, issue a new PUT request which includes an If-None-Match header field with the same etag as was used in the If-Match header field in the first PUT request, or
If not known, issue a new PUT request which includes an If-Match header field with the etag of the existing resource on the server (this etag was received in the response to the initial HEAD request). This could also have been achieved by resubmitting the PUT request without a precondition. However, the advantage of using the precondition is that the server can then block all PUT requests without any preconditions, as such requests are guaranteed to come from old clients without knowledge of etags and preconditions.
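On the server side, the blocking policy mentioned in the last item could look like the following Python sketch. It only looks for the two etag preconditions used in this Note (If-Unmodified-Since, for example, is ignored), and the choice of status code is an assumption: 428 (Precondition Required) was only defined later, in RFC 6585, so a server of this Note's era would have had to pick an existing code instead.

```python
from typing import Dict, Optional

# Precondition header fields that a client following this Note always sends on PUT.
PRECONDITION_HEADERS = {"if-match", "if-none-match"}

def refuse_unconditional_put(headers: Dict[str, str]) -> Optional[int]:
    """Return an error status for a PUT carrying no precondition, else None."""
    names = {name.lower() for name in headers}
    if names & PRECONDITION_HEADERS:
        return None  # conditional PUT: go on to evaluate the preconditions
    # Assumption: 428 (Precondition Required, RFC 6585); the Note itself
    # does not say which status to use.
    return 428
```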

A more sophisticated solution could involve the server attempting to merge the content on the fly, and either accepting the request if the merge is clean or sending the diffs to the client otherwise. However, this requires features not directly supported in HTTP/1.1.

Appendix A

This appendix contains on-the-wire traces of HTTP between the libwww Web Commander and the W3C Jigsaw server. The traces illustrate various scenarios of how the lost update problem is detected in HTTP/1.1.

A.1 Trace of Saving a Document Not Known to Exist - without etag mismatch

The first trace shows a situation where the document doesn't exist and the PUT request is executed normally.

A.2 Trace of Saving a Document Not Known to Exist - with etag mismatch

This trace shows a situation where the document already exists. The user is given the option to download the existing version or to override it as described in section 3.3. In the trace, the user overrides the existing version, which is done by replacing the If-None-Match: * precondition in the PUT request with an If-Match header field carrying the etag of the existing resource.

A.3 Trace of Saving a Document Known to Exist - without etag mismatch

This trace shows a PUT request with an If-Match header field. In this situation the etag matches and the PUT request is executed as normal.

A.4 Trace of Saving a Document Known to Exist - with etag mismatch

The last trace shows a situation where someone else updated the document, so the etags no longer match. The PUT request fails and the user is presented with the option of downloading the new version or overriding the existing one as described in section 3.3. In the latter case, we override the existing version by including an If-None-Match header field with the same etag as we used in the If-Match header field of the first PUT request. In both cases, the new etag sent back is saved in the persistent cache and the old etag is deleted.

Henrik Frystyk Nielsen, W3C,
@(#) $Id: Overview.html,v 1.53 1999/05/10 21:02:12 frystyk Exp $