NEW ISSUE: weak validator: definition inconsistent from Werner Baumann on 2007-12-29 (ietf-http-wg@w3.org from October to December 2007)

From: Werner Baumann <werner.baumann@onlinehome.de>
Date: Sat, 29 Dec 2007 17:57:22 +0100
To: ietf-http-wg@w3.org
Message-ID: <47767C72.6030507@onlinehome.de>
 From 13.3.3 Weak and Strong Validators:

    Entity tags are normally "strong validators," but the protocol
    provides a mechanism to tag an entity tag as "weak." One can think
    of a strong validator as one that changes whenever the bits of an
    entity changes, while a weak value changes whenever the meaning of
    an entity changes. Alternatively, one can think of a strong validator
    as part of an identifier for a specific entity, while a weak
    validator is part of an identifier for a set of semantically
    equivalent entities.

      Note: One example of a strong validator is an integer that is
        incremented in stable storage every time an entity is changed.

        An entity's modification time, if represented with one-second
        resolution, could be a weak validator, since it is possible that
        the resource might be modified twice during a single second.

While paragraph 1 defines "weak validator" in terms of semantic
equivalence, paragraph 3 qualifies modification time as a "weak
validator". But the second modification of a file within the same second
may change the file into anything. There is no means to guarantee
semantic equivalence in this case. These two paragraphs are mutually
exclusive.
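
A minimal sketch of that case, assuming a server that derives
Last-Modified from a file's mtime truncated to whole seconds. The
timestamps and contents are made up for illustration:

```python
from email.utils import formatdate


def last_modified(mtime: float) -> str:
    """Format a modification time as an HTTP-date (one-second resolution)."""
    return formatdate(int(mtime), usegmt=True)


# Two hypothetical writes within the same second, with unrelated content.
first = (1198947442.10, b"<p>Hello</p>")
second = (1198947442.90, b"something entirely different")

# The validator is identical although the entities are not.
same_validator = last_modified(first[0]) == last_modified(second[0])
same_content = first[1] == second[1]

print(same_validator, same_content)
```

The validator stays the same while the content changes arbitrarily, so
nothing about it expresses semantic equivalence.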

The reason for this is the abstraction "weak validator" itself.
While "validator" is a good abstraction from the details of 
Last-Modified and Etag, and also "strong validator" is quite clear, this 
can't work for "weak".

"weak validator" tries to build a common abstraction from two different, 
completely unrelated kinds of "weakness".

Weak etags: the weakness is that they do not guarantee byte-equivalence, 
but they do guarantee semantic equivalence. Of course, the server needs 
some concept of semantic equivalence built in to use weak etags. (Oh, and 
it would be fine if the client had the same idea about semantics.)

Last-Modified date: the weakness is the limited time resolution. It is 
*unreliable* (or not a validator at all), unless it meets some extra 
conditions. There is no concept of semantic equivalence whatsoever.

One consequence is the strange restrictions on "weak validators". 
Clients must only use them in conditional (full body) GET requests. This 
is reasonable for Last-Modified (if it does not meet the additional 
restrictions), but not at all justified for weak etags.

The only reasonable restriction on weak etags is not to use them in 
range requests. But a PUT with If-Match: W/"xxx" is perfectly ok.
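
For illustration, the strong and weak comparison functions that section
13.3.3 describes can be sketched as below. The helper names are
hypothetical, and the parsing is deliberately naive (it only looks for
the W/ prefix):

```python
from typing import Tuple


def parse_etag(tag: str) -> Tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_value)."""
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Identical opaque values, and neither tag may be weak."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    """Identical opaque values; either or both tags may be weak."""
    return parse_etag(a)[1] == parse_etag(b)[1]


# If-Match: W/"xxx" against a resource whose current etag is W/"xxx"
print(strong_match('W/"xxx"', 'W/"xxx"'))
print(weak_match('W/"xxx"', 'W/"xxx"'))
```

Under strong comparison a weak etag can never satisfy the condition,
so a PUT with If-Match: W/"xxx" only works if the server applies the
weak comparison there.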

I suggest removing the term "weak validator" from the spec. A validator 
is either a Last-Modified date or an etag. Etags can be strong or weak.
It should be made clear that weak etags are only meant to validate 
semantic equivalence, and that everything said about semantic 
equivalence relates to weak etags.

Practical issue:
Apache misuses weak etags when it cannot create a strong one due to 
the limited time resolution (mtime is the main component of Apache's 
etags). These etags will *never* match. (IIS seems to do something 
similar.) Although I'm sure this is not what weak etags are intended 
for, one could use the inconsistent definition in the spec to justify 
it (one has to be either a lawyer or a programmer to do so).
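
The kind of behaviour described here can be sketched with a
hypothetical etag generator. This is not Apache's actual code; the
function name, the tag format and the "same second" rule are
assumptions made only to illustrate the point:

```python
def mtime_etag(mtime: float, now: float) -> str:
    """Hypothetical etag built from an mtime with one-second resolution.

    The tag is marked weak when it is generated in the same second in
    which the file was modified, because a further change within that
    second would not alter the tag.
    """
    tag = '"%x"' % int(mtime)
    if int(now) == int(mtime):
        return "W/" + tag
    return tag


# Same file state, tag generated at two different moments (made-up times).
fresh = mtime_etag(1198947442.2, now=1198947442.7)
later = mtime_etag(1198947442.2, now=1198947450.0)

print(fresh, later)
```

Here the W/ prefix only signals "this tag may be unreliable". It says
nothing about semantic equivalence, and the same file state ends up
with two different tag strings.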

I don't know if there is any application that uses weak etags as they 
are intended (for validating semantic equivalence). But if there is, or 
will be, the above misuse will most likely create interoperability 
problems. WebDAV clients (e.g. davfs2) already have problems working 
around these wrong "weak etags".

Werner
Received on Saturday, 29 December 2007 16:57:55 UTC