Re: Comments on draft "baseline" httprange-14 replacement from David Booth on 2012-02-28 (www-tag@w3.org from February 2012)

From: David Booth <david@dbooth.org>
Date: Tue, 28 Feb 2012 16:21:35 -0500
To: Jonathan A Rees <rees@mumble.net>
Cc: www-tag <www-tag@w3.org>
Message-ID: <1330464095.10438.54121.camel@dbooth-laptop>

Hi Jonathan,

I have updated the revision that I proposed in
http://dbooth.org/2012/awwsw/uddp-2012-02-21.doc
The following document now contains my latest (and most readable)
proposed revision of the "baseline" document:
http://dbooth.org/2012/awwsw/uddp-latest.doc

Substantial responses and explanations below . . .

On Mon, 2012-02-27 at 14:53 -0500, Jonathan A Rees wrote:
> On Tue, Feb 21, 2012 at 3:24 AM, David Booth <david@dbooth.org> wrote:
> > Hi Jonathan,
> >
> > Attached are two versions of the "baseline" document
> > http://www.w3.org/2001/tag/doc/uddp/

FYI I refer below to the version that I reviewed as the 17-Feb-2012
draft, i.e.:
http://www.w3.org/2001/tag/doc/uddp-20120217/

> > containing my suggested changes.  They are in MS Word format (produced
> > by OpenOffice).  One has "track changes" turned on, and shows
> > (hopefully) all the changes.  The other simply shows the document as a
> > whole with the changes incorporated.  (The formatting is a little screwy
> > in this latter version because of "holes" left by deleted portions.)
> 
> While I appreciate the work that went into this, the upwards of 200
> changes indicated by track changes are too much for me to process. If
> the original text had been preserved for comparison that would have
> helped; all I see in the track-changes version is "deleted" at the
> side in a way that's very hard to read.
> 
> In fact I see also that many changes are not marked at all, which
> makes processing even more difficult. Effectively I have to treat your
> document as entirely new. In the time I have I simply can't compare it
> to mine in detail - that would take days.

Hmm, it sounds like the "Track Changes" feature did not work correctly.
However, MS Word and OpenOffice both have features for comparing two
documents to see what has changed.  I have run a comparison to
re-generate all the differences and posted the differences here:
http://dbooth.org/2012/awwsw/uddp-2012-02-21-changes.doc 
Hopefully that will show them all.

> 
> I suggest you turn your suggestions into a list of issues that can be
> treated and decided independently. We can then run these in parallel
> with dealing with ISSUE-57.

We could, and at first I started doing that.  But there were so many
changes that I had to suggest, it seemed far more efficient and more
readable to show them all as a resulting cohesive document.  And
honestly, I think the result would act as a considerably better
"baseline" than the 17-Feb-2012 draft.

> 
> > The changes that I am suggesting are of two kinds: (1) substantive
> > changes to better align this "baseline" document with the intent
> > of the existing httpRange-14 resolution and background specifications;
> > and (2) editorial changes to improve the readability and clarity
> > of the document.
> 
> I could not identify any substantive changes, i.e. any that would
> change the behavior of someone consulting the document, in a quick
> scan of your document. If you could list these that would be helpful.

Drat, I guess I should have kept a list of them.  :(  You'll probably
have to read the whole document to get them all.  But the document is
worth reading anyway, as a substantial overall revision that I think is
significantly clearer, more to-the-point, and easier to read than the
17-Feb-2012 draft. 

Here are the substantive changes that I remember making:

1. Restricting the scope of this specification to http and https URIs,
since: (a) that is what httpRange-14 covers, and this specification is
intended as a replacement for the httpRange-14 resolution; and (b) it
would be very hard to make this document clear enough if it were to
cover all URIs.

2. This addition:
[[
This document adopts the [webarch] definition of “information resource”,
with the notable exception that in this document, the set of information
resources is not defined to be disjoint with any other set of resources.
]]
which I explained: "This is not implied by the httpRange-14 resolution,
but simply reflects my own suggestion [for resolving the issue of what
definition of 'information resource' to use].  As such, it could be
omitted and left to a later change proposal."

3. This addition:
[[
If the URI owner provides multiple URI definitions during the same time
period for a target URI, then all such URI definitions apply, i.e., the
effective URI definition applicable during that time period is the
conjunction of all such URI definitions. This can happen, for example,
through content negotiation or by providing multiple “Link” headers.
This specification does not constrain the bounds of such a time period. 
]]
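
To make the conjunction rule in #3 concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a URI definition can be modeled as a set of (subject, predicate, object) statements, so that the conjunction of several definitions is the union of their statements. The type names, the function name and the example URIs are all hypothetical; none of them come from the draft.

```python
from typing import FrozenSet, Iterable, Tuple

# Assumption: a URI definition is modeled as a set of
# (subject, predicate, object) statements. The draft does not say this.
Statement = Tuple[str, str, str]
UriDefinition = FrozenSet[Statement]


def effective_definition(definitions: Iterable[UriDefinition]) -> UriDefinition:
    """Conjoin all definitions: every statement from every definition applies."""
    combined = set()
    for definition in definitions:
        combined |= definition
    return frozenset(combined)


# Two definitions provided for the same target URI during the same time
# period, e.g. one per content-negotiated variant (made-up example data).
from_variant_a = frozenset({("http://example.org/toucan", "rdf:type", "ex:Bird")})
from_variant_b = frozenset({("http://example.org/toucan", "rdfs:label", "Toucan")})

effective = effective_definition([from_variant_a, from_variant_b])
```

Under this model neither definition overrides the other: a consumer that sees both variants ends up with both statements.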

4. The 17-Feb-2012 draft
http://www.w3.org/2001/tag/doc/uddp-20120217/
is ambiguous about whether a "nominal URI documentation carrier"
actually contains a URI definition.  I have changed this to attempt to
be unambiguous.  Note that this was not an editorial ambiguity, but an
*intentional* ambiguity, as reflected in statements such as:
[[
Applying the adjective "nominal" is a technicality that signifies that
being a URI documentation carrier for the URI is expected according to
this specification, but that it might not actually be one (for example,
if someone has made a mistake, documentation might not be found where it
is expected).
]]
and:
[[
Note that this practice incurs vulnerability to mischief on the part of
the U2's URI owner as well as to U1's - the eventual URI documentation
may not be correct and may not even be "authoritative" in any agreed
sense.
]]
and:
[[
Receivers should approach nominal URI documentation carriers with
skepticism and seek independent assurance of their consistency with what
their interlocutors have consulted.
]]

These repeated attempts to distinguish between content that might be
construed as a URI definition (by a consuming agent following this
protocol) and the *real* URI definition (in reality, in the real world)
appear to reflect a pervasive misunderstanding of the role of this
protocol specification and the way the semantic web works.

The job of this protocol specification is to unambiguously specify what
constitutes a URI definition _according_to_this_specification_ -- not
what constitutes any "real" URI definition in real life (which is
outside the scope of this protocol).  If someone uses a URI definition
in their application that is completely different than the one defined
under this protocol specification, then that is their business.   Or if
a URI owner screwed up and published a completely different URI
definition than the one he/she intended to publish, it is still a URI
definition _under_this_protocol_.  This protocol should not mistakenly
attempt to govern the "real" URI definition or even distinguish between
any "real" URI definition and the URI definition
_as_specified_under_this_protocol_, because this protocol has nothing
whatsoever to say about any "real" URI definition.

The semantic web is architected such that the real life "meaning" of a
URI and the real life truth of any RDF statements are *completely*
irrelevant to the architecture.  The architecture functions just as well
whether you interpret a particular RDF statement as being the ultimate
truth about life, the universe and everything, or you interpret it as
being complete nonsense.

This misunderstanding of the way the semantic web works is what I've
been calling Myth #4:
http://dbooth.org/2010/ambiguity/paper.html#myth4
I realize that not everyone may agree with me, but the fact is that the
architecture of the semantic web *cannot* work any other way, because:
(a) there is no machine algorithm for distinguishing truth from falsity;
and (b) a given semantic web application does not *need* truth, it just
needs RDF assertions that allow it to generate the right output.

5. The 17-Feb-2012 draft seems to indicate that a "Link:" header is an
*alternative* to the URI definition that is implicitly conveyed by an
HTTP 200 status code.  I have changed this such that *both* the Link
header and the 200 status code convey URI definitions, i.e., both apply,
for two reasons: (1) paragraph "(a)" of the current httpRange-14
resolution has no such exemption:
http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039 
and (2) the client should not be required to know about processing the
Link header, and thus the client should be able to use the implicit URI
definition conveyed by the 200 status code even if the Link header may
lead to a more detailed definition.
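
As a rough illustration of #5, the following Python sketch collects the sources of URI definitions from a single HTTP response, treating the 200 status code and any "Link" header as both applying. It is a sketch under stated assumptions, not the draft's algorithm: the relation type "describedby" is my assumption (this message does not name one), the function name and the "implicit:"/"link:" labels are hypothetical, and the Link parsing is deliberately simplistic.

```python
import re
from typing import List, Tuple

# Assumption: the Link relation that points at a URI definition is
# "describedby". This message does not name the relation type.
DEFINITION_REL = "describedby"

# Deliberately simple matcher for `<target>; rel="value"`; a real client
# should use a full Link header parser.
LINK_PATTERN = re.compile(r'<([^>]*)>\s*;\s*rel="?([^";,]+)"?')


def definition_sources(status: int,
                       headers: List[Tuple[str, str]],
                       request_uri: str) -> List[str]:
    """List every source of a URI definition conveyed by one response."""
    sources = []
    if status == 200:
        # The 200 response itself implicitly conveys a definition.
        sources.append("implicit:" + request_uri)
    for name, value in headers:
        if name.lower() != "link":
            continue
        for target, rel in LINK_PATTERN.findall(value):
            if DEFINITION_REL in rel.lower().split():
                # The Link header adds a source; it does not replace the first.
                sources.append("link:" + target)
    return sources


both = definition_sources(
    200,
    [("Link", '<http://example.org/about>; rel="describedby"')],
    "http://example.org/thing",
)
```

A client that knows nothing about Link headers would simply stop after the first branch and still have a usable (if less detailed) definition, which is the point of reason (2) above.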

> 
> > I included them both at once: (a) because it was easier for me to
> > make a single pass through the document, improving everything
> > that I thought I could improve; and (b) because I thought it would
> > be more helpful to others to see how the document as a whole
> > would read with these changes.
> >
> > However, if you felt more comfortable doing so, you could just
> > incorporate the substantive changes now and leave the
> > editorial changes for me to make as a change proposal,
> > in an additional step.
> >
> > Some further explanations that were too long
> > to embed in the attached documents:
> >
> > 1. This document is partially written as though it is applicable
> > to URIs in general, but it really only gets into the necessary
> > detail to cover http: (and https:) URIs.  I think it would
> > be better to only attempt to cover http: (and https:) URIs,
> > since those are the ones that the httpRange-14 issue bears on.
> 
> The document *is* applicable to hash URIs in general, not just those
> where the stem URI is an http: URI. This is because it derives from
> RFC 3986, which is not specific to http:.

AFAIK this document arose from unhappiness about the httpRange-14
resolution and the LOD community's mixed conformance to that resolution,
and it is intended to supersede that resolution.

> 
> It easily *could* be applicable to hashless non-http: URIs since the
> architectural notions of dereference, retrieval, representation, etc.
> are not specific to http:.  

Agreed.  But I think it will be quite difficult to write it this
generally.

> The HTTP protocol is not specific to http:
> URIs, either, 

Agreed, but it is widely used with the http and https schemes.

> and a change proposal I might suggest is to remove this
> restriction, which I see no need for. The httpRange-14 idea is a
> consequence not of HTTP or http:, but of retrieval, as far as I can
> tell right now. (But I have asked the HTTP WG for clarification that
> bears on this, we'll see what they say.)

Agreed in principle, but again, I just think you're trying to bite off
way too much by fully generalizing it.

> 
> In TAG discussion this subject has always been treated as an
> architectural problem, which to me means that it is not special to the
> http: scheme. Certainly the problem to be solved is not specific
> either to http: or to HTTP; they are just the vehicles for the
> proposed solution.

As a piece of counter evidence, the TAG apparently considered
httpRange-14 to be an architectural issue, and that clearly was about
the http scheme (and presumably implicitly https).

> 
> I continue to be open to taking back to the TAG the proposal to
> restrict the discussion to http: (or HTTP, I am not sure which), or to
> move the whole thing out of the TAG as being non-architectural. This
> to me makes it more confusing, but so it goes. I suggest we take this
> up at a later time, after progress has been made on ISSUE-57.

AFAICT, nobody is experiencing pain outside of the http (or https) and
HTTP case, so we would be trying to fix a problem that could exist in
theory, but that nobody is actually experiencing.

> 
> > 2. The title, "Understanding URI Hosting Practice as Support
> > for Documentation Discovery" is too vague.  It sounds like it
> > would apply to *any* kind of documentation whatsoever, when
> > really it is only about URI definitions.
> 
> Hmm, I see what you mean, maybe "URI" should occur twice in the title?
> Will think about this.
> 
> > 3. The current draft seems to be unnecessarily obsessed with
> > differentiating statements conveyed under this protocol with
> > truth in real life, as though such statements would otherwise
> > be taken as truth.  This shows up in a number of places,
> > such as: (a) in its concern about the term "authoritative"
> > (as though "authoritative" imparts any authority beyond this
> > specification);
> 
> This was a specific response to complaints that a document like this
> cannot be authoritative for meaning, or even for anything that might
> be called a documentation or definition, just as HTTP cannot be
> authoritative for what is or isn't a representation of something. What
> is correct regarding meaning and definition is application specific. I
> agreed and sought ways to address this concern, which needs to be
> addressed *somehow*.

Why would anyone even think that a document like this *could* be
authoritative for meaning, or could dictate what URI definition an
application chooses to use in the privacy of its own data space?

Perhaps this confusion is happening because the document was not being
adequately framed as a *protocol* specification.  After all, nobody in
their right mind would construe the HTTP protocol specification as being
authoritative for what constitutes a "representation" in any sense
outside of the HTTP protocol.  No protocol specification can do that.

At present, the 17-Feb-2012 draft wavers between confusing the protocol
scope with the real world, and repeatedly trying to differentiate them.

I think if the document as a whole is framed properly (as a protocol
specification) and something like the following is prominently placed at
the beginning (perhaps in bold) that should be adequate:
[[
Although this specification defines a protocol for providing and
discovering a URI definition, this specification is not concerned with
the interpretation or “meaning” of a URI definition that is conveyed.
This protocol makes no claim whatsoever about the truth or falsity of
any statements contained in a URI definition, nor does it dictate
whether or how an application must use such statements. Such questions
are outside the scope of this protocol.
]]
FYI, I have included this in bold in my latest revision:
http://dbooth.org/2012/awwsw/uddp-latest.doc 

> 
> I agree that it can be authoritative for what is or isn't a NUDC, but
> it does so just by saying which representations are NUDCs.
> 
> Will note this as an issue.
> 
> > (b) in its use of the term "URI documentation"
> > instead of "URI definition" (as though "definition" would imply
> > too much);
> 
> This was a specific response to complaints that "definition" was too
> strong and too specific, that I agreed with.

Right, but as I've explained above, I believe that is misguided.

> 
> There are other aspects to URI documentation other than definition,
> such as stability information.

True, but the text already says that other information may be included
with a URI definition "without any particular demarcation between the
documentation for that URI and the other information".  And the main
focus is the URI definition.

Clearly this is an editorial call, but once we get over the hump of
misunderstanding that I explained at length above and below (i.e., the
confusion about this protocol having anything to say about anything
*outside* of this protocol), then I think it will be clearer that "URI
definition" is far more to the point than "URI documentation".

> 
> > (c) in its use of the qualifying word "nominal",
> > as in "nominal URI documentation carrier" (as though there is a
> > need to distinguish the URI definition conveyed by this protocol
> > from the *real* URI definition); and
> 
> You have misparsed this as {nominal URI documentation} carrier, where
> it is clearly (I think) meant as nominal {URI documentation carrier}.
> Clearly you can get one of these things and have it not carry any URI
> documentation at all, in which case it would not be a URI
> documentation carrier.  It is only nominally so (like "allegedly" in
> the newspapers). 

No, no, no.  Again, this is making the same fundamental mistake that I
explained above under #4, of confusing the scope of this protocol with
the real world.  The point is that when the algorithm defined by this
protocol produces something that is supposed to be a URI definition,
then that thing damn well *is* a URI definition *by* *fiat*
(_under_this_protocol_).  There is no "allegedly"; there is nothing
wishy-washy about it -- no "maybe it does contain one and maybe it
doesn't".  The protocol defines that thing as being a URI definition,
regardless of what it contains -- garbage or otherwise.

Granted, the algorithm defined by the protocol could have a special case
for empty representations, because that case can be clearly defined.
But it *cannot* state anything like:
[[
A "URI documentation carrier" for a URI is a representation that carries
URI documentation that bears on the meaning of that URI.
]]
because there is no algorithmic way to decide whether a given
representation "bears on the meaning of that URI" or not.
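
The distinction can be sketched in a few lines of Python. This is only an illustration, with a hypothetical function name and with representations modeled as raw bytes: the one special case the algorithm handles is the one that is mechanically decidable (an empty representation), and everything else is a URI definition by fiat, with no inspection of its content.

```python
from typing import Optional


def uri_definition_from(representation: bytes) -> Optional[bytes]:
    """Return the URI definition that the protocol assigns to a representation.

    Hypothetical sketch: the only test applied is one a machine can decide.
    """
    if len(representation) == 0:
        # Decidable special case: nothing was conveyed.
        return None
    # By fiat (under this protocol): whatever came back is the URI
    # definition, whether it is useful RDF or complete garbage.
    return representation
```

Note that there is no branch asking whether the content "bears on the meaning" of the URI, because no such test can be written.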

> I'm not sure how to make this more clear, and I see
> no rhetorical way to get rid of the qualifier "nominal".
> 
> Maybe if I introduced some acronyms the grouping might be apparent -
> introduce UDC, and then "nominal UDC", then shorten to "NUDC". But I
> think you misread it because you anticipated a particular problem, and
> most readers won't have this anticipatory difficulty.
> 
> Will note this as an issue.
> 
> > (d) in its mention that
> > "URI documentation may not be correct".
> 
> But it might not be correct, in solving the problem that was set out,
> which is to coordinate communication between a sender and a receiver.
> The sender might use the URI at variance with the documentation, the
> documentation host might change the documentation, other conventions
> might apply locally (as they do in OWL, Memento, and maybe even HTTP),
> and so on.
> 
> Again, this was added explicitly in order to address concerns of other
> reviewers.

I think I've already addressed this adequately above.

> 
> > 4. The abstract includes the disclaimer: 'There is no
> > intention that the set of specified circumstances should be
> > either "authoritative" or exclusive of other sources of URI
> > documentation.'  I think this is unhelpful, counterproductive,
> > and should be deleted, for these reasons:
> >
> >  - A protocol specification can and *should* state
> > what bits are to be considered authoritative
> > _under_that_protocol_.  The fact that something is authoritative
> > _under_a_particular_protocol_ does not automatically make it
> > "authoritative" in any legal or other sense.  Defining something
> > as "authoritative" under a particular protocol is simply a means
> > of describing an algorithm for choosing between potentially
> > conflicting pieces of information.  If it is useful for a
> > protocol to describe an algorithm in terms of authoritative
> > versus non-authoritative information, then that is fine to do.
> > For example, the HTTP 1.1 specification defines "authoritative"
> > entity headers.  Nobody confuses that as conveying any sort
> > of legal or other authority beyond the scope of that protocol
> > definition.
> >
> >  - If a protocol specification gains such uptake in the
> > community that conformance eventually becomes a legally binding
> > social expectation, then that is a social and legal matter that
> > falls entirely outside the scope of the specification itself.
> > The specification itself should not attempt to make claims
> > one way or the other about such matters.
> >
> >  - It sounds disparaging to the specification itself, to
> > include such a disclaimer.  By comparison, the HTTP 1.1
> > specification does not say that "There is no intention that
> > this protocol be "authoritative" or exclusive of other transport
> > protocols".
> >
> >  - The point of recommending a particular protocol is that we
> > *want* the community to rally around one particular protocol
> > (which may itself involve a wide range of options), rather
> > than having the chaos of multiple conflicting protocols.
> > The disclaimer undermines this purpose.
> >
> >  - This "baseline" draft is supposed to convey the intent of the
> > current httpRange-14 decision, but the httpRange-14 decision
> > has nothing like this disclaimer.  It appears to originate
> > from personal opinion.
> 
> I will take this up in future TAG discussion of the document.
> 
> By the way I personally find "best practice" notes to be, most of the
> time, pretentious and will resist them in any document I am
> responsible for.

Okay, I have changed them to be called "Good Practice" notes, in
accordance with the AWWW:
http://www.w3.org/TR/webarch/#app-principles

> 
> I won't be able to make all reviewers happy. My scan of your changes
> indicates mostly editorial disagreement with other reviewers. 

As I hope you now see, the differences go much deeper than that.  There
are fundamental misunderstandings underlying the editorial style that
pervades the 17-Feb-2012 draft.

> The
> document is crafted based on input I've received and mostly I don't
> think I've taken positions unilaterally. To make progress I think
> we'll have to start some kind of issue tracking for the document, with
> changes given not as a long list of track-changes edits but as issues
> with alternatives to weigh. I will look into this.

Honestly, I think the latest revision that I have provided here:
http://dbooth.org/2012/awwsw/uddp-latest.doc 
would constitute a substantially better starting point than the
17-Feb-2012 draft.  I appreciate that this may mean that there would be
some grunt work needed to put it into XML source form or some such, but
that should be a secondary consideration, and I'd be willing to help
with that if desired.


-- 
David Booth, Ph.D.
http://dbooth.org/

Opinions expressed herein are those of the author and do not necessarily
reflect those of his employer.
Received on Tuesday, 28 February 2012 21:22:01 UTC