HttpRange14Options

From W3C Wiki
Revision as of 19:43, 12 November 2011 by Jrees (Talk | contribs)


DRAFT, NOT READY FOR PUBLIC REVIEW - use discussion page for discussion

I (Jonathan Rees) think most people involved are agreed that amending the 2005 httpRange-14 resolution somehow is a good idea. Doing so is our best chance of ending the confusion and bickering.

The issue is recorded as TAG issue 57. There is a call to make linked data more efficient and more easily deployed, and a natural question is why certain obvious solutions don't work. Please read this if you don't think there's a problem or don't know what it is.

Some of the material herein has been presented in this issue 57 writeup.

An amendment could retract the resolution, amplify it, or replace it with something that is stronger, or weaker, or incompatible.

An amendment or "change proposal" could be codified as another TAG resolution, a TAG finding, a W3C Architectural Recommendation, or TAG or W3C endorsement of a document produced by some other group.

Desiderata

There are no consensus requirements, but see HttpRange14Requirements for expressed requirements or desiderata. Roughly speaking the following seems to be what's wanted:

  • A you have to be able to refer to arbitrary documents/IRs that are on the web
  • B you have to be able to define URIs easily
  • C you have to be able to tell the two cases apart.

These are three different "yous": someone who wants to refer to a document, someone who wants the general public (message and document senders and receivers) to be able to use URIs in communication, and someone who wants to get an idea of what a URI refers to.

What an amendment needs to do

In order to anticipate questions that are otherwise likely to arise, I suggest that the text of any amendment provide the following:

  • A A procedure (informally specified) that, given any retrieval-enabled URI, yields the preferred way to refer to the information resource at that URI. A good faith definition of "the information resource at" must be given if the one given here is not used.
  • B Criteria for which URIs are 'definable'; and a procedure that, given a definition of (or description using) a 'definable' URI, yields instructions for deploying the definition/description so that it can be found given the URI.
  • C A procedure that, given any retrieval-enabled URI, classifies it as either (a) taken to refer to the information resource at that URI, (b) taken to refer to what a retrieval says it does, (c) both, (d) neither or unknown.

Item B may need some explaining. The idea is that, depending on the amendment, different kinds of URIs may or may not be definable. The status quo is that the URI can be any http: URI whose HTTP behavior (after a fragid, if present, is removed) can be controlled.

Status quo

We could do nothing. Unfortunately the httpRange-14 resolution doesn't give a useable definition of 'information resource' and does not tell you which 'information resource' is 'identified' by a 2xx URI. Thus retrieval-enabled hashless URIs have no useful agreed meaning.

  • A Use blank node notation [ir:onWebAt "http://example/foo"] or reduce to the B case (hash or 303)
  • B Use hash or 303
  • C N/A, but observe that HTTP GET/2xx for an http: URI means the URI refers to an "information resource" - not clear how that helps, or which one
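As a rough illustration of how little the status quo tells a client, here is a minimal Python sketch of the reading described above. The function name and the returned labels are made up for this example; only the hash / 303 / 2xx distinction comes from the text.

```python
from urllib.parse import urldefrag


def status_quo_reading(uri, status, location=None):
    """Return a (label, related_uri) pair for a URI under the status quo.

    `status` is the HTTP status observed for a GET on the hashless part
    of `uri`; `location` is the Location header of a 303, if any.
    The labels are hypothetical names for this sketch.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: its meaning comes from the document retrieved at `base`.
        return ("defined-by-document", base)
    if status == 303 and location:
        # 303: a definition/description is to be found at the Location URI.
        return ("described-at", location)
    if 200 <= status < 300:
        # 2xx: some "information resource", but the resolution
        # does not say which one.
        return ("some-information-resource", None)
    return ("unknown", None)
```

Note that the 2xx branch cannot return a related URI: that is exactly the gap noted under C.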

Minimal interventions

Option: No agreement i.e. repeal the resolution

Remember that the question is what general agreement around URI meaning are we, as a community, going to attempt. One option is "no agreement".

So what should one do if one believes that all bets are off regarding retrieval-enabled hashless URIs - you can't assume the URI refers to the document/IR, and you can't assume it refers to what's described therein? In this case a retrieval, on its own, tells you nothing. You have to use some other method such as hash or 303.

  • A Use blank node notation [ir:onWebAt "http://example/foo"] or reduce to the B case (hash or 303)
  • B Use hash or 303
  • C N/A

Observe that this is effectively the same as the status quo.

Neutral: Each either one or the other

Many of the options offered below have the property that each retrieval-enabled hashless URI (REHU) is either used in the A case or in the B case (possibly both, if you get the same referent either way). The choice is globally uniform for each URI. In the present option, that is all you know - it is not even specified how that choice is determined.

In case the A and B referents are different and you need to know which is meant, that information would have to be communicated out of band.

If the two possible referents are the same (self-describing documents), or if it is known that the receiving agent will not care which is meant, then a REHU can be used without qualification and be understood. If not, then if the URI is used at all, it must be used in conjunction with other information. This is very similar to no agreement except in the case of self-describing documents.

What the test is for self-describingness is unclear. One could for instance say it's self-describing if there is an application/rdf+xml retrieval that contains the triple <U> ir:onWebAt "U", and not self-describing if either condition fails, but that would be both severe and misleading - there's a large gray area.
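The strict test just mentioned could be sketched in Python as follows. This assumes the retrieval has already been parsed into a set of (subject, predicate, object) string triples; that representation, the function name, and the use of the bare string "ir:onWebAt" for the property are assumptions of this sketch, not part of any proposal.

```python
IR_ON_WEB_AT = "ir:onWebAt"  # property name as written in the text; no namespace is assumed


def is_evidently_self_describing(uri, media_type, triples):
    """Strict test: an application/rdf+xml retrieval containing <U> ir:onWebAt "U".

    `triples` is a set of (subject, predicate, object) tuples of strings,
    assumed to have been parsed from the retrieval result elsewhere.
    """
    if media_type != "application/rdf+xml":
        return False
    return (uri, IR_ON_WEB_AT, uri) in triples
```

Everything that fails this test lands in "not self-describing", which is where the gray area gets lost.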

  • A If evidently self-describing, use the URI, otherwise revert to general methods (blank, hash, 303)
  • B If the definition/description is self-describing, then use the URI and deploy as usual, otherwise hash or 303
  • C If you get an evident self-description, then success, otherwise unknown

(I think I'm missing something there)

Of course this is vulnerable to confusions of the kind described here - disagreement over whether a particular URI is A-referring or B-referring.

This leaves open the possibility of additional agreements permitting the use of other REHUs. For example, a recurring suggestion is to use statements that are true under one interpretation and false under the other, such as those that imply class memberships, to convey which choice is meant. The details would need to be thrashed out as there are a number of pitfalls. The next two sections provide examples of such agreements.

Rehabilitated status quo and its enhancements

These candidates all have the property that all retrieval-enabled hashless URIs (REHUs) are A5-style (i.e. each refers, by agreement, to the document or IR at that URI).

Rehabilitated status quo

Most people seem to understand the httpRange-14(a) rule to say that it is not an arbitrary 'information resource' that is meant, but rather the one found at that URI. Unfortunately this is not codified in any widely accepted specification, so it would need to be written down. If it were, we would have the following.

Details here: HttpRange14Repaired

  • A5 Use the URI (confirm and strengthen HR14a)
  • B use hash or 303
  • C2 All hashless URIs that are retrieval enabled (HTTP 2xx -> OK) refer to the document/IR at that URI (direct).

HTTP extension

The purpose here is to reduce the number of round trips relative to the current 303 case.

Like "Option prepared by JAR" but extend the HTTP protocol with a single-round-trip way to get the definition/description, either a new response status (209), or a new request (MGET).

Also in this class is the idea of returning the definition/description in the content of a 303 response (redundantly with deploying it at the 303 Location: URI); thanks Mark Nottingham.

  • A5 Use the URI
  • B4 Respond with a definition/description on MGET request, or using a 209 response to a GET (leave hash, 303 as alternatives)
  • C2 Retrieval-enabled means document/IR at that URI
  • D is met by MGET but not by 209 (unless 209 is used in conjunction with some new 209-evoking HTTP header)
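To make the round-trip saving concrete, here is a toy Python simulation. Nothing here touches the network: `toy_server` is an in-memory stand-in, the 209 status is only the proposed one (it is not part of HTTP), and the `use_209` flag is a hypothetical stand-in for whatever would cause a server to answer with 209.

```python
def toy_server(uri, use_209):
    """In-memory stand-in for a server; returns (status, payload)."""
    described_at = {"http://example/dog15": "http://example/dog15.rdf"}
    documents = {"http://example/dog15.rdf": "<description of dog15>"}
    if uri in described_at:
        if use_209:
            # Proposed behavior: return the description immediately.
            return 209, documents[described_at[uri]]
        # Current behavior: redirect to where the description lives.
        return 303, described_at[uri]
    if uri in documents:
        return 200, documents[uri]
    return 404, None


def fetch_description(server, uri, use_209=False):
    """Return (description, round_trips) for `uri`."""
    round_trips = 1
    status, payload = server(uri, use_209)
    if status == 303:
        # Second request, to the Location given by the 303.
        round_trips += 1
        status, payload = server(payload, use_209)
    return payload, round_trips
```

With the 303 path the description costs two requests; with the hypothetical 209 path it costs one.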

Negotiate for instance vs. description

Option based on a new kind of content negotiation. Every retrieval result ("representation" sensu 3986) is either an instance ("representation" sensu TimBL) of the URI's referent or a description of it, and you can state a preference using content negotiation. Requires a change to "the information resource at" to exclude descriptions (i.e. metadata applies to instances but not descriptions). See http://www.ltg.ed.ac.uk/~ht/wantOther.html

  • A5 subject to defining "the information resource at" to exclude retrieval responses marked as being descriptions
  • B4 removes the 303 round trip since description can be returned immediately
  • C2 If there is a description then either (a) or (c); if not, then (b) or (d). Or something like that.
  • D is met

/.well-known/about/

Put descriptions under /.well-known/about/ (for some value of 'about'), or else deposit a file containing a rewrite rule in /.well-known/about (for some value of 'about'). In addition, whenever possible, deploy definition/description using 303 (redundantly) for FYN purposes.

Example: to 'define' the URI http://example/eq18, deploy the definition/description at the URI http://example/.well-known/about/eq18 .
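The mapping in this example can be sketched in Python as below. The function name and the `about` parameter are made up for illustration; keeping the query string and dropping any fragment are assumptions, since the proposal does not say what to do with them.

```python
from urllib.parse import urlsplit, urlunsplit


def about_uri(uri, about="about"):
    """Map a URI to the place where its definition/description would be deployed.

    `about` stands for whatever name ends up being chosen under /.well-known/.
    """
    parts = urlsplit(uri)
    path = "/.well-known/" + about + parts.path
    # Assumption: keep the query, drop the fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, ""))
```

For http://example/eq18 this yields http://example/.well-known/about/eq18, matching the example above.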

  • A5 Use the URI
  • B5 See above. When providing a 303 - or informative 404 - is impossible (hosting case), we would be foregoing the "obvious" nose-following, which could be judged as demoting this proposal down to B4.
  • C2 Successful retrieval always means the document/IR at the URI
  • D is met

Documents that pretend

This section needs a lot of work...

This is the conjunctive case: We would agree that every REHU is both direct and indirect (i.e. self-describing).

No way, that would be too weird. Also redundant with either all-direct or all-indirect.

If a document uses its URI as a name in a description, pretend that the document is what's described. For example if the document at http://example/dog15 says "http://example/dog15" ex:mass 10, then pretend that the document is what it's describing (something with mass). Think of the document as a toy in a make-believe game.

This is harmless (maybe) as long as what's described either honestly is the document, or else can't be confused with the document. If it's on the edge - say, another document - then it will be impossible to distinguish, and use of blank node, hash, or 303 is required.

See discussion elsewhere, more, Sandro's blog post, TBL opposed.

  • A If self-describing, use the URI, otherwise revert to blank, hash, 303
  • B5 If self-describing, serve a description using 200. Otherwise use a hash or 303.
  • C? Always both direct and indirect. Think about this.

Hmm, for this to work every non-self-describing document would have to be removed from the Web...

Linked-data-preferred options

Take at face value (linked data only)

It has been suggested that to find out what a URI refers to, do a retrieval and read the retrieval result. That is, all REHUs are indirect. This gives us

  • A Use ir:onWebAt (blank node), hash, or 303
  • B5 Serve description in RDF using retrieval
  • C2 the URI refers to what a retrieval says it does

If, by reading the retrieval, you can determine that the URI refers to the document/IR at that URI, then it will appear that the A5 desideratum is supported for that URI.

"Take at face value" in its pure form is a radical view since we are "wasting" every retrieval-enabled hashless URI whose retrievals do not have a "face value". We can reclaim some of these URIs, and perhaps others, for use as document-referring by agreeing on an enhanced C rule that detects those URIs that are not to be used with rule B5 (and also update the A rule to use those URIs). One such proposal follows.

The Ian Davis plan

Presented in Davis's blog post

  • A4 If no application/rdf+xml (and Content-Location?), then use the URI; otherwise use Content-Location (?? if this doesn't cover it then blank, hash, 303)
  • B5 Use 200 + application/rdf+xml . (Does Content-Location work on all hosting services?)
  • C? GET with conneg for application/rdf+xml; if got that, and there is a Content-Location, then as defined, otherwise the document at the URI
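The C rule above, as I read it, could be sketched in Python like this. It is only one reading of the rule (the question marks above still stand); the function name and returned labels are hypothetical, and the response headers are assumed to have been obtained already from a GET that asked for application/rdf+xml.

```python
def davis_reading(uri, content_type, content_location=None):
    """Classify `uri` from the headers of a conneg'd GET response.

    Returns ("as-defined-in", content_location) when RDF/XML came back
    together with a Content-Location, else ("document-at", uri).
    """
    # Strip parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/rdf+xml" and content_location:
        return ("as-defined-in", content_location)
    return ("document-at", uri)
```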

more here: HashlessUriContention

Other approaches

Ontology-based

(Similar to cautious approach, but allows a URI to refer to the document at the URI in some cases - perhaps when and only when there is only one instance (TimBL's "fixed resource" case). Waiting for Alan R to provide details.)

  • A (not sure ir:onWebAt would even make sense)
  • B? TBD
  • C? (not sure)
  • E by construction

tdb: / duri:

  • A? can use duri: if resource is stable, otherwise must use ir:onWebAt
  • B0 tdb:
  • C2 Look at the URI scheme ?
  • F by construction

Comparison

TBD: Matrix with one column for each putative-requirement group from HttpRange14Requirements, and one row for each alternative above.

Also need some discussion.

If we call this an Architectural Recommendation and give it a name like Nose-Following 1.0, and have some notion of "conforming agent", then we can start talking about which agents and documents conform to it and which ones don't, and we introduce the possibility of markers that signal conformance (or a willingness to be held accountable for conformance). This would be a neutral way to deal with non-conforming uses - people can just say "oh this agent doesn't conform to NF 1.0".

It would be nice to replace the browbeating "you aren't complying with HR14" with the polite request "it would be nice if you conformed to NF 1.0". Then bringing people in is much more like upgrading from one version of a spec to the next version.

A marker indicating intent to conform would be useful; the best place would be in an HTTP header. ... but then if there were such a marker maybe we could also use it as part of a solution.

The bottom line is that not all requirements can be met. Something has to give, and we have to figure out how to cope.