Status of the HTTP URI Scheme
Since the HTTP specification has come up a number of times recently, I thought I'd do my own research. It's not my purpose here to either defend or attack its use as proposed for the semantic web (much as I'm eager to do both); I just wanted to find out what the story is.
URI schemes are registered with IANA, which keeps its scheme registry at http://www.iana.org/assignments/uri-schemes.html (the schemes themselves are defined by the specifications the registry points to). If you look there for HTTP, you find that the HTTP URI scheme is defined by the HTTP/1.1 specification (RFC 2616). This document in turn defines 'resource' as follows:
- resource: A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways.
It never comes out and says that the URI (or more precisely the request-URI together with the Host: header) is supposed to 'identify' a resource; it merely says that the resource to be [accessed] is identified (by the server) by looking at that information. If in the judgment of the server the URI doesn't identify a resource, it can do as it pleases, maybe.
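To make concrete what "the request-URI together with the Host: header" amounts to, here is a minimal sketch of how a server might combine the two when working out which resource is being asked for. The function name effective_uri and the example hosts are hypothetical; the rule that an absolute request-URI takes precedence over Host is my reading of section 5.2 of RFC 2616, and the sketch only covers plain http.

```python
from urllib.parse import urlsplit


def effective_uri(request_uri: str, host_header: str) -> str:
    """Combine a request-URI and a Host header value into one http URI.

    Hypothetical helper for illustration only; it ignores the "*"
    request-URI used by OPTIONS and does no validation.
    """
    # An absolute request-URI already carries its own host, so the
    # Host header value is ignored in that case.
    if urlsplit(request_uri).scheme:
        return request_uri
    # Otherwise the request-URI is an absolute path, and the host
    # comes from the Host header.
    return "http://" + host_header + request_uri


print(effective_uri("/potato", "example.org"))
print(effective_uri("http://example.org/potato", "other.example"))
```

Either way the server ends up with a single URI to look at, and whether that URI picks out a resource is still the server's call.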
I don't understand # yet, but am led to understand that it's unproblematic (for RDF content types), so I'll focus on 303.
Suppose that an HTTP URI 'refers to' a potato and that a server knows this. The specification for 303 says "The response to the request can be found under a different URI" - meaning a GET or other HTTP request. If the URI refers to (identifies; see below) a potato, then a 303 would be an assertion (according to RFC 2616) that a potato can handle a GET request, which is absurd.
405 Method Not Allowed would make sense, I think, except that it is defined to mean "is not allowed for the resource identified by the Request-URI" - but the case we're considering is that of a request-URI that doesn't refer to a resource at all.
So the status of 303 URIs is extremely murky from a standards viewpoint, I think. RFC 2616 doesn't come out and say you can't use an HTTP URI to refer to something that's not a 'network data object or service', but it's difficult to avoid reading it that way.
AWWW's use of "resource" to mean "thing" ("anything", "entity", etc.) is blatantly inconsistent with RFC 2616. I believe the consensus now is that AWWW should be fixed.
AWWW's introduction of the term "information resource" was redundant since we already had "resource" from RFC 2616 and its predecessors in widespread use. To attempt a technical definition for a back-formation from 'URI' was valiant, but I think it is not helping the cause.
(The phrase 'that can be identified by a URI' in RFC 2616's definition of 'resource' is vacuous, since a URI can identify anything at all, and in the right circumstances any 'identifier' can be used by anyone to identify anything at all.)
One way to put the proposed semantic web use of 303 responses on a sound footing with regard to standards would be to revise the HTTP spec. This could probably be done without too much difficulty (relative to what it took to create HTTP in the first place).
The spec all but says that HTTP is only about [network] resources. Our options regarding 303:
- start thinking about revising the HTTP spec to permit semweb 303s
- use # URIs instead
- look for URIs for non-[network-]resources outside the HTTP space
- not worry about it
- drive a wedge between identification and reference (see Pat Hayes's explanation, below)
I know this is not news - it's not my aim to stir all this up again, as most of what needs to be said has already been said. But I can't keep up with the TAG and semweb mailing lists where this is being discussed, and I thought others might be interested in this summary.
Pat Hayes responds:
Well, seems to me you over-interpret what the HTTP/1.1 spec says.
Let us agree that 'identify' in the HTTP/1.1 documentation, and 'denote' (aka 'refer to') are separate notions. Then HTTP/1.1 says nothing about what URIs refer to. httpRange-14 however says that for information resources, reference and identification must coincide (which retrospectively blesses the traditional confusion between these notions in this technical literature).
In your example, we know that a URI refers to a potato. OK, but that says nothing about what it identifies. httpRange-14 says that the http endpoint for this URI ought to redirect it to some other resource which can emit a representation which somehow explains what the first URI does refer to. Give temporary names to all these things: [say, URI1 for the potato's URI, endp for its http endpoint, URI2 for the URI that endp redirects to, and redir for the resource at URI2.]
Then the following should hold, according to httpRange-14:
- URI1 refers to potato
- URI1 identifies endp
- URI2 identifies redir
- URI2 refers to redir (since redir is an 'information resource,' the kind that the HTTP/1.1 spec is talking about, so reference and identification coincide here)
and, hopefully, redir emits representations which explain the first of these facts.
But nothing here says that potato is the same as endp, or that a vegetable is handling a GET request. One could describe the situation as follows: the potato's name identifies a thing which one might call the potato's computational doppelganger: a network entity whose sole function is to catch any attempts to identify the potato, and toss them to another thing which can return some useful information about the potato. It could of course do this itself were it not for the unfortunate fact that, because of httpRange-14, this would probably confuse [one] into thinking that it actually was the potato.
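The arrangement Pat describes can be sketched as a tiny dispatch function, with no real networking involved. This is only an illustration under assumed names: the two example.org URIs stand in for URI1 and URI2, and respond is a hypothetical stand-in for whatever the server does on a GET.

```python
from typing import Dict, Tuple

# Hypothetical URIs: URI1 names the potato, URI2 names a document about it.
URI1 = "http://example.org/id/potato"
URI2 = "http://example.org/doc/potato"


def respond(uri: str) -> Tuple[int, Dict[str, str], str]:
    """Return (status, headers, body) for a GET on the given URI."""
    if uri == URI1:
        # endp, the "doppelganger": it catches attempts to look up the
        # potato and tosses them to redir, emitting no representation.
        return 303, {"Location": URI2}, ""
    if uri == URI2:
        # redir: an ordinary network resource whose representation
        # explains what URI1 refers to.
        body = URI1 + " refers to a potato."
        return 200, {"Content-Type": "text/plain"}, body
    return 404, {}, ""


status, headers, _ = respond(URI1)
print(status, headers["Location"])
print(respond(headers["Location"])[2])
```

Nothing in the sketch is a potato, and nothing in it needs to be: the branch for URI1 is endp, the branch for URI2 is redir, and the potato appears only as something the returned text talks about.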
Pat's rewrite of the httpRange-14 resolution, improving an earlier one of JAR's:
- If a network resource responds to a GET request [for a URI] with a 2xx response, then that URI refers to that network resource;
- If ... with a 303 response, then that URI refers to something (and the response should be helpful in determining what it refers to);
- If ... with a 4xx response, then nothing is known about whether the URI refers to anything, or what it refers to.
JAR wants to clarify that "that URI refers..." here means that "assuming the HTTP server is speaking on behalf of the URI owner, then the response code implies that the URI owner desires for that URI to refer to ...", or something like that.
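Read as a decision rule, the three cases above might be sketched like this. The function name and the wording of the returned strings are mine, not part of the resolution, and the last branch is an assumption: the rewrite says nothing about status codes outside 2xx, 303 and 4xx.

```python
def reference_claim(status: int) -> str:
    """Map a GET response status to what it implies about the URI's referent."""
    if 200 <= status < 300:
        return "URI refers to the responding network resource"
    if status == 303:
        return "URI refers to something; the response should help say what"
    if 400 <= status < 500:
        return "nothing is known about whether or what the URI refers to"
    # Assumption: codes the rewrite does not mention (1xx, other 3xx, 5xx)
    # are reported as not covered rather than given a meaning.
    return "not covered by the rewrite"


for code in (200, 303, 404, 500):
    print(code, "->", reference_claim(code))
```

JAR's caveat applies to every branch: each string describes what the URI owner apparently wants, on the assumption that the server speaks for the owner.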