HttpRange14Repaired

DRAFT, NOT READY FOR PUBLIC REVIEW

This page gives Jonathan Rees's draft amended version of the httpRange-14 resolution. As of 2011-10-20 the TAG has not endorsed this. For requirements see HttpRange14Requirements. For other options see HttpRange14Options.

The proposed advice

The TAG gives the community the following advice regarding the use of URIs, for the sake of interoperability:

If a 'hashless' URI admits a potential retrieval (i.e. if an HTTP GET request could correctly be answered with a 2xx response), then what one says about the URI's referent should be whatever one can say about any potential retrieval result.

(OR: the characteristics of the URI's referent should be the ones that are invariant among potential retrieval results. (Which do you like better?))

For example, if any valid retrieval would yield something that is a biography of Mahatma Gandhi, then the URI's referent should be taken to be something that is a biography of Mahatma Gandhi, and vice versa.

This advice supersedes clause (a) of the TAG's resolution of 15 June 2005.

Examples

If any valid retrieval would yield particular content and media type, then the URI effectively names that content. That is, anything that's true of the retrieval result would also be true of the referent: what its length is, what its 37th octet is, its media type, who wrote it (if anyone), and so on.

If any valid retrieval would yield something containing the address of Acme Web Inc., then the referent of the URI should be considered something containing the address of Acme Web Inc., even if the address is different from one retrieval to the next.

If there is a valid retrieval result, then the URI cannot (under this rule) refer to a person, since a person has a beating heart, and no retrieval result does.

(Something that's true of every member of a set of retrieval results (such as those that are correct for a URI) might be called an invariant of that set.)
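The notion of an invariant can be sketched in a few lines of Python. This is only an illustration, not part of the advice: the RetrievalResult class, the is_invariant helper, and the sample bodies are all hypothetical, and a real set of correct retrieval results cannot in general be enumerated like this.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RetrievalResult:
    """Hypothetical model of one potential retrieval result."""
    media_type: str
    body: bytes


Predicate = Callable[[RetrievalResult], bool]


def is_invariant(pred: Predicate, results: Iterable[RetrievalResult]) -> bool:
    """Return True when pred holds for every result in the set."""
    return all(pred(r) for r in results)


# Two made-up results for the same URI; the address differs between them.
results = [
    RetrievalResult("text/html", b"<p>Acme Web Inc., 1 Main St.</p>"),
    RetrievalResult("text/html", b"<p>Acme Web Inc., 22 Elm Ave.</p>"),
]

mentions_acme = lambda r: b"Acme Web Inc." in r.body
is_html = lambda r: r.media_type == "text/html"
mentions_main_st = lambda r: b"1 Main St." in r.body
```

Under these assumptions, mentions_acme and is_html are invariants of the set, so they could be said of the URI's referent; mentions_main_st is not, since it fails for the second result.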

Epistemology

How would you ever know whether any given invariant holds for the retrieval results that are correct for a given URI?

For http: URIs, as of this writing, whether a result is correct depends ultimately on the party that controls resolution for the domain name in the URI. So to say anything about the URI is to say something about what will be OK with that party. If you are that party, you control what happens, so you can make predictions. If not, then whether you're willing to say anything about the referent of the URI would depend on how reliable and/or predictable you think that party is. Your judgment might depend on public pronouncements, such as a domain owner's declarations concerning retrieval results, or on contractual arrangements that hold the domain owner accountable should an invariant be violated.

For other kinds of URIs, other considerations may apply regarding what's correct, such as what an RFC says.

The rule only applies to correct retrievals. Security violations and bugs in the communication path (such as a buggy proxy) do not affect this rule.

RDF has no uniform theory of time, so we can say nothing about it here. Depending on context, "any" might mean retrievals done pretty soon, within a year, at any point in the foreseeable future, etc.

Just as with any prediction about a physical system, it is possible to be wrong about what will be correct in the future. That doesn't mean such statements are not useful.

Communicating invariants

Invariants (what is true about the referent of a retrieval-enabled hashless URI) may be communicated by anyone, or not. The agent that controls retrieval is in a good position to formulate them, since it can bring them about, and may want to communicate them to others. They may do so via any appropriate channel, such as a site policy document. Two channels deserve special consideration:

One of the retrieval results may contain statements about the referent of the URI. If this is done it is important to be clear that the statement is an invariant and not just a same-document reference (references: RFC 3986, XML). It is common practice to take a URI, when used in a retrieval result using that URI, to refer to that particular retrieval result. The latter may have properties not possessed by the intended referent of the URI according to the advice.
As an alternative to using retrieval results, the HTTP Link: header is available. The POWDER specification suggests using the 'describedby' relation to supply a document that explains what the URI is supposed to refer to - which in this case could include its retrieval invariants.
Another option is the use of a .well-known URI (see blog post "new opportunities"), but this remains to be developed.
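As a rough illustration of the Link: header channel, the following Python sketch builds a header value carrying the 'describedby' relation and picks such targets back out of a header value. The function names and the example.org URIs are hypothetical, and the parser is deliberately simplified: it assumes no commas or semicolons inside URIs or parameter values, which a full Link header parser would have to handle.

```python
import re
from typing import List


def describedby_header(description_uri: str) -> str:
    """Build a Link header value pointing at a describing document."""
    return f'<{description_uri}>; rel="describedby"'


# One link-value: a URI in angle brackets followed by ;-separated parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL_PARAM = re.compile(r'rel\s*=\s*"?([^";]+)"?')


def describedby_targets(link_header_value: str) -> List[str]:
    """Return the target URIs whose rel parameter includes 'describedby'."""
    targets = []
    for uri, params in _LINK_VALUE.findall(link_header_value):
        rel = _REL_PARAM.search(params)
        # A rel parameter may list several space-separated relation types.
        if rel and "describedby" in rel.group(1).split():
            targets.append(uri)
    return targets


header = describedby_header("http://example.org/about-bio")
combined = header + ', <http://example.org/style.css>; rel="stylesheet"'
found = describedby_targets(combined)
```

A consuming agent could then retrieve each URI in found to look for a statement of the invariants, assuming the describing document is published in a form the agent understands.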

What about all the deployed content that doesn't use retrieval-enabled hashless URIs in this way?

Agents reading (consuming) messages or documents that use URIs referentially are encouraged to be on guard against uses of URIs by agents that are not aware of this advice, that do not agree with it, or that simply make mistakes. This is nothing new; you can't trust all information that comes your way.

Ideally all content (or each use of a URI) would be flagged in some manner as being in conformance with the rule or not, as a way for consuming agents to hold producers accountable. We recommend that such opt-in and opt-out indicators be developed by the affected communities. But given the widespread and casual referential use of URIs, and the size of the deployed content base, clarity in every case will probably not be achievable.

What happened to "information resource"?

The idea is not helpful. We don't need to make a type distinction, we need to talk about particular properties of particular resources. This does not mean that ontologies involving things that have retrieval invariants are not allowable or not possible; indeed the advice provides strong constraints regarding how any such ontology ought to work. But it is not the purpose of this advice to provide an ontology.

Retrieval

This isn't specifically about HTTP or 200 responses; it's about what a hashless URI being retrieval-enabled means. This rule applies uniformly to a wide variety of URI schemes, including data:, mid:, urn:, ftp:, gopher:, and so on, and it is protocol independent in the same way that RFC 3986 is protocol independent where it talks about dereference and access.

What happened to the (b) and (c) clauses?

The httpRange-14 (b) clause talks about 303 responses. These are now uncontroversial, to the point of being documented in HTTPbis. Nothing more needs to be said.

The (c) clause provides no information, so repeating it also does nobody any good.

For those who want to use URIs in some way other than the above, there are plenty of options. The TAG's favorite is hash URIs, but hashless http: URIs with 303, and URIs outside the http: scheme (such as mailto: or tag:), are other options.

A footnote regarding invariants

This section should be skipped by those not concerned with formal inference.

An invariant ("what one says about") is a one-place predicate, a proposition with a hole in it. While many predicates "work" as invariants per the advice, such as "has a beating heart" or "contains the letter e", there are some that need to be excluded. An example of the latter is "contains the letter e or does not contain the letter e". Certainly this predicate is true of each retrieval result, since each one either contains the letter e, or doesn't. Yet this predicate is not particularly meaningful when applied to the referent of a URI when some retrieval results contain the letter e and some don't. Does the referent of the URI contain the letter e, or not? If it does, then each retrieval result must, and that is false; similarly if it doesn't. Because of this inconsistency, this predicate must be excluded from the second-order quantification over predicates implied by the advice.
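The problem with the disjunctive predicate can be seen in a small Python sketch. The two sample bodies and the invariant helper are made up for illustration; the point is only that the disjunction passes the "true of every retrieval result" test while neither of its disjuncts does.

```python
def invariant(pred, results):
    """Return True when pred holds for every retrieval result."""
    return all(pred(r) for r in results)


# Hypothetical retrieval results for one URI: one contains "e", one does not.
results = ["hello", "sky"]

contains_e = lambda body: "e" in body
lacks_e = lambda body: "e" not in body
e_or_not_e = lambda body: contains_e(body) or lacks_e(body)

disjunction_holds = invariant(e_or_not_e, results)
contains_holds = invariant(contains_e, results)
lacks_holds = invariant(lacks_e, results)
```

Here disjunction_holds is True while contains_holds and lacks_holds are both False, so transferring the disjunction to the referent would force a choice between two disjuncts that each fail as invariants. Restricting the quantification to atomic predicates avoids this.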

Acceptable predicates include nearly any atomic predicate (no logical connectives) that would "make sense" to ask about a retrieval result. A minimal list may be found here. Formalizing this, for the benefit of someone requiring a formalization, should be possible, but is beyond the scope of this note.

An analogy

Consider Cartesian plane geometry. Each point has certain characteristics, including its x coordinate, y coordinate, distance from origin, whether it's above the X axis, and so on. For any set of points, there will be characteristics that are invariant across the points in the set. For example, if the set is a vertical line x=3, then all points have x coordinate 3, so having x coordinate 3 is an invariant of that set. In this case we might say that the line has x coordinate 3 - that is, the invariant characteristic would be a characteristic of the line. Conversely, if someone said the line had x coordinate 3, then we would conclude that each point on the line had x coordinate 3.

Replace point with "potential retrieval result", line with "referent of the URI", having x coordinate 3 with "is a biography of Mahatma Gandhi".

Characteristics that relate to the line as a whole do not participate in this language game. For example, a line may have slope m, but that does not imply that every point on that line has slope m.
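The geometric analogy can be sketched the same way. The Python below samples a handful of points from the line x=3 (a finite stand-in for the whole line, chosen only for illustration) and checks which point-level characteristics are invariant across them; all names are hypothetical.

```python
from typing import Callable, Iterable, Tuple

Point = Tuple[float, float]


def is_invariant(pred: Callable[[Point], bool], points: Iterable[Point]) -> bool:
    """Return True when pred holds for every point in the set."""
    return all(pred(p) for p in points)


# A finite sample of the vertical line x = 3, with y from -2 to 2.
line_x3 = [(3.0, float(y)) for y in range(-2, 3)]

has_x_3 = lambda p: p[0] == 3.0
above_x_axis = lambda p: p[1] > 0

x_is_invariant = is_invariant(has_x_3, line_x3)
above_is_invariant = is_invariant(above_x_axis, line_x3)
```

In this sample, having x coordinate 3 is invariant and so can be said of the line, while being above the X axis is not. Slope does not appear here at all, because it is not a predicate on a single point.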

A similar analogy is a random variable. One can think of retrieval as sampling from a distribution of potential retrieval results. If every sampling will yield a biography of Mahatma Gandhi, then the advice asks one to say that the random variable is such a biography, too. Perhaps an ontological purist might not speak in this manner, preferring to say that the characteristics of variables do not overlap with the characteristics of samples; such an individual would reject the advice presented here, since it too tries to make characteristics generic across point-like things and line-like things.

Essentially this is a question of whether knowledge states are considered part of the logic or part of the metalogic. Both approaches are consistent, so the choice ought to be an engineering decision, not an assessment of truth.

Background

Originally from this page.