Understanding URI Hosting Practice as Support for Documentation Discovery

1 Introduction

This document defines the URI Documentation Discovery 1.0 protocol, or "UDDP 1.0" for short. The protocol is to be used for communication between an agent who controls resolution behavior for some URI (the "probe" URI) and wants to establish a meaning for the URI, and other agents interested in knowing that meaning. The protocol allows the first agent to provide documentation to the other agents that is supposed to establish the desired meaning.

General agreement on the meaning of a URI is useful for purposes of interoperability, since without agreement it becomes necessary for applications to understand different meanings for different uses of a URI. Such context tracking, when it is possible, can be fragile, complex, and confusing.

The uses targeted here are those involving notations such as RDF [rdf-concepts] (and languages layered on RDF) in which declarative URI meaning figures centrally, but other languages and notations that treat URIs as having meaning are not excluded.

Although framed as a new protocol, UDDP 1.0 in fact merely records the way in which Web architecture [webarch] and the so-called httpRange-14 resolution [issue-14-resolved] have been interpreted in common practice. The "Cool URIs for the Semantic Web" note [cooluris] is another presentation of this architecture.

The interpretation in practice of [issue-14-resolved] goes beyond what a literal reading would imply. Differences are noted in a separate section at the end of this document.

The document does not define "meaning", "reference", or "identification" in any absolute sense, nor is there any implication that documentation found via UDDP 1.0 is either "authoritative" or exclusive of other sources of documentation. ^[1]

1.1 Historical note

This document is part of a conversation first started in 2002 around the declarative meaning of URIs. At the time two different conventions were proposed for the declarative use of URIs. One convention, inherited from the hypertext Web, was for a hashless URI to refer to the document-like entity ("information resource") served at that URI. This convention collided with a separate desire to use a URI to refer to an entity described by that information resource. Which use would, or should, have priority was not clear at the time. After deliberation, the TAG adopted its so-called httpRange-14 resolution [issue-14-resolved], asking the community to use hashless URIs to refer to their information resources, not to what those information resources describe. An exception allowed a hashless URI to refer according to a description in the case where no information resource was served at the URI, as signalled by a 303 HTTP response.

A parallel question arose for URIs with fragment identifiers, but it was easier to settle, since in any given case there was no ambiguity: either the URI was tied to a description, or it was tied to a document fragment, the choice being dictated by the media type of the response to a retrieval request on the "stem" URI (without the fragment identifier).

With the growth of linked data [linked-data], some resistance to the architecture has been expressed. Reports of hash URIs being unacceptable in some situations, coupled with performance difficulties arising from the 303 redirection and the impossibility of deploying 303 redirects at all on many Web hosting services, have led to the current reexamination of the architecture.

1.2 URI documentation

URI documentation is information whose purpose is to document the intended meaning of a particular probe URI. URI documentation may be transmitted along with other information, such as documentation for other URIs, without any particular demarcation between the documentation for that URI and the other information. A typical example might be an ontology document in which one finds integral documentation for a set of URIs. The ontology document serves as URI documentation for a number of URIs at the same time.

URI meaning is subject to normative specifications such as RFC 3986 [rfc3986] and applicable URI scheme registrations and media type registrations. The purpose of URI documentation is to provide URI-specific information that goes beyond what the normative specifications say, while retaining compatibility with them. URI documentation should not be inconsistent with the constraints imposed by these specifications.

URI documentation typically takes the form of a set of statements in which the probe URI occurs. The statements, by saying what is supposed to be true of the entity named by the probe URI, are meant to communicate the probe URI's intended meaning - what that entity is.

1.3 Retrieval

As described in RFC 3986 [rfc3986], retrieval is an operation that starts with a URI and, when successful, yields a retrieval result (or "representation").

Retrieval may be requested using a variety of protocols and APIs. The GET request in the HTTP protocol [rfc2616] is one way to request retrieval. A 200 status in a response to a GET request indicates a successful retrieval. Other HTTP status codes, such as 304, relate to retrieval in ways documented by the protocol specification.

For purposes of this document, retrieval entails following redirect chains (HTTP status 301, 302, and 307). That is, if retrieval is requested using a URI U1, and a GET specifying U1 yields a redirect to U2, and a retrieval request using U2 yields a result R, then R is the result of the retrieval request using U1. For the sake of timely termination, redirect chains are limited in length by the HTTP protocol.

Like 410 and various other HTTP status codes, a response to an HTTP GET request that has status code 303 indicates an unsuccessful retrieval.
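The retrieval rules above can be illustrated with a minimal sketch. It assumes a caller-supplied fetch function that performs a single GET and returns a status, an optional Location value, and an optional body; the names Response, retrieve, and MAX_HOPS, and the hop limit of 5, are illustrative choices and are not defined by this document.

```python
from typing import Callable, NamedTuple, Optional
from urllib.parse import urljoin


class Response(NamedTuple):
    """Hypothetical stand-in for the outcome of a single GET request."""
    status: int
    location: Optional[str] = None
    body: Optional[str] = None


REDIRECT_STATUSES = {301, 302, 307}  # redirects that retrieval follows
MAX_HOPS = 5  # assumed cap for this sketch, not a value from this document


def retrieve(uri: str, fetch: Callable[[str], Response],
             max_hops: int = MAX_HOPS) -> Optional[str]:
    """Return the retrieval result for `uri`, or None if retrieval fails.

    A 200 response is a successful retrieval. 301, 302 and 307 are
    followed, so the result obtained via the redirect target counts as
    the result for the original URI. Any other status, including 303
    and 410, is treated as an unsuccessful retrieval.
    """
    for _ in range(max_hops + 1):
        response = fetch(uri)
        if response.status == 200:
            return response.body
        if response.status in REDIRECT_STATUSES and response.location:
            # Location may be relative; resolve it against the current URI.
            uri = urljoin(uri, response.location)
            continue
        return None
    return None  # redirect chain longer than the assumed cap
```

Under this sketch a 303 response is not followed as part of retrieval; its Location target is handled separately, as a 'See Other' link.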

It is customary to speak of retrieval of a representation of the state of the resource "identified" by the URI, but this is not informative unless we know something about that resource and its states, and what constitutes correct "representation". To avoid confusion, we will speak only of "retrieval using a URI", not "retrieval of a representation of the state of a resource".

1.4 Stability considerations

Consider the situation where a sender S composes a message (or document, or "representation") M containing a URI U, and sends it to a receiver R (or leaves it somewhere for R to find). S may choose to use the UDDP 1.0 protocol to learn how to use U in M, and R may choose to use the UDDP 1.0 protocol as a way of understanding the use of U in M.

However, it is possible that the protocol will deliver different URI documentation in the two instances. Because of this, R should use UDDP 1.0 only when there is a reasonable expectation that the meaning of U (as reflected in the retrieved URI documentation) has remained the same, or is only inconsequentially different, across the time interval spanning S's use of UDDP 1.0 in composing M and R's use of UDDP 1.0 in interpreting M.

2 Probe URI with local identifier

Editorial note

If the purpose of this document is to provide a baseline against which httpRange-14 change proposals are to be written, why talk about fragment identifiers at all? After all they're not mentioned in [issue-14-resolved]. Answer: If this is going to serve the community, and especially if it is to go to Rec track, it had better be complete by some criterion, and fragment ids are an important part of the documentation discovery story. Better to put the whole URI documentation discovery story under one roof, as opposed to requiring separate reports for hashful and hashless URIs.

The syntax 'stem#id' has come to be used not just for document fragment references but for any reference determined relative to content found at 'stem'. Therefore the present document refers to 'id' as a 'local identifier' rather than a 'fragment identifier'. The two expressions may be considered synonymous but with distinct connotations.

This section is not intended to provide new information, but merely to reinforce what RFC 3986 [rfc3986], AWWW [webarch], and related documents such as RFC 3870 [rfc3870] and RDF Concepts [rdf-concepts] already say.

The following language from [rfc3986] bears on the semantics of local identifiers:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type [RFC2046] of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced.

A consequence of this is that if there are multiple simultaneous representations then they need to be consistent in what they convey, if a local identifier is to be meaningful beyond a single representation:

[webarch]

That is, if two retrieval results (representations) assign meanings to a given local identifier, the meanings must be the same.

2.1 General case

When the probe URI has the form 'stem#id', and the media type of the result of a retrieval using 'stem' establishes an association between the local identifier 'id' and URI documentation carried in the retrieval result, then the retrieval result should provide URI documentation for 'stem#id' per UDDP 1.0.

Normal HTTP user-agent behavior implements this part of UDDP 1.0, as ordinary retrieval behavior of 'stem#id' involves doing a retrieval using 'stem'.

When more than one distinct retrieval result is possible, every result must carry the URI documentation, since any of the possible results might be the one that is retrieved.
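The two points above can be sketched as follows. The first function derives 'stem' and 'id' from a probe URI the way a user agent would before retrieving. The second checks that every possible retrieval result carries documentation for the local identifier. Modelling a retrieval result as a mapping from local identifier to documentation text is purely an assumption of this sketch, and the function names are hypothetical; in practice the association is determined by the media type of each result.

```python
from typing import Iterable, Mapping, Tuple
from urllib.parse import urldefrag


def split_probe_uri(probe_uri: str) -> Tuple[str, str]:
    """Split 'stem#id' into (stem, id); id is '' when there is no '#'."""
    stem, local_id = urldefrag(probe_uri)
    return stem, local_id


def documented_in_every_result(local_id: str,
                               results: Iterable[Mapping[str, str]]) -> bool:
    """True only if each possible retrieval result documents `local_id`.

    Each result is modelled (hypothetically) as a mapping from local
    identifier to the documentation that the result carries for it.
    An empty collection of results documents nothing.
    """
    results = list(results)
    return bool(results) and all(local_id in result for result in results)
```

For example, a stem served both as application/rdf+xml and as application/xhtml+xml would satisfy the check for 'Person' only if both variants carry documentation for that identifier.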

The delegation of local identifier semantics to the content of the retrieval result may be made either directly in a media type registration or by a chain of normative references. For media type application/rdf+xml, this is accomplished by language in the media type registration and normative references therein. For media type application/xhtml+xml, delegation is accomplished via the XML namespace document [xhtml-ns], which leads one (via RDFa) to the algorithm for extracting an RDF graph from the XHTML markup, and so on.^[2]

2.2 Document fragment reference case

Local identifiers can also get their meaning in other ways, such as format specifications that specify that certain local identifiers "identify" document parts (fragments). For example, the @name attribute in HTML binds the local identifier to its enclosing HTML element (assuming consistency among representations).

Editorial note
This might be the place to talk about local identifiers that refer to different fragments in different conneg variants, which ought to be fine as long as they are "equivalent" (say, one is a translation of another). But this seems like a can of worms and I'm not sure what to say about it.

3 Probe URI lacking local identifier

3.1 General case (probe URI not retrieval-enabled)

Editorial note
This section says more or less what HTTPbis is expected to say when it is eventually published. That language in turn derives from the (b) clause of [issue-14-resolved]. What is here is expected to stand with little controversy, but some of the change proposals under discussion create new ways to connect URI documentation to a hashless URI.

If a retrieval request using the probe URI leads to a 'See Other' link with target V, where V is another URI, then results of retrieval requests using V must carry URI documentation for the probe URI.

In the HTTP protocol, a response (to a GET request) with status 303 See Other is interpreted (according to UDDP 1.0) as a 'See Other' link with the value of the response's Location: header as its target (V).
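As a minimal sketch of this interpretation, the function below maps a GET response to the target V of a 'See Other' link. The function name and the plain-mapping representation of headers are assumptions made for illustration; resolving a relative Location value against the probe URI is likewise a choice of this sketch.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin


def see_other_target(probe_uri: str, status: int,
                     headers: Mapping[str, str]) -> Optional[str]:
    """Return the 'See Other' link target V, or None if there is none.

    Only a 303 response with a Location header yields a link. Header
    names are matched case-insensitively.
    """
    if status != 303:
        return None
    location = next(
        (value for name, value in headers.items() if name.lower() == "location"),
        None,
    )
    if location is None:
        return None
    # Resolve a relative Location against the probe URI.
    return urljoin(probe_uri, location)
```

A client would then request retrieval using the returned URI and treat the result as URI documentation for the probe URI.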

Usually the probe URI is an http: URI, but this is not a requirement. Use of the HTTP protocol with non-http: URIs is legitimate (but rare).

There is no type restriction on what the probe URI refers to in this case. It can refer to whatever the URI documentation specifies, which could be (and often is) an "information resource".

3.2 Information resource reference (probe URI retrieval-enabled)

Editorial note
This section is the controversial one: the (a) clause of [issue-14-resolved]. Some parties are lobbying for retrieval results in some circumstances to constitute URI documentation, rather than instances of an information resource that the URI refers to.

When retrieval is enabled for a URI (i.e. can legitimately succeed), then each retrieval using the URI is considered equivalent to URI documentation that says that the referent of the probe URI is an "information resource" and the retrieval result is a "representation" of the "state" of that information resource. [webarch]

Different retrievals can yield different results under different circumstances, but that makes this no less true. The information resource is then whatever is common to all such retrievals. The distinct retrieval results are not taken to be distinct referents of the URI, but rather distinct instances of a common information resource.

If the retrieval request also yields a 'describedby' link, then results of retrievals using the link target URI provide additional URI documentation for the probe URI. In the HTTP protocol, a 'describedby' link would be given as a header of the following form in the HTTP response: [rfc5988] [powder]

Link: <http://example.com/uri-documentation>; rel="describedby"
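The following sketch shows one way a client might pull 'describedby' targets out of a Link header value. It is deliberately simplified and is not a complete implementation of the header syntax in [rfc5988]: it assumes that parameter values contain no commas or semicolons, and the function name is hypothetical.

```python
import re
from typing import List

# One link-value: <target> followed by zero or more ";param" segments.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)')
# The rel parameter, either quoted or as a bare token.
_REL_PARAM = re.compile(r'\brel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))', re.IGNORECASE)


def describedby_targets(link_header: str) -> List[str]:
    """Return the targets of all links whose rel includes 'describedby'.

    `link_header` is the value of a Link header (without the field
    name). A rel parameter may list several space-separated relation
    types, compared here case-insensitively.
    """
    targets = []
    for match in _LINK_VALUE.finditer(link_header):
        target, params = match.group(1), match.group(2)
        rel = _REL_PARAM.search(params)
        if rel is None:
            continue
        relation_types = (rel.group(1) or rel.group(2) or "").lower().split()
        if "describedby" in relation_types:
            targets.append(target)
    return targets
```

Each returned URI would then be used in a further retrieval request, whose result provides additional URI documentation for the probe URI.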

4 Signalling uses of the protocol

Many protocols and formats include a specific indicator of the protocol being used. For example, every HTTP request or response contains an HTTP protocol version number, and each XML document starts with an XML version number. UDDP 1.0 has no such indicator. However, UDDP 1.0 combines elements of existing protocols in a manner that is compatible with current practice. Therefore no indicator is necessary.

5 Comparison with the TAG resolution

The critical parts of the TAG resolution [issue-14-resolved] are as follows:

a) If an "http" resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource;

b) If an "http" resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource;

The wording "an http resource responds" has not been interpreted literally, as implying that some resources are inherently "http resources" that have the capability of "responding". Rather it is understood that it is the HTTP server that is doing the "responding" and that the resources in question are merely those that are named (identified) by http: URIs.

The (a) clause has been understood as saying that the resource in question is not just any old information resource, but in particular the one whose instances (representations) are the possible results of retrievals using the URI.

The purpose of a 2xx HTTP status code is to signal successful retrieval (per [rfc3986]), but the HTTP protocol is only one way to perform a retrieval. Moreover, retrieval applies to some resources named by URIs other than http: URIs, for example those named by ftp: and data: URIs. In order to harmonize UDDP 1.0 with the architecture articulated in [rfc3986], the editor has therefore unilaterally made the obvious generalizations beyond the http: URI scheme and the HTTP protocol.

In the 303 case, the (b) clause does not say anything about which resource is "identified", but it has been taken as strongly suggesting the convention that the See Other link is to documentation meant to establish what the probe URI means.

The resolution says nothing about what is to happen in the case of redirect chains (HTTP status 301, 302, 307). Applications have generally taken a redirect as a statement that the URI and its redirect target URI have the same meaning. Of course redirection figures into the provenance of any information received, and therefore may affect trust judgments.

6 Acknowledgments

Larry Masinter and Henry S. Thompson gave valuable advice on drafts of this document. Many of the ideas grew out of work done by the TAG's AWWSW Task Group.

7 References

cooluris: Leo Sauermann and Richard Cyganiak. Cool URIs for the Semantic Web. W3C Interest Group Note, 03 December 2008. (See http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.)
issue-14-resolved: Roy Fielding. [httpRange-14] Resolved. Email to www-tag list, 2005. (See http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html.)
linked-data: Tim Berners-Lee. Linked Data. Design note, June 2009. (See http://www.w3.org/DesignIssues/LinkedData.html.)
powder: Phil Archer, Kevin Smith, and Andrea Perego, editors. Protocol for Web Description Resources (POWDER): Description Resources. W3C Recommendation, 1 September 2009. (See http://www.w3.org/TR/powder-dr/#appD.)
rdf-concepts: Graham Klyne and Jeremy J. Carroll, editors. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, 10 February 2004. (See http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.)
rfc2616: R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, IETF, 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
rfc3870: A. Swartz. application/rdf+xml Media Type Registration. RFC 3870, IETF, 2004. (See http://www.ietf.org/rfc/rfc3870.txt.)
rfc3986: T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)
rfc5988: M. Nottingham. Web Linking. RFC 5988, IETF, 2010. (See http://www.ietf.org/rfc/rfc5988.txt.)
webarch: Ian Jacobs and Norman Walsh, editors. Architecture of the World Wide Web, Volume One. W3C Recommendation, December 2004. (See http://www.w3.org/TR/webarch/.)
xhtml-ns: XHTML namespace document. (See http://www.w3.org/1999/xhtml.)

End Notes

[1]

In the philosophy of language, meaning and reference are distinct properties of linguistic tokens. For example, when the word "now" is used at two different times, it refers to different times in the two instances, without any change in meaning. Meaning and reference coincide in the case of proper names.

URIs are used in a variety of ways. When a URI appears to refer to something, especially in a declaration or statement that says that what it refers to has some type or has properties with particular values, this is a referential use of the URI.

This document does not take a stand as to whether uses of a URI as a hypertext link, XML namespace indicator, HTTP request URI, or HTTP Location: or Link: target are referential.

It is customary to speak of a URI as "identifying" a "resource". Although "identification" is one kind of meaning, this document makes no assumption regarding the relation between the resource that a URI "identifies" and what the URI refers to (when it does both). (One might hope, however, that except in rare cases a URI would refer to what it identifies.) Depending on what is meant by "resource" it may or may not be possible to refer to something that isn't a resource, but no assumption is made here that what a URI refers to is a resource.

[2]

Quoting the media type registration for application/rdf+xml: [rfc3870]

In RDF, the thing identified by a URI with fragment identifier does not necessarily bear any particular relationship to the thing identified by the URI alone. This differs from some readings of the URI specification, so attention is recommended when creating new RDF terms which use fragment identifiers. More details on RDF's treatment of fragment identifiers can be found in the section "Fragment Identifiers" of the RDF Concepts document.

When a URI with local identifier occurs in an RDF graph, the following passage from RDF Concepts [rdf-concepts] applies to its meaning:

"a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an RDF URI reference consisting of an hashless URI and a fragment identifier, the fragment identifer identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the hashless URI component."

This simply reinforces the representation consistency directive quoted just above. However, if there is no application/rdf+xml representation (i.e. such a retrieval result is not allowed), this puts any URI meaning coming from, say, RDFa or some XML-based MIME type registration out of reach of RDF. To make this consistent with UDDP 1.0 we must assume that when a URI with local identifier is used in an RDF graph, an equivalent application/rdf+xml representation is authorized for the stem URI, even if such a representation is never delivered in an actual retrieval request.

Editorial note
Is there talk of amending this passage when RDF Concepts gets revised?