Understanding URI Hosting Practice as Support for Documentation Discovery

Editor's Draft 2 February 2012

This version:
http://www.w3.org/2001/tag/doc/uddp-20120202/
Latest version:
http://www.w3.org/2001/tag/doc/uddp/
Previous version:
http://www.w3.org/2001/tag/2011/12/uddp/
Editor:
Jonathan A. Rees <rees@mumble.net>

This document is also available in these non-normative formats: XML.


Abstract

This document defines version 1.0 of the URI Documentation Discovery protocol, or "UDDP 1.0". The protocol is a way to ask the agent that controls resolution behavior for a URI what it thinks the URI means or should mean.

Status of this Document

This document is an editor's copy that has no official standing.

It is intended that some successor to this document will supersede the Technical Architecture Group's so-called "httpRange-14 resolution" [issue-14-resolved].

The main purpose of the present version of this document is to provide a baseline against which change proposals may be prepared. To that end this version is limited to recording the editor's attempt to interpret the so-called "httpRange-14 resolution" [issue-14-resolved] against the background of applicable specifications.

The TAG has not yet determined what editorial track this document will take. It might end up on the Architectural Recommendation track, or it could just end up as a Finding or Note, or it could be transferred to a different venue. A decision will be reached at some point following the collection of change proposals.

Table of Contents

1 Introduction
    1.1 Historical note
    1.2 URI documentation
    1.3 Retrieval
2 Probe URI with local identifier
    2.1 General case
    2.2 Document fragment reference case
3 Probe URI lacking local identifier
    3.1 General case (probe URI is not retrieval-enabled)
    3.2 Information resource reference (probe URI is retrieval-enabled)
4 Signalling uses of the protocol
5 Stability considerations
6 Comparison with the TAG resolution
7 Acknowledgments
8 References
9 Change log

End Notes


1 Introduction

This document defines the URI Documentation Discovery 1.0 protocol, or "UDDP 1.0" for short. The protocol is to be used for communication between an agent who controls resolution behavior for some URI (the "probe" URI) and wants to establish a meaning for the URI, and other agents interested in knowing that meaning. The protocol allows the first agent to provide documentation to the other agents that is supposed to establish the desired meaning of the URI.

General agreement on the meaning of a URI is useful for purposes of interoperability, since without agreement it becomes necessary for applications to understand different meanings in different spheres of use of a URI. Such context tracking, when it is possible, can be fragile, complex, and confusing.

The uses targeted here are those involving notations such as RDF [rdf-concepts] (and languages layered on RDF) in which declarative URI meaning figures centrally, but other languages and notations are not excluded.

Although framed as a new protocol, UDDP 1.0 in fact merely records a best effort interpretation of the so-called "httpRange-14 resolution" [issue-14-resolved], with the relevant specifications [rfc3986] and [rfc2616] and their interpreting documents [webarch] and [httpbis-2] as background. The "Cool URIs for the Semantic Web" note [cooluris] is another interpretation of the same architecture.

The document does not define "meaning", "reference", or "identification" in any absolute sense, nor is there any implication that URI documentation found via UDDP 1.0 is either "authoritative" or exclusive of other sources of URI documentation. [1]

Following a review of the history of the principal controversy around URI documentation discovery, this introduction concludes with brief discussion of the two central concepts of URI documentation and retrieval. The following two sections give discovery methods for URIs with and without a hash sign, respectively. The document concludes with discussion of "opting in" to the protocol, stability of documentation over time, and a comparison of the present interpretation with the literal text of [issue-14-resolved].

1.1 Historical note

This document is part of a conversation that began around 2002 concerning the declarative meaning of "hashless" URIs. At the time two different conventions were proposed for the declarative use of URIs. One convention, inherited from the hypertext Web, was for a hashless URI to refer to the document-like entity ("information resource") served at that URI. This convention collided with a separate desire to use a hashless URI to refer to an entity described by that information resource. Which use would, or should, have priority was not clear at the time. After deliberation, the TAG adopted its so-called httpRange-14 resolution [issue-14-resolved], asking "the community" to use hashless URIs to refer to their information resources, not to what those information resources describe (except when the resource is self-describing). An exception allowed a hashless URI to refer according to a description in the case where no information resource was served at the URI, as signalled by a 303 HTTP response to a GET request.

A parallel question for URIs with a fragment identifier arose, but was easier to settle, since in any given case there was no ambiguity: either the URI was tied to a description, or it was tied to a document fragment, the choice being dictated by the media type of the response to a retrieval request on the "stem" URI (without the fragment identifier). In particular, if a media type specifies an RDF equivalence, then the equivalent RDF graph's use of the fragment identifier bears on its meaning.

With the growth of linked data [linked-data], some resistance to the architecture has been expressed. Reports of hash URIs being unacceptable in some situations, coupled with performance difficulties arising from the 303 redirection and the impossibility of deploying 303 redirects at all on many Web hosting services, have led to the current reexamination of the architecture. Some of the criticisms of the two approaches are captured in [issue-57-report].

1.2 URI documentation

URI documentation is information whose purpose is to document the intended meaning of a particular probe URI. URI documentation may be transmitted along with other information, such as documentation for other URIs, without any particular demarcation between the documentation for that URI and the other information. A typical example might be an ontology document in which one finds integral documentation for a set of URIs. The ontology document serves as URI documentation for a number of URIs at the same time.

URI meaning is subject to normative specifications such as RFC 3986 [rfc3986] and applicable URI scheme registrations and media type registrations. The purpose of URI documentation is to provide URI-specific information that goes beyond what the normative specifications say, while retaining compatibility with them. URI documentation should not be written that is inconsistent with constraints imposed by these specifications.

URI documentation typically takes the form of a set of statements in which the probe URI occurs. The statements, by saying what is supposed to be true of the entity to which the probe URI refers, are meant to communicate the probe URI's intended meaning - what that entity is. There is always a risk that as a result the URI means nothing at all, or that it could refer to more than one thing; treating such situations is outside the scope of UDDP 1.0, which only addresses the delivery of URI documentation, not its interpretation.

1.3 Retrieval

As described in RFC 3986 [rfc3986], retrieval is an operation that starts with a URI and, when successful, yields a retrieval result (or "representation").

Retrieval may be requested using a variety of protocols and APIs. The GET request in the HTTP protocol [rfc2616] is one way to request retrieval. A 200 (OK) status in a response to a GET request indicates a successful retrieval. Other HTTP status codes, such as 304 (Not Modified), relate to retrieval behavior in ways documented by the protocol specification.

[Flush this -- people do this but it's not really entailed by the specs. But encourage a change proposal?] For purposes of this document, retrieval may entail following redirect chains (HTTP status 301, 302, and 307). That is, if retrieval is requested using a URI U1, and a GET specifying U1 yields a redirect to U2, and a retrieval request using U2 yields a result R, then R is the overall result of the retrieval request using U1.
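
The redirect-following rule above can be illustrated with a minimal sketch. It is not part of UDDP 1.0: the `retrieve` function, the `(status, headers, body)` response shape, the hop limit, and the example.com URIs are all assumptions made for illustration, and a canned table stands in for the network. Any non-200, non-redirect status (including 303) is treated here as an unsuccessful retrieval.

```python
from typing import Callable, Dict, Tuple

# Status codes this section treats as redirects to be followed during retrieval.
REDIRECT_STATUSES = {301, 302, 307}

# A response is modelled as (status, headers, body); this shape is hypothetical.
Response = Tuple[int, Dict[str, str], str]


def retrieve(uri: str, fetch: Callable[[str], Response], max_hops: int = 5) -> str:
    """Return the retrieval result for `uri`, following 301/302/307 redirects.

    Raises LookupError when retrieval does not succeed (e.g. 303, 404, 410)
    or when more than `max_hops` redirects are encountered.
    """
    current = uri
    for _ in range(max_hops + 1):
        status, headers, body = fetch(current)
        if status == 200:
            return body  # R is the overall result of the request using U1
        if status in REDIRECT_STATUSES and "Location" in headers:
            current = headers["Location"]  # continue the chain with U2
            continue
        raise LookupError(f"retrieval using {uri} failed with status {status}")
    raise LookupError(f"too many redirects starting from {uri}")


# A canned "server" standing in for the network (hypothetical URIs).
FAKE_WEB: Dict[str, Response] = {
    "http://example.com/u1": (301, {"Location": "http://example.com/u2"}, ""),
    "http://example.com/u2": (200, {}, "result R"),
    "http://example.com/thing": (303, {"Location": "http://example.com/doc"}, ""),
}


def fake_fetch(uri: str) -> Response:
    """Look up a canned response; unknown URIs yield 404."""
    return FAKE_WEB.get(uri, (404, {}, ""))


print(retrieve("http://example.com/u1", fake_fetch))  # prints "result R"
```

Passing the fetch function as a parameter keeps the sketch independent of any particular HTTP library; a real client would substitute its own request function.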

Like 410 (Gone) and various other HTTP status codes, a response to an HTTP GET request that has status code 303 (See Other) indicates an unsuccessful retrieval.

It is customary to speak of retrieval of a current representation of the resource "identified" by the URI, but this is not informative unless we know something about that resource and what constitutes correct "representation". To avoid confusion over the meaning of these two technical terms, we speak only of "retrieval using a URI", not of "retrieval of a current representation of the resource identified by a URI".

2 Probe URI with local identifier

Editorial note 
If the purpose of this document is to provide a baseline against which httpRange-14 change proposals are to be written, why talk about fragment identifiers at all? After all they're not mentioned in [issue-14-resolved]. Answer: If this is going to serve the intended community, and especially if it is to go to Rec track, it had better be complete by some criterion, and fragment ids are an important part of the documentation discovery story, especially as they provide an efficient alternative to the use of GET/303. Better to put the whole URI documentation discovery story under one roof, as opposed to requiring separate reports for hashful and hashless URIs.

The syntax 'stem#id' has come to be used not just for document fragment references but for any reference determined relative to content found at 'stem'. Therefore the present document refers to 'id' in 'stem#id' as a 'local identifier' rather than a 'fragment identifier'. The two expressions may be considered synonymous but with distinct connotations.

The purpose of this section is to reiterate what the URI specification [rfc3986] and related documents such as RDF Concepts [rdf-concepts] and the application/rdf+xml media type registration [rfc3870] already say on the subject.

The following language from [rfc3986] bears on the semantics of local identifiers:

The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced.

This text is somewhat confusing concerning the distinction between what is retrieved and what is identified, so we propose the following interpretation:

The semantics of a local identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The retrievals' formats and therefore the identity of the secondary resource are therefore dependent on the media types of potentially retrieved representations, even though such retrievals are only performed if the URI is dereferenced.

A consequence of this is that if there are multiple simultaneous representations then they need to be consistent in what they convey about a local identifier, if it is to be meaningful beyond a single representation. That is, if two retrieval results (representations) assign meanings to a given local identifier, the meanings must be consistent:

If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found). [rfc3986]

For URI documentation discovery to behave correctly in the presence of content negotiation, all retrievable representations should provide URI documentation for any given local identifier, which of course should be consistent across these representations.
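
As a rough illustration of this consistency requirement, the following sketch audits a set of representations for one local identifier. It rests on strong simplifying assumptions: each representation is modelled as a plain mapping from local identifier to a documentation string, and "consistent" is approximated by string equality, which is far cruder than any real comparison of meanings. The function name `audit_local_id` and the sample data are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each representation is modelled as a mapping from local identifier to the
# documentation it carries for that identifier (a hypothetical simplification).
Representation = Dict[str, str]


def audit_local_id(
    representations: Dict[str, Representation], local_id: str
) -> Tuple[Optional[str], List[str]]:
    """Return (shared documentation, media types that leave `local_id` undefined).

    Raises ValueError if two representations give different documentation.
    String equality is only a stand-in for real consistency of meaning.
    """
    defined = {
        media_type: rep[local_id]
        for media_type, rep in representations.items()
        if local_id in rep
    }
    missing = sorted(mt for mt in representations if mt not in defined)
    distinct = set(defined.values())
    if len(distinct) > 1:
        raise ValueError(
            f"inconsistent documentation for #{local_id}: {sorted(distinct)}"
        )
    documentation = next(iter(distinct), None)
    return documentation, missing


# Hypothetical representations keyed by media type.
reps = {
    "application/rdf+xml": {"alice": "a person named Alice"},
    "application/xhtml+xml": {"alice": "a person named Alice"},
    "text/html": {},
}
print(audit_local_id(reps, "alice"))
```

A non-empty list of media types in the second position flags representations that would fail to deliver documentation if they happened to be the ones selected by content negotiation.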

The topic of retrieval result (representation) consistency is also covered in [webarch] section 3.2.

2.1 General case

When the probe URI has the form 'stem#id', and the media type of the result of a retrieval using 'stem' establishes an association between the local identifier 'id' and URI documentation carried in the retrieval result, then the retrieval result should provide URI documentation for 'stem#id' per UDDP 1.0.

Normal HTTP user-agent behavior implements this part of UDDP 1.0, as ordinary retrieval behavior of 'stem#id' involves doing a retrieval using 'stem'.
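
The split of a probe URI into 'stem' and 'id' can be sketched with Python's standard `urllib.parse.urldefrag`. The helper name `split_probe` and the example URI are assumptions for illustration only.

```python
from typing import Tuple
from urllib.parse import urldefrag


def split_probe(probe_uri: str) -> Tuple[str, str]:
    """Split 'stem#id' into (stem, id); id is '' when there is no local identifier."""
    stem, local_id = urldefrag(probe_uri)
    return stem, local_id


# Retrieval is then requested using the stem only; the local identifier is
# interpreted against the retrieval result and is not sent in the request.
print(split_probe("http://example.com/ontology#Person"))
```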

When more than one distinct retrieval result is possible, every result must carry the URI documentation, since any of the possible results might be the one that is retrieved.

The delegation of local identifier semantics to the content of the retrieval result may be made either directly in a media type registration or by a chain of normative references. For media type application/rdf+xml, this is accomplished by language in the media type registration and normative references therein. For media type application/xhtml+xml, delegation is accomplished via the XML namespace document [xhtml-ns], which leads one (via RDFa) to the algorithm for extracting an RDF graph from the XHTML markup, and so on.[2]

2.2 Document fragment reference case

Local identifiers can also get their meaning in ways other than explicit documentation in a retrieval result, such as format specifications that specify how certain local identifiers "identify" document parts (fragments). For example, the @name attribute in HTML binds the local identifier to its enclosing HTML element (assuming consistency among representations). UDDP 1.0 has nothing in particular to add in this case beyond what [rfc3986] says (see above).

3 Probe URI lacking local identifier

3.1 General case (probe URI is not retrieval-enabled)

If a retrieval request using the probe URI leads to a URI documentation link (see following) with target V, where V is another URI, then according to UDDP 1.0 the results of retrieval requests using V carry URI documentation for the probe URI.

There are two ways to use the HTTP protocol to express a URI documentation link to a given target:

  1. using the Location: response header of a 303 See Other response [httpbis-2], e.g.
    303 See Other
    Location: http://example.com/uri-documentation
  2. using a Link: response header with link relation 'describedby' ([rfc5988], [powder]), e.g.
    200 OK
    Link: <http://example.com/uri-documentation>; rel="describedby"

Normal HTTP user-agent behavior implements the 303 part of UDDP 1.0, as a 303 response to a retrieval request using the probe URI is ordinarily followed by a retrieval using the URI given in the Location: header.

If both headers are present, they should give the same documentation, so that clients do not feel compelled to examine two sources.
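
A client-side reading of the two mechanisms might look like the sketch below. It is illustrative only: the function name `documentation_link` is hypothetical, header names are matched exactly as spelled (real HTTP header names are case-insensitive), and the Link: parsing is a deliberately simplified regular expression that does not handle every construct permitted by [rfc5988], such as commas inside quoted parameter values. When both mechanisms are present, this sketch simply prefers the 303 Location.

```python
import re
from typing import Dict, Optional

# Matches one link-value such as: <http://example.com/doc>; rel="describedby"
_LINK_VALUE = re.compile(r'<([^>]*)>([^,]*)')
# Matches the rel parameter, quoted or unquoted.
_REL_PARAM = re.compile(r';\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,]+))', re.IGNORECASE)


def documentation_link(status: int, headers: Dict[str, str]) -> Optional[str]:
    """Return the target V of a URI documentation link, or None if there is none."""
    # 1. A 303 See Other response: the Location target is the documentation link.
    if status == 303 and "Location" in headers:
        return headers["Location"]
    # 2. A Link header whose relation types include 'describedby'.
    for target, params in _LINK_VALUE.findall(headers.get("Link", "")):
        rel = _REL_PARAM.search(params)
        if rel:
            rel_types = (rel.group(1) or rel.group(2) or "").lower().split()
            if "describedby" in rel_types:
                return target
    return None


print(documentation_link(303, {"Location": "http://example.com/uri-documentation"}))
print(documentation_link(
    200, {"Link": '<http://example.com/uri-documentation>; rel="describedby"'}
))
```

Both calls print http://example.com/uri-documentation; a response with neither mechanism yields None.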

In the 303 case, the term "landing page" is sometimes applied to the redirect target document - it is "where you land" when you attempt a retrieval.

There is no type restriction on what the probe URI refers to or "identifies" in this case. It can refer to whatever the URI documentation specifies, which could be (and often is) an "information resource" (see below); a URI documentation link in itself does not say that the referent is not an "information resource". (But see below for the case when retrieval is successful.)

3.2 Information resource reference (probe URI is retrieval-enabled)

Editorial note 

This section is the controversial one: the (a) clause of [issue-14-resolved]. Controversy surrounds the following:

  1. the definitions of "identifies", "representation", and "information resource"
  2. whether the (a) clause follows from the HTTP specification [rfc2616] or has the status of a separate good practice or recommendation
  3. for any particular interpretation of these terms, whether the (a) clause is actually a good idea or not from an engineering point of view
  4. what should replace the (a) clause, if it's not a good idea

The editor is not aware of anyone who is happy with the status quo, which is what is presented here. Those desiring a change (that would be everyone) should submit a change proposal to modify or replace this section. Change proposals will be considered on an equal footing with this baseline.

The editor's best attempts so far to untangle the controversies may be found in [issue-57-report] and [generic]. Any change proposal targeting this section should address the design questions listed in section "Design space overview" of [issue-57-report].

When retrieval is enabled (i.e. can legitimately succeed) for a URI, and the URI identifies a resource that is "identified" by an http: URI, then each retrieval using a URI that "identifies" it is considered equivalent to URI documentation that says that the referent of the probe URI is an "information resource", making the retrieval result a current "representation" of that information resource.

Different retrievals can yield different results under different circumstances, but that makes this no less true. The information resource is "represented" by all such retrieval results.

The following passage in [webarch] introduces the term "information resource":

It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.”

The determination of which characteristics of any given resource are to be considered "essential," and how it is decided whether a characteristic is "conveyable," are left up to the reader.

If a (successful) retrieval response provides a URI documentation link (e.g. via a Link: header), the URI documentation retrieved in that way should be consistent with the URI "identifying" an information resource having the provided content as a current representation.

(Consult [rfc3986] to attempt an understanding of the terms "resource", "identification", and "representation". How a URI comes to "identify" a particular resource is not stated.)

4 Signalling uses of the protocol

Many protocols and formats include a specific indicator of the protocol being used. For example, every HTTP/1.1 request or response contains "HTTP/1.1" in a fixed location, and an XML document typically starts with an XML declaration giving the XML version number. UDDP 1.0 has no such indicator. However, UDDP 1.0 combines elements of existing protocols in a manner that is largely compatible with current practice. Therefore no indicator is necessary.

Editorial note 
There is a failure case here when the following conditions hold: (1) the client assumes that UDDP 1.0 is being used, (2) the server does not respect UDDP 1.0 through either ignorance or choice, (3) the server uses the URI to refer to something that is not the information resource at the URI, and (4) the fact that the server uses the URI in this way could end up mattering somehow to the client. In case this combination of circumstances is considered important, a possible change proposal might therefore be to revisit the assertion "no indicator is necessary" and introduce changes to avoid error in cases in which (1)-(4) would otherwise hold.

5 Stability considerations

Consider the situation where a sender S composes a message (or document, or "representation") M containing a URI U, and sends it to a receiver R (or leaves it somewhere for R to find). S may choose to use the UDDP 1.0 protocol to learn how to use U in M, and R may choose to use the UDDP 1.0 protocol as a way of understanding the use of U in M.

However, it is possible that the protocol will deliver different URI documentation in the two instances. Because of this, R should use UDDP 1.0 only when there is a reasonable expectation that the meaning of U (as reflected in the retrieved URI documentation) has remained the same, or is only inconsequentially different, across the time interval spanning S's use of UDDP 1.0 in composing M and R's use of UDDP 1.0 in interpreting M.

6 Comparison with the TAG resolution

The above gives an interpretation of the TAG resolution [issue-14-resolved]. What it says modifies and extends [issue-14-resolved] in certain ways. This section lists some important points of comparison between the above and [issue-14-resolved].

For reference, the critical part of [issue-14-resolved] is reproduced below:

a) If an "http" resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource;

b) If an "http" resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource;

c) If an "http" resource responds to a GET request with a 4xx (error) response, then the nature of the resource is unknown.
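
Read literally, the three clauses amount to a small decision table over HTTP status codes. The sketch below encodes only that literal reading, not the generalization made elsewhere in this document; the function name `classify` and the returned labels are hypothetical.

```python
def classify(status: int) -> str:
    """Map a GET response status to what the quoted resolution says about the resource."""
    if 200 <= status <= 299:
        return "information resource"  # clause (a): 2xx response
    if status == 303:
        return "any resource"          # clause (b): 303 See Other
    if 400 <= status <= 499:
        return "unknown"               # clause (c): 4xx error
    return "not addressed"             # the quoted text is silent on other codes


for code in (200, 303, 404, 500):
    print(code, classify(code))
```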

The term '"http" resource' is used in [issue-14-resolved] but not defined there; it seems to mean a resource that is "identified" by an http: (or possibly https:) URI. The distinction in kind between what "is" identified and what "could be" appears to be immaterial, especially in light of (b).

The purpose of a 2xx HTTP status code is to signal successful retrieval (per [rfc3986]), but the HTTP protocol is only one way to perform a retrieval. In order to harmonize UDDP 1.0 with the architecture articulated in [rfc3986], the editor has therefore made the obvious generalization from the resolution's narrow scope of the HTTP protocol to retrieval in general.

The (b) clause does not say anything about which resource is "identified", but an informal practice has emerged whereby the See Other link is to documentation meant to establish what the probe URI means - that is, the URI is understood to "identify" according to that URI documentation. This interpretation is corroborated by [httpbis-2], section 7.3.4, which says:

The Location URI indicates a resource that is descriptive of the target resource, such that the follow-on representation might be useful to recipients without implying that it adequately represents the target resource.

Editorial note 
Other issues potentially suitable for inclusion (or for change requests): (1) resource = thing i.e. everything is a resource, (2) interaction between redirects and discovery, (3) whether HR14 (a) and (b) really apply only to http: URIs or should be extended to arbitrary schemes.

7 Acknowledgments

Larry Masinter, Henry S. Thompson, and other TAG members gave valuable advice on drafts of this document. Many of the ideas grew out of work done by the TAG's AWWSW Task Group.

8 References

cooluris
Leo Sauermann and Richard Cyganiak. Cool URIs for the Semantic Web. W3C Interest Group Note, 03 December 2008. (See http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.)
generic
Jonathan A. Rees, editor. Generic resources and Web metadata. Editor's draft, W3C, 2012. (See http://www.w3.org/2001/tag/awwsw/ir/20120127/.)
httpbis-2
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, Y. Lafon (editor), and J. Reschke (editor). HTTP/1.1, part 2: Message Semantics. Revision of [rfc2616]. Work in progress, version 18, IETF, 2011. (See http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-18.)
issue-14-resolved
Roy Fielding. [httpRange-14] Resolved. Email to www-tag list, 2005. (See http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html.)
issue-57-report
Jonathan A. Rees, editor. Providing and discovering definitions of URIs. W3C editor's draft, 25 June 2011. (See http://www.w3.org/2001/tag/awwsw/issue57/20120202/.)
linked-data
Tim Berners-Lee. Linked Data. Design note, June 2009. (See http://www.w3.org/DesignIssues/LinkedData.html.)
powder
Phil Archer, Kevin Smith, and Andrea Perego, editors. Protocol for Web Description Resources (POWDER): Description Resources. W3C Recommendation, 1 September 2009. (See http://www.w3.org/TR/powder-dr/#appD.)
rdf-concepts
Graham Klyne and Jeremy J. Carroll, editors. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, 10 February 2004. (See http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.)
rfc2616
R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, IETF, 1999. (See http://www.ietf.org/rfc/rfc2616.txt.)
rfc3870
A. Swartz. application/rdf+xml Media Type Registration. RFC 3870, IETF, 2004. (See http://www.ietf.org/rfc/rfc3870.txt.)
rfc3986
T. Berners-Lee, R. Fielding, L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, IETF, 2005. (See http://www.ietf.org/rfc/rfc3986.txt.)
rfc5988
M. Nottingham. Web Linking. RFC 5988, IETF, 2010. (See http://www.ietf.org/rfc/rfc5988.txt.)
webarch
Ian Jacobs and Norman Walsh, editors. Architecture of the World Wide Web, Volume One. W3C Recommendation, December 2004. (See http://www.w3.org/TR/webarch/.)
xhtml-ns
XHTML namespace document. Namespace document, occasionally revised, retrieved 31 January 2012. (See http://www.w3.org/1999/xhtml.)

9 Change log


End Notes

[1]

In the philosophy of language, meaning (i.e. semantics) and reference are distinct properties of linguistic tokens. For example, when the word "now" is used at two different times, it refers to different times in the two instances, without any change in meaning. Meaning in context determines reference. Meaning and reference coincide in the case of proper names.

According to [rfc3986], the semantics of a given URI are supposed to be uniform across contexts of use.

When a URI appears to refer to or "identify" something, especially in a declaration or statement that says that what it refers to has some type or has properties with particular values, this is a referential use of the URI. Uses of URIs in RDF are referential. This document does not take a stand as to whether uses of a URI as a hypertext link target, XML namespace indicator, HTTP request URI, or HTTP header value (as in Location:) are referential.

It is customary to speak of a URI as "identifying" a "resource". Although "identification" is related to meaning, this document makes no particular assumption regarding the relation between what a URI "identifies" and what the URI refers to. (One might hope, however, that except in rare cases a URI would refer to what it identifies.)

Depending on what is meant by "resource" it may or may not be possible to refer to and/or identify something that isn't a resource, but this question is outside the scope of this document.

[2]

Quoting the media type registration for application/rdf+xml [rfc3870]:

In RDF, the thing identified by a URI with fragment identifier does not necessarily bear any particular relationship to the thing identified by the URI alone. This differs from some readings of the URI specification, so attention is recommended when creating new RDF terms which use fragment identifiers. More details on RDF's treatment of fragment identifiers can be found in the section "Fragment Identifiers" of the RDF Concepts document.

When a URI with local identifier occurs in an RDF graph, the following passage from RDF Concepts [rdf-concepts] applies to its meaning:

"a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml [RDF-MIME-TYPE]. Given an RDF URI reference consisting of an hashless URI and a fragment identifier, the fragment identifer identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the hashless URI component."

This simply reinforces the representation consistency directive quoted just above. If there is no application/rdf+xml representation (i.e. such a retrieval result is not allowed) this makes any URI meaning coming from, say, RDFa or some XML-based MIME type registration, out of reach of RDF. To reconcile [rdf-concepts] with [rfc3986] we must assume that when a URI with local identifier is used in an RDF graph specified according to the media type, there is a potential equivalent application/rdf+xml representation, even if such a representation is never delivered in a retrieval response.

Editorial note 
Is there talk in the RDF WG of amending this passage when RDF Concepts gets revised?