This document is also available in these non-normative formats: XML.
Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document specifies a set of circumstances under which a document ("representation") is to be treated as documentation for the meaning of a given URI. The specification is meant to be useful for coordinating uses of the URI among its "URI owner(s)" and other agents. The specification is mainly targeted to RDF and linked data, but is intended to be applicable to a range of other applications as well.
There is no intention that the set of specified circumstances should be either objectively "authoritative" or exclusive of other sources of URI documentation.
This document is an editor's copy that has no official standing.
It is intended that some successor to this document will supersede the W3C Technical Architecture Group's so-called "httpRange-14 resolution" [issue-14-resolved].
Achieving consensus around [issue-14-resolved] is likely to require amending it. The main purpose of the present version of this document is to provide a baseline against which change proposals may be prepared. To that end this version is limited to recording the editor's attempt to interpret the so-called "httpRange-14 resolution" [issue-14-resolved] against the background of applicable specifications.
Comments are welcome and should be posted to the publicly archived TAG mailing list email@example.com (archive).
The TAG has not yet determined what editorial track this document will take. It might end up on the Architectural Recommendation track (discussion here), it could end up as a TAG Finding or Note, or it could be transferred to a different venue. A decision will be reached at some point following the collection of change proposals.
1.1 Historical note
2.1 URI documentation
2.2 Representations and nominal representations
2.3 Representations that carry URI documentation
3 Probe URI with local identifier
3.1 Example: URI documentation via RDF graph
3.2 Example: URI documentation via markup
4 Probe URI lacking local identifier
4.1 General case
4.2 Information resource reference (probe URI is retrieval-enabled)
4.3 Discovery via redirection
5 Inconsistency risks
5.1 Transactional inconsistency
5.2 Clients and servers that use incompatible practices
5.3 Inconsistency with the URI scheme
6 Comparison with the TAG resolution
7 Disclaimer regarding the meaning of "meaning"
10 Change log
This document gives a set of conditions under which a particular document ("representation" in the sense of [rfc3986]) might be considered valid, current, and/or canonical documentation for the meaning of a particular URI. Such a representation will be called a "nominal URI documentation carrier" for the URI.
The purpose of defining which representations are to be considered nominal URI documentation carriers is to coordinate uses of the URI. If all parties in a communication scenario agree on which representations are nominal URI documentation carriers and which ones are not, that will help to promote agreement on meaning and therefore correct interoperation.
This specification can be seen as inducing a protocol, namely the set of implied methods from existing protocols (HTTP, FTP, etc.) that allow a client to obtain a nominal URI documentation carrier for a given probe URI.
The definition of "nominal URI documentation carrier for a URI" records a best effort interpretation of [rfc3986] and the so-called "httpRange-14 resolution" [issue-14-resolved], with [httpbis-2] and [webarch] as background. The "Cool URIs for the Semantic Web" note [cooluris] is another description of the same architecture.
The uses targeted here are those involving notations such as RDF [rdf-concepts], and languages layered on RDF, in which declarative URI meaning figures centrally, but other languages, notations, and modes of "meaning" are not excluded.
After a review of the history of the principal controversy around URI documentation discovery, there is a discussion of the central concepts of URI documentation and "representation". The following two sections give discovery methods for URIs with and without a hash sign, respectively. The document concludes with discussion of inconsistency risks resulting from content negotiation, change over time, and other sources, and a comparison of the present interpretation with the literal text of [issue-14-resolved].
This document is part of a conversation first started circa 2002 around the declarative meaning of "hashless" URIs. At the time two different conventions were proposed for the declarative use of URIs. One convention, inherited from the hypertext Web, was for a hashless URI to refer to the document-like entity ("information resource") served at that URI. This convention collided with a separate desire to use a hashless URI to refer to an entity described by that information resource. Which use would, or should, have priority was not clear at the time. After deliberation, the TAG adopted its so-called httpRange-14 resolution [issue-14-resolved], asking "the community" to use hashless URIs to refer to their information resources, not to what those information resources describe (except when the resource is self-describing). An exception allowed a hashless URI to refer according to a description in the case where no information resource was served at the URI, as signalled by a 303 HTTP response to a GET request.
A parallel question for URIs with fragment identifier arose, but was easier to settle, since in any given case there was no ambiguity: either the URI was tied to a description, or it was tied to a document fragment, the choice being dictated by the media type of the response to a retrieval request on the "stem" URI (without the fragment identifier). In particular, if a media type specifies an RDF equivalence, then an equivalent RDF graph's use of the fragment identifier bears on its meaning.
With the growth of linked data [linked-data], some resistance to the architecture has been expressed. Reports of hash URIs being unacceptable in some situations, coupled with performance difficulties arising from the 303 redirection and the impossibility of deploying 303 redirects at all on many Web hosting services, have led to the current reexamination of the architecture. Some of the criticisms of the two approaches, and possible alternatives to them, are captured in [issue-57-report].
The punchline — specifically the circumscription of a small number of general URI documentation discovery methods — can be stated concisely once we have established a framework.
URI documentation is information that documents the intended meaning of a particular probe URI. URI documentation may be transmitted along with other information, such as documentation for other URIs, without any particular demarcation between the documentation for that URI and the other information. A typical example might be an ontology document in which one finds integral documentation for a set of URIs. The ontology document carries URI documentation for a number of URIs at the same time.
URI documentation typically takes the form of a set of statements in which the probe URI occurs. The statements, by saying what is supposed to be true of the entity to which the probe URI refers, are meant to communicate the probe URI's intended meaning - what that entity is. There is always a risk that as a result the URI means nothing at all, or that it could refer to more than one thing. Treating such situations is outside the scope of this specification, which only addresses the discovery of URI documentation, not its interpretation.
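As a minimal illustration of the idea that URI documentation takes the form of statements in which the probe URI occurs, the following Python sketch selects such statements from a larger collection. The representation of statements as (subject, predicate, object) tuples, the function name statements_about, and the example.org URIs are assumptions made for this example only; they are not prescribed by this specification.

```python
def statements_about(probe_uri, statements):
    """Return the statements in which probe_uri occurs in any position.

    Each statement is assumed to be a (subject, predicate, object) tuple.
    """
    return [s for s in statements if probe_uri in s]


# Hypothetical ontology content documenting several URIs at once.
ONTOLOGY = [
    ("http://example.org/ont#Person", "rdf:type", "rdfs:Class"),
    ("http://example.org/ont#name", "rdfs:domain", "http://example.org/ont#Person"),
    ("http://example.org/ont#Place", "rdf:type", "rdfs:Class"),
]

person_docs = statements_about("http://example.org/ont#Person", ONTOLOGY)
```

The sketch only selects candidate statements; deciding what they imply about the referent is interpretation, which is outside the scope of this specification.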
This specification rests on Web retrieval, as defined in [rfc3986], so we will need precise terminology for talking about Web retrieval.
The word "representation" is used in two ways here, as in [rfc3986] and elsewhere, as a type and as a relationship.
As a type, "representation" is a term of art meaning an octet sequence (the "content") together with metadata, such as media type, that directs the interpretation of the content. In [rfc2616] the word "entity" is used for this. In the discussion that follows, "representation" on its own should always be understood this way (that is, as a type), following the usage in [webarch] and [httpbis-2].
As a relationship, "representation of" is not clearly defined in [rfc3986] and we take it to be undefined, perhaps possessing an ordinary language meaning. In a successful retrieval using a URI the server is saying that the retrieved representation is a representation of the resource identified by that URI. So we will call a representation that results (or could result, given a suitable request) from a successful, authoritative retrieval request using a URI a "nominal representation of" the resource identified by the URI, or, to reduce the amount of verbiage, a nominal representation "from" the URI.
A representation is hereby defined to carry URI documentation for a given URI if it contains the URI documentation (with or without other information), the syntax and semantics of the documentation are as determined by the media type of the representation, and the documentation occurs unqualified. It is difficult to define "unqualified" precisely for all media types, but we generally mean by this that the documentation is given "sincerely", not quoted, conditionally, or modally. That is, if D is URI documentation and a carrier says in effect that D is not or might not be true, then D, although it occurs in the carrier, is not considered to be carried by it. (E.g. documentation that occurs in an XML literal inside of an application/rdf+xml representation is not unqualified, and therefore not "carried" by it under the present definition.)
A "URI documentation carrier" for a URI is a representation that carries URI documentation that bears on the meaning of that URI. Applying the adjective "nominal" is a technicality that signifies that being a URI documentation carrier for the URI is expected according to this specification, but that it might not actually be one (for example, the representation might be empty, or it might contain information, but not information that helps to document the URI, perhaps as the result of a mistake).
Specifying an answer to the question "When is a given representation a nominal URI documentation carrier for a given URI?" is the purpose of this document.
The determination of URI semantics according to the content of a representation may be made either directly in media type documentation or by a chain of normative references starting from it. For example, for media type application/rdf+xml, this is accomplished by language in the media type registration [rfc3870] and normative references therein. For media type application/xhtml+xml, delegation is accomplished, among other ways, via XHTML's XML namespace document [xhtml-ns], which leads one (via the RDFa specification) to the algorithm for extracting an RDF graph from the XHTML markup. The RDF graph then can provide URI documentation.
It is not intended that a nominal URI documentation carrier is either objectively "authoritative" or exclusive of other sources of URI documentation.
The syntax stem#id has come to be used not just for document fragment references as originally specified, but for any reference determined relative to content found at stem. Therefore the present document refers to id in stem#id as a 'local identifier' rather than a 'fragment identifier'. The two expressions may be considered synonymous but with different connotations.
When a URI is of the form stem#id (a 'hash' URI), a nominal representation from stem is a nominal URI documentation carrier for the probe URI.
Normal user-agent behavior implements this part of the specification, as ordinary retrieval behavior for stem#id involves a retrieval using stem.
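The split of a 'hash' URI into its stem and local identifier can be sketched in Python with the standard library's urllib.parse.urldefrag. The function name split_hash_uri and the example URI are assumptions made for illustration; the retrieval using the stem is left out.

```python
from urllib.parse import urldefrag


def split_hash_uri(probe_uri):
    """Split a probe URI of the form stem#id into (stem, local identifier).

    The local identifier is the empty string when the URI has no '#'.
    """
    stem, local_id = urldefrag(probe_uri)
    return stem, local_id


stem, local_id = split_hash_uri("http://example.com/ontology#Person")
```

A client would then perform a retrieval using stem; a nominal representation obtained that way is a nominal URI documentation carrier for the original probe URI.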
URI documentation can be provided in the form of an RDF graph. Sometimes an RDF graph can be specified by the media type, such as application/rdf+xml, application/xhtml+xml (for RDFa), or text/turtle, of a nominal URI documentation carrier. This is true for all URIs, but it is called out specifically for 'hash' URIs in the media type registration for application/rdf+xml: [rfc3870]
In RDF, the thing identified by a URI with fragment identifier does not necessarily bear any particular relationship to the thing identified by the URI alone. This differs from some readings of the URI specification, so attention is recommended when creating new RDF terms which use fragment identifiers. More details on RDF's treatment of fragment identifiers can be found in the section "Fragment Identifiers" of the RDF Concepts document. 
A local identifier can also get its meaning via format specifications that specify that certain local identifiers refer to document parts (fragments). For example, the @name attribute in HTML, or the @xml:id attribute in XML, are defined per their respective media types to provide 'anchors', and thereby to document that the local identifier refers to the enclosing element. The relevant markup then acts as URI documentation for the corresponding URI.
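As a rough sketch of how markup can document local identifiers, the following Python code collects anchors from HTML text using the standard library's html.parser module. Treating both the name attribute on a elements and the id attribute on any element as anchors is a simplifying assumption of this example, as are the class and function names; the normative rules are those of the relevant media type.

```python
from html.parser import HTMLParser


class AnchorCollector(HTMLParser):
    """Record which element each local identifier (anchor) is attached to."""

    def __init__(self):
        super().__init__()
        self.anchors = {}

    def handle_starttag(self, tag, attrs):
        for key, value in attrs:
            if not value:
                continue
            # 'name' is only treated as an anchor on <a>; 'id' on any element.
            if key == "id" or (key == "name" and tag == "a"):
                self.anchors.setdefault(value, tag)


def anchors_in(html_text):
    """Return a dict mapping local identifier -> enclosing element name."""
    collector = AnchorCollector()
    collector.feed(html_text)
    return collector.anchors


found = anchors_in('<p id="intro">Hi</p><a name="sec2">x</a><meta name="author">')
```

Under these assumptions, a probe URI stem#intro would be documented as referring to the p element, while the meta element's name attribute is not treated as an anchor.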
If the URI scheme of the probe URI is 'http' or 'https', the URI has a nominal URI documentation carrier in the following ways. The cases are not exclusive (e.g. both GET/200 and Link: may yield nominal URI documentation carriers for the URI).
If a nominal representation from the probe URI includes a URI documentation link (see following) with target V in its response to the retrieval request, then nominal representations from V are nominal URI documentation carriers for the probe URI.
There are two ways to locate a URI documentation link in an HTTP response:
303 See Other Location: http://example.com/uri-documentation
200 OK Link: <http://example.com/uri-documentation>; rel="describedby"
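The two ways of locating a URI documentation link can be sketched as a small Python function over an already-received HTTP response. The function name documentation_link, the dict-of-headers input, and the simplified Link header parsing (which does not handle commas or semicolons inside quoted parameter values) are assumptions of this sketch, not part of this specification.

```python
import re

# One link-value: <target> followed by its ;param=value pairs.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";,]*)"?', re.IGNORECASE)


def documentation_link(status, headers):
    """Return the target V of a URI documentation link, or None if absent."""
    headers = {name.lower(): value for name, value in headers.items()}
    if status == 303:
        # 303 See Other: the Location header gives the target.
        return headers.get("location")
    if status == 200:
        # 200 OK: look for a Link header with rel="describedby".
        for target, params in _LINK_VALUE.findall(headers.get("link", "")):
            rel = _REL_PARAM.search(params)
            if rel and "describedby" in rel.group(1).lower().split():
                return target
    return None


via_303 = documentation_link(
    303, {"Location": "http://example.com/uri-documentation"})
via_link = documentation_link(
    200, {"Link": '<http://example.com/uri-documentation>; rel="describedby"'})
```

In both cases the returned value is the target V; nominal representations from V are then nominal URI documentation carriers for the probe URI.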
Normal user-agent behavior partially implements this part of the specification, as retrieval yielding a 303 See Other response is ordinarily followed by a retrieval using the URI in the documentation link.
In the 303 case, the term "landing page" is sometimes applied to the redirect target document - it is "where you land" when you attempt a retrieval.
There is no type restriction on what the probe URI refers to or "identifies" in this case. It can refer to whatever the URI documentation specifies, which could be (and often is) an "information resource" (see below); a URI documentation link in itself does not say that the referent is not an "information resource". (But see below for the case when retrieval is successful.)
This section is the controversial one: the (a) clause of [issue-14-resolved]. Controversy surrounds the following:
The editor is not aware of anyone who is happy with the status quo, which is what is presented here. Those desiring a change (that would be everyone) should submit a change proposal to modify or replace this section. Change proposals will be considered on an equal footing with this baseline.
The editor's best attempts so far to untangle the controversies may be found in [issue-57-report] and [generic]. [issue-57-report] is intended to be useful as an overview of the design space and a source of ideas for change proposals, and it provides a basis for evaluating potential change proposals.
If there is a nominal representation Z from the probe URI, then the client may consider this state of affairs as equivalent to the existence of a nominal URI documentation carrier for the probe URI that says that Z is a current representation of the resource identified by the probe URI, and, moreover, that the identified resource is an "information resource" (see below).
There can be many nominal representations over time or under different circumstances, but that makes this no less true. It is being said in this case that all of the nominal representations are current representations of the identified resource.
The following passage in [webarch] introduces the term "information resource":
It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.” [...] Other things, such as cars and dogs (and, if you've printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information.
The determination of which characteristics of any given resource are to be considered "essential," and what it means for any given essential characteristic to be "conveyable in a message," is left up to the reader, but some idea of what [webarch] intends is provided by the surrounding explanation and examples.
Being a nominal representation from a URI does not in itself qualify a representation as being a nominal URI documentation carrier for that URI.
For purposes of discovery, redirect chains (HTTP status 301, 302, and 307) are often followed. That is, if retrieval is requested using a URI U1, and a retrieval using U1 yields a non-303 redirect to U2, and a retrieval request using U2 succeeds with a result R, then R is consequently considered a nominal URI documentation carrier for U1. Note that this practice creates a reliance on U2's URI owner as well as on U1's, increasing chances of failure at the application level.
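The redirect-following practice can be sketched in Python as follows. To keep the example self-contained, retrieval is abstracted as a caller-supplied fetch function returning a (status, headers) pair; that function, the name follow_redirects, the hop limit, and the example URIs are all assumptions of this sketch.

```python
from urllib.parse import urljoin

# Non-303 redirect statuses that are often followed for discovery.
FOLLOWED = {301, 302, 307}


def follow_redirects(u1, fetch, max_hops=5):
    """Follow non-303 redirects starting at u1.

    Returns (final_uri, final_status, chain), where chain lists every URI
    consulted; each entry adds a URI owner on whom the client relies.
    """
    chain = [u1]
    uri = u1
    for _ in range(max_hops):
        status, headers = fetch(uri)
        if status not in FOLLOWED:
            return uri, status, chain
        uri = urljoin(uri, headers["Location"])
        chain.append(uri)
    raise RuntimeError("too many redirects")


# A canned stand-in for real retrieval, for illustration only.
RESPONSES = {
    "http://u1.example/thing": (301, {"Location": "http://u2.example/thing"}),
    "http://u2.example/thing": (200, {}),
}
final_uri, final_status, chain = follow_redirects(
    "http://u1.example/thing", lambda uri: RESPONSES[uri])
```

The returned chain makes the added reliance visible: every URI in it beyond U1 involves another URI owner whose behavior affects the result.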
What happens if there are multiple URI documentation carriers (nominal or otherwise), and they provide inconsistent information, is outside the scope of this specification. This must be recognized as a risk by all users of this specification.
Potential sources of conflict arise in the following situations:
Servers should endeavor to reduce the variability in URI documentation among these multiple sources, in order to maximize the utility of URI documentation discovery to receivers. Receivers should approach nominal URI documentation carriers with skepticism and seek independent assurance of their consistency with what their interlocutors have consulted.
Consider the situation where a sender S composes a message (or document, or "representation") M containing a URI U, and sends it to a receiver R (or leaves it somewhere for R to find). S may choose to rely on a nominal URI documentation carrier for U to decide how to use U in composing M, and R may choose to rely on a nominal URI documentation carrier for U as a way of understanding the use of U in M after receiving M.
It is possible that this specification will yield different representations as nominal URI documentation carriers in the two instances. Because of this, S and R should rely on this specification for URI meaning coordination only when there is a reasonable expectation that the meaning of U (as reflected in the retrieved URI documentation, and to the extent it is needed in context) is equivalent, or is only inconsequentially different, between the two nominal URI documentation carriers involved in the transaction. This is under the control of the agent(s) controlling retrieval at the probe URI (or its stem), but only partially under the sender's control (they can at least make sure they consulted a fresh version), and hardly at all under the receiver's control, so use of this specification entails trust between the sender and the publisher, and between the receiver and both other agents. If there is a discovery link then yet another agent might be involved.
A typical transaction might be:
If P writes D in between the two reads, or the second read otherwise yields a different nominal representation, then S and R's coordination attempt may fail. M acts in some ways as a partial cache of D. Note that R might read M before S does, and in effect cache it; the same problem arises in this case.
Many document and message formats include a specific indicator of the protocol being used. For example, every HTTP/1.1 request or response contains the string "HTTP/1.1" in a fixed location, and an XML document typically starts with an XML declaration giving the XML version number. Such an indicator is meant to convey that the originating agent is respecting some particular specification, and urges receiving agents to either understand according to that specification, or reject as not understood. This specification combines elements of existing protocols and formats in a manner that is largely compatible with current practice, so the risk of inconsistency is low. However, there is a failure case here when the following conditions hold:
In case this combination of circumstances is considered important, a possible change proposal might therefore be to revisit the assertion "risk of inconsistency is low" and introduce changes to the definition of "nominal URI documentation carrier" to avoid error in cases in which these conditions would otherwise hold.
URI meaning is subject to normative specifications such as RFC 3986 [rfc3986], applicable URI scheme registrations, and media type registrations. The purpose of URI documentation is to provide URI-specific information that goes beyond what the normative specifications say, while retaining compatibility with them. URI documentation should not be written that is inconsistent with constraints imposed by these specifications. The http: scheme imposes no such constraints, but other schemes such as mailto: do.
The above gives an interpretation of the TAG resolution [issue-14-resolved]. This section lists some important points of comparison between the preceding and [issue-14-resolved].
For reference, the critical part of [issue-14-resolved] is reproduced below:
a) If an "http" resource responds to a GET request with a 2xx response, then the resource identified by that URI is an information resource;
b) If an "http" resource responds to a GET request with a 303 (See Other) response, then the resource identified by that URI could be any resource;
c) If an "http" resource responds to a GET request with a 4xx (error) response, then the nature of the resource is unknown.
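Read literally, the three clauses amount to a mapping from the status of a GET response to what may be concluded about the identified resource. The following Python sketch records that literal reading only; the function name and the returned strings are assumptions of this example, and statuses the resolution does not mention yield None.

```python
def nature_per_resolution(status):
    """Map a GET response status to the conclusion in clauses (a)-(c)."""
    if 200 <= status <= 299:
        return "information resource"  # clause (a)
    if status == 303:
        return "any resource"          # clause (b)
    if 400 <= status <= 499:
        return "unknown"               # clause (c)
    return None                        # not addressed by the resolution


conclusions = {code: nature_per_resolution(code) for code in (200, 303, 404, 500)}
```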
The term '"http" resource' is used in [issue-14-resolved] but not defined there; it seems to mean a resource that someone uses an http: (or possibly https:) URI to "identify". The distinction in kind between what is identified and what could be appears to be immaterial, especially in light of (b).
The purpose of a 2xx HTTP status code is to signal successful retrieval (per [rfc3986]), but the HTTP protocol is only one way to perform a retrieval. In order to harmonize this specification with the architecture articulated in [rfc3986], the editor has therefore made the obvious generalization from the resolution's narrow scope of the HTTP protocol to retrieval in general.
The (b) clause does not say anything about which resource is "identified", but an informal practice has emerged whereby the See Other link is to documentation meant to establish what the probe URI means - that is, the URI is understood to "identify" according to that URI documentation. This interpretation is corroborated by [httpbis-2], section 7.3.4, which says
The Location URI indicates a resource that is descriptive of the target resource, such that the follow-on representation might be useful to recipients without implying that it adequately represents the target resource.
As an obscure technicality, because nobody is authoritative for what is or isn't an "information resource", the (a) clause can only be interpreted to mean that the resource is nominally (i.e. said to be) an information resource, not that it is one.
One concern here is that too imprecise or cavalier a treatment of meaning may lead to mistakes. An opposing concern is that going into as much detail as is given here is a distraction. Some decision will have to be reached on whether and how to include the material below.
Henry Thompson gives the following advice:
This document does not define "meaning", "reference", or "identification" in any absolute sense. It only specifies a particular manner of coordination that may be used by agents that choose to use it. The word "meaning" is meant to be broad enough to encompass a wide variety of uses.
"Meaning" in general encompasses all such manners of use, with different facets surfacing in different contexts.
Larry Masinter, Henry S. Thompson, Ashok Malhotra, and other TAG members gave valuable advice on drafts of this document. Many of the ideas grew out of work done by the TAG's AWWSW Task Group.
When a URI with local identifier occurs in an RDF graph (not just the graph found in a nominal representation), the following passage from RDF Concepts [rdf-concepts] applies to its meaning:
"a URI reference in an RDF graph is treated with respect to the MIME type application/rdf+xml. Given an RDF URI reference consisting of an hashless URI and a fragment identifier, the fragment identifer identifies the same thing that it does in an application/rdf+xml representation of the resource identified by the hashless URI component."
This simply reinforces the representation consistency directive quoted previously. If there is no application/rdf+xml nominal representation, this puts any URI meaning that comes from, say, RDFa or some XML-based MIME type registration out of reach of RDF. To reconcile [rdf-concepts] with [rfc3986] we must assume that when a URI with local identifier is used in an RDF graph specified according to the media type, there is a potential equivalent application/rdf+xml representation defining all of the local identifiers, even if such a representation is never delivered in a retrieval response.
Is there talk in the RDF WG of amending this passage when RDF Concepts gets revised?
The following language from [rfc3986] bears on the semantics of local identifiers.
The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment's format and resolution is therefore dependent on the media type of a potentially retrieved representation, even though such a retrieval is only performed if the URI is dereferenced.
This text is somewhat confusing concerning the distinction between what is retrieved and what is identified, so we propose the following interpretation:
The semantics of a local identifier are defined by the set of nominal representations that might result from a retrieval action using the primary resource's URI. The retrievals' formats and therefore the identity of the secondary resource are therefore dependent on the media types of potentially retrieved representations, even though such retrievals are only performed if the URI is dereferenced.
A consequence of this is that if there are multiple simultaneous representations then they need to be consistent in what they convey about a local identifier, if it is to be meaningful beyond a single representation. That is, if two nominal representations assign meanings to a given local identifier, the meanings should be consistent:
If the primary resource has multiple representations, as is often the case for resources whose representation is selected based on attributes of the retrieval request (a.k.a., content negotiation), then whatever is identified by the fragment should be consistent across all of those representations. Each representation should either define the fragment so that it corresponds to the same secondary resource, regardless of how it is represented, or should leave the fragment undefined (i.e., not found). [rfc3986]
For URI definition discovery in the presence of content negotiation to behave correctly for a given local identifier, all retrievable representations should define the local identifier, consistently across these representations.
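A publisher could check this condition mechanically along the following lines. This Python sketch assumes that each retrievable representation has already been reduced to a mapping from local identifier to some comparable summary of its meaning; that data model, the function name check_local_id, and the example values are assumptions made purely for illustration.

```python
def check_local_id(local_id, representations):
    """Check one local identifier across all retrievable representations.

    representations maps a media type to {local identifier: meaning}.
    Returns which media types leave the identifier undefined and whether
    the representations that do define it agree with one another.
    """
    missing = sorted(
        media_type for media_type, defs in representations.items()
        if local_id not in defs)
    meanings = {
        defs[local_id] for defs in representations.values()
        if local_id in defs}
    return {"missing_from": missing, "consistent": len(meanings) <= 1}


# Hypothetical variants served under content negotiation for one stem.
VARIANTS = {
    "application/rdf+xml": {"Person": "class of persons"},
    "text/turtle": {"Person": "class of persons"},
    "text/html": {},
}
report = check_local_id("Person", VARIANTS)
```

Here the two RDF variants agree, but the HTML variant leaves the identifier undefined, so discovery would depend on which representation a client happens to receive.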
The topic of representation consistency is also covered in [webarch] section 3.2. All of the considerations for fragment identifiers also apply in the hashless URI case (303, Link:), when there are RDF graphs or other mechanisms for documenting hashless URIs involved.
In the philosophy of language, meaning (i.e. semantics) and reference are distinct properties of linguistic tokens. For example, when the word "now" is used at two different times, it refers to different times in the two instances, without any change in meaning. Meaning in context determines reference. Meaning and reference coincide in the case of proper names.
According to [rfc3986], the semantics of a given URI is supposed to be uniform across contexts of use.
When a URI appears to refer to or "identify" something, especially in a declaration or statement that says that what it refers to has some type or has properties with particular values, this is a referential use of the URI. Uses of URIs in RDF are referential. This document does not take a stand as to whether uses of a URI as a hypertext link target, XML namespace indicator, HTTP request URI, or HTTP header value (as in Location:) are referential.
It is customary to speak of a URI as "identifying" a "resource". Although "identification" is related to meaning, this document makes no particular assumption regarding the relation between what a URI "identifies" and what the URI refers to. (One might hope, however, that except in rare cases a URI would refer to what it identifies.)
Depending on what is meant by "resource" it may or may not be possible to refer to and/or identify something that isn't a resource, but this question is outside the scope of this document.