UriDefinitionDiscoveryProtocol

From W3C Wiki
Revision as of 14:56, 29 March 2012 by Dbooth


This document is drafted as an alternative to the document posted at http://www.w3.org/2001/tag/doc/uddp-20120229/ . It is written as a simple protocol that allows URI definitions to be conveyed from URI owners to interested clients. Its scope is limited to "http:" (and "https:") URIs, because this is what the httpRange-14 resolution[issue-14-resolved] addressed, and this is where the LOD community is experiencing controversy.

Highlights of this proposal:

  • It enables a URI owner to unambiguously convey any URI definition to an interested client.
  • It also allows non-URI owners to publish conflicting or complementary definitions, and allows them to refer by URI to each other's definitions.
  • It does not constrain whether or how a client must use that or any other URI definition, as that is the client's business.
  • It retains the existing httpRange-14 rule.
  • It also permits the use of an HTTP 200 response with RDF content as a means of conveying a URI definition.
  • It provides guidelines for avoiding confusion and inconsistencies, while acknowledging the burden those guidelines place on URI owners.
  • It encourages URI owners to publish URI definitions even if those URI definitions are not perfect.

It also includes numerous other clarifications.

Others are encouraged to help improve this draft in accordance with its intent, either by editing this page directly or by sending comments to david@dbooth.org . However, if you wish to make a proposal with significantly different goals then please do so separately from this document. Thanks! -- David Booth 19:24, 1 March 2012 (UTC)

 

URI Definition Discovery Protocol for “http” and “https” URIs

 

Editor:

 

Abstract
This specification defines a protocol that enables a URI owner to provide a definition for a target URI that uses the “http” or “https” scheme, such that a client agent starting with that target URI can easily discover the URI owner's definition. The specification is meant to be useful for coordinating uses of the URI among its URI owner(s) and other agents. The specification is mainly targeted to RDF and linked data, but may also be useful in other areas.

 

Status of this Document
This document is an editor's copy that has no official standing. The document attempts to synthesize and build upon [rfc3986], the "httpRange-14 resolution" [issue-14-resolved], [httpbis-2], [webarch] and the "Cool URIs for the Semantic Web" note [cooluris].

It is intended that some successor to this document will supersede the W3C Technical Architecture Group's "httpRange-14 resolution" [issue-14-resolved]. Editorial comments are welcome and should be posted to the publicly archived TAG mailing list www-tag@w3.org (archive).

The TAG has not yet determined what editorial track this document will take. It might end up on Architectural Recommendation track (discussion here), it could end up as a TAG Finding or Note, or it could be transferred to a different venue.

 

1 Introduction (Non-normative)

This document describes a set of conventions that enable a URI owner[webarch] to provide a definition for a target URI that uses the “http” or “https” scheme, such that a client agent starting with that target URI can easily discover that URI definition. Since multiple parties are involved and must act in coordination to achieve the desired result, these conventions are described as a protocol between the URI owner who wishes to provide the URI definition and the client who wishes to discover that URI definition. This discovery protocol builds on other existing protocols, most notably HTTP.

Under this protocol, the URI owner may either provide an explicit URI definition, by use of hash URI or 303-redirect techniques, or an implicit URI definition, by serving a representation of the resource that the URI is intended to identify. The implicit URI definition technique is applicable in cases where the resource is an information resource, whereas the explicit URI definition techniques can be used for any kind of resource (including information resources).

Although the discovery protocol described here covers only the “http” and “https” schemes, in principle it could be extended to cover others.

The uses targeted here are those involving notations such as RDF [rdf-concepts], and languages such as OWL[owl] that are layered on RDF, but other languages and notations are not excluded.

Although this specification defines a protocol for providing and discovering a URI definition, this specification is not concerned with the interpretation or “meaning” of a URI definition that is conveyed. This specification places no requirements whatsoever on the form, truth or usefulness of any statements contained in a URI definition, nor does it dictate whether or how an application must use such statements. Furthermore, this specification does not prohibit applications from using other means of obtaining URI definitions outside of this protocol. Such questions are outside the scope of this specification.

After a review of the history of the principal controversy around URI definition discovery, there is a discussion of the central concepts involved in conveying URI definitions, followed by discovery rules for target URIs either with or without a hash sign. The document concludes with discussion of inconsistency risks.

1.1 Historical note

This document is the result of a conversation first started circa 2002 around the appropriate use of "hashless" URIs that use the “http” or “https” schemes. At the time two different conventions were proposed for the declarative use of such URIs. One convention, inherited from the hypertext Web, was for a hashless “http” or “https” URI to implicitly refer to the document-like entity ("information resource") served at that URI. This convention collided with a separate desire to use such a URI to refer to an entity described by that information resource, i.e., to view the information resource as providing an explicit definition for that URI. Which use would, or should, have priority was not clear at the time. After deliberation, the TAG adopted its so-called httpRange-14 resolution [issue-14-resolved], asking "the community" to use hashless “http” URIs to refer to information resources. However, an exception signaled by a 303 HTTP response to a GET request allowed a hashless “http” URI to refer to an arbitrary resource whose definition could be provided at a separate URI.

A parallel question for URIs with a fragment identifier arose, but was easier to settle, because RFC3986[rfc3986] states that the meaning of a fragment identifier is determined by the media type of a representation that might be retrieved from the "stem" URI (the URI without the fragment identifier). In some cases the fragment identifies a document fragment; in other cases the representation obtained from the stem URI defines the meaning of the fragment identifier. In particular, if the media type indicates RDF/XML content, then the RDF graph expressed by that representation provides the URI's definition.

With the growth of linked data [linked-data], some resistance to the conventions required by the httpRange-14 resolution has been expressed. Reports of hash URIs being unacceptable in some situations, coupled with performance difficulties arising from the 303 redirection and the impossibility of deploying 303 redirects at all on many Web hosting services, have led to the current reexamination of these conventions. Some of the criticisms of the two approaches, and possible alternatives to them, are captured in [issue-57-report].

2 Definitions (Normative)

This section defines terms and concepts that are used in the rest of this specification.

ISSUE 1: For historical reasons, this document was written using the term “URI” instead of “IRI” or “URI or IRI”. How should it be modified to cover IRIs? Should we just say that this document should be understood as applying equally to URIs and IRIs?

2.1 Hash URI and Hashless URI

A hash URI is a URI that contains a number sign (“#”) character; a hashless URI is a URI that does not. Another way of stating this is that a hash URI contains a fragment identifier component[rfc3986], because a fragment identifier component 'is indicated by the presence of a number sign ("#") character'[rfc3986]. Correspondingly, a hashless URI does not contain a fragment identifier component.

2.2 Target URI, hash target URI, hashless target URI

A target URI is a URI that uses the “http” or “https” scheme and whose URI definition is provided or sought. A hash target URI is a target URI that is a hash URI; a hashless target URI is a target URI that is a hashless URI.
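The definitions in sections 2.1 and 2.2 can be illustrated with a short sketch. The function names below are hypothetical and chosen only for this illustration; the sketch treats the presence of a "#" character as the sole test for a hash URI, as the definition above does.

```python
from urllib.parse import urlsplit


def is_hash_uri(uri: str) -> bool:
    """A hash URI contains a number sign ("#") character."""
    return "#" in uri


def is_target_uri(uri: str) -> bool:
    """A target URI uses the "http" or "https" scheme."""
    return urlsplit(uri).scheme in ("http", "https")


def classify(uri: str) -> str:
    """Label a URI using the terms defined in sections 2.1 and 2.2."""
    if not is_target_uri(uri):
        return "not a target URI"
    return "hash target URI" if is_hash_uri(uri) else "hashless target URI"


def stem_of(uri: str) -> str:
    """Return the part of the URI before the first "#" (the stem)."""
    return uri.partition("#")[0]
```

For a URI of the form stem#id, stem_of returns the stem; for a hashless URI it returns the URI unchanged.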

2.3 Definition URI

A definition URI is a hashless URI from which a URI definition can be retrieved as a representation of the definition URI's associated resource. (See below for discussions of “retrieval” and “representation”.)

2.4 URI definition, explicit URI definition and implicit URI definition

A URI definition is whatever content (if any) is yielded as a result of applying the Discovery Algorithm in section 3 "Provision and discovery". Ostensibly it contains information that describes the URI owner's intended meaning of the target URI; however, this specification places no constraints on the form, interpretation, truth or usefulness of such information. A URI definition may also include other information that does not help define the meaning of the target URI, such as information about the change policy for the URI definition, or definitions of other URIs, without any particular demarcation between the target URI's definition and the other information. A typical example might be an ontology document in which one finds integral documentation for a set of URIs. The ontology document provides a URI definition for a number of URIs at the same time.

A URI definition typically takes the form of a set of statements, involving the target URI, that are intended to be true of the entity to which the target URI refers. However, as previously noted, this specification makes no claim about the truth or usefulness of any such statements.

An explicit URI definition is a URI definition provided via: (a) the HTTP “Link” response header as described in section 3.2.2; (b) an rdfs:isDefinedBy assertion in RDF content as described in section 3.2.3; (c) an HTTP 303 redirect as described in section 3.2.4; or (d) a representation from the stem of a hash target URI (i.e., a target URI of the form stem#id) as described in section 3.1.

An implicit URI definition is a URI definition that is indicated by the successful retrieval of a representation from a hashless target URI as described in section 3.2.1 "Information resource".

2.5 Representation

The word "representation" is used in [rfc3986] and elsewhere as a type. It is a term of art meaning an octet sequence (the "content") together with metadata, such as media type, that directs the interpretation of the content. In [rfc2616] the word "entity" is used for this. In the discussion that follows, "representation" on its own should always be understood this way (that is, as a type), following the usage in [webarch] and [httpbis-2].

Given a resource <U> identified by URI U, the relationship between a representation X and resource <U> (as in “X is a representation of <U>”) is not clearly defined in [rfc3986]. However, we take it to mean the relationship that exists between X and <U> if a successful retrieval using URI U yields (or could yield, given a suitable retrieval request) representation X. For brevity we will also speak of this relationship as "X is a representation from U", meaning that X is a representation of the resource identified by U.

2.6 Retrieval

This specification rests on Web retrieval, as defined in [rfc3986]. In general, retrieval means “making use of a URI in order to retrieve a representation of its associated resource”[rfc3986]. In the case where HTTP is used, retrieval is performed using an HTTP GET method. However, it is important to clarify which HTTP status codes signal the successful retrieval of a representation from a given URI. For an HTTP 1.1 retrieval attempt (signaled by the GET method) using URI U, the following rules apply:

For each HTTP status code: does this status code indicate the successful retrieval of a representation of the resource identified by U?

  • 100 Continue, 101 Switching Protocols, 202 Accepted, 203 Non-Authoritative Information, 304 Not Modified, 305 Use Proxy, 400 Bad Request, 401 Unauthorized, 402 Payment Required, 403 Forbidden, 404 Not Found, 405 Method Not Allowed, 406 Not Acceptable, 407 Proxy Authentication Required, 408 Request Timeout, 409 Conflict, 410 Gone, 412 Precondition Failed, 414 Request-URI Too Long, 416 Requested Range Not Satisfiable, 417 Expectation Failed, 500 Internal Server Error, 501 Not Implemented, 502 Bad Gateway, 503 Service Unavailable, 504 Gateway Timeout, 505 HTTP Version Not Supported: No. Other action may be required by the client to obtain a representation from U, or a representation from U may not be available.
  • 206 Partial Content: No. However, the client may be able to construct a representation of the resource identified by U, by appropriately assembling multiple pieces of partial content to form the intended complete representation.
  • 200 OK: Yes.
  • 201 Created, 204 No Content, 205 Reset Content, 306 (Unused), 411 Length Required, 413 Request Entity Too Large, 415 Unsupported Media Type: No. This status code is not applicable to a retrieval request.
  • 300 Multiple Choices, 301 Moved Permanently, 302 Found, 307 Temporary Redirect: Yes if the response contains a “Location” header indicating new URI U2, and a successful retrieval using U2 yields a representation rep from U2, in which case rep is a representation from U. Otherwise no.
  • 303 See Other: No. However, this status code may be used to indicate a definition URI, as described in section 3.2.4 “303 redirect”.
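The status code rules above amount to a lookup from status code to outcome. The following is a minimal sketch of that lookup; the function name and the outcome labels are hypothetical, and the "unlisted" label is an assumption covering codes the rules do not mention.

```python
# Codes for which no representation of the resource identified by U is obtained.
NO_REPRESENTATION = (
    {100, 101, 202, 203, 304, 305}
    | set(range(400, 411))        # 400 through 410
    | {412, 414, 416, 417}
    | set(range(500, 506))        # 500 through 505
)
# Codes that are not applicable to a retrieval (GET) request.
NOT_APPLICABLE = {201, 204, 205, 306, 411, 413, 415}
# Codes whose outcome depends on a successful retrieval from the "Location" URI.
FOLLOW_LOCATION = {300, 301, 302, 307}


def retrieval_outcome(status: int) -> str:
    """Map an HTTP status code for a GET on U to an outcome label."""
    if status == 200:
        return "success"
    if status == 206:
        return "partial"            # client may assemble a full representation
    if status == 303:
        return "see-other"          # may indicate a definition URI (section 3.2.4)
    if status in FOLLOW_LOCATION:
        return "follow-location"
    if status in NOT_APPLICABLE:
        return "not-applicable"
    if status in NO_REPRESENTATION:
        return "no-representation"
    return "unlisted"               # assumption: code not covered by the rules
```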

2.7 Information resource

The following passage in [webarch] introduces the term "information resource":

It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.” [...] Other things, such as cars and dogs (and, if you've printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information.

However, that definition has proven to be controversial, because it is unclear which characteristics of any given resource are to be considered "essential," what it means for their characteristics to be "conveyable in a message", and whether or not such distinctions are even important to the web's architecture.

Fortunately, since this protocol specification is only concerned with the mechanisms for conveying a URI definition, and not with the meaning or interpretation of a URI definition, the issue is irrelevant to this protocol specification. Therefore, this protocol specification does not define the term information resource.

 

ISSUE 12: Should the notion of "information resource" be eliminated entirely from this document? It is not needed. On the other hand, it does have some utility as a "marker class" with no initial semantics, as it lets clients conveniently attach additional semantics if they choose to do so. For example, some clients may choose to use the assumption that the class of "information resources" is disjoint with the class of people. (The reason this is a choice, and not baked into this specification's definition of "information resource", is that there is no clear boundary between the class of "information resources" and the class of "non-information resources".)

2.8 Prima facie RDF media types

Because RDF is syntax independent, any media type could be interpreted as an RDF serialization through the use of an appropriate interpreter or deserializer. For example, GRDDL[grddl] specifies how this can be done for arbitrary kinds of XML documents, and RDFa [rdfa] specifies how RDF assertions can be embedded in HTML documents. Of course, most media types were not intended to be treated as RDF serializations, although some were intended to allow RDF statements to be embedded as a secondary goal. A prima facie RDF media type is a media type whose primary purpose is to represent RDF data. Prima facie RDF media types include the following:

  • application/rdf+xml
  • text/n3
  • text/turtle
  • [Editorial comment: Did I forget any? AFAICT TriX uses the XML media type, nTriples uses text/plain, and TriG uses application/x-trig, which I'm not sure if I should list, since it is an experimental media type. -- DBooth]
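A client could test a Content-Type value against the list above roughly as follows. This is only a sketch: the names are hypothetical, the set contains just the three media types listed in this section (the list is explicitly open, see ISSUE 8), and media type parameters such as charset are simply ignored.

```python
# The three media types listed in section 2.8; the list is not closed.
PRIMA_FACIE_RDF_MEDIA_TYPES = frozenset(
    {"application/rdf+xml", "text/n3", "text/turtle"}
)


def is_prima_facie_rdf(content_type: str) -> bool:
    """Return True if the Content-Type value names a listed RDF media type."""
    # Drop any parameters (e.g. "; charset=utf-8") and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PRIMA_FACIE_RDF_MEDIA_TYPES
```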

 

ISSUE 8: The prima facie RDF media types should be listed in a registry, so that the list can be clear but extensible.

 

3 Provision and discovery (Normative)

There are various ways under this protocol that a URI owner can provide a URI definition for a target URI, such that a client can discover that URI definition. The URI owner makes the choice of which of these ways to support by bearing in mind the discovery algorithm that a client will use under this protocol, and then: (a) deciding whether to mint a hash target URI or a hashless target URI; and (b) deciding what information to serve when retrieval from the target URI (or its stem, in the case of a hash target URI) is attempted.

Discovery Algorithm (Normative)

The following algorithm specifies how a client can discover a URI definition that the URI owner has provided, following this protocol, for a target URI:
1. If the target URI is a hash URI, then section 3.1 “Discovery for hash target URI” applies.
2. Otherwise, if the target URI is a hashless URI, then section 3.2 “Discovery for hashless target URI” applies. This case is further subdivided into the following subcases:

2.1. If a retrieval attempt using the target URI results in the retrieval of a representation from the target URI, then section 3.2.1 “Information resource” applies.
2.2. Furthermore, if an HTTP retrieval attempt using the target URI results in the retrieval of a representation from the target URI, and the associated HTTP response contains a "Link" header, then section 3.2.2 “Link header” also applies, as described in that section.
2.3. Furthermore, if the retrieved representation can be interpreted as RDF content, then section 3.2.3 "200 response with RDF content" may also apply, as described in that section.
2.4. Otherwise, if an HTTP retrieval attempt using the target URI results in a 303 “See Other” status code, then section 3.2.4 “303 redirect” applies.

If none of these cases applies, then the URI owner has not provided a URI definition for the target URI under this discovery protocol.
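The case analysis above can be sketched as a small dispatch function that reports which sections apply. The function name and parameters are hypothetical, and the sketch makes two simplifying assumptions: a successful retrieval is modelled as a direct 200 response (redirect chains are ignored), and the caller has already determined whether a "Link" header is present and whether the content can be interpreted as RDF.

```python
from typing import List, Optional


def applicable_sections(
    target_uri: str,
    status: Optional[int] = None,
    has_link_header: bool = False,
    rdf_content: bool = False,
) -> List[str]:
    """Return the section numbers of the discovery rules that apply."""
    if "#" in target_uri:
        return ["3.1"]                      # step 1: hash target URI
    sections: List[str] = []
    if status == 200:                       # step 2.1: representation retrieved
        sections.append("3.2.1")
        if has_link_header:                 # step 2.2: "Link" header present
            sections.append("3.2.2")
        if rdf_content:                     # step 2.3: section 3.2.3 MAY apply
            sections.append("3.2.3")
    elif status == 303:                     # step 2.4: 303 "See Other"
        sections.append("3.2.4")
    # An empty list means no URI definition was provided under this protocol.
    return sections
```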

Multiple URI definitions
If the URI owner provides multiple URI definitions for a target URI during the same time period, then all such URI definitions apply, i.e., the effective URI definition that is conveyed during that time period is the conjunction of all such URI definitions. This can happen, for example, through content negotiation or by providing multiple “Link” headers. This specification does not constrain the bounds of such a time period.
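For URI definitions expressed as RDF statements, one way to picture this conjunction is as the union of the statements from each definition. The sketch below assumes each definition has already been parsed into (subject, predicate, object) string triples and contains no blank nodes; the function name is hypothetical, and this specification does not prescribe any particular merge procedure.

```python
from typing import FrozenSet, Iterable, Tuple

Triple = Tuple[str, str, str]


def effective_definition(definitions: Iterable[Iterable[Triple]]) -> FrozenSet[Triple]:
    """Combine several URI definitions into one set of statements."""
    merged = set()
    for definition in definitions:
        merged.update(definition)   # conjunction modelled as set union
    return frozenset(merged)
```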

 

GOOD PRACTICE (Non-normative): A URI owner providing an explicit URI definition should publish and adhere to a change policy for that URI definition.

 

GOOD PRACTICE (Non-normative): A URI owner should avoid changing a URI definition.

 

GOOD PRACTICE (Non-normative): A URI owner should avoid publishing inconsistent URI definitions.

3.1 Discovery for hash target URI

The syntax stem#id has come to be used not just for document fragment references as originally specified, but for any reference determined relative to content found at stem. Therefore the term “fragment identifier” has become a bit of a misnomer. Nonetheless, for consistency with other documents this term will still be used.

If the target URI is a hash URI of the form stem#id, and X is a representation from stem, then a URI definition for the target URI is determined by the media type of the representation X. In this case, the media type registration acts as a URI definition for the target URI. The media type registration may, in turn, delegate definitional responsibility to other documents.

Note that, given a hash URI stem#id, user agents normally perform a retrieval using stem.

3.1.1 Example: URI definition via RDF graph

If the target URI is of the form stem#id, and stem has a representation X having an RDF graph media type, such as application/rdf+xml, application/xhtml+xml (for RDFa), or text/turtle, then the RDF graph expressed by representation X is a URI definition for the target URI, in accordance with the media type registration. For example, the media type registration for application/rdf+xml [rfc3870] states:

In RDF, the thing identified by a URI with fragment identifier does not necessarily bear any particular relationship to the thing identified by the URI alone. This differs from some readings of the URI specification, so attention is recommended when creating new RDF terms which use fragment identifiers. More details on RDF's treatment of fragment identifiers can be found in the section "Fragment Identifiers" of the RDF Concepts document.

ISSUE 2: (Obsolete)

 

3.1.2 Example: URI definition via markup

A media type registration can also define a hash target URI as referring to a document part (fragment). For example, the @name attribute in HTML, or the @xml:id attribute in XML, are defined per their respective media types to provide 'anchors', and thereby to document that the fragment identifier refers to the enclosing element. By delegation from the media type registration, the relevant markup then acts as a URI definition for the hash target URI.

3.2 Discovery for hashless target URI

If the target URI is a hashless URI, then URI definition discovery is performed by attempting to retrieve a representation from the target URI, as described in the following subsections.

3.2.1 Information resource

If a target URI U is a hashless URI, and retrieval using the target URI results in a representation rep from the target URI, then the URI owner has provided the following implicit URI definition for the target URI:

  • The target URI identifies an information resource; and
  • rep is a representation of that resource.

Note that this rule is protocol independent: it applies to the HTTP protocol, but also to other protocols.

 

ISSUE 12: If a new header such as "Document" is adopted, as suggested in Change Proposal 25, then this rule will need to be changed to state that "Document: U2" means that the URI definition retrieved from U2 _supersedes_ the above implicit URI definition, instead of being in addition to it.

 

ISSUE 11: The implied URI definition for the "information resource" case should also be written in RDF, in addition to the above prose version. This means coming up with a suitable namespace and predicates. See also Generic Resources and Generic Resources ontology by TimBL, and Generic Resource and Web Metadata by Jonathan Rees.

 

ISSUE 3: (Obsolete)

 

3.2.2 Link header

If the HTTP protocol is used, then the “Link” response header can be used to provide an additional explicit URI definition, as follows.

If a target URI is a hashless URI, and HTTP retrieval using the target URI results in a representation from the target URI, and the representation includes a “Link” response header[rfc5988] that specifies a “definedby” relation[@@TBD@@] (with no “rev” parameter) to new URI U2, and U2 is a hashless URI, then any representation retrieved from U2 is an explicit URI definition for the target URI, and U2 is a definition URI for the target URI. In this case, both this explicit URI definition and the implicit URI definition specified in section 3.2.1 apply, as explained for "Multiple URI definitions" near the beginning of section 3.

For example, here is a hypothetical HTTP response:

200 OK
Link: <http://example.com/uri-definition>; rel="definedby"

 

Note that the "definedby" relation does not mean the same thing as the "describedby" relation defined in Powder[powder]. In particular, under this specification the following hypothetical HTTP response:

200 OK
Link: <http://example.com/other-uri-description>; rel="describedby"

does not imply that a representation retrieved from http://example.com/other-uri-description is a URI definition for the target URI. The reason for this difference is that the URI owner may well wish to point clients to additional useful information involving the target URI, without implying that such information constitutes the URI owner's URI definition for the target URI. For example, it could be ancillary (but non-definitional) information about the identified resource, or it could be an alternate definition authored by a third party.
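The rule in this section can be sketched as a function that extracts definition URIs from a “Link” header value. This is a simplified illustration, not a full RFC 5988 parser: the function name is hypothetical, parameter values containing commas or semicolons are not handled, relative URIs are not resolved, and links whose target is a hash URI are skipped because that case is left open by ISSUE 4.

```python
import re
from typing import List

# Matches "<uri>" followed by zero or more ";name=value" parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')


def definedby_targets(link_header: str) -> List[str]:
    """Return hashless URIs linked with rel="definedby" and no "rev" parameter."""
    targets = []
    for uri, params_text in _LINK_RE.findall(link_header):
        params = {}
        for part in params_text.split(";"):
            if "=" in part:
                name, _, value = part.partition("=")
                params[name.strip().lower()] = value.strip().strip('"')
        rels = params.get("rel", "").lower().split()
        if "definedby" in rels and "rev" not in params and "#" not in uri:
            targets.append(uri)
    return targets
```

Applied to the two hypothetical responses above, the first header value yields http://example.com/uri-definition as a definition URI, while the "describedby" header value yields none.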

 

ISSUE 7: The "definedby" relation needs to be registered per RFC5988[rfc5988] as corresponding to rdfs:isDefinedBy[rdfs].

 

ISSUE 4: What if U2 is a hash URI? Then what constitutes the URI definition?

 

ISSUE 6: Should the Link header supersede the implicit URI definition conveyed by an HTTP 200 status code (which indicates that the target URI identifies an information resource)? As this specification is currently written, both the implicit URI definition conveyed by an HTTP 200 status code and the explicit URI definition conveyed via the Link header would apply, which means that the Link header should only be used when the URI owner wishes to indicate that the target URI identifies an information resource, but also wants to provide additional metadata about it.

3.2.3 200 response with RDF content

If the HTTP protocol is used, then RDF content can also be used to provide an explicit URI definition, as follows.

If a target URI is a hashless URI, and HTTP retrieval using the target URI results in a representation from the target URI, and the representation can be interpreted as providing RDF content because:

  • it is carried in a prima facie RDF media type; or
  • it contains RDFa[rdfa] markup that can be interpreted as expressing RDF content

and that RDF content contains an assertion with subject <target URI>, property rdfs:isDefinedBy[rdfs] and object <U2>, where U2 is a hashless URI, then any representation retrieved from U2 is an explicit URI definition for the target URI, and U2 is a definition URI for the target URI. In this case, both this explicit URI definition and the implicit URI definition specified in section 3.2.1 apply, as explained for "Multiple URI definitions" near the beginning of section 3. Note that U2 is not required to be a different URI from the target URI.

For example, if the following RDF content (written here in Turtle[turtle]) is retrieved in a representation from target URI http://example/macaw , then any representation retrieved from http://example/macaw-definition is a URI definition for http://example/macaw :

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example/macaw> rdfs:isDefinedBy <http://example/macaw-definition> .

 

Here is a second example, in which the target URI is the same URI as U2. This example indicates that any representation retrieved from http://example/toucan is a URI definition for http://example/toucan .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://example/toucan> rdfs:isDefinedBy <http://example/toucan> .
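The check described in this section can be sketched over already-parsed RDF content. The sketch assumes the content has been parsed into (subject, predicate, object) triples of URI strings by some RDF library; the function name is hypothetical, no entailment is applied (see ISSUE 9), and objects that are hash URIs are skipped because that case is left open.

```python
from typing import Iterable, List, Tuple

RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

Triple = Tuple[str, str, str]


def definition_uris(target_uri: str, triples: Iterable[Triple]) -> List[str]:
    """Return each hashless U2 asserted via <target URI> rdfs:isDefinedBy <U2>."""
    return [
        obj
        for subj, pred, obj in triples
        if subj == target_uri and pred == RDFS_IS_DEFINED_BY and "#" not in obj
    ]
```

With the macaw example, the result is the single definition URI http://example/macaw-definition; with the toucan example, the target URI itself is returned.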

 

ISSUE 9: Should RDF content include entailments? For example, if a representation says "_:macaw owl:sameAs <http://example/macaw>" and "_:macaw rdfs:isDefinedBy <http://example/macaw-definition>", then should http://example/macaw-definition be taken as a definition URI under this specification? (DBooth's opinion: Probably not, as this would open a can of worms about how to determine which entailment regimes to use, and would burden the client.) In general, "RDF content" needs to be more precisely defined.

 

ISSUE 10: Should RDF content from an HTTP 200 response be considered a URI definition by default, even if it does not contain an rdfs:isDefinedBy assertion for the target URI? Are there cases where the URI owner would want to serve an RDF document from a target URI without intending it as a URI definition of that URI? Given that RDF content may be hard to distinguish from other content, are there cases where the URI owner would want to serve _any_ document without intending that document as a URI definition of that URI? See discussion: http://lists.w3.org/Archives/Public/www-tag/2012Mar/0115.html

 

Editorial Note: See issue 4 above, as the same issue applies here if U2 is a hash URI.

3.2.4 303 redirect

An HTTP 303 “See Other” status code can also be used to signal the provision of an explicit URI definition, as follows.

If the target URI is a hashless URI, and an HTTP retrieval attempt using the target URI results in a 303 “See Other” status code (following a potentially empty chain of redirects indicated by 300, 301, 302 and/or 307 status codes), and the associated response contains a “Location” header with a new URI U2, and U2 is a hashless URI, then any representation retrieved from U2 provides an explicit URI definition for the target URI. In such case, U2 is a definition URI for the target URI. For example, here is a hypothetical HTTP response:

303 See Other
Location: http://example.com/uri-definition

 

Editorial Note: See issue 4 above, as the same issue applies here if U2 is a hash URI.

Note that user agents that perform a retrieval yielding a 303 See Other response ordinarily perform a secondary retrieval using the URI in the “Location” header. In this case, the term "landing page" is sometimes applied to the redirect target document: it is "where you land" when you attempt a retrieval.

Also note that this use of a 303 redirect in itself says nothing about whether the target URI identifies an information resource. The URI definition that is obtained by this mechanism may well define the target URI as identifying an information resource.
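The 303 rule, including the possibly empty chain of 300, 301, 302 and 307 redirects, can be sketched against simulated responses. The responses mapping (URI to status code and “Location” value), the function name, and the max_hops limit are all assumptions made for this illustration; a real client would issue HTTP GET requests instead.

```python
from typing import Dict, Optional, Tuple

# Simulated response: (status code, value of the "Location" header or None).
Response = Tuple[int, Optional[str]]


def definition_uri_via_303(
    target_uri: str,
    responses: Dict[str, Response],
    max_hops: int = 10,
) -> Optional[str]:
    """Return the definition URI signalled by a 303, or None if there is none."""
    if "#" in target_uri:
        return None                       # rule applies to hashless target URIs
    uri = target_uri
    for _ in range(max_hops):             # assumed guard against redirect loops
        status, location = responses.get(uri, (404, None))
        if status == 303:
            # U2 must be present and hashless (hash U2 is left open by ISSUE 4).
            if location is not None and "#" not in location:
                return location
            return None
        if status in (300, 301, 302, 307) and location is not None:
            uri = location                # keep following the redirect chain
            continue
        return None                       # any other outcome: no 303 definition
    return None
```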

4 Avoiding and resolving inconsistencies (Non-normative)

This protocol does not prevent a URI owner from providing multiple URI definitions. There are many ways this can happen, such as:

  • providing different URI definitions during different time periods, as new "improved" versions are released;
  • using HTTP content negotiation to serve different URI definitions in response to different requests, perhaps using different media types; and
  • providing both implicit and explicit URI definitions.

In some cases the provision of multiple URI definitions for a target URI is helpful to clients, because a client may be able to select a URI definition that is most convenient to process, or that best reflects the URI owner's latest understanding of the resource that the URI is intended to identify. In other cases it may cause confusion and place a burden on clients, particularly if those URI definitions are inconsistent, because it forces the client to figure out which URI definition(s) to use and which to ignore, and this may require the client to guess the URI owner's intent. Different clients, perhaps having different capabilities or objectives, may make different guesses, thus leading them to different conclusions about the resource that the target URI was intended to identify. If those clients subsequently publish statements involving that URI, those statements may be inadvertently talking about different resources even though they use the same URI, thus causing URI collision[awww].

The Good Practice notes in this specification are designed to help avoid this risk and encourage URI owners to avoid placing this burden on clients. But there is a trade-off: they place a concomitant burden on URI owners instead. For example:

  • the URI owner may not yet know exactly how he/she wishes to define a particular URI permanently, and thus may wish to publish an initial, approximate URI definition that is later modified as more is learned;
  • it may be difficult or impossible for the URI owner to configure his/her server to give HTTP 303 response codes, or to include "Link" headers in an HTTP response; and
  • it may be administratively burdensome to maintain pairs of URIs -- one URI identifying a particular resource, and a separate URI for serving a URI definition of that resource -- rather than single URIs.

If this burden on URI owners is not mitigated, in some cases it could inhibit them from publishing potentially useful (though imperfect) URI definitions, which would be a loss to the community. Therefore the following Good Practice is designed to best benefit the community by balancing this trade-off.

 

GOOD PRACTICE (Non-normative): If a URI owner must choose between publishing URI definitions and following the Good Practice notes of this specification, it is normally better to publish. "The perfect is the enemy of the good."

 


4.1 Transactional inconsistency

Consider the situation where a URI owner publishes a URI definition for a target URI, and later revises and publishes a different URI definition for that URI. Suppose further that the target URI is later used in a statement made by another party (the "statement author"), and a client subsequently reading that statement wishes to understand what the statement author meant. Unfortunately, it may be difficult or impossible for the client to determine which URI definition the statement author assumed when writing the statement.

 

GOOD PRACTICE (Non-normative): Before using a target URI in a statement, a statement author should obtain a fresh copy of the target URI's URI definition together with its transitive closure (that is, the URI definitions of any URIs used in that URI definition, and so on), and should use the target URI only in a manner that is consistent with those URI definitions.

 

To mitigate the effect of URI definitions that change over time, a consumer encountering a target URI in a statement and wishing to understand the author's intent may attempt to determine the URI definition (and its transitive closure) that existed at the time the statement was written. Obviously, this may or may not be successful.
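
As a minimal sketch of the lookup described above, the following Python function selects the URI definition that was in effect at a given time. It assumes that the consumer already holds a timestamped history of URI definitions (for example from its own cache or an archive); this protocol does not itself provide such a history, and the name definition_in_effect and the (timestamp, definition) pair layout are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Hypothetical layout: (time the definition was published, definition text),
# sorted by publication time in ascending order.
History = List[Tuple[datetime, str]]


def definition_in_effect(history: History, when: datetime) -> Optional[str]:
    """Return the definition published most recently at or before `when`.

    Returns None if no definition in the history had been published yet,
    which corresponds to the "may not be successful" case in the text.
    """
    published_times = [published for published, _ in history]
    # bisect_right counts the entries published at or before `when`.
    count = bisect_right(published_times, when)
    if count == 0:
        return None
    return history[count - 1][1]
```

For a statement written between two revisions, the function returns the earlier revision; for a statement written before the first known revision, it returns None and the consumer has to fall back on other evidence.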

4.2 Resolving inconsistencies

If a client following this protocol discovers multiple URI definitions and wishes to make a best-guess attempt to determine which URI definition most reliably reflects the URI owner's intent, the client is encouraged to use the following criteria in judging reliability:

  • An explicit URI definition should take precedence over an implicit URI definition.
  • A URI definition that is expressed in a prima facie RDF media type should take precedence over a URI definition that is expressed in another media type, other factors being equal.
  • A URI definition that is provided via a "definedby" "Link" header or an HTTP 303 response code should take precedence over a URI definition that is provided via any other mechanism.
  • A later URI definition should take precedence over an earlier URI definition, unless the client is seeking a URI definition that was in effect at an earlier time.
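
The criteria above can be sketched as a sort key in Python. This is only one possible reading: the list does not say how the criteria rank against one another, so the lexicographic order used here (explicit first, then discovery mechanism, then media type, then recency) is an assumption, as are the Candidate record, its field names, and the contents of RDF_MEDIA_TYPES. A client seeking an earlier URI definition would not use the recency criterion.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable

# Assumption: a stand-in for "prima facie RDF media types".
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle"}


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record describing one discovered URI definition."""
    label: str
    explicit: bool           # explicit rather than implicit URI definition
    media_type: str
    via_link_or_303: bool    # found via a "definedby" Link header or a 303
    published: datetime


def reliability_key(candidate: Candidate):
    # Tuples compare element by element, so earlier fields dominate.
    # The order of the fields is an assumption, not part of the protocol.
    return (
        candidate.explicit,
        candidate.via_link_or_303,
        candidate.media_type in RDF_MEDIA_TYPES,
        candidate.published,
    )


def most_reliable(candidates: Iterable[Candidate]) -> Candidate:
    """Pick the candidate that ranks highest under reliability_key."""
    return max(candidates, key=reliability_key)
```

Keeping the ranking in a single key function makes it easy for a client to reorder or drop criteria to suit its own objectives, which is consistent with the protocol leaving that choice to the client.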

4.3 Incompatible URI definition discovery protocols

Common practice in protocol design is to include an explicit indicator of the protocol being used when data is transmitted. For example, every HTTP/1.1 request or response contains the string "HTTP/1.1" in a fixed location, and an XML document typically starts with an XML declaration giving the XML version number. Such an indicator signals that the originating agent is respecting some particular specification, and urges receiving agents either to interpret the data according to that specification or to reject it as not understood.

This specification does not provide a protocol indicator. This means that there is no reliable, automatable way that a client can determine whether a URI owner has followed this protocol. If different parties use different protocols, this can result in misunderstandings about what URI definition the URI owner intended to convey. For example, if a URI owner followed protocol A, but a client assumed that protocol B had been followed, and those protocols yield different URI definitions, then the client may erroneously obtain the wrong URI definition without realizing the error. See http://lists.w3.org/Archives/Public/www-tag/2012Mar/0001.html for a more detailed example. In essence, this is the problem that motivated the httpRange-14 debate and this protocol specification.

However, if this specification is widely accepted and followed, then the risk of this error will be low.

ISSUE 5: Should this protocol use a protocol indicator? In cases where an explicit URI definition is provided, a protocol indicator could be included as part of the URI definition. For example, an RDF statement such as '<> uddp:definesUri "http://example/foo" .' could indicate that the current document provides a URI definition for http://example/foo . For implicit URI definitions -- for URIs that identify information resources -- a web site could use the "/.well-known/" conventions described in RFC5785, perhaps in conjunction with POWDER[powder] to indicate that URIs hosted on that site (or a portion of that site) conform to this protocol.
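
To illustrate how a client might use such an indicator, the Python sketch below looks for the example statement from ISSUE 5 in the text of a retrieved document. The uddp:definesUri property is only the illustrative name used in this issue and is not defined by this specification. The regular expression is a deliberate simplification that matches that one statement shape; it is not an RDF parser, and the function names are hypothetical.

```python
import re
from typing import Set

# Matches statements of the form:  <> uddp:definesUri "http://example/foo" .
# Simplification: a real client would parse the RDF instead of pattern-matching.
_INDICATOR = re.compile(r'<>\s+uddp:definesUri\s+"([^"]*)"\s*\.')


def declared_target_uris(document_text: str) -> Set[str]:
    """Return the URIs that the document says it defines."""
    return set(_INDICATOR.findall(document_text))


def claims_to_define(document_text: str, target_uri: str) -> bool:
    """True if the document carries the indicator for target_uri."""
    return target_uri in declared_target_uris(document_text)
```

A client could treat a document for which claims_to_define returns False as not having asserted conformance to this protocol, rather than assuming that the protocol was followed.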

 

5 Acknowledgments

Larry Masinter, Henry S. Thompson, Ashok Malhotra, and other TAG members gave valuable advice on drafts of this document. Many of the ideas grew out of work done by the TAG's AWWSW Task Group.

6 References

   cooluris
          Leo Sauermann and Richard Cyganiak. Cool URIs for the
          Semantic Web. W3C Interest Group Note, 03 December 2008. (See
          http://www.w3.org/TR/2008/NOTE-cooluris-20081203/.)

   generic
          Jonathan A. Rees, editor. Generic resources and Web
          metadata. Editor's draft, W3C, 2012. (See
          http://www.w3.org/2001/tag/awwsw/ir/20120127/.)

   grddl
          @@ TODO: add complete GRDDL reference @@
          http://www.w3.org/TR/grddl/

   httpbis-1
          R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
          Leach, T. Berners-Lee, Y. Lafon (editor), and J. Reschke
          (editor). HTTP/1.1, part 1: URIs, Connections, and
          Message Parsing. Revision of [rfc2616]. Work in progress,
          version 18, IETF, 2012. (See
          http://tools.ietf.org/html/draft-ietf-httpbis-p1-messaging-18.)

   httpbis-2
          R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
          Leach, T. Berners-Lee, Y. Lafon (editor), and J. Reschke
          (editor). HTTP/1.1, part 2: Message Semantics. Revision
          of [rfc2616]. Work in progress, version 18, IETF, 2012.
          (See
          http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-18.)

   issue-14-resolved
          Roy Fielding. [14] Resolved. Email to www-tag
          list, 2005. (See
          http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html.)

   issue-57-report
          Jonathan A. Rees, editor. Providing and discovering
          definitions of URIs. W3C editor's draft, 25 June 2011. (See
          http://www.w3.org/2001/tag/awwsw/issue57/20120202/.)

   linked-data
          Tim Berners-Lee. Linked Data. Design note, June 2009.
          (See http://www.w3.org/DesignIssues/LinkedData.html.)

   owl
          @@ TODO: Add OWL reference @@

   powder
          Phil Archer, Kevin Smith, and Andrea Perego, editors.
          Protocol for Web Description Resources (POWDER):
          Description Resources. W3C Recommendation, 1 September 2009.
          (See http://www.w3.org/TR/powder-dr/#appD.)

   rdf-concepts
          Graham Klyne and Jeremy J. Carroll, editors. Resource
          Description Framework (RDF): Concepts and Abstract Syntax.
          W3C Recommendation, 10 February 2004. (See
          http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/.)

   rdfs
          @@ TODO: Add RDF Schema reference @@
          http://www.w3.org/TR/rdf-schema/

   rfc2616
          R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P.
          Leach, and T. Berners-Lee. Hypertext Transfer Protocol
          -- HTTP/1.1. RFC 2616, IETF, 1999. (See
          http://www.ietf.org/rfc/rfc2616.txt.)

   rfc3870
          A. Swartz. application/rdf+xml Media Type Registration.
          RFC 3870, IETF, 2004. (See
          http://www.ietf.org/rfc/rfc3870.txt.)

   rfc3986
          T. Berners-Lee, R. Fielding, L. Masinter. Uniform
          Resource Identifier (URI): Generic Syntax. RFC 3986, IETF,
          2005. (See http://www.ietf.org/rfc/rfc3986.txt.)

   rfc5988
          M. Nottingham. Web Linking. RFC 5988, IETF, 2010. (See
          http://www.ietf.org/rfc/rfc5988.txt.)

   webarch
          Ian Jacobs and Norman Walsh, editors. Architecture of
          the World Wide Web, Volume One. W3C Recommendation, December
          2004. (See http://www.w3.org/TR/webarch/.)

   xhtml-ns
          XHTML namespace document. Namespace document,
          occasionally revised, retrieved 31 January 2012. This
          document is revised from time to time as new specifications
          bearing on the XHTML namespace are published. (See
          http://www.w3.org/1999/xhtml.)