W3C

URNs, Namespaces and Registries

[Editor's Draft] TAG Finding CVS $Id: URNsAndRegistries-50-20060403.html,v 1.1 2006/04/04 08:19:44 vquint Exp $

This version:
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-20060403
Latest version:
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
Editors:
Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
David Orchard, BEA Systems <dorchard@bea.com>

This document is also available in these non-normative formats: XML.


Abstract

This finding attempts to address the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" and "Should registries be provided for such identifiers?". The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemes are examined, and their properties compared with those of the existing http: URI scheme.

Status of this Document

Editorial note: HST2006-03-14
Further to a request from Roy Fielding, I had a brief look at XCAP, which now seems to be using http: URIs, although it introduces a new Application UID registry and uses ietf: URNs for its namespaces. If anyone (including Roy) remembers what Roy was particularly concerned about here, please let me know.

This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG issue URNsAndRegistries-50.

This is the second draft of this finding. This finding is an editorial draft, not yet accepted by the TAG.

Additional TAG findings, both accepted and in draft state, may also be available. The TAG expects to incorporate this and other findings into [what?] that will be published according to the process of the W3C Recommendation Track.

Editorial note: HST2005-03-29
Are we ready to tell the world what will follow AWWW?

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction
2 Examining the need for new approaches to naming information resources
    2.1 Persistence
    2.2 Standardized
    2.3 Protocol Independence
    2.4 Location Independence
    2.5 Structured names
    2.6 Uniform access to metadata
    2.7 Rich authority
    2.8 Trusted resolution
3 The value of http: URIs
4 Case study: Naming namespaces
5 Detailed Illustration

Appendix

A References


1 Introduction

In [AWWW] we find the following recommendations:

Avoiding URI aliases

"A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource."

Reuse URI schemes

"A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources."

URI opacity

"Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource."

Available representation

"A URI owner SHOULD provide representations of the resource it identifies."

Recently, however, a number of proposals have emerged to create new identification mechanisms for the Web. They propose new URN (sub-)namespaces or URI schemes and provide registries for instances thereof, in order to allow them to be used to identify and retrieve information resources. This would appear to be incompatible with [AWWW]'s simple positive recommendations. In this finding we enumerate the arguments given in favor of these new proposals, which often turn out to be arguments against using http: URIs, and explain why they are mistaken and how the above principles can be understood to point the way constructively to alternative designs which do in fact make use of http: URIs.

2 Examining the need for new approaches to naming information resources

This section is structured in terms of goals or requirements for resource identification mechanisms which have been offered as justifications for adopting a new approach. They are drawn from a number of recent proposals ([RFC 3688], [oasis URN], [XRI]), which we abstract, merge and summarize. [Definition: Throughout these summaries we will refer to instances of the proposed new identifier mechanism as NRIs.] In each case we state the requirements and examine the extent to which the existing http:-based identifier mechanism addresses them.

2.1 Persistence

NRI Goal

The relation between NRIs and the information resource they identify should persist indefinitely.

Or, more realistically, that individual NRIs should manifest syntactically whether or not they are intended to persist indefinitely.

This goal is difficult to get to grips with, as it appears to mean different things in different contexts:

  1. At its simplest, this is just a wish for an end to 404 Not Found, i.e. that you should always be able to resolve an NRI.

  2. In the Information Science community, 'persistence' is a stronger requirement, namely, that what you get when you resolve an NRI should never change.

http: fact

http: URIs support persistence as well as it is possible in practice to do so.

As has been frequently observed, achieving either of the numbered types of persistence above is not a technology issue but a management issue. It's up to the owners and operators of the mechanisms which implement NRI resolution to enforce whatever degree of persistence they choose. It follows that there is no difference here between NRI and http:.

What of the more sophisticated reading, that an NRI should manifest its minter's intentions with respect to persistence? That's just a matter of naming conventions, and perfectly possible using http:. We could, for example, say that all versionable/time-varying resources on our site are named with all lower-case letters, and all persistent/stable/non-varying resources are named with all upper-case letters.
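The case convention just described can be checked mechanically. The following is a minimal sketch in Python, assuming that only the letters of the path component are significant; the function name and the example.org URIs are illustrative only and are not drawn from any of the proposals under discussion.

```python
from urllib.parse import urlsplit


def persistence_intent(uri: str) -> str:
    """Classify a URI under the illustrative upper/lower-case convention."""
    # Only the path is inspected; host names are case-insensitive anyway.
    letters = [c for c in urlsplit(uri).path if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "persistent"
    if letters and all(c.islower() for c in letters):
        return "time-varying"
    # Mixed case or no letters at all: the convention makes no claim.
    return "unspecified"


stable = persistence_intent("http://example.org/SPECS/XML10")
varying = persistence_intent("http://example.org/drafts/latest")
```

Any other syntactic convention chosen by the URI owner, such as a reserved path prefix, would serve equally well; the point is only that such a convention needs no new scheme.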

2.2 Standardized

NRI Goal

NRIs should be susceptible to standardization within administrative units.

This goal appears to be directed at guaranteeing certain invariants, for example with respect to the structure of identifiers and the availability of the resources they identify. This means they should not be creatable in a distributed or unsupervised fashion.

http: fact

Again, this is largely a management issue, not a technical one. Whatever invariants are in view can as well be enforced on (sub-parts of) http:-served resource collections as on those identified via NRIs.

Nothing in a specification can stop people from uttering URIs of any kind. Domain names are as good, or as bad, at conveying ownership of a particular form of URI as URN namespaces or URI schemes.

Centralized authorities can be established for parts of domain space as easily as for areas "off the web", and enforcement mechanisms can be as effective. For example, my employers constrain the mechanisms by which web pages are accepted for serving from certain parts of their domain so as to enforce invariants both of path structure and content markup.

2.3 Protocol Independence

NRI Goal

Access to resources identified by NRIs should not be dependent on any particular protocol.

Exactly what this means is not clear. Although it is listed as a requirement in several proposals, there is little or no discussion of it, so it is equally unclear why it should be a requirement for NRIs.

http: fact

http: URIs are no more protocol-dependent than any other identification mechanism.

For pure naming, that is, if retrieval is never intended, http: is as good as any NRI approach, because no protocol at all is involved. If retrieval is anticipated, then any NRI approach must specify a mapping to one or more protocols. All existing NRI approaches in practice specify only one such mapping, to the HTTP protocol. So they are in exactly the same position as http: -- if for some reason in the future the HTTP protocol becomes unavailable or inappropriate, both NRIs and http: will have to specify a new mapping.

True protocol independence is difficult to imagine in practice, as many protocols depend on a tight coupling between message formats and client/server application models. Protocols which don't allow servers any escape mechanism are thereby pretty much ruled out as transports for retrieval from NRIs (or http: URIs).

It's appropriate to note here that in cases where the necessary form of client/server interaction for a particular kind of information resource, for example streaming video, cannot be provided by the protocols normally associated with existing URI schemes, new schemes may be appropriate. Detailed discussion of this point can be found in [Schemes and Protocols]. But none of the NRI proposals are for resources of this kind.

2.4 Location Independence

NRI Goal

NRIs should not be locations.

Practical realities and administrative changes will always defeat any attempt to guarantee that the representation of a particular resource will always be stored in exactly the same host/server/filestore/directory/file. Any naming mechanism which equates locations in that sense with names is by construction inadequate. It follows that this goal is a sensible one.

http: fact

http: URIs are not locations.

Misunderstanding of http: URIs as locations has a long and, in part, justifiable history (they were, after all, originally called Uniform Resource Locators). But it's no longer justifiable either in principle (the RFC for URIs [RFC 3986] is quite clear on the subject) or in practice (there's lots of software support for server-side management of the relationship between http: URIs and their representations). See for example the classic [Cool URIs] for a more detailed discussion of these points.
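To illustrate what server-side management of this relationship amounts to, the following minimal Python sketch keeps a table from published paths to wherever the representation is currently stored. The class, its methods and all the paths are hypothetical; real servers provide this through configuration such as rewrite rules or aliases rather than through code of this kind.

```python
class UriSpace:
    """Maps stable public paths to wherever a representation is stored today."""

    def __init__(self):
        self._storage = {}

    def publish(self, path, storage_location):
        # The public path is chosen once and then left alone.
        self._storage[path] = storage_location

    def relocate(self, path, new_storage_location):
        # Moving the file changes only the right-hand side of the table.
        if path not in self._storage:
            raise KeyError(path)
        self._storage[path] = new_storage_location

    def locate(self, path):
        return self._storage[path]


space = UriSpace()
space.publish("/reports/annual", "/disk1/site/annual.html")
space.relocate("/reports/annual", "/archive/2006/annual.html")
```

After the move, the published path is unchanged and only the owner's internal table differs, which is the sense in which the URI is a name rather than a location.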

2.5 Structured names

NRI Goal

NRIs should provide for structuring resource identifiers with shareable tags.

This requirement has only been suggested by the authors of [XRI]. It amounts to a wish to structure resource names using name/value pairs, with the names having some standardized, widely understood meaning. This requirement is related to requirements appealed to in the design of End Point References [EPRs], [TAG on EPRs].

http: fact

The query component of http: URIs supports non-hierarchical structured naming.

It is open to any naming authority to establish conventions for the use of the query component of http: URIs under its control. Since the query component is already structured in terms of simple name/value pairs, it is a good fit for the requirement.
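As a rough sketch of such a convention, the Python fragment below builds and reads names whose query component carries name/value tags. The tag names (isbn, copy), the helper functions and the example.org base URI are assumptions made for illustration. Sorting the pairs is a design choice of this sketch: it gives each set of tags a single spelling, in keeping with the advice on avoiding URI aliases.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def make_name(base: str, tags: dict) -> str:
    """Attach name/value tags to a base http: URI as its query component."""
    scheme, netloc, path, _, _ = urlsplit(base)
    # Sorted so that the same tags always yield the same URI.
    query = urlencode(sorted(tags.items()))
    return urlunsplit((scheme, netloc, path, query, ""))


def read_tags(uri: str) -> dict:
    """Recover the tags from a name; assumes each tag name occurs once."""
    return {name: values[0] for name, values in parse_qs(urlsplit(uri).query).items()}


name = make_name("http://example.org/catalog", {"isbn": "0-395-36341-1", "copy": "2"})
tags = read_tags(name)
```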

2.6 Uniform access to metadata

NRI Goal

NRIs should provide access to metadata about a resource as well as to its representations.

http: fact

[conneg does this for http: just fine]

2.7 Rich authority

NRI Goal

NRIs should allow for sophisticated authority models, including delegation.

http: fact

[not sure]

Editorial note: HST2006-03-14
Not clear what this means -- the XRI examples suggest it's mostly about late-binding/encapsulation, e.g. xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1) and xri://broadview.library.example.com/(urn:isbn:0-395-36341-1) are given as examples of 'copies of the same book at two different libraries'

2.8 Trusted resolution

NRI Goal

NRIs should support trusted resolution.

http: fact

[not sure]

Editorial note: HST2006-03-14
This appears only in XRIs, and appears to me to be self-contradictory -- it says a "trusted resolution protocol [is] independent of DNS".

3 The value of http: URIs

The http: URI scheme implements a two-part approach to identifying resources. It combines a universal distributed naming scheme for owners of resources with a hierarchical syntax for distinguishing resources which share the same owner. Widely available mechanisms (DNS and web servers, respectively) exist to support the use of http: URIs to not only identify but actually retrieve representations of information resources.
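The two-part decomposition can be made concrete with a few lines of Python. This is only a sketch: the function name is ours, and treating the host name as the "owner" is a simplification that ignores ports and user information.

```python
from urllib.parse import urlsplit


def decompose(uri: str):
    """Split an http: URI into an owner name and an owner-relative hierarchy."""
    parts = urlsplit(uri)
    owner = parts.hostname  # resolved through DNS
    hierarchy = [segment for segment in parts.path.split("/") if segment]
    return owner, hierarchy  # the hierarchy is interpreted by the owner's server


owner, hierarchy = decompose("http://www.w3.org/2001/tag/doc/URNsAndRegistries-50")
```

Here the owner is www.w3.org and the hierarchy is the list of path segments beneath it.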

Any requirement for naming resources, particularly if not only naming but also retrieval of representations is in prospect, which admits to a similar decomposition, that is, into a universal owner name and a hierarchical owner-relative name, can almost certainly be satisfied by the http: URI scheme. http: provides substantial benefits, in terms of installed software base, user comprehension, scalability and, if required, security, at very low cost.

Anyone developing an alternative approach, that is, some form of NRI, should consider carefully whether that approach is either isomorphic to http:, or makes covert appeal to http: for its implementation. In either case, this strongly suggests that the fundamental requirements of the new approach do in fact admit to the two-part description given above, and therefore that http: itself would be a viable, and therefore a preferred, way forward.

Editorial note: HST2006-03-21
This text from DO is currently homeless: A main advantage of http URIs is the use of DNS to allow decentralized creation of vocabularies, but this does bear the cost that humans can be confused by the mixing of location and identifiers. Another possibility is to create a scheme that does not have any protocol associated with it, which I was thinking of at one point. The reason that this does not work and I did not proceed is that it does not address the issue of humans needing to understand context and it does not allow the flexibility of providing a namespace document.

4 Case study: Naming namespaces

In this section we look in some detail into some of the background assumptions for the utility of NRIs for one particular purpose, namely for naming namespaces.

A common reason given for needing NRIs for namespace names is that an http: identifier appears to humans as a location and hence as dereferenceable. Another common reason is the wish for an identifier that is location-independent or that is "movable" from one location to another.

The first argument, that http: URIs are "locations", is based upon an incomplete understanding of the use of URIs. Any datatype, in this case URIs, exists in a context. The context will define the use of a URI, and includes social and technical context. A URI on the side of a van will convey the social meaning that it can be typed into a browser and used. Other contexts for the use of URIs include namespace names, references to documents, and identifiers for things. It is never the case that a URI is simply "found" without a context.

The case of using NRIs for namespace names is enlightening. Imagine two scenarios, one using an NRI as a namespace name and another using an http: URI. The namespace specification defines a context, which roughly speaking says that a namespace name should not be considered dereferenceable. Any software component that is written assuming that any namespace name must be dereferenceable is violating the namespace specification. It may be that a namespace owner has guaranteed that they will provide a document at the namespace name, but this can hold only for a subset of the entire set of namespace names. Clearly generic XML software should not be written to assume dereferenceability of namespace names.
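The behaviour expected of generic XML software can be sketched with Python's standard ElementTree parser, which treats a namespace name purely as a string and performs no retrieval. The two documents and the split_tag helper below are illustrative assumptions; the nri: form is the hypothetical scheme used for argument in this finding, not a registered one.

```python
import xml.etree.ElementTree as ET


def split_tag(tag: str):
    """Split ElementTree's '{namespace-name}local-name' form into two parts."""
    if tag.startswith("{"):
        namespace_name, local_name = tag[1:].split("}", 1)
        return namespace_name, local_name
    return None, tag


HTTP_DOC = '<o:order xmlns:o="http://example.org/ns/order"/>'
NRI_DOC = '<o:order xmlns:o="nri://example.org/ns/order"/>'

# Parsing never dereferences the namespace name; it is only compared as a string.
http_name = split_tag(ET.fromstring(HTTP_DOC).tag)
nri_name = split_tag(ET.fromstring(NRI_DOC).tag)
```

Both documents are processed identically, which suggests that the scheme of the namespace name makes no difference to software that follows the namespace specification.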

It is natural for a human reading an XML document with a namespace name that they do not know to want to understand more about the namespace. This is why [AWWW] recommends providing a document at a namespace name that provides both human and machine readable information. The use of http: namespace names enables 3 separate scenarios:

  1. an identifier can be created in a decentralized manner;

  2. an identifier may be dereferenced by a person via a browser to aid understanding;

  3. an identifier may be dereferenced by a computer and exploited for automatic processing by reason of its identifying schemas, WSDLs, policies, etc.

The last two of these are distinct interaction patterns, with and without human involvement respectively. The software-only interaction pattern is clearly erroneous if it assumes that a namespace name is dereferenceable. It is unlikely that XML software written today requires this assumption to be valid, but much software definitely exists which exploits dereferenceability when it is present.

Contrasting with this is the approach of using an NRI. An NRI provides an identifier, though in some cases these are not decentralized. A human looking at an XML document with an NRI namespace name will not be confused about whether it is dereferenceable or not.

In the http: identifier scenario, the "location" to be used for knowledge is embedded in the identifier and available in a decentralized manner via DNS. In the NRI identifier scenario, the "location" to be used for knowledge is hardcoded somewhere in the application or in some property of the NRI such as a URI scheme or URN (sub)scheme. It is substantially easier for software to use a single identifier and the existing DNS/HTTP infrastructure than to use an intermediary identifier and, quite probably, the existing DNS/HTTP infrastructure as well.

Imagine that I create a URI scheme called nri that uses the exact same syntax as the http scheme and specifically does not define a protocol. I can create the nri://example.org/ns/foo URI and start to use it as a namespace name. There is no confusion about the name being dereferenceable. But what value is there? If one of these URIs shows up somewhere in a document, how will the human find out about the meaning? They must either try to examine the context surrounding the URI datatype - in which case there is no benefit to nri: versus http: as the work is the same - or they try to examine the namespace name - but it's not dereferenceable so they can't do that. The amount of work is either the same or more using an nri scheme.
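The syntactic equivalence of the two forms is easy to confirm. In the Python sketch below, a generic URI parser is applied to the nri: name from the paragraph above and to its http: counterpart; nothing here is specific to either scheme.

```python
from urllib.parse import urlsplit

nri = urlsplit("nri://example.org/ns/foo")
http = urlsplit("http://example.org/ns/foo")

# Every component other than the scheme is identical.
same_owner = nri.netloc == http.netloc
same_path = nri.path == http.path
schemes = (nri.scheme, http.scheme)
```

The only information the nri: form adds is the scheme name itself, and the only thing it takes away is a default way of retrieving a namespace document.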

If the scheme definition for nri says that it is dereferenceable, and specifies a mechanism, then either that mechanism is HTTP, or it will have to provide all the functionality, and thus be heir to all the weaknesses, of HTTP. In either case no benefit has been gained over just using the http scheme itself.

Namespace names are just one example of a context of use. Any use of the URI datatype in an XML document has the same issues. A provider of a URI must specify how the URI will be used in each specific sub-context of their XML language, whether it is intended as an identifier, a location, or both. Using an NRI instead of an http: URI does not make the software or human's job any easier.

It's perhaps worth noting that all the NRI proposals include means to transform an NRI into a dereferenceable address via lookup using some form of registry server. This in turn requires the use of a dereferenceable address for the server, or else all software intended for use with NRIs must have the registry server location "hard-coded". As far as we can tell all the NRI proposals expect the results of server lookup to be an http: URI, and also appear to use an http: URL to identify the location of the registry server.
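The lookup pattern just described can be sketched as follows. Everything in this Python fragment is hypothetical: the registry address, the form of its query, the urn:example name and the table contents are invented for illustration and do not describe any actual NRI proposal. The function returns the http: URIs a client would have to dereference, in order.

```python
from urllib.parse import quote

# Hypothetical registry server address, "hard-coded" into the client.
REGISTRY_SERVER = "http://registry.example.org/lookup"

# Hypothetical registry contents: NRI -> http: URI of a representation.
REGISTRY = {"urn:example:ns:foo": "http://example.org/ns/foo"}


def retrieval_steps(identifier: str) -> list:
    """List the http: URIs that must be dereferenced to reach a representation."""
    if identifier.startswith("http://"):
        return [identifier]  # one step: the name is already sufficient
    lookup = f"{REGISTRY_SERVER}?name={quote(identifier, safe='')}"
    return [lookup, REGISTRY[identifier]]  # registry first, then the result


direct = retrieval_steps("http://example.org/ns/foo")
via_registry = retrieval_steps("urn:example:ns:foo")
```

Under these assumptions the NRI route ends at the same http: URI as the direct route, after one extra round trip to a server which is itself named by an http: URI.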

5 Detailed Illustration

In this section we compare the sequence of messages between client, registry(s) and server(s) involved in both simple and complex cases of retrieval of resources named with NRIs and http: URIs, in order to elaborate on the assertions made above about their functional equivalence.

. . .[to be filled in?]. . .

A References

Cool URIs
Berners-Lee, Tim. Cool URIs don't change, W3C, 1998. Available online as http://www.w3.org/Provider/Style/URI.
RFC 3986
Berners-Lee, T., Fielding, R. and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax, IETF, 2005. Available online as http://www.faqs.org/rfcs/rfc3986.html
oasis URN
Best, K. and N. Walsh. A URN Namespace for OASIS, IETF, 2001. Available online as RFC 3121
EPRs
Gudgin, M., Hadley, M. and T. Rogers, eds. Web Services Addressing 1.0 - Core ("Endpoint References" section), W3C, 2006. Available online as http://www.w3.org/TR/ws-addr-core/#eprs
AWWW
Jacobs, Ian and Norman Walsh, eds. Architecture of the World Wide Web, Volume 1, W3C, 2004. Available online as http://www.w3.org/TR/webarch/
RFC 3688
Mealling, M., ed. The IETF XML Registry, IETF, 2004. Available online as http://ietfreport.isoc.org/idref/rfc3688/
XRI
Reed, Drummond and Dave McAlpin, eds. An Introduction to XRIs, OASIS, 2005. Available online as http://www.oasis-open.org/apps/group_public/download.php/11857/xri-intro-V2.0-wd-04.pdf
TAG on EPRs
TAG Request for Change to WS Addressing Core, TAG message to WS Addressing Working Group, 2005. Available online as http://lists.w3.org/Archives/Public/www-tag/2005Oct/0057.html
Schemes and Protocols
"Relationship of URI schemes to protocols and operations", TAG issue schemeProtocols-49. Available online as http://www.w3.org/2001/tag/issues.html#schemeProtocols-49