W3C

URNs, Namespaces and Registries

[Editor's Draft] TAG Finding CVS $Id: URNsAndRegistries-50-20060404.html,v 1.2 2006/04/04 08:32:48 vquint Exp $

This version:
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50-20060404
Latest version:
http://www.w3.org/2001/tag/doc/URNsAndRegistries-50
Editors:
Henry S. Thompson, University of Edinburgh <ht@inf.ed.ac.uk>
David Orchard, BEA Systems <dorchard@bea.com>

This document is also available in these non-normative formats: XML.


Abstract

This finding attempts to address the questions "When should URNs or URIs with novel URI schemes be used to name information resources for the Web?" and "Should registries be provided for such identifiers?". The answers given are "Rarely if ever" and "Probably not". Common arguments in favor of such novel naming schemes are examined, and their properties compared with those of the existing http: URI scheme.

Status of this Document

Editorial note: HST 2006-03-14
Further to a request from Roy Fielding, I had a brief look at XCAP, which seems to be using http: URIs now, although it introduces a new Application UID registry, and uses ietf: URNs for its namespaces. . . If anyone (including Roy) remembers what Roy was particularly concerned about here, please let me know.

This document has been produced by the W3C Technical Architecture Group (TAG). This finding addresses TAG issue URNsAndRegistries-50.

This is the second draft of this finding. This finding is an editorial draft, not yet accepted by the TAG.

Additional TAG findings, both accepted and in draft state, may also be available. The TAG expects to incorporate this and other findings into [what?] that will be published according to the process of the W3C Recommendation Track.

Editorial note: HST 2005-03-29
Are we ready to tell the world what will follow AWWW?

Please send comments on this finding to the publicly archived TAG mailing list www-tag@w3.org (archive).

Table of Contents

1 Introduction
2 Examining the need for new approaches to naming information resources
    2.1 Persistence
    2.2 Standardized
    2.3 Protocol Independence
    2.4 Location Independence
    2.5 Structured names
    2.6 Uniform access to metadata
    2.7 Rich authority
    2.8 Trusted resolution
3 The value of http: URIs
4 Case study: Naming namespaces
    4.1 Context
    4.2 Identification
    4.3 Dereferencability
    4.4 Location Independence
    4.5 Summary

Appendix

A References


1 Introduction

In [AWWW] we find the following recommendations:

Avoiding URI aliases

"A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource."

Reuse URI schemes

"A specification SHOULD reuse an existing URI scheme (rather than create a new one) when it provides the desired properties of identifiers and their relation to resources."

URI opacity

"Agents making use of URIs SHOULD NOT attempt to infer properties of the referenced resource."

Available representation

"A URI owner SHOULD provide representations of the resource it identifies."

Recently, however, a number of proposals have emerged to create new identification mechanisms for the Web. They propose new URN (sub-)namespaces or URI schemes and provide registries for instances thereof, in order to allow them to be used to identify and retrieve information resources. This would appear to be incompatible with [AWWW]'s simple positive recommendations. In this finding we enumerate the arguments given in favor of these new proposals, which often turn out to be arguments against using http: URIs, and explain why they are mistaken and how the above principles can be understood to point the way constructively to alternative designs which do in fact make use of http: URIs.

2 Examining the need for new approaches to naming information resources

This section is structured in terms of goals or requirements for resource identification mechanisms that have been offered as justifications for adopting a new approach. They are drawn from a number of recent proposals ([RFC 3688], [oasis URN], [XRI]), which we have abstracted, merged and summarized. [Definition: Throughout these summaries we will refer to instances of the proposed new identifier mechanism as NRIs.] In each case we state the requirement and examine the extent to which the existing http:-based identifier mechanism addresses it.

2.1 Persistence

NRI desideratum

The relation between NRIs and the information resource they identify should persist indefinitely.

Or, more realistically, individual NRIs should manifest syntactically whether or not they are intended to persist indefinitely.

This goal is difficult to get to grips with, as it appears to mean different things in different contexts:

  1. At its simplest, this is just a wish for an end to 404 Not Found, i.e. that you should always be able to resolve an NRI.

  2. In the Information Science community, 'persistence' is a stronger requirement, namely, that what you get when you resolve an NRI should never change.

http: fact

http: URIs support persistence as well as is possible in practice.

As has been frequently observed, achieving either of the numbered types of persistence above is not a technology issue but a management issue. It is up to the owners and operators of the mechanisms which implement NRI resolution to enforce whatever degree of persistence they choose. It follows that there is no difference here between NRIs and http: URIs.

What of the more sophisticated reading, that an NRI should manifest its minter's intentions with respect to persistence? That's just a matter of naming conventions, and perfectly possible using http:. We could, for example, say that all versionable/time-varying resources on our site are named with all lower-case letters, and all persistent/stable/non-varying resources are named with all upper-case letters.
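As a minimal sketch of how an agent aware of such a published convention might read it, the following Python function classifies a URI by the case of the letters in its final path segment. The convention itself, the function name and the example URIs are hypothetical illustrations, not part of any specification.

```python
from urllib.parse import urlsplit

def declared_persistence(uri: str) -> str:
    """Classify an http: URI under a hypothetical site convention:
    all-upper-case final segment = persistent, all-lower-case = time-varying."""
    segment = urlsplit(uri).path.rstrip("/").rsplit("/", 1)[-1]
    letters = [c for c in segment if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return "persistent"
    if letters and all(c.islower() for c in letters):
        return "time-varying"
    return "unspecified"  # mixed case or no letters: the convention says nothing

print(declared_persistence("http://example.org/reports/ANNUAL-2005"))  # persistent
print(declared_persistence("http://example.org/reports/latest"))       # time-varying
```

The classification is purely syntactic; whether the site actually honours it remains, as above, a management matter.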

2.2 Standardized

NRI desideratum

NRIs should be susceptible to standardization within administrative units.

This goal appears to be directed at guaranteeing certain invariants, for example with respect to the structure of identifiers and the availability of the resources they identify. This means that NRIs should not be creatable in a distributed or unsupervised fashion.

http: fact

Again, this is largely a management issue, not a technical one. Whatever invariants are in view can as well be enforced on (sub-parts of) http:-served resource collections as on those identified via NRIs.

Nothing in a specification can stop people from uttering URIs of any kind. Domain names are as good, or as bad, at conveying ownership of a particular form of URI as URN namespaces or URI schemes.

Centralized authorities can be established for parts of domain space as easily as for areas "off the web", and enforcement mechanisms can be as effective. For example, my employers constrain the mechanisms by which web pages are accepted for serving from certain parts of their domain so as to enforce invariants both of path structure and content markup.
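A minimal sketch of one such enforcement mechanism, assuming a hypothetical rule for a single subtree of a site (paths under /ns/ must be a lower-case name, optionally followed by a four-digit year). The rule, the function name and the paths are illustrative assumptions only; a real policy would typically also check content markup.

```python
import re

# Hypothetical rule for the /ns/ subtree: a lower-case name, optionally
# followed by a four-digit year, with an optional trailing slash.
NS_PATH_RULE = re.compile(r"/ns/[a-z][a-z0-9-]*(/\d{4})?/?")

def accept_for_publication(path: str) -> bool:
    """Return True if a submitted path satisfies the invariant for its subtree."""
    if path.startswith("/ns/"):
        return NS_PATH_RULE.fullmatch(path) is not None
    return True  # other parts of the site are not constrained by this rule

for p in ["/ns/foo", "/ns/foo/2006", "/ns/Foo", "/ns/foo/bar/baz"]:
    print(p, accept_for_publication(p))
```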

2.3 Protocol Independence

NRI desideratum

Access to resources identified by NRIs should not be dependent on any particular protocol.

Exactly what this means is unclear: although it is listed as a requirement in several proposals, there is little or no discussion of it, so why it should be a requirement for NRIs is also unclear.

http: fact

http: URIs are no more protocol-dependent than any other identification mechanism.

For pure naming, that is, if retrieval is never intended, http: is as good as any NRI approach, because no protocol at all is involved. If retrieval is anticipated, then any NRI approach must specify a mapping to one or more protocols. All existing NRI approaches in practice specify only one such mapping, to the HTTP protocol. So they are in exactly the same position as http: -- if for some reason in the future the HTTP protocol becomes unavailable or inappropriate, both NRIs and http: will have to specify a new mapping.

True protocol independence is difficult to imagine in practice, as many protocols depend on a tight coupling between message formats and client/server application models. Protocols which don't allow servers any escape mechanism are thereby pretty much ruled out as transports for retrieval from NRIs (or http: URIs).

It's appropriate to note here that in cases where the necessary form of client/server interaction for a particular kind of information resource, for example streaming video, cannot be provided by the protocols normally associated with existing URI schemes, new schemes may be appropriate. Detailed discussion of this point can be found in [Schemes and Protocols]. But none of the NRI proposals are for resources of this kind.

2.4 Location Independence

NRI desideratum

NRIs should not be locations.

Practical realities and administrative changes will always defeat any attempt to guarantee that the representation of a particular resource will always be stored in exactly the same host/server/filestore/directory/file. Any naming mechanism which equates locations in that sense with names is by construction inadequate. It follows that this goal is a sensible one.

http: fact

http: URIs are not locations.

Misunderstanding of http: URIs as locations has a long and, in part, justifiable history (they were, after all, originally called Uniform Resource Locators). But it is no longer justifiable, either in principle (the RFC for URIs [RFC 3986] is quite clear on the subject) or in practice (there is plenty of software support for server-side management of the relationship between http: URIs and their representations). See for example the classic [Cool URIs] for a more detailed discussion of these points.
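The following sketch illustrates that server-side decoupling under a simple assumption: the server keeps a hypothetical table mapping request paths to wherever the representation is currently stored. Moving the stored file changes only the table entry, not the http: URI that clients use.

```python
# Hypothetical server-side table: request path -> current storage location.
storage_map = {"/ns/foo": "/disk1/www/ns-foo-v1.html"}

def locate(path: str) -> str:
    """Return the file currently holding the representation served at `path`."""
    return storage_map[path]

before = locate("/ns/foo")
# An administrative change moves the file to another disk and renames it...
storage_map["/ns/foo"] = "/disk2/archive/namespaces/foo.html"
after = locate("/ns/foo")
# ...while clients go on using the same http: URI, http://example.org/ns/foo.
print(before, "->", after)
```

A real server would usually keep such a mapping in its configuration (for example aliases or rewrite rules) rather than in program code.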

2.5 Structured names

NRI desideratum

NRIs should provide for structuring resource identifiers with shareable tags

This requirement has only been suggested by the authors of [XRI]. It amounts to a wish to structure resource names using name/value pairs, with the names having some standardized, widely understood meaning. This requirement is related to requirements appealed to in the design of End Point References [EPRs], [TAG on EPRs].

http: fact

The query component of http: URIs supports non-hierarchical structured naming.

It is open to any naming authority to establish conventions for the use of the query component of http: URIs under its control. Since the query component is already structured in terms of simple name/value pairs, it is a good fit for the requirement.
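A minimal sketch using Python's standard urllib.parse module, assuming a hypothetical naming authority at example.org that has adopted the query names isbn and library (the ISBN is borrowed from the XRI example quoted in 2.7). The function names are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def make_name(base: str, pairs: dict) -> str:
    """Build an http: URI whose query component carries name/value pairs."""
    return base + "?" + urlencode(pairs)

def read_name(uri: str) -> dict:
    """Recover the name/value pairs from the query component (first value of each)."""
    return {k: v[0] for k, v in parse_qs(urlsplit(uri).query).items()}

uri = make_name("http://example.org/books", {"isbn": "0-395-36341-1", "library": "shoreline"})
print(uri)             # http://example.org/books?isbn=0-395-36341-1&library=shoreline
print(read_name(uri))  # {'isbn': '0-395-36341-1', 'library': 'shoreline'}
```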

2.6 Uniform access to metadata

NRI desideratum

NRIs should provide for access to metadata about a resource as well as to its representations.

http: fact

[conneg does this for http: just fine]
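The bracketed note refers to HTTP content negotiation, in which the client's Accept header selects among representations available at the same http: URI. A minimal sketch, assuming a hypothetical resource that offers an HTML page for people and an RDF/XML metadata document for machines; it handles q-values and the */* wildcard only, not type/* ranges or the other details of HTTP's negotiation rules.

```python
def negotiate(accept_header: str, available: list):
    """Return the available media type with the highest q-value in an Accept
    header, or None if none is acceptable. Only exact types and */* are handled."""
    best, best_q = None, 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0], 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        for candidate in available:
            if media_range in (candidate, "*/*") and q > best_q:
                best, best_q = candidate, q
    return best

# Hypothetical resource offering a page for people and RDF metadata for machines.
offered = ["text/html", "application/rdf+xml"]
print(negotiate("application/rdf+xml, text/html;q=0.5", offered))  # application/rdf+xml
print(negotiate("text/html", offered))                              # text/html
```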

2.7 Rich authority

NRI desideratum

NRIs should allow for sophisticated authority models, including delegation.

http: fact

[not sure]

Editorial note: HST 2006-03-14
Not clear what this means -- the XRI examples suggest it's mostly about late-binding/encapsulation, e.g. xri://shoreline.library.example.com/(urn:isbn:0-395-36341-1) and xri://broadview.library.example.com/(urn:isbn:0-395-36341-1) are given as examples of 'copies of the same book at two different libraries'

2.8 Trusted resolution

NRI desideratum

NRIs should support trusted resolution.

http: fact

[not sure]

Editorial note: HST 2006-03-14
This appears only in XRIs, and appears to me to be self-contradictory -- it says a "trusted resolution protocol [is] independent of DNS".

3 The value of http: URIs

The http: URI scheme implements a two-part approach to identifying resources. It combines a universal distributed naming scheme for owners of resources with a hierarchical syntax for distinguishing resources which share the same owner. Widely available mechanisms (DNS and web servers, respectively) exist to support the use of http: URIs to not only identify but actually retrieve representations of information resources.

Any requirement for naming resources, particularly if not only naming but also retrieval of representations is in prospect, which admits of a similar decomposition, that is, into a universal owner name and a hierarchical owner-relative name, can almost certainly be satisfied by the http: URI scheme. http: provides substantial benefits, in terms of installed software base, user comprehension, scalability and, if required, security, at very low cost.

Anyone developing an alternative approach, that is, some form of NRI, should consider carefully whether that approach is either isomorphic to http:, or makes covert appeal to http: for its implementation. In either case, this strongly suggests that the fundamental requirements of the new approach do in fact admit of the two-part description given above, and therefore that http: itself would be a viable, and therefore a preferred, way forward.

Editorial note: HST 2006-03-21
This text from DO is currently homeless: A main advantage of http URIs is the use of DNS to allow decentralized creation of vocabularies, but this does bear the cost that humans can be confused by the mixing of location and identifiers. Another possibility is to create a scheme that does not have any protocol associated with it, which I was thinking of at one point. The reason that this does not work and I did not proceed is that it does not address the issue of humans needing to understand context and it does not allow the flexibility of providing a namespace document.

4 Case study: Naming namespaces

In this section we look in some detail at the background assumptions behind the claimed utility of NRIs for one particular purpose, namely naming namespaces, by comparing the use of http: URIs and NRIs as namespace names.

4.1 Context

Any use of identifiers requires a context. The context defines the use of the identifier, and includes both social and technical aspects. A URI on the side of a van conveys the social meaning that it can be typed into a browser and used. Other contexts for the use of URIs include namespace names, references to documents, and identifiers for things. A URI is never simply "found" without a context.

The Namespaces in XML specification provides the defining context for namespace names. It specifies that a namespace name is combined with a local name to form an expanded name, which may then be compared against other expanded names. Very common uses of such comparisons are namespace well-formedness checking and content model validation. The namespace specification, roughly speaking, says that a namespace name should not be considered dereferenceable. Any software component written on the assumption that every namespace name must be dereferenceable is violating the namespace specification. A namespace owner may guarantee to provide a document at its namespace names, but such a guarantee can only ever cover a subset of all namespace names. As a result, generic XML software should not be written to assume that namespace names are dereferenceable.

4.2 Identification

In the first variant, an http: URI is used as the namespace name. We choose http://example.org/ns/foo as the namespace name.

Example 1: Namespace with http: scheme
<myns:foo xmlns:myns="http://example.org/ns/foo"/>

In the second variant, an xri: URI is used as the namespace name. XRI deals with delegation by using stars ("*"). One possibility using delegation is xri://=example*home*base/ns*foo as the namespace name. Another approach recommended by XRI is to use identifiers containing bang ("!") characters to indicate persistence, for example xri://@!9990!AF8F!1C3D/!2495.

Example 2: Namespace with xri: scheme
<myns:foo xmlns:myns="xri://=example*home*base/ns*foo"/>

In the case of XML software, both approaches work correctly. Any software-only interaction that assumes a namespace name is dereferenceable is clearly erroneous, and it is unlikely that XML software written today relies on that assumption.
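Python's standard-library ElementTree parser, used here only as a representative piece of generic XML software, shows this: it pairs each namespace name with the local name and treats the namespace name purely as a string to be compared, never fetching anything, whatever its scheme.

```python
import xml.etree.ElementTree as ET

# The two namespace declarations from Examples 1 and 2.
http_doc = '<myns:foo xmlns:myns="http://example.org/ns/foo"/>'
xri_doc = '<myns:foo xmlns:myns="xri://=example*home*base/ns*foo"/>'

# ElementTree reports each element name as "{namespace name}local name".
# Parsing never dereferences the namespace name; it is only kept as a string.
for doc in (http_doc, xri_doc):
    print(ET.fromstring(doc).tag)
# {http://example.org/ns/foo}foo
# {xri://=example*home*base/ns*foo}foo
```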

A common reason given for needing NRIs for namespace names is that an http: identifier appears to humans to be a location, and hence dereferenceable. The argument that http: URIs are "locations" is based upon an incomplete understanding of the use of URIs: any datatype, in this case URIs, exists in a context. A classic scenario is that a human looks at an XML document and the application displays the xmlns attribute value as a clickable link. If no document can be retrieved from http://example.org/ns/foo, the human clicks on the link and gets an HTTP 404 response. The obvious downside is that the user has wasted some time, typically around 5-10 seconds. Clicking and getting a 404 causes no harm beyond that.

The question to ask is under what circumstances identifiers are presented as "clickable". In this document, neither of the xmlns values appears as clickable. When these documents were pasted into an e-mail, they were not converted into links either. The http: value was converted into a link only when the xmlns:myns attribute was typed by hand: it was the e-mail program's "auto-complete" that saw http: within a pair of quotes and made it clickable. This also requires that rich text or HTML formatting be selected in both the sending and the receiving program; when viewed as plain text, the link is not clickable. In short, the clickable link arises when a document is typed by hand and then viewed with HTML formatting. Neither of these applications is treating the document as XML; rather, they are treating it as HTML. In particular, none of the applications knows anything about XML or the xmlns attribute.
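We cannot say which rules any particular mail program uses, but the behaviour described can be sketched with a hypothetical text-level auto-linker that recognizes only the schemes it knows how to open (here http: and https:). Like the mail programs, it knows nothing about XML; the regular expression and function name are illustrative assumptions.

```python
import re

# Hypothetical auto-linker: wraps anything that looks like an http(s): URI,
# stopping at whitespace, quotes or angle brackets. It has no notion of XML.
AUTO_LINK = re.compile(r'\bhttps?://[^\s"<>]+')

def auto_link(text: str) -> str:
    """Wrap recognized URIs in <a> elements, leaving everything else untouched."""
    return AUTO_LINK.sub(lambda m: '<a href="%s">%s</a>' % (m.group(0), m.group(0)), text)

http_line = '<myns:foo xmlns:myns="http://example.org/ns/foo"/>'
xri_line = '<myns:foo xmlns:myns="xri://=example*home*base/ns*foo"/>'
print(auto_link(http_line))  # the http: namespace name becomes a link
print(auto_link(xri_line))   # the xri: namespace name is left alone
```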

We can see that when people are reading and writing sample XML documents using HTML formatting, the worst downside is that a person may waste 5-10 seconds.

Contrasting with this is the approach of using an NRI. An NRI provides an identifier. A human looking at an XML document with an NRI namespace name will not be confused about whether it is dereferenceable or not. No software will "auto-complete" the xri: identifier into a clickable link, so the 5-10 seconds of potentially wasted time are avoided.

But if one of these identifiers appears in a document, how will the human find out its meaning? One approach is to examine the context surrounding the datatype, which is the XML document and the Namespaces in XML specification: the human will look in that specification and see what it says about namespaces. In this case there is no benefit to xri: over http:, as the work is the same. Alternatively, they may try to dereference the namespace name, but it is not dereferenceable, so they get no information.

4.3 Dereferencability

It is natural for a human reading an XML document with an unknown namespace name to want to understand more about the namespace. This is why [AWWW] recommends providing a document at a namespace name that provides both human- and machine-readable information. The use of http: namespace names enables three separate scenarios:

  1. an identifier can be created in a decentralized manner;

  2. an identifier may be dereferenced by a person via a browser to aid understanding;

  3. an identifier may be dereferenced by a computer and exploited for automatic processing by reason of its identifying schemas, WSDLs, policies, etc.

The second and third of these are two distinct interaction patterns, with and without human involvement respectively.

In all dereferenceable identifier scenarios, an identifier must be usable to determine an authority. There may be interactions with multiple authorities to determine the "final" authority for the identifier. The final authority uses the identifier to produce a document.

For http: URIs, the authority is specified immediately after the scheme, and is resolved using the Internet's DNS and IP systems: DNS maps the authority's host name to an IP address, which identifies the final authority. That authority is then sent the remaining part of the URI for dereferencing.

Example 3: HTTP GET of namespace name
GET /ns/foo HTTP/1.1
Host: example.org
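A sketch of how the namespace name http://example.org/ns/foo from Example 1 is split into the two parts just described, using only Python's standard library. The function name is illustrative; the sketch builds the request message but deliberately stops short of the DNS lookup and of opening a connection.

```python
from urllib.parse import urlsplit

def get_request_for(uri: str) -> str:
    """Build the HTTP/1.1 GET request that dereferences an http: URI.
    The authority goes in the Host header (a client would first resolve it
    to an IP address via DNS); the path and query form the request target."""
    parts = urlsplit(uri)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return "GET %s HTTP/1.1\r\nHost: %s\r\n\r\n" % (target, parts.netloc)

print(get_request_for("http://example.org/ns/foo"))
```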

In the NRI scenario, the knowledge of where to go is held somewhere in the application, or in some property of the NRI such as its URI scheme or URN (sub-)namespace. Each proposal includes a means of transforming an NRI into a dereferenceable address via lookup using a registry server. This in turn requires a dereferenceable address for the server, or else all software intended for use with NRIs must have the registry server location "hard-coded". As far as we can tell, all the proposals expect the result of registry lookup to be an http: URI, and also appear to use an http: URI to identify the location of the registry server.

The XRI documents state that "XRI resolution is a two phase process. The first phase, authority resolution, resolves to the XRI authority responsible for the resource. The second phase, local access, uses URIs and metadata from the authority to interact with the identified resource. " Under these rules, xri://=example*home*base/ns*foo is parsed into the XRI authority =example*home*base and the local part ns*foo. XRI authority endpoints are described using XRI descriptors. In this case the XRI descriptor specifies that a URI, http://equals.example.com, is the base authority for resolving example*home*base. The client uses the XRI syntax to determine that "=" is the global context symbol and that equals.example.com is the authority to which the resolution request should be sent.

Example 4: HTTP GET to XRI resolver
GET /xri-resolve/*example*home*base HTTP/1.1
Host: equals.example.com
Accept: application/xrid+xml

response:
HTTP/1.1 200 OK

<XRIDescriptors>
...
</XRIDescriptors>

The returned descriptors specify that the authority for the remaining subsegments, *home*base, is xri.other.example.com, so a second HTTP GET request is issued:

Example 5: HTTP GET to XRI resolver
GET /xri-resolve/*home*base HTTP/1.1
Host: xri.other.example.com
Accept: application/xrid+xml

response:
HTTP/1.1 200 OK

<XRIDescriptors>
...
</XRIDescriptors>

The response is a descriptor specifying that http://xri.other.example.com/xri-local/base is the URI to which the HTTP request for the namespace document should be sent:

Example 6: HTTP GET for document
GET /xri-local/base HTTP/1.1
Host: xri.other.example.com

There is an obvious bootstrap issue in the XRI system: any XRI client must understand the XRI descriptor format. XRI authority resolution is effectively a replacement for DNS, that is, a mapping from names to addresses, yet in this example it recurses and relies on the DNS/HTTP infrastructure itself. Three separate HTTP GET requests are needed to resolve the xri: namespace name into a document. It is substantially easier for software to use the existing DNS and HTTP infrastructure directly than to use an intermediary identifier which is, quite probably, itself resolved using that same infrastructure.
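The request count can be made concrete with a small simulation. The sketch below replays Examples 4 to 6 against hypothetical in-memory stand-ins for the servers involved, alongside the single request needed for the http: namespace name of Example 1. It does not parse real XRI descriptors or implement the XRI resolution rules; the host names follow the text above and the reply tuples are invented for illustration.

```python
# Hypothetical stand-ins for the servers in Examples 4-6. Each (host, path)
# maps to what a GET returns: a pointer to the next request, or the document.
SERVERS = {
    ("equals.example.com", "/xri-resolve/*example*home*base"):
        ("next", "xri.other.example.com", "/xri-resolve/*home*base"),
    ("xri.other.example.com", "/xri-resolve/*home*base"):
        ("next", "xri.other.example.com", "/xri-local/base"),
    ("xri.other.example.com", "/xri-local/base"):
        ("document", "<namespace document>"),
    # The http: alternative: one GET to the owner named in the URI.
    ("example.org", "/ns/foo"): ("document", "<namespace document>"),
}

def dereference(host: str, path: str, max_gets: int = 10):
    """Follow pointers from server to server, counting the simulated GETs."""
    for gets in range(1, max_gets + 1):
        reply = SERVERS[(host, path)]
        if reply[0] == "document":
            return reply[1], gets
        _, host, path = reply
    raise RuntimeError("resolution did not terminate")

print(dereference("equals.example.com", "/xri-resolve/*example*home*base"))  # ('<namespace document>', 3)
print(dereference("example.org", "/ns/foo"))                                 # ('<namespace document>', 1)
```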

Namespace names are just one example of a context of use. Any use of the URI datatype in an XML document raises the same issues. A provider of a URI must specify how the URI will be used in each specific sub-context of their XML language: as an identifier, as a location, or as both. Using an NRI instead of an http: URI does not make the software's or the human's job any easier.

4.4 Location Independence

Another common reason given for NRIs is the desire for an identifier that is location-independent, or "movable" from one location to another. In all cases, if a location changes there must be some kind of mapping from the identifier to the "new" location. There is a publishing step, in which the "new" location is somehow added to the registry for the identifier.

HTTP supports moving resources through its 3xx (Redirection) status codes, which virtually all Web browsers and servers handle correctly.

Example 7: HTTP GET of namespace name
GET /ns/foo HTTP/1.1
Host: example.org

response:
HTTP/1.1 301 Moved Permanently
Location: http://example.org/ns/latest/foo

There are two steps to making the namespace name document dereferenceable at the new location: 1) the Web server must be configured to issue the 301 response with the new Location (the redirect); 2) the document at ns/latest/foo must be added to the system.
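Those two steps, and the cost of the redirect, can be modelled with the following sketch. It simulates the server state after the move with hypothetical in-memory tables (no network access) and counts the GETs issued by a client that follows 3xx redirects as a browser would: two after the move, the extra one being the redirect step.

```python
# Hypothetical server state after the move in Example 7.
REDIRECTS = {"/ns/foo": "/ns/latest/foo"}               # step 1: answered with 301 + Location
DOCUMENTS = {"/ns/latest/foo": "<namespace document>"}  # step 2: the document at its new path

def get(path: str):
    """Simulate one GET: return (status, new location or document body)."""
    if path in REDIRECTS:
        return 301, REDIRECTS[path]
    if path in DOCUMENTS:
        return 200, DOCUMENTS[path]
    return 404, None

def fetch(path: str, max_redirects: int = 5):
    """Follow 301 redirects, returning (status, body, number of GETs issued)."""
    for gets in range(1, max_redirects + 2):
        status, value = get(path)
        if status == 301:
            path = value
            continue
        return status, value, gets
    raise RuntimeError("too many redirects")

print(fetch("/ns/foo"))  # (200, '<namespace document>', 2)
```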

XRI allows a simpler retrieval once the authority is known by removing the redirect step. The authority maps the identifier to the new document and retrieval of ns/foo is avoided. The change process is the same: the registry must be updated and the new ns/latest/foo must be added to the system.

The question, then, is the relative difficulty of updating an HTTP server versus updating an XRI resolver. In either case there will be some kind of submission and approval process. Given the widespread deployment of HTTP servers, and of administrators experienced in running them, it seems more likely that an HTTP server update process will be easier and faster than an XRI resolver update process.

The most important question to ask of the namespace name example is how often namespace name documents will move. Historical evidence indicates that once minted, namespace names remain inviolate. New namespace names may be minted, and compatible additions or extensions made to existing namespaces (as described in [namesinnamespaces]). Would XRI make namespace name documents more deployable by making them more movable? So far, the evidence indicates that it is not difficult to deploy namespace documents at namespace names. Perhaps the scale of deployment is a concern, but given how well the Web already scales, it is difficult to see why it should be.

4.5 Summary

If the scheme definition for xri: says that it is dereferenceable, and specifies a mechanism, then either that mechanism is HTTP, or it will have to provide all the functionality, and thus be heir to all the weaknesses, of HTTP. In either case little benefit has been gained over just using the http: scheme itself. Note that we have not yet compared the authority resolution mechanisms or their dependence upon centralized authority, nor have we compared support for the distributed authoring of identifiers.

The previous analysis identified two concrete benefits of using XRIs: users cannot waste time by erroneously dereferencing namespace names that have no namespace documents, and an extra HTTP GET request is avoided when namespace documents move. The costs of this solution are those of adding a new identifier scheme, with its attendant software and human costs, and seemingly unavoidable increased network costs (our example shows 3 HTTP GETs instead of 1). Given these costs and benefits, deploying a whole new resolution mechanism and related software to layer on top of existing Web functionality is not justified.

A References

Cool URIs
Berners-Lee, Tim. Cool URIs don't change, W3C, 1998. Available online as http://www.w3.org/Provider/Style/URI.
RFC 3986
Berners-Lee, T., Fielding, R. and L. Masinter. Uniform Resource Identifier (URI): Generic Syntax, IETF, 2005. Available online as http://www.faqs.org/rfcs/rfc3986.html
oasis URN
Best, K. and N. Walsh. A URN Namespace for OASIS, IETF, 2001. Available online as RFC 3121.
EPRs
Gudgin, M., Hadley, M. and T. Rogers, eds. Web Services Addressing 1.0 - Core ("Endpoint References" section), W3C, 2006. Available online as http://www.w3.org/TR/ws-addr-core/#eprs
AWWW
Jacobs, Ian and Norman Walsh, eds. Architecture of the World Wide Web, Volume 1, W3C, 2004. Available online as http://www.w3.org/TR/webarch/
RFC 3688
Mealling, M., ed. The IETF XML Registry, IETF, 2004. Available online as http://ietfreport.isoc.org/idref/rfc3688/
XRI
Reed, Drummond and Dave McAlpin, eds. An Introduction to XRIs, OASIS, 2005. Available online as http://www.oasis-open.org/apps/group_public/download.php/11857/xri-intro-V2.0-wd-04.pdf
XRIResolution
Wachob, Gabe, ed. XRI Resolution, OASIS, 2005. Available online as http://docs.oasis-open.org/xri/xri/V2.0/xri-resolution-V2.0-cd-01.pdf
TAG on EPRs
TAG Request for Change to WS Addressing Core, TAG message to WS Addressing Working Group, 2005. Available online as http://lists.w3.org/Archives/Public/www-tag/2005Oct/0057.html
Schemes and Protocols
"Relationship of URI schemes to protocols and operations", TAG issue schemeProtocols-49, available online as http://www.w3.org/2001/tag/issues.html#schemeProtocols-49
The Disposition of Names in an XML Namespace
"The Disposition of Names in an XML Namespace", TAG issue nameSpaceState-48, available online as http://www.w3.org/TR/namespaceState/