Disclaimer: This document strictly represents my own opinions and not necessarily those of the W3C.
Status: This document is in VERY ROUGH draft mode - too early to comment!
The explosion of the Web and the increasingly decentralized evolution of HTTP, HTML, and URIs have exposed a series of fundamental problems in the traditionally centralized use of registries of protocols, protocol extensions, media types, URI schemes, and the like.
On the other hand, central registries have been essential for the Web to prosper in the first place, by providing the common ground needed for independent applications to communicate. As an example, the default behavior for an HTTP client resolving an http: URI is to open a TCP connection and talk some version of HTTP to a server. Both the binding between the http: URI scheme and the HTTP protocol and the link between HTTP and TCP are in fact centrally registered bindings.
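To make the registered binding concrete, here is a minimal sketch in Python of what "resolve an http: URI" means by default. It is an illustration only, not part of any specification; the HTTP/1.0 framing and the example.com host are placeholders:

```python
import socket
from urllib.parse import urlsplit

def fetch(uri):
    # The centrally registered default: an http: URI means
    # "open a TCP connection and speak HTTP to the host".
    parts = urlsplit(uri)
    host = parts.hostname
    port = parts.port or 80          # 80 is the registered default port for HTTP
    path = parts.path or "/"
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        response = b""
        while chunk := sock.recv(4096):   # HTTP/1.0: read until the server closes
            response += chunk
    return response

print(fetch("http://example.com/")[:200])
```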
A tempting question often asked is: will there continue to be a role for central registries, or should we move towards a completely decentralized solution based on URIs? But clearly there is no simple yes/no answer to this question: both central registries and decentralized URIs have been shown to have their strengths as well as their weaknesses.
I think the trick is not to consider this an either/or situation but rather to define the scope of central registries and their interactions with the URI space in such a way that we can benefit from both worlds. This is not only desirable but essential if interoperability and evolvability are to coexist on the Net.
However, before defining this interaction, we first need to have a look at the strengths and weaknesses of URIs and central registries: how each evolves, how each conveys semantics, and the limitations of URIs themselves, since not everything can be solved by indirection.
Requirement: The two parties do not have to have any a priori knowledge about each other's capabilities.
This document advocates that by using URIs instead of, or in conjunction with, a central registry, many of the bottlenecks due to the inherent limitations of a central registry can be avoided. In order to make this clear, let's first have a look at the deficiencies of a central registry and how a URI-based system may overcome them.
In the following, I call an entry in a registry a "feature"; the unit of a feature can be as small or as big as you like: it can be a protocol, a subpart of a protocol, an interface, an encoding, a media type, etc. The important part is that it covers some functionality provided by some implementation and that it may be used by two or more of the parties communicating in a distributed system.
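Purely as a hypothetical illustration of this definition, a feature could be modeled as a named unit of functionality; the field names and the example URI below are my own placeholders, not part of any registry format:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Feature:
    uri: str                  # the feature's name; could equally be a registered token
    description: str          # human-readable semantics
    implementation: Callable  # whatever code provides the functionality

gzip_coding = Feature(
    uri="http://example.org/features/gzip",  # placeholder URI
    description="gzip content coding",
    implementation=lambda data: data,        # stand-in for a real codec
)
```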
Scarce resources create bottlenecks
A central registry is really a type of scarce resource.
Central Registries are Constellations in URI space
Not everything can be solved by indirection!
As the Web grows, the number of features that are candidates for registration increases as well. As an example, the MIME header fields used by SMTP, NNTP, and HTTP, although not currently registered, amount to several hundred and are likely to increase into the thousands over time. Often new header fields are given deliberately long names in order to make conflicts with existing features less likely. Likewise, the set of media types in use on the Internet is very large and growing fast.
Feature versioning is notoriously hard to solve. Unless a new version is fully backwards compatible with previous versions, each version requires a new entry in the registry, significantly increasing the number of entries and posing the problem of describing the relationships between the various registered versions.
WebDAV has been thinking long and hard about providing a good versioning model for URIs, and its feeling is that different versions should be assigned separate names, with the relationships between them handled using a metadata model like RDF. This means that features can be described using the same facilities as any other resource on the Web, including relationships to other resources. The point here is that even though using URIs does not decrease the set of features, the URI space makes available a set of facilities that features can take advantage of, facilities which are not (or at least not currently) accessible in any central registry.
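As a hedged sketch of what such a metadata model could look like, the following assigns each version its own URI and expresses the relationship between them as simple (subject, predicate, object) assertions in the style of RDF; all URIs and predicate names here are invented placeholders:

```python
V1 = "http://example.org/features/myformat/1.0"
V2 = "http://example.org/features/myformat/2.0"

# (subject, predicate, object) assertions relating the two versions
triples = [
    (V2, "http://example.org/terms/previousVersion", V1),
    (V2, "http://example.org/terms/backwardsCompatibleWith", V1),
]

def previous_versions(uri, graph):
    """Follow previousVersion links to enumerate a feature's history."""
    for s, p, o in graph:
        if s == uri and p.endswith("previousVersion"):
            yield o
            yield from previous_versions(o, graph)

print(list(previous_versions(V2, triples)))
# -> ['http://example.org/features/myformat/1.0']
```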
Another problem is that it is hard for a registry to gain and maintain central authority in a global system like the Internet. If the registry does not respond within a time frame acceptable to its users, they will tend to circumvent the registration process, avoiding the overhead altogether. And if the registry cannot provide a near-complete list of deployed features, then it is likely to be used less, and hence headed for a downward spiral.
Many existing Web applications support software component interfaces that allow dynamic installation of new facilities that either add functionality or replace existing functionality provided by that application.
The power of the Web lies in a homogeneous naming system based on the URI name space. The beauty is that if a resource has a URI, then it is likely to be available to a Web application. There are of course some practical limitations, in that the user may need special privileges in order to access the resource, and the Web application must support the particular mode of access (the URI scheme). If a feature is accessible through a URI, however, then by resolving that URI the application may be able to download a piece of software that provides the functionality defined by that feature.
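Here is a sketch of this mechanism, under the large assumption that the feature URI resolves to a piece of Python code; the URI is a placeholder, and executing downloaded code raises exactly the security questions discussed below, so this illustrates the mechanism only, not a safe deployment of it:

```python
import importlib.util
import tempfile
import urllib.request

def load_feature(uri):
    # Resolve the feature URI to an implementation...
    code = urllib.request.urlopen(uri).read()
    with tempfile.NamedTemporaryFile(suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    # ...and load it as a module, extending the application on the fly.
    # WARNING: this runs untrusted code; see the security discussion below.
    spec = importlib.util.spec_from_file_location("feature", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# handler = load_feature("http://example.org/features/gzip.py")  # placeholder URI
```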
One could then say that a registry might simply contain a set of URIs pointing to implementations handling the registered features. However, that opens the registry up to a new set of problems that are notoriously hard to solve, like guaranteeing that an implementation referenced by the registry actually does the job right and does not do nasty things on your machine. It also does not encourage multiple independent implementations of the same feature.
The problem of security does not, of course, go away by using URIs instead of registered tokens. However, it allows new features to be introduced using the same security mechanisms as are being developed for generic resources, for example the W3C DSig initiative, which provides support for signing assertions about resources.
Another problem that I haven't mentioned is: how is the central registry intended to be accessed? Is it bound to be accessible as ASCII text, or should it be accessible through some sort of query language? By using URIs instead of a central registry, the accessibility of a feature is up to the publisher of the feature rather than limited by the capabilities of the central registry.
In order to cope with the scalability and flexibility problems of a central registry, the common solution has been to decrease the amount of information registered, in some cases down to a single token, for example a media type or a content encoding. Sometimes the limited information is also due to the fact that the owner of the registered token does not want to publish the semantics of the token in any form supported by the central registry. As an example, a provider may make a viewer available for a certain media type but not make the format itself public, which is the case for most media types of type "application".
It is obvious that a registry of tokens with no semantics to follow is of little use in a distributed system and to Web applications supporting component interfaces. As mentioned above, URIs allow (but do not require) dynamic extensibility of applications through component interfaces. This means that support for new features can be downloaded dynamically, extending the application on the fly.
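A minimal sketch of such a component interface: the application keeps a table of handlers keyed by feature URI, and a feature it has never seen before is fetched and installed on the fly. The loader parameter would be something like the hypothetical load_feature above, and the process entry point is an assumption of this sketch:

```python
handlers = {}   # feature URI -> implementation module

def handle(feature_uri, payload, loader):
    """Dispatch payload to the implementation of feature_uri,
    downloading and installing it first if necessary."""
    if feature_uri not in handlers:
        handlers[feature_uri] = loader(feature_uri)  # extend the app on the fly
    return handlers[feature_uri].process(payload)    # 'process' is an assumed entry point
```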
Everything is going to change!
Not all features are good candidates for static registration in a registry. Some may be dynamic by nature, for example any type of feature that has been negotiated over time between the involved agents on a one-to-one basis. URIs do not make any assumptions about the "persistence" of an entry: persistence is up to the provider of the feature, which means that a feature can live exactly as long as is appropriate.
Using URIs, features can be assigned a URI on the fly, defining new features with little or no overhead. Even agents that do not have access to a persistent part of the URI space can create a URI on the fly to describe a short-lived feature. If an agent is talking to another agent that does not understand a given URI, then the first agent can expand the URI on the fly, essentially introducing a callback functionality, which fits very nicely into the HTTP-NG framework.
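As an illustration of how cheap minting a URI can be, an agent without any persistent web space could name a short-lived, negotiated feature using the registered urn:uuid: scheme; this is just one possible choice of naming scheme:

```python
import uuid

def mint_feature_uri():
    # A fresh, globally unique name costs nothing to create and
    # requires no server or persistent URI space behind it.
    return f"urn:uuid:{uuid.uuid4()}"

negotiated_feature = mint_feature_uri()
print(negotiated_feature)   # e.g. urn:uuid:6f8c1c2e-...
```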
Do the limitations of a central registry mean that a central registry is not useful? A registry like IANA makes a lot of sense as essentially an index of the features of all Internet protocols published by the IETF. This avoids inconsistency between multiple protocol definitions within the scope of the IETF. However, a central registry does not make sense for features that have not gone through the IETF process.