Ideas

The difficult problem, as we have seen, is to build a system which provides both persistence and resolvability.

Hints

One approach (suggested for example by Karen Sollins' "InfoMesh" work) is to store not hard facts, but hints about an object's history, on the assumption that these can be used in some cases to lead directly to an object, and in other cases will give enough information for an archived or moved version to be located.

David Gifford's work uses a formalization of hints as attribute/value pairs to allow each server to describe what sorts of information it will store. This allows a search for a given expression to be directed toward servers which will be able to help.
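The routing idea can be sketched in a few lines of Python. This is only an illustration of directing a query by attribute/value descriptions, not a rendering of Gifford's actual system: the server names, the attributes and the matching rule (every pair in the query must be covered by the server's description) are all assumptions made for the example.

```python
# Hypothetical server descriptions: each server advertises the attribute/value
# pairs it is willing to store. All names and values are invented.
SERVERS = {
    "server-a": {"subject": {"physics"}, "type": {"paper", "preprint"}},
    "server-b": {"subject": {"physics", "biology"}, "type": {"dataset"}},
    "server-c": {"subject": {"history"}, "type": {"paper"}},
}


def candidate_servers(query, servers=SERVERS):
    """Return the servers whose description covers every pair in the query."""
    matches = []
    for name, description in servers.items():
        # A server is a candidate only if it accepts each requested value.
        if all(value in description.get(attr, set())
               for attr, value in query.items()):
            matches.append(name)
    return sorted(matches)


if __name__ == "__main__":
    print(candidate_servers({"subject": "physics", "type": "paper"}))
```

A search is then sent only to the servers returned, rather than to every server.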

Massive indexes

Another approach is to try to crack the entire problem with a very large hammer, by making huge distributed indexes of all documents. This requires no structure or hints in the name at all. One suggestion (which I heard first from Robert Acskyn) was to encircle the earth with index servers, each of which would serve one part of the index. (A hashing function on the name would select the index server.) A fascinating thing is that these approaches are not in fact ridiculous, as the number of servers and the size of each can both scale with the square root of the number of documents, and so it is conceivable, though not by any means proven, that technology can scale with the demand.
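As a rough sketch of the two ideas in this paragraph, the Python below picks an index server by hashing the name and works out the square-root split. The choice of SHA-256, the modulo mapping and the function names are assumptions for illustration; the text says only that some hashing function on the name would select the server.

```python
import hashlib
import math


def index_server_for(name: str, server_count: int) -> int:
    """Map a document name to a server number in the range 0..server_count-1."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then reduce it.
    return int.from_bytes(digest[:8], "big") % server_count


def square_root_split(document_count: int):
    """Return (servers, entries per server) when both grow as sqrt(documents)."""
    if document_count < 1:
        raise ValueError("document_count must be positive")
    servers = max(1, math.isqrt(document_count))
    per_server = -(-document_count // servers)  # integer ceiling division
    return servers, per_server


if __name__ == "__main__":
    servers, per_server = square_root_split(10**10)
    print(servers, per_server)
    print(index_server_for("some-document-name", servers))
```

For 10 billion documents the split gives 100,000 servers of 100,000 entries each. A plain modulo mapping reshuffles most names whenever the number of servers changes, so a real deployment would probably want something less brittle.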

In any case, the Archie project demonstrated clearly that one can do a lot on a global scale with large disks and fast processors.

The Domain Name Service

The Domain Name Service, which registers host names such as nxoc01.cern.ch and returns the corresponding IP address, such as 128.141.201.74, bears a lot of the glory for the scaling of the internet up to 2 million hosts. Domain names are relatively abstract names, which allow the underlying physical addresses to be changed. This is done using a distributed replicated database system which exploits the hierarchical nature of the name space. A practical necessity for scaling was also the delegation of naming authority, which also follows the hierarchy. This is exactly the sort of thing we want for documents.

Although "fully qualified domain names" (FQDNs) are a lot better than IP addresses, there is a reluctance to use them in document names because with time, hosts change. In fact, as well as host records in the name system there are "Alias" records -- such as www.cern.ch -- which are even more abstract and point to host names. Many organizations make aliases for their servers, to allow them to change host name as well as IP address. There is also another sort of record, the "Mail exchange" (MX) record, which may exist at any point in the tree, and indicates that a given host will take any mail for anyone in the DNS tree below that point. The persistence of domain names is limited by the persistence of the organizational structures (for example, companies) to which they point and by the persistence of the domain name system itself.
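The extra level of indirection that an alias gives can be shown with a toy in-memory table. This is not a DNS client and makes no network calls; the record kinds are simplified to "ALIAS" and "ADDRESS", and the assumption that www.cern.ch points at nxoc01.cern.ch is made purely for the example.

```python
# Toy record table. The alias target is an assumption for illustration.
RECORDS = {
    "www.cern.ch": ("ALIAS", "nxoc01.cern.ch"),
    "nxoc01.cern.ch": ("ADDRESS", "128.141.201.74"),
}


def resolve(name, records=RECORDS, max_hops=8):
    """Follow alias records until an address record is reached."""
    for _ in range(max_hops):
        if name not in records:
            raise LookupError("no record for " + name)
        kind, value = records[name]
        if kind == "ADDRESS":
            return value
        name = value  # an alias: continue with the host name it points to
    raise LookupError("alias chain too long")


if __name__ == "__main__":
    print(resolve("www.cern.ch"))
```

Moving the service to another host means changing only the alias record; anything that refers to www.cern.ch keeps working.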

A combination

It is clear that DNS itself will not scale to contain an estimated 10 billion documents when all objects created by "civilized" people have names. (This is, I am told, because DNS needs delicate tuning of various parameters even to work now.) However, the success of large indexes suggests that the problem could be split: the DNS or some equivalent would be used to locate a set of thousands of index servers, each of which would index millions of documents. If a hierarchical scheme is used, one is free to decide, at any branch of the tree, whether to move at that point from the DNS structure into a local indexer. For example, one could use "IX" records in a manner similar to MX records, or one could cheat and use an alias "urn.domain" instead of defining a new record type. The IX records may have the same sort of granularity as MX records or server aliases now.
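A minimal sketch of such a two-stage lookup follows, under loudly stated assumptions: "IX" records are the hypothetical record type mentioned above, document names are assumed to be dot-separated with the most specific part first, and every domain, indexer name and location string is invented. The most specific branch of the tree carrying an IX record wins, much as the text describes for MX records.

```python
# Hypothetical "IX" records: domain -> index server responsible below it.
IX_RECORDS = {
    "example.org": "ix.example.org",
    "physics.example.org": "ix.physics.example.org",
}

# Hypothetical local indexes held by each index server.
LOCAL_INDEXES = {
    "ix.example.org": {
        "minutes-1993.admin.example.org": "archive-host:/admin/minutes-1993",
    },
    "ix.physics.example.org": {
        "preprint-17.physics.example.org": "physics-host:/preprints/17",
    },
}


def find_indexer(name, ix_records=IX_RECORDS):
    """Walk up the hierarchy and return the nearest index server, if any."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])  # longest (most specific) suffix first
        if suffix in ix_records:
            return ix_records[suffix]
    return None


def locate(name):
    """Stage 1: hierarchy finds the indexer. Stage 2: the indexer finds the document."""
    indexer = find_indexer(name)
    if indexer is None:
        return None
    return LOCAL_INDEXES.get(indexer, {}).get(name)


if __name__ == "__main__":
    print(locate("preprint-17.physics.example.org"))
    print(locate("minutes-1993.admin.example.org"))
```

The hierarchy only has to hold one entry per index server, while each index server holds the bulk of the names flat.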

The only practical suggestions at the IETF to date (25 October 1993) have all fallen back on alias records in the DNS for the top level, so it may be that a system of this type will emerge soon.

Separating the namespace and the protocol

When a name space is well designed, there is no reason why only a single protocol should be used to provide a lookup service. Though DNS may, for example, be an initial implementation of a hierarchical lookup service, there is no reason for the name space not to outgrow any particular network protocol. There is no problem with this, as transitions are easy to arrange, and parallel systems can work with the same data. This applies to both hierarchical and attribute/value based name spaces.
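One way to picture this separation is a client that asks several lookup services about the same name and does not care which one answers. The sketch below is illustrative only: the class, the protocol labels and the data are invented, and the two "protocols" are just in-memory tables standing in for real network services.

```python
class TableResolver:
    """Stand-in for a lookup service reached over some protocol."""

    def __init__(self, protocol_name, table):
        self.protocol_name = protocol_name
        self.table = table

    def lookup(self, name):
        return self.table.get(name)


def resolve(name, resolvers):
    """Try each service in order; the name itself never changes."""
    for resolver in resolvers:
        location = resolver.lookup(name)
        if location is not None:
            return location, resolver.protocol_name
    return None


# During a transition the new service may hold only part of the data.
old_service = TableResolver("old-protocol", {"doc-a": "location-a", "doc-b": "location-b"})
new_service = TableResolver("new-protocol", {"doc-a": "location-a"})

if __name__ == "__main__":
    print(resolve("doc-a", [new_service, old_service]))
    print(resolve("doc-b", [new_service, old_service]))
```

Names that have been migrated are answered by the new service and the rest fall back to the old one, so the transition is invisible to whoever holds the name.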

Conclusions

We can conclude that existing name schemes which rely on aliases in the DNS name space, and within the server use clean, time-invariant document naming, are in fact very practical and will serve us well for some time.

Practical projects have yet to show whether attribute/value based name spaces will be more resilient than hierarchical name spaces and still scale efficiently, or whether a simple hierarchical name space for all objects is just around the corner. In any case, the outlook is bright that naming will not be a sticking point for the information revolution.

Tim BL