The difficult problem is, as we have
seen, to design a system which will
provide both persistence and resolvability.
Hints
One approach (suggested, for example,
by Karen Sollin's "InfoMesh" work)
is to store not hard facts but hints
about an object's history, on the
assumption that these can be used
in some cases to lead directly to
an object, and in other cases will
give enough information for an archived
or moved version to be located.
David Gifford's work uses a formalization
of hints as attribute/value pairs
to allow each server to describe
what sorts of information it will
store. This allows a search for
a given expression to be directed
toward servers which will be able
to help.
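A minimal sketch of this kind of routing, assuming each server advertises a plain set of attribute names and that a query is sent only to servers that cover every attribute it mentions. The server names, the attributes, and the "cover every attribute" rule are illustrative assumptions, not a description of Gifford's actual design.

```python
from typing import Dict, List, Set

# Hypothetical advertisements: each server lists the attribute names it
# is prepared to store hints about.
SERVERS: Dict[str, Set[str]] = {
    "server-a": {"author", "title"},
    "server-b": {"title", "date", "publisher"},
    "server-c": {"subject"},
}


def route_query(query: Dict[str, str], servers: Dict[str, Set[str]]) -> List[str]:
    """Return the servers whose advertised attributes cover the whole query."""
    wanted = set(query)  # the attribute names used in the query
    return sorted(name for name, attrs in servers.items() if wanted <= attrs)


# A query on title and date is directed only to server-b.
print(route_query({"title": "Naming", "date": "1993"}, SERVERS))
```

A looser policy (for example, ranking servers by how many of the query's attributes they cover) would also fit the description; the strict subset test is simply the easiest to state.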
Massive indexes
Another approach is to try to crack the
entire problem with a very large
hammer, by building huge distributed
indexes of all documents. This requires
no structure or hints in the name
at all. One suggestion (which I heard
first from Robert Acskyn) was to
encircle the earth with index servers,
each of which would serve one part
of the index. (A hashing function
on the name would select the index
server.) The fascinating thing is that
these approaches are not in fact
ridiculous: both the number of servers
and the size of each can scale with the
square root of the number of documents,
so it is conceivable, though
not by any means proven, that technology
can scale with the demand.
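The hashing and the square-root arithmetic can be sketched as follows. This is an illustration under assumptions: SHA-256 reduced modulo the server count stands in for the unspecified hashing function, and the function names are hypothetical. With N documents spread over about sqrt(N) servers, each server holds about sqrt(N) entries. For 10 billion documents that works out to about 100,000 servers of about 100,000 entries each.

```python
import hashlib
import math


def server_for(name: str, num_servers: int) -> int:
    """Choose an index server by hashing the document name (hypothetical scheme)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_servers


def servers_needed(num_documents: int) -> int:
    """About ceil(sqrt(N)) servers, so each holds about sqrt(N) entries."""
    if num_documents <= 0:
        return 0
    return math.isqrt(num_documents - 1) + 1  # integer ceiling of sqrt(N)


n = 10**10
k = servers_needed(n)
print(k, n // k)  # 100000 servers, 100000 documents each
print(server_for("example-document-name", k))
```

Plain modulo hashing moves most names to a different server whenever the number of servers changes; consistent hashing is one common way to avoid that, though the text does not call for it.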
In any case, the Archie project demonstrated
clearly that one can do a lot on
a global scale with large disks and
fast processors.
The Domain Name Service
The Domain Name Service, which registers
host names such as nxoc01.cern.ch
and returns the corresponding IP
address, such as 128.141.201.74,
deserves much of the credit for the
scaling of the Internet up to 2 million
hosts. Domain names are relatively
abstract names, which allow the underlying
physical addresses to be changed.
This is done using a distributed
replicated database system which
exploits the hierarchical nature
of the name space. Another practical
necessity for scaling was the
delegation of naming authority, which
also follows the hierarchy. This
is exactly the sort of thing we want
for documents.
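How authority is handed down the hierarchy can be illustrated with a toy, in-memory resolver. It is only a sketch: the zone table is hypothetical apart from the nxoc01.cern.ch example quoted above, and real DNS adds a wire protocol, caching, and replicated servers.

```python
from typing import Dict, Optional

# Toy stand-in for DNS zones. Each zone either delegates a child zone to
# another authority or holds records itself. Zone layout is illustrative.
ZONES: Dict[str, dict] = {
    "": {"delegates": {"ch"}, "records": {}},  # the root
    "ch": {"delegates": {"cern.ch"}, "records": {}},
    "cern.ch": {"delegates": set(), "records": {"nxoc01.cern.ch": "128.141.201.74"}},
}


def resolve(name: str, zones: Dict[str, dict]) -> Optional[str]:
    """Walk down from the root, following delegations one label at a time."""
    labels = name.split(".")
    zone = ""  # start at the root
    # Try successively longer suffixes: "ch", then "cern.ch", and so on.
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        if suffix in zones[zone]["delegates"]:
            zone = suffix  # authority passes to the child zone
        else:
            break
    return zones[zone]["records"].get(name)


print(resolve("nxoc01.cern.ch", ZONES))  # 128.141.201.74
```

Each zone only needs to know its own records and which children it has delegated, which is what lets naming authority be handed out along the hierarchy.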
Although "fully qualified domain
names" (FQDNs) are a lot better than
IP addresses, there is a reluctance
to use them in document names because
with time, hosts change. In fact,
as well as host records in the name
system there are alias (CNAME) records,
such as www.cern.ch, which are
even more abstract and point to host
names. Many organizations make aliases
for their servers, to allow them
to change host name as well as IP
address. There is also another
sort of record, the "mail exchange" (MX)
record, which may exist at any point
in the tree and indicates that a
given host will take any mail for
anyone in the DNS tree below that
point. The persistence of domain
names is limited by the persistence
of the organizational structures
(for example, companies) to which
they point, and by the persistence
of the domain name system itself.
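A small sketch of how an alias insulates a name from host changes, assuming simple in-memory record tables. The mapping of www.cern.ch to nxoc01.cern.ch is hypothetical, chosen only to reuse the names quoted above. The hop limit guards against alias loops.

```python
from typing import Dict

# Illustrative record tables. The alias target is hypothetical; the
# address is the example quoted in the text.
ALIASES: Dict[str, str] = {"www.cern.ch": "nxoc01.cern.ch"}
HOSTS: Dict[str, str] = {"nxoc01.cern.ch": "128.141.201.74"}


def lookup(name: str, aliases: Dict[str, str], hosts: Dict[str, str],
           max_hops: int = 8) -> str:
    """Follow alias records until a host record is reached; return its address."""
    hops = 0
    while name not in hosts:
        if name not in aliases:
            raise KeyError(f"no record for {name}")
        hops += 1
        if hops > max_hops:
            raise RuntimeError("alias chain too long (possible loop)")
        name = aliases[name]
    return hosts[name]


print(lookup("www.cern.ch", ALIASES, HOSTS))  # 128.141.201.74
```

If the server moves, only the alias entry changes (www.cern.ch is pointed at a new host record), and every document name built on www.cern.ch keeps working.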
A combination
It is clear that DNS itself will
not scale to contain an estimated
10 billion documents when all objects
created by "civilized" people have
names.
(This, I am told, is because DNS
needs delicate tuning of various
parameters even to work now.) However,
the success of large indexes suggests
that the problem could be split: the
DNS, or some equivalent, would be used
to locate one of a set of thousands of
index servers, each of which would index
millions of documents. If a hierarchical
scheme is used, one is free to decide,
separately for each branch of the tree,
at which point to move from the DNS
structure into a local indexer.
For example, one could use "IX" records
in a manner similar to MX records,
or one could cheat and use an alias
"urn.domain" instead of defining
a new record type. The IX records
may have the same sort of granularity
as MX records or server aliases now.
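The proposed split can be sketched as an MX-style lookup: find the closest enclosing domain that carries an "IX" record, then ask that index server's local index for the full document name. The IX record type is only a proposal in the text. All domains, index servers, and locations below are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical "IX" records: like MX records, each covers every name at or
# below the domain where it is placed.
IX_RECORDS: Dict[str, str] = {
    "cern.ch": "ix1.cern.ch",
    "physics.cern.ch": "ix2.cern.ch",
}

# Hypothetical local indexes kept by each index server: name -> location.
LOCAL_INDEXES: Dict[str, Dict[str, str]] = {
    "ix1.cern.ch": {"proposal.cern.ch": "host-a:/hypertext/proposal"},
    "ix2.cern.ch": {"run42.physics.cern.ch": "host-b:/runs/42"},
}


def find_index_server(name: str, ix: Dict[str, str]) -> Optional[str]:
    """Return the IX target of the closest enclosing domain, if any."""
    labels = name.split(".")
    for i in range(len(labels)):  # longest (most specific) suffix first
        suffix = ".".join(labels[i:])
        if suffix in ix:
            return ix[suffix]
    return None


def locate(name: str, ix: Dict[str, str],
           indexes: Dict[str, Dict[str, str]]) -> Optional[str]:
    """DNS-style step to find the indexer, then a local index lookup."""
    server = find_index_server(name, ix)
    if server is None:
        return None
    return indexes.get(server, {}).get(name)


print(locate("run42.physics.cern.ch", IX_RECORDS, LOCAL_INDEXES))  # host-b:/runs/42
```

Placing a more specific IX record (here at physics.cern.ch) is how one branch of the tree hands over to a local indexer at a different depth from its neighbours.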
The only practical suggestions at
the IETF to date (25 October 1993)
have all fallen back on alias records
in the DNS for the top level, so
it may be that a system of this type
will emerge soon.
Separating the namespace and the protocol
When a name space is well designed,
there is no reason that a single
protocol should be used to provide
a lookup service. Though DNS may,
for example, be an initial implementation
of a hierarchical lookup service,
there is no reason why the name space
should not outgrow any one network protocol.
There is no problem with this, as
transitions are easy to arrange,
as are parallel systems working with
the same data. This applies to both
hierarchical and attribute/value
based name spaces.
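One way to keep the name space separate from any one lookup protocol is to put every protocol behind a common resolver interface and consult several of them during a transition. This is a sketch under assumptions: TableResolver is a hypothetical stand-in for a real protocol client, and trying resolvers in order is just one possible policy.

```python
from typing import Dict, List, Optional, Protocol


class Resolver(Protocol):
    """Anything that can map a document name to a location."""

    def resolve(self, name: str) -> Optional[str]:
        ...


class TableResolver:
    """Hypothetical stand-in for one lookup protocol, backed by a dict."""

    def __init__(self, table: Dict[str, str]) -> None:
        self.table = table

    def resolve(self, name: str) -> Optional[str]:
        return self.table.get(name)


def resolve_any(name: str, resolvers: List[Resolver]) -> Optional[str]:
    """Try each protocol in turn; the name itself does not change."""
    for resolver in resolvers:
        location = resolver.resolve(name)
        if location is not None:
            return location
    return None


# During a transition, the old and new protocols serve the same names.
old_protocol = TableResolver({"proposal.cern.ch": "host-a:/proposal"})
new_protocol = TableResolver({"proposal.cern.ch": "host-a:/proposal",
                              "report.cern.ch": "host-b:/report"})
print(resolve_any("report.cern.ch", [old_protocol, new_protocol]))  # host-b:/report
```

Because callers only hold names, an old protocol can be dropped from the list once the new one carries all the data, without any document name changing.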
We can conclude that existing naming
schemes which rely on aliases in
the DNS name space, and which use
clean, time-invariant document names
within the server, are in fact very
practical and will serve us well
for some time.
Practical projects have yet to show
whether attribute/value based name
spaces will be more resilient than
hierarchical name spaces and still
scale with efficiency, or whether
a simple hierarchical naming space
for all objects is just around the
corner. In any case, the outlook
is bright that naming will not be
a sticking point for the information
revolution.
Tim BL