Re: [Fwd: Re: Approval of initial Dublin Core Interoperability Qualifiers] from Martin J. Duerst on 2000-04-30 (uri@w3.org from April 2000)

From: Martin J. Duerst <duerst@w3.org>
Date: Sun, 30 Apr 2000 17:03:46 +0900
To: Ray Denenberg <rden@loc.gov>, W3C URI List <uri@w3.org>
Message-Id: <4.2.0.58.J.20000430154646.009fd770@sh.w3.mag.keio.ac.jp>
At 00/04/28 11:19 -0400, Ray Denenberg wrote:
>It's been recommended that I forward this message, originally posted to
>the DC list, as this list is probably the appropriate place for this
>discussion.

Thanks for forwarding.


>From: Ray Denenberg <rden@loc.gov>

>I'd like to consider the deeper implications of Roy's message.
>
>Roy Tennant wrote:
>
> > Say what?!? Did I miss a pronouncement from W3C that URI finally has some
> > kind of meaning? Since when is "URI" an encoding scheme?

Could somebody from the DC community explain what an
'encoding scheme' is in DC? Or give a pointer?


>These are two separate questions, what URI means, and whether URI is an
>encoding scheme.
>
>I think the first is the relevant question: the meaning of "URI" is clearly
>the subject of pervasive confusion.  (As to the second question, yes, URI is
>an encoding scheme, it just hasn't been nailed down yet as such, and it won't
>be, until we know the meaning of "URI".)
>
>There is a rather profound implication, I think, to the fact that the sole
>encoding scheme prescribed by DC for a resource identifier is URI. Though I
>don't take issue with this approach, I think it is necessary to consider the
>implication:  the assumption that there will be a URI scheme developed for
>any type of identifier to be used in Dublin Core.

This corresponds to the assumption, on the WWW, that there is a URI
for everything that you want to identify. This makes a lot of sense.


>There's a long way to go before this is a reality, and it all begins with
>agreement on what URI means (or, conversely, there won't be any progress
>until this is resolved).

Well, the best way to approach the 'meaning' question in this case
is probably by looking at utility, i.e. what it is used for and useful for.
The central points are:

- Identification (of pretty much anything, not only web pages)
- Common syntax elements (as defined in RFC 2396): If you want a
   certain functionality (such as hierarchy) in a new identifier scheme,
   you do it in a certain way; if you don't need this functionality,
   you avoid using certain characters.
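To make the 'common syntax' point concrete, here is a small Python sketch (an illustration, not something defined by RFC 2396 itself). It assumes the RFC 2396 split of an absolute URI into scheme ":" followed by either a hier_part, which begins with "/", or an opaque_part, which must not. `is_hierarchical` is a hypothetical helper name and the urn:isbn: value is only an illustrative identifier. Python's standard `urllib.parse.urljoin` then shows that a relative reference is resolved against the hierarchical http: base, but is returned unchanged for the urn: base.

```python
from urllib.parse import urljoin, urlsplit


def is_hierarchical(uri: str) -> bool:
    """Hypothetical helper: RFC 2396 splits an absolute URI into
    scheme ":" (hier_part | opaque_part); only a hier_part starts with "/"."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"not an absolute URI: {uri!r}")
    return rest.startswith("/")


HTTP_URI = "http://www.w3.org/Addressing/"
URN_URI = "urn:isbn:0451450523"  # illustrative identifier only

for uri in (HTTP_URI, URN_URI):
    parts = urlsplit(uri)
    print(parts.scheme, repr(parts.netloc), parts.path, is_hierarchical(uri))

# "../People/" is resolved against the hierarchical http: base;
# for the urn: base, urljoin simply hands the reference back unchanged.
print(urljoin(HTTP_URI, "../People/"))
print(urljoin(URN_URI, "../People/"))
```

A scheme that wants relative references gets them by using the "/" hierarchy; a scheme that does not need them stays opaque by not starting with "/".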


>The reason I'm addressing this issue is that we (LC) have been
>trying to convince the W3C to initiate an activity on URIs; several of
>the people active in DC have ties to the W3C, and I think that this issue is
>a good illustration of the importance of this to the DC community.

Some time ago, the W3C spent some effort preparing something like
a URI activity; however, we received signals from some of our
Members that they were very tired of discussions about these topics.
If people from the DC community who would like to see such an
activity could write up what kinds of things such an activity
should address, and so on, that would help. Please have a
look at the W3C process document on how to do this
(sorry, I would give a link, but I'm currently offline),
or contact me or anybody from the W3C Team. The
upcoming WWW9 conference in Amsterdam may also be a good place
for discussion; I'll be there.


>On the issue "what is a URI?", I see two differing views:
>
>(a) URI schemes fall into two (or more) broad classes, URL and URN. Each URI
>scheme is cast into one or the other class; thus for example "HTTP:"  is a
>URL scheme and "hdl:" a URN scheme. (Of course, there are RFCs that still
>hypothesize about an additional class, URC.)
>
>(b) The distinction between URL and URN is artificial, and therefore this
>"level" in the URI hierarchy is unnecessary. Thus "HTTP:" and "hdl:" are
>simply URI schemes. By this view, the concept of "URL" and "URN" would go
>away, and be replaced by "URI".  (Which does not mean that any of the
>proposed URN schemes would go away; they would just become URI schemes,
>as would HTTP.)

You also have to be careful about two uses of the URN/URL
distinction. Sometimes it's used in the formal sense, i.e.
whatever has a urn: prefix is a URN; sometimes it's used in
a different sense, e.g. whatever can be resolved, such as http:
and ftp:, is called a URL, whereas things such as cid: and
mid: are called URNs. You use the latter in (a) above,
but below, you change to the former.
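The two senses can be written down as two different tests, which makes it easy to see where they disagree. This is a minimal Python sketch; `RESOLVABLE_SCHEMES` is a hypothetical stand-in for "what some particular client can dereference", and the mid:/cid: values are illustrative.

```python
# Hypothetical: the schemes one particular client knows how to dereference.
RESOLVABLE_SCHEMES = {"http", "ftp"}


def scheme_of(uri: str) -> str:
    """Return the lower-cased scheme (the text before the first ':')."""
    return uri.split(":", 1)[0].lower()


def is_urn_formal(uri: str) -> bool:
    """Formal sense: a URN is whatever has the urn: prefix."""
    return scheme_of(uri) == "urn"


def is_urn_by_resolvability(uri: str) -> bool:
    """Informal sense: whatever this client cannot resolve gets called a 'URN'."""
    return scheme_of(uri) not in RESOLVABLE_SCHEMES


examples = [
    "http://www.w3.org/",
    "ftp://ftp.example.org/pub/file.txt",
    "mid:960830.1639@XIson.com",
    "cid:foo4*foo1@bar.net",
    "urn:isbn:0451450523",
]
for uri in examples:
    print(uri, is_urn_formal(uri), is_urn_by_resolvability(uri))
```

Only the urn: example is a URN under both tests; mid: and cid: flip depending on which definition is in use, and the informal answer also shifts whenever the set of resolvable schemes changes.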

I have to say that it took me a long time to even half-way understand
why people wanted to distinguish between names and locations
in such a cut-across-everything way as they tried to do with URLs
and URNs. I saw some light (not in the sense that I could approve of this
distinction, but in the sense that I started to understand why some people
found it so important) once when I spoke with somebody from
the library community.

In a traditional library, looking for a book (or document in the
general case) is a two-step process:
1) Pin down which book you want (e.g. Tom Sawyer by Mark Twain)
2) Pin down where to get the book
    (which shelf, or which other library)
This two-step (name/location) distinction is very natural and
ubiquitous in the library trade.

The problem is that such a model is not general enough for
the digital, networked world. Distance in time and space doesn't
count that much anymore, and the location becomes transparent
(very few surfers have any idea where their pages actually
come from, nor do they care). On the other hand, looking at
what is happening in more detail, there is a highly sophisticated
chain of resolutions and copies going on, mapping domain names
to Internet numbers, file names to inodes, copying documents
from one cache to the next, and so on.

The two-step model (which would lead to a clear distinction
of two categories of identifiers) is therefore not general enough,
and it has been replaced (or should be replaced) by a model
that at the core works with one very general category of
identifier (URI) that can be resolved in many different ways
and potentially over many different steps.
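The multi-step model can be sketched as a loop that keeps handing the current identifier to whichever resolver recognises it. Everything here is hypothetical: the resolver names, the example.org / example.net hosts, the 192.0.2.10 address, and the fixed lookup tables that stand in for real DNS, cache and file-system lookups.

```python
from typing import Dict, List, Tuple

# Each hypothetical resolver is a name plus a lookup table from one
# identifier to the next. Real resolvers compute these mappings; fixed
# tables (mapping whole URIs, for simplicity) keep the sketch self-contained.
Resolver = Tuple[str, Dict[str, str]]


def resolve(identifier: str, resolvers: List[Resolver],
            max_hops: int = 10) -> List[Tuple[str, str]]:
    """Follow an identifier through whichever resolver recognises it,
    hop by hop, until none does. Returns the trail of (step, value)."""
    trail = [("start", identifier)]
    current = identifier
    for _ in range(max_hops):
        for name, table in resolvers:
            if current in table:
                current = table[current]
                trail.append((name, current))
                break
        else:
            # No resolver knows the current identifier: resolution ends here.
            return trail
    raise RuntimeError(f"gave up after {max_hops} hops: {current!r}")


RESOLVERS: List[Resolver] = [
    ("name service", {"urn:example:tom-sawyer":
                      "http://library.example.org/twain/sawyer.html"}),
    ("cache", {"http://library.example.org/twain/sawyer.html":
               "http://cache.example.net/c/8f3a"}),
    ("DNS", {"http://cache.example.net/c/8f3a":
             "http://192.0.2.10/c/8f3a"}),
]

for start in ("urn:example:tom-sawyer",
              "http://library.example.org/twain/sawyer.html"):
    for step, value in resolve(start, RESOLVERS):
        print(f"{step:>12}: {value}")
    print()
```

A name-like identifier (the urn: start) and a location-like one (the http: start) go through the same loop; the second simply joins the chain one hop later, so the loop itself needs no separate name-versus-location category.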

This of course doesn't mean that all URI schemes, or all URIs,
have the same properties. It just means they have enough in
common to make the commonalities more important for the big
picture than the differences, and that the differences can
be manifold.


>Most RFCs pertaining to URIs support view (a), however there seems to be
>growing sentiment for view (b).

Very much so. Of course if you look at all the RFCs, they will also
support the view that the ftp: URI and the http: URI are different,
and so on. The main reason for this is that the commonalities of all
these are described in one single place, whereas it takes many RFCs
to describe the specifics.


>I don't personally see it as an important
>issue, worthy of bringing progress on URI schemes to a complete impasse;
>it is an issue that needs quick resolution (either way, as far as I'm
>concerned), and it needs IETF/W3C collaboration.
>
>The argument for view (a) is the conceptual and practical distinction
>between a URL and URN.  The conceptual distinction is expressed in terms
>of persistence (which those who support view (b) think is an artificial
>distinction).

This is not true, at least not for me. Persistence (or the lack of it) is
an important distinction, but:
- There are many other important distinctions, and picking out one
   and splitting the realm of identifiers into two based only on
   this distinction does not seem very useful.
- Persistence, like many other things, is more a social than a
   technical problem. It is possible to create and maintain very
   persistent things based on http:; on the other hand, it is
   possible to create completely unstable URNs. Using a naming
   or implementation difference to signal persistence may
   give a dangerous impression of guaranteed persistence.


>The practical basis for the distinction is probably the
>difference in how URLs and URNs will be resolved, the theory being that
>URN schemes will share certain resolution traits, as will URL schemes, and
>the URN resolution traits will differ from those of URL schemes. Thus all of
>the URN commonality could be absorbed into a single "common-scheme" (or
>"common protocol") simplifying the "individual-schemes" (or "individual
>protocols"); furthermore (as the theory goes),  much of the common resolution
>process corresponding to URNs would be supported by the DNS.

Much of the common resolution process of URLs is supported
by the DNS, too. The 'bootstrapping' idea of URN resolution has
some appeal, but it hasn't yet seen enough testing and
deployment to give everybody confidence.


>(There are RFCs
>that describe, in fairly specific detail, how the DNS will be used for
>resolution of URNs.) The counter-argument is that web browsers don't
>understand URNs, nor do they seem to offer any prospect of URN support;
>moreover, there doesn't appear to be any prospect of seeing this
>special DNS facility developed either.

Yes, this is indeed a problem for URNs. But this is a problem for
any other new URI scheme, too.

But browsers nowadays are in most cases equipped with some kind of
extension facility for adding new schemes.


>As a more general observation,  consider that there are dozens of
>URI/URN-related RFCs; it is difficult to even identify them all, much less
>determine which are relevant and which aren't, which are current and which
>are out-of-date, and understand their relationships to one another.

The most central one is RFC 2396. Compared to that one,
anything else is in some way fairly minor.


>What's
>needed is a coherent document (preferably, normative), that ties all of the
>relevant information together, perhaps pointing to the appropriate RFCs.

What should such a document say? Should it be updated every time
a new URI scheme is defined? Isn't it dangerous to try and say
everything in one single place?


>We simply feel that confusion over URIs is so pervasive that without a
>proactive initiative, progress is unlikely -- we'll never even begin to
>discover what URI schemes are going to be necessary, useful, and/or
>popular;

Well, this is a feature, not a problem. If we knew exactly which
URI schemes would be needed/popular in a few years, we would
already have invented them.


>which will be supported; and which schemes vendors will need to support.  The
>only way that this discovery process can even begin is if we begin to
>register URI schemes (some will ultimately be winners, others won't), and it
>isn't even clear how to do that.

No, it's very clear. There is an RFC on it, RFC 2717.


>(There are certain potential URI schemes
>that we're fairly sure are going to be necessary, and for which registration
>hasn't been attempted yet, because of confusion over registration
>procedures. "isbn:" may or may not be a popular scheme; we won't begin to
>find out until it is at least registered. Its definition as a URN scheme has
>been described hypothetically in an RFC, but registration of "isbn:" hasn't
>even been explored, apparently because it is assumed that it simply cannot be
>registered, for legal reasons.)

What would these reasons be? I would be interested to know.


It seems that the main question that is actually asked,
although somewhat implicitly, is:

We have certain things that we would like to identify, in
the context of [URI/URL/URN], how should we do that?

There are at least three answers:

1) Define and register your own URI scheme
2) Define and register your own URN scheme
3) Use an existing URI scheme

The distinction between 1) and 2) has mostly been given above.
But 3) is an alternative well worth considering. The main
advantage is that you don't have any problems with browser
or infrastructure support at all. The main disadvantage
is that it may not look as attractive as a new one,
and the choice between functionality and looks is of
course yours.
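To illustrate the trade-off, the sketch below spells one ISBN in each of the three ways. All of it is assumption for illustration: the isbn: scheme and urn:isbn: namespace forms, the isbn.example.org server, and the `DEPLOYED_SCHEMES` set (which schemes real clients handle varies by client).

```python
from typing import Dict

# Hypothetical: schemes that already-deployed clients handle out of the box.
DEPLOYED_SCHEMES = {"http", "ftp", "mailto"}


def candidate_identifiers(isbn: str) -> Dict[str, str]:
    """Spell one ISBN three ways, one per option above.
    All three forms are illustrative; none is claimed to be registered."""
    digits = isbn.replace("-", "")
    return {
        "1) own URI scheme": f"isbn:{digits}",
        "2) URN namespace": f"urn:isbn:{digits}",
        "3) existing scheme": f"http://isbn.example.org/{digits}",  # hypothetical server
    }


for option, uri in candidate_identifiers("0-451-45052-3").items():
    scheme = uri.split(":", 1)[0]
    print(f"{option:20} {uri:40} in DEPLOYED_SCHEMES: {scheme in DEPLOYED_SCHEMES}")
```

Only option 3 lands in a scheme that existing software already handles; options 1 and 2 look more like dedicated ISBN identifiers but need new support first.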

Another real problem of the DC community may be that for
some identifiers, e.g. isbn, somebody has to do some real
work (writing registration documents, running some servers,...),
but it is not clear who should do that.

Regards,   Martin.
Received on Sunday, 30 April 2000 03:59:31 UTC