The introhash URI Formation:
Browsable and Portable URI Names

Personal Working Draft 5 August 2003

This version:
http://www.w3.org/2003/08/introhash/v2
Latest version:
http://www.w3.org/2003/08/introhash/latest
Previous version:
http://www.w3.org/2003/08/introhash/v1
Editors:
Sandro Hawke, W3C/MIT <sandro@w3.org>

Abstract

This document specifies and advocates a way to form URIs that may let them retain their name-like properties when their address-like properties change.

Status of this document

No official W3C status. This is not a W3C Technical Report. (Basically, it is an idea I had on 2003-08-03. It is not yet implemented, but it sure is interesting on paper.)

Please send comments to sandro@w3.org, cc: www-archive@w3.org or some other archived list, such as www-rdf-interest@w3.org, uri@w3.org, or public-sw-meaning@w3.org.

Changes since v1: Added a local-part after the hash, to fit with the whole idea of namespaces.

Overview

RDF and Semantic Web applications use URIs to identify everything about which they convey information, including physical objects (such as cars, galaxies, and coffee makers) and conceptual ones (such as particular properties, classes, and datatypes).

The relationship between this use of URIs and their use in earlier Web applications is controversial. In RDF, a URI is used directly as a name for something, but there is no consensus among the designs or designers of URI-based systems about how URIs might function as names. There is no agreement about exactly what thing is identified by most URIs, and many proposals to define a meaning for particular URIs face strong counterarguments based on existing user and system behavior.

The introhash technique tries to navigate the controversy and provide useful, backward-compatible, interoperable, URI-based names suitable for use in both RDF and HTML. These names create families of URIs which can easily be determined to mean the same thing in RDF, while still providing different web-retrieval behavior. Essentially, we view each URI as naming a thing via some information source. The connection to a source is both an asset and a liability: it allows us to follow links for more information, but it puts all the RDF content using that URI at risk when the associated source falls into disrepair. By syntactically grouping the URIs which name the same thing, independently of their associated information sources, we essentially eliminate the liability.

This technique is non-traditional, and may offend some people's sense of web architecture. Please raise specific concerns on the appropriate mailing lists, as above.

Some of the issues addressed here, which arise when using URIs to name things in RDF, might be phrased like this:

  1. "I don't want the URI for my dog to be confused with my web page about him." See When Browsable and Unambiguous Collide.
  2. "I named the thing; you should accept what I say about it, as part of the definition. If you say my dog is white, using my name for him, and I say he's black, you're being logically inconsistent." See Use Implies Consent.
  3. "Sure I use your name for that dog, because that's the name everyone uses for him. Yes, you said he's black. But I've seen him, and that color, in animal fur, is technically called 'brindle'. Naming him doesn't make you right about everything."
  4. "I don't want to use an HTTP name for something because I don't have a stable web presence. Who knows when my department's domain name will change because of some reorganization? Someone else might get our old domain name, and who knows what web content they'll provide there. We can't really trust the future that way."

How to Use "introhash"

1. Compose a Naming E-mail

When you want to name something, compose an e-mail about it which describes or identifies the thing. This is the "introduction" (or "intro"). The act of creating this e-mail is a sort of naming ritual. It will create a namespace; in it you may fully enumerate or leave open the local parts of the names which will be in the namespace.

You don't need anyone to be present at the naming, or to see the e-mail as e-mail. You just need to keep the resulting octet-string around for future reference and to share as needed.

Every introduction will be different, because at the very least the Message-ID field will be different. The From and Date header fields are likely to be different, too.

The introduction message may be MIME multipart, perhaps containing images, RDF, and plain text sections.

It may contain various names for the things being given introhash names (although not their introhash names), URIs from which information about them can be obtained, and policies stating whom you would like to be considered authorized to say privileged things about them. These statements might be made in natural or formal languages, given suitable vocabularies.
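As an illustration only, the following Python sketch assembles a MIME multipart introduction with the standard `email` package and returns its octet string. The function name `build_introduction`, the sender, subject, body text, attachment filename, and RDF snippet are all hypothetical placeholders; nothing here prescribes which headers or parts an introduction must contain.

```python
from email.message import EmailMessage
from email.utils import formatdate, make_msgid


def build_introduction(sender: str, subject: str, text: str, rdf: bytes) -> bytes:
    """Return the octet string of a hypothetical naming e-mail (an "introduction").

    The message is never sent; only its bytes matter, because those bytes
    are what gets kept for future reference and hashed in the next step.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["Subject"] = subject
    msg["Date"] = formatdate(localtime=True)
    # A fresh Message-ID makes every introduction unique, even when the
    # sender, date, and body happen to coincide. The domain is a placeholder.
    msg["Message-ID"] = make_msgid(domain="example.net")
    # Plain-text description of the thing being named.
    msg.set_content(text)
    # Adding an attachment turns the message into MIME multipart/mixed.
    msg.add_attachment(rdf, maintype="application", subtype="rdf+xml",
                       filename="intro.rdf")
    return msg.as_bytes()


intro = build_introduction(
    "Joe <joe@example.net>",
    "Introducing fred",
    "This introduction names my dog, Fred. Local parts: fred.",
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/>',
)
```

Because the Message-ID is freshly generated, building the "same" introduction twice yields two different octet strings, and therefore two different introhashes.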

2. Compute the introhash

Compute a cryptographic hash of the e-mail, using MD5 or SHA (FIPS 180-2) and express it in lowercase hexadecimal. That string, the hash of the introduction, is the named thing's "introhash".
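A minimal sketch of this step using Python's `hashlib`. The set of "approved" algorithms below is an assumption (MD5 plus the FIPS 180-2 functions); the document itself does not list exact names. The input `b"abc"` merely stands in for the introduction's octet string.

```python
import hashlib

# Assumption: "SHA (FIPS 180-2)" is read as any of the FIPS 180-2 functions.
APPROVED_ALGORITHMS = ("md5", "sha1", "sha256", "sha384", "sha512")


def introhash(introduction: bytes, algorithm: str = "sha1") -> str:
    """Hash the introduction's octets and return lowercase hexadecimal."""
    if algorithm not in APPROVED_ALGORITHMS:
        raise ValueError(f"unapproved hash algorithm: {algorithm}")
    # hexdigest() always yields lowercase hex digits.
    return hashlib.new(algorithm, introduction).hexdigest()


print(introhash(b"abc", "md5"))   # 900150983cd24fb0d6963f7d28e17f72
print(introhash(b"abc", "sha1"))  # a9993e364706816aba3e25717850c26c9cd0d89d
```

Hashing the exact stored octets matters: re-serializing the e-mail (for example, re-folding its headers) would change the hash.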

Anyone can make an introduction, and thus an introhash, for anything. Making a second introhash for something should generally be avoided, since determining co-reference between the two introhashes may be difficult or impossible. Of course if you need a name for something and you can't find any existing acceptable introduction for it, go ahead and make your own.

3. Make A URI

To refer to the thing in RDF, use a URI like:

http://joe.example.net/introhash/e590e18471897c8f3d72b53235ac172d/fred

That gives people a URI on which to perform a GET operation in order to obtain information (from you, via your website joe.example.net) about the thing (fred).

The URI MUST end in "/introhash/" followed by a lowercase hex string produced by one of the approved cryptographic algorithms, then "/" and a "local part" (here "fred").
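The rule above can be sketched as a builder plus a form check. Two details are assumptions not fixed by the text: the local part is taken to be a single segment containing no "/", "?" or "#", and the digest length is required to match one of the algorithms assumed above (32, 40, 64, 96 or 128 hex characters). The function names are hypothetical.

```python
import re

# Assumption: hex lengths of MD5, SHA-1, SHA-256, SHA-384 and SHA-512 digests.
HEX_LENGTHS = {32, 40, 64, 96, 128}

# The URI must END in "/introhash/", lowercase hex, "/", and a local part.
# Excluding "/", "?" and "#" from the local part is an assumption.
INTROHASH_TAIL = re.compile(r"/introhash/([0-9a-f]+)/([^/?#]+)$")


def is_introhash_uri(uri: str) -> bool:
    """Check that the URI ends in /introhash/<hex digest>/<local part>."""
    match = INTROHASH_TAIL.search(uri)
    return match is not None and len(match.group(1)) in HEX_LENGTHS


def make_introhash_uri(host: str, digest: str, local_part: str) -> str:
    """Build an http URI of the introhash form on the given host."""
    uri = f"http://{host}/introhash/{digest}/{local_part}"
    if not is_introhash_uri(uri):
        raise ValueError(f"not a well-formed introhash URI: {uri}")
    return uri


uri = make_introhash_uri("joe.example.net",
                         "e590e18471897c8f3d72b53235ac172d", "fred")
```

Uppercase hex is rejected here on purpose, since the introhash is defined as lowercase hexadecimal.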

4. Offer Useful Web-Content

Other people can use that URI to get information about the thing, but no matter who controls the content in the future, the introduction string can be checked against the given hash. The current content may vary, and the introduction may no longer be available, but the introduction itself is immutable (assuming the security of the hash algorithm used).
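The check described above might look like this sketch. Since the URI form does not name the algorithm, this version infers it from the digest length; that inference, the algorithm table, and the function name `introduction_matches` are assumptions. The mirror host is a placeholder, and `b"abc"` again stands in for a real introduction.

```python
import hashlib
import re

# Assumption: the algorithm is inferred from the length of the hex digest.
ALGORITHM_BY_HEX_LENGTH = {32: "md5", 40: "sha1", 64: "sha256",
                           96: "sha384", 128: "sha512"}
INTROHASH_TAIL = re.compile(r"/introhash/([0-9a-f]+)/([^/?#]+)$")


def introduction_matches(introduction: bytes, uri: str) -> bool:
    """Return True if the introduction hashes to the introhash in the URI."""
    match = INTROHASH_TAIL.search(uri)
    if match is None:
        return False  # not an introhash URI at all
    digest = match.group(1)
    algorithm = ALGORITHM_BY_HEX_LENGTH.get(len(digest))
    if algorithm is None:
        return False  # digest length matches no assumed algorithm
    return hashlib.new(algorithm, introduction).hexdigest() == digest


mirror_uri = "http://mirror.example.org/introhash/900150983cd24fb0d6963f7d28e17f72/fred"
```

Note that the host plays no part in the check: whoever serves the URI today, the introduction either hashes to the embedded value or it does not.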

Content should not be served directly from these http URIs, to reduce the risk of getting confused between references to a thing and references to information or sources of information about the thing. Instead, redirects such as "303 See Other" should be offered, redirecting the client to your chosen source of information about the thing.
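One way to honor this with Python's standard `http.server` is sketched below. The lookup table `INFORMATION_SOURCES`, its target URL, and the assumption that introhash URIs sit directly under the server root are all hypothetical. The path-to-status mapping is kept in a plain function so it can be checked without starting a server.

```python
import re
from http.server import BaseHTTPRequestHandler
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Assumption: introhash URIs live directly under "/introhash/" on this host.
INTROHASH_PATH = re.compile(r"^/introhash/([0-9a-f]+)/([^/]+)$")

# Hypothetical table: (introhash, local part) -> chosen information source.
INFORMATION_SOURCES = {
    ("e590e18471897c8f3d72b53235ac172d", "fred"):
        "http://joe.example.net/about/fred.html",
}


def redirect_for(path: str) -> Tuple[int, Optional[str]]:
    """Map a request path to (status, Location) without serving content."""
    match = INTROHASH_PATH.match(path)
    if match is None:
        return 404, None
    target = INFORMATION_SOURCES.get((match.group(1), match.group(2)))
    if target is None:
        return 404, None
    return 303, target


class IntrohashRedirectHandler(BaseHTTPRequestHandler):
    """Answer GET on introhash URIs with 303 See Other, never with content."""

    def do_GET(self):
        status, location = redirect_for(urlsplit(self.path).path)
        self.send_response(status)
        if location is not None:
            self.send_header("Location", location)
        self.send_header("Content-Length", "0")
        self.end_headers()
```

To try it locally, something like `HTTPServer(("", 8000), IntrohashRedirectHandler).serve_forever()` (with `HTTPServer` from `http.server`) would do. Because the introhash URI only ever answers with a 303, the document a client finally receives lives at a different URI from the name of the thing.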

Alternative URIs: The Equality Assumption

Other people can also use URIs like

http://fred.example.net/introhash/e590e18471897c8f3d72b53235ac172d/fred

to refer to the object. Readers may invoke the "introhash equality assumption", that two URIs containing matching introhashes and local-parts always denote the same thing in RDF. (Of course the served content (after redirection) may be different, so the URIs are not operationally equivalent. The point is that we can vary the associated information source without changing what thing is being named.)
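The equality assumption reduces to comparing an (introhash, local part) key and ignoring the host, as in this sketch. The function names are hypothetical, and treating two non-introhash URIs as outside the rule (returning False) is a design choice made here, not something the text specifies.

```python
import re
from typing import Optional, Tuple

INTROHASH_TAIL = re.compile(r"/introhash/([0-9a-f]+)/([^/?#]+)$")


def introhash_key(uri: str) -> Optional[Tuple[str, str]]:
    """Return (introhash, local part), or None for a non-introhash URI."""
    match = INTROHASH_TAIL.search(uri)
    return (match.group(1), match.group(2)) if match else None


def introhash_equal(uri_a: str, uri_b: str) -> bool:
    """True when the introhash equality assumption says both URIs co-refer.

    Only the introhash and local part are compared; the host is ignored.
    """
    key_a = introhash_key(uri_a)
    return key_a is not None and key_a == introhash_key(uri_b)


joe = "http://joe.example.net/introhash/e590e18471897c8f3d72b53235ac172d/fred"
mirror = "http://fred.example.net/introhash/e590e18471897c8f3d72b53235ac172d/fred"
print(introhash_equal(joe, mirror))  # True
```

An RDF store could use `introhash_key` as a canonical key when merging data, so statements made via either host land on the same node.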

This allows third-party mirroring without breaking RDF content. This assumption may not always be warranted, but I think it will turn out to be a safe one, best left turned on in software. Of course it violates certain principles of URI opacity, but... so be it.

This assumption would be bad if someone were to accidentally create a URI which ended in "/introhash/" followed by a hash which matched that generated by some "introduction" e-mail. But that would require a collision of a secure hash algorithm, an event viewed as extremely unlikely.
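For a rough sense of scale, one standard estimate is the birthday bound. It assumes hash outputs behave like uniformly random $b$-bit values and says nothing about deliberate attacks on a weakened algorithm. It bounds the probability that any two of $n$ hashes coincide; the choice $n = 2^{40}$ (about $10^{12}$ introhashes) below is purely illustrative:

```latex
P_{\text{collision}} \le \frac{n(n-1)}{2^{b+1}}
\qquad\text{e.g. } n = 2^{40},\ b = 128:\quad
P_{\text{collision}} < \frac{2^{80}}{2^{129}} = 2^{-49} \approx 1.8 \times 10^{-15}
```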


This work is part of the MIT-LCSAI DAML Project under the MIT/AFRL cooperative agreement number F30602-00-2-0593. This work is not on the W3C recommendation track, is not the product of a W3C working group or interest group, and has not been considered for endorsement by the membership.

Sandro Hawke
First: 2003/08/04, $Id: v2.html,v 1.3 2003/10/19 01:03:33 sandro Exp $