From Descriptions To Names (The Identifier-Opacity Spectrum)

Status

No official status. Just some thoughts. Revision history in CVS. See related writings.

Background Problem

Should we identify objects by assigning them an identifier string that is unique in some namespace (e.g. my social security number), or should we just describe them in enough detail to narrow the field to one object (e.g. the only guy over 30 who resided at 39 Greenwood Lane, Waltham MA during calendar year 2000)?

The answer is: they're pretty much the same thing.

The steps below turn a description into a name, one small transformation at a time. The process assumes a distributed naming environment.

1. Start with a human-readable description.


   [The file] can be obtained by connecting to host SRI-NIC
   (10.0.0.73) [with] your local FTP [client], logging in as
   user=ANONYMOUS, password=GUEST, and doing a 'get' on
   HOSTS.TXT. 

                                   -  RFC 810 (March 1982)

2. Make it machine-readable.

This requires a description syntax and vocabulary. Depending on how you model the description information, this can get quite complicated.

It might start like this:

        there exists host such that
           name(host, "SRI-NIC") and 
           ip_address(host, "10.0.0.73").
        there exists client_program such that
           protocol(client_program, "FTP").
        there exists client_program_invocation such that
           program(client_program_invocation, client_program) and
           ...
        there exists host_table such that

3. Give it a simple, canonical structure.

Transform the description into a vocabulary where the description is a graph of depth 1 with all arcs leading from the described object to names in underlying namespaces.

In Perl syntax we can specify the description:

         {
           host_name=>"SRI-NIC",
           protocol=>"FTP",
           user=>"ANONYMOUS",
           password=>"GUEST",
           operation=>"get",
           operation_parameter=>"HOSTS.TXT",
         }

In SQL syntax we can even make it look like a query for the object:

       SELECT data
       FROM giant_table_of_all_data_on_the_internet
       WHERE
           host_name='SRI-NIC' AND
           protocol='FTP' AND
           user='ANONYMOUS' AND
           password='GUEST' AND
           operation='get' AND
           operation_parameter='HOSTS.TXT';
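
The idea that a flat description doubles as a query can be sketched in Python. This is only an illustration: the resolve function, the CATALOG list standing in for the giant table, the second file name OTHER.TXT, and the placeholder "data" strings are all made up here and are not part of the original note.

```python
# Hypothetical stand-in for giant_table_of_all_data_on_the_internet.
# OTHER.TXT and the "data" values are placeholders, not real content.
CATALOG = [
    {
        "host_name": "SRI-NIC", "protocol": "FTP",
        "user": "ANONYMOUS", "password": "GUEST",
        "operation": "get", "operation_parameter": "HOSTS.TXT",
        "data": "(placeholder: the host table)",
    },
    {
        "host_name": "SRI-NIC", "protocol": "FTP",
        "user": "ANONYMOUS", "password": "GUEST",
        "operation": "get", "operation_parameter": "OTHER.TXT",
        "data": "(placeholder: some other file)",
    },
]


def resolve(description, rows):
    """Return the one row whose properties all match the description."""
    matches = [
        row for row in rows
        if all(row.get(name) == value for name, value in description.items())
    ]
    if len(matches) != 1:
        # A description only works as a name if it narrows the field to one.
        raise LookupError(f"description matches {len(matches)} rows, not 1")
    return matches[0]


description = {
    "host_name": "SRI-NIC",
    "protocol": "FTP",
    "user": "ANONYMOUS",
    "password": "GUEST",
    "operation": "get",
    "operation_parameter": "HOSTS.TXT",
}

print(resolve(description, CATALOG)["data"])
```

A description that leaves out operation_parameter would match both rows in this toy catalog, so resolve refuses it: it is a description, but not yet detailed enough to act as an identifier.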

4. Convert to tuple form.

Requires standardizing the order of possible property names and sometimes defining a null name. Let's standardize on the order (operation, protocol, user, password, host_name, operation_parameter), and define the word "null", without quotes, to signify an omitted relationship.

        ("get", "FTP", "ANONYMOUS", "GUEST", "SRI-NIC", "HOSTS.TXT")
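
As a rough sketch of this step in Python, assuming the description is held as a dictionary: the names ORDER, NULL, to_tuple and from_tuple are illustrative choices, and Python's None plays the role of the unquoted word null.

```python
# The standard order of property names chosen in the text.
ORDER = ("operation", "protocol", "user", "password",
         "host_name", "operation_parameter")

NULL = None  # stands for an omitted relationship


def to_tuple(description):
    """Drop the property names and keep only the values, in ORDER."""
    unknown = set(description) - set(ORDER)
    if unknown:
        raise KeyError(f"no tuple position for: {sorted(unknown)}")
    return tuple(description.get(name, NULL) for name in ORDER)


def from_tuple(values):
    """Recover the flat description by reattaching the names."""
    return {name: value for name, value in zip(ORDER, values)
            if value is not NULL}


description = {
    "host_name": "SRI-NIC",
    "protocol": "FTP",
    "user": "ANONYMOUS",
    "password": "GUEST",
    "operation": "get",
    "operation_parameter": "HOSTS.TXT",
}

print(to_tuple(description))
```

Because the order is fixed and every position is always present, the tuple carries the same information as the named form, and from_tuple can rebuild it.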

5. Clean up the element names.

Translate the tuple's elements (by a reversible transformation) into names that use a restricted set of characters, so we can get rid of the quotes, etc. Let's replace characters that are not alphabetic or "-" or "." with "%" followed by their hex value in ASCII. Also maybe normalize the forms of names (e.g. downcasing). The example below also switches to the fuller names sri-nic.arpa and <NETINFO>HOSTS.TXT, whose angle brackets need escaping.

       (get, ftp, anonymous, guest, sri-nic.arpa, %3cNETINFO%3eHOSTS.TXT)
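
A minimal Python sketch of the escaping rule as stated, assuming ASCII input and lowercase hex digits (as in the example above). The function names escape and unescape are illustrative. Downcasing is left out of the sketch because, unlike the escaping, it cannot be undone.

```python
import string

# Characters that pass through unchanged: letters, "-" and ".".
SAFE = set(string.ascii_letters + "-.")


def escape(name):
    """Replace every character outside SAFE with % plus two hex digits."""
    out = []
    for ch in name:
        if ch in SAFE:
            out.append(ch)
        else:
            code = ord(ch)
            if code > 0x7F:
                raise ValueError(f"not an ASCII character: {ch!r}")
            out.append("%{:02x}".format(code))
    return "".join(out)


def unescape(text):
    """Invert escape(): turn each %xx back into the character it encodes."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%":
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(escape("<NETINFO>HOSTS.TXT"))
```

Note that "%" and the digits are themselves outside the safe set, so they are escaped too. That is what keeps the transformation reversible.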

6. Concatenate with delimiters --> Opaque Identifiers

Concatenate them with some delimiters, possibly chosen for easy parsing by humans or machines, or historical precedent, or whatever.

        get ftp://anonymous:guest@sri-nic.arpa/%3cNETINFO%3eHOSTS.TXT

The "get" stands out because in the design of URIs, the operation was left as a kind of passthru. If you want to fetch the data at a URI, you do an FTP get. If you want to store data at a URI, you do an FTP put.
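
The concatenation, and the fact that it can be undone, can be sketched in Python. This assumes the elements were already cleaned up as in step 5, so none of them contains a space, ":", "/" or "@". The names to_identifier, from_identifier and PATTERN are illustrative, and the regular expression handles only this one layout, not URIs in general.

```python
import re


def to_identifier(elements):
    """Join the six cleaned-up elements with the delimiters used above."""
    operation, protocol, user, password, host_name, parameter = elements
    return f"{operation} {protocol}://{user}:{password}@{host_name}/{parameter}"


# One capture group per element, separated by the same delimiters.
PATTERN = re.compile(r"(\S+) ([^:]+)://([^:]*):([^@]*)@([^/]*)/(.*)")


def from_identifier(identifier):
    """Split an identifier built by to_identifier back into its elements."""
    match = PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not in the expected form: {identifier!r}")
    return match.groups()


elements = ("get", "ftp", "anonymous", "guest",
            "sri-nic.arpa", "%3cNETINFO%3eHOSTS.TXT")

identifier = to_identifier(elements)
print(identifier)
print(from_identifier(identifier))
```

The string looks opaque, but as long as the delimiters never occur inside an element, every step back to the original description is still available to anyone who knows the convention.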


Sandro Hawke
$Date: 2001/03/23 19:17:21 $