Identifying Things on the Semantic Web

Status

The Problem

How Do You Identify Things on the Semantic Web?

How do you say something about something? You need to identify the things: perhaps a person, a character string, a web page, or the mathematical concept of a number being prime. I can write about those things and you generally understand me, but it's complicated.

From informal first principles, here are our options.

The Solutions

Naming vs. Describing

If you want to talk about Sandro Hawke, you can either use some name for him ("Sandro Hawke") or some description ("The person who started working at W3C on 2000-12-15").

To be more precise, when you name things (for our purposes) you are defining a mapping from some character strings to the things those character strings are said to name.

Alternatively, you can talk about things without naming them, using an approach like first-order logic. Instead of saying "Sandro has long hair" (naming Sandro), you can say "There exists X such that X started working at W3C on 2000-12-15 and X has long hair." The second form communicates exactly the same information. In fact, you can see naming as a special case of description: "There exists X such that X is named 'Sandro' and X has long hair."
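
The three ways of making the same statement can be sketched in code. This is only an illustration: the Person record, the tiny UNIVERSE list, and the helpers named() and exists() are hypothetical, and the second person is invented so that the existential test has something to exclude.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    label: str            # the string people use as a name
    w3c_start_date: str   # a property usable in a description
    has_long_hair: bool


# A made-up universe of discourse (assumption, for illustration only).
UNIVERSE = [
    Person("Sandro", "2000-12-15", True),
    Person("Someone Else", "1999-01-01", False),
]

# Naming: an explicit mapping from character strings to things.
NAMES = {p.label: p for p in UNIVERSE}


def named(name):
    """Look a thing up by name."""
    return NAMES[name]


def exists(predicate):
    """There exists X in UNIVERSE such that predicate(X)."""
    return any(predicate(x) for x in UNIVERSE)


# "Sandro has long hair" (naming Sandro).
by_name = named("Sandro").has_long_hair

# "There exists X such that X started at W3C on 2000-12-15 and X has long hair."
by_description = exists(
    lambda x: x.w3c_start_date == "2000-12-15" and x.has_long_hair
)

# Naming as a special case of description.
name_as_description = exists(
    lambda x: x.label == "Sandro" and x.has_long_hair
)
```

All three values come out true here. Note that the descriptive forms still rely on names for the properties themselves (w3c_start_date, has_long_hair, label).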

Descriptive identification has a bootstrapping problem. The system must have built-in names for enough conceptual objects to start identifying the properties that will be used to identify things.

One Mapping vs. Many

Should we have exactly one mapping from character strings to objects, or is it fair game to have many of them?

If we only have one, we have to all agree on its definition. On the other hand, if we have many of them, we have to agree on a way to name those mappings, so we can indicate which we want to use. And the names for the mappings will need to be stored in some agreed-upon central mapping.

In short, we must have one initial standard mapping, and can branch out from there.

The One Mapping

Some desirable qualities of the initial semantic mapping (ISM) are:

recursive delegation, so a central authority can delegate partitions of the space of all possible names to another authority, which can, in turn, partition and delegate its space, and so on.
one or more parallel mappings, in which names in the ISM are mapped to documentation about the name, the naming event, and the object which is identified by the name. An English dictionary maps words not to their meanings, but to text which describes their meaning.
names should be mnemonic and easy to type and say
names should be free of legal pitfalls, such as trademarks. (This generally conflicts with the previous desirable quality.)
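
Recursive delegation, the first quality in the list above, can be sketched as a tree of authorities. This is a minimal sketch under assumptions: the Authority class, the dot-separated name syntax, and the example names are all hypothetical and are not part of any proposed ISM.

```python
class Authority:
    """Hypothetical authority over one partition of a dot-separated name space."""

    def __init__(self):
        self.bindings = {}     # names this authority defines itself
        self.delegations = {}  # leading label -> authority for that partition

    def delegate(self, label):
        """Hand the partition of names starting with `label.` to a sub-authority."""
        return self.delegations.setdefault(label, Authority())

    def bind(self, name, thing):
        """Define what `name` identifies within this authority's own space."""
        self.bindings[name] = thing

    def resolve(self, name):
        """Resolve locally if possible, otherwise follow the delegation chain."""
        if name in self.bindings:
            return self.bindings[name]
        head, sep, rest = name.partition(".")
        if sep and head in self.delegations:
            return self.delegations[head].resolve(rest)
        raise KeyError(name)


root = Authority()
org = root.delegate("org")   # the root delegates "org." names
w3 = org.delegate("w3")      # which in turn delegates "org.w3." names
w3.bind("sandro", "the person Sandro Hawke")

resolved = root.resolve("org.w3.sandro")
```

A parallel documentation mapping could be kept the same way, with a second dictionary per authority that maps each bound name to text about it.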

Leveraging

It is possible (trivial, even) to create one mapping from another by adding a level of indirection. The DNS doesn't provide one mapping -- it provides many mappings, named by the record type (A, MX, CNAME, TXT, etc). So there is the DNS A map, the DNS MX map, the DNS TXT map, etc, plus other kinds of mappings, like the one from DNS names to their owners.
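
As a rough sketch of that indirection, the record type can be treated as a name that selects which mapping to consult. The records below are made up for illustration (they use a reserved documentation address, not real DNS data), and lookup(), a_map, and mx_map are hypothetical names.

```python
from functools import partial

# Made-up records, keyed first by record type and then by DNS name.
DNS_RECORDS = {
    "A": {"example.org": "192.0.2.1"},
    "MX": {"example.org": "mail.example.org"},
    "TXT": {"example.org": "some descriptive text"},
}


def lookup(record_type, name):
    """Pick a mapping by record type, then apply it to the name."""
    return DNS_RECORDS[record_type][name]


# Fixing the record type yields an ordinary one-argument mapping.
a_map = partial(lookup, "A")
mx_map = partial(lookup, "MX")

address = a_map("example.org")
mail_host = mx_map("example.org")
```

The same name, example.org, maps to different things depending on which of the named mappings is selected.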

This kind of leveraging of one mapping from another gives us an interesting new set of options: instead of using an existing mapping (e.g., URIs as defined by IETF standards), we can just run a parallel map with the same syntax and different (but probably related) semantics.

In fact, this is how XML Namespaces work. The name "http://foo.com" means one thing in the IETF URI name mapping and something totally different in the XMLNS URI name mapping. The relationship between the maps is that the authority responsible for a specific URI in the IETF interpretation is allowed to use that URI as a unique identifier to avoid unintentional conflicts in the XMLNS interpretation.
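
A small sketch with Python's standard xml.etree.ElementTree shows the XMLNS interpretation in action: the namespace URI is used purely as a string that keeps two same-named elements apart, and the parser does not fetch it. The element name and the second namespace, http://bar.com, are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two elements share the local name "item" but live in different namespaces.
SOURCE = (
    '<a:item xmlns:a="http://foo.com" xmlns:b="http://bar.com">'
    "<b:item/>"
    "</a:item>"
)

root = ET.fromstring(SOURCE)

# ElementTree reports names in {namespace-uri}localname form.
outer_tag = root.tag
inner_tag = root[0].tag
```

Here outer_tag is "{http://foo.com}item" and inner_tag is "{http://bar.com}item". The prefixes a and b have disappeared; only the URI string distinguishes the two names.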

Tuple Mappings

You can generalize leveraging into tuple mappings, where a tuple of character strings maps to objects. Thus we can say:

   < ietf, http://foo > maps to X, and
   < xmlns, http://foo > maps to Y

And you can define a mapping from tuples of character strings to character strings, so that name tuple mappings are just ordinary name mappings again. (An example combination function is to precede each \ or , with a \, then concatenate the elements of the tuple delimited by ,. The process is even easier if the names have restricted syntax.)
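
The example combination function in the parenthesis above is short enough to write out. This is a minimal sketch; the function names combine() and split() are hypothetical, and split() is added only to show that the encoding is reversible for well-formed input.

```python
def combine(parts):
    """Escape each backslash or comma with a backslash, then join with commas."""
    # Backslashes must be escaped first, or the escapes added for commas
    # would themselves be doubled.
    escaped = [p.replace("\\", "\\\\").replace(",", "\\,") for p in parts]
    return ",".join(escaped)


def split(combined):
    """Invert combine() for a string that combine() produced."""
    parts, current = [], []
    chars = iter(combined)
    for ch in chars:
        if ch == "\\":
            current.append(next(chars))  # take the escaped character literally
        elif ch == ",":
            parts.append("".join(current))  # unescaped comma ends an element
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return tuple(parts)


simple = combine(("ietf", "http://foo"))
tricky = combine(("a,b", "c\\d"))
round_trip = split(tricky)
```

With names that contain neither character, as in the < ietf, http://foo > example, the combined name is simply "ietf,http://foo". One wrinkle the sketch does not address: the empty tuple and a tuple holding one empty string both combine to the empty string.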

URIs are, of course, a combined name tuple mapping of < scheme, DNS name, etc. >.
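
Python's standard urllib.parse module makes the tuple view of a URI concrete. The sketch below splits one of the URIs used in this note into its components and recombines them; it illustrates the tuple idea only and says nothing about how any component is resolved.

```python
from urllib.parse import urlsplit, urlunsplit

uri = "http://www.w3.org/People/Sandro#me"

# Split the single string into a tuple of component strings.
parts = urlsplit(uri)
as_tuple = (parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Combining the tuple again gives back the original string.
recombined = urlunsplit(as_tuple)
```

For this URI the tuple is ("http", "www.w3.org", "/People/Sandro", "", "me"), and recombined equals the original string.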

Some Options

   < semweb, http://foo >  ==?==  < ietf, http://foo >

   < date, email/dns, string >

URI
URI-Reference
{URI-Reference}xmlname
dns
email

Description: does pretty much all of this for us.

we'd like:
   y-m-d GMT string -> concept
   time.gregorianYearUTC=number
   time.gregorianMonthOrdinalUTC=
   time.gregorianDayOrdinalUTC=
   (ISOxxxxx) ?
   a way to name ISO standards and/or RFCs
   much better grounding than http uris.

bootstrap: the thing described by this english text
   ** englishDenotation
   x.AmericanDenotation2001="The human being named Sandro Daniel Hawke with the e-mail address sandro@w3.org as of March 16, 2001"
   "X, such that: X is a human being; as of March 16, 2001, X is known as Sandro Hawke, X is the main subject of the web page at http://www.w3.org/People/Sandro"

You laugh, but that's a good bootstrapping!!!! Well, the danger is people will just use "Sandro"
People should probably throw in a UUID.
   "The person who sent mid:sdgtegdfgdfd"

1. By Description
2. Have a standard many-to-one mapping from character strings to objects, with a central authority describing and/or delegating parts of the mapping, more and more over time. Mapping can change over time. URI (RFC-foo) works this way but generally the objects defined are very complex -- not documents, but partially reproducible browser experiences with having properties which depend on time, source address, browser identification string, cookies, passwords, and content preferences.
3. Have multiple many-to-one mappings which you select between based on the situation.

Bootstrapping Descriptions needs only: literal strings and
   AmericanDenotation2000 (2000-EN-US)
   Denotation_2000_EN_US
   QualifyingDescription_2000_EN_US
   "uniqdesc"
then everything is an , a data:literal, or uniqdesc. ! which is more clumsy. But write it up.

so < semweb, http://foo > != < ietf, http://foo > which Tim thinks is evil. so opaquelocktokens!

Two Papers:
   Global Names (much of the above)
      work in IETF URI map or not?
      a name map is as ... dictionary?
   and Living Without Global Names (exivars only)
      (except for strings and uniqdesc, and what we bootstrap from them)

AHHHHHHHH.

Here it is: Semantic Web (WSL) Identifiers:
   URIs for things that have IETF Standard URIs
   URI-References if that does it
   for existential variable {2001|sandro@roads.org|blah} or some such.... still no whitespace
   ((2001 01 01) (sandro@roads.org) ....)
   or local existential variable
   (sandro@roads.org/2000) global existential variable

*** The WSL format is "^\S+ \S+ \S+$" where word is
   URI | URI-Reference | "<" xmlname ( "," tann-stuff ) ">"

   http://foo
   http://foo#bar
   data:,literals
   "....."

   mbox mailto:eric@w3.org
   equiv http://www.w3.org/People/Eric
   debugNotesAt http://www.w3.org/People/Eric#me
   debugNotesAt http://www.w3.org/People/Sandro#me
   ^http://foo
   http://foo http://foo
   mine is better because you can say what the relationship is to the web page.
   http://foo believed true

**** {sandro@roads|2001}foo {http://foons}name

Sandro Hawke
$Date: 2001/03/23 19:17:21 $