Identifying Things on the Semantic Web
Status
draft. See related writings.
The Problem
How do you identify things on the Semantic Web?
How do you say something about something? You need to identify the
things. Maybe a person, a character string, a web page, the
mathematical concept of a number being prime, ... I can write about
those things and you generally understand me, but it's complicated.
From informal first principles, here are our options.
The Solutions
Naming vs. Describing
If you want to talk about Sandro Hawke, you can either use some name
for him ("Sandro Hawke") or some description ("The person who started
working at W3C on 2000-12-15").
To be more precise, when you name things (for our purposes)
you are defining a mapping from some character strings to the
things those character strings are said to name.
Alternatively, you can talk about things without naming them, using
an approach like first-order logic. Instead of saying "Sandro has long
hair" (naming Sandro), you can say "There exists X such that X
started working at W3C on 2000-12-15 and X has long hair." The second
form communicates exactly the same information. In fact, you can
see naming as a special case of description: "There exists X such that
X is named 'Sandro' and X has long hair."
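The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration: the Thing record, the toy "world" list, and the second person's start date are made up for the example. Naming is a lookup in a string-to-thing mapping, while describing is an existential search over things.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    """Hypothetical record type: just the properties used in the text."""
    started_at_w3c: str
    has_long_hair: bool


# A toy world. The second entry is invented purely for contrast.
sandro = Thing(started_at_w3c="2000-12-15", has_long_hair=True)
someone_else = Thing(started_at_w3c="1999-01-04", has_long_hair=False)
world = [sandro, someone_else]

# Naming: a mapping from character strings to the things they name.
names = {"Sandro Hawke": sandro}


def named_has_long_hair(name):
    """'Sandro has long hair': look the thing up by name, then check it."""
    return names[name].has_long_hair


def described_has_long_hair(start_date):
    """'There exists X such that X started working at W3C on start_date
    and X has long hair': no name is used, only a search over the world."""
    return any(
        x.started_at_w3c == start_date and x.has_long_hair for x in world
    )
```

Both `named_has_long_hair("Sandro Hawke")` and `described_has_long_hair("2000-12-15")` assert the same fact about the same thing; only the second does so without relying on the `names` mapping.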
Descriptive identification has a bootstrapping problem. The system
must have built-in names for enough conceptual objects to start
identifying the properties that will be used to identify things.
One Mapping vs. Many
Should we have exactly one mapping from character strings to
objects, or is it fair game to have many of them?
If we only have one, we have to all agree on its definition. On
the other hand, if we have many of them, we have to agree on a way to
name those mappings, so we can indicate which we want to use. And the
names for the mappings will need to be stored in some agreed-upon
central mapping.
In short, we must have one initial standard mapping, and can branch
out from there.
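A minimal sketch of that conclusion, assuming mappings can be modeled as Python dictionaries (the mapping names "ietf" and "xmlns" and the placeholder objects are illustrative, not part of any standard): the one initial mapping holds the names of all the other mappings.

```python
# Two hypothetical name mappings that disagree about the same string.
ietf_map = {"http://foo": "object X"}
xmlns_map = {"http://foo": "object Y"}

# The one initial, agreed-upon mapping: from mapping names to mappings.
# Everyone must share this dictionary; everything else branches from it.
initial_map = {"ietf": ietf_map, "xmlns": xmlns_map}


def resolve(mapping_name, name):
    """Pick a mapping by its agreed-upon name, then look the name up in it."""
    return initial_map[mapping_name][name]
```

Without `initial_map`, a reader given only the string "http://foo" would have no way to say which of the two mappings was intended.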
The One Mapping
Some desirable qualities of the initial semantic mapping (ISM) are:
- recursive delegation, so a central authority can delegate
partitions of the space of all possible names to another authority,
which can, in turn, partition and delegate its space, and so on.
- one or more parallel mappings, in which names in the ISM are
mapped to documentation about the name, the naming event, and the
object which is identified by the name. An English dictionary maps
words not to their meanings, but to text which describes their
meaning.
- names should be mnemonic and easy to type and say
- names should be free of legal pitfalls, such as trademarks. (This
generally conflicts with the previous desirable quality.)
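The first quality, recursive delegation, might look something like the following sketch. The Authority class, the slash-terminated prefixes, and the example names are all assumptions made for illustration; the text does not prescribe any particular partitioning scheme.

```python
class Authority:
    """A hypothetical naming authority that owns one partition of the
    name space and may hand sub-partitions to other authorities."""

    def __init__(self, label):
        self.label = label
        self.delegations = {}  # prefix -> Authority responsible for it
        self.bindings = {}     # name -> object, for names bound here

    def delegate(self, prefix, authority):
        """Give every name starting with prefix to another authority."""
        self.delegations[prefix] = authority
        return authority

    def bind(self, name, obj):
        """Bind a name directly within this authority's own partition."""
        self.bindings[name] = obj

    def resolve(self, name):
        if name in self.bindings:
            return self.bindings[name]
        # Hand off to the most specific delegated partition (longest prefix);
        # that authority resolves the remainder of the name recursively.
        for prefix in sorted(self.delegations, key=len, reverse=True):
            if name.startswith(prefix):
                return self.delegations[prefix].resolve(name[len(prefix):])
        raise KeyError(name)


root = Authority("root")
org = root.delegate("org/", Authority("org"))
w3 = org.delegate("w3/", Authority("w3"))
w3.bind("sandro", "the person known as Sandro Hawke")
```

Here `root.resolve("org/w3/sandro")` passes through two delegations before reaching the authority that actually made the binding.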
Leveraging
It is possible (trivial, even) to create one mapping from another
by adding a level of indirection. The DNS doesn't provide one mapping
-- it provides many mappings, named by the record type (A, MX, CNAME,
TXT, etc.). So there is the DNS A map, the DNS MX map, the DNS TXT
map, etc., plus other kinds of mappings, like the one from DNS names
to their owners.
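As a toy model of this (no real DNS queries are made; the records, the documentation address 192.0.2.1, and the owner table are invented for the sketch), the record type selects which mapping is consulted, and a further mapping can be derived by adding one level of indirection:

```python
# One dictionary per record type: the "DNS A map", the "DNS MX map", etc.
# All record data below is hypothetical.
dns_maps = {
    "A": {"example.org": "192.0.2.1"},
    "MX": {"example.org": "mail.example.org"},
    "TXT": {"example.org": "hypothetical text record"},
}


def lookup(record_type, dns_name):
    """The record type names which of the many mappings to use."""
    return dns_maps[record_type][dns_name]


# A separate, assumed table used to build a derived mapping.
owners_by_address = {"192.0.2.1": "hypothetical owner"}


def owner_of(dns_name):
    """A new mapping created from an existing one by indirection:
    DNS name -> address (via the A map) -> owner."""
    return owners_by_address[lookup("A", dns_name)]
```

The same string, "example.org", yields a different result in each map, which is the sense in which the DNS is many mappings rather than one.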
This kind of leveraging of one mapping from another gives us an
interesting new set of options: instead of using an existing mapping
(e.g., URIs as defined by IETF standards), we can just run a parallel map
with the same syntax and different (but probably related) semantics.
In fact, this is how XML Namespaces work. The name
"http://foo.com" means one thing in the IETF URI name mapping and
something totally different in the XMLNS URI name mapping. The
relationship between the maps is that the authority responsible for a
specific URI in the IETF interpretation is allowed to use that URI as
a unique identifier to avoid unintentional conflicts in the XMLNS
interpretation.
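This behavior can be observed with Python's standard xml.etree.ElementTree module. In the sketch below the element name "note" and the second namespace URI are made up; the point is that the parser treats the namespace name purely as a string that qualifies the element name, and never dereferences it as an HTTP location.

```python
import xml.etree.ElementTree as ET

# Parsing does not fetch http://foo.com; the URI is used only as an
# identifier that is folded into the element's qualified name.
doc = ET.fromstring('<note xmlns="http://foo.com">hello</note>')

# Same local name, different (hypothetical) namespace URI.
other = ET.fromstring('<note xmlns="http://bar.example">hello</note>')

print(doc.tag)    # qualified name in ElementTree's {uri}local notation
print(other.tag)
```

The two elements share the local name "note" but do not conflict, because each is qualified by a different URI string.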
Tuple Mappings
You can generalize leveraging into tuple mappings, where a tuple of
character strings maps to objects. Thus we can say:
< ietf, http://foo > maps to X, and
< xmlns, http://foo > maps to Y
And you can define a mapping from tuples of character strings to
character strings, so that name tuple mappings are just ordinary name
mappings again. (An example combination function is to precede
each \ or , with a \, then concatenate the elements of the tuple
delimited by ,. The process is even easier if the names have
restricted syntax.)
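The combination function described above can be sketched as follows. The function names combine and uncombine are chosen here for illustration, and the inverse is an addition that shows the encoding loses nothing.

```python
def combine(parts):
    """Precede each backslash or comma with a backslash, then join the
    elements of the tuple with commas."""
    # Backslashes must be escaped first; otherwise the backslashes added
    # in front of commas would themselves be doubled.
    escaped = [p.replace("\\", "\\\\").replace(",", "\\,") for p in parts]
    return ",".join(escaped)


def uncombine(combined):
    """Inverse of combine: split on unescaped commas and drop the escapes.
    Note that an empty string decodes to a one-element tuple ('',)."""
    parts, current, i = [], [], 0
    while i < len(combined):
        ch = combined[i]
        if ch == "\\" and i + 1 < len(combined):
            current.append(combined[i + 1])  # keep the escaped character
            i += 2
        elif ch == ",":
            parts.append("".join(current))   # unescaped comma ends an element
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return tuple(parts)
```

For example, `combine(("ietf", "http://foo"))` gives the single string `ietf,http://foo`, which can then serve as a key in an ordinary name mapping.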
URIs are, of course, a combined name tuple mapping of < scheme, dns
name, etc. >.
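One way to see a URI as a tuple is to take it apart with Python's standard urllib.parse module. Which components to put in the tuple is a choice made for this sketch; the text only names the scheme and the DNS name.

```python
from urllib.parse import urlsplit


def uri_to_tuple(uri):
    """Split a URI into (scheme, dns name, path, query, fragment).
    The choice and order of components is an assumption of this sketch."""
    parts = urlsplit(uri)
    return (parts.scheme, parts.hostname, parts.path, parts.query, parts.fragment)


print(uri_to_tuple("http://www.w3.org/People/Sandro#me"))
```

The scheme selects which further mapping applies, and for http URIs the DNS name selects the authority responsible for the rest of the string.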
Some Options
< semweb, http://foo > ==?== < ietf, http://foo >
< date, email/dns, string >
- URI
- URI-Reference
- {URI-Reference}xmlname
- dns
- email
Description:
does pretty much all of this for us.
we'd like:
y-m-d GMT string -> concept
time.gregorianYearUTC=number
time.gregorianMonthOrdinalUTC=
time.gregorianDayOrdinalUTC=
(ISOxxxxx) ?
a way to name ISO standards and/or RFCs
much better grounding than http uris.
bootstrap:
the thing described by this english text
** englishDenotation
x.AmericanDenotation2001="The human being named Sandro Daniel
Hawke with the e-mail address sandro@w3.org as of March 16,
2001"
"X, such that: X is a human being; as of March 16, 2001, X is
known as Sandro Hawke, X is the main subject of the web page
at http://www.w3.org/People/Sandro"
You laugh, but that's a good bootstrapping!!!!
Well, the danger is people will just use "Sandro"
People should probably throw in a UUID.
"The person who sent mid:sdgtegdfgdfd"
1. By Description
2. Have a standard many-to-one mapping from character strings to
objects, with a central authority describing and/or delegating
parts of the mapping, more and more over time. Mapping can change
over time. URI (RFC-foo) works this way but generally the objects
defined are very complex -- not documents, but partially
reproducible browser experiences having properties which
depend on time, source address, browser identification string,
cookies, passwords, and content preferences.
3. Have multiple many-to-one mappings which you select between based
on the situation.
Bootstrapping Descriptions needs only:
literal strings and AmericanDenotation2000 (2000-EN-US)
Denotation_2000_EN_US
QualifyingDescription_2000_EN_US
"uniqdesc"
then everything is an , a data:literal, or uniqdesc.
!
which is more clumsy. But write it up.
so < semweb, http://foo > != < ietf, http://foo >
which Tim thinks is evil.
so opaquelocktokens!
Two Papers:
Global Names
(much of the above)
work in IETF URI map or not?
a name map is as ... dictionary?
and
Living Without Global Names (exivars only)
(except for strings and uniqdesc, and what
we bootstrap from them)
AHHHHHHHH. Here it is:
Semantic Web (WSL) Identifiers:
URIs for things that have IETF Standard URIs
URI-References if that does it
for existential variable
{2001|sandro@roads.org|blah} or some such.... still no whitespace
((2001 01 01) (sandro@roads.org) ....)
or local existential variable
(sandro@roads.org/2000) global existential variable
***
The WSL format is
"^\S+ \S+ \S+$"
where word is
URI | URI-Reference | "<" xmlname ( "," tann-stuff ) ">"
http://foo
http://foo#bar
data:,literals "....."
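A rough sketch of checking a line against the format above, using the regular expression exactly as given. The function name parse_wsl and the sample line are hypothetical, and the sketch does not validate that each word is a URI, URI-Reference, or bracketed xmlname; it only checks for three whitespace-free words separated by single spaces.

```python
import re

# The pattern from the notes: exactly three runs of non-whitespace
# characters, separated by single spaces.
WSL_LINE = re.compile(r"^\S+ \S+ \S+$")


def parse_wsl(line):
    """Return the three words of a line, or raise ValueError.
    fullmatch is used so that a trailing newline is rejected too."""
    if WSL_LINE.fullmatch(line) is None:
        raise ValueError("not three space-separated words: %r" % line)
    return tuple(line.split(" "))


# Hypothetical sample line; the word order is an assumption of this sketch.
words = parse_wsl("http://www.w3.org/People/Eric#me mbox mailto:eric@w3.org")
```

Because no word may contain whitespace, literals would have to be encoded (for instance as data: URIs) before they could appear as a word.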
mbox mailto:eric@w3.org
equiv http://www.w3.org/People/Eric
debugNotesAt http://www.w3.org/People/Eric#me
debugNotesAt http://www.w3.org/People/Sandro#me
^http://foo
http://foo
http://foo
mine is better because you can say what
the relationship is to the web page.
http://foo believed true ****
{sandro@roads|2001}foo
{http://foons}name
Sandro Hawke
$Date: 2001/03/23 19:17:21 $