Re: suggestions for datatyping (long) from Sergey Melnik on 2001-10-25 (w3c-rdfcore-wg@w3.org from October 2001)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Thu, 25 Oct 2001 10:19:36 -0700
To: Pat Hayes <phayes@ai.uwf.edu>
CC: w3c-rdfcore-wg@w3.org
Message-ID: <3BD849A8.D30BF25C@db.stanford.edu>
Pat Hayes wrote:
> 
> >
> >b) typing information can either be represented in an instance graph only,
> >    in a schema graph only, or both.
> 
> Can you clarify this distinction? I wasn't aware that we had such a
> distinction in RDF (?)

Recall the example from
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0343.html:

              (John_Smith weight "160 1/8") goes together with a rule like
               'if X is a person living in the US and (X weight Y),
                then Y is a "pieces-of-eight" number that gives weight
                in pounds'

In this example, the information that "160 1/8" maps to an integer by
means of some pieces-of-eight encoding is contained entirely in a
schema. In principle, part of this information can also appear in the
instance graph.

> 
> >The suggestion that I'd like you to think about concerns the dimension
> >(a).
> >
> ><SUG2>: to focus on representing typing info in the triple structure
> >         and keep literals atomic.
> 
> I'm not quite clear exactly what is meant by 'in the triple
> structure'. (It makes sense that one should be able to tell from the
> triples that a particular literal token is supposed to be, say, an
> XSD: integer. I don't think it makes sense to require that the entire
> content of what that means, ie the entire XSD spec, should be
> represented or encoded in the triples.) If you mean the first, OK.

Thanks for the clarification; I do mean the first.

> >
> >3 DATATYPES AND DATATYPING
> >--------------------------
> >
> >3.1 Value spaces and lexical spaces in [XSD]
> >--------------------------------------------
> >
> >A nice conceptual intro to the datatyping issue is provided in the
> >[XSD] document. According to [XSD], each datatype is characterized by
> >a *value space* and a *lexical space*. For example, take the type
> >"decimal". Its value space is the set of all arbitrary-precision decimal
> >numbers, whereas its lexical space includes all character strings that
> >match a certain pattern. In [XSD], a datatype definition specifies a
> >mapping between the value space of the datatype and its lexical
> >space. Notice that in general more than one lexical token may map to
> >the same data value.
> >
> >My working understanding of the [XSD] document in terms of the current
> >model theory draft is that the elements of a lexical space are literal
> >values.
> 
> That is not mine. I would characterize literals as the lexical space
> and literal values as the value space. That is the working assumption
> behind the pfps/ph datatyping extension to the MT.

Well, this is exactly what I don't like at all about it. By making
literals Heroes with a Thousand Faces we make life quite tough for a
Desperate Perl Hacker (of the Life of Brian ;) who tries to model some
domain. I think we can avoid this additional complexity.
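
A minimal sketch of the [XSD] picture quoted above, just to make the
two spaces concrete. It assumes Python's decimal.Decimal as a stand-in
for the "decimal" value space, and decimal_l2v is a hypothetical name;
note that Decimal accepts some strings that are outside the XSD lexical
space, so this is only an approximation.

```python
from decimal import Decimal

def decimal_l2v(lexical: str) -> Decimal:
    """Map a lexical token (a character string) to a value in the value space."""
    return Decimal(lexical)

# Three distinct lexical tokens ...
tokens = ["1.0", "1.00", "+1.0"]

# ... that all map to one and the same data value.
values = {decimal_l2v(t) for t in tokens}
```

Whichever side one calls "literal", the mapping is many-to-one: the set
of tokens has three members, the set of values has one.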

> >3.2 Datatypes as classifiers in [UML,CWM]
> >-----------------------------------------
> >
> >[UML] and [CWM] treat datatypes as some kind of classes (or
> >classifiers in UML terminology). In other words, datatypes have
> >"instances" which are called *data values*. Here are the relevant
> >quotes:
> >
> >[UML], Sec 2.5.2.14: "Datatype"
> >
> >     A data type is a type whose values have no identity; that is, they
> >     are pure values. Data types include primitive built-in types (such
> >     as integer and string) as well as definable enumeration types
> >     (such as the predefined enumeration type boolean whose literals
> >     are false and true).
> >
> >[CWM], Sec 7.6.1.1: "DataValue"
> >
> >     A data value is an instance with no identity. In the metamodel,
> >     DataValue is a child of Instance that cannot change its state,
> >     i.e. all operations that are applicable to it are pure functions
> >     or queries that do not cause any side effects. DataValues are
> >     typically used as attribute values.  Since it is not possible to
> >     differentiate between two data values that appear to be the same,
> >     it becomes more of a philosophical issue whether there are several
> >     data values representing the same value or just one for each
> >     value. In addition, a data value cannot change its data type and
> >     it does not have contained instances.
> >
> >[UML], Sec 2.5.2.34: "Primitive"
> >
> >     A Primitive defines a predefined DataType, without any relevant
> >     UML substructure; that is, it has no UML parts. A primitive
> >     datatype may have an algebra and operations defined outside of UML
> >     (for example, mathematically). Primitive datatypes used in UML
> >     itself include Integer, UnlimitedInteger, and String.  The
> >     run-time instances of a Primitive datatype are DataValues. The
> >     values are in many-to-one correspondence to mathematical elements
> >     defined outside of UML (for example, the various integers).
> >
> >[UML], Sec 2.5.4.10: "Miscellaneous"
> >
> >     ... A data type is a special kind of classifier, similar to a class,
> >     but whose instances are primitive values (not objects). For
> >     example, the integers and strings are usually treated as primitive
> >     values. A primitive value does not have an identity, so two
> >     occurrences of the same value cannot be differentiated. Usually,
> >     it is used for specification of the type of an attribute. An
> >     enumeration type is a user-definable type comprising a finite
> >     number of values. ...
> >
> >Translated into RDF terms, a data value corresponds to a bNode in a
> >graph.
> 
> I disagree. That begs several important questions, but in any case a
> bNode can denote any kind of value. Why would we want to say that
> bNodes *are* values?

Ok, more precisely, a data value maps to I(some bNode), nothing said
about the reverse direction.

> >3.2 Datatyping: classes or mappings?
> >------------------------------------
> >
> >As pointed out above, reading [XSD] gives an impression that
> >datatyping is a kind of mapping that establishes a relationship
> >between data values and literal values. In contrast, [UML] talks
> >merely about the value spaces of datatypes and does not say anything
> >about their lexical spaces. As a consequence, [UML] does not establish
> >any mappings between value spaces and lexical spaces of the primitive
> >datatypes. Still, [UML] does define the "features" of value spaces
> >that include ordering, operations etc.
> >
> >To sum up, the specs [XSD,UML,CWM] utilize two abstract concepts:
> >
> >- datatype as a class(ifier)
> >- datatyping as a mapping between a value space and a lexical space
> >
> >My feeling is that both views may be useful for representing typed
> >data (just as wave-particle dualism is helpful for explaining
> >different phenomena in physics ;). On the one hand, if data values do
> >not have fixed URI identifiers, we need a *mapping* that allows us to
> >identify resources as data values using their lexical representations.
> >On the other hand, for defining and restricting datatypes, the class
> >view is superior (although it looks like the class view is in
> >principle dispensable).
> 
> I think we can have both. We have a class/property distinction at the
> basis of RDFS, and it seems natural to map this entire discussion
> into that vocabulary. Data type mappings are rather like (the
> extensions of) properties assigning data values to lexical strings,
> and the ranges of these properties are the classifiers whose class
> extensions are the sets of data values themselves

This is exactly what I think, too. I reckon it's worth demonstrating
explicitly how the class/property distinction applies to datatyping,
just to clarify things.

> >3.3 Literal properties as "datatyping mappings"?
> >------------------------------------------------
> >
> >One final point that I'd like to make before turning to examples is
> >that properties with literal values possess a high resemblance to
> >datatyping mappings.
> 
> Right, exactly. They differ only in their special relationship to the
> RDF syntax.
> 
> >Assume that the interpretation of each literal
> >symbol is fixed and is determined by its textual contents.
> 
> No, do not make that assumption! That begs the central question. That
> is the entire point of datatyping, that this assumption breaks down
> for literals, so datatyping is required.

Comment below.

> >Then, since
> >each literal symbol denotes just a lexical token,
> 
> Why does it *denote* a lexical token? It *is* a lexical item.

Sorry, I missed another required clarification. In the posting, I used
"lexical token" as a synonym for "literal value". I see that this is
misleading. Please do find-and-replace in the text ;)

> >it presumably does
> >not make sense to use it as object for properties like "age", "size",
> >"price", "weight", etc. In fact, such use would suggest that e.g. the
> >weight of a thing is a lexical token; typically, we'd like it to
> >denote some abstract entity that corresponds to say 5 pounds.
> 
> No, no. If I USE a literal as a value, I am not MENTIONING a lexical
> token; I am using the literal to indicate a literal value. So for
> example by writing
> 
> phayes weightAtAge50inPounds "165" .
> 
> I am saying that my weight was 165 pounds, not that it was a lexical item.

To reiterate my point, with substantial mental effort we (actually, you
and Peter P.-S.) can make the above statement work, i.e. give it some
meaningful interpretation. My point is that *clarity* is what matters
first for the SW to take off. Recall the recent suggestion by Peter to
give each and every XML document some meaningful semantic
interpretation. This just doesn't work, because developers would
generate a lot of "meaning" which is in fact just gibberish. The same
argument applies to the above. In order to make applications work,
people who encode the data must cooperate. Sorry about getting into
rhetoric.

In the above statement, the property weightAtAge50inPounds in fact
serves two purposes at once:

1) it tells us how to interpret the token "165"
2) it establishes some relationship between the interpretation of this
token and phayes.

My suggestion is to separate these two purposes. To be even more
human-friendly, you'd write:

    phayes weightAtAge50inPoundsInDecimalEncodedByISO8601 "165"

But that is far from being machine-friendly.
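
One way the separation could look, as a rough sketch only. Triples are
plain tuples, the literal "165" stays an atomic string, and a blank
node stands for the weight itself. The property name ex:decimal, the
DATATYPING table and the helper functions are all hypothetical and are
not taken from any draft.

```python
from decimal import Decimal

# Purpose 2: relate phayes to some weight (the blank node "_:w").
# Purpose 1: say how the token "165" is to be interpreted for that node.
graph = {
    ("phayes", "weightAtAge50inPounds", "_:w"),
    ("_:w", "ex:decimal", "165"),  # ex:decimal is an assumed datatyping property
}

# Assumed standard interpretation for each datatyping property.
DATATYPING = {"ex:decimal": Decimal}

def objects(graph, subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]

def data_value(graph, node):
    """Value denoted by a node, taken from the first datatyping triple found."""
    for s, p, o in graph:
        if s == node and p in DATATYPING:
            return DATATYPING[p](o)
    return None

weight_node = objects(graph, "phayes", "weightAtAge50inPounds")[0]
weight = data_value(graph, weight_node)
```

The weight property no longer needs to know anything about encodings,
and the same datatyping property can be reused for age, size, price etc.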

> >In other words, for most meaningful representations, we can think of a
> >property whose objects are literals as a mapping that associates a
> >value space with some lexical space.
> 
> No, that is what the datatyping mapping does, not the property. It is
> LIKE a property, but it is not itself an RDF property. If we assume
> that, then we are begging the question, since we have simply
> described the datatyping in RDF; and then there is no datatyping as
> such.

Perfect. So we can sit back and relax. Still, it's like saying that
since ICEXT is an abbreviation that uses the extension of I(rdf:type),
there are no classes and instances in RDFS. IMO, RDF and the MT draft
already have enough means to introduce other concepts like classes
and datatyping elegantly, without much friction.
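
To illustrate the ICEXT analogy, here is a toy sketch in which a class
extension is simply read off the extension of rdf:type. It assumes a
simplified interpretation in which names stand for their own
denotations; the resource names and the dictionary layout are made up
for the example.

```python
RDF_TYPE = "rdf:type"

# IEXT maps a property to its extension: a set of (subject, object) pairs.
IEXT = {
    RDF_TYPE: {
        ("phayes", "ex:Person"),
        ("John_Smith", "ex:Person"),
        ("_:w", "ex:Decimal"),
    },
}

def icext(cls):
    """Class extension of cls: every x such that (x, cls) is in IEXT(rdf:type)."""
    return {x for (x, c) in IEXT[RDF_TYPE] if c == cls}

persons = icext("ex:Person")
```

Nothing new is added to the model here; classes fall out of one
distinguished property, and the suggestion is that datatyping could
fall out of a handful of distinguished properties in the same way.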

> >In yet other words, each
> >literal-valued property may be thought of (by convention) as a
> >"datatyping property" (also referred to as "interpretation property"
> >by TimBL).
> >
> >If <SUG2> turns out to be acceptable, the next thing I would suggest
> >to nail down is the nature of literals. A further proposal from my
> >side would therefore be
> >
> ><SUG3>: the interpretation of each literal symbol is fixed
> >         and is determined by its textual contents.
> 
> If we adopt this convention then there is no need to invoke any
> special treatment of datatyping in RDF itself, since all the
> datatyping is purely a lexical matter. (?) Seems to me that this
> trivialises the discussion.

Yes, as does ICEXT. Basically, with <SUG3> we can build datatyping on
top of RDF just by providing some standard interpretation for a bunch of
properties, just as RDFS builds on RDF/MT. But that's great, isn't it?

Sergey
Received on Thursday, 25 October 2001 12:53:15 UTC