Re: RDF datatyping

On 2002-01-10 23:57, "ext Graham Klyne" <GK@NineByNine.org> wrote:

> Patrick,
> 
> I'm going to focus on your examples, because they stand the best chance of
> bringing us to some kind of common understanding...
> 
> NOTE:  for those who are not following this exchange in detail, this
> message suggests a possible formal interpretation for rdf:value.
> 
> 
> At 05:56 PM 1/10/02 +0200, Patrick Stickler wrote:
>> If you mean allowing literal nodes to be subjects, per P+, then I
>> fully agree. But certainly the D idiom is far easier to contract
>> to an eventual P+ representation when that time comes. I.e. going
>> to P+
>> 
>>    Bob ex:age _:1:"30" .
>>    _:1 rdf:type xsd:integer .
>> 
>> from D
>> 
>>    Bob ex:age _:1 .
>>    _:1 rdf:value "30" .    (simply move to anon-node label)
>>    _:1 rdf:type xsd:integer .
>> 
>> is far more straightforward than from S
> 
> Well, the first of those could be pure S (if literal subjects are
> allowed):

No. It's P+. That idiom is already defined. Feel free to call it
S idiom C or such, but it's still P+.

>  xsd:integer would be understood to be a class containing the lexical
> space of integers.

Great. Now we have both xsd:integer and xsd:integer.lex as synonyms,
but then what identifies the entire data type?!

xsd:integer identifies the data type, not a component of the data type.

> And the second of those is entirely consistent with S if we also allow a
> very simple definition of rdf:value to be understood;  i.e. that it is the
> identity predicate.  Thus:
> 
>  FORALL v: <v,v> in IEXT(I(rdf:value))

But the pairing-based model, along with the P and D idioms, requires
no such (re)definition of rdf:value, and does the job just fine. Again,
yet more machinery...

> (The rdf:value relational extension could be limited to literal strings,
> but why bother?)

My question is why bother with any rdf:value relational extension? It's
only needed for S...

> 
>>    Bob ex:age _:1 .
>>    _:1 xsd:integer.map "30" .
>>    xsd:integer.map rdfs:range xsd:integer.lex .
>>    xsd:integer.map rdfs:domain xsd:integer.val .
> 
> Well, I think that for "ordinary" use, it wouldn't be necessary to
> include the
> rdfs:range and rdfs:domain statements;  i.e.  just say:
> 
>   Bob ex:age _:1 .
>   _:1 xsd:integer.map "30" .
> 
> Since, at some level, the meaning of xsd:integer.map must be understood,
> ex-RDF, by communicating applications.  (Just as the meaning of xsd:integer
> must be understood in your example.)

But xsd:integer being "understood" by some application does not require
parsing of the URIref, but only using it as an opaque, unique identifier.
The S proposal requires (in the absence of a schema) the parsing of the
URIref to determine not only the data type but the data type component.

More work...
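
To make the difference concrete, here is a rough Python sketch of the two
lookups. It is only an illustration: the function names and the table of
known data types are hypothetical, and the '.lex', '.val', '.map', '.cmap'
suffixes are simply the naming convention used in the examples of this
thread, not anything defined by RDF or XML Schema.

```python
# Hypothetical sketch: KNOWN_DATATYPES and both function names are
# assumptions made for illustration; the suffixes follow the naming
# convention used in this thread's S examples.

KNOWN_DATATYPES = {"xsd:integer"}
SUFFIXES = {
    "lex": "lexical space",
    "val": "value space",
    "map": "mapping",
    "cmap": "canonical mapping",
}

def datatype_opaque(uriref):
    """Pairing model: the URIref is only compared as an opaque token."""
    return uriref if uriref in KNOWN_DATATYPES else None

def datatype_by_suffix(uriref):
    """S without a schema: split the URIref to recover both the data
    type and the component of it that the name refers to."""
    prefix, sep, suffix = uriref.rpartition(".")
    if sep and suffix in SUFFIXES and prefix in KNOWN_DATATYPES:
        return prefix, SUFFIXES[suffix]
    return None
```

The first function never looks inside the name; the second only works if
every deployment applies the same suffix convention consistently.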

> The purpose of including range/domain statements in the examples I gave
> previously was to make the equivalence between the different idioms
> explicit in the RDF.  For example, this:
> 
>   Bob ex:age _:1 .
>   _:1 xsd:integer.map "30" .
>   xsd:integer.map rdfs:domain xsd:integer.val .
> 
> RDFS-entails this:
> 
>   _:1 rdf:type xsd:integer.val .
> 
> without needing to refer to any special understanding of xsd:integer.map
> or xsd:integer.val.

The same can be said for

   Bob ex:age _:1 .
   _:1 rdf:type xsd:integer .

where it is understood that the subject of a statement with
the rdf:type predicate and a data type object denotes the
value in question. 

This works for all P, P+, D, and U idioms
based on the (lexical_form, data_type) pairing model:

--

P:

   Bob ex:age "30" .
   ex:age rdfs:range xsd:integer .

infers

   "30" rdf:type xsd:integer .

where the literal node denotes the value and whose label
is also the lexical form.

--

P+:

   Bob ex:age _:1:"30" .
   _:1 rdf:type xsd:integer .

where, as with P, the literal node denotes the value and
whose label is the lexical form.

--

D:

   Bob ex:age _:1 .
   _:1 rdf:type xsd:integer .
   _:1 rdf:value "30" .

where the anonymous node denotes the value and the lexical
form is represented by the rdf:value property.

--

U:

   Bob ex:age <xsd:integer:30> .

infers

   <xsd:integer:30> rdf:type xsd:integer .

where the URV labeled node denotes the value and contains
the lexical form encoded in the URI structure.

--

In all cases, the subject of rdf:type for a data type denotes
the value.
   
Simple. Consistent. Clear.

And, by the way, identical to the semantics for *all*
resources having an rdf:type property -- where the subject
of an rdf:type statement is the instance of the class
specified.
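
As a rough illustration of how the four idioms reduce to the same
(lexical_form, data_type) pairing, here is a small Python sketch. It is
not a real RDF parser: triples are plain string tuples, the node
spellings '_:1:"30"' and '<xsd:integer:30>' just mirror the notation used
above, and the function name pairings() is made up for this example.

```python
# Illustrative sketch only. Triples are (subject, predicate, object)
# string tuples; literals keep their double quotes as in the examples.
# The P+ and U node spellings follow this message's notation, not any
# standard syntax, and pairings() is a hypothetical helper.

SCHEMA_PREDICATES = ("rdf:type", "rdfs:range", "rdf:value")

def pairings(triples):
    """Collect (lexical_form, data_type) pairs from a small graph."""
    types, ranges, values = {}, {}, {}
    for s, p, o in triples:
        if p == "rdf:type":
            types[s] = o
        elif p == "rdfs:range":
            ranges[s] = o
        elif p == "rdf:value":
            values[s] = o.strip('"')

    found = set()
    for s, p, o in triples:
        if p in SCHEMA_PREDICATES:
            continue
        if o.startswith("<") and o.endswith(">"):
            # U: data type and lexical form are both encoded in the URV.
            datatype, _, lexical = o[1:-1].rpartition(":")
            found.add((lexical, datatype))
        elif o.startswith('"') and p in ranges:
            # P: the literal is typed by the rdfs:range of the property.
            found.add((o.strip('"'), ranges[p]))
        elif ':"' in o:
            # P+: labelled literal node typed via rdf:type on its id.
            node_id, _, label = o.partition(':"')
            if node_id in types:
                found.add((label.rstrip('"'), types[node_id]))
        elif o in types and o in values:
            # D: anonymous node carrying rdf:type and rdf:value.
            found.add((values[o], types[o]))
    return found

P = [("Bob", "ex:age", '"30"'),
     ("ex:age", "rdfs:range", "xsd:integer")]
P_PLUS = [("Bob", "ex:age", '_:1:"30"'),
          ("_:1", "rdf:type", "xsd:integer")]
D = [("Bob", "ex:age", "_:1"),
     ("_:1", "rdf:type", "xsd:integer"),
     ("_:1", "rdf:value", '"30"')]
U = [("Bob", "ex:age", "<xsd:integer:30>")]
```

Under these assumptions, each of the four graphs yields the single pair
("30", "xsd:integer"), which is all an application needs in order to
obtain the value.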

>> and presumably
>> 
>>    xsd:integer.lex rdf:type s:LexicalSpace .
>>    xsd:integer.val rdf:type s:ValueSpace .
>>    xsd:integer.map rdf:type s:Mapping .
>>    xsd:integer.cmap rdf:type s:CanonicalMapping .
>>    xsd:integer s:lexicalSpace xsd:integer.lex .
>>    xsd:integer s:valueSpace xsd:integer.val .
>>    xsd:integer s:mapping xsd:integer.map .
>>    xsd:integer s:canonicalMapping xsd:integer.cmap .
>> 
>> and we still haven't actually said anything about
>> the data type xsd:integer itself, but have to "know"
>> about how to parse the special names and remove the
>> '.lex', '.val', and '.map' suffixes.
> 
> Whoa!!  Who said anything about having to *parse* the special names?

I did ;-)

Either you have to define the relations between the data type
and its components in some ontology, or you have to both (a) use
the suffixation proposed consistently and (b) parse URIrefs to
extract the data type URIref (prefix) in question.

> I see them as just a convention for introducing datatype names so that the
> discussion is easier to follow.  *Any* name could be used, as long as a
> common meaning is understood by applications that exchange information.

Again, more machinery and requirements for deployment.


>> And again, with S we get into the fun stuff with the
>> ability to say
>> 
>>    Bob ex:age "30" .
>>    ex:age rdfs:subPropertyOf xsd:integer.map .
>> 
>> which given the above range and domain statements
>> thereby declares Bob to be an instance of
>> xsd:integer.val, etc. etc.
> 
> Yes, and I don't believe that any useful solution to the data typing issue
> can prevent people from making statements with silly consequences - i.e.
> that don't correspond to our understanding of reality.

But some solutions discourage silliness (not that I'm into standards
that "mother" the users).

The use of data type specific properties to type literals *suggests*
the ability to subclass them by other non-data type properties as
a logical and reasonable convenience. The side effects are IMO far
from obvious, and folks will be falling into that hole right and left.
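
For anyone who wants to see the side effect mechanically, here is a
minimal Python sketch of the three RDFS rules involved (subPropertyOf,
domain, range), applied to the example quoted above. The function
rdfs_closure() is a made-up helper covering only these three rules, not
a full RDFS reasoner.

```python
# Minimal sketch, assuming only three RDFS rules matter here:
# subPropertyOf, domain, and range. rdfs_closure() is a hypothetical
# helper, not a complete RDFS entailment engine.

def rdfs_closure(triples):
    """Apply the three rules repeatedly until nothing new is inferred."""
    graph = set(triples)
    while True:
        new = set()
        for s, p, o in graph:
            for s2, p2, o2 in graph:
                if s2 != p:
                    continue
                if p2 == "rdfs:subPropertyOf":
                    new.add((s, o2, o))           # inherit the statement
                elif p2 == "rdfs:domain":
                    new.add((s, "rdf:type", o2))  # type the subject
                elif p2 == "rdfs:range":
                    new.add((o, "rdf:type", o2))  # type the object
        if new <= graph:
            return graph
        graph |= new

graph = [
    ("Bob", "ex:age", '"30"'),
    ("ex:age", "rdfs:subPropertyOf", "xsd:integer.map"),
    ("xsd:integer.map", "rdfs:range", "xsd:integer.lex"),
    ("xsd:integer.map", "rdfs:domain", "xsd:integer.val"),
]
closure = rdfs_closure(graph)
```

With those four statements, the closure contains
(Bob, rdf:type, xsd:integer.val), i.e. Bob himself ends up as a member of
the integer value space, which is surely not what the author of the
subPropertyOf statement intended.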

> Anyway, my point of focusing on your examples was to illustrate that, in
> terms of RDF usage, the various idioms discussed are not so dissimilar in
> what they express.

To the extent that the S idioms could be interpreted as defining pairings
of lexical forms with data types (albeit with a lot of indirection and
extra machinery), sure, they all seem to be compatible.

The key difference is that the S proposal mandates all that extra machinery
in the representation at the expense of the users -- where most of it only
belongs in the model theory.

And, of course, because of the unexpected (and still even unknown)
interactions with the semantics of property constraints (range/domain)
the two S idioms cannot coexist and also raise usability issues regarding
subclassing of properties, etc.

>  Further, the choice of S as a basis does not preclude
> the idiom you appear to prefer.

It is essential that there be as few standardized idioms as possible,
ideally only two: one for local typing and one for global typing, and
that they work together in the same knowledge base elegantly. Otherwise,
we lose a great deal of portability of knowledge, and the complexity of
RDF applications increases by having to support a plethora of idioms.

P and D, in that regard, are far better choices IMO than S/A and S/B.

The conceptual discussion of data type components and the relations between
components is essential, but not in the explicit graph representation, as
S demands.

> (I think much of the complexity you see in
> S is a misunderstanding of its basic approach,

Nope. Sorry. I understand what S attempts to accomplish, I just
disagree about the level of resolution by which data typing
knowledge of literals is defined.

> which is very simply to say
> that the denotation of a literal-labelled node is the literal label value,

If you mean by 'literal label value' the member of the value space
denoted by the lexical form (literal) then this is false, as it is
the anonymous node (A) that denotes the value. The literal labeled
node only denotes the lexical form.

If the latter is what you mean, that 'literal label value' equates
to lexical form, then fine, but the problem is that in the case
of global typing, the literal labelled node denotes *both* the lexical
form and the value, and S does not seem to be able to handle that
without treating rdfs:range differently for each idiom A and B.

P/U has no such problem. The literal itself represents the lexical form
and the subject of the rdf:type property (explicit or implicit) denotes
the value. Simple, and no conflict between idioms.

> ...  The proposals differ more fundamentally in the
> model-theoretic treatment of the denotation of literal-labelled nodes, and
> here S has a clear virtue of simplicity.

I disagree. S misses the whole elegance of the pairing model by
requiring the explicit distinction between data type components,
and thereby is more complex than the pairing model.

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Friday, 11 January 2002 03:39:40 UTC