Re: Datatyping, reification, syntactic tidyness from Patrick Stickler on 2002-09-11 (w3c-rdfcore-wg@w3.org from September 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Wed, 11 Sep 2002 14:57:54 +0300
To: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Message-ID: <001c01c2598a$763503e0$864416ac@NOE.Nokia.com>
Jeremy,

I think you've touched on some very important points, though
it appears that we are in fact not in agreement on how they
should be addressed.

That's a pity, as I thought we were both in favor of 
syntactically untidy and explicitly named inline literals.

(I'm secretly hoping you're playing Devil's advocate here ;-)

Comments follow...

----- Original Message ----- 
From: "ext Jeremy Carroll" <jjc@hplb.hpl.hp.com>
To: <w3c-rdfcore-wg@w3.org>
Sent: 11 September, 2002 12:26
Subject: Datatyping, reification, syntactic tidyness


> 
> 
> Proposal:
> 
> The RDF specification explicitly says that implementations of the RDF graph
> may represent literal nodes with the same label as a single node or as
> multiple nodes; and that nothing in the specs allows these different
> implementations to be distinguished. Hence, an operation like:
> 
>   RDFGraph.countLiteralNodes()
> 
> cannot be defined in a way that conforms with our recommendation.

Well, that depends. If it is counting nodes specific to the
internal, application-specific representation, then no, it can't,
but if it is meant to reflect the number of nodes as defined for
the abstract syntax, then it should.

I.e. it has to be clear whether the above function reflects the
implementation graph or the abstract graph (and I can think of
lots of utility for the latter, such as an implementation-neutral
query API, etc.).
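
A minimal sketch of that distinction, in Python. All names here
(LiteralNode, AbstractGraph, InterningStore, the "_:x" style node
identifiers) are hypothetical and only illustrate the idea; they are not
taken from any RDF toolkit or from the specs. The point is that the
count exposed by a generic API follows the abstract graph, while the
backend is free to store literals however it likes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiteralNode:
    node_id: str  # hypothetical node identity in the abstract graph
    label: str    # the literal's lexical form


class AbstractGraph:
    """Triples as seen through the abstract syntax (illustrative only)."""

    def __init__(self):
        self.triples = []

    def add(self, subject, predicate, obj):
        self.triples.append((subject, predicate, obj))

    def count_literal_nodes(self):
        # Counts nodes of the abstract graph, whatever the storage layout is.
        return len({o for _, _, o in self.triples if isinstance(o, LiteralNode)})


class InterningStore:
    """A hypothetical backend that keeps one record per distinct label."""

    def __init__(self, graph):
        self.labels = {o.label for _, _, o in graph.triples
                       if isinstance(o, LiteralNode)}

    def count_internal_records(self):
        return len(self.labels)


graph = AbstractGraph()
graph.add("_:s", "eg:p3", LiteralNode("_:x", "10"))
graph.add("_:s", "eg:p4", LiteralNode("_:y", "10"))
store = InterningStore(graph)

print(graph.count_literal_nodes())     # abstract graph: 2
print(store.count_internal_records())  # internal storage: 1
```

Two applications with different backends would then still give the same
answer through the API, because neither exposes its internal record count.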

> ========================
> 
> 
> Consider
> 
> <rdf:Description rdf:bagID="Reify">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
>   <eg:p2 rdf:datatype="&xsd;int">10</eg:p2>
>   <eg:p3>10</eg:p3>
>   <eg:p4>10</eg:p4>
> </rdf:Description>
> 
> This creates a graph with:
> 
>  four initial triples
>  sixteen triples reifying those four triples
>  five triples forming the bag

Do you mean according to the abstract syntax? Or some
hypothetical implementation? Or perhaps ARP?

I'm presuming you mean the abstract graph here (but then,
this thread is specifically about how many literal nodes
there are in the abstract graph so...)

> This message is about:
> - how many Literal nodes are there?

3 

> - do we care?

In the abstract graph? Absolutely.

In some application's internal structures? Not at all.

> My preference is to be able to systematically say we do not care.

If we are to have generic, portable APIs which allow disparate
RDF applications to interact consistently on the same knowledge
base, I would argue that we should care a whole lot precisely
how many nodes are in the abstract syntax.

As for an application's internal representation, we should explicitly
not care, nor ever impose any requirements on it.

> There are at least two literal nodes, one labelled with an int 10, the other
> labelled with a RDF String Literal "10". Since these labels are different
> the nodes must be different.
>
> Of the twenty-five triples in the graph eight have literal objects, thus
> there are at most eight literal nodes.
> 
> A syntactically tidy implementation would stop at two nodes.
> 
> A thorough untidy one would have eight nodes.
> Some would argue that the object of the rdf:object triple in the reification
> is the same node as the object of the original triple. Thus an
> implementation following this rationale would get four literals.

I would suggest that in the abstract syntax (leaving semantics
aside) there would be exactly three literal nodes. One node denoting
the explicitly typed literal (xsd:int, "10") and two nodes denoting
the non-explicitly-typed literals, e.g. (_:x, "10") and (_:y, "10").
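
The competing counts can be compared with a small Python sketch. It
lists the eight literal-valued triples of the quoted example (four
property statements plus their four rdf:object triples) and counts
distinct nodes under different identity rules. The record layout, the
function names, and the use of the property name as a stand-in for a
system ID are all assumptions made for illustration.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"

# One record per literal-valued triple. "origin" is the property element the
# literal came from; a reified rdf:object triple shares its statement's origin.
occurrences = []
for prop, datatype in [("p1", XSD_INT), ("p2", XSD_INT), ("p3", None), ("p4", None)]:
    for kind in ("stmt", "reif"):
        occurrences.append(
            {"triple": f"{kind}:{prop}", "origin": prop,
             "datatype": datatype, "lex": "10"}
        )


def tidy(occ):
    # Same label means same node.
    return (occ["datatype"], occ["lex"])


def thoroughly_untidy(occ):
    # Every literal-valued triple gets its own node.
    return occ["triple"]


def shared_with_reification(occ):
    # Untidy, but rdf:object reuses the node of the original statement.
    return occ["origin"]


def typed_tidy_untyped_by_id(occ):
    # Typed literals are identified by (datatype, lexical form); untyped ones
    # by a per-origin ID, standing in for the system ID prefix.
    if occ["datatype"] is not None:
        return (occ["datatype"], occ["lex"])
    return ("_:" + occ["origin"], occ["lex"])


def count_nodes(identity):
    return len({identity(occ) for occ in occurrences})


for rule in (tidy, thoroughly_untidy, shared_with_reification,
             typed_tidy_untyped_by_id):
    print(rule.__name__, count_nodes(rule))
```

Under these assumptions the rules give two, eight, four and three nodes
respectively; the last rule is the one argued for here.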

> Of course, sensible implementations could choose to treat datatyped literals
> tidyly and RDF String Literals untidyly (or vice versa) which suggests that
> maybe six is also a plausible number of literals.

Sensible implementations will be employing numerous mechanisms to
maximize storage and processing efficiency. That is not our concern.

> If in fact, our normative serialization of the graph does not allow us to
> distinguish these cases then we do not need to, and in fact, SHOULD NOT say
> either way.

I would expect that N-Triples would explicitly and accurately reflect 
the abstract syntax, and that RDF/XML would implicitly yet accurately
reflect the abstract syntax. Thus both normative serializations
would say precisely how many literal nodes are in the abstract graph.

Whether that abstract syntax is used literally as the basis for some
implementation is not our concern -- though one would expect and
hope that generic APIs would reflect the abstract syntax, hiding
all implementation-specific deviations from users.

> The model theory needs to reflect this inability to represent the two
> different cases and not depend on some hidden node identity that we cannot
> serialize (this only rules out certain types of untidiness in the model
> theory).

Or rather, the MT needs to reflect that all literal nodes have either
a URIref or systemID prefix, and given that, they are all syntactically
tidy. Explicitly typed literal nodes with URIref prefix are also
semantically tidy and denote datatype values. It remains to be seen
whether we say anything more about systemID prefixed literal nodes,
as to whether they are semantically tidy (by string equality of
the string literal) or untidy, with the systemID implicitly denoting
a datatype.

As for serializing the "hidden" node identity, I would suggest that
the attribute rdf:nodeID is precisely the correct means to do so.
See below...

> In fact, we should explicitly say that we are not saying, and that this is
> deliberately underspecified, since nothing depends on it.
> 
> I believe that these two RDF/XML documents are entirely equivalent:
> 
> <rdf:Description rdf:bagID="Reify">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
> </rdf:Description>
> 
> 
> <rdf:Description rdf:nodeID="subj">
>   <eg:p1 rdf:datatype="&xsd;int">10</eg:p1>
> </rdf:Description>
> <rdf:Bag rdf:ID="Reify">
>   <rdf:li>
>    <rdf:Statement>
>       <rdf:subject rdf:nodeID="subj"/>
>       <rdf:predicate rdf:resource="&eg;p1"/>
>       <rdf:object rdf:datatype="&xsd;int">10</rdf:object>
>     </rdf:Statement>
>   </rdf:li>
> </rdf:Bag>

I agree. However, I do not consider the following two
RDF/XML documents as equivalent (syntactically at least):

 <rdf:Description rdf:bagID="Reify">
   <eg:p1>10</eg:p1>
 </rdf:Description>
 
 <rdf:Description rdf:nodeID="subj">
   <eg:p1>10</eg:p1>
 </rdf:Description>
 <rdf:Bag rdf:ID="Reify">
   <rdf:li>
    <rdf:Statement>
       <rdf:subject rdf:nodeID="subj"/>
       <rdf:predicate rdf:resource="&eg;p1"/>
       <rdf:object>10</rdf:object>
     </rdf:Statement>
   </rdf:li>
 </rdf:Bag>


Though, the following would IMO be equivalent (if made legal):

 <rdf:Description rdf:bagID="Reify">
   <eg:p1>10</eg:p1>
 </rdf:Description>
 
 <rdf:Description rdf:nodeID="subj">
   <eg:p1 nodeID="x">10</eg:p1>
 </rdf:Description>
 <rdf:Bag rdf:ID="Reify">
   <rdf:li>
    <rdf:Statement>
       <rdf:subject rdf:nodeID="subj"/>
       <rdf:predicate rdf:resource="&eg;p1"/>
       <rdf:object nodeID="x">10</rdf:object>
     </rdf:Statement>
   </rdf:li>
 </rdf:Bag>

(in the case of the bagID reification, it's up to the parser
 to use the same systemID for the literal in both the
 eg:p1 statement and the rdf:object statement)
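
A rough Python sketch of what such a parser step might look like. The
function expand_bag_id, the example namespace URIs, and the "_:lit" and
"_:stmt" identifiers are hypothetical; literal nodes are modelled as
(node ID, lexical form) pairs purely for illustration. The only point is
that the literal created for a property element is reused as the object
of the corresponding rdf:object triple rather than minted afresh.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EG = "http://example.org/eg#"  # hypothetical namespace for the eg: prefix


def expand_bag_id(subject, bag, properties):
    """Expand a node element carrying rdf:bagID into triples.

    properties: list of (predicate, lexical_form) pairs for untyped literals.
    """
    triples = [(bag, RDF + "type", RDF + "Bag")]
    for n, (predicate, lex) in enumerate(properties, start=1):
        literal = (f"_:lit{n}", lex)  # one hypothetical ID per property element
        stmt = f"_:stmt{n}"
        triples += [
            (subject, predicate, literal),
            (stmt, RDF + "type", RDF + "Statement"),
            (stmt, RDF + "subject", subject),
            (stmt, RDF + "predicate", predicate),
            (stmt, RDF + "object", literal),  # same node as the statement's object
            (bag, f"{RDF}_{n}", stmt),
        ]
    return triples


triples = expand_bag_id("_:subj", "http://example.org/doc#Reify",
                        [(EG + "p1", "10")])

stated = next(o for s, p, o in triples if p == EG + "p1")
reified = next(o for s, p, o in triples if p == RDF + "object")
print(len(triples))       # 1 statement + 4 reifying + 2 for the bag
print(stated == reified)  # both triples point at the same literal node
```

With four property elements the same expansion yields the twenty-five
triples counted in the quoted proposal (four, sixteen and five).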

> And I buy Guha's point at the Bristol F2F that with untidy literal semantics
> rdf:object refers to the syntax of the triple not its semantics.

Well, I thought that was the official view. After all, a "stating"
is about the expression of the statements, not the meaning,
right? And an expression is captured in the syntax, not the MT.

Cheers,

Patrick
Received on Wednesday, 11 September 2002 07:57:56 UTC