LC2 Responses/SPARQL1

From OWL

To: public-rdf-dawg@w3.org
CC: public-owl-comments@w3.org
Subject: [LC response] To SPARQL WG

Dear SPARQL WG,

Thank you for your comment
     <http://lists.w3.org/Archives/Public/public-owl-comments/2009May/0009.html>
on the OWL 2 Web Ontology Language last call drafts.

We fully appreciate your comments and concerns. There might, however, be a misunderstanding of certain assumptions. This misunderstanding is in large part due to our poor presentation of the material in the first working draft. Because of that, please allow us to "wipe the slate clean" and start all over.

In short, rdf:text should do nothing more than provide an alternative way of referring to plain literals. Please note that, even without rdf:text, this situation already arises between typed xs:string literals and plain literals without a language tag. Therefore, let us for the moment take rdf:text completely out of the picture and focus on the relationship between typed xs:string literals and plain literals without a language tag. Consider the following example:

(1) "Peter Griffin"
(2) "Peter Griffin"@en
(3) "Peter Griffin"^^xs:string

Here, (1) is a plain literal without a language tag, (2) is a plain literal with a language tag, and (3) is a typed xs:string literal. The relationship between (1) and (3) can be described as follows:

(a) These are two syntactically *distinct* literals. That is, at the syntactic level, (1) is different from (3).
(b) In an rdf- or rdfs-interpretation, (1) and (3) can be, but need not be, interpreted as the same object.
(c) In a D-interpretation, these two literals are always interpreted as the string "Peter Griffin".

SPARQL already handles this case quite well. Semantically, the entailment regime says whether (b) or (c) applies for BGP matching; hence, there should be no confusion there. For the DATATYPE(), LANG(), and STR() functions, we should bear in mind that these operate on the *syntactic* form of the literal, rather than on the semantic data value assigned to the literal. Due to (a), there should be no confusion: the definitions are precise.

Now let us add rdf:text into the mix. We believe that it should be a datatype just like any other. The only thing that rdf:text does is provide an additional way of identifying plain literals *in a D-interpretation*. This should cause no confusion. Let us consider the following example:

(4) "Peter Griffin@"^^rdf:text
(5) "Peter Griffin@en"^^rdf:text

The addition of this datatype does not change the behavior of SPARQL tools in any essential way. Here is what actually changes:

(d) In a D-interpretation, (4) is mapped to the same value as (1) and (3); furthermore, (5) is mapped to the same value as (2). This, however, affects only implementations that use D-interpretations for BGP matching. Such implementations will clearly need to know something about rdf:text if they want to use D-interpretations; however, rdf:text is no different from, say, xs:integer. If, on the other hand, a SPARQL implementation uses rdf- or rdfs-interpretations for BGP matching, then (4) and (5) should be "opaque" to it, just as any other unknown datatype would be.
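The mapping described in (d) can be sketched as follows. This is a minimal, hypothetical model rather than any real RDF library: a literal is a (lexical form, language tag, datatype IRI) record, and `d_value` assumes a datatype map containing only xs:string and rdf:text, splitting an rdf:text lexical form at its last "@" into the string and the (possibly empty) language tag.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


@dataclass(frozen=True)
class Literal:
    """Hypothetical syntactic model of an RDF literal."""
    lexical: str
    lang: Optional[str] = None      # set only for plain literals with a language tag
    datatype: Optional[str] = None  # set only for typed literals


Value = Union[str, Tuple[str, str]]


def d_value(lit: Literal) -> Value:
    """Value of `lit` in a D-interpretation whose datatype map (an assumption
    of this sketch) contains just xs:string and rdf:text."""
    if lit.datatype is None:
        # Plain literal: a bare string, or a (string, language tag) pair.
        return (lit.lexical, lit.lang) if lit.lang else lit.lexical
    if lit.datatype == XS_STRING:
        return lit.lexical
    if lit.datatype == RDF_TEXT:
        # "text@" or "text@lang": split at the last "@".
        text, sep, lang = lit.lexical.rpartition("@")
        if not sep:
            raise ValueError(f"ill-formed rdf:text lexical form: {lit.lexical!r}")
        return (text, lang) if lang else text
    raise ValueError(f"datatype not in this sketch's datatype map: {lit.datatype}")


lit1 = Literal("Peter Griffin")
lit2 = Literal("Peter Griffin", lang="en")
lit3 = Literal("Peter Griffin", datatype=XS_STRING)
lit4 = Literal("Peter Griffin@", datatype=RDF_TEXT)
lit5 = Literal("Peter Griffin@en", datatype=RDF_TEXT)

# The five literals are pairwise distinct terms...
assert len({lit1, lit2, lit3, lit4, lit5}) == 5
# ...yet (1), (3), (4) share one value, and (2), (5) share another.
assert d_value(lit1) == d_value(lit3) == d_value(lit4) == "Peter Griffin"
assert d_value(lit2) == d_value(lit5) == ("Peter Griffin", "en")
```

An implementation that matches BGPs under rdf- or rdfs-interpretations would never compute anything like `d_value`; it compares the terms themselves, which is why (4) and (5) stay opaque to it.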

Furthermore, (4)--(5) are syntactically (pairwise) different from (1)--(3). Because of that, there should be no confusion with the definitions of DATATYPE(), LANG(), and STR():

(i) DATATYPE("Peter Griffin@"^^rdf:text) == DATATYPE("Peter Griffin@en"^^rdf:text) == rdf:text

(ii) The value of DATATYPE() on (1)--(3) is unchanged from what it was before rdf:text. (Since rdf:text is no different from any other datatype, we do not see why this should change.)

(iii) LANG("Peter Griffin@"^^rdf:text) == LANG("Peter Griffin@en"^^rdf:text) == LANG("xyz"^^a:foo) == "", since none of these literals carries a language tag. Again, rdf:text is a datatype no different from a:foo; hence, there is nothing special here.

(iv) STR("Peter Griffin@"^^rdf:text) == "Peter Griffin@", STR("Peter Griffin@en"^^rdf:text) == "Peter Griffin@en", and STR("xyz"^^a:foo) == "xyz". Again, there is no difference between rdf:text and any other datatype, such as a:foo.
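Statements (i) to (iv) can be checked against a small sketch of the three accessors that reads only the syntactic form of a literal. The `Literal` class, the exception, and the IRI standing in for a:foo are hypothetical. The LANG behaviour follows the SPARQL 2008 definition, which returns the empty string for any literal without a language tag.

```python
from dataclasses import dataclass
from typing import Optional

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"
A_FOO = "http://example.org/a#foo"  # hypothetical IRI standing in for a:foo


class SparqlTypeError(Exception):
    """Raised where SPARQL would report a type error."""


@dataclass(frozen=True)
class Literal:
    """Hypothetical syntactic model of an RDF literal."""
    lexical: str
    lang: Optional[str] = None      # plain literals with a language tag only
    datatype: Optional[str] = None  # typed literals only


def sparql_datatype(lit: Literal) -> str:
    if lit.datatype is not None:
        return lit.datatype  # any typed literal, rdf:text included
    if lit.lang:
        raise SparqlTypeError("a literal with a language tag has no datatype")
    return XS_STRING  # simple literal


def sparql_lang(lit: Literal) -> str:
    return lit.lang or ""  # "" whenever there is no language tag


def sparql_str(lit: Literal) -> str:
    return lit.lexical  # the lexical form, never the parsed value


t4 = Literal("Peter Griffin@", datatype=RDF_TEXT)
t5 = Literal("Peter Griffin@en", datatype=RDF_TEXT)
foo = Literal("xyz", datatype=A_FOO)

assert sparql_datatype(t4) == sparql_datatype(t5) == RDF_TEXT        # (i)
assert sparql_datatype(Literal("Peter Griffin")) == XS_STRING        # (ii)
assert sparql_lang(t4) == sparql_lang(t5) == sparql_lang(foo) == ""  # (iii)
assert sparql_str(t4) == "Peter Griffin@"                            # (iv)
assert sparql_str(t5) == "Peter Griffin@en" and sparql_str(foo) == "xyz"
```

Nothing in these functions mentions rdf:text specifically, which is the point of (i) to (iv).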


In summary, it seems to us that the following statements are true of rdf:text:

- rdf:text is a datatype like any other (e.g., a:foo or xs:integer).
- rdf:text does not change any aspect of RDF or SPARQL. In particular, the plain literals (with or without a language tag) are as they were before and are untouched by the addition of rdf:text. Furthermore, the typed literals of rdf:text are syntactically distinct from the plain literals of RDF.
- If you use rdf- or rdfs-interpretations for BGP matching, then rdf:text literals are opaque.
- If you use D-interpretations for BGP matching, then you clearly need to know something about the proper semantics of rdf:text literals.


Since rdf:text is no different from any other datatype, we actually believe that we should omit the normalization rules in Section 4. That is, if someone puts an rdf:text literal into an RDF graph, then the graph should simply contain that literal and should not change it during graph exchange. This is similar to someone putting an a:foo literal in the graph: it may well be that this literal is equivalent in a D-interpretation to some other literal, but, since the two literals are not syntactically equivalent, nobody should expect one to be "spontaneously" changed into the other.


To summarize, we would like to address your comments by performing the following changes to the document:

- We would remove any mention of graph exchange and of normalization of rdf:text literals to plain literals.
- We would change the text to make sure that, whenever we talk about equivalence between rdf:text literals and plain literals, we make it clear that we mean equivalence in a D-interpretation. Thus, there will be no unqualified statements of the form "rdf:text literals are equivalent to plain literals"; instead, we would always talk about D-equivalence.
- We would make it clear that rdf:text literals are no different from any other literal. Therefore, no existing RDF or SPARQL feature is affected by the addition of rdf:text. We would not go further into commenting on the details of DATATYPE(), LANG(), and STR(): these should be clear from our definitions.

We hope that this dispels any misunderstandings regarding the role of rdf:text. Please let us know if you agree; if so, we will change the document and send you the diff.

Please acknowledge receipt of this email to <mailto:public-owl-comments@w3.org> (replying to this email should suffice). In your acknowledgment please let us know whether or not you are satisfied with the working group's response to your comment.

Regards,
Boris Motik
on behalf of the W3C OWL Working Group






Hello OWL and RIF working groups,

The SPARQL WG has reviewed the rdf:text Last Call document on our mailing list[1], in a teleconference [2], and today at our face-to-face meeting [3].

The group resolved to send the following comments. At this time, we do not have proposed spec text to resolve these comments, but would be glad to consult on possibilities.

The comment is at http://www.w3.org/2009/sparql/wiki/index.php?title=Rdf_text_LC_WG_comment&oldid=758 and is reproduced here for your convenience.

Summary

SPARQL queries act on the graph, not on the serialized form. Thus, we suggest that the editors state the interactions with SPARQL with respect to:

   1. the restriction that rdf:text must not appear in RDF graphs, which should be extended such that rdf:text MUST NOT appear in SPARQL XML results. This extends the existing coverage of RDF graph exchange to include SPARQL results from SELECT, in the same way that CONSTRUCT and DESCRIBE queries are already covered.

   2. the use of "semantic equivalence", which should be clarified; it should be noted that rdf:text concerns D-entailment and is accessed by SPARQL via a BGP entailment regime extension.

   3. the functions STR/DATATYPE/LANG, which act on the lexical representations and will be affected depending on the way an rdf:text-aware entailment regime manifests its results.

In addition, it should be noted that rdf:text affects the assumption in RDF that a literal has a datatype or a language tag but not both. Existing, deployed code relies on this invariant.

Overview

There are some SPARQL-specific issues that are not addressed in the document. The rdf:text document refers to "graph exchange" only when saying that rdf:text must not appear in serializations of RDF graphs, but that does not apply to SPARQL directly.

Because the rdf:text document says nothing about SPARQL operations, it is not clear whether changes to existing SPARQL queries are being assumed. At one time, they were.

Since SPARQL is defined over simple entailment, NOT datatype entailment, the notion of "semantic equivalence" (mentioned but not defined in the rdf:text document) does not make sense and this spec appears to require changes to SPARQL behaviour. This would be undesirable since it affects:

1. SPARQL Query Result XML Format

2. Interactions with simple entailment matching of BGPs, and extension of SPARQL via BGPs.

3. Effects on DATATYPE, LANG and STR

Note: In RDF, a literal has either a language tag or a datatype but not both. rdf:text changes this assumption so deployed code or SPARQL implementations that rely on this invariant may break.

We believe that these concerns can be remedied if rdf:text talks specifically about D-entailment rather than about "semantic equivalence" in general, and thus does not affect simple entailment as well.

SPARQL XML Results Format

This is not "graph exchange", so the prohibition on using rdf:text in a serialization does not apply. It could be applied, but that might not help systems that do want to see rdf:text literals, for example, SPARQL/OWL2.

The problem here, again, is that the semantic implications of rdf:text are not forward-compatible with existing RDF. This concern would be remedied by defining the semantic implications of rdf:text in terms of D-entailment only, as suggested above. In fact, we think that this fix makes the restrictions on the usage of rdf:text in RDF graphs redundant.

Datatype Property

What happens if a datatype property is restricted to rdf:text? What does the RDF serialization look like? Does it include rdf:text?

BGP matching

The SPARQL standard defines SPARQL with respect to simple entailment and provides a mechanism for extension to other entailment regimes. See the section "12.6 Extending SPARQL Basic Graph Matching".

Since SPARQL is defined over simple entailment, NOT datatype entailment, the notion of "semantic equivalence" (mentioned but not defined in the rdf:text document) does not make sense. SPARQL is not acting on the serialization of an RDF graph. It acts on the value space of literals.

Simple entailment does not cover the RDF-MT entailments xsd1a and xsd1b, which are the rules for plain literals without language tag being the same value as XSD strings. So these are not required of a SPARQL processor using simple entailment.

Additional semantic equivalences implied by rdf:text should affect only D-entailment (where rdf:text is part of the datatype map D, following [1]) and not simple entailment. Thus, the document should not talk about "semantic equivalence" in general terms, but only in terms of D-entailment. This should fix the main problem raised and would affect only SPARQL engines that follow a (yet to be defined) rdf:text-aware entailment regime.
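The difference between simple entailment and a D-aware regime can be illustrated with a deliberately tiny matcher. This is a sketch under strong assumptions, not the SPARQL matching algorithm: a pattern is a single ground triple whose object is a literal, and the hypothetical `d_key` knows only the xsd1a/xsd1b identification of a plain literal without a language tag with the xs:string literal of the same lexical form.

```python
from typing import Callable, Optional, Set, Tuple

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Hypothetical term model: a literal is (lexical form, language tag, datatype IRI).
Lit = Tuple[str, Optional[str], Optional[str]]
Triple = Tuple[str, str, Lit]


def simple_key(lit: Lit) -> Lit:
    """Simple entailment: a literal matches only the identical term."""
    return lit


def d_key(lit: Lit) -> Lit:
    """Identify "sss"^^xs:string with the plain literal "sss" (xsd1a/xsd1b)."""
    lexical, _lang, datatype = lit
    return (lexical, None, None) if datatype == XS_STRING else lit


def matches(graph: Set[Triple], s: str, p: str, o: Lit,
            key: Callable[[Lit], Lit]) -> bool:
    """Does the one-triple pattern (s, p, o) match some triple in graph?"""
    return any(gs == s and gp == p and key(go) == key(o)
               for gs, gp, go in graph)


graph = {("ex:peter", "ex:name", ("Peter Griffin", None, XS_STRING))}
query_obj = ("Peter Griffin", None, None)  # plain literal written in the query

assert not matches(graph, "ex:peter", "ex:name", query_obj, simple_key)
assert matches(graph, "ex:peter", "ex:name", query_obj, d_key)
```

In this model, an rdf:text-aware regime would extend only `d_key` (for instance, identifying "sss@"^^rdf:text with "sss"); `simple_key`, and hence simple-entailment matching, would stay as it is.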

We suggest explicitly noting that a SPARQL query accesses rdf:text-aware entailment regimes via the extension mechanism.

Effects on DATATYPE, LANG and STR

Note that this SPARQL WG should maintain compatibility with SPARQL as published in January 2008.

These functions are accessors to the components of a literal term. Different ways of manifesting a value from BGP matching will lead to different results from these functions.

For these examples, the serialized form using rdf:text is used, although in an RDF graph the literal exists as a value and rdf:text does not appear when the graph is serialized. The examples relate to a variable bound to such a value and show how the literal accessor functions of SPARQL (DATATYPE, LANG and STR) can be impacted.
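To make the concern concrete, the sketch below binds a variable to two manifestations of what a D-interpretation treats as the same value, "Padre de familia"@es and "Padre de familia@es"^^rdf:text, and applies the SPARQL 2008 accessors to each. The `Literal` class and the `accessors` helper are hypothetical; the string "error" stands for a SPARQL type error.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


@dataclass(frozen=True)
class Literal:
    """Hypothetical syntactic model of an RDF literal."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def accessors(lit: Literal) -> Tuple[str, str, str]:
    """(DATATYPE, LANG, STR) as defined in SPARQL 2008; "error" marks a type error."""
    if lit.datatype is not None:
        dt = lit.datatype
    elif lit.lang:
        dt = "error"  # a literal with a language tag has no datatype
    else:
        dt = XS_STRING  # simple literal
    return dt, lit.lang or "", lit.lexical


plain = Literal("Padre de familia", lang="es")
typed = Literal("Padre de familia@es", datatype=RDF_TEXT)

assert accessors(plain) == ("error", "es", "Padre de familia")
assert accessors(typed) == (RDF_TEXT, "", "Padre de familia@es")
```

All three accessors disagree between the two bindings, so the results depend on which manifestation an entailment regime hands to the algebra.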

The rdf:text document does define some functions on rdf:text literals.

DATATYPE is defined so that the type of a plain literal without a language tag is xs:string. There is no datatype for a literal with a language tag.

SPARQL has the concept of a "simple literal" for a plain literal without a language tag.

These functions are applied as part of the algebra, not as part of BGP matching; the entailment extension mechanism does not modify these functions. There may be different entailment regimes, possibly on different graphs, in the same query.

DATATYPE

DATATYPE of a literal with language tag

SPARQL/2008:

 DATATYPE ("Padre de familia"@es) ==> error

When a literal is bound to a variable and subsequently used in a call to DATATYPE, what return value is expected? Is it true that a different result is obtained if the literal is instead presented as below?

 DATATYPE("Padre de familia@es"^^rdf:text) ==> rdf:text

Similarly:

SPARQL/2008 defines:

 DATATYPE ("Padre de familia") ==> xs:string

but what is:

 DATATYPE ("Padre de familia") ==> rdf:text ?? xs:string ??

because one value space is a subset of the other.

The reason for rdf:text is the uniform treatment of literals, so the query to find all the untyped literals ("untyped" meaning, as per the current SPARQL REC, without a type: a simple literal or a literal with a language tag) might be changed.

LANG

In RDF, a literal has either a language tag or a datatype but not both. So:

SPARQL/2008:

 Lang("Padre de familia"@es) ==> "es"

but

 Lang("Padre de familia@es"^^rdf:text) ==> ""

rdf:text:

 Lang("Padre de familia@es"^^rdf:text) ==> ??

cf.:

 rtfn:lang-from-text("Padre de familia@es"^^rdf:text) ==> "es"

STR

rdf:text is a datatype whose lexical space includes the language tag.

SPARQL/2008 defines:

 STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia@es"
 STR("Padre de familia"@es) ==> "Padre de familia"

rdf:text:

 STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia" ??

because STR returns the lexical form.

The lexical space of literals with language tags is changed by rdf:text.

FILTERs

SPARQL FILTERs evaluate to an effective boolean value (defined in XQuery "2.4.3 Effective Boolean Value" and referenced by SPARQL "11.2.2 Effective Boolean Value (EBV)").

The EBV of a string is false if the string has length zero, and true otherwise.

Do any rdf:text literals have an EBV of false?
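One way to frame the question is to sketch the string case of the SPARQL 2008 EBV rule. Only that case and the fallback for other typed literals (a type error) are modelled; the boolean and numeric rules are omitted. Treating rdf:text as an ordinary datatype that the engine does not understand is an assumption of this sketch, in line with the OWL WG's view that rdf:text is "a datatype like any other". An rdf:text-aware regime might answer differently.

```python
from dataclasses import dataclass
from typing import Optional

XS_STRING = "http://www.w3.org/2001/XMLSchema#string"
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


class SparqlTypeError(Exception):
    """Raised where SPARQL would report a type error."""


@dataclass(frozen=True)
class Literal:
    """Hypothetical syntactic model of an RDF literal."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def ebv(lit: Literal) -> bool:
    """String case of SPARQL 2008 EBV (boolean and numeric cases omitted)."""
    if lit.datatype is None or lit.datatype == XS_STRING:
        # Plain literal (with or without a language tag) or xs:string literal.
        return len(lit.lexical) > 0
    # Every other datatype in this sketch, rdf:text included, has no EBV.
    raise SparqlTypeError(f"no EBV for datatype {lit.datatype}")


assert ebv(Literal("")) is False
assert ebv(Literal("", datatype=XS_STRING)) is False
assert ebv(Literal("Padre de familia", lang="es")) is True
```

Under these assumptions, ebv(Literal("@", datatype=RDF_TEXT)) raises a type error rather than returning false, even though its D-value is the empty string.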


Intra-spec Compatibility

IRIs vs. URIs

"This specification uses Uniform Resource Identifiers (URIs) for naming datatypes and their components" indicates that identifiers in RDF are URIs, whereas SPARQL Query interprets them as IRIs. Using URIs would imply that

 <X> <p> <http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D> .

would be matched by the SPARQL graph pattern

 <X> <p> <http://伝言.example/?user=أكرم&channel=R&D> .
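The correspondence between the two spellings can be reproduced with Python's standard library: the host name goes through the built-in `idna` codec (IDNA 2003) and non-ASCII characters elsewhere are percent-encoded as UTF-8. This is a simplified sketch of an IRI-to-URI mapping, assuming an authority with no user info or port, and it uses only the `user` part of the example query string; `iri_to_uri` is a hypothetical helper, not a library function.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Printable ASCII (including "%") is left untouched; only other characters are escaped.
_ASCII_SAFE = "".join(chr(i) for i in range(0x21, 0x7F))


def iri_to_uri(iri: str) -> str:
    """Map an IRI to a URI: IDNA for the host, UTF-8 percent-encoding elsewhere."""
    parts = urlsplit(iri)
    # Assumption: the authority is a bare host name (no user info, no port).
    host = parts.netloc.encode("idna").decode("ascii")
    path = quote(parts.path, safe=_ASCII_SAFE)
    query = quote(parts.query, safe=_ASCII_SAFE)
    fragment = quote(parts.fragment, safe=_ASCII_SAFE)
    return urlunsplit((parts.scheme, host, path, query, fragment))


iri = "http://伝言.example/?user=أكرم"
uri = "http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85"

assert iri != uri               # different character strings...
assert iri_to_uri(iri) == uri   # ...that a URI-based reading would identify
```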

References

1. http://www.w3.org/TR/rdf-mt/#dtype_interp

2. http://www.w3.org/TR/rdf-sparql-query/#sparqlBGPExtend

3. http://lists.w3.org/Archives/Public/public-rdf-text/2008OctDec/0036.html


Lee on behalf of the SPARQL WG

[1] http://lists.w3.org/Archives/Public/public-rdf-dawg/2009AprJun/0107.html
[2] http://www.w3.org/2009/sparql/meeting/2009-04-28#rdf__3a_text
[3] raw IRC log: http://www.w3.org/2009/05/06-sparql-irc