Re: ISSUE-12: xs:string VS plain literals: proposed resolution

On May 6, 2011, at 9:18 PM, Eric Prud'hommeaux wrote:

> * Pat Hayes <phayes@ihmc.us> [2011-05-06 18:26-0500]
>> 
>> On May 6, 2011, at 12:11 PM, Andy Seaborne wrote:
>> 
>>> It was Sandro who introduced SPARQL into the thread.  I don't agree that it's a "grave mistake" in SPARQL.  Treatment should be uniform whether using SPARQL or some other way of accessing the data (SPARQL engines are often written over a base API anyway).
>>> 
>>> The proposed text is:
>>> """
>>> Recommend that data publishers use plain literals instead of xs:string typed literals and tell systems to silently convert xs:string literals to plain literals without language tag
>>> """
>>> 
>>> This is an RDF-as-data view; this is not D-entailment.
>> 
>> But my understanding is that the main (only?) reason for this suggestion is to make RDF data more accessible to SPARQL querying, because at present a query has to be couched in both forms in order to find both kinds of literal. If there is any other reason for this suggestion (which runs directly counter to all the thinking and discussion and advice that has so far been published on this topic since 2004) then I would like to see it spelled out in detail. And we should actively request input from OWL 2 and RIF representatives before making this recommendation. 
>> 
>> If this is the primary reason for this suggestion, then my point is that this effect - of having one query find both kinds of literal as answers - can be achieved by SPARQL using {xsd:string}-entailment rather than simple entailment. And, further, that if this is the only reason for this suggestion, then this is business for the SPARQL WG to consider rather than us. I do not believe that it is appropriate for us to recommend that people write their RDF graphs in a certain way, unless we have very strong reasons for this and can articulate them clearly (and then also explain why we did not alter RDF to make this suggestion mandatory, if the reasons are so strong.) 
> 
> SPARQL happens to use graph equivalence to establish a viable set of variable bindings for a graph pattern, but I don't expect that any of us think it will be the last tool to use graph equivalence. What's important is that RDF provide the core property of equivalence so that SPARQL, OWL 7, Revenge of RIF 2, etc. all work with the same model (otherwise implementing something which e.g. executes SPARQL queries over closures of OWL and RIF inference will be unpleasantly fuzzy). SPARQL happens to play the canary as it's the easiest way for us to test precise graph equivalence.

OK, but (to make the same point in a slightly larger context), RDF already does provide this notion of equivalence. It is called {xsd:string}-entailment, and it is fully and thoroughly documented in the existing RDF specs. Two graphs are equivalent in the required sense when they {xsd:string}-entail each other. 
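
As a rough illustration only, here is a minimal Python sketch of that equivalence check for ground graphs. It is not taken from the RDF specs or from any toolkit: the Literal class and the value_key, denotation, and equivalent helpers are hypothetical names introduced here. It compares only literal values, and it ignores blank nodes and the RDFS-level inferences that full D-entailment also includes, so it should be read as a sufficient test rather than a complete one.

```python
from dataclasses import dataclass
from typing import Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


@dataclass(frozen=True)
class Literal:
    """Hypothetical literal term: lexical form plus optional datatype or language tag."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def value_key(term):
    """Map a term to a key standing for what it denotes when D = {xsd:string}."""
    if isinstance(term, Literal):
        # A plain literal without a language tag and an xsd:string typed
        # literal with the same lexical form denote the same character string.
        if term.lang is None and term.datatype in (None, XSD_STRING):
            return ("string", term.lexical)
        # Everything else is compared syntactically in this sketch.
        return ("literal", term.lexical, term.datatype, term.lang)
    return ("iri", term)


def denotation(graph):
    """Replace every term in every triple by its value key."""
    return {tuple(value_key(t) for t in triple) for triple in graph}


def equivalent(g1, g2):
    """Assumed stand-in for mutual {xsd:string}-entailment of ground graphs."""
    return denotation(g1) == denotation(g2)


plain = {(":x", ":p", Literal("foo"))}
typed = {(":x", ":p", Literal("foo", datatype=XSD_STRING))}
tagged = {(":x", ":p", Literal("foo", lang="en"))}

print(plain == typed)             # False: syntactically different graphs
print(equivalent(plain, typed))   # True: same values
print(equivalent(plain, tagged))  # False: a language-tagged literal is a different value
```

Note that neither graph is modified here: the comparison is made on values, and the two syntactic forms stay distinct in the data.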

(BTW, SPARQL isn't going to work properly over mere closures in any case, even for OWL 1, let alone OWL 7 :-)

>>> It is not necessarily a change to SPARQL query, which has to work with old and new data.
>>> 
>>> :x :p "foo" .
>>> :x :p "foo"^^xsd:string .
>>> 
>>> One triple or two? The proposal says (ideally) one.
>> 
>> Actually, the proposal as written does not say this. This is definitely two literals. The proposal would rewrite this graph to one with a single literal, but it would not be the same graph. 
> 
> I think we're best off retroactively saying that every graph that looks like this has only one triple.

We can say this all we want, but saying it does not make it true. Right now, it is false. Those are two triples. If you want this to be one triple, you need to explain how to rewrite RDF Concepts to make it come out that way. Good luck.
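
To spell out the counting, here is a small Python sketch under stated assumptions. RDF Concepts compares literals term by term (lexical form, language tag, datatype URI), which the hypothetical Literal tuple below imitates; normalize and rewrite are illustrative names for the rewriting the proposal describes, not functions from any real library.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical literal term, compared field by field like RDF literal equality."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def normalize(term):
    """Assumed form of the proposed rewrite: xsd:string literal -> plain literal."""
    if isinstance(term, Literal) and term.datatype == XSD_STRING:
        return Literal(term.lexical)
    return term


def rewrite(graph):
    """Apply normalize to every term; the result is a new set of triples."""
    return {tuple(normalize(t) for t in triple) for triple in graph}


# A graph is a set of triples, so syntactically distinct literals give distinct triples.
graph = {
    (":x", ":p", Literal("foo")),
    (":x", ":p", Literal("foo", XSD_STRING)),
}
rewritten = rewrite(graph)

print(len(graph))          # 2
print(len(rewritten))      # 1
print(rewritten == graph)  # False: the rewrite produces a different graph
```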

Pat

> Telling the world that "abc"^^xsd:string is a deprecated form of "abc" (and systems are encouraged to normalize) is probably the best balance between simplification and disruption.
> 
> 
> 
>> Pat
>> 
>>> 
>>> [[
>>> The strongest I can find in the RDF docs is in Concepts: sec 6.5.2 as a note.
>>> ]]
>>> 
>>> 	Andy
>>> 
>>> On 06/05/11 15:35, Pat Hayes wrote:
>>>> 
>>>> On May 6, 2011, at 9:09 AM, Andy Seaborne wrote:
>>>> 
>>>>> See
>>>>> 
>>>>> http://www.w3.org/TR/sparql11-entailment/#id35808654
>>>> 
>>>> OK, so how many SPARQL engines support D-entailment? How do they indicate to the world which form of D-entailment they use (ie what D is, exactly) ?
>>>> 
>>>> Why not include xsd:string into the basic SPARQL entailment regime? It wouldn't be difficult to make this change in the specs' wording, though the test cases would need some revision.
>>>> 
>>>> BTW, if the answer is, it would screw up existing implementations, then this is also an argument against RDF making any changes.
>>>> 
>>>> Pat
>>>> 
>>>>> 
>>>>> which depends on RDFS entailment
>>>>> 
>>>>> 	Andy
>>>>> 
>>>>> On 06/05/11 14:57, Pat Hayes wrote:
>>>>>> This discussion illustrates in a nutshell the essential tension at the core of SPARQL. Should a query be 'semantic', entirely about meanings, or should it be basically a process of syntactic matching? If one believes the semantic position, then it is natural to express the basic process in terms of entailment (the graph entails the query instance) and natural to treat semantically equivalent things as indistinguishable. However, it is also natural to not have such things as answer counts, no-match filters, and most of the actual apparatus of SPARQL, since none of this is *entailed* by the query graph, indeed by any graph at all. All of this is essentially syntactic information *about* the graph. Which is why I slowly came to the realization that to even talk about entailment in the context of querying is wrong. Querying is not a semantic operation, it is about the syntactic form of the graph.
>>>>>> 
>>>>>> OK, we can always talk about simple entailment to make us feel warm and fuzzy, but simple entailment is so simple that it amounts to a syntactic match anyway. But consider the following resolution of this meaningless issue: SPARQL should use {xsd:string}-entailment rather than simple entailment. (That is, D-entailment where D is {xsd:string}.) This will give exactly the behavior Sandro wants, and the required ideas and definitions have been in the RDF spec since 2004. So why are we, the RDF WG, even discussing this at all? We have already given SPARQL enough room in the RDF specs to do it properly.
>>>>>> 
>>>>>> Now, this resolution will not fly, I predict, because SPARQL does not want to get into any richer kind of entailment than simple entailment, but wants RDF to make things work out nicely even while it is doing simple syntactic matching. Because simple *syntactic* matching is the only kind of matching that is fine-grained enough to satisfy people who want to write filters on query results.
>>>>>> 
>>>>>> Pat
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On May 6, 2011, at 7:32 AM, Andy Seaborne wrote:
>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> On 06/05/11 13:16, David Wood wrote:
>>>>>>>> On May 6, 2011, at 7:44, Sandro Hawke <sandro@w3.org> wrote:
>>>>>>>> 
>>>>>>>>> On Fri, 2011-05-06 at 09:33 +0100, Andy Seaborne wrote:
>>>>>>>>>> 
>>>>>>>>>> I wonder if most people would be happy if we emphasised that it's the
>>>>>>>>>> value that matters.  xsd:string and simple literal have the same value,
>>>>>>>>>> as do 00123 and +123.
>>>>>>>>> 
>>>>>>>>> I guess it depends what you mean by 'emphasise'...
>>>>>>>>> 
>>>>>>>>> I was shocked to discover SPARQL cared about the difference, and thought
>>>>>>>>> it was a grave mistake at the time (but I didn't notice until it was too
>>>>>>>>> late).  I had assumed everyone already knew you should just care about
>>>>>>>>> the value, and that every API should convert for you, hiding the
>>>>>>>>> difference.  But I was wrong, and I don't really know how to get people
>>>>>>>>> to use the "Semantic Web" technologies at a "semantic" level.
>>>>>>>> 
>>>>>>>> +1. Of course, it would help if we standardized it that way :)
>>>>>>> 
>>>>>>> And better
>>>>>>> "if we *had* standardized it that way"  :-)
>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> Dave
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>   -- Sandro
>>>>>>> 
>>>>>>> There are a couple of factors that matter here:
>>>>>>> 
>>>>>>> 1/ Users expect what goes in to be the same as what comes out.
>>>>>>> (tools do as well sometimes)
>>>>>>> 
>>>>>>> If they read in
>>>>>>> 
>>>>>>> :x :p "foo"^^xsd:string .
>>>>>>> 
>>>>>>> and get back:
>>>>>>> 
>>>>>>> :x :p "foo" .
>>>>>>> 
>>>>>>> enough of them are surprised (=>   they send email to support lists asking about it).
>>>>>>> 
>>>>>>> 2/ SPARQL FILTERs don't care - it's graph matching that does because graph matching is simple entailment.  And that's what most toolkits provide - the direct manipulation of the RDF terms, lexical form, datatype and all.
>>>>>>> 
>>>>>>> :x :p "foo" .
>>>>>>> :x :p "foo"^^xsd:string .
>>>>>>> 
>>>>>>> One triple or two?
>>>>>>> 
>>>>>>> 	Andy
>>>>>>> 
>>>>>>> (For the record : "foo"^^xsd:string matches "foo" in a Jena memory model -- there would be two triples.)
>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>>> ------------------------------------------------------------
>>>>>> IHMC                                     (850)434 8903 or (650)494 3973
>>>>>> 40 South Alcaniz St.           (850)202 4416   office
>>>>>> Pensacola                            (850)202 4440   fax
>>>>>> FL 32502                              (850)291 0667   mobile
>>>>>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> 
>> 
> 
> -- 
> -ericP
> 
> 


Received on Saturday, 7 May 2011 04:39:36 UTC