Re: RDF-ISSUE-109 (ill-typed-so-what): What's the consequence of a literal being ill-typed? [RDF Concepts]

Pat,

I'm not requesting a particular change at this point. I am concerned that treating ill-typed literals as neither syntax errors nor inconsistencies imposes a complexity cost, and am asking what the motivation for this treatment is. Understanding the motivation would allow me to at least explain in Concepts why the complexity exists.

And yes, if there is no good motivation, then it ought to be changed. But a change also imposes a cost. Concerns here are:

- What is the effect on OWL and RIF?
- I take it that the treatment of literals whose datatype is not in the datatype map is unaffected?
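
To make the second question concrete, here is a rough Python sketch of how I read the partial-IL proposal quoted below. It is only an illustration under my own assumptions: the names Literal, DATATYPE_MAP, literal_status and has_valueless_literal are made up, only two XSD datatypes are modelled, and the lexical checks are simplified. A literal whose datatype is in the map but whose lexical form has no value makes its triple false; a literal whose datatype is outside the map is left alone.

```python
import re
from typing import Callable, Dict, Iterable, NamedTuple, Optional, Tuple

XSD = "http://www.w3.org/2001/XMLSchema#"


class Literal(NamedTuple):
    """A typed literal: lexical form plus datatype IRI (hypothetical helper type)."""
    lexical: str
    datatype: str


def xsd_integer_value(lexical: str) -> Optional[int]:
    # Partial lexical-to-value mapping: None stands for "IL is undefined here".
    if re.fullmatch(r"[+-]?[0-9]+", lexical):
        return int(lexical)
    return None


def xsd_boolean_value(lexical: str) -> Optional[bool]:
    return {"true": True, "1": True, "false": False, "0": False}.get(lexical)


# Assumed datatype map: datatype IRI -> partial lexical-to-value function.
DATATYPE_MAP: Dict[str, Callable[[str], Optional[object]]] = {
    XSD + "integer": xsd_integer_value,
    XSD + "boolean": xsd_boolean_value,
}


def literal_status(lit: Literal, dmap=DATATYPE_MAP) -> str:
    """Classify a literal relative to a datatype map."""
    l2v = dmap.get(lit.datatype)
    if l2v is None:
        return "unrecognized"   # outside the map: nothing is detected
    if l2v(lit.lexical) is None:
        return "ill-typed"      # in the map, but IL has no value
    return "well-typed"


Triple = Tuple[object, object, object]


def has_valueless_literal(graph: Iterable[Triple], dmap=DATATYPE_MAP) -> bool:
    """True if some object literal is ill-typed with respect to dmap.

    Under the partial-IL reading this is sufficient (not necessary) for the
    graph to be false in every interpretation.
    """
    return any(
        isinstance(o, Literal) and literal_status(o, dmap) == "ill-typed"
        for (_s, _p, o) in graph
    )
```

With this reading, a graph containing :a :b "xxx"^^xsd:integer is flagged, while a graph whose only literal is "bar"^^:custom-datatype is not, because that datatype is not in the map.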

Best,
Richard


On 13 Nov 2012, at 09:43, Pat Hayes wrote:

> Richard-
> 
> After further thought, and partly inspired by a remark you made earlier, here is a rather simpler way to do it: we say that the IL mapping can be partial, and we say that IL("sss"^^ddd) is undefined (has no value) when the literal is ill-formed, i.e., when sss is not in the lexical space of I(ddd). Then the current semantic conditions for RDF triples, if they are followed to the letter, would make the triple false. This is the same basic idea as suggested previously, but it treats failure to have a value as the trigger condition, rather than denoting a special "error" value. This is a very minor change to the RDF semantic conditions, although it does use them in a new way and probably should be pointed out in a few extra sentences of commentary.
> 
> This would be the simplest way to achieve the result you want, that any graph containing an ill-formed literal would be inconsistent. And I think it corresponds closely to your intuitions about bad literals being meaningless, right?
> 
> This is indeed do-able, so we should probably put it up to the WG for discussion as an issue. 
> 
> Pat
> 
> On Nov 13, 2012, at 12:31 AM, Pat Hayes wrote:
> 
>> 
>> On Nov 12, 2012, at 5:41 AM, Richard Cyganiak wrote:
>> 
>>> ...
>>> But that's all beside the point. My question was why the specs distinguish between the two things in the first place. Even in XSD entailment, this is consistent:
>>> 
>>> :a :b "xxx"^^xsd:integer.
>>> 
>>> But this is not:
>>> 
>>> :a :b "xxx"^^xsd:integer.
>>> :b rdfs:range rdfs:Literal.
>>> 
>>> I don't understand the point of singling out the concept of the “ill-typed literal”, declaring it to be a “non-syntactic error”, but not declaring it to be an inconsistency.
>>> 
>> 
>> You completely miss the point. Nothing is being "singled out" in the 2004 design. Inconsistency isn't something you get to "declare", it's a word with a technical meaning. It means that the graph is false in every interpretation. You get to define truth conditions on triples, and then consistency is a product of that.
>> 
>> So, you want 
>> 
>>> :a :b "xxx"^^xsd:integer.
>> 
>> to be always false in any XSD-interpretation, no matter what :a and :b are interpreted to mean, right? That is what it means to say that this triple is XSD-inconsistent. 
>> 
>> I don't think this is possible without tweaking the basic RDF semantic rules. 
>> 
>> Here's the only way I can see of doing it. We require the universe of every interpretation to contain a special semantic value called "error", and we stipulate that no relational extensions ever contain a pair <x, error>, and we also tweak the blank node conditions so that bnodes are never mapped to error. Then we say that any ill-formed literal is required to denote error. This ensures that all triples with an ill-formed literal will be false, so such graphs are always inconsistent. Note, this has to be done in the basic RDF semantics, not just in the XSD semantics, since adding datatypes to literals doesn't change what URIs like :a and :b mean.
>> 
>> We didn't do this in 2004 because the basic RDF semantic machinery was seen as being defined independently of literal datatyping, but we could revisit this decision, I guess. I really don't like this idea, but we could do it. It would amount to incorporating datatype correctness checking into the basic RDF semantic machinery, and we ought to check any consequences it might have for OWL and RIF, as it is a rather basic change to the design.
>> 
>> Pat
>> 
>> 
>> 
>>> Best,
>>> Richard
>>> 
>>> 
>>> 
>>>> 
>>>> pa
>>>> 
>>>> PS: regarding the mandatory inclusion of xsd datatypes, it was just an idea, but I won't fight for it :) It sounded to me as a more acceptable compromise toward your proposal.
>>>> 
>>>> 
>>>> On Sun, Nov 11, 2012 at 1:29 PM, Richard Cyganiak <richard@cyganiak.de> wrote:
>>>> Hi Pierre-Antoine,
>>>> 
>>>> Thanks for the response.
>>>> 
>>>> On 11 Nov 2012, at 10:59, Pierre-Antoine Champin wrote:
>>>>> I'm afraid this distinction between "inconsistent graph" and "graph containing ill-formed literals" is inevitable, in general.
>>>> 
>>>> I still don't see why.
>>>> 
>>>>> The problem is that you cannot expect all RDF-consuming agents to know about all possible datatypes.
>>>> 
>>>> Right, you can't expect all agents to know about all datatypes, and neither should you. But...
>>>> 
>>>>> Consider:
>>>>> 
>>>>> @prefix : <http://example.org/ns/> .
>>>>> :foo :prop "bar"^^:custom-datatype .
>>>>> 
>>>>> If RDF-consistency depended on the well-formed-ness of the literal,
>>>>> then a general-purpose RDF processor could simply not decide whether the above graph is consistent or not.
>>>>> This would be embarrassing...
>>>> 
>>>> Then consider this:
>>>> 
>>>> :foo :prop "bar"^^:custom-datatype .
>>>> :prop rdfs:range rdfs:Literal .
>>>> 
>>>> Is this consistent or not, under D-entailment? A general-purpose RDF processor cannot decide, because it depends on the custom datatype...
>>>> 
>>>> How is this any less embarrassing than the case you mention?
>>>> 
>>>> In fact I don't think it's embarrassing. Any entailment regime that involves datatypes requires a *datatype map* as part of its definition. For types within the datatype map, it is well-defined whether literals are ill-typed or not. For types outside the datatype map, no special equivalences or entailments hold, and no inconsistencies are detected.
>>>> 
>>>>> That being said, recommending that all RDF processors MUST or SHOULD understand the semantics of the datatypes listed in the abstract syntax documents would certainly sound like a good idea, IMHO.
>>>> 
>>>> I'm not convinced that this would be a good idea. Some RDF processors simply push triples around and don't care about the semantics of datatypes. Others only need to understand the semantics of a few (e.g., SPARQL requires understanding the semantics of *some* but not all). Also, just a few months ago we spent a lot of time making rdf:XMLLiteral optional, and I don't think we want to reverse that decision :-)
>>>> 
>>>> I think the current spec text is sufficient:
>>>> 
>>>> [[
>>>> Specifications that conform to RDF may impose additional constraints on the datatype map, for example, require support for certain datatypes.
>>>> ]]
>>>> http://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#datatype-maps
>>>> 
>>>> Best,
>>>> Richard
>>>> 
>>>> 
>>>>> 
>>>>> pa
>>>>> 
>>>>> 
>>>>> On Fri, Nov 9, 2012 at 10:11 AM, RDF Working Group Issue Tracker <sysbot+tracker@w3.org> wrote:
>>>>> RDF-ISSUE-109 (ill-typed-so-what): What's the consequence of a literal being ill-typed? [RDF Concepts]
>>>>> 
>>>>> http://www.w3.org/2011/rdf-wg/track/issues/109
>>>>> 
>>>>> Raised by: Richard Cyganiak
>>>>> On product: RDF Concepts
>>>>> 
>>>>> (Raising an issue on this for referencing in the upcoming new Concepts WD)
>>>>> 
>>>>> What's the relevance of the distinction between “graphs containing ill-typed literals” and “inconsistent graphs” in the Semantics?
>>>>> 
>>>>> The text stresses that the presence of an ill-typed literal does not constitute an inconsistency. But why does the distinction matter? Is there any reason anybody needs to know about this distinction who isn't interested in the arcana of the model theory?
>>>>> 
>>>>> From the perspective of someone who authors RDF data, or works with RDF data, they both seem to belong to the same class of problem, and I'm a bit at a loss as to how to explain the difference.
>>>>> 
>>>>> What should an implementation do? Should authors avoid generating such graphs? Should consumers reject them? Is an implementation that rejects ill-formed xsd:dates conforming?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> 
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
>> 
>> 
>> 
>> 
>> 
>> 
>> 
> 

Received on Tuesday, 13 November 2012 12:17:02 UTC