Re: RDF-ISSUE-79 (undefined-datatype): What is the value of a literal whose datatype IRI is not a datatype? [RDF Concepts] from Pat Hayes on 2011-11-20 (public-rdf-wg@w3.org from November 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Sun, 20 Nov 2011 15:55:28 -0600
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>, RDF Working Group Issue Tracker <sysbot+tracker@w3.org>
Message-Id: <9C3A19FB-692B-4B38-A079-19A00D1E42AC@ihmc.us>
On Nov 20, 2011, at 2:16 PM, Richard Cyganiak wrote:

> Hi Pat,
> 
> On 18 Nov 2011, at 16:55, Pat Hayes wrote:
>>> The RDF Concepts spec (in both 2004 and 1.1 versions) does not answer the question what's the value of a literal where the datatype IRI doesn't actually denote a datatype, like <"foo",http://example.com/not-a-datatype>. This is surprising, as there is a section that normatively defines the value of *all other* literals.
>> 
>> I don't find it surprising, and I think you have slightly mischaracterized it.
> 
> I'm not criticizing the design. I'm criticizing the fact that RDF Concepts doesn't say anything about what happens in this case.
> 
>> A typed literal only has a fixed meaning relative to an actual datatype. So, to fix the meaning, you have to invoke a datatype denoted by the datatype URI. If this is not available, then the literal's value is not determined, and it becomes in effect something like an unknown URI.
> 
> Right – and that's exactly what I expected RDF Concepts should say. At the moment it says *nothing*, so users and implementers have to guess (or read RDF Semantics).

Ah, OK, fair enough. 

> 
>>> There are many possibilities:
>>> 
>>> (i) the spec leaves it undefined
>>> (ii) that's not a valid RDF graph
>>> (iii) it's a valid RDF graph, but the value, if any, is unknown
>>> (iv) it's a valid RDF graph, and the literal is ill-typed
>>> 
>>> This should be made explicit.
>>> 
>>> The status quo is (i). I believe that the model theory says it's (iii).
>> 
>> Yes, it is (iii) at the moment, if by "valid" you mean syntactically correct. (See below.) However, the semantics does (rather vaguely) talk about the possibility of having datatypes "declared" in a graph (see end of section 5.1, http://www.w3.org/TR/rdf-mt/#DTYPEINTERP ):
>> 
>> "If every recognized URI reference in a graph is the name of a known datatype, then there is a natural datatype map DG which pairs each recognized URI reference to that known datatype (and 'rdf:XMLLiteral' to rdf:XMLLiteral). Any rdfs-interpretation I of that graph then has a corresponding 'natural' DG-interpretation which is like I except that I(aaa) is the appropriate datatype and the class extension of rdfs:Datatype is modified appropriately. Applications MAY require that RDF graphs be interpreted by D-interpretations where D contains a natural datatype map of the graph. This amounts to treating datatyping triples as 'declarations' of datatypes by the graph, and making the fourth semantic condition into an 'iff' condition. Note however that a datatyping triple does not in itself provide the information necessary to check that a graph satisfies the other datatype semantic conditions, and it does not formally rule out other interpretations, so that adopting this requirement as a formal entailment principle would violate the general monotonicity lemma described in section 6, below."
> 
> I'm sorry but I can't make sense of this paragraph. And I've been trying, honestly. What's a “recognized URI”? What's a “known datatype”?
> 
> (I'd like to flag this paragraph for editorial attention – I can make sense of most of the Semantics document if I try hard enough, but this part beats me.)

OK, flag noted. I obviously need to express this idea better (or remove it altogether.)

> 
>> You can never know that a literal is ill-typed unless you have the datatype to check that it is, so (iv) can't ever be right.
> 
> Well, I know that this is the intended design, but given that “ill-typed” is never precisely defined anywhere,

Hmm, I thought it was. Section 5.1 in semantics: 

 "The condition also requires that an *ill-typed* literal, where the literal string is not in the lexical space of the datatype, ..."

I guess I should have stated it as a formal definition in the glossary, my bad.

> it's not unreasonable for a reader to start with the working theory that literals with a datatype IRI that isn't known to denote a datatype are also considered ill-typed.

Not unreasonable, but still wrong. The problem is that when we don't know what the datatype is, the value of the literal really could be anything. In particular, it could be a perfectly good literal value. So we can't in this case impose the 'not in LV' condition on the value: we just don't know enough to tell. When we do have a datatype and a string which the datatype mapping knows is not allowed, then we can impose the not-in-LV condition, but this is a much more informed state to be in than not even knowing the datatype.
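
To make the three cases concrete, here is a minimal sketch in Python. The names classify, DATATYPE_MAP and parse_integer are made up for illustration and do not come from any RDF spec or library, and the integer check is only a simplification of the xsd:integer lexical space. The point is just that "ill-typed" can only be reported when a datatype is actually available to do the checking.

```python
import re

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

def parse_integer(lexical):
    """Hypothetical lexical-to-value mapping for xsd:integer (simplified)."""
    # Strings outside the lexical space are rejected with ValueError.
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(lexical)
    return int(lexical)

# Hypothetical datatype map: datatype IRI -> lexical-to-value mapping.
DATATYPE_MAP = {XSD_INTEGER: parse_integer}

def classify(lexical, datatype_iri, datatype_map=DATATYPE_MAP):
    """Return (status, value) for a typed literal.

    "value"     : datatype is in the map and the string is in its lexical space
    "ill-typed" : datatype is in the map and the string is NOT in its lexical space
    "unknown"   : datatype IRI is not in the map, so nothing can be concluded
    """
    l2v = datatype_map.get(datatype_iri)
    if l2v is None:
        # No datatype to check against: the value could be anything,
        # including a perfectly good literal value.
        return ("unknown", None)
    try:
        return ("value", l2v(lexical))
    except ValueError:
        return ("ill-typed", None)
```

With this sketch, classify("foo", XSD_INTEGER) reports "ill-typed", while classify("foo", "http://example.com/not-a-datatype") reports "unknown". The second result is a statement about what the checker knows, not about the literal itself.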

> 
> In fact, I believe that's what the OWL2 RDF-based semantics seems to assume. They treat both cases – lexical form not in the L2V map, and datatype IRI not in the datatype map – in the same way: the literal denotes something outside of rdfs:Literal. 

Ah, then that sounds like a bug to me (or at any rate it would make OWL2 non-monotonic.) I will check this out in more detail.

> See text quoted here:
> http://lists.w3.org/Archives/Public/public-rdf-wg/2011Nov/0126.html
> 
> This may be a bug in OWL2 and may be worth raising as an erratum with the OWL WG. (But someone else do this please – model theory is not my territory.)

OK I will pick this up. 

> 
>> (BTW, this phrase "valid RDF graph" seems to be blurring its meaning.
> 
> You're right – I meant “valid” in the sense of “conforming to the definition of RDF graph”. (cf. “valid HTML”, “RDF Validator”)

Right, that's what I thought. Problem with this is, an invalid (non-valid?) graph isn't an RDF graph at all, so saying 'valid' doesn't add anything, but it *sounds* as though it does, which is potentially confusing.

> 
> On 18 Nov 2011, at 17:07, Pat Hayes wrote:
>> BTW, it's very odd to say that something "is unknown". If this means "we don't know what its value is (yet)", then of course any missing information makes something unknown. But it is tempting to treat "unknown" as a classification, like being human, so that once something is in the unknown category then it is **known** to be 'unknown'. And if we do that, then the logic becomes nonmonotonic and many other semantic assumptions break. I think you guys might be using the word in different ways (?). I'm assuming Richard is using it in the second way.
> 
> Well, I'll admit that I'm struggling with this notion of something being “unknown”. Surely, in a deterministic system, you either know something or you don't know it – and if it's the latter, then you *know* that it's unknown.

What I meant was, if unknown means 'not known', then you might come to know it later. Its being not known is labile, because new information might come along. But if "unknown" is a classification or a kind of value, then once something is "unknown" it has to stay "unknown". Coming to know it later is then actually a kind of contradiction. (This issue comes up acutely in considering three-valued logics, BTW, which I once studied in more depth than I care to remember, a long time ago.)
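
A small sketch of the difference, again in Python with made-up names (known_facts, facts_with_unknown_category) and a toy datatype map, so treat it as an illustration under those assumptions rather than anything taken from the specs. Under the 'not known' reading, enlarging the datatype map only ever adds conclusions. If "unknown" is recorded as a conclusion in its own right, enlarging the map forces a retraction, which is exactly the nonmonotonicity.

```python
def known_facts(literal, datatype_map):
    """'Not known' reading: say nothing when the datatype is missing."""
    lexical, iri = literal
    facts = set()
    if iri in datatype_map:
        facts.add(("value", datatype_map[iri](lexical)))
    return facts

def facts_with_unknown_category(literal, datatype_map):
    """'Unknown as a classification' reading: assert unknown-ness as a fact."""
    facts = known_facts(literal, datatype_map)
    if not facts:
        facts.add(("category", "unknown"))
    return facts

lit = ("7", "http://example.com/dt")           # hypothetical literal
small_map = {}                                  # datatype not yet available
large_map = {"http://example.com/dt": int}      # toy mapping, learned later

# Monotonic: everything concluded with less information still holds with more.
monotonic = known_facts(lit, small_map) <= known_facts(lit, large_map)

# Nonmonotonic: the earlier conclusion ("category", "unknown") must be withdrawn.
nonmonotonic = not (
    facts_with_unknown_category(lit, small_map)
    <= facts_with_unknown_category(lit, large_map)
)
```

Both monotonic and nonmonotonic come out True here: the first reading survives new information, the second does not.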

> 
> Now I'm tempted to write something like this in RDF Concepts:
> 
> [[
> If the literal's datatype IRI is not in the datatype map, then the literal value is undefined.
> ]]
> 
> “Undefined” seems to be the right term to use here: The spec does not say anything about what the value is, but neither does it stop anyone from defining the value (e.g., in a semantic extension).

As long as nobody thinks that "undefined" means "does not have a value at all". I think "unknown" might actually be better, maybe. Right now Semantics says this: "Typed literals whose type is not in the datatype map of the interpretation are treated as before, i.e. as denoting some unknown thing. "  

It's not easy to say this stuff in a way which is compact, easy to read and also reasonably proof against misunderstandings.

Pat

> 
> Best,
> Richard
> 
>> In the semantics document, validity refers to truth in interpretations: being invalid means that a graph is false in every interpretation, ie it cannot be satisfied. It does not mean syntactically illegal. Validity in this sense requires an inference engine to check, not a parser. I know that "valid" has many meanings, but just wanted to make sure we don't start talking past one another, or at least be aware of it when we do, cf. this thread.)
>> 
>> Pat
>> 
>> ------------------------------------------------------------
>> IHMC                                     (850)434 8903 or (650)494 3973   
>> 40 South Alcaniz St.           (850)202 4416   office
>> Pensacola                            (850)202 4440   fax
>> FL 32502                              (850)291 0667   mobile
>> phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Sunday, 20 November 2011 21:56:09 UTC