Re: Datasets and contextual/temporal semantics from Pat Hayes on 2011-10-13 (public-rdf-wg@w3.org from October 2011)

From: Pat Hayes <phayes@ihmc.us>
Date: Thu, 13 Oct 2011 08:29:43 -0500
To: Richard Cyganiak <richard@cyganiak.de>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <38CB85A6-F664-4A30-BCA5-985E49B7DC46@ihmc.us>
On Oct 13, 2011, at 6:10 AM, Richard Cyganiak wrote:

> (I wrote this early today in the hotel – don't have wifi there and didn't have a chance to read Pat's long message yet – will be interesting to compare them!)

Indeed. I should say immediately that your notion here of "context" is not my notion of "context". (I hate that word for exactly this reason, by the way: it means something different almost every time it is used.)
> 
> 
> The big problem we face with the semantics of RDF datasets is this: People want to use RDF datasets to manage information with *different context*, such as different temporal validity. RDF Semantics is not designed to handle different contexts.

Indeed, and that was DELIBERATE. A contextual logic (in the sense you are using it) simply does not work as a Web logic. For some discussion of this point, see  http://www.ihmc.us/users/phayes/IKL/GUIDE/GUIDE.html#LogicForInt . In fact, a contextual logic does not work for ontologies in general. If the truth of an assertion depends on the context in which it is asserted, and if this context is not available when it is read, then it is USELESS. Or maybe worse than useless. 

> Many of our problems stem from that.
> 
> I'll give examples.
> 
>   :G2010 {:alice :age 29.}
>   :G2011 {:alice :age 30.}
> 
> Individually, each of those graphs is true (at a certain point in time). If taken together, an inconsistency is inferred (assuming :age is a functional property):
> 
>   :alice :age 29, 30.
> 
> By merging the two graphs, we have discarded the contextual information.

In RDF, that "contextual information" was never there in the first place. This is BAD RDF.

> This shows that the graph merge operation is *not truth-preserving* – not *valid* in the formal sense – *if* the merged graphs have different contexts.

No, it shows that they don't have contexts. Graph merging is truth preserving, precisely because RDF is *not* a contextual logic. 
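
To make this concrete, here is a minimal sketch in Python, treating a graph as a plain set of (subject, predicate, object) tuples. The helper names merge and functional_conflicts are hypothetical and belong to no RDF library, and the merge shown is only correct for ground graphs (no blank nodes). The graph names :G2010 and :G2011 appear in no triple, so nothing in the merged graph records them:

```python
# A graph is modelled as a set of (subject, predicate, object) tuples.
# The names :G2010 / :G2011 are only Python variable names here; they are
# not part of any triple, mirroring the example in the message.
G2010 = {(":alice", ":age", 29)}
G2011 = {(":alice", ":age", 30)}


def merge(*graphs):
    """Merge ground graphs (no blank nodes): plain set union of the triples."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def functional_conflicts(graph, prop):
    """Map each subject to its values for `prop` when it has more than one.

    For a functional property, any entry in the result signals an
    inconsistency (assuming the values denote distinct things).
    """
    values = {}
    for s, p, o in graph:
        if p == prop:
            values.setdefault(s, set()).add(o)
    return {s: vs for s, vs in values.items() if len(vs) > 1}


merged = merge(G2010, G2011)
conflicts = functional_conflicts(merged, ":age")
```

Each graph taken alone shows no conflict. The merged graph does, because the year was never stated in the triples themselves.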

> 
> Another example:
> 
>   :G2010 {
>     <person/2279> :worksFor <person/2279/employer>.
>     <person/2279/employer> owl:sameAs companies:431.
>   }
>   :G2011 {
>     <person/2279> :worksFor <person/2279/employer>.
>     <person/2279/employer> owl:sameAs companies:998.
>   }
> 
> Taking each graph individually, we have no reason to assume that they aren't true (at their respective times), and the modelling is perfectly sensible. But the person changed employers, so evidently <person/2279/employer> identifies two different resources in the two graphs (assuming companies:431 owl:differentFrom companies:998). Their merge would obviously be inconsistent.

Which shows that the original modelling is not, in fact, sensible. It is in fact thoroughly bad practice. If the SWeb is populated with time-sensitive data which depends upon an unstated context to be true, then it isn't going to work. You have to incorporate the "context" into the data in some way, a process which I have called 'decontextualizing'. We worked out ways of doing this (in logic, admittedly, rather than RDF) in excruciating detail in the IKRIS project a few years ago. See the above paper, later sections, for details.
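
As a rough illustration of what incorporating the "context" into the data could look like for the :age example, here is a sketch in Python (not RDF or IKL). The record identifiers and the :person, :year and :years properties are invented for this example, and this is not the IKRIS construction, just one simple way to state the year inside the triples:

```python
def age_facts(year, person, age):
    """Build triples that state the year explicitly (hypothetical vocabulary).

    Each observation gets its own record identifier, so the triples stay
    true no matter which other graphs they are merged with.
    """
    rec = f":age-record/{person.lstrip(':')}/{year}"
    return {
        (rec, ":person", person),
        (rec, ":year", year),
        (rec, ":years", age),
    }


def ages_by_person_and_year(graph):
    """Group stated ages by (person, year), read back from the records."""
    records = {}
    for s, p, o in graph:
        records.setdefault(s, {})[p] = o
    result = {}
    for rec in records.values():
        key = (rec[":person"], rec[":year"])
        result.setdefault(key, set()).add(rec[":years"])
    return result


G2010 = age_facts(2010, ":alice", 29)
G2011 = age_facts(2011, ":alice", 30)

# Ground graphs, so the merge is a plain set union.
merged = G2010 | G2011
ages = ages_by_person_and_year(merged)
```

Here the merge is harmless: every (person, year) pair has exactly one age, because the temporal qualification travels with the data instead of living in the graph name.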

> 
> These examples were temporal, but there are non-temporal examples too:
> 
>   :G1 {
>     <urn:uuid:123456789> a :Person;
>       :mbox <mailto:bob@example.com>;
>       :birthday "1984-02-05".
>   }
>   :G2 {
>     <urn:uuid:123456789> a :Person;
>       :mbox <mailto:alice@example.com>;
>       :birthday "1981-10-21".
>   }
> 
> This is a clash of identifiers, caused maybe by poor chance, or by erroneous copy-and-pasting or poor identifier management. Nevertheless, each graph on its own is reasonable and has to be considered true.

Why? They clearly aren't both true. I would say that in a case like this, neither should be considered true, since each casts doubt on the other.

> No party has more right over the authority-less URN than the other. In the one context, the URN simply *does* denote Alice, and in the other context, it simply *does* denote Bob. But the merge of these two true graphs of different context is clearly false, as the contexts are incompatible.

I don't think this is a reasonable inference even if we were using a context logic.
> 
> So, RDF Semantics does not work if we mix triples that have different temporal validity or other contextual differences.

It is predicated upon RDF *not* being contextual in this way, so yes, it does not "work" in the way you would like. I consider this to be a feature. 

> 
> That's not really much of a problem, because it just means that you have to keep triples of different context apart in separate graphs.

No. You *should* compose your data so that data can be merged. That is the entire purpose of the Semantic Web design. Without that, all of linked data is just a bunch of isolated DB table fragments in a poor notation. 


> This is intuitive enough, and people seem to have no problem understanding that (except Pat, ironically!) There is an unspoken yet intuitive assumption that the triples in one graph share the same context.

There should not be. If there is, we are all in trouble. 

> As long as contexts are kept apart, the entailments of RDF Semantics work and are useful.
> 
> Now, in the real world, applications *have* to deal with data of different contexts. Different versions, different provenance, contradictory viewpoints, honest errors in identifier use, and so on. Many practitioners working with RDF already take for granted a model that takes context into account, even though the formal semantics doesn't take it into account.

Fair enough, though I don't think these are all kinds of 'context'. Errors and falsehoods are what their names suggest, not truths in a different 'context'.

> 
> That's why named graphs were invented in the first place – to make it possible to keep contexts apart, so that statements which are true in their respective context don't get all smushed together into one inseparable inconsistent mess.

No, that is not why named graphs were invented. They were invented so that one could say things about graphs in RDF: things like who is asserting them, where they came from, etc., but not to supply a 'context' for the truth of the triples in them. That would be data, not metadata.

> Named graphs exist to *escape* the entailments of RDF Semantics in situations where they are inappropriate. Named graphs exist to enable the storage and processing of contradictory and incompatible information with RDF tools.

Wrong, wrong. They might make this possible (maybe), but this is not what they were designed for.

> 
> So here are a couple of axioms that I believe are true and that one has to understand in order to apply RDF Semantics correctly, but that are unfortunately unstated in RDF Semantics:
> 
> 1. RDF Semantics defines an entailment relationship between sets of triples, a.k.a. RDF graphs

yup

> 2. This entailment relationship is only valid if all triples share the same context

I strongly disagree. 

> 3. Therefore, placing triples with incompatible context into a single graph is not seen as something useful, and we understand RDF graphs as only containing triples of compatible context

Again, strongly disagree. In fact, this would make all graph merges invalid, which is ridiculous.

> 4. It follows that merging two graphs with incompatible contexts is not a valid operation

see above

> 5. Whether two contexts are compatible or not is outside of the scope of RDF Semantics

It had better not be. If we have contexts at all, then the semantics has to give them a semantics. 
> 
> And now, the key additions for RDF 1.1:
> 
> 6. RDF datasets provide a way of managing context by keeping triples with different context in different named graphs
> 
> I think it is absolutely essential to keep these points in mind when we're talking about the semantics of RDF datasets.

I think we must reject this idea clearly and firmly. 
> 
> If we can't extend RDF Semantics into a proper temporal and contextual logic, then we have to make the axioms above explicit, and ensure that the semantics allow RDF datasets to hold incompatible information without entailing inconsistencies.

Nonsense. This would completely destroy any hope of doing realistic inference with Web data. What mashups can possibly work, with this condition imposed?

Pat

> 
> One consequence of this is that for any entailments that arise from the structure of RDF datasets, we must be clear about the context in which they arise. For example, if we want a graph name to denote a graph container, then we have to answer the question: *in which context* should it denote the container?
> 
> Best,
> Richard
> 

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973   
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Thursday, 13 October 2011 13:30:29 UTC