Problems with the RDF Semantics document (was: Re: ISSUE-30: How does SPARQL's notion of RDF dataset relate our notion of multiple graphs?) from Richard Cyganiak on 2011-04-18 (public-rdf-wg@w3.org from April 2011)

From: Richard Cyganiak <richard@cyganiak.de>
Date: Mon, 18 Apr 2011 23:40:41 +0100
To: Ivan Herman <ivan@w3.org>, Dan Brickley <danbri@danbri.org>
Cc: RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <1CC11DE4-B7F8-40F1-94B0-9FA8BFB223F5@cyganiak.de>
Ivan, Dan,

This got long, sorry about that. Summary: I like inference and would love to see it presented in the RDF recommendation set in a way that meets the needs of users and implementers. The RDF Semantics document does not meet those needs. This is because it puts the cart (model theory) before the horse (data), and is written for the wrong audience.

On 17 Apr 2011, at 08:10, Ivan Herman wrote:
>> This part of SPARQL is successful and useful despite being disconnected from the RDF Model Theory. RDF Datasets as they are defined in SPARQL have no impact on entailments, and therefore do not require a relation to the RDF Model Theory.
> 
> Strictly speaking, this statement is gradually getting overhauled. SPARQL 1.1 Entailment regimes:

Strictly speaking, that's not quite true.

SPARQL Entailment Regimes define entailments over *single* RDF graphs. The document states that there is no interaction between triples in different graphs in the dataset for the purpose of entailment. Hence, the fact that a graph is part of an RDF Dataset does not have any effect on entailment.
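To make the point concrete, here is a minimal sketch of what "no interaction between graphs" means in practice. It is only an illustration and not how any SPARQL engine is implemented: the dataset is assumed to be a plain dict from hypothetical graph names to sets of (subject, predicate, object) tuples, and the only rule applied is the subproperty rule (rdfs7 in RDF Semantics). All names (`rdfs7_closure`, `entail_dataset`, the `ex:` terms) are made up for the example.

```python
# Illustrative only: graph names, ex: terms and function names are assumptions.
SUBPROP = "rdfs:subPropertyOf"

def rdfs7_closure(graph):
    """Close a single graph under rdfs7: (p subPropertyOf q), (s p o) => (s q o)."""
    closed = set(graph)
    changed = True
    while changed:
        changed = False
        # Subproperty statements visible *inside this graph only*.
        sub = {(s, o) for (s, p, o) in closed if p == SUBPROP}
        for (s, p, o) in list(closed):
            for (p1, q) in sub:
                if p == p1 and (s, q, o) not in closed:
                    closed.add((s, q, o))
                    changed = True
    return closed

def entail_dataset(dataset):
    """Apply entailment graph by graph; no triple crosses a graph boundary."""
    return {name: rdfs7_closure(graph) for name, graph in dataset.items()}

dataset = {
    # g1 holds the schema statement, g2 holds the data: they never meet.
    "g1": {("ex:hasMother", SUBPROP, "ex:hasParent")},
    "g2": {("ex:alice", "ex:hasMother", "ex:carol")},
    # g3 holds both in one graph, so the inference goes through.
    "g3": {("ex:hasMother", SUBPROP, "ex:hasParent"),
           ("ex:bob", "ex:hasMother", "ex:dana")},
}
result = entail_dataset(dataset)
```

In this sketch `result["g2"]` gains nothing, because the subproperty statement lives in a different graph, while `result["g3"]` gains the `ex:hasParent` triple for `ex:bob`. Being in the same dataset as `g1` changes nothing for `g2`.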

On 17 Apr 2011, at 09:07, Dan Brickley wrote:
>> My understanding is that the RDF Model Theory exists to define which inferences are valid, given an RDF graph. What other purpose does it serve?
> 
> It helps us understand the kinds of transformations on RDF graph data
> that are truth-preserving, also the kinds that change the meaning of
> the graph such that the derived graph says something different about
> the world. You can think of those in terms of entailments I guess;

Yes, those are about entailment. My question was about *what other purpose* the model theory serves, besides defining the valid entailments.

> the mistake is to think "I'm not writing a rule engine, so I can ignore that mathsy spec".

Why, Dan, is that a mistake? Why should someone who is not writing a rule engine (or data that heavily relies on these rules, like complex vocabulary mappings) have to read the mathsy spec?

On 17 Apr 2011, at 09:14, Ivan Herman wrote:
>>> My understanding is that the RDF Model Theory exists to define which inferences are valid, given an RDF graph. What other purpose does it serve?
> 
> And... let us not use 'inference' as some sort of a dirty word.

I didn't. The dirty word in my email was "model theory".

> The RDF Semantics will tell me such trivial-looking things as the meaning of rdfs:subPropertyOf. If one looks at the RDF(S) entailment rules in the document, they all are, in fact, fairly trivial, but a specification must specify those somewhere.

Absolutely! The entailment rules are, in fact, my favourite part of the RDF Semantics document. I find them easy to grasp, and key to understanding RDF. It is a shame that something so useful is hidden in a non-normative chapter near the end of the document, wedged in between “Monotonicity of semantic extensions” and “Appendix A: Proofs of Lemmas”. Keep in mind that we are talking about a document that requires deep study if read by someone who does not have a background in logic. I'm sure that many a reader has given up after a few pages for lack of patience or education. This document really doesn't do a great job of communicating the actionable parts in a manner that is easy to digest for users and implementers.
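To show how little machinery the rules need, here is a minimal sketch of four of the RDFS entailment rules (rdfs5, rdfs7, rdfs9 and rdfs11, as they are numbered in RDF Semantics) as naive forward chaining. It assumes triples are plain Python tuples of prefixed-name strings; the `ex:` vocabulary and the function names are invented for illustration, and the remaining rules (domain, range, blank-node handling and so on) are left out.

```python
# Illustrative sketch: covers only rdfs5, rdfs7, rdfs9 and rdfs11.
SUBPROP = "rdfs:subPropertyOf"
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

def step(triples):
    """Run one round of the four rules and return only the new triples."""
    new = set()
    for (a, p, b) in triples:
        for (c, q, d) in triples:
            if p == SUBPROP and q == SUBPROP and b == c:
                new.add((a, SUBPROP, d))    # rdfs5: subPropertyOf is transitive
            if p == SUBPROP and q == a:
                new.add((c, b, d))          # rdfs7: copy triple to super-property
            if p == SUBCLASS and q == TYPE and d == a:
                new.add((c, TYPE, b))       # rdfs9: instances belong to superclass
            if p == SUBCLASS and q == SUBCLASS and b == c:
                new.add((a, SUBCLASS, d))   # rdfs11: subClassOf is transitive
    return new - triples

def closure(triples):
    """Repeat rounds until nothing new is derived (a fixpoint)."""
    closed = set(triples)
    while True:
        new = step(closed)
        if not new:
            return closed
        closed |= new

graph = {
    ("ex:hasMother", SUBPROP, "ex:hasParent"),
    ("ex:hasParent", SUBPROP, "ex:hasAncestor"),
    ("ex:alice", "ex:hasMother", "ex:carol"),
    ("ex:Woman", SUBCLASS, "ex:Person"),
    ("ex:carol", TYPE, "ex:Woman"),
}
inferred = closure(graph)
```

With this input, `inferred` additionally contains, among others, `("ex:alice", "ex:hasAncestor", "ex:carol")` and `("ex:carol", "rdf:type", "ex:Person")`. The nested loop is quadratic per round, which is fine for a demonstration but not for real data.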

On 17 Apr 2011, at 09:57, Dan Brickley wrote:
> Information expressed in RDF is horribly sparse, unpredictably
> shaped, and draws on an awkward patchwork of evolving schemas defined
> by parties who don't talk to each other as often or as carefully as
> they might. A bit of inference here and there is one of several
> techniques that can be used to smooth over those data gaps,

Yes -- and it would be *great* if there was a document that addresses exactly this!

> and I despair when I read practical RDF enthusiasts talking as if such
> techniques are akin to attempting to create full AI.

The part of the RDF recommendation set that explains these techniques is RDF Semantics. Did you read Sections 0.1 or 1.1 of that document recently [1][2]? Can you fault a practical RDF enthusiast who works through this text, and then decides that he or she is not going to bother figuring out what exactly the difference between RDF entailment and full AI is? Don't you think that might be part of the problem?

(Side note: I have met more than one Description Logics enthusiast who speaks as if these things *are* just an iota away from achieving full AI. But that's another topic.)

Look at what's going on in the other branch of this thread -- an absolute authority on the subject insists that the purpose of RDF Semantics is *not* to define those rules for smoothing over the gaps in that awkward patchwork of data. He says that's merely a minor side effect of the document's true purpose, which is to give *meaning* to the entire awkward patchwork, including those parts that are not connected to inference at all. Well, there's a lot of maths there, I understand the mechanics of it, but I don't see the part where it produces *meaning*, and I don't see how this weird notion of *meaning* is relevant to the world-wide distributed software system we are trying to build here. This is probably because of my lack of education (or imagination?). Still, there is this extreme disconnect between what is claimed to be the very foundation of RDF, and what I can see going on in practice every single day.

And let me repeat, I'm in perfect agreement with you regarding the importance of inference to the Semantic Web project. But inference should serve data, and model theory should serve inference. My problem is with the assumption embodied in RDF Semantics that data is *nothing* without model theory, and everything in the RDF stack exists in order to make logical assertions that constrain possible worlds. That's backwards.

> Indulging in
> them-and-us-ism that contrasts practical, developer-friendly linked
> data RDF against pie-in-the-sky ivory tower inference-obsessed
> eggheaded academics

These characterisations are yours, not mine. Regarding the pie-in-the-sky ivory tower inference-obsessed eggheads, I spend my days in a building with plenty of them; see [3] for the kind of stuff we do when we get bored.

> won't help us get our job done any faster, or help
> us appreciate what everyone has to offer here.

Well, at this time I'm still more concerned about getting the job *right*. I appreciate happy users, interoperable systems, test cases, and running code ...

Let me summarise: I believe that RDF Semantics, *as a document*, *in the way it presents its content*, is impractical, fails to meet the needs of users and implementers, perpetuates a misunderstanding of the purpose of RDF (caused by its history, but visible in retrospect), and altogether sets the bar too high for no good reason.

What do I think should be done about all this? I don't know. Perhaps very little *can* be done, given the constraints we operate under. But a first step might be to talk about the role of the various documents in relation to each other, and about their target audiences.

Best,
Richard


[1] http://www.w3.org/TR/rdf-mt/#intro
[2] http://www.w3.org/TR/rdf-mt/#technote
[3] http://pedantic-web.org/fops.html#reasoning
Received on Monday, 18 April 2011 22:48:00 UTC