Re: Comments on Last-Call Working Draft of RDF 1.1 Semantics from Michael Schneider on 2013-12-07 (public-rdf-comments@w3.org from December 2013)

From: Michael Schneider <schneid@fzi.de>
Date: Sat, 7 Dec 2013 01:07:58 +0100
To: Pat Hayes <phayes@ihmc.us>
CC: Guus Schreiber <guus.schreiber@vu.nl>, "public-rdf-comments@w3.org Comments" <public-rdf-comments@w3.org>
Message-ID: <52A266DE.4000808@fzi.de>

Dear Pat, Dear Working Group,

We had settled on treating ISSUE-165 during the CfI phase,
and I wanted to first create my implementation report
and find an opportunity to get more into the details of
the draft of the semantics before replying
to the WG answer. Here is my answer now.

Before I come to replying to the particular WG answers,
I want to bring up another issue that I have found
only during the CfI phase. In my original LCWD comment,
I had only swiftly checked the precise changes concerning
datatypes; my main argument was more against the change
of the nomenclature and formal representation from a
datatype map to a set of recognized IRIs. Now, after a
more in-depth check, I have to say that I now also have
technical problems with this change.

Let's assume we have a semantic extension of D-RDFS,
called "D-X", with several datatype IRIs in D:

     D := { xsd:string, xsd:integer, ... }

In the RDF 2004 spec, the analogous entailment regime
would have been defined w.r.t. a datatype map D, which
would be a set of /pairs/ (u,d), where u is an IRI and d
a datatype. In our case, these datatypes d would be
somehow represented as references to the corresponding
sections in the XSD Datatypes spec, telling the
characteristic aspects of these datatypes, including
their lexical spaces, value spaces, and the mapping
from literals to values.

In the RDF 2004 spec, both the datatype IRIs and their
associated datatypes would be fixed for D-X. So for
any D-X interpretation I, the denotation of u, I(u),
equals d. In contrast, in RDF 1.1, D would contain
the IRI u instead of the pair (u,d), and, as D is a
set of recognized IRIs, I would know that for any D-X
interpretation I, there exists /some/ datatype d
with I(u) = d. However, I would /not/ know what the
datatype d is, except perhaps from additional information
given in the handbook for D-X, but by means that are
outside the RDF specification.
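
As an illustration only, the two representations can be sketched in
Python as follows. The Datatype class and all variable names are
hypothetical and appear in neither specification, and the
lexical-to-value mappings are heavily simplified. The sketch merely
shows that a set of pairs carries the datatype along with the IRI,
while a plain set of IRIs does not.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

XSD = "http://www.w3.org/2001/XMLSchema#"


@dataclass(frozen=True)
class Datatype:
    """Hypothetical stand-in for a datatype: a name plus a
    (simplified) lexical-to-value mapping."""
    name: str
    l2v: Callable[[str], object]


# RDF 2004 style: a datatype map is a set of pairs (u, d),
# modelled here as a dict from IRI to datatype.
datatype_map_2004: Dict[str, Datatype] = {
    XSD + "string": Datatype("string", str),
    XSD + "integer": Datatype("integer", int),
}

# RDF 1.1 style: only the IRIs themselves are given.
recognized_iris_11: FrozenSet[str] = frozenset(datatype_map_2004)


def denotation_2004(u: str) -> Datatype:
    # I(u) = d can be read off the map itself.
    return datatype_map_2004[u]
```

With datatype_map_2004, the datatype for an IRI is a simple lookup.
With recognized_iris_11 alone, nothing in the data structure says
which datatype an IRI denotes; that association has to come from
somewhere else.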

Concerning entailments, the way I originally read
the new draft was that, for a given semantic extension D-X,
it is possible for a datatype IRI u in D to have different
denotations (i.e. datatypes) under different
D-X-interpretations I1 and I2, and, in fact, the actual
datatype would be completely unspecified in this reading.
This would then cancel out most datatype-related entailments
compared to RDF 2004, in which for any pair (u,d) in a
datatype map D of D-X, the denotation of u under any
D-X-interpretation I would always be defined to be the same
datatype, namely I(u) = d.
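
The effect on entailments under this reading can be illustrated with
a small Python sketch. It is a toy model under stated assumptions,
not the model theory of either spec: an "interpretation" is reduced
to a dict from a datatype IRI to a lexical-to-value function, and
the helper same_value_in_all is a hypothetical name. It checks
whether two literals with the same datatype IRI denote the same
value under every interpretation considered.

```python
from typing import Callable, Dict, List

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Toy interpretation: datatype IRI -> lexical-to-value mapping.
Interpretation = Dict[str, Callable[[str], object]]


def same_value_in_all(interps: List[Interpretation],
                      iri: str, lex1: str, lex2: str) -> bool:
    """True if both lexical forms denote the same value under
    every interpretation in the list."""
    return all(i[iri](lex1) == i[iri](lex2) for i in interps)


# 2004 reading: I(u) = d is fixed, so every interpretation uses
# the integer mapping for xsd:integer.
fixed = [{XSD_INTEGER: int}, {XSD_INTEGER: int}]

# Unconstrained reading: a second interpretation assigns some other
# datatype to the same IRI (here, arbitrarily, a string-like one).
free = [{XSD_INTEGER: int}, {XSD_INTEGER: str}]

print(same_value_in_all(fixed, XSD_INTEGER, "1", "01"))  # True
print(same_value_in_all(free, XSD_INTEGER, "1", "01"))   # False
```

In the first case the identity of "1" and "01" as integers holds in
all interpretations; in the second it fails as soon as one
interpretation is free to pick a different datatype.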

I am sure that such a reading is not what the WG intends,
but the only sentence I could find about what
might have been intended is in Chapter 7:

"""
We assume that a recognized IRI identifies
a unique datatype wherever it occurs, and the semantics
requires that it refers to this identified datatype.
"""

Now, this is an extremely vague and confusing sentence,
and I still have no idea whether I understand it. With
regard to what is uniqueness meant here? What is meant by
"the semantics requires" something? The sentence should
probably simply be dropped. But then, nothing else is
being said about the datatypes associated to the
"recognized" IRIs, and this would then, of course,
bring back my destructive reading above.

So, in my original reading, by replacing datatype maps
with sets of recognized IRIs, half of the required
information has been lost, or at least, the explicit
support by the specification has been removed.
It is clear that from a simple set of IRIs alone,
there is no way to know what the IRI denotes, and
thus what the expected semantics of an RDF graph
with literals is meant to be under the D-X semantics.
Consequently, the documentation of D-X would have
to come up with some custom means of saying which
datatypes the IRIs in D denote. But then, there /would/ be
the pairs of IRIs and datatypes again, essentially
at least, just in a way unsupported by the spec.
I don't believe that it was really the intention
of the WG to support such a source of confusion.


So much for the new point. Now to the particular
WG answers (quoted by >), where I will come back to
this and my original argument again.


 > Regarding ISSUE-165, this matter was debated
 > extensively within the WG, and most of your
 > points were made during this discussion.
 > (see http://lists.w3.org/Archives/Public/public-rdf-wg/2013Jun/0085.html
 > and subsequent threads.)

First of all, I do not see in the cited mail
exchange any discussion about my original argument
that at least three other core Semantic Web
standards, namely SPARQL 1.1, OWL 2, and RIF,
are reusing the original definition of RDF
datatype maps, and that interoperability with these
standards will thus be directly affected. If you
make the change in the RDF spec, then the current
versions of these other specs will be bound to the
old version of the RDF standard and will be formally
incompatible with the current one.

Even if the revised definition of datatype maps
is intended to "mean basically the same thing",
the other specifications will still be incompatible
with the new definition in a strictly technical sense:
They use a different formal representation and a
different nomenclature for the associations
of IRIs and their denoted datatypes, and so
one will always have to explain the translation
between the two formalisms. And when the time
comes for new revisions of these other specs,
these other WGs will have to decide whether to
follow the new approach or to stick with the old
one. From the point of view of the whole Semantic Web, the
first option is of course what should be done,
so by applying this change in the
RDF spec, the RDF WG essentially forces the other
specifications into the same change as well.
Hence, the RDF WG bears a high responsibility
here and should make a change only when there is
clear motivation for it, and when it can be
foreseen that the change will be easily
accepted by future WGs of the other specs.
Neither do I see any clear motivation for
the change, nor would I expect that such
future WGs will easily accept this change.

However, I can see that my new technical point
given above had, in its essence, already been
brought up by Antoine Zimmermann in the first
point of his review cited above. As far as I
was able to follow the heated discussion there,
it goes pretty much in circles, and is more of
a series of attempts to convince the other party
of their preferences, including BIG LETTERS,
after which Antoine eventually gave in.
So this is not so much what I would normally
think of being an "extensive WG discussion".

Anyway, what I can see as the essence of this
discussion is that you consider the change to be
semantically compatible with the old version,
and that it is meant to be only a small change.
Even if I accept this (which would require me
to have a different reading of the draft than
the one I give above), it is still the case that
you change the formal representation underlying
datatype semantics from a set of pairs of IRIs
and datatypes into a set of IRIs and some
additional text indicating the understanding of
the association between these IRIs and their
denoted datatypes.

I do not consider this to be a small thing at all!
To me, this is comparable to changing the syntax of,
say, the assignment construct of a programming
language, from the widely used "reference=value"
style into something where you just declare the
reference, and require that these references get
their value somehow, by a means which is outside
the language spec. You may argue that you can
still write exactly the same kinds of programs
with the revised language, which may really be
the case, but at the price that any existing
software written in this language will not
compile anymore under the revised version,
any existing compiler needs to be rewritten,
same for any textbook on that language,
and all professional programmers have to
learn the new construct, wasting some of their
precious productive time. And after all,
the change would be widely considered
completely unnecessary, because the old construct
worked perfectly well and was in wide use,
while the new one may even lead to confusion.

Back to the change in RDF, if you really think
that the semantic consequences are the same and
that it is a minor change, then why the change
at all? In particular, given that such a change
will break formal compatibility with other existing
Semantic Web standards for no added value?


 > The primary reason for the change was to simplify
 > the presentation of the RDF semantics, which was
 > an overarching goal of the WG.

The primary goal of any W3C WG should be to comply
with the WG charter, which, in the case of the RDF WG,
explicitly states that "changing the fundamentals
of the RDF Semantics" is out of scope for the WG
(Chapter 3). The scope of the RDF WG, according to
the charter, was "to extend RDF to include some of
the features that the community has identified as
both desirable and important for interoperability
based on experience with the 2004 version of the
standard, but without having a negative effect on
existing deployment efforts." Now see what you are
about to do here: You want to change a basic formal
aspect of the original RDF standard, which will
break interoperability with several other core
Semantic Web standards!

But let's talk about your argument of simplification.
I do not agree that this change counts as a
considerable simplification at all, rather the
opposite. I originally expected that the semantic
conditions of datatype semantics, which really have
always been particularly easy to understand, would
have changed as well. But, as I found, they are
still essentially the same (modulo adjustments
to the new notion of recognized IRIs). So what
you really only change here is to make the
original datatype map, which was a set of pairs
consisting of an IRI and a datatype, into a
set of IRIs with some additional text telling
that the IRIs have to denote their corresponding
datatypes somehow. So you have changed something
that is represented in a very standard way and
perfectly clear to understand into something that
is certainly not clearer, and to me, as I stated
above, even confusing.

In any case, such a kind of change certainly does
not justify a deviation from what has been used
by several other Semantic Web standards.


 > The actual mathematics has not altered, as the
 > 2004 semantics required D-interpretation mappings
 > to conform to the datatype map, so the datatype map
 > is simply a part of (a restriction of)
 > the interpretation mapping itself.

Even if I agreed that the current draft can be
read this way, it is still the case that the formal
representation has changed, which breaks interoperability
with existing Semantic Web standards.
And again, if there is really hardly a change,
why do we need the change at all?


 > Once this is recognized, it is clearly simpler to
 > treat it in this way rather than as a separate mapping.

It should be clear by now that I disagree with this view.
The original way was perfectly clear to me,
while the new one is at least confusing to me.
But, apart from personal preferences, even if it
really is a simplification, then the simplification
would be much too small to justify breaking
interoperability with existing standards.


 > In addition, it had been noted by several commentors
 > that the 2004 definitions allowed for 'pathological'
 > D mappings, such as one which permutes the meanings
 > of the XSD datatype IRIs. It was felt that
 > disallowing such maps was a laudable by-product
 > of the change.

Now, this argument surprises me, and there are two answers
to this.

Firstly, the problem cannot be that big, given the fact
that in the roughly 10 years since the original RDF standard
at least three other core SW standards have been written
which reuse the original notion of datatype maps without
problems, each taking years of specification work and building
up considerable experience with these things. This provides
strong evidence to me that things are sufficiently fine
with datatype maps.

As for myself, I have been responsible
for editing one of these specifications (the OWL 2
RDF-Based Semantics), which makes heavy use of the
original definitions for datatypes and datatype maps.
I have provided technical advice to the editors
of SPARQL Entailment Regimes and RIF RDF&OWL
Compatibility, among other things with regard to
datatype-related semantics. I have created several large test
suites, which are partially about datatype semantics. I
have created many formal proofs based on the datatype
semantics of RDF. I have spent some time thinking about
the implementation of datatype semantics in the past, although
I have not yet implemented it in my RDF Semantics reasoner.
And overall I have been working in the RDF field full-time
continuously for the last 8 years up to this day.
But in all these years with all this gained experience
concerning RDF Semantics in general and RDF datatype
semantics in particular, I have never encountered any
serious problems with the original notion of datatype maps.
Rather, I have always found the original datatype
semantics well designed and it allowed me to do my work
decently. I would never have come to the conclusion that
anything would require a change, in particular not a
change of the kind proposed in RDF 1.1. For me, the old
saying holds that "If it ain't broke, don't fix it!"

Secondly, whatever these unnamed commenters had in mind,
let me say that no change of the semantics whatsoever
will save us from people doing strange or silly things
with datatypes, if they want to. I can easily, for
example by applying owl:sameAs to two
value-space-incompatible datatype IRIs, do all kinds
of crazy things in the 2004 spec as well as in the
new draft. So the "pathological" argument is most
probably moot.
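
For concreteness, the kind of permuted map that the WG answer calls
"pathological" might look like the following Python sketch. All
names are hypothetical, the datatype map is reduced to a dict from
IRI to a lexical-to-value function, and the xsd:boolean mapping is
a simplified assumption. The sketch only shows that such a map is
still a perfectly ordinary set of pairs over the same IRIs; it makes
no claim about how either spec treats it.

```python
from typing import Callable, Dict

XSD = "http://www.w3.org/2001/XMLSchema#"


def to_boolean(lexical: str) -> bool:
    # Simplified lexical-to-value mapping for xsd:boolean
    # (assumption: only these four lexical forms are handled).
    return {"true": True, "1": True, "false": False, "0": False}[lexical]


# The expected pairing of IRIs and mappings.
standard_map: Dict[str, Callable[[str], object]] = {
    XSD + "integer": int,
    XSD + "boolean": to_boolean,
}

# A permuted map: each IRI is paired with the other IRI's mapping.
permuted_map: Dict[str, Callable[[str], object]] = {
    XSD + "integer": standard_map[XSD + "boolean"],
    XSD + "boolean": standard_map[XSD + "integer"],
}

# Same IRIs, same datatypes overall, different association.
print(set(permuted_map) == set(standard_map))  # True
print(permuted_map[XSD + "integer"]("1"))      # True
print(permuted_map[XSD + "boolean"]("42"))     # 42
```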


 > We also note that this change does not alter any
 > entailments.

Again, this depends on the reading of the current
draft. In my reading, most datatype-related entailments
would be removed. In the reading according to the
discussion cited above, nothing would change
semantically. Either way, no change should be made then.


To summarize, even if I give in to the reading of
the current draft as stated in the cited discussion
thread, there is still the problem that a fundamental
aspect of the old RDF model-theoretic semantics
has changed concerning its nomenclature and formal
representation, which is used in the original form
by at least three other core standards of the
Semantic Web. Further, even if I agree with the
reading of the WG, I do not agree that there was
any need for such a change, as the old spec was
perfectly clear and this is clearly confirmed by
its use in several other standards that have been
produced over the years, and by my own many years of
experience in the matter. I further do not agree
that the given change is a simplification
but, rather, I consider it to be pretty confusing.
In any case, I see no justification for this change
to break interoperability with three other
Semantic Web standards, which is, of course,
to me the most important reason to reject this
change.

But if the WG still thinks that the change
is appropriate, there is no urgency at all
to apply it now; it can still be postponed
to a later WG, which would also allow for
more discussion, in particular with regard to
the other standards that use the original datatype
semantics.

I therefore kindly ask the WG to revert the
change and bring back the old notion of a
datatype map consisting of pairs of IRIs and
datatypes, with the necessary adjustments
to the corresponding semantic conditions.


Best Regards,
Michael Schneider
Received on Saturday, 7 December 2013 00:08:25 UTC