RDF 1.1 Semantics Implementation Report on the Swertia RDF-Based Reasoner

Dear Working Group,

Please find below my implementation report for my experimental Swertia
RDF-Based Reasoner, a system that tries to be a close implementation of
the model-theoretic semantics of RDF (unlike the many existing systems
that are based more on the RDF entailment rules). I have not yet been
able to run the official RDF 1.1 tests, due to lack of time. I also
believe that the results for the test suite will not be very good, as
many of the tests are about datatype reasoning, which is not supported
by my system. Anyway, I still plan to run the tests as soon as I find
the time, and also plan to provide the results and the prototypical
system, but for now I provide you with my implementation experiences
only. I hope this will already be useful for the Working Group.

Best regards,
Michael

= RDF 1.1 Semantics Implementation Report: Swertia =

Swertia [1], the Semantic Web Entailment Regime
Translation and Inference Architecture, is intended
to become a generic Semantic Web reasoning framework.
The goal is to provide reasoning support for all major
Semantic Web reasoning standards, including RDF(S),
OWL 2 (Direct Semantics, RDF-Based Semantics, RL/RDF rules),
SWRL, RIF (RIF BLD, RIF Core, RIF+RDF and RIF+OWL combinations),
and Common Logic. Supported reasoning methods are
entailment checking, consistency checking and query answering
in the form of SPARQL entailment regimes. Internally,
Swertia will not provide any reasoning capabilities itself
but will provide all necessary means to enable the use of
existing reasoners, such as first-order logic (FOL)
theorem provers and model finders, to perform reasoning
in the supported Semantic Web standards.

The framework itself is still in an early phase and no
initial release has been published. However, as part
of Swertia, a prototypical reasoner implementation
for reasoning in the RDF 1.0 semantics and the
OWL 2 RDF-based semantics has existed for a while now.
While not in wide use, the reasoner has worked quite well
for the experimental purposes of the author, has been
tested successfully with a comprehensive test suite
for RDF-based reasoning [2], and has been used for evaluation
work in a published paper [3].

For the RDF 1.1 Semantics, an attempt was made to adapt
the existing reasoner into a system that supports
as much of RDF 1.1 as possible.

== Overview of the Swertia Reasoner ==

The reasoner is mainly a translator of RDF graphs
into FOL formulae represented in the TPTP language [4],
which is understood (directly or indirectly via
translation tools) by the majority of existing
FOL reasoning systems.

The translator itself only translates the input RDF graphs
(the premise and possibly a conjecture graph of an
entailment checking task) into corresponding axiom
and conjecture TPTP formulae, following RDF Simple semantics:
IRIs are translated into constant terms, blank nodes into
existential variable terms, literals into function terms
(with different functions for plain, language-tagged and
typed literals), triples into ternary predicates, and
graphs into conjunctions of such predicates, with globally
scoped existential quantifiers for all the blank nodes
occurring in the graph.
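
To sketch the idea (the predicate name "triple" and the constant
names below are purely illustrative, not necessarily the identifiers
used by the implementation), a small ground graph

     ex:alice ex:knows ex:bob .
     ex:bob rdf:type ex:Person .

would be translated into a single TPTP axiom formula roughly like:

     % each triple becomes one ternary atom;
     % the graph becomes the conjunction of these atoms
     fof(premise_graph, axiom,
         ( triple(ex_alice, ex_knows, ex_bob)
         & triple(ex_bob, rdf_type, ex_person) ) ).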

The semantics for the different entailment regimes are not
treated by the RDF translator itself but, rather, the
corresponding semantic conditions are directly modeled
as sets of FOL axiom formulae (usually one formula
per semantic condition).
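
As an illustrative sketch (again with assumed predicate and
constant names), the RDFS semantic condition for rdfs:subClassOf
(the subclass and the superclass are classes, and every instance
of the subclass is also an instance of the superclass) could be
modeled by a single axiom formula such as:

     fof(rdfs_subclassof_cond, axiom,
         ! [X, Y] :
           ( triple(X, rdfs_subclassof, Y)
          => ( triple(X, rdf_type, rdfs_class)
             & triple(Y, rdf_type, rdfs_class)
             & ! [Z] :
                 ( triple(Z, rdf_type, X)
                => triple(Z, rdf_type, Y) ) ) ) ).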

For reasoning, the axiom formulae that represent the
semantic conditions of the respective entailment regime
are combined with the formulae for the translated input
graphs and given to an FOL theorem prover
for entailment or inconsistency detection
and to an FOL model finder for non-entailment
or consistency detection. The final reasoning result
is the combination of the results of the two systems.
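
A complete entailment-checking task then becomes one TPTP problem
of roughly the following shape (again only a sketch with assumed
names), consisting of the regime's semantic-condition axioms plus
the two translated graphs:

     % ... semantic-condition axioms of the entailment regime,
     %     e.g. the rdfs_subclassof_cond formula sketched above ...

     % translated premise graph
     fof(premise_graph, axiom,
         ( triple(ex_alice, rdf_type, ex_student)
         & triple(ex_student, rdfs_subclassof, ex_person) ) ).

     % translated conjecture graph
     fof(conjecture_graph, conjecture,
         triple(ex_alice, rdf_type, ex_person) ).

A theorem prover result of "Theorem" then establishes the
entailment, and a model finder result of "CounterSatisfiable"
establishes the non-entailment; for consistency checking over
the axiom formulae alone, "Unsatisfiable" and "Satisfiable"
play the analogous roles.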

== Support for Basic Semantic Conditions ==

By "basic semantic conditions", I refer to all
the semantic conditions that are not specifically
about blank nodes, plain or tagged strings,
or datatypes and typed literals (I will get
to these aspects of the RDF 1.1 Semantics below),
but I do include all the axiomatic triples
for RDF and RDFS.

For RDF 1.0 and OWL 2 Full, the Swertia RDF translator
itself did not have any particular support for
the basic semantic conditions. Rather, all these
semantic conditions were represented by FOL formulae.
In the past, all the basic semantic conditions
of the RDF 1.0 entailment regimes Simple, RDF, and RDFS
were easily translated into FOL formulae.
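
For instance, the RDF and RDFS axiomatic triples simply become
ground axiom formulae in this encoding, sketched here with the
same assumed names as above:

     % axiomatic triple:  rdf:type rdf:type rdf:Property .
     fof(rdf_axiomatic_type, axiom,
         triple(rdf_type, rdf_type, rdf_property) ).

     % axiomatic triple:  rdf:subject rdfs:domain rdf:Statement .
     fof(rdfs_axiomatic_subject_domain, axiom,
         triple(rdf_subject, rdfs_domain, rdf_statement) ).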

For RDF 1.1, I went through all the semantic conditions
of the new entailment regimes to see what needed to
be changed. I found that hardly anything had changed
from the point of view of semantic conditions,
except for the order of entailment regimes.
In fact, for RDF 1.1, all the original basic
semantic conditions turned out to be there again.
Hence, I was able to reuse all the original FOL formulae
for the basic semantic conditions from the old
implementation without change.

== Support for Blank Nodes ==

For RDF 1.0 and OWL 2 Full, the Swertia RDF translator
maps blank nodes into existential variables
that apply to the whole target FOL formula.
For this, the translator iterates over the input RDF graph,
looks up all the occurring blank nodes,
and produces a fresh FOL variable name for each
new blank node, while for blank nodes that
re-appear in different positions of the graph,
the corresponding FOL variable name is reused.
This operation is technically easy to implement,
takes at most O(n log n) time, where n is the graph size
(if, for example, a balanced tree representation is used),
and requires at most linear space for the
resulting mapping structure (which needs to
be kept throughout the translation process).
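
As a small sketch (with the same illustrative names as before),
a graph in which the blank node _:b occurs in two triples,

     ex:alice ex:knows _:b .
     _:b ex:knows ex:carol .

is translated using a single variable that is existentially
quantified over the whole graph formula:

     % the shared blank node _:b becomes the single variable B
     fof(premise_graph, axiom,
         ? [B] :
           ( triple(ex_alice, ex_knows, B)
           & triple(B, ex_knows, ex_carol) ) ).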

For RDF 1.1, nothing relevant has changed with respect to
blank nodes that would have required a change
of this treatment. Hence, there were no additional
or new implementation issues compared to the old
RDF revision, and the adaptation to RDF 1.1 was
unproblematic.

== Support for Plain and Language-Tagged Strings ==

The original Swertia RDF translator came with specific
support for plain and language-tagged literals
in the translation output format TPTP.
As the translator's input, the Model representation
of the Jena framework [5] was used, which essentially
provides an implementation of the RDF 1.0 Abstract Syntax.
In particular, Jena Models have direct support
for plain and language-tagged literals.

For both kinds of plain literals, dedicated FOL function terms
were used in the translation: plain literals were
represented by unary function terms whose single argument is a
constant term uniquely encoding the literal's lexical form.
Language-tagged literals were represented by binary function
terms, where the first argument term was represented like
that for plain literals, and the second argument term
was a corresponding representation of the language tag
as a constant term.
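
Sketched in the same illustrative notation as before (the function
and constant names are again mere assumptions), the two kinds of
literals were thus encoded as terms roughly like this:

     "foo"       -->   plain_lit(str_foo)
     "foo"@en    -->   lang_lit(str_foo, lang_en)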

For RDF 1.1, it was an obvious idea to use the same
FOL functions for representing strings and language-tagged
strings in the FOL output, because their interpretations
(or values) are the same as those of RDF 1.0 plain and
language-tagged literals, respectively. However, it was
unclear to me what to expect from the input format for the
translation, specifically in the case of language-tagged
strings.

I understand that concrete RDF serialization syntaxes
are free to represent language-tagged strings as they like
(including the old tagged plain literal format).
What I do not understand is how they are represented
in the abstract RDF 1.1 model. After all, if I use a
framework like Jena, I have to rely on the parsing
from the concrete syntax into the internal representation
model, and I am unclear about what will happen for
language-tagged literals. If Jena parses them into
the old representation for language-tagged literals,
then nothing would need to be changed in my implementation.
However, if they are mapped into something else,
I would need to change my translator software
as well.

According to Section 3.3 of the "Concepts and Abstract Syntax"
document, "a literal is a language-tagged string
if and only if its datatype IRI is rdf:langString,
and only in this case the third element is present:..."
I am not sure if I really understand this. So far,
my guess was that a language-tagged string would
be a typed literal, where the lexical form is composed
of the "plain" lexical form, the "@" sign, and then
the language tag, i.e.:

     ( "foo@en" , rdf:langString )

But the above definition sounds to me more as if a
language-tagged string is a /triple/ consisting of
(1) the lexical form /without/ the language tag; and
(2) the language tag; and
(3) the datatype IRI rdf:langString, i.e.

     ( "foo", "en", rdf:langString )

It would be good to clarify the situation to make
it easier for implementers to decide how to support
language-tagged strings.

For now, I have decided to stick with the original implementation,
which mapped Jena representations of language-tagged literals
into binary function terms. Therefore, no changes have
been made so far.

== Support for Datatypes and Typed Literals ==

The most obvious deficit of my original translator
was its almost complete lack of support
for datatype semantics, as support for datatypes
has not yet been of much relevance for my work.
Nevertheless, there have always been plans to support
some level of datatype reasoning, and some initial
ideas have been developed. Definitely, I want to
support datatypes in the future, because without
datatype support at least for rudimentary types
like integer numbers, the system, while appropriate
for some experimental work, will not be of much
practical usefulness.

For RDF 1.1, given the short time for the
Call-for-Implementation phase, I have not undertaken
any effort to support datatypes in the RDF 1.1
implementation. But at least I have checked the
RDF 1.1 specification for changes concerning
datatypes that would have an effect on datatype
reasoning, in order to be sure that I will not
run into avoidable problems in the future.
This is not only relevant for RDF 1.1,
which provides pretty rudimentary datatype semantics,
but also for expressive semantic extensions,
such as OWL 2 Full.

The obvious way to start was to compare the original
RDF 1.0 semantics with the new semantics w.r.t.
datatypes. If there were no or only marginal
changes, this would mean that an implementation
that works for RDF 1.0 should not face too
many surprises with an implementation for RDF 1.1.
Or, put differently: any big problems with the
RDF 1.1 semantics would already have been problems
for RDF 1.0.

Comparing the two semantics, it became clear that,
apart from some reordering of the semantic conditions
due to the reordering of the entailment regimes,
the semantics remained technically almost identical:
essentially the same semantic conditions that were
present in Chapter 5 (on datatypes) of the old
specification were again present in the new spec,
although spread over different places.

The only problem that I found was with the new notion
of "identified datatypes":
In the original spec, the notion of a datatype map
was that of a set of pairs, which stated associations
between URIs and the corresponding datatypes.
So, for example, if a semantic extension of
RDF 1.0 D-entailment was meant to include the
xsd:integer datatype, one was able to state that
the datatype map D contained the pair consisting
of the URI "xsd:integer" and the particular datatype
of integers as defined in the XSD Datatypes spec.
For implementations, this would make the situation
sufficiently clear. In RDF 1.1, we only get a
set of datatype IRIs, and the actual association
with concrete datatypes is not directly supported.
So an implementation of a particular semantic extension
of RDF 1.1 needs to somehow find out what the
associations are. Of course, a definition of a
particular semantic extension would tell the
identified datatypes for the identifying IRIs
in SOME way, but in any case it WILL
have to say what the association is; otherwise
it would be impossible for an implementation to ever
become compliant. In other words, there must
_always_ be such an association in order to be
useful, because a bare set of IRIs can be
interpreted in any arbitrary way. Therefore, the
RDF spec should directly support this idea
in terms of a set of associations rather than
only a set of IRIs, as it did in the past and
as several other W3C standards built on top of it,
such as RIF, SPARQL 1.1, and OWL 2, still do!
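
To illustrate the difference for a semantic extension that is
meant to recognize, say, xsd:integer and xsd:string:

     RDF 1.0:  D = { ( xsd:integer , <the XSD integer datatype> ),
                     ( xsd:string  , <the XSD string datatype>  ) }

     RDF 1.1:  D = { xsd:integer , xsd:string }

In the first case, the intended datatypes are part of D itself;
in the second case, the association has to be provided elsewhere.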

== Conclusions ==

For the most part, the adaptation of the existing
RDF translator was straightforward and little
had to be done. There was some confusion about
the representation of language-tagged strings,
specifically what their real representation is
in the RDF 1.1 abstract syntax. The specification
should be clearer about this. As the original
RDF translator did not offer explicit support for
datatype semantics, and there was only very little
time given by the CfI, I decided not to spend
any implementation effort on datatype semantics,
and only to have a look at what /would/ have to change
if I had datatype support. It turned out that,
technically, the semantics has not changed much.
However, one problem (not so much for RDF(S),
but for more expressive systems with more datatypes)
would in my opinion be that the RDF 1.1
semantics does not support stating explicit associations
between datatype IRIs and the corresponding datatypes,
but leaves it to other specifications to find a way
to specify these relationships. I consider this
to be a problem, and it is definitely a deviation
from the original RDF specification that should
not have been made.

== References ==

[1] Swertia Home: http://swertia.sourceforge.net/
(does not contain any sources or binaries currently)

[2] Schneider, M., Mainzer, K.: A Conformance Test Suite
for the OWL 2 RL/RDF Rules Language and the
OWL 2 RDF-Based Semantics. In: Proceedings of the 6th
International Workshop on OWL: Experiences and Directions
(OWLED 2009). CEUR Workshop Proceedings, vol. 529 (2009)

[3] Schneider, M., Sutcliffe, G.: Reasoning in
the OWL 2 Full Ontology Language using First-Order Automated
Theorem Proving. In: Proceedings of the 23rd International
Conference on Automated Deduction (CADE 2011), pp. 446-460,
LNAI 6803 (2011).

[4] TPTP Home and language specification: http://tptp.org/

[5] Jena Home: http://jena.apache.org/
