Design:EntailmentRegimes

From SPARQL Working Group
Revision as of 18:12, 13 October 2009 by BGlimm (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

SPARQL 1.1/Entailment Regimes

Authors
Birte Glimm, Oxford University Computing Laboratory
Bijan Parsia, University of Manchester
Abstract
Status of this Document
This document is an editors' draft.
Issues include:
* http://www.w3.org/2009/sparql/track/issues/28 - Entailment regimes vs. update?
* http://www.w3.org/2009/sparql/track/issues/34 - How do entailment regimes interaction with aggregates, grouping, and blank nodes?
* http://www.w3.org/2009/sparql/track/issues/40 - How can other entailment regimes plug in their semantics to SPARQL/Update?
* http://www.w3.org/2009/sparql/track/issues/42 - TF-ENT What should happen for RDFS entailment in the face of inconsistencies?
* http://www.w3.org/2009/sparql/track/issues/43 - should entailment-regimes be declared over the whole dataset or individual graphs?

Copyright © 2009 by the Authors.


Abstract

SPARQL is a query language for data that is stored natively as RDF or viewed as RDF via middleware. What correct answers to a SPARQL query are depends on the used entailment regime and the vocabulary from which the resulting answers can be taken. The first version of SPARQL [SPARQL/Query 1.0] was defined only for simple entailment, but it defines a set of conditions that have to be met when defining what correct results are for SPARQL queries under different entailment regimes. The goal of this document is to specify conditions such that SPARQL can be used with entailment regimes other than simple entailment, most notably we define the semantics of SPARQL queries under RDFS entailment and OWL Direct and RDF-Based semantics.


Introduction

The first SPARQL specification defines only simple entailment, which allows for finding answers by matching the triple pattern of the query onto the RDF graph of the queried data. Other entailment regimes, such as RDFS entailment, allow for finding answers to a query that are not directly specified in the queried graph, but can be infered using a set of inference rules. In this document, we specify how SPARQL can be used with entailment regimes other than simple entailment.

SPARQL Entailment Regimes is closely related to the following specifications:

In places, we refer to the RDF or RDFS entailment rules from the RDF Semantics specification. This rules are just used in an informative way and implementations are not expected to implement these rules as they are used here.

Document Conventions

Throughout the document, we adhere to certain conventions that are outlined below.

Graph Syntax

This document uses the Turtle [TURTLE] data format to show triples explicitly. This notation uses a node identifier (nodeID) convention to indicate blank nodes in the triples of a graph. While node identifiers such as '_:xxx' serve to identify blank nodes in the surface syntax, these expressions are not considered to be the label of the graph node they identify; they are not names, and do not occur in the actual graph. In particular, the RDF graphs described by two Turtle documents which differ only by re-naming their node identifiers will be understood to be equivalent. This re-naming convention should be understood as applying only to whole documents, since re-naming the node identifiers in part of a document may result in a document describing a different RDF graph. A generated blank node may also be made with [].

IRIs are written enclosed in '<' and '>' and may be absolute RDF IRI References or relative to the current base IRI. IRIs may also be abbreviated by using Turtle's @prefix directive that allows declaring a short prefix name for a long prefix of repeated IRIs. Once a prefix such as @prefix foo: <http://example.org/ns#> is defined, any mention of an IRI later in the document may use a qualified name that starts foo: to stand for the longer IRI. For example, the qualified name foo:bar is a shorthand for the IRI http://example.org/ns#bar.

For example, the following triples use prefixes and abbreviated IRIs and also non-abbreviated IRIs such as <book2>, which are relative to the base IRI of the document.

@prefix dc:   <http://purl.org/dc/elements/1.1/> .
@prefix :     <http://example.org/book/> .
:book1  dc:title "SPARQL Tutorial" . 
<book2> dc:title "Turtle Tutorial" .


Namespaces

Examples assume the following namespace prefix bindings unless otherwise stated:

Prefix IRI
rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs: http://www.w3.org/2000/01/rdf-schema#
xsd: http://www.w3.org/2001/XMLSchema#
fn: http://www.w3.org/2005/xpath-functions#

In the interests of brevity, the prefix ex: is also used in the examples. The prefix is assumed to be bound to an imaginary IRI such as 'http://www.example.org/'.

Preliminary Definitions

This document uses the same definitions as the SPARQL Query Language specification and we recapture the most important terms here for clarity. In the case of any difference, the SPARQL Query Language definitions are the normative ones.

We use RDF-L for the set of all RDF Literals, RDF-B for the set of all blank nodes in RDF graphs, and RDF-T for the set of RDF Terms, which is a union of the set of all IRIs, RDF-L, and RDF-B.

We use V for the set of query variables, where V is infinite and disjoint from RDF-T. A triple pattern is member of the set:

(RDF-T union V) x (I union V) x (RDF-T union V)

A Basic Graph Pattern is a set of Triple Patterns.


Effects of Different Entailment Regimes

The SPARQL 1.0 specification already envisages that SPARQL can be used with entailment regimes other than simple entailment and to illustrate the differences between simple, RDF, and RDFS entailment, we consider the following data:

(1) ex:book1 rdf:type ex:Publication .
(2) ex:book2 rdf:type ex:Article .
(3) ex:Article rdfs:subClassOf ex:Publication .
(4) ex:publishes rdfs:range ex:Publication .
(5) ex:MITPress ex:publishes ex:book3 .

We use the following query that asks for a list of all publications

SELECT ?pub WHERE { ?pub rdf:type ex:Publication . }

Clearly, ex:book1 is an answer due to triple (1). Intuiltively, we can expect that ex:book2 is also a publication because it is an article (2) and all articles are publications (3). Even ex:book3 is a publication because it is published by MIT Press (5) and everything that is published is a publication (4). Under simple and RDF entailment, ex:book1 is the only answer because a system that uses simple entailment will not perform any of the reasoning steps that were required to find that ex:book2 and ex:book3 are publications. Under simple entailment, we check how the basic graph pattern ?pub rdf:type ex:Publication can be mapped to the queried graph and by instantiating ?pub with ex:book1 we have a match. RDF already supports a few inferences, but not those that are required to derive that ex:book2 and ex:book3 are publications. In order to retrieve ex:book2 and ex:book3, one would need a system that supports RDFS entailment. Under RDFS entailment, a set of RDFS entailment rules is applied to the explicitly given data to derive new consequences. E.g., the rule rdfs9 can be applied to the triples (3) and (2) to derive

(6) ex:book2 rdf:type ex:Publication .

The rule rdfs3 can be applied to (4) and (5) to derive

(7) ex:book3 rdf:type ex:Publication .

The triples (6) and (7) can then be used to find that ex:book2 and ex:book3 are also answers to the query under an RDFS entailment regime.

The difference between RDF and simple entailment is less significant since RDF supports only few inferences. Consider, for example, the following query:

SELECT ?prop WHERE { ?prop rdf:type rdf:Property . }

Under simple entailment the query has an empty answer when querying the above graph. Under RDF entailment, we can use the RDF rule rdf1 on (5) to derive the triple

ex:publishes rdf:type rdf:Property .

which means that ex:publishes is a valid binding for ?prop and will be returned as an answer for the query from a system that uses RDF entailment.

The web ontology language OWL allows for even more inferences and in the remainder, we specify what correct answers are for the different entailment regimes.


Extensions to Basic Graph Pattern Matching

The SPARQL 1.0 specification gives a set of conditions that have to be met when extending the basic graph pattern matching to other entailment regimes and SPARQL 1.0 says:

An entailment regime specifies
1. a subset of RDF graphs called well-formed for the regime
2. an entailment relation between subsets of well-formed graphs and well-formed graphs.

Since OWL, for example, makes restrictions on the well-formedness of RDF graphs, the first condition can be used to define an OWL entailment regime for those RDF graphs only that represent an OWL ontology. For the entailment relations mentioned in the second condition we use those entailment relations that are already specified and used in the Semantic Web such as RDF(S) entailment, OWL entailment, or RIF entailment.

SPARQL 1.0 further defines a set of conditions for extensions of the basic graph pattern matching. We repeat these conditions here for clarity and give an idea of how the different entailment regimes satisfy them. Further details for each of the specified entailment regimes can be found in the corresponding section for that entailment regime.

1 -- The scoping graph, SG, corresponding to any consistent active graph AG is uniquely specified and is E-equivalent to AG.

2 -- For any basic graph pattern BGP and pattern solution mapping P, P(BGP) is well-formed for E.

3 -- For any scoping graph SG and answer set {P1 ... Pn} for a basic graph pattern BGP, and where {BGP1 .... BGPn} is a set of basic graph patterns all equivalent to BGP, none of which share any blank nodes with any other or with SG

   SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn))

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following must hold.

4 -- Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and AG has a finite set of answers which is unique up to RDF graph equivalence.

We do not change any of the existing entailment relations, but rather define the vocabulary from which possible answers can be taken and which answers are legal answers in order to guarantee that query answers are always finite. We also do not restrict the set of legal graphs that can be queried apart from the restriction to graphs that are legal under the entailment regime in question. E.g., under the RDFS entailment regime, one can query all legal RDFS graphs, while under OWL 2 DL Direct Semantics, one can query all graphs that correspond to legal OWL 2 DL ontologies. We do define which queries are legal and how illegal queries or graphs and inconsistencies are handled.

For 1: All entailment regimes specified here use the same definition of a scoping graph as given in SPARQL 1. Thus, the required equivalence is immediate.

For 2: This gives minimal conditions on what legal answers are for an entailment regime E: A mapping is only a legal solution mapping and included in the answer if applying the mapping yields a set of RDF triples that are well-formed for E. For example, under RDFS entailment, any SPARQL query is legal, but queries that require literals as a binding for a variable in a subject position have no answer because all mappings that result in a set of RDFS entailed tripes are not well-formed RDF since RDF forbids literals in the subject position. Similarly, for OWL 2 DL entailment, a query might have no answer because all possible bindings might result in RDF triples that are not well-formed for OWL 2 DL.

For 3: This condition prevents the reuse of blank nodes between query answers unless those blank nodes are really the same in the queried graph. Under this restriction no accidental co-references among blank nodes are introduced. Since we use the same definition of a scoping graph, the condition is also satisfied.

Editor's Note: The proof needs to be redone though since it uses the interpolation lemma, which does not hold e.g., for RDFS-entailment.

For 4: For entailment regimes other then simple entailment this point is very important since a finite set of answers is no longer guaranteed. For example, already under RDF and RDFS entailment, even the empty graph entails an infinite number of axiomatic triples such as rdf:_1 rdf:type rdf:Property, rdf:_2 rdf:type rdf:Property, ... Thus, a query with BGP { ?x rdf:type rdf:Property . } would, without further restrictions, have infinitely many answers. In order to guarantee finite answers, different entailment regimes will impose different restrictions on the vocabulary from which bindings can be taken. We explain these restrictions in greater detail in the following section.


Details for different entailment regimes

For each of the covered entailment regimes, we adress all the conditions that entailment extensions to SPARQL basic graph pattern matching have to satify.


RDF Entailment

RDF entailment is closest to simple entailment in that it provides only few additional answers and RDF is not expressive enough to express inconsistencies. RDF does, however, entail an infinite set of axiomatic triples and we have to specify how we can guarantee the forth condition on extensions of basic graph pattern matching, which requires that answer sets are always finite.


Name RDF
Legal Graphs Any legal RDF graph.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query (syntax errors) causes an error.
Entailment RDF Entailment
Inconsistency RDF graphs are always RDF consistent and no inconsistency handling is required.
Query Answers A pattern instance mapping P for RDF entailment is a pair (μ, σ), where μ : V -> RDF-T is a solution mapping and σ : RDF-B -> RDF-T is an RDF instance mapping. Let BGP be a basic graph pattern, V(BGP) the set of variables in BGP, B(BGP) the set of blank nodes in BGP, G an RDF graph, and P=(μ, σ) a pattern instance mapping for RDF entailment. We write P(BGP) to denote the result of replacing each variable v from V(BGP) for which μ is defined with μ(v) and each blank node b from B(BGP) for which σ is defined with σ(b).

A solution mapping μ is a possible solution for BGP from G under RDF entailment if dom(μ) = V(BGP) and there is an RDF instance mapping σ from B(BGP) to RDF-T such that dom(σ)=B(BGP) and the pattern instance mapping P=(μ, σ) is such that P(BGP) are well-formed RDF triples that are RDF entailed by G.

A possible solution μ is a solution for BGP from G under RDF entailment if (C1) each subject s of a triple ( s, p, o ) in P(BGP) occurs in the scoping graph and (C2) each blank node b in the range of μ and σ occurs in the scoping graph SG.

Please note that we define what legal answers under RDF entailment are in a two-stage process. Intuitively, the possible answers are all answers that one would expect under RDF entailment, i.e., all mappings such that instantiating the basic graph patterns with them results in RDF triples that are RDF entailed by the queried graph. The set of possible answers is, however, not necessarily finite. To obtain always a finite set of answers, the next step defines which of the possible answers are actually to be returned as answers to the query. In this step, we restrict answers so that we only return finitely many of the axiomatic triples (C1) and return no answers that are equivalent to other answers and just contain fresh blank nodes that were created by applications of RDF entailment rules (C2). We comment on these two restrictions below.

Examples for the Restrictions on Solutions (Informative)

The following example illustrates the use of both conditions C1 and C2. Consider the query

SELECT ?x WHERE { ?x rdf:type rdf:property . } 

against a graph containing only the triples

ex:a ex:b ex:c . 
_:xxx rdf:type rdf:Bag .
_:xxx rdf:_1 ex:a .

We assume that the scoping graph is the same as the active graph, i.e., the graph for the triples shown above. As part of the possible solutions we find { (x, ex:b) } since according to the rdf entailment rule rdf1

ex:a ex:b ex:c . 

entails

ex:b rdf:type rdf:property . 

Further, from the axiomatic triples, we also get the possible solutions { (x, rdf:_1) }, { (x, rdf:_2) }, ... We get even more possible answers since the simple entailment rule se2 means that

ex:b rdf:type rdf:property . 

RDF entails

_:exb1 rdf:type rdf:property . 

for some blank node _:exb1 allocated to ex:b and { (x, _:exb1) } is a possible solution. The rule se2 can be applied again to the just mentioned triple and we get that the following triple is also entailed by the queried graph

_:exb2 rdf:type rdf:property . 

for some blank node _:exb2 allocated to _:exb1 and { (x, _:exb2) } is a possible solution too. We can in fact infer infinitely many blank nodes to be possible solutions to the query.

Clearly, the set of possible solutions is infinite in this case, but for a possible solution to actually be a solution, the two conditions C1 and C2 have to be met: (C1) each subject s of a triple ( s, p, o ) in P(BGP) occurs in the scoping graph and (C2) each blank node b in the range of μ and σ occurs in the scoping graph SG.

In this case, condition C1 limits the answers from the axiomatic triples. We consider the two examplary possible answers { (x, rdf:_1) } and { (x, rdf:_2) }.

  • For the possible solution μ: x->rdf:_1, we can take an empty RDF instance mapping σ to obtain a pattern instance mapping P=(μ, σ) with P(BGP) =
rdf:_1 rdf:type rdf:property . 

and since the subject rdf:_1 occurs in the scoping graph (here the scoping graph and the active graph are equal) and condition C2 is trivially satisfied, and this solution mapping is actualy a solution.

  • For the possible solution μ: x->rdf:_2, we can again take an empty RDF instance mapping σ to obtain a pattern instance mapping P=(μ, σ) with P(BGP) =
rdf:_2 rdf:type rdf:property . 

but since the subject rdf:_2 does not occur in the scoping graph, this solution mapping is not a solution. We can argue similarly for rdf:_n with n > 2.

The second condition rules out { (x, _:exb1) }, { (x, _:exb2) }, ... since the blank nodes _:exb1, _:exb2, ... do not occur in the scoping graph (again the scoping graph equals the active graph here). Thus, the only answers are { (x, ex:b) } and { (x, rdf:_1) }.

We give another example for RDFS entailment that shows the behavior if the query contains blank nodes and for the case where entailment really makes a difference other than adding some axiomatic triples.


Literals in the Subject Position (Informative)

Please note that solution mappings that map variables that occur in the subject in the basic graph pattern BGP to literals will not be returned as solutions even though there might be a pattern instance mapping P for the solution mapping such that P(BGP) is RDF entailed by the queried graph, but P(BGP) is not well-formed as required (see also the SPARQL triple patterns definition). E.g., given a query

SELECT ?x WHERE { ?x rdf:type rdf:XMLLiteral . }

even the empty graph would RDF entail all statements xxx rdf:type rdf:XMLLiteral for xxx a well-formed RDF XML literal, but applying a solution mapping such as μ : ?x -> "<a>abc</a>"^^rdf:XMLLiteral would result in a triple that is not a valid RDF triple.

Please note that triples with literals in the subject positions are currently not considered well-formed RDF, but this might be changed in the future. If literals were allowed in the subject position, condition C1 would still guarantee finite answers.


Boolean Queries (Informative)

The two conditions C1 and C2 also have an effect on the answers to Boolean queries. For Boolean queries that contain variables, e.g.,

ASK { ?x rdf:type rdf:property . }

this is straighforward. The query answer is yes (true) if ?x has at least one solution and it is no (false) otherwise. For Boolean queries without variables it is less clear how the answer is determined. We have two possible outcomes for a Boolean query without variables: there is a solution sequence containing a mapping ( μ ) where μ has an empty domain (it does not map any variable to anything) or there is only an empty solution sequence ( ). In the first case, the query answer is yes (true), whereas in the second case the query answer is no (false). Thus we can still apply the two conditions on possible solutions and check whether the solution mapping is indeed a solution. For example, if we ask the above query against the empty graph, we find that the solution mapping μ with empy domain is a possible solution, but condition C1 rules this solution mapping out as an answer and the query answer is no (false).


Other Possible Design Choices for Finite Answers (Editor's Comment)

1) Instead of using the vocabulary of the graph to limit the number of axiomatic triples, we could simply exclude the axiomatic triples from the set of query answers.

2) We do return axiomatic triples, but SELECT or CONSTRUCT queries that contain one of the following the triple patterns

?x rdf:type rdf:Property . 
?x rdf:type rdfs:ContainerMembershipProperty . 
?x rdfs:domain rdfs:Resource . 
?x rdfs:range rdfs:Resource . 

must satisfy additional safety restrictions. E.g., if a query contains the one of the above triple patterns, then it must specify a limit clause. This would have the advantage that we do have a kind of monotonic behavior, i.e., both ask queries described above would have the answer yes and the subject would be contained in the answers for the according select query, but the additional limit clause would restrict answers to a finite subset of all answers. The disadvantage is that we do have no control as to what would be in the returned answer and what would be left out, which is not very intuitive either.

3) Since, as long as literals are not allowed in subject positions, only subjects of the form rdf:_1, rdf:_2, ... can cause infinite answers, we could also look at what the largest n is in elements of the form rdf:_n and return axiomatic triples such as rdf:_m rdf:type rdf:property only if m<=n. This falls apart, however, if we allow literals in subject position, since in that case

xxx rdf:type rdf:XMLLiteral 

is an answer to the query

SELECT ?x WHERE {  ?x rdf:type rdf:XMLLiteral }

and xxx is a well-formed RDF XML literal.


RDFS Entailment

Under RDFS we have not only more entailment rules, which result in possibly more query answers, but we also have to handle inconsistencies, which can result in infinite answer sets since an inconsistet graph RDFS entails any consequence.


Name RDFS
Legal Graphs Any legal RDF graph that is RDFS-consistent.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query (syntax error) causes an error.
Entailment RDFS Entailment
Inconsistency If the queried graph is RDFS-inconsistent, an implementation MAY generate an error or warning and SHOULD generate such an error or warning if, in the course of processing, it determines that the data or query is not compatible with the request.
Query Answers A pattern instance mapping P for RDFS entailment is a pair (μ, σ), where μ : V -> RDF-T is a solution mapping and σ : RDF-B -> RDF-T is an RDF instance mapping. Let BGP be a basic graph pattern, V(BGP) the set of variables in BGP, B(BGP) the set of blank nodes in BGP, G an RDF graph that is RDFS-consistent, and P=(μ, σ) a pattern instance mapping for RDFS entailment. We write P(BGP) to denote the result of replacing each variable v from V(BGP) for which μ is defined with μ(v) and each blank node b from B(BGP) for which σ is defined with σ(b).

A solution mapping μ is a possible solution for BGP from G under RDFS entailment if dom(μ) = V(BGP) and there is an RDF instance mapping σ from B(BGP) to RDF-T such that dom(σ)=B(BGP) and the pattern instance mapping P=(μ, σ) is such that P(BGP) are well-formed RDF triples that are RDFS entailed by G.


A possible solution μ is a solution for BGP from G under RDFS entailment if (C1) each subject s of a triple ( s, p, o ) in P(BGP) occurs in the scoping graph and (C2) each blank node b in the range of μ and σ occurs in the scoping graph SG.


As under RDF entailment, we use a two-stage process to define what legal answers under RDFS entailment are. Possible answers are all answers that one would expect under RDFS entailment, i.e., all mappings such that instantiating the basic graph patterns with them results in RDF triples that are RDFS entailed by the queried graph. The set of possible answers is, however, not necessarily finite. To obtain always a finite set of answers, the next step defines which of the possible answers are actually to be returned as answers to the query and we restrict answers so that we only return finitely many of the axiomatic triples (C1) and return no answers that are equivalent to other answers and just contain fresh blank nodes that were created by applications of RDFS entailment rules (C2).

Please note that this definition of answers works well with the sound and complete entailment algorithm for RDFS as proposed by ter Horst [RDFSENTAILMENT] and guarantees interoperability between systems independent of the intermediate and possibly redundant consequences that different systems might add or avoid to add.


Examples for the Restrictions on Solutions (Informative)

For RDFS entailment, we consider an example that uses RDFS entailment to derive implicit consequences and we also consider a query with blank nodes in it. The active graph in this example is for the triples

_:a ex:s _:b1 . 
_:a ex:t _:b2 . 
ex:s rdfs:subPropertyOf ex:r . 

We again assume that the scoping graph is the same as the active graph and the query we consider is

SELECT ?x WHERE { ?x ex:r _:a . }

We deliberately chose to use a blank node in the query that occurs in the active graph. Since we consider RDFS entailment, the active graph entails a number of interesting triples such as

_:a ex:r _:b1 . 
_:aa ex:r _:b1 . 
_:a ex:r _:b11 . 
_:aa ex:r _:b11 . 

just to name some. The active graph does, however, not entail

 _:x ex:r _:a . 

since it is not the case that every RDFS interpretation must map _:a to an element that is the ex:r successor of something. The graph does also not entail

_:a ex:s _:b2 . 

since ex:t is not a subproperty of ex:s.

From the entailed triples, we get a number of possible solutions, e.g.,

P1=(μ1, σ1) with μ1 : x->_:a and σ1 : _:a->_:b1, 
P2=(μ2, σ2) with μ2 : x->_:aa and σ2 : _:a->_:b1, 
P3=(μ3, σ3) with μ3 : x->_:a and σ3 : _:a->_:b11, 
P4=(μ4, σ4) with μ4 : x->_:aa and σ4 : _:a->_:b11, 

etc., but we do not get P=(μ, σ) with μ : x->_:x and σ : _:a->_:a since P(BGP) is not entailed by the scoping graph as argued above. To be solutions, the possible solution have to satisfy C1 and C2: (C1) each subject s of a triple ( s, p, o ) in P(BGP) occurs in the scoping graph and (C2) each blank node b in the range of μ and σ occurs in the scoping graph SG. Here we have

P1(BGP)=_:a ex:r _:b1 . 
P2(BGP)=_:aa ex:r _:b1 . 
P3(BGP)=_:a ex:r _:b11 . 
P4(BGP)=_:aa ex:r _:b11 . 

and P1 and P3 do satisfy condition C1, but condition C2 rules out P3 and we have a single answer { (x, _:a) } from P1. Interestingly, it is not the case that the result of instatiating the BGP ?x ex:r _:a with the solution mapping (_:a ex:r _:a) results in a triple that is RDFS entailed by the active graph because we would have to use an appropriate RDF instance mapping too. Although _:a occurs in the BGP and in the queried graph, it is not required that the name in the query must match the name in the graph because blank nodes are just stating the existency of something. We can equally use the query

SELECT ?x WHERE { ?x ex:r []. }.

We only need names for the blank nodes in the query to enforce co-references. E.g., in the query

SELECT ?x WHERE { ?x ex:r _:b . _:b ex:s ex:a. }.

we ask for elements that start an ex:r ex:s chain. If the queried graph contains also a blank node _:b, then the name _:b in the query is, however, not necessarily mapped to the same element as the _:b in the queried graph.

We can consider the above example again, but this time with a scoping graph that does use different names for the blank nodes in the active graph and the query. E.g., assume that the scoping graph SG contains the triples

_:sga ex:s _:sgb1 . 
_:sga ex:t _:sgb2 . 
ex:s rdfs:subPropertyOf ex:r . 

The query is again

SELECT ?x WHERE { ?x ex:r _:a . }

The scoping graph entails a number of intersting triples such as

_:sga ex:r _:sgb1 . 
_:aa ex:r _:b1 . 
_:sga ex:r _:b11 . 
_:aa ex:r _:b11 . 

just to name some. The scoping graph does, however, not entail

 _:x ex:r _:sga . 

since it is not the case that every RDFS interpretation must map _:sga to an element that is the ex:r successor of something. The graph does also not entail

_:sga ex:s _:sgb2 . 

since ex:t is not a subproperty of ex:s.

From the entailed triples, we get a number of possible solutions, e.g.,

P1=(μ1, σ1) with μ1 : x->_:sga and σ1 : _:a->_:sgb1, 
P2=(μ2, σ2) with μ2 : x->_:aa and σ2 : _:a->_:sgb1, 
P3=(μ3, σ3) with μ3 : x->_:sga and σ3 : _:a->_:b11, 
P4=(μ4, σ4) with μ4 : x->_:aa and σ4 : _:a->_:b11, 

etc., but we do not get P=(μ, μ) with μ : x->_:x and σ : _:a->_:sga since P(BGP) is not entailed by the scoping graph as argued above. To be solutions, the possible solution have to satisfy C1 and C2: (C1) each subject s of a triple ( s, p, o ) in P(BGP) occurs in the scoping graph and (C2) each blank node b in the range of μ and σ occurs in the scoping graph SG. Here we have

P1(BGP)=_:sga ex:r _:sgb1 . 
P2(BGP)=_:aa ex:r _:sgb1 . 
P3(BGP)=_:sga ex:r _:b11 . 
P4(BGP)=_:aa ex:r _:b11 . 

and P1 and P3 do satisfy condition C1 and condition C2 rules out P3 and we again have a single answer { (x, _:sga) } from P1.

Inconsistencies (Informative)

An rdfs-inconsistent graph RDFS-entails any graph, but there are limited possibilities to express an inconsistency in RDFS. Every inconsistency is due to a literal of type rdf:XMLLiteral, where the lexical form is a mal-formed XML string, e.g.,

ex:a ex:b "<"^^rdf:XMLLiteral .

in combination with a range restriction on the property, e.g.,

ex:b rdfs:range rdf:XMLLiteral .

The first triple alone does not cause an inconsistency. It only requires that the literal "<"^^rdf:XMLLiteral is interpreted as something that is not in the extension of rdfs:Literal. Since rdfs:Literal contains rdf:XMLLiteral, the second triple together with the first one results in an inconsistency. The following example illustrates that an inconsistency is not always as directly visible as in the example above and one might need to apply some inference rules to detect it. E.g., consider the following triples (numbers are only given to explain the inferences later):

(1) ex:a rdfs:subClassOf rdfs:Literal .
(2) ex:b rdfs:range ex:a .
(3) ex:c rdfs:subPropertyOf ex:b.
(4) ex:d ex:c "<"^^rdf:XMLLiteral .

Here we can to derive the inconsistency as follows:

(6) ex:d ex:b "<"^^rdf:XMLLiteral .    (by applying rule rdfs7 to (3) and (4))
(7) "<"^^rdf:XMLLiteral rdf:type ex:a.   (by applying rule rdfs3 to (2) and (6))
(8) "<"^^rdf:XMLLiteral rdf:type rdfs:Literal .   (by applying rule rdfs9 to (1) and (7))

At this point, we can detect the inconsistency since "<" is not a valid lexical form for an RDF XML literal and we have to interpret it as some element tha is NOT in rdfs:Literal, but at the same time it should be of type rdfs:Literal. The triple derived last is characteristic for an RDFS inconsistency.


Effects of Unchecked Inconsistencies (Editor's Comment)

Please note that the above definition of the RDFS entailment regimes does not require that systems MUST generate an error or a warning in the case of an inconsistency, but systems MAY generate an error or warning. A system SHOULD generate such an error or warning if, in the course of processing, it determines that the data or query is not compatible with the request. Thus, it can happen that we query an inconistent graph and get a (finite) answer sequence as a result although such a graph would in fact (trivially) entail any RDF triple. This was to allow for more efficient implementations. Consider, for example, a default graph containing the following triples

ex:b ex:s ex:y1 .
ex:b ex:s ex:y2 .
...
ex:b ex:s ex:y10000 .
ex:a ex:d "<"^^rdf:XMLLiteral .
ex:d rdfs:range rdf:XMLLiteral . 

and a query

SELECT * WHERE { ?x ex:r ex:b . ex:b ex:s ?y . }

which requires a join operation in the query processor. This graph is RDFS-inconsistent due to the last two triples, but the query processor might only know (after parsing) that there is no ex:r property at all in the graph. Thus, the processor knows that it does not have to evaluate the query, but a required check for consistency can be very costly (there could be more than 10,000 ex:b ex:s ex:yn tuples) and after a possibly long time the user would get an error.

Another motivation are, for example, queries that require a union. For example, the query

SELECT * WHERE { {BGP1} UNION {BGP2} }

can be executed by dispatching BGP1 and BGP2 in parallel to some processing element, streaming results back to the caller from either side of the UNION as they become available. Since the use of HTTP for streaming back results places some constraints on what can be done, e.g., the error or success code must be transmitted before results start, at the point where a query processor might discover the inconsistency it might be too late for a conformant way of communicating this to the client.

A consequence of not requiring a consistency check is that in the case of an inconsistency, not all conditions on extensions of basic graph pattern matching are satisfied. For example, the first condition

The scoping graph, SG, corresponding to any consistent active graph AG is uniquely specified and is E-equivalent to AG.

explicitly mentions that the scoping graph must be E-equivalent (RDFS equivalent in this case) to the active graph and that AG must be consistent. Clearly that is not possible if the active graph is inconsistent. One way around this would be to specify an entailment regimes for a subset of RDFS that cannot express inconsistencies, e.g., be defining well-formed graphs for the regime as those that contain only syntactically valid rdf XML literals. Any graph with an invalid RDF XML literal can be rejected for that regime. This requires just a syntactic check and all legal graphs for the regime will be consistent.

We could also define different two RDFS regimes, one for systems that guarantee a consistency check and one for systems that cannot give this guarantee. A system without such a guarantee allows for a more efficient implementation. A system with such a guarantee is satisfying the conditions on extending basic graph pattern matching, is closer to RDFS entailment as it is defined, and allows users to detect and repair problems (usually inconsistencies are unintended), for a higher cost of implementation and a possibly slower response time in general.


D-Entailment

Editor's Note: D-Entailment is just a hook at the moment and might be merged with RDFS...


Name D-Entailment
Legal Graphs All legal RDF graphs.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query causes an error.
Entailment D-Entailment
Inconsistency TBD
Query Answers TBD

OWL 2 RL

Editor's Note: OWL 2 RL is just a hook at the moment...


Name OWL 2 RL
Legal Graphs Any graph that can be represented in an OWL 2 RL document.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query (syntax error or outside of OWL 2 RL) causes an error.
Entailment RDF Based Semantics
Inconsistency TBD
Query Answers TBD

OWL Full

Editor's Note: OWL Full is just a hook at the moment...


Name OWL Full
Legal Graphs All legal RDF graphs.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query causes an error.
Entailment RDF Based Semantics
Inconsistency TBD
Query Answers TBD

OWL 2 DL, OWL 2 EL, and OWL 2 QL

Editor's Note: OWL 2 DL, EL, and QL is just a hook at the moment, EL and QL are just restrictions of DL...


Name OWL 2 DL
Legal Graphs Any RDF graph which is an OWL 2 DL ontology document which does not entail that owl:Thing rdfs:SubClassOf owl:Nothing, i.e., the given ontology must be consistent.
Legal Queries Any legal SPARQL query.
Illegal handling In case the query or the queried graph is illegal, the system must raise an error.
Entailment See OWL 2 conformance
Inconsistency Inconsistent graphs are illegal thus trigger an error.
Query Answers TBD

RIF

Editor's Note: RIF is just a hook at the moment...


Name RIF
Legal Graphs All legal RDF graphs.
Legal Queries Any legal SPARQL query.
Illegal handling An illegal graph or query causes an error.
Entailment TBD
Inconsistency TBD
Query Answers TBD


Entailment Regimes and Data Sets (Informative)

Many RDF data stores hold multiple RDF graphs and applications can make queries that involve information from more than one graph. In this section we clarify how entailment regimes behave in the presence of named graphs.

As defined in the SPARQL specification, a SPARQL query is executed against an RDF Dataset which represents a collection of graphs. An RDF Dataset comprises one graph, the default graph, which does not have a name, and zero or more named graphs, where each named graph is identified by an IRI. The graph that is used for matching a basic graph pattern is the active graph. Under an entailment regime E other than simple entailment, we do not only consider the triples that are in the graph, but also triples that are E-entailed by the graph. The entailed triples must, however, be E-entailed by the active graph and not by a merge of the triples in all graphs. This follows from conditions conditions 1 and 3 of the conditions on extensions for basic graph matching.

For an example, we consider a data set with consists of an empty default graph, a named graph graphA with IRI http://example.org/a.rdf, and a named graph graphB with IRI http://example.org/a.rdf. The named graphs contain the following data:

http://example.org/a.rdf:
ex:p rdfs:domain ex:A .
http://example.org/b.rdf:
ex:x ex:p ex:y .

If we ask the following query under RDFS entailment

SELECT ?g WHERE { GRAPH ?g { ex:x a ?type } }

the answer sequence is empty because neither the default graph, nor the named graphs on their own entail a triple that would provide the required binding for ?type.

In order to evaluate a query over the merge of the triples in the named graphs, one can use several FROM clauses, which result in the creation of a fresh default graph for the query that contains a merge of the triples, e.g.,

SELECT ?t FROM <http://example.org/a.rdf> FROM <http://example.org/b.rdf> WHERE { ex:x a ?t . } 

has the answer (t, ex:A). One can not merge triples from several sources into a named graph (they will always be merged into a fresh default graph) and such an extension would require changes to the conditions for extensions of basic graph pattern matching in the existing SPARQL query language specification.


References

Normative References

Informative References

  • [TURTLE] Turtle - Terse RDF Triple Language, Dave Beckett.