Feature:ParameterizedInference

From SPARQL Working Group
Revision as of 01:51, 10 April 2009 by Cogbuji (Talk | contribs)


Feature: Parameterized Inference

Several current SPARQL engines and RDF stores support rule based inference using ontologies to some extent. There are two problems with the current approaches:

  • there is no way to dynamically specify which inference rules shall be used to answer a specific query
  • there is no way to dynamically specify which ontologies shall be used to answer a specific query.

We propose two new clauses, USING ONTOLOGY and USING RULESET, for parameterizing the inference rules and ontologies used to answer a query.

Feature description

USING RULESET

There is no standardized way to parameterize inference in SPARQL engines. Some systems materialize entailed triples statically by forward chaining, so that the entailed triples are always fixed with respect to a static ruleset; others evaluate inference rules dynamically in a top-down fashion. Often, however, one wants to answer different queries over the same RDF data under different inference settings, e.g. "give me all answers with respect to the raw RDF data" vs. "give me all answers including those obtained by RDFS inference". Engines supporting dynamic rule evaluation could support this, but the SPARQL language offers no way to express it.

When we speak about inference here, we mean inference with respect to a fixed, finite ruleset that can be evaluated by forward chaining in finite time. The problems with infinite entailments in the "real" entailment regimes beyond simple RDF entailment, such as RDF, RDFS, or OWL, are well known.

  • We suggest adding a keyword USING RULESET that points to a ruleset (e.g. in RIF, or as a set of CONSTRUCT queries) describing the inferences the engine is expected to perform when answering a query.
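
To make the idea of a per-query ruleset concrete, here is a minimal sketch in Python of naive forward chaining to a fixpoint, with the ruleset passed as a parameter. It is an illustration only, not how any particular engine works. The triple representation (tuples of prefixed-name strings), the function names and the sample data (ex:alice) are assumptions made for this sketch.

```python
# Assumption for this sketch: a triple is a (subject, predicate, object)
# tuple of strings; a rule is a function from a set of triples to a set of
# derived triples.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"


def subclass_rule(triples):
    """If ?S rdf:type ?C and ?C rdfs:subClassOf ?D, derive ?S rdf:type ?D."""
    superclasses = {}
    for s, p, o in triples:
        if p == SUBCLASS_OF:
            superclasses.setdefault(s, set()).add(o)
    return {
        (s, RDF_TYPE, d)
        for s, p, c in triples
        if p == RDF_TYPE
        for d in superclasses.get(c, ())
    }


def forward_chain(triples, ruleset):
    """Apply every rule repeatedly until no new triple is derived."""
    closure = set(triples)
    while True:
        derived = set()
        for rule in ruleset:
            derived |= rule(closure)
        if derived <= closure:
            return closure
        closure |= derived


def instances_of(triples, cls, ruleset=()):
    """Answer '?X a cls' over the graph closed under the chosen ruleset."""
    closure = forward_chain(triples, ruleset)
    return sorted(s for s, p, o in closure if p == RDF_TYPE and o == cls)


# Hypothetical sample data.
data = {
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("foaf:Person", SUBCLASS_OF, "foaf:Agent"),
}
raw_answers = instances_of(data, "foaf:Agent")
inferred_answers = instances_of(data, "foaf:Agent", ruleset=[subclass_rule])
print(raw_answers, inferred_answers)
```

The same data yields different answers depending on the ruleset handed to the query, which is the behavior USING RULESET is meant to expose. The loop terminates because the rule introduces no new terms, so only finitely many triples can ever be derived.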


USING ONTOLOGY

A second need is to query the same data with respect to, e.g., different versions of the same ontology. Let's assume I have created my own version of FOAF which states that foaf:member is a subproperty of rdfs:member (other examples would be ontologies including my own mappings, e.g. between FOAF and vCard). Then I might want to query the FOAF data in my store with respect to this new ontology, while others may still want to query that FOAF data with respect to the original FOAF ontology only.

  • We suggest adding a keyword USING ONTOLOGY that points to an RDFS or OWL ontology to be taken into account (i.e. merged into the default graph as well as into each named graph) when answering queries with respect to a certain ruleset.

Note: Merging different ontologies with different data graphs is possible for the default graph by explicitly specifying multiple FROM clauses, but merging ontologies into the named graphs is not possible. An alternative to the USING ONTOLOGY keyword would be an extension of the FROM NAMED clause that allows a named graph to be specified as the merge of multiple input graphs.
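
The intended effect of USING ONTOLOGY on the dataset can be sketched as follows in Python. This is a rough illustration under simplifying assumptions: graphs are plain sets of string triples, the merge is a set union (a real RDF merge would also have to keep blank nodes of different graphs apart), and the function name and the ex:me resource are hypothetical.

```python
def with_ontology(default_graph, named_graphs, ontology):
    """Return a new dataset in which `ontology` is merged into every graph.

    Assumption: set union stands in for an RDF merge, which is only
    adequate when the graphs share no blank node labels.
    """
    merged_default = set(default_graph) | set(ontology)
    merged_named = {
        name: set(graph) | set(ontology)
        for name, graph in named_graphs.items()
    }
    return merged_default, merged_named


# Hypothetical sample dataset: an empty default graph and one named graph.
ontology = {("foaf:Person", "rdfs:subClassOf", "foaf:Agent")}
default_graph = set()
named_graphs = {
    "http://www.polleres.net/foaf.rdf": {("ex:me", "rdf:type", "foaf:Person")},
}
merged_default, merged_named = with_ontology(default_graph, named_graphs, ontology)
print(sorted(merged_named["http://www.polleres.net/foaf.rdf"]))
```

After the merge, a ruleset evaluated inside a GRAPH pattern sees the ontology axioms next to the data of that named graph, which multiple FROM clauses alone cannot achieve.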

Example

Here is an example asking for all foaf:Agents in my FOAF file, including those that are not explicitly typed as foaf:Agent:

SELECT ?X 
FROM <http://www.polleres.net/foaf.rdf>
USING RULESET <subclassing.rif>
USING ONTOLOGY <http://xmlns.com/foaf/spec/index.rdf>
WHERE { ?X a foaf:Agent. }

where subclassing.rif is a RIF ruleset consisting of the single rule:

?S[rdf:type ?D] :- And( ?S[rdf:type ?C] ?C[rdfs:subClassOf ?D] )

Likewise, we suggest defining some "standard" rulesets, e.g.

USING RULESET RDFS 

which uses a fixed ruleset covering RDFS entailment, as given in http://www.w3.org/2005/rules/wiki/SWC#Embedding_RDFS_Entailment (probably modulo a finite subset of the axiomatic triples only).

We emphasize that the above query does not strictly need the USING ONTOLOGY clause; it could simply be replaced by

SELECT ?X 
FROM <http://xmlns.com/foaf/spec/index.rdf>
FROM <http://www.polleres.net/foaf.rdf>
USING RULESET RDFS
WHERE { ?X a foaf:Agent. }

but that only merges the ontology into the default graph for possible inferences. Our experiments show that even SPARQL engines which claim to support RDFS entailment either do not apply RDFS inferences to named graphs in this case, or do not allow both RDFS entailment *and* explicit specification of the named graphs. That is, even if USING RULESET were supported, the following would not work as intended:

SELECT ?X 
FROM NAMED <http://xmlns.com/foaf/spec/index.rdf>
FROM NAMED <http://www.polleres.net/foaf.rdf>
USING RULESET RDFS
WHERE { GRAPH <http://www.polleres.net/foaf.rdf> {?X a foaf:Agent. } }

In this context, we think it would be useful to have a USING ONTOLOGY clause that specifies ontologies to be implicitly merged into both the default graph and *all* named graphs.

Similarly, it is not possible with FROM NAMED clauses to specify a named graph as the merge of other graphs, while this *IS* possible for the default graph. Some use cases might need this (in fact, the proposed USING ONTOLOGY clause is a special case of it). So an alternative syntax for the previous query could be to extend the facilities for specifying named graphs:

SELECT ?X 
FROM NAMED <new> {<http://xmlns.com/foaf/spec/index.rdf> <http://www.polleres.net/foaf.rdf> }
USING RULESET <subclassing.rif>
WHERE { GRAPH <new> {?X a foaf:Agent. } }

Likewise, there is no way to define an implicit set of named graphs (e.g. all URIs appearing in the default graph). This might be useful for queries such as the following:

SELECT NAMES OF ALL PEOPLE I KNOW OR NAMES OF PEOPLE KNOWN BY THESE PEOPLE

We could envision the following mockup syntax:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?N ?L
FROM <http://www.polleres.net/foaf.rdf>
FROM NAMED ALL
WHERE { { <http://www.polleres.net/foaf.rdf#me> foaf:knows ?X . ?X foaf:name ?N . }
         UNION
         { ?X rdfs:seeAlso ?L . GRAPH ?L { [a foaf:Person ] foaf:name ?N } }
       }

FROM NAMED ALL is just a strawman; what is meant is a closure of the named graphs, i.e. all IRIs appearing as resources in the default graph should implicitly also be named graphs.
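
One possible reading of this strawman, sketched in Python: collect every IRI that occurs in the default graph and treat it as the name of a graph to load. The details are assumptions made for illustration only, namely that IRIs are written as '<...>' strings, that only subject and object positions count as "resources", and that the closure is a single step. The function names and the example.org URL are hypothetical.

```python
def is_iri(term):
    """Assumed convention for this sketch: IRIs are written as '<...>'."""
    return term.startswith("<") and term.endswith(">")


def implied_graph_names(default_graph):
    """Collect every IRI in subject or object position of the default graph."""
    names = set()
    for s, _p, o in default_graph:
        for term in (s, o):
            if is_iri(term):
                names.add(term[1:-1])
    return sorted(names)


# Hypothetical default graph: blank nodes as '_:b1', literals in quotes.
default_graph = {
    ("<http://www.polleres.net/foaf.rdf#me>",
     "<http://xmlns.com/foaf/0.1/knows>", "_:b1"),
    ("_:b1", "<http://www.w3.org/2000/01/rdf-schema#seeAlso>",
     "<http://example.org/friend.rdf>"),
    ("_:b1", "<http://xmlns.com/foaf/0.1/name>", '"A Friend"'),
}
graph_names = implied_graph_names(default_graph)
print(graph_names)
```

Each collected name would then be dereferenced and added to the dataset as a named graph before the GRAPH ?L pattern is evaluated. Whether predicates should count as well, and whether the step should be repeated on the newly loaded graphs, is left open by the strawman.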

Existing Implementation(s)

Both features are implemented in the GiaBATA system [1], a SPARQL engine based on a rule engine on top of a relational database, which uses the dlvhex system [2] and its relational database binding dlvdb [3] underneath. The dlvhex-based SPARQL engine is available for download at http://dlvhex-semweb.sourceforge.net/.

Virtuoso implements RDFS subclasses, subproperties, class and property equivalence, and owl:sameAs in a manner similar to the one proposed here; see section 17.9 (RDF Inference in Virtuoso).

Existing Specification / Documentation

List any existing text that attempts a formal definition of this extension. This could be a draft specification, API or syntax documentation, etc.

Compatibility

Extensions should be upwards compatible with the previous SPARQL spec. Although the charter does not formally bind us to this requirement, this rule should only be violated in exceptional cases. If your extension might raise any compatibility issues, these should be detailed here.

Links to postponed Issues

Does this extension/use case already have some history in the group? I.e., are there postponed issues or archived mail threads originating from DAWG that relate to it?

Related Features

This feature is very close to Feature:ControlOfInference, but not identical. Parameterized inference does not eliminate the need for control of inference at the triple pattern level.

Champions

Use cases

A description of one or more use cases, the solution of which requires this feature. Multiple use cases can be added to each feature.

  • Querying data with or without rule based RDFS inference. Compare inferred vs. raw triples.
  • Querying data with respect to different versions of an ontology from the same repository.

References

to be completed by references!