
ISSUE-1: What inferencing can or must be used


Introduction

RDF supports the notion of semantic extensions such as RDFS, OWL, and RIF (see RDF 1.1 Semantics, Semantic Extensions and Entailment Regimes). A semantic extension normally defines an entailment regime, which specifies how additional triples are inferred from an explicitly given set of triples. An RDF application may rely on some semantic extension. Users of such an application may not distinguish between explicit and inferred triples, and may not even be aware that inferencing has taken place. This implies that we must consider the relation between constraint checking and semantic extensions in order to achieve a useful result for application users. For example, it would be confusing for a constraint to fail because an explicit triple was absent when that same triple could be inferred: a user viewing the inferred triples would not understand an error message saying the triple was missing.

Issue 1 is concerned with this question.

SPARQL

This issue also occurs when querying RDF data. The SPARQL specification addresses it through the explicit use of entailment regimes (see SPARQL 1.1 Entailment Regimes). Conceptually, a SPARQL query is based on graph pattern matching, and the graph on which the patterns are matched is the graph that results from applying the entailment relation to the explicitly given triples.

There is a close connection between queries and constraint checking. A constraint may be viewed as a query that returns TRUE if the constraint is satisfied and FALSE if it is violated. This view of constraints is similar to the use of assertions in programming languages. Equally, a constraint could be phrased as a query that searches for violations and returns TRUE if one or more violations are found. When violations occur, a more complex query could return information about where or why they occur. This information could be used in error messages that help the user understand or resolve the violation. SPARQL ASK queries could be used to check assertions and violations; SPARQL SELECT and CONSTRUCT queries could be used for error messages.
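
For illustration, a violation-hunting constraint of the kind just described could be phrased as a SPARQL ASK query (a minimal sketch; the class ex:Person and property ex:ssn are hypothetical):

    # Returns TRUE if a violation exists:
    # some ex:Person without any ex:ssn value.
    PREFIX ex: <http://example.org/ns#>
    ASK {
        ?person a ex:Person .
        FILTER NOT EXISTS { ?person ex:ssn ?ssn }
    }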

Since there is a significant and useful overlap between constraint checking and SPARQL queries, one way forward is to adopt the approach that SPARQL uses for entailment regimes, possibly with additional semantics for how they are used in constraint checking.

SPARQL Parameterized Inference Proposal

AP: Also, we might need different and more lightweight/flexible means to define and extend entailment regimes, apart from the ones pre-defined in SPARQL 1.1 Entailment Regimes (which defines entailment regimes for RDFS, OWL, and RIF), e.g.

  • in terms of being able to declare a custom ruleset (USING RULESET) that should be respected for inferencing, consisting of rules of the form
{QuadPattern} IF {SPARQL GroupGraphPattern}
    (an illustrative rule is sketched after the link below);
  • in terms of specifying one or more RDFS or OWL ontologies (USING ONTOLOGY) that should be considered when doing inferences.

For details cf. http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference
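
As a purely hypothetical sketch (the exact grammar is defined in the proposal linked above; the ex: terms are borrowed from the SPIN example below), a rule in the {QuadPattern} IF {GroupGraphPattern} form might read:

    { ?x ex:uncle ?u } IF { ?x ex:parent ?p . ?p ex:sibling ?u . ?u ex:gender ex:male }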

SPIN Rules

HK @AP: Once we go down that road, we should also consider something similar to [http://spinrdf.org/spin.html#spin-rules SPIN Rules], possibly combined with USING RULESET or as part of the constraint declaration itself, e.g.

   ex:Person
       a rdfs:Class ;
       :constraint [
           :sparql "ASK { ?this ex:uncle ?someUncle }" ;
           :ruleset <http://example.org/unclerules> ; 
       ] .

where the SPARQL is executed over a query graph that has the SPIN rules from the given rule set applied:

   ex:Person
       :rule [
           :sparql """
               CONSTRUCT {
                   ?this ex:uncle ?uncle .
               }
               WHERE {
                   ?this ex:parent ?parent .
                   ?parent ex:sibling ?uncle .
                   ?uncle ex:gender ex:male .
               }
               """
       ] .

Syntactic sugar could cover the case in which the rules from the current graph should be used.

In general, I would absolutely love to see something like SPIN rules as part of the standard, yet I wonder whether this pushes the implementation burden too far, given that triple stores would have to perform all those inferences in all these languages over all kinds of union graphs, etc. This requires discussion and feasibility feedback from database vendors.

OSLC Resource Shapes

OSLC in general does not currently rely on any semantic extension. The semantics of OSLC Resource Shapes are not formally defined, but the clear assumption is that they apply to the explicitly given set of triples present in HTTP requests, because an OSLC service is not expected to perform any inferencing on them. An OSLC service that stores triples in an information resource may add triples to that resource, e.g. for creator, creation date, etc., but these are not inferred triples; they are system-generated triples.

SPIN

SPIN relies on a limited form of inferencing when the engine builds an execution plan: the SPIN constraint checking engine walks up and down the class hierarchy (rdfs:subClassOf) and makes sure that constraints are "inherited" into subclasses.

Apart from that there is no explicit concept of inferencing in SPIN. Entailment is left to the graph implementation that the SPARQL queries are executed upon.

Many SPIN/SPARQL queries use some kind of "inferencing" on the fly. Examples include property paths (rdfs:subClassOf*), user-defined SPIN functions, and magic properties. The latter two provide a simple form of backward chaining, in that values are computed on the fly, possibly even recursively.
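
For example, a query can emulate class-hierarchy inferencing with a SPARQL 1.1 property path, without any entailment regime being in force (a minimal sketch; ex:Person is a hypothetical class):

    # Finds all instances of ex:Person, including instances of its
    # (transitive) subclasses, using only property-path matching.
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/ns#>
    SELECT ?instance
    WHERE {
        ?instance rdf:type/rdfs:subClassOf* ex:Person .
    }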

ShEx

ShEx doesn't mandate any inference. Like SPARQL 1.0, it depends on an input graph, which may in turn be the product of some inference. It's possible that some language support for inference of membership in a value set may be needed to satisfy clinical use cases.

RDFUnit

RDFUnit relies on the explicit graph given for validation but uses SPARQL property paths to infer the type hierarchy. Depending on the validation parameters, it can additionally load the referenced vocabularies in memory to retrieve the type hierarchy more accurately.

OWL Constraints (Stardog ICV)

OWL Constraints [1] evaluate OWL axioms in a single interpretation of an RDF graph. However, it is natural to have the graph be closed under RDFS entailment, because many OWL axioms turn on whether an individual belongs to a class, and a graph that did not use the RDFS meaning of the built-in RDFS properties, such as rdfs:subClassOf, would give rise to results that go against intuitions about OWL typing.

So OWL constraints work best when RDFS inferencing is used to augment the graph. It would not be necessary to actually put the RDFS-entailed triples in the graph, however.

So, the OWL constraint

    Person ⊑ ≥1 SSN

(that is, every Person must have at least one value for SSN) would be violated in

    John rdf:type Student .
    Student rdfs:subClassOf Person .

under the most natural version of OWL entailment.
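
For concreteness, that constraint could be written as an OWL axiom in Turtle along these lines (a sketch; the property name ex:ssn and the prefixes are assumptions):

    ex:Person rdfs:subClassOf [
        a owl:Restriction ;
        owl:onProperty ex:ssn ;
        owl:minCardinality "1"^^xsd:nonNegativeInteger
    ] .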

DC Application Profiles

A Dublin Core Application Profile (DCAP) is currently conceived of as a stand-alone structure that defines entities, properties, and constraints. The RDF definitions of the properties included in a DCAP are required to exist outside of the DCAP in an openly accessible ontology. The semantics of the ontology are what will be visible in the open world. While it would be bad practice for the usage of a property in a DCAP to contradict its definition in the ontology, DCAP does not specify any requirement to check for that.

Behind the philosophy of the DCAP is a preference for the development of open-world ontologies with minimal semantic commitment, and for profiles that define different views, supporting a variety of open- and closed-world uses and meeting the needs of different communities. The Dublin Core Metadata Terms is an example of such an ontology.

The DCMI working group on RDF validation and profiles has yet to determine whether inferencing between sub/super classes and properties will be included in its requirements. Very few use cases would require this, and it may be possible to create rules within the profile that would obviate the need to include the ontology (or ontologies) in the validation.

Relation of Entailment to Non-Unique Name Assumption and Comparison of Lexical Forms to Literal Values

Arthur Ryman

Entailment results in the addition of triples to a graph. These additional triples are those inferred from the explicitly given triples. The inferred triples are required in order to produce the answers that our users expect, e.g. a member of a subtype is a member of its supertype, so the subtype is valid wherever the supertype is required. However, our users will also have other expectations.

The RDF 1.1 specification defines equality of literals in terms of the equality of their lexical forms. See http://www.w3.org/TR/rdf11-concepts/#section-Graph-Literal. However, our users will probably prefer that equality of literals be defined in terms of their literal values. For example, "1"^^xs:integer and "01"^^xs:integer have different lexical forms but the same literal value, namely the integer 1.
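
SPARQL already exposes this distinction, which suggests the policy choice we face (a sketch, assuming the standard xsd: prefix):

    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

    # Value comparison: true, both literals denote the integer 1
    ASK { FILTER ( "1"^^xsd:integer = "01"^^xsd:integer ) }

    # Term comparison: false, the lexical forms differ
    ASK { FILTER ( sameTerm("1"^^xsd:integer, "01"^^xsd:integer) ) }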

Similarly, triples that use OWL terms may entail owl:sameAs triples, which tell us that some resources have more than one name in the given graph. This is entirely analogous to literals having different lexical forms but the same literal value. For example, if a SHACL program defines the property ex:hasFather as having a maximum cardinality of 1, then the following triples should not trigger a violation:

    ex:Luke ex:hasFather ex:Anakin, ex:Darth .
    ex:Anakin owl:sameAs ex:Darth .

Therefore, we cannot rely on entailment alone to produce the results that our users will intuitively expect. We also need a clear policy on equality of nodes.