ISSUE-68 and ISSUE-131: sh:hasShape and pre-binding from Peter F. Patel-Schneider on 2016-06-10 (public-data-shapes-wg@w3.org from June 2016)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Fri, 10 Jun 2016 06:21:22 -0700
To: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>
Message-ID: <55f65c89-8f4a-c8c0-8b26-f932c956bd3f@gmail.com>
Pre-binding and sh:hasShape form a large part of the meaning of SHACL.  They
are not just part of the extension mechanism in SHACL but are used in the
definition of the core of SHACL.

In Section 1.5 there is

  This specification uses parts of SPARQL 1.1 in the normative definition of
  the semantics of the SHACL Core constraints and scopes.

  SPARQL variables using $ marker represent external values that must be
  pre-bound in the SPARQL query before execution.

  Some SHACL constraints are defined with the use of the sh:hasShape
  function.

In Section 4 there is

  The SPARQL definitions in this section also assume the existence of a
  built-in SPARQL function sh:hasShape.

Then pre-binding shows up in the normative definition of every core
constraint component and sh:hasShape shows up in the normative definitions
of sh:not, sh:and, sh:or, sh:shape, and sh:qualifiedValueShape.
It is possible to implement the core of SHACL without using sh:hasShape and
pre-binding but this implementation will be implementing something that is
defined in large part by sh:hasShape and pre-binding.

In the extension part of SHACL, sh:hasShape and pre-binding are used
directly when writing the SPARQL code that implements templates.  Problems
with sh:hasShape and pre-binding thus are not just problems with an
underlying definition of SHACL but also directly affect the meaning of
constructs that are employed by users of SHACL.

It is possible to have a SPARQL-based extension mechanism for SHACL that
does not use sh:hasShape and does not use pre-binding.  Thus neither
sh:hasShape nor pre-binding is needed for SHACL.


sh:hasShape is currently defined in Appendix A of the SHACL specification,
http://w3c.github.io/data-shapes/shacl/#hasShape.  sh:hasShape currently
produces three results: undefined if recursion is encountered, true if no
violation validation result is produced, and false if some violation result
is produced.

This description of sh:hasShape has several problems.  First, it is unclear
as to which validation results count in the description.  Is it only results
from the direct validation of the focus node or do results from embedded
shapes count?  Second, the three possibilities are not disjoint.  Third,
recursion is not possible in SHACL so the undefined result can never occur.

However, the biggest problem with sh:hasShape is that it depends on
pre-binding.  sh:hasShape has to evaluate SPARQL queries in a context where
several query variables are limited to certain values.  This is an innate
peculiarity of using a SPARQL function that in turn initiates further SPARQL
query processing, so problems in pre-binding are problems for sh:hasShape.


Pre-binding of variables in SHACL is currently defined in Appendix B of the
SHACL specification, http://w3c.github.io/data-shapes/shacl/#pre-binding.

Pre-binding is defined, in full, as

  Pre-binding a variable with a value means that the SPARQL processor needs
  to evaluate all occurrences of variables with that same name (including
  occurrences in inner scopes and nested SELECT queries) so that they have
  the provided value. In other words, whenever a SPARQL processor evaluates
  a pre-bound variable, it must use the given value.

This definition does not align with the definition of SPARQL at all.  SPARQL
is a query language and often does not evaluate query variables.  In
particular, SPARQL does not evaluate query variables in basic graph
patterns.  The definition of basic graph pattern matching in SPARQL, from
https://www.w3.org/TR/sparql11-query/#BasicGraphPattern, is

  Let BGP be a basic graph pattern and let G be an RDF graph.
  μ is a solution for BGP from G when there is a pattern instance mapping P
  such that P(BGP) is a subgraph of G and μ is the restriction of P to the
  query variables in BGP.

Note that there is no notion of evaluation here at all.  Using evaluation as
the basis of the definition of pre-binding is thus disconnected from a large
part of the behaviour of SPARQL.
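
The point can be illustrated with a toy matcher.  This is only a sketch
under simplifying assumptions: terms are plain strings, blank nodes are
ignored, and the function names and data are made up for illustration.  A
variable is never evaluated; it is simply mapped to whatever term makes the
pattern a subgraph of the data, so an all-variable pattern yields one
solution per triple.

```python
# Toy model of SPARQL basic graph pattern (BGP) matching.
# Terms are plain strings; a string starting with "?" or "$" is a variable.
# All names and data here are illustrative assumptions, not SHACL or SPARQL API.

def is_var(term):
    return term.startswith(("?", "$"))

def match_bgp(bgp, graph, mu=None):
    """Yield every mapping mu such that mu(bgp) is a subgraph of graph."""
    mu = {} if mu is None else mu
    if not bgp:
        yield dict(mu)
        return
    first, rest = bgp[0], bgp[1:]
    for triple in graph:
        candidate = dict(mu)
        ok = True
        for pat_term, data_term in zip(first, triple):
            if is_var(pat_term):
                # The variable is mapped, not evaluated: any term that makes
                # the pattern fit the triple is acceptable.
                if candidate.setdefault(pat_term, data_term) != data_term:
                    ok = False
                    break
            elif pat_term != data_term:
                ok = False
                break
        if ok:
            yield from match_bgp(rest, graph, candidate)

GRAPH = [
    ("ex:a", "ex:p", "ex:b"),
    ("ex:b", "rdf:type", "ex:C"),
    ("ex:c", "ex:q", "ex:d"),
]

# Every term in the pattern is a variable, so every triple matches.
solutions = list(match_bgp([("$this", "$predicate", "?value")], GRAPH))
```

Nothing in this matching step consults a value for $this or $predicate,
which is why a definition phrased in terms of evaluating variables has
nothing to attach to here.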

This disconnect shows up in even the simplest of SPARQL queries that
implement constraint components.  Consider the normative SPARQL definition
of sh:class in property constraints

  SELECT $this ($this AS ?subject) $predicate (?value AS ?object)
  WHERE {
    $this $predicate ?value .
    FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* $class } .
  }

The pre-binding of $this and $predicate does not affect the meaning of the
basic graph pattern

  $this $predicate ?value .

so that, according to the definitions of SHACL and SPARQL, the solution
sequence generated from matching this basic graph pattern will have a
solution for each triple in the data graph.  This is already a total
failure, but what happens next?  Well, the filter is used to remove some of
the solutions, using the SPARQL semantic Filter function.  Each solution is
checked to see whether the filter evaluates to true for that solution.
Because the filter expression is an EXISTS expression it uses the SPARQL
substitute function, which replaces each query variable in ?value
rdf:type/rdfs:subClassOf* $class by its mapping in the solution, if any.
Because there is a solution for each triple in the data graph, this will
result in that many substitutions.  Next each of these
substitutions is separately matched against the data graph.  This matching
will have a result for values of ?value that are the subject of an rdf:type
triple and then these solutions are filtered out.  So the end result will
have a solution for every triple in the data graph where the object of the
triple is not the subject of an rdf:type triple.

Of course this is completely not what the result should be.  However, it is
what the current definition of SHACL says the result is.
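
The walk-through can be reproduced with a small simulation.  This is a
sketch only: the data graph, the helper names, and the simplified handling
of the property path are all assumptions made for illustration, and the
"intended" reading is a guess at what a validator is meant to compute.
Under the literal reading, $class is left as an ordinary unbound variable,
so the filter only asks whether ?value has any rdf:type at all.

```python
# Toy evaluation of the sh:class query under the literal reading, in which
# pre-binding does not constrain basic graph pattern matching.
# Data, names, and the simplified path handling are illustrative assumptions.

RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"

GRAPH = {
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:age", "42"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
}

def match_this_predicate_value(graph):
    """Solutions of `$this $predicate ?value .`: one per triple in the graph."""
    return [{"$this": s, "$predicate": p, "?value": o} for (s, p, o) in graph]

def superclasses(cls, graph):
    """Nodes reachable from cls via rdfs:subClassOf* (zero or more steps)."""
    seen, todo = {cls}, [cls]
    while todo:
        node = todo.pop()
        for (s, p, o) in graph:
            if s == node and p == SUBCLASS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen

def exists_type_path(value, graph, cls=None):
    """EXISTS { value rdf:type/rdfs:subClassOf* $class } after substitution.

    cls=None models $class left as an ordinary, unbound variable.
    """
    for (s, p, o) in graph:
        if s == value and p == RDF_TYPE:
            if cls is None or cls in superclasses(o, graph):
                return True
    return False

def literal_reading(graph):
    """FILTER NOT EXISTS applied to every solution of the unconstrained BGP."""
    return [mu for mu in match_this_predicate_value(graph)
            if not exists_type_path(mu["?value"], graph)]

def intended_reading(graph, this, predicate, cls):
    """Assumed intent: values of this/predicate that are not instances of cls."""
    return [mu for mu in match_this_predicate_value(graph)
            if mu["$this"] == this and mu["$predicate"] == predicate
            and not exists_type_path(mu["?value"], graph, cls)]

literal = literal_reading(GRAPH)
intended = intended_reading(GRAPH, "ex:alice", "ex:knows", "ex:Agent")
```

With this data the literal reading keeps three solutions, none of which
concerns the focus node's ex:knows value, while the assumed intended reading
reports nothing because ex:bob is an ex:Agent by way of ex:Person.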


Some SPARQL expert is going to have to take a close look at pre-binding to
determine what its definition should be.  However, before that, a closer
look needs to be taken at how pre-binding should operate.  For example,
should pre-binding affect variables throughout the query or only variables
that would be affected by a BIND construct at the beginning of the query?
There should be some examples generated to show how pre-binding works under
these two options so that the working group can make an informed decision.
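
One such example might look like the following sketch, which compares the
two options on the query

  SELECT ?value WHERE { { SELECT ?value WHERE { $this ex:p ?value } } }

with $this pre-bound to ex:a.  The data and helper functions are
hypothetical, and the second option is modelled simply as a join of the
initial binding with the result of the inner SELECT, relying on the SPARQL
rule that a variable not projected by a subquery is not visible outside it.

```python
# Toy comparison of two candidate meanings of pre-binding for the query
#   SELECT ?value WHERE { { SELECT ?value WHERE { $this ex:p ?value } } }
# Everything here (data, function names) is an illustrative assumption.

GRAPH = [
    ("ex:a", "ex:p", "1"),
    ("ex:b", "ex:p", "2"),
]

def match(pattern, graph):
    """Solutions of a single triple pattern; '?x' and '$x' are variables."""
    out = []
    for triple in graph:
        mu = {}
        for pat, term in zip(pattern, triple):
            if pat[0] in "?$":
                if mu.setdefault(pat, term) != term:
                    break
            elif pat != term:
                break
        else:
            out.append(mu)
    return out

def project(solutions, variables):
    """Keep only the listed variables in each solution."""
    return [{v: mu[v] for v in variables if v in mu} for mu in solutions]

def join(left, right):
    """SPARQL-style join: merge solutions that agree on shared variables."""
    return [{**l, **r} for l in left for r in right
            if all(l[v] == r[v] for v in l.keys() & r.keys())]

def substitute(pattern, binding):
    """Replace pre-bound variables in a triple pattern by their values."""
    return tuple(binding.get(term, term) for term in pattern)

INNER = ("$this", "ex:p", "?value")
PRE_BOUND = {"$this": "ex:a"}

# Option 1: the value replaces the variable everywhere, nested SELECT included.
everywhere = project(project(match(substitute(INNER, PRE_BOUND), GRAPH),
                             ["?value"]), ["?value"])

# Option 2: like BIND at the start of the outer group.  The inner SELECT does
# not project $this, so the binding never reaches the inner pattern.
inner = project(match(INNER, GRAPH), ["?value"])
bind_like = project(join([PRE_BOUND], inner), ["?value"])
```

In this sketch the first option returns only the value attached to ex:a,
while the second returns the values for every subject of ex:p, so the two
options already disagree on a query with a single nested SELECT.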


Peter F. Patel-Schneider
Nuance Communications
Received on Friday, 10 June 2016 13:21:51 UTC