Re: [ISSUE-62] A clean proposal with sh:Scope from Holger Knublauch on 2015-06-08 (public-data-shapes-wg@w3.org from June 2015)

From: Holger Knublauch <holger@topquadrant.com>
Date: Mon, 08 Jun 2015 17:35:04 +1000
To: Tom Johnson <johnson.tom@gmail.com>
CC: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>, Corey A Harper <corey.harper@nyu.edu>
Message-ID: <557545A8.8010509@topquadrant.com>
On 6/7/15 4:53 AM, Tom Johnson wrote:
>  The algorithm is roughly:
>   - a `validation` function that accepts an arbitrary graph (or a 
> Dataset? are filters and constraints that reference graph names in 
> scope for SHACL?)

Any SPARQL-based constraint, filter or scope could access any named 
graph from the dataset, using the GRAPH keyword.

> and a constraints graph (optionally, since the constraints may be 
> extracted from the other); and iterating over each shape calls:
>     - a `scoping` function that accepts the graph to validate and a 
> shape sub-graph, outputting focus nodes.
>   - then, iterating over each focus node and constraint, calls:
>     - a `constraint` function that accepts a constraint sub-graph, a 
> focus node, and the original Graph, and outputs violations.

Yes, and depending on how we design things, the constraint function may 
insert filter conditions. Note that the algorithm above is just how the 
specification formulates it. Implementations will likely not call the 
constraints for each individual focus node separately. In my own API 
(like in the SPIN API), the engine injects a clause that binds the focus 
nodes at the start of the SPARQL query, e.g. ?this a ?SCOPE_CLASS.
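
To make the two styles concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not SHACL's or any implementation's actual API: triples are plain tuples, a shape is a dict, and every name (`scoping`, `check_constraint`, `validate`, `inject_scope`, the `ex:` terms) is hypothetical. The first three functions follow the spec-style loop quoted above; `inject_scope` shows the alternative of prepending the focus-node binding to a SPARQL WHERE body so that a single query covers all focus nodes.

```python
# Hypothetical sketch: a graph is a set of (subject, predicate, object) tuples.
RDF_TYPE = "rdf:type"


def scoping(data_graph, shape):
    """Return the focus nodes of a shape: here, instances of its scope class."""
    return sorted(s for (s, p, o) in data_graph
                  if p == RDF_TYPE and o == shape["scope_class"])


def check_constraint(constraint, focus_node, data_graph):
    """Return violation messages for one focus node (a min-count check only)."""
    values = [o for (s, p, o) in data_graph
              if s == focus_node and p == constraint["predicate"]]
    if len(values) < constraint["min_count"]:
        return [f"{focus_node}: expected at least {constraint['min_count']} "
                f"value(s) for {constraint['predicate']}, found {len(values)}"]
    return []


def validate(data_graph, shapes):
    """Spec-style loop: for each shape, each focus node, each constraint."""
    violations = []
    for shape in shapes:
        for focus_node in scoping(data_graph, shape):
            for constraint in shape["constraints"]:
                violations.extend(
                    check_constraint(constraint, focus_node, data_graph))
    return violations


def inject_scope(where_body, scope_class):
    """Prepend a pattern that binds ?this to every instance of scope_class."""
    return f"?this a {scope_class} .\n{where_body}"


data = {
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
}
shapes = [{
    "scope_class": "ex:Person",
    "constraints": [{"predicate": "ex:name", "min_count": 1}],
}]

print(validate(data, shapes))
print(inject_scope("FILTER NOT EXISTS { ?this ex:name ?name }", "ex:Person"))
```

The loop version calls the constraint once per focus node; the injected version hands the whole scope to the query engine in one go, which is why an implementation might prefer it.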

>
> I think I still have some questions about how this would be 
> implemented for SPARQL-based stores; and guidance seems necessary 
> since people will want to validate more than just the default graph;

Yeah, quite possibly the default graph would be another parameter into 
the validation, but this is not critical here because once you are inside 
a SPARQL query, you can assume that this parameter is implicitly passed 
on. For example, SPARQL endpoints have a parameter for the default 
graph, even if that is a named graph in the server's dataset.
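
As an illustration of that endpoint parameter: the SPARQL 1.1 Protocol defines `default-graph-uri` for query requests. The sketch below only builds a GET URL with Python's standard library; the endpoint and graph URIs are made-up examples, and `build_query_url` is a hypothetical helper, not part of any SHACL API.

```python
from urllib.parse import urlencode


def build_query_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL, optionally selecting the default graph."""
    params = [("query", query)]
    if default_graph is not None:
        # The graph named here acts as the default graph for this query,
        # even though it is a named graph in the server's dataset.
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


url = build_query_url(
    "http://example.org/sparql",          # hypothetical endpoint
    "ASK { ?s ?p ?o }",
    "http://example.org/graphs/data",     # hypothetical named graph
)
print(url)
```

The query text itself never mentions the graph, which is the sense in which the parameter is passed on implicitly.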

> First of all, I agree that defining constraints in terms of SPARQL is 
> a good way to go.  With SPARQL you have a well tested existing 
> grammar, test suites, etc... and providing translations from specific 
> constraints *does* mean that most implementations can cheaply 
> implement constraints in those terms using upstream translations. That 
> said, implementations that have SPARQL engines generally have 
> independent BGP engines, too.  I think it's the case that the latter 
> are cheaper to run, so there may be trade-offs here.

Sure, and any non-SPARQL implementation may fall back to using SPARQL to 
ask simple SPO queries, or use something like Jena's Graph API. The BGP 
engines are only cheaper as long as you don't need complex FILTER 
conditions to execute on the same engine.
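
A rough sketch of what such a simple SPO lookup looks like, assuming an in-memory graph of tuples (the `match` helper and the `ex:` data are hypothetical, and this is not Jena's API). The pattern matcher knows nothing about filters, so any condition beyond pattern matching has to run in the calling code, which is where the trade-off mentioned above starts to show.

```python
def match(graph, s=None, p=None, o=None):
    """Yield triples matching one pattern; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got
               for want, got in zip((s, p, o), triple)):
            yield triple


graph = {
    ("ex:alice", "ex:age", 34),
    ("ex:bob", "ex:age", 17),
    ("ex:alice", "ex:name", "Alice"),
}

# The lookup is a plain triple pattern; the comparison is applied afterwards
# in host code because the matcher has no notion of FILTER.
adults = sorted(s for (s, _, age) in match(graph, p="ex:age") if age >= 18)
print(adults)
```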

>
> Second, I have something of an aversion to the use of embedded 
> microsyntaxes in general.  In my view, the question is not so much 
> about dropping a useful language feature, but allowing the language's 
> expressions to be in their own syntax.  Recourse to external grammars 
> in definitions seems totally reasonable; extensions likewise.  But my 
> feeling is that if the language is meant to be expressed in RDF, it 
> should be possible to write custom constraints in RDF.  Extensions 
> should be just that: extensions.  This may come down to skepticism 
> that 95% of constraints can be pre-defined---people are creative. :)

I didn't say that 95% of constraints can be pre-defined. This was only 
about *scoping*. There is no way to predict what kind of 
ontology-specific constraints people will want to express. SPARQL is a 
very good compromise here, to leave the door open. I don't understand 
your statement that "it should be possible to write custom constraints 
in RDF". Do you see gaps in our current core vocabulary?

>
> Lastly, if I understand correctly the current draft makes embedded 
> SPARQL support optional (from recollection, I don't think it's even a 
> SHOULD). That seems to conflict very strongly with making embedded 
> SPARQL the primary way of defining custom constraints.  For what it's 
> worth, I think SPARQL support really ought to be a SHOULD or 
> stronger.  Softer language is at risk of delaying (or worse) 
> portability for what seems like a key feature, likely to be in 
> widespread use.  "...they have to explain this choice to their users" 
> doesn't strike me as a solution to this problem.

I am certainly all in favor of making SPARQL support at least a SHOULD, 
and if you look at the spec, section 8 on templates even has a MUST 
about this. The built-ins of SHACL are also defined in SPARQL and the 
spec is full of SPARQL examples. We do, however, have an open discussion 
within the WG on how flexible this mechanism should be, and I believe a 
good point can be made that languages like JavaScript should also be 
allowed. But that's another thread...

Regards,
Holger
Received on Monday, 8 June 2015 07:35:36 UTC