Re: [ISSUE-62] A clean proposal with sh:Scope from Tom Johnson on 2015-06-04 (public-data-shapes-wg@w3.org from June 2015)

From: Tom Johnson <johnson.tom@gmail.com>
Date: Thu, 4 Jun 2015 10:57:53 -0700
To: Holger Knublauch <holger@topquadrant.com>
Cc: RDF Data Shapes Working Group <public-data-shapes-wg@w3.org>, Corey A Harper <corey.harper@nyu.edu>
Message-ID: <CAJeHiNFGCtFbw5XML+BXZ2VX_JVUaery_PeCf6i9wKM_BVdOyg@mail.gmail.com>

Hi Holger (and all),

I'm a librarian with the Digital Public Library of America, an occasional
participant in Karen Coyle's Dublin Core Application Profiles group, and a
likely SHACL implementer for `ruby-rdf` [1].  I've been lurking on this list
for a while, and now seems as good a time as any to jump in.

This looks really good to me, and resolves a lot of my concerns about the
approaches discussed previously.  A couple of comments are inline.

On Wed, Jun 3, 2015 at 4:16 PM, Holger Knublauch <holger@topquadrant.com>
wrote:

> I thought more about the issue of generic scopes and filters and have come
> up with a variation of Peter's design. Assuming we define
>
> - Scope: takes a graph as input and produces bindings for the focus node
> (?this)
>
>     Graph -> focus nodes
>
> - Constraint: that takes a focus node as input and produces (violation)
> results:
>
>     focus nodes -> results
>

I think "Constraint" has to be defined in terms of `Graph, FocusNodes ->
Results`, yes?  The scope selects the focus nodes from Graph, but the
question answered by a constraint validation is whether the triples in
Graph (as opposed to the triples in the universe) fulfill the constraint.
Am I correct that this is the intent?
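
A minimal sketch of the two signatures, in Python rather than SHACL syntax.
Plain string tuples stand in for triples, and `subjects_with`, `at_most_one`
and `validate` are hypothetical names used only for illustration; nothing
here comes from a SHACL draft.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

# Scope:      Graph -> focus nodes
Scope = Callable[[Graph], Set[str]]
# Constraint: (Graph, focus nodes) -> results
Constraint = Callable[[Graph, Iterable[str]], List[str]]


def subjects_with(predicate: str) -> Scope:
    """Scope selecting every subject that has `predicate` in the graph."""
    return lambda graph: {s for (s, p, _) in graph if p == predicate}


def at_most_one(predicate: str) -> Constraint:
    """Constraint: each focus node has at most one value for `predicate`.

    Only triples in `graph` are consulted, never triples from elsewhere.
    """
    def check(graph: Graph, focus_nodes: Iterable[str]) -> List[str]:
        results = []
        for node in sorted(focus_nodes):
            values = {o for (s, p, o) in graph if s == node and p == predicate}
            if len(values) > 1:
                results.append(f"{node}: {len(values)} values for {predicate}")
        return results
    return check


def validate(graph: Graph, scope: Scope, constraint: Constraint) -> List[str]:
    """Select focus nodes from `graph`, then check them against `graph`."""
    return constraint(graph, scope(graph))
```

The point of the sketch is that `graph` appears on both sides: the same
focus node can pass or fail depending on which triples Graph contains.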

I'm interested in the question of how Graph can be selected; sometimes when
we have discussed "Scope" in the Dublin Core group, this is what we mean.
The language is starting to clarify for me now, and I guess my questions
here are:

   - In a SPARQL context, can we understand Graph to be the union graph of
all the graphs included in the dataset (i.e. G0 ∪ G1 ∪ ... ∪ GN)?
   - In other contexts, does SHACL need mechanisms for selecting graphs
from web sources?
      - If I want to apply a shape to, e.g., an LDP-RS, can I select Graph
to be the one at its URI?
      - What if I want to select Graph as multiple sources; e.g. an LDP-RS
+ some related data in a non-LDP triplestore, or an LDP-RS + a specific
Linked Data Fragment?
   - Assuming that we need such a Graph selection mechanism, can it be
squared with the SPARQL and RDF Dataset concepts, so there only needs to be
one set of formal concepts?
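
As a rough illustration of the "union graph" reading, here is a Python
sketch that builds Graph from a dataset's default graph plus some or all of
its named graphs. The function name `union_graph` and the dict-based dataset
are assumptions made for the example. It treats terms as plain strings, so
it is a set union only; a real RDF merge would also have to keep blank nodes
from different graphs apart.

```python
from typing import Dict, FrozenSet, Iterable, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def union_graph(default_graph: Graph,
                named_graphs: Dict[str, Graph],
                names: Optional[Iterable[str]] = None) -> Graph:
    """Return the union of the default graph and the selected named graphs.

    If `names` is None, every named graph in the dataset is included.
    """
    selected = list(named_graphs) if names is None else list(names)
    merged = set(default_graph)
    for name in selected:
        merged |= named_graphs[name]
    return frozenset(merged)
```

Selecting a subset of names is one way to picture "Graph as multiple
sources": each source would be loaded as a named graph, and a shape would be
validated against the union of the ones chosen.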

> I think we should make Scopes an explicit concept in SHACL's RDF
> vocabulary, similar to how shapes are defined. There would be the following
> class hierarchy:
>
> sh:Scope
>     sh:NativeScope
>     sh:TemplateScope
>
> And native scopes can have sh:sparql (or a JS body etc). Example
>
> # Applies to all subjects that have a skos:prefLabel
> ex:MyShape
>     sh:scope [
>         a sh:NativeScope ; # Optional rdf:type triple
>         sh:sparql """
>                 SELECT DISTINCT ?this
>                 WHERE {
>                     ?this skos:prefLabel ?any
>                 }
>             """
>     ] ;
>     sh:constraint [
>         a ex:UniqueLanguageConstraint ;
>         ex:predicate skos:prefLabel ;
>     ] .
>
> This (common) case above could be turned into a template sh:PropertyScope:
>
> ex:MyShape
>     sh:scope [
>         a sh:PropertyScope ;
>         sh:predicate skos:prefLabel ;
>     ] ;
>     sh:constraint [
>         a ex:UniqueLanguageConstraint ;
>         ex:predicate skos:prefLabel ;
>     ] .
>
> and we could provide a small collection of frequently needed scopes, e.g.
>
> - all nodes in a graph
> - all subjects
> - all nodes with any rdf:type
> - all IRI nodes from a given namespace
>

This all looks good to me.

You have, here, the ability to define scopes without reference to class.
This is important for some simple use cases I have like "check that all
resources with edm:provider x have shape y", and I don't think it was
possible under previous proposals, even with inference.

These scopes are also much more self-contained than the examples in Peter's
message, making it much easier to define portable constraints. We could
even consider possibilities surrounding constraint/scope inheritance for
Shape class hierarchies.


>
> Systems that don't speak SPARQL would rely on the hard-coded IRIs from the
> core vocabulary, such as sh:PropertyScope.


I'm concerned about this last line.  If systems that don't speak SPARQL
need to rely on the core IRI scopes, those systems will have fairly limited
functionality.  I think it really behooves us to dig deeper on the idea of
sh:TemplateScope; I can't see any reason that the vast bulk of Basic Graph
Patterns + filters for scoping couldn't be defined directly in terms of the
SHACL vocabulary.
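
To show that scoping by basic graph pattern does not need a SPARQL engine,
here is a minimal Python sketch. A scope is just a list of triple patterns
in which terms starting with `?` are variables, and the focus nodes are the
values bound to `?this`. The function names are hypothetical, the matcher is
deliberately naive, and filters are left out.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple,
                  binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, else None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against its existing value.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate_bgp(graph: Graph, patterns: List[Triple]) -> List[Binding]:
    """Join the patterns one at a time, starting from the empty binding."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


def focus_nodes(graph: Graph, patterns: List[Triple]) -> Set[str]:
    """Focus nodes are the distinct values bound to ?this."""
    return {b["?this"] for b in evaluate_bgp(graph, patterns) if "?this" in b}
```

With this, the "all resources with edm:provider x" case is the single
pattern `("?this", "edm:provider", "ex:x")`, where `ex:x` is a placeholder
for the provider.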

I'd also suggest inverting the language, making scopes defined without
recourse to SPARQL "native", and calling scopes defined in SPARQL
`sh:SparqlScope` or similar.

> We could now also formally define the scope behind sh:scopeClass (and
> sh:nodeShape):
>
> sh:ClassScope
>     a sh:TemplateScope ;
>     sh:argument [
>         sh:predicate sh:class ;   # Becomes ?class
>         sh:valueType rdfs:Class ;
>     ] ;
>     sh:sparql """
>             SELECT ?this
>             WHERE {
>                 ?type rdfs:subClassOf* ?class .
>                 ?this a ?type .
>             }
>         """ .


> In addition to these scopes, I suggest we turn sh:scopeShape into
> sh:filterShape, and use these filters as pre-conditions that are evaluated
> for a given set of focus nodes. The workflow then becomes:
>
>     - sh:scope produces bindings for ?this
>     - sh:filterShape filters out the values of ?this that do not match the
> given shape
>     - the actual constraints are evaluated
>
> I believe this design provides the flexibility of a generic scoping
> mechanism (as suggested in Peter's design) without getting into the
> complexity of having to analyze SPARQL syntax or rely on hacks with
> rdfs:Resource, while having a user-friendly syntax. The fact that we
> separate sh:Scope from sh:Shape means that we can enforce different,
> explicit semantics on scopes. For example we could allow a sh:Scope to
> encapsulate another SPARQL query that tests whether a given ?this is in
> scope, i.e. the inverse direction of the SELECT query, to optimize
> performance.
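
The three-step workflow quoted above can be sketched in Python as follows.
This is only an illustration under simple assumptions: triples are string
tuples, a filter shape is reduced to a yes/no test on one node, and
`run_shape`, `all_subjects`, `has_type` and `requires` are hypothetical
names, not proposed vocabulary.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]

Scope = Callable[[Graph], Set[str]]
Filter = Callable[[Graph, str], bool]
Constraint = Callable[[Graph, str], List[str]]


def run_shape(graph: Graph, scope: Scope, filter_shape: Filter,
              constraints: Iterable[Constraint]) -> List[str]:
    constraints = list(constraints)
    # 1. The scope produces candidate bindings for ?this.
    candidates = scope(graph)
    # 2. The filter drops candidates that do not match the filter shape.
    focus = {node for node in candidates if filter_shape(graph, node)}
    # 3. The actual constraints are evaluated on the remaining nodes.
    results: List[str] = []
    for node in sorted(focus):
        for constraint in constraints:
            results.extend(constraint(graph, node))
    return results


def all_subjects(graph: Graph) -> Set[str]:
    """Scope: every node that appears as a subject."""
    return {s for (s, _, _) in graph}


def has_type(cls: str) -> Filter:
    """Filter: keep only nodes with an explicit rdf:type of `cls`."""
    return lambda graph, node: (node, "rdf:type", cls) in graph


def requires(predicate: str) -> Constraint:
    """Constraint: the node must have at least one value for `predicate`."""
    def check(graph: Graph, node: str) -> List[str]:
        if any(s == node and p == predicate for (s, p, _) in graph):
            return []
        return [f"{node} is missing {predicate}"]
    return check
```

Nodes removed in step 2 produce no results at all, which is what separates
a filter (a pre-condition) from a constraint (a check that can fail).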



> Thanks,
> Holger
>
>
Best,

--
Tom Johnson
Metadata & Platform Architect
Digital Public Library of America
tom@dp.la
Received on Thursday, 4 June 2015 17:59:08 UTC