Re: shapes-ISSUE-63 (sh:hasShape): Nested shapes: sh:hasShape function versus recursive SPARQL code generation [SHACL Spec] from Peter F. Patel-Schneider on 2015-06-08 (public-data-shapes-wg@w3.org from June 2015)

From: Peter F. Patel-Schneider <pfpschneider@gmail.com>
Date: Mon, 08 Jun 2015 06:21:59 -0700
To: Holger Knublauch <holger@topquadrant.com>, public-data-shapes-wg@w3.org
Message-ID: <557596F7.7050607@gmail.com>
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

On 06/04/2015 03:58 PM, Holger Knublauch wrote:
> On 6/5/2015 4:55, Peter F. Patel-Schneider wrote:
>> One of my concerns with ISSUE-63 is that hasShape may not be
>> implementable in many SPARQL systems.
> 
> It is premature to come to a decision on this topic, but I doubt that
> your point above will outweigh the benefits of the sh:hasShape function.

Opinions vary on this.

As far as I can tell the main benefit of sh:hasShape is that it is used in
http://w3c.github.io/data-shapes/shacl.  There are problems with sh:hasShape
aside from concerns about implementability.

> Yes, there may be some systems where this function is difficult to
> implement, but SPARQL functions are an official extension point of the
> standard, so I would expect that all proper SPARQL APIs provide such a
> hook.

It may be that all proper SPARQL APIs provide the extension hook.  However,
using the extension hook for sh:hasShape is a different matter.  It appears
to me that the extension hook was designed to allow for things like
func:even (see http://www.w3.org/TR/sparql11-query/#extensionFunctions).
Using it to call back recursively into SPARQL processing is something very
different.
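
To make the distinction concrete, here is a minimal sketch in Python rather
than in any real SPARQL API.  The toy store, the match() helper and the two
shape functions are all hypothetical stand-ins.  func_even needs only its
argument, whereas a hasShape-like function needs a handle on the graph and
re-enters the matching machinery for every call.

```python
# Toy triple store: a set of (subject, predicate, object) tuples.
GRAPH = {
    ("ex:a", "ex:child", "ex:b"),
    ("ex:b", "ex:value", 4),
    ("ex:c", "ex:value", 3),
}


def func_even(n):
    """Self-contained extension function: depends only on its argument."""
    return n % 2 == 0


def match(graph, s=None, p=None, o=None):
    """Hypothetical stand-in for the query engine: triple-pattern matching."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def has_even_value(graph, node):
    """Hypothetical shape: the node has at least one even ex:value."""
    return any(func_even(o) for _, _, o in match(graph, s=node, p="ex:value"))


def has_shape(graph, node, shape):
    """hasShape-like function: it cannot be evaluated from its arguments
    alone, because the shape calls back into the query engine."""
    return shape(graph, node)


def child_has_shape(graph, node):
    """Hypothetical nested shape: some ex:child has the has_even_value shape.
    Each child triggers a fresh, nested round of matching."""
    return any(has_shape(graph, child, has_even_value)
               for _, _, child in match(graph, s=node, p="ex:child"))
```

has_shape(GRAPH, "ex:a", child_has_shape) runs one match for the outer
pattern and then a further match per child, which is the re-entrant
behaviour that func:even never needs.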

> The only cases where such a hook would rely on the goodwill of a 3rd
> party would be in closed-source database products.
> 
> We need to distinguish two layers here:
> 
> 1) When SHACL is implemented with a generic graph-based API such as Jena
> or Sesame, and query execution happens against their graph interfaces,
> then the evaluation of functions happens over the iterators produced by
> the simple SPO queries, and is therefore under complete control of the
> Jena/Sesame implementations. This means that in principle we can provide
> SHACL implementations for every database on the planet, as long as they
> have Jena or Sesame drivers, or a SPARQL end point which we can treat as
> a SPO graph. This very much aligns with the notion of datasets.

What happens with query optimization?  Query optimization depends on being
able to see the entire query setup, and using sh:hasShape splits the query
into different pieces.  sh:hasShape is also written in a top-down,
tuple-at-a-time fashion, which may not fit into the query optimization
setup at all.  I suppose that it would be possible to rewrite the query
optimizer to take sh:hasShape into account, but it could be a major, tough
rewrite.
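
As a rough illustration of the tuple-at-a-time concern, the following
Python sketch counts how many lookups reach the store under the two
evaluation styles.  CountingStore, make_triples and the ex: names are
hypothetical; a real optimizer is far more sophisticated, so this only
shows the shape of the problem, not actual costs.

```python
class CountingStore:
    """Toy store that counts how many pattern lookups it receives."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.queries = 0

    def match(self, s=None, p=None):
        self.queries += 1
        return [t for t in self.triples
                if (s is None or t[0] == s) and (p is None or t[1] == p)]


def make_triples(n):
    """n parents, each with one child that is typed ex:Person."""
    children = [(f"ex:p{i}", "ex:child", f"ex:c{i}") for i in range(n)]
    types = [(f"ex:c{i}", "rdf:type", "ex:Person") for i in range(n)]
    return children + types


def tuple_at_a_time(store):
    """Nested check issued once per binding, as a function call would be."""
    result = set()
    for parent, _, child in store.match(p="ex:child"):
        if any(o == "ex:Person"
               for _, _, o in store.match(s=child, p="rdf:type")):
            result.add(parent)
    return result


def set_at_a_time(store):
    """Both patterns visible at once, so they can be evaluated as a join."""
    persons = {s for s, _, o in store.match(p="rdf:type") if o == "ex:Person"}
    return {parent for parent, _, child in store.match(p="ex:child")
            if child in persons}
```

With make_triples(100), the first style issues one lookup for the outer
pattern plus one per child, and the second issues two lookups in total,
while both return the same set of parents.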

> 2) If, for performance reasons, people want to execute SHACL natively on
> a database (which has all named graphs etc set up), then this database
> must already have SHACL implemented, including something like the 
> validateNodeAgainstShape operation. If a vendor went through the effort
> of implementing this operation, then it is a trivial step to also expose
> this operation as a SPARQL function.

Executing SHACL on an integrated SPARQL system does not require changes to
the SPARQL system if a translation technique is used.   Even if it is
possible to call external (i.e., outside of the SPARQL query execution
process) processing from the inside of SPARQL queries it may not be possible
to have this external process call back to the same SPARQL environment.
Even if this is possible, the resultant tuple-at-a-time processing may be
prohibitively expensive in a database setting.
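
A minimal sketch of what I mean by a translation technique, in Python.
This is not the algorithm from any draft; the dictionary encoding of
shapes, the function names and the ex: prefix are assumptions made for
illustration.  A leaf shape requires at least one value for its path, and
a nested shape requires every value of its path to conform to the inner
shape.  The generated query finds the violating nodes.

```python
from itertools import count


def violation_pattern(shape, var, ids):
    """Return a SPARQL graph pattern that matches when `var` violates `shape`."""
    v = f"?v{next(ids)}"  # fresh variable for the values of the path
    nested = shape.get("value_shape")
    if nested is None:
        # Leaf shape: violated when the node has no value for the path.
        return f"FILTER NOT EXISTS {{ {var} {shape['path']} {v} . }}"
    # Nested shape: violated when some value violates the inner shape.
    return f"{var} {shape['path']} {v} . " + violation_pattern(nested, v, ids)


def violations_query(shape, scope_class):
    """Build one flat SELECT query; no function calls back into the engine."""
    body = violation_pattern(shape, "?this", count())
    return ("PREFIX ex: <http://example.org/> "
            f"SELECT DISTINCT ?this WHERE {{ ?this a {scope_class} . {body} }}")


# Hypothetical shape: every ex:child of an ex:Parent must have an ex:name.
SHAPE = {"path": "ex:child", "value_shape": {"path": "ex:name"}}
QUERY = violations_query(SHAPE, "ex:Parent")
```

The whole nested shape ends up in a single query that the SPARQL system
can optimize as a unit.  The sketch only handles non-recursive shapes; a
shape that refers back to itself would not terminate here.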

> This leaves as the only interesting case the scenario where someone uses
> a generic API such as Jena but wants the queries to be natively executed
> on the target database, and the database does not support sh:hasShape. In
> this case, the (Jena) engine can apply some flattening algorithms similar
> to what you suggested in your draft, and eliminate the sh:hasShape
> function calls into a single large query.

I agree that using a single large query is probably the only way to go when
using a production external database.  I think, however, that even if the
production database can be modified, there could still be problems in
providing sh:hasShape support.

> This should clearly be possible for many scenarios, esp
> for the core vocabulary (and sh:valueShape). Engines may want to optimize
> this anyway.  However, I believe such optimizations should be out of
> scope for the WG, because we could quickly double the size of our
> documents and the complexity of the definitions.

The WG documents do not need to include such optimizations in any approach
taken for SHACL, so the fact that they would increase the size of WG
documents does not count against any approach.


> I find the current definitions using sh:hasShape very elegant and
> compact.

I find the current definitions using sh:hasShape full of problems, not
particularly elegant at all, and no more compact than a translation approach.

> Holger

peter
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2

iQEcBAEBCAAGBQJVdZb3AAoJECjN6+QThfjzDdUH/0vWcIT9waZNahr4eJjj1wGn
Adv4yaUldeqmXK0ITI50Xoy0J+stZpHlfbEERpRuYU//0wCadbl35icWt3qqTjKN
DxF6lHVY7LBXU/XTOts3BpuWs0nAEeElpQfSbVbICNHFpf+v4Cd+qqEJObuCar2s
OWAGCGEhulqlT29tnv91yPbb1bDJl7orD5Dhr8ABQDMcTngR/vtqseQmZO9N/GVI
oocyhI8FUMk47mHOAFMg4D+WwIlZEH3S/XKfJ7JlObDk6RjliwGzg27s7Ujcb0J8
AgxqeLrhN+Psf33JSnUWDuBqbe8EWxtsAxBBATMlOFZax4HOQaK2gnb84krVp9w=
=XPZ2
-----END PGP SIGNATURE-----
Received on Monday, 8 June 2015 13:22:33 UTC