Re: shapes-ISSUE-63 (sh:hasShape): Nested shapes: sh:hasShape function versus recursive SPARQL code generation [SHACL Spec]

It may have been convenient to use sh:hasShape for "or", "xor", and "not", but
the results are not correct, as violations are not correctly defined.

sh:hasShape may have uses outside of SHACL, but that can only be a
minor point here.


I'm not against extension functions, so long as there is a clear, clean, and
consistent definition for them.


As far as translating SHACL shapes to single SPARQL queries, I claim that
https://www.w3.org/2014/data-shapes/wiki/Shacl-sparql provides such an
approach.  The translation approach has the distinct advantage that the query
optimization techniques in existing SPARQL engines can be directly applied to
the resultant query.  An approach using a SPARQL extension function will need
special support.
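
For illustration only (the shape and property names below are invented, not taken
from the wiki page, and prefixes are omitted as in the other examples in this
thread): a shape requiring every value of ex:address to conform to a shape that
demands an ex:postalCode can be checked by one query in which the nested
constraint is inlined, so the entire graph pattern is visible to the optimizer
at once:

# Sketch only: property names are invented for illustration.
SELECT ?this ?address
WHERE {
    ?this ex:address ?address .
    FILTER NOT EXISTS { ?address ex:postalCode ?postalCode }
}

The rows returned are the violations; no extension function is involved.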



However, the biggest problem with sh:hasShape is that it makes the
current SHACL spec ill-formed.


peter




On 06/08/2015 04:46 PM, Holger Knublauch wrote:
> On 6/8/2015 23:21, Peter F. Patel-Schneider wrote:
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA256
>>
>> On 06/04/2015 03:58 PM, Holger Knublauch wrote:
>>> On 6/5/2015 4:55, Peter F. Patel-Schneider wrote:
>>>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA256
>>>>
>>>> One of my concerns with ISSUE-63 is that hasShape may not be
>>>> implementable in many SPARQL systems.
>>> It is premature to come to a decision on this topic, but I doubt that
>>> your point above will outweigh the benefits of the sh:hasShape function.
>> Opinions vary on this.
>>
>> As far as I can tell the main benefit of sh:hasShape is that it is used in
>> http://w3c.github.io/data-shapes/shacl.
>
> That is one benefit among others. It was very convenient to have this function
> available when I was adding "and", "xor" and "not" recently. However, the
> function will also be of interest to end users. For example consider
>
> INSERT {
>     ?x rdf:type ex:VeggiePizza .
> }
> WHERE {
>     ?x rdf:type ex:Pizza .
>     FILTER (!sh:hasShape(?x, ex:FoodWithMeatIngredients))
> }
>
> which is a typical classification problem that is currently covered by other
> ontology languages. There will be many similar use cases that people will
> discover as long as we grant them this flexibility.
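
[Aside: assuming, just for the sake of the example, that ex:FoodWithMeatIngredients
means nothing more than "has some ex:ingredient that is an ex:Meat", the same
classification can be written without any extension function by inlining that
condition:

# Sketch: assumes ex:FoodWithMeatIngredients just means "has an ex:Meat ingredient".
INSERT {
    ?x rdf:type ex:VeggiePizza .
}
WHERE {
    ?x rdf:type ex:Pizza .
    FILTER NOT EXISTS {
        ?x ex:ingredient ?ingredient .
        ?ingredient rdf:type ex:Meat .
    }
}

So the use case itself does not depend on the function being available.]
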
>
>>   There are problems with sh:hasShape
>> aside from concerns about implementability.
>>
>>> Yes, there may be some systems where this function is difficult to
>>> implement, but SPARQL functions are an official extension point of the
>>> standard, so I would expect that all proper SPARQL APIs provide such a
>>> hook.
>> It may be that all proper SPARQL APIs provide the extension hook.  However,
>> using the extension hook for sh:hasShape is a different matter.  It appears
>> to me that the extension hook was designed to allow for things like
>> func:even (see http://www.w3.org/TR/sparql11-query/#extensionFunctions).
>> Using it to recursively call SPARQL processing is something that is very
>> different.
>
> Then I guess you would also be against the SPARQL extension functions in the
> current SHACL spec. We'd need to have a serious conversation about dropping
> that - for many SPIN users these are considered the most useful feature of
> all. Like sh:hasShape, they are a way of encapsulating reusable behavior, like
> you would expect from any structured programming language.
>
> Anyway, your argument is totally unconvincing. As long as a function returns a
> value and doesn't write to the data, it can do whatever it likes. There
> are plenty of practical examples such as complex text matching algorithms that
> are exposed via SPARQL functions.
>
>>
>>> The only cases where such a hook would rely on the goodwill of a 3rd
>>> party would be in closed-source database products.
>>>
>>> We need to distinguish two layers here:
>>>
>>> 1) When SHACL is implemented with a generic graph-based API such as Jena
>>> or Sesame, and query execution happens against their graph interfaces,
>>> then the evaluation of functions happens over the iterators produced by
>>> the simple SPO queries, and is therefore under complete control of the
>>> Jena/Sesame implementations. This means that in principle we can provide
>>> SHACL implementations for every database on the planet, as long as they
>>> have Jena or Sesame drivers, or a SPARQL end point which we can treat as
>>> a SPO graph. This very much aligns with the notion of datasets.
>> What happens with query optimization?  Query optimization depends on being
>> able to see the entire query setup, and using sh:hasShape splits the query
>> into different pieces.  sh:hasShape is also written in a top-down,
>> tuple-at-a-time fashion, which may not fit into the query optimization
>> setup at all.  I suppose that it would be possible to rewrite the query
>> optimizer to take sh:hasShape into account, but it could be a major, tough
>> rewrite.
>
> I claim the opposite. Having structured function calls such as sh:hasShape
> allows an engine to do better optimizations than seeing a random (and very
> large) collection of nested sub-queries. It allows the engine to decide
> whether it wants to flatten the nested queries or not (and then do the
> optimizations that you speak of). In some cases it can reuse cached function
> call results from previous invocations.
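
[To make the contrast concrete, using the same invented address example as above:
with the extension function, the nested check sits behind an opaque call,

# Sketch: same invented names; the shape check is an opaque function call here.
SELECT ?this ?address
WHERE {
    ?this ex:address ?address .
    FILTER (!sh:hasShape(?address, ex:AddressShape))
}

whereas the inlined form shown earlier exposes the whole pattern to the query
planner in one piece.]
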
>
> Overall, I believe you are too conservative here. While I agree it's worth
> looking at the current state of the art with RDF databases, it is unfair to
> state that we cannot rely on any extensions of those databases in the future.
>
>>
>>> 2) If, for performance reasons, people want to execute SHACL natively on
>>> a database (which has all named graphs etc set up), then this database
>>> must already have SHACL implemented, including something like the
>>> validateNodeAgainstShape operation. If a vendor went through the effort
>>> of implementing this operation, then it is a trivial step to also expose
>>> this operation as a SPARQL function.
>> Executing SHACL on an integrated SPARQL system does not require changes to
>> the SPARQL system if a translation technique is used.   Even if it is
>> possible to call external (i.e., outside of the SPARQL query execution
>> process) processing from the inside of SPARQL queries it may not be possible
>> to have this external process call back to the same SPARQL environment.
>> Even if this is possible, the resultant tuple-at-a-time processing may be
>> prohibitively expensive in a database setting.
>
> Not sure what you mean by tuple-at-a-time. Nothing in my draft prevents
> SHACL engines from doing the kind of translation techniques that you hint at. The
> difference is that you want to make it mandatory, while I believe this is a
> mistake, and we can produce a cleaner spec and a better user experience if we
> leave these details unspecified.
>
>>
>>> This leaves as the only interesting case the scenario where someone uses
>>> a generic API such as Jena but wants the queries to be natively executed
>>> on the target database, and the database does not support sh:hasShape. In
>>> this case, the (Jena) engine can apply some flattening algorithms similar
>>> to what you suggested in your draft, and flatten the sh:hasShape
>>> function calls into a single large query.
>> I agree that using a single large query is probably the only way to go when
>> using a production external database.  I think, however, that even if the
>> production database can be modified, there could be problems in
>> providing sh:hasShape support.
>>
>>> This should clearly be possible for many scenarios, especially
>>> for the core vocabulary (and sh:valueShape). Engines may want to optimize
>>> this anyway.  However, I believe such optimizations should be out of
>>> scope for the WG, because we could quickly double the size of our
>>> documents and the complexity of the definitions.
>> The WG documents do not need to include such optimizations in any approach
>> taken for SHACL, so the fact that they would increase the size of WG
>> documents does not count against any particular approach.
>
> I don't understand what you are talking about then. You stated you want to
> define a translation approach that produces a single query per shape. Then
> please write down this algorithm. Just doing some hand-waving is no longer
> working. It's trivial to point out problems with other people's work while not
> providing details about the alternative.
>
>>
>>
>>> I find the current definitions using sh:hasShape very elegant and
>>> compact.
>> I find the current definitions using sh:hasShape full of problems, not
>> particularly elegant at all, and no more compact than a translation approach.
>
> So you think this definition of sh:NotConstraint is neither elegant nor compact:
>
> SELECT *
> WHERE {
>     FILTER (sh:hasShape(?this, ?shape, ?shapesGraph)) .
> }
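
[For one concrete, hand-expanded case (the shape is invented for illustration): if
the negated shape merely requires an ex:name value, the translated violation query
for the sh:not constraint is simply the shape's own pattern, with no function call
left over:

# Sketch: hand-expanded for an invented shape that only requires an ex:name value.
SELECT ?this
WHERE {
    ?this ex:name ?name .
}

Nodes returned here satisfy the negated shape and therefore violate the constraint;
this is the sort of expansion the wiki page cited above is meant to cover.]
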
>
> Please let us see your worked out counter-proposal with all the details and
> examples of how your translation is supposed to work. There is no point in
> further discussion without these details.
>
> Holger
>

Received on Wednesday, 10 June 2015 17:08:08 UTC