Re: ISSUE-139: uniform descriptions and implementations of constraint components

On 7/06/2016 0:54, Peter F. Patel-Schneider wrote:
> On 06/05/2016 11:14 PM, Holger Knublauch wrote:
>> On 6/06/2016 6:36, Peter F. Patel-Schneider wrote:
>>> Yes, each constraint component should not need more than one implementation,
>>> whether it is in the core or otherwise.
>> While I share the same goal, I don't see how it can work in practice. You have
>> not yet responded to how this would look in cases like sh:hasValue where being
>> forced to use the query snippet for node constraints and property constraints
>> would lead to abysmal performance.
> I don't see how a different implementation of sh:hasValue is going to lead to
> abysmal performance.   The difference is going to roughly be something like
>
> SELECT $this WHERE { FILTER NOT EXISTS { $this $property $hasValue } }
>
> vs
>
> SELECT $this WHERE { FILTER NOT EXISTS { $this $property ?value
>                                  FILTER ( sameTerm(?value,$hasValue) ) } }

As already answered elsewhere, the difference is like O(N) versus 
O(log(N)), and the O(N) case *is* abysmal.
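
To make that concrete, here is a minimal sketch of how a constraint 
component could carry a dedicated property-constraint validator next to 
a generic one, reusing the two queries quoted above. The vocabulary 
(sh:ConstraintComponent, sh:parameter, sh:propertyValidator, 
sh:SPARQLSelectValidator, sh:select, $predicate) only approximates the 
SPARQL extension mechanism and may not match the current draft exactly:

ex:HasValueConstraintComponent
     a sh:ConstraintComponent ;
     sh:parameter [ sh:predicate sh:hasValue ] ;
     # Dedicated property-constraint validator: a direct (S, P, O) lookup
     # that a triple store answers from its indexes, roughly O(log(N)).
     sh:propertyValidator [
         a sh:SPARQLSelectValidator ;
         sh:select """
             SELECT $this
             WHERE { FILTER NOT EXISTS { $this $predicate $hasValue } }
             """ ;
     ] ;
     # Single generic validator: enumerates all values of the property
     # and compares each of them with sameTerm, i.e. roughly O(N) in the
     # number of values.
     sh:validator [
         a sh:SPARQLSelectValidator ;
         sh:select """
             SELECT $this
             WHERE {
                 FILTER NOT EXISTS {
                     $this $predicate ?value .
                     FILTER (sameTerm(?value, $hasValue))
                 }
             }
             """ ;
     ] .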

>
>>>     Otherwise there are just that many more ways of introducing an error.
>> This is IMHO not a strong enough argument to *prevent* multiple queries. It's
>> a nice-to-have. No need to cripple the language just for that. If an extension
>> developer creates a poorly tested and broken extension then it's a bug that he
>> or she needs to fix. That's the same as everywhere.
> I don't see this as crippling the language.  I instead see this as improving
> the language by making it more regular.

Maybe an improvement from the metamodel perspective or from some 
theoretical point of view, but not for the users. I hope we agree that 
usability of the language and the ability to write performant queries 
are more important.

>
>>> Yes, in the current setup each constraint component should be usable in node
>>> constraints, in property constraints, and in inverse property constraints.
>>> Otherwise there is an extra cognitive load on users to figure out when a
>>> constraint component can be used.  The idea is to not have errors result from
>>> these extra uses, though.  Just as sh:minLength does not cause an error when a
>>> value node is a blank node neither should sh:datatype cause an error when used
>>> in an inverse property constraint.  Of course, an sh:datatype in an inverse
>>> property constraint will always be false on a data graph that is not an
>>> extended RDF graph.
>> I completely disagree. Compared with OWL, this policy would mean that any
>> constraint-like property should be applicable everywhere. For example in
>> addition to
>>
>> ex:Person
>>      a owl:Class ;
>>      owl:disjointWith ex:Animal ;
>>      rdfs:subClassOf [
>>          a owl:Restriction ;
>>          owl:onProperty ex:gender ;
>>          owl:maxCardinality 1 ;
>>      ] .
>>
>> the policy that you propose would also allow
>>
>> ex:Person
>>      a owl:Class ;
>>      owl:maxCardinality 1 ;
>>      rdfs:subClassOf [
>>          a owl:Restriction ;
>>          owl:onProperty ex:gender ;
>>          owl:disjointWith ex:Animal ;
>>      ] .
>>
>> Do you remember why the OWL WG did not apply the same policy that you describe?
> In OWL there is a strong limitation on how property restrictions are formed.
> The only allowable property restrictions are the object and data property
> versions of AllValuesFrom, SomeValuesFrom, HasValue, HasSelf, and
> cardinalities.  Each of these has a particular meaning defined in the OWL
> specification.
>
> So
> [        a owl:Restriction ;
>           owl:onProperty ex:gender ;
>           owl:disjointWith ex:Animal ;
>       ]
> is semantically incomplete.  Is it all the ex:gender values that are being
> restricted?  Is it some of them?  Is it some number of them?  Is it something
> else?  There is no way of knowing.
>
> Similarly in OWL the pieces of property restrictions are very constrained in
> where they can occur: each different kind of property restriction has its own
> syntax and the properties that connect a property restriction to its pieces in
> the RDF encoding for OWL are only used for this purpose.  So
> owl:maxCardinality is only used in property restrictions and not elsewhere.

Yes, and SHACL should follow the same policy, because sh:maxCount likewise 
only makes sense for predicate-based constraints and not for node constraints.
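
Spelled out with two made-up shapes (illustrative only): because the 
value nodes of a node constraint are just the focus node itself, 
sh:maxCount in a node constraint is either vacuously true or 
unsatisfiable.

ex:PointlessShape1
     a sh:Shape ;
     sh:constraint [
         sh:maxCount 1 ;   # always satisfied: the only value node is the focus node
     ] .

ex:PointlessShape2
     a sh:Shape ;
     sh:constraint [
         sh:maxCount 0 ;   # violated by every focus node in scope
     ] .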

>
> At one point during its early development OWL did have a syntax closer to that
> of SHACL but this syntax was abandoned because of a desire for a more limited
> syntax with clean boundaries.  There is no syntactic benefit in the current
> OWL syntax to allowing, for example, cardinalities on classes so there was no
> syntactic reason to think of doing so.

Everything you said above confirms the current design of putting syntactic 
restrictions on where constraint parameters can be used.

>
> There are description logics that allow cardinalities on classes but because
> of expressive needs, not syntactic concerns.   This extra expressive power
> often affects the computational complexity of reasoning, so there are few, if
> any, implementations of it.
>
>> In the case of SHACL you would allow something like
>>
>> ex:PersonShape
>>      a sh:Shape ;
>>      sh:maxCount 1 ;
>>      sh:property [
>>          sh:predicate ex:gender ;
>>          sh:closed true ;
>>      ] .
>>
>> (Note the sh:maxCount would be utterly confusing to people, needlessly
>> increasing the cognitive load.)
> Currently in SHACL this would look like
>   ex:PersonShape a sh:Shape ;
>    sh:constraint [
>       sh:maxCount 1 ;
>       sh:shape [ sh:property [
>           sh:predicate ex:gender ;
>           sh:closed true ;
>       ] ] ] .

Sorry, I was already assuming the syntax where node constraints are 
collapsed into shapes. So the example I wanted to express is:

ex:PersonShape
     a sh:Shape ;
     sh:constraint [
         sh:maxCount 1 ;
     ];
     sh:property [
         sh:predicate ex:gender ;
         sh:closed true ;
     ] .



>
>  From the SHACL spec:
> - For node constraints the value nodes are the individual focus nodes, forming
> a set of exactly one node.
> - The property sh:maxCount restricts the number of value nodes.
> This is perfectly well-behaved.
>
> Also from the SHACL spec
> - For property constraints the value nodes are the objects of the triples that
> have the focus node as subject and the given property as predicate.
> Right now sh:closed is couched in terms of the focus node as it is only
> allowed for node constraints.  Using value node instead gives a description
> that is suitable here.

The case of sh:closed highlights just how bizarre the ISSUE-139 proposal 
is. The description and intuitive understanding of sh:closed is basically 
"a focus node may only have values for properties declared in the 
surrounding shape". In the case of node constraints this is intuitive, 
because sh:closed applies to the nodes in the scope of the shape. 
However, when applied as in

ex:MyShape
     sh:scopeClass ex:Person ;
     sh:property [
         sh:predicate ex:firstName ;
         sh:datatype xsd:string ;
     ] ;
     sh:property [
         sh:predicate ex:worksForCompany ;
         sh:closed true ;
     ] .

it would mean that all values of the ex:worksForCompany property must 
only have values for the properties ex:firstName and ex:worksForCompany. 
This is neither intuitive, nor would it make any sense in practice.
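
With some made-up data (illustrative only), that reading would play out 
as follows:

ex:JohnDoe
     a ex:Person ;
     ex:firstName "John" ;
     ex:worksForCompany ex:TopQuadrant .

ex:TopQuadrant
     ex:ceo ex:JaneRoe .

Under the ISSUE-139 reading, ex:TopQuadrant would be reported as 
violating the closed shape simply because it has an ex:ceo triple, i.e. 
a property that is not among ex:firstName and ex:worksForCompany, even 
though those two properties describe persons, not companies.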

>
>> There are good reasons why languages are designed to disallow certain
>> nonsensical statements: they support "compile-time" checking of syntax errors,
>> and enable input forms to suggest relevant properties.
> There is a vast difference between nonsensical and silly.  A nonsensical
> statement is one that cannot be assigned a meaning, as in an OWL property
> restriction missing the part that says how to treat the property values.  A
> statement can be silly for a number of reasons, but silly statements do have
> well-defined meanings.
>
>> I would also appreciate a response to the case of primary keys from my
>> previous email.
> Are you proposing that there should be a constraint component for primary
> keys?  I don't see any description of how this would work in property
> constraints so how can anyone determine how it would work in other
> constraints.

I am disappointed that you "don't see any description of how this would 
work". I had already put a link to the primary key issue into the 
original email; it has also been mentioned in the Use Cases document 
and was discussed last year. Anyway, I have written it up again:

https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Jun/0034.html
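
I won't reproduce that write-up here, but to give a flavour of why such 
a component is inherently tied to a predicate, here is a hypothetical 
uniqueness-style sketch (ex:PrimaryKeyConstraintComponent, ex:primaryKey 
and the query are illustrative only, not taken from the linked mail):

ex:PrimaryKeyConstraintComponent
     a sh:ConstraintComponent ;
     sh:parameter [ sh:predicate ex:primaryKey ] ;
     # Only a property-constraint validator makes sense: the check is
     # about the values of one declared predicate being unique.
     sh:propertyValidator [
         a sh:SPARQLSelectValidator ;
         sh:select """
             SELECT $this ?value
             WHERE {
                 $this $predicate ?value .
                 ?other $predicate ?value .
                 FILTER (?other != $this)
             }
             """ ;
     ] .

A shape would then declare something like sh:property [ sh:predicate 
ex:ssn ; ex:primaryKey true ]. The component is simply meaningless as a 
node constraint or an inverse property constraint, which is exactly the 
kind of information sh:context is meant to capture.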

> If you are not proposing that there be a constraint component
> for primary keys then I don't see any relevance to the discussion here.

You are again contradicting yourself. Earlier you stated that this 
discussion is about the general extension mechanism; now you are trying 
to dismiss this use case as irrelevant. FWIW, I am not proposing to put 
this into SHACL Core, because I am not willing to spend more WG time on 
features that I can just as well solve myself using the extension 
mechanism, and where I would need to go through endless discussions 
about every little detail.

>
>> Using them in inverse property constraints would be
>> meaningless and misleading.
> Well, that depends on what a primary key is supposed to be.  I don't see any
> particular SHACL or RDF reason that primary keys can't be inverse properties
> just as well as non-inverse properties.
>
>> There must be a way for extension developers to
>> indicate for which cases a constraint component can be used.
> Why?  When is this necessary?

Because otherwise it is impossible to develop user-friendly tools that 
suggest under which conditions a constraint component can appear. See 
my description of TopBraid forms in:

https://lists.w3.org/Archives/Public/public-data-shapes-wg/2016Jun/0022.html

>
>> sh:context is
>> playing that role. We could potentially use Shapes and sh:scopeClass instead,
>> but then the meta-shapes would overlap with the actual data shapes.
>>
>> I believe the general problem that we have again and again is that you (Peter)
>> seem to focus on the constraint validation aspect only, while I (and hopefully
>> others) also want workable support for the other use cases of SHACL such as
>> form generation. From a pure validation perspective, it may indeed be possible
>> to formulate queries for all three cases, even if they are dummies. But from a
>> users' perspective this makes no sense. And even from an implementation point
>> of view, forcing all the 3 cases everywhere is an extra burden. I believe
>> cutting down to a maximum of two queries will be acceptable and is easily
>> achievable without the drastic redesign you are proposing.
> Well I certainly do believe that the validation is the primary use of SHACL.
> If SHACL doesn't work, or doesn't work well, for validation then it is not
> going to be used.

I disagree, and I believe the Charter of the WG also disagrees with you:

https://www.w3.org/2014/data-shapes/charter

The *mission* of the RDF Data Shapes Working Group 
<http://www.w3.org/2014/data-shapes/> is to produce a language for 
defining structural constraints on RDF graphs. In the same way that 
SPARQL made it possible to query RDF data, the product of the RDF Data 
Shapes WG will enable the definition of graph topologies for interface 
specification, code development, and data verification.

Notice the focus on "a language for *defining*...". It's just a 
declarative structure that can be used to *describe* graph topologies. 
There are many ways to consume these descriptions.

>
> I see lots of problems with the current design of SHACL as a validation
> language.  I view my proposals as improvements to SHACL.

Some of these ideas have at least triggered people to think outside 
the box, and that may have been helpful. But no, this particular 
proposal would be a massive step backwards for SHACL.

>
>> I wouldn't mind walking through the core vocabulary again to see if we can
>> further generalize some components. For example sh:hasShape could be applied
>> to node constraints too (pending a renaming probably). But there still needs
>> to be a generic context mechanism for extension authors. As a tool developer I
>> need that feature.
> Where is this ability needed?  For a system that limits what kind of SHACL
> constraints can be easily written?  No problem - just use your own property to
> limit what the form generator of the system does.

Yes, I could create my own property such as tq:context. And yes, you 
could also create a property pfps:singleValidatorForAllCases. But this 
totally defeats the purpose of having a standard in the first place. We 
need sh:context in the standard because otherwise nobody will use it, 
and there would be no way to exchange the information (in this case, the 
intent of where a constraint component is meaningful).
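
For concreteness, such a declaration could look roughly like this (the 
value sh:PropertyConstraint is purely illustrative; the point is only 
that the component itself carries machine-readable information about 
where it applies):

ex:PrimaryKeyConstraintComponent
     a sh:ConstraintComponent ;
     sh:parameter [ sh:predicate ex:primaryKey ] ;
     # Tells form builders, editors and validation engines that this
     # component only applies in property constraints.
     sh:context sh:PropertyConstraint .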

Holger


>
>> Holger
> peter
>
>
>
>>> peter
>>>
>>>
>>> On 06/05/2016 09:57 AM, Dimitris Kontokostas wrote:
>>>> So, this goes into the SPARQL extension mechanism, which also affects the
>>>> definition of the  core language and, with what you propose,
>>>> - there should be _only one_ SPARQL query that will address all three
>>>> contexts, and any other contexts we may introduce in the future (e.g. for
>>>> paths)
>>>> - even if it doesn't make sense in some cases or even if it would result in an
>>>> error when used, all contexts will be enabled for all components with this one
>>>> generic SPARQL query, right?
>>>>
>>>> (apologies if you discussed this already on the last telco)
>>>>
>>>> Dimitris
>>>>
>>>>
>>>> On Sun, Jun 5, 2016 at 6:06 PM, Peter F. Patel-Schneider
>>>> <pfpschneider@gmail.com> wrote:
>>>>
>>>>       My recent messages have been about constraint components in general.  Of
>>>>       course, the examples of constraint components that are easiest to
>>>> discuss are
>>>>       the core constraint components, and when discussing core constraint
>>>> components
>>>>       there are also issues related to how they are described in the SHACL
>>>>       specification.
>>>>
>>>>
>>>>       Right now constraint components in both the core and in the extension
>>>> have the
>>>>       potential for tripartite behaviour - one behaviour in node
>>>> constraints, one
>>>>       behaviour in property constraints, and one behaviour in inverse property
>>>>       constraints.  No core constraint components actually work this way at
>>>> present,
>>>>       but the potential is there.  This should be changed so that constraint
>>>>       components, both in the core and in the extension, have a common
>>>> behaviour for
>>>>       node constraints, property constraints, and inverse property constraints.
>>>>
>>>>       Not only should each constraint component have a single behaviour, but
>>>>       constraint components should share common behaviours.  Right now it is
>>>>       possible for a constraint component to do something completely
>>>> different with
>>>>       the property.  For example, a constraint component could decide to use
>>>> the
>>>>       constraint's property in an inverse direction in both inverse property
>>>>       constraints and property constraints or could just ignore the property
>>>>       altogether.
>>>>
>>>>       Further, there should also be a common description of this behaviour
>>>> common to
>>>>       constraint components.  Some core constraint components, e.g.,
>>>> sh:class, are
>>>>       already described using a terminology, namely value nodes, that can
>>>> easily
>>>>       apply to all constraint components.  Other constraint components, such as
>>>>       sh:minCount and sh:equals, are described using different terminology that
>>>>       describes the same thing.  This makes sh:minCount and sh:equals appear
>>>> to be
>>>>       quite different from sh:class.  Either the descriptions should align
>>>> or there
>>>>       should be different syntactic categories for sh:class and sh:minCount.
>>>>
>>>>       It is also possible to resolve this problem by using a different
>>>> syntax for
>>>>       SHACL.  ShEx does this by having a single property-crossing
>>>> construct.  OWL
>>>>       does this by having multiple property-crossing constructs, including
>>>>       ObjectAllValuesFrom and ObjectSomeValuesFrom.  In both ShEx and OWL
>>>> there are
>>>>       many constructs, including the analogues of sh:class, that then just
>>>> work on
>>>>       single entities with no need to worry about focus nodes vs value nodes or
>>>>       properties vs inverse properties.
>>>>
>>>>
>>>>       Along with the problems of differing behaviour and description there
>>>> is also
>>>>       the problem of tripartite implementations, both of core and extended
>>>>       constraint components.  Why should sh:class need three pointers to
>>>>       implementations, even if they are the same?  Why should sh:minCount
>>>> need two
>>>>       (or three) implementations?
>>>>
>>>>       One could say that this doesn't matter at all because SHACL
>>>> implementations
>>>>       are free to implement core constructs however they see fit.  However,
>>>> this
>>>>       implementation methodology is completely exposed for constraint
>>>> components in
>>>>       the extension.  It would be much better if only a single simple
>>>> implementation
>>>>       was needed for each constraint component.  It would also be much
>>>> better if the
>>>>       implementations of constraint components did not need to worry about
>>>> how value
>>>>       nodes are determined.
>>>>
>>>>
>>>>       So my view is that SHACL currently has the worst of all possible
>>>> worlds.
>>>>       Its syntax is complex, because each constraint component has its own
>>>> rules for
>>>>       where it can occur.  Its behaviour is complex, because each constraint
>>>>       component decides how to behave in each kind of constraint.  Its
>>>> description
>>>>       is complex, because different constraint components are described in
>>>> different
>>>>       ways.  Its implementation is complex, because constraint components
>>>> can have
>>>>       up to three different implementations each of which is often more
>>>> complex than
>>>>       necessary.
>>>>
>>>>       peter
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>       On 06/05/2016 06:45 AM, Dimitris Kontokostas wrote:
>>>>       > I was planning to ask for clarifications on this as well
>>>>       >
>>>>       > Is this thread about enabling all contexts in all SHACL Core
>>>> components only
>>>>       > or a suggestion to change the SPARQL extension mechanism in general?
>>>>       > These two can be independent of each other imho.
>>>>       >
>>>>       > Best,
>>>>       > Dimitris
>>>>       >
>>>>       > On Sun, Jun 5, 2016 at 10:10 AM, Holger Knublauch
>>>>       > <holger@topquadrant.com> wrote:
>>>>       >
>>>>       >     Peter, could you clarify whether you were only talking about the
>>>> core
>>>>       >     constraint components and how the spec would define them, or
>>>> about the
>>>>       >     general mechanism? I am not too concerned about how we write things
>>>>       in the
>>>>       >     spec. There is only one SPARQL query per component right now in the
>>>>       spec.
>>>>       >
>>>>       >     Thanks
>>>>       >     Holger
>>>>       >
>>>>       >     Sent from my iPad
>>>>       >
>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Dimitris Kontokostas
>>>> Department of Computer Science, University of Leipzig & DBpedia Association
>>>> Projects: http://dbpedia.org, http://rdfunit.aksw.org,
>>>> http://aligned-project.eu
>>>> Homepage: http://aksw.org/DimitrisKontokostas
>>>> Research Group: AKSW/KILT http://aksw.org/Groups/KILT
>>>>
>>
