Re: review of SHACL document (First pass of responses)

Many thanks, Peter!

Here is my first pass through your comments, starting with the least 
controversial and easiest issues. I will try to handle the remaining 
issues in other emails.

On 9/2/2015 10:40, Peter F. Patel-Schneider wrote:
> Abstract
>
> "SHACL (Shapes Constraint Language) is an RDF vocabulary for describing RDF
> graph structures."
> This is not the purpose of SHACL, at least as far as I am concerned.
> Further, SHACL is not an RDF vocabulary.
> Something along the lines of 'SHACL (...) is a language for checking
> constraints against RDF graphs.'  Subsequent sentences in the abstract
> should be changed to match this.

I believe SHACL does have at least two main purposes: constraint 
checking is one of them, but describing and communicating the structure 
of RDF graphs (e.g. for UI tools and humans) is another. I don't think 
we should drop that aspect, so I have changed the first sentence for now 
to include both view points:

"SHACL (Shapes Constraint Language) is a language for checking 
constraints against RDF graphs and for describing graph structures."


> 1. Introduction
>
> "The SHACL vocabulary includes high-level concepts to represent restrictions
> on predicates used in triples."
> SHACL does not work this way, but instead can be used to check whether
> constraints on the information in an RDF graph are satisfied or not.
> SHACL does not have high-level concepts.
> Replace with "The SHACL language includes high-level constructs to represent
> constraints on the information in RDF graphs."

This would duplicate the first sentence, and the point of that sentence 
was to highlight that some built-ins are about triples. But I agree that 
the term "concepts" is not right. So I went with the following two 
sentences to begin with:

"SHACL (Shapes Constraint Language) is an RDF-based language to 
formulate structural constraints on RDF graphs.
SHACL includes high-level constructs to represent restrictions on 
predicates used in triples."

>
> "Some users and implementors will be content with using the high-level shapes
> language only"
> The idea of partial implementations does not belong here.
> Replace with "Some users will be content with using the high-level language
> only".

Ok.

>
> "SHACL definitions are represented in RDF and can be serialized in multiple
> RDF formats. The example snippets in this document use Turtle [turtle] and
> (not written yet:) JSON-LD [json-ld] notation."
> SHACL does not have definitions.  If there is no JSON-LD yet, then don't
> mention it.
> Replace with 'SHACL is written in RDF and can be serialized in multiple
> RDF formats. The examples in this document use Turtle [turtle]."

Ok.

>
> 1.2 Overview and Terminology of Core Features
>
> This initial example centrally uses a controversial feature - shapes as
> classes.  It needs to be rewritten to not use this feature.  I have not
> further commented on problems in this section related to shapes as classes.

This aspect is covered by an open ISSUE, which we will hopefully resolve 
during the F2F meeting. If the WG decides to delete sh:ShapeClass then 
of course the example will have to be changed too. Meanwhile I believe 
it is helpful for people coming from the OO world or OWL to see familiar 
territory in that example.

>
> "Each ex:Issue must point to exactly one user via the property
> ex:submittedBy and may have one value for ex:assignedTo."
> SHACL doesn't do ``must''.  Instead in SHACL one writes constraints and then
> checks to see whether the constraints are satisfied.
> Replace with 'One shape requires that each instance of ex:Issue
> has exactly one value for ex:submittedBy, that that value is an instance of
> schema:Person that satisfies the constraints on submitters, has at most one
> value for ex:assignedTo, and that that value is an instance of schema:Person.'

Ok.

>
> "Users that have submitted an issue must also have a schema:email address,
> so that they can be notified when the issue has been updated."
> SHACL does not have an facilities for email notification.
> Replace with 'The submitter shape requires at least one value for schema:email.'

Ok.

> "One of the operations that SHACL engines should support validates that a given
> RDF node matches a given shape."
> There should not be discussion of SHACL engines here.  Instead the features
> of the SHACL language should be described.
> Replace with 'One of the operations of SHACL is validating a given RDF node
> against a given shape.'

Ok.

>
> "This operation can be invoked based on any control logic, ...."
> As this section is describing the basic part of SHACL, it
> is not appropriate to talk about external control logic here.   Just remove
> the entire sentence.

Ok.

>
> "Another supported operation"
> Replace with "Another operation".

Ok.

>
> I have previously commented on the problems with the description of
> validation and violations in this part of the document.

I believe I had addressed those... at least I tried my best :)  If there 
are specific left-overs in the latest document, please let me know.

>
> "The following example code"
> I don't see any code here.
> Replace by "The following RDF graph"

Ok.

>
> 2. Shapes
>
> Talk about shapes and classes should be moved to the advanced topics
> sections.

See above. We'll do this after the ISSUE is resolved.

>
> "all instances of the class are expected to have the shape"
> SHACL doesn't do expectations.  All that it can do is validation.

Changed to "then the validation will verify that all instances of the 
class have the shape."

> "shape declarations can specialize the shapes associated with superclasses"
> SHACL does not have any notion of specialization of shapes.
> These paragraphs need to be rewritten to conform with how SHACL works.

I do not understand the problem with the term "specialization" here. All 
constraints for a superclass also apply to a subclass. Could you 
elaborate why you think this is bad or suggest an alternative?

>
> UML diagrams should not be used as readers may not have adequate knowledge
> of UML.

The OWL 2 specification uses UML too. I believe the diagrams are 
sufficiently easy to understand even for people without formal UML 
training. I hope others agree that there is value in keeping them for 
illustration purposes. If we write a Primer, then that might be a better 
place though.

>
> 3. Property Constraints
>
> I find the introduction to this section very confusing.  Focus nodes don't
> have properties.  Property constraints and inverse property constraints
> aren't on the objects or subjects of triples.

I don't see such problems, and would require alternative wording.

>
> If property constraints have to be instances of sh:PropertyConstraint, then
> blank node property constraints also have to be, so they can't be missing
> the rdf:type triple.   This special treatment of blank nodes occurs in other
> places as well and should be eliminated.

This is an open ISSUE, and I have added a link to that ISSUE 70 to the 
document.

> The sentence including "is assumed to be an exhaustive" is not needed and
> only serves to confuse the matter.

Ok.

>
> The wording "A violation must be reported" appears in several places, but it
> is not correct.  Not all constraint violations end up being reported, for
> example, those in negated contexts.

Yes, and I had already fixed that on the branch.

> sh:text should be defined as a datatype, so that it does not need any
> special treatment.

I don't understand that comment. Do you mean that people will write

     "Test"^^sh:text

>
> The treatment of sh:valueClass just says "subclasses" in the description,
> and then provides a non-standard version of subclass.   This discrepancy
> needs to be prominently mentioned.

I don't understand this. To me the use of subclasses and the use of 
rdfs:subClassOf in the TEXTUAL DEFINITION of sh:valueClass look consistent.

>
> In several places properties have default value, e.g., the default value for
> sh:minLength is 0.  There is no reason to even talk about such default
> values as the default value is the value that has no effect.

This had caused confusion in the past, even among WG members. The 
question was whether the default value of sh:minCount should be 1. I 
wanted to mention this explicitly. Also, the TEXTUAL DEFINITION of 
sh:minCount assumes that a value is present, using the default if needed.

>
> In several places properties are mentioned as being optional, e.g.,
> sh:minCount.  There is no need for such discussion, as all the properties in
> the 3.1.x sections are optional.  Mentioning that some of these properties are
> optional begs the question as to whether the other properties are mandatory.

Ok.

> "restrict the string length of triples" needs to be changed to instead talk
> about values.

Ok, changed to "... restrict the string length of objects of triples..."

>
> There is no need to create extra classes for syntactic categories.  For
> example, sh:nodeKind is unnecessary.

Did you mean the class sh:NodeKind? I believe it is helpful to have a 
class for that, e.g. to reuse generic drop down components and other 
algorithms that walk through all members of a class. I have raised an 
ISSUE today though that the class should exclude any other members, 
similar to an owl:oneOf.

>
> The textual definition for sh:valueClass is written the wrong way around.
> Instead of having a mismatched rdf:type value being the triggering condition
> it should be not having a matching rdf:type value.   Untyped non-literals do
> not need special treatment.

Good point, fixed.

>
> Having a simple definition giving access to the correct value sets would
> permit the property constraint bits to be used in both directions.

Yes this was addressed in ISSUE-79.

>
>
> 3.1.12 from version current on 28 August
>
> The textual definition
> is reversed - counting the number of values that do not validate instead of
> those that do validate.

Ok.

>
> The syntax is problematic, as there might be multiple qualified value shapes.

We have ISSUE-83 for that topic, and I have added a link to that.

>
>
> 4. Other Core Constraints on Shapes
>
> These are not constraints on shapes, but are instead just constraints.

Ok, this was a left-over from the days when we had a separate syntax for 
global constraints.

> Evaluation of and and or is in order and short-circuiting.  A note should be
> added that this does not conform to a recent working group resolution.

Ok. I haven't made that change to the SPARQL queries yet; added pointer 
to ISSUE for now.

> The general definition of XOR with multiple arguments is that it is true
> precisely when an odd number of the arguments are true.

This is news to me, but I am not a logician. If you believe this is 
currently wrong, would you mind opening an ISSUE for that?

>
> 4.5 from version current on 28 August
>
> What happens if a SHACL control file adds ignored properties to
> sh:ClosedShape?

It would behave consistently with the rest of the system - once people 
mess around with the system vocabulary, then the behavior may change. I 
don't see how we can prevent that.

>
>
> 5. Scopes and Filter Shapes
>
> In my opinion individual scope links should point from the shape to the
> node.  This would align them with class-based scopes.  It would also mean
> that the property would not need to be excluded when closing shapes.

I have added a link to ISSUE-61 which is about this topic.

>
> There should be a mention in 5.1.2 that there is an issue related to the
> interaction of classes and shapes.

I had already pointed at this ISSUE further up in the document. I hope 
that's sufficient.

>
> 6. Constraint Violations Vocabulary
>
> What happens if a constraint violation node has an rdf:type link to
> rdfs:Resource?  According to the document it will not be a constraint
> violation, but that's crazy.

This will have to be updated once ISSUE-51 is resolved, and this problem 
will then disappear. I have updated the links to that issue and the 
related ISSUE-75.

>
> [From here on in, I have done a less thorough examination.]
>
> 7. General Shape Constraints
>
> How does "MAY reference a focus node" interact with being a constraint?

We cannot technically enforce that every SPARQL query actually uses the 
?this variable. I am not sure what to do about that, but I also don't 
see it as a problem.

> In many cases there will not be a node that can be used to represent the
> entire graph.   How then can graph-level constraints be defined?

Anyone can just make up a node and give it a sh:nodeShape triple. The 
graph resource (with the same IRI as the document) is a natural 
candidate though.

> 12. Functions
>
> There is no method provided for calling a SHACL function from within SPARQL
> code.

Why is this necessary? Functions are a regular extension point of 
SPARQL, and the syntax to call SHACL functions is just like any other 
function. Maybe you are referring to the issue that SHACL requires some 
extensions to normal SPARQL engines.

>
> 14. SPARQL-based Execution
>
> There is an assumption that the SHACL control graph is accessible to the
> SPARQL code.  There is a working group issue related to this, which needs to
> be referenced.

Ok.

> It is not true that sh:sparql values need not declare any prefixes.  There
> could be prefixes there that do not occur elsewhere, and such prefixes would
> have to be declared in the sh:sparql value.

I believe my sentence is formally correct, as some values of sh:sparql 
do not require prefixes.


I have posted an update to my current branch

https://github.com/w3c/data-shapes/commit/48707260a4aba377e8e75ea3c3ee6473a4fa3b74

and will try to look at the other (more complex) changes in the next 
days. For the record, below is a copy of the topics that I have not 
addressed yet.

Thanks again
Holger


TODO:

On 9/2/2015 10:40, Peter F. Patel-Schneider wrote:
>  Review of
>  Shapes Constraint Language (SHACL)
>  W3C Editor's Draft 10 August 2015
>
>
> Summary:
>
> There are many problems with this document.  It has multiple technical
> flaws.  It has undefined terms.  It is misleading.  It does not accurately
> reflect the state of working group deliberations.
>
> I deem it to not be suitable for publication as a first public work draft of
> the working group.  Changing the document to make it suitable would probably
> touch every subsection and almost every paragraph of the initial sections.
>
>
> Overall Comments:
>
> This document is likely to be the first document about SHACL that interested
> parties will read, so I am reviewing it with this in mind.  I'm assuming
> that the reader is quite familiar with RDF and somewhat familiar with
> SPARQL.
>
> I find it very difficult to get a notion of what SHACL is from reading the
> first portions of the document.  Not only are there many incorrect or
> misleading statements, there is no overall description of what SHACL is for
> and the example inappropriately centrally includes shapes as classes.  The
> initial section of the document appears to be trying to be like a
> mini-primer but it doesn't work for this purpose because of the lack of an
> overall description.  Compare this document with Core SHACL Semantics at
> http://w3c.github.io/data-shapes/semantics/ which provides a short but clear
> notion of what that version of SHACL is for.
>
> I have detailed comments on the Abstract and Section 1 below.  I have
> somewhat less detailed comments on later sections.
>
> I fear that releasing the current document as a FPWD will create a large
> amount of confusion.  In my opinion someone needs to do a full rewrite of
> the document before it is suitable for publication.
>
>
> Abstract
>
>
> "Additional constraints can be associated with shapes using SPARQL and
> similar executable languages. These executable languages can also be used to
> define new high-level vocabulary terms."
> Given that there are no provisions for languages other than SPARQL at the
> present time, this feature should not be mentioned in the abstract.
> Instead just use 'Additional constraints can be associated with shapes using
> SPARQL.'
>
> 1. Introduction
>
>
> "express other restrictions in an executable language such as SPARQL"
> There is no need to say that SPARQL is executable.
> Replace with 'express other conditions as SPARQL queries' and adjust the
> wording in the rest of the paragraph accordingly.
>
>
> 1.2 Overview and Terminology of Core Features
>
>
>
> "Each constraint defines a condition that can be validated against a graph."
> Constraints do not work this way.
> There needs to be some high-level description of how SHACL works before
> constraints are described.
>
>
> "a SHACL engine may follow"
> How does ``may'' come into this at all?  This needs to be replaced by some
> description of how SHACL is invoked on the control graph and the data graph.
>
> 1.3 Overview and Terminology of Advanced Features
>
> "The validation of each constraint is formalized with one or more execution
> languages."
> I don't see how anyone who does not already know about SHACL can figure out
> what this is saying.
> Replace with something like 'Each constraint is SHACL is defined in terms of
> a SPARQL 1.1 query.' and then adjust the rest of the paragraph accordingly.
>
>
> 2. Shapes
>
> "A Constraint defines restrictions on the structure of an RDF graph."
> I find this rather misleading.
> "A Shape is a group of constraints that have the same focus nodes."
> There are lots more things that go into a shape.
> Replace the entire paragraph with a concise description of what shapes are
> for that includes scopes and filters.
>
> This entire section needs to be reordered and rewritten to talk about what
> shapes are for and how they work  This description should describe how
> shapes use constraints, scopes, and filters.  Talk about rdfs:label and
> rdfs:comment, if retained, should be moved to a much less prominent place.
>
>
>
> There are a number of SHACL vocabulary terms that show up in the section but
> that are not discussed.  If a vocabulary term is important enough to show
> up here, then it is important enough that it needs to be discussed here.
>
>
> 3. Property Constraints
>
>
> The use of rdfs:label for human-readable properties in context leaves the
> notion of context undefined.  When should tools prefer the labels in
> property constraints?  Similarly for rdfs:comment.
>
> There is also no guidance on even when sh:defaultValue might be used by user
> interface tools.
>
> What are the instances of rdfs:Datatype?  How is it determined whether
> something is an instance of rdfs:Class?
>
> There is no discussion of whether properties can be repeated.  Even if this
> information is presented later, it should also be presented here,
> particularly if some properties can be repeated and others cannot.
>
> Matching a shape is not a defined notion.  [I took a quick look at the
> newest version of the document, and the changes do not help.]
>
> There is no discussion on whether constraint violations inside embedded
> shapes are to be handled specially.   Similar problems occur in several
> other places.
>
>
> 3.1.12 from version current on 28 August
>
> "error-level constraint violations" is not defined.
>
>
> 4. Other Core Constraints on Shapes
>
> There is no indication that violations reported from within sh:NotConstraint
> constraints are to be treated in any special way.  There is no defined
> notion of a node matching a shape.
>
> There is no definition of "error-level constraint violations".
>
> There should be some discussion on the difference between top-level
> constraints and constraints inside an and.
>
>
> 5. Scopes and Filter Shapes
>
> The wording at the beginning of Section 5 reads like it is optional for
> SHACL processors to use scopes.  There needs to be some introductory
> material that discusses how SHACL validation works with scopes
> and filters.
>
> The discussion of class-based scopes needs to be clear that the definition
> of instance is different from that in RDFS or OWL.
>
> There should be a mention in 5.1.2 that there is an issue related to the
> interaction of classes and shapes.
>
> The discussion of rdf:type is incorrect.   Even for classes that are shapes
> there might not be an rdf:type triple linking a resource with its shapes.
> In fact, there is no real notion of the shapes of a node defined in SHACL at
> all.  This wording is another attempt to make SHACL a modelling language.
>
> Are multiple scopes conjunctive, disjunctive, or independent?
>
> The discussion of filter shapes on shapes ("applies to all constraints")
> does not match the discussion at the beginning of the section.
>
>
> 6. Constraint Violations Vocabulary
>
> There is no definition of constraint validation operation or even of
> constraint validation.   [The glossary in the version current as of 28
> August does not define constraint validation or constraint validation
> operation in any useful fashion.]
>
> There is no discussion of how constraint severity works.
>
>
> [From here on in, I have done a less thorough examination.]
>
> 7. General Shape Constraints
>
> The discussion of general shape constraints based on a template does not
> make any sense, as templates have not yet been adequately introduced.
>
> 8. Templates
>
> There is no notion of SHACL instantiation defined.
>
> If the entire core profile is templates, then say so.  If not, say so.
> The wording concerning the relationship between templates and the core
> profile is extremely confusing as it stands.
>
> How are templates accessed?
>
> SHACL doesn't have rules or stored queries.
>
> Is rdfs:subClassOf important for templates?  If so, how?  If not, why the
> restriction related to template superclasses?
>
> 11. Supported Operations
>
> I find this section extremely difficult to understand.   Some of the
> information given here needs to be mentioned much earlier.
>
> SHACL engines MUST support SHACL operations.
>
> All of the operations in this section are missing the control graph as an
> argument.
>
> The operations should have the data graph as an explicit argument.
>
> There is no need to have templates arranged in a class hierarchy.
>
> sh:NativeConstraint is not adequately defined.
>
> There are numerous parts of the pseudo-code that don't make sense, e.g., "is
> at least ?minSeverity".
>
> All of the interface arguments are missing the data and control graphs as
> arguments.  There is nothing on the programming langauge types that are to
> be used.
>
>
> 14. SPARQL-based Execution
>
> There is no indication that SPARQL-based execution cannot be done using a
> standard SPARQL engine.
>
> The values of sh:sparql are not strings that are syntactically valid SPARQL
> queries.  (See the beginning of 14.2 - they can be fragments.  They also can
> be missing prefix declarations.)
>
> There is no notion of the defining graph in SHACL.
>
> There is no indication that executing the SPARQL queries cannot be done in a
> standard SPARQL implementation.
>
> The execution of function (and other template, I expect) bodies is not a
> SHOULD relationship.  Instead it is a MUST unless the alternative produces
> the same results.
>
>
>
> Wording problems that exist in multiple places:
>
> "SHACL RDF vocabulary"
> See above for why this is wrong.
> Replace by 'SHACL Language'.
> Variations of this also exist, e.g., "an RDF vocabulary", "SHACL
> vocabulary".
>
> "restriction"
> See above for why this is wrong.
> Each occurence needs to be examined for a suitable replacement.
>
> "subclass" and "instance"
> These words are used loosely and differently in different places.  Each
> place they are used needs to be examined to ensure either that a standard
> meaning is being used or that the deviation from the standard is prominently
> described.
>

Received on Wednesday, 2 September 2015 09:33:03 UTC