ISSUE-95: Metamodel simplifications
The purpose of this page is to collect proposals for simplifying the SHACL metamodel.
The less controversial parts of the metamodel have been copied from the SHACL specification and collected into a plain Turtle vocabulary file, shacl-vocab.ttl.
This Turtle file is the official source for the vocabulary and is stored in our GitHub repository along with the SHACL specification.
The Turtle file has been automatically converted to an HTML file, shacl-vocab.html.
Edits MUST be made to the Turtle source file, not the generated HTML file.
One of the primary design goals is to reduce the amount of duplication within the specification in the area of constraints.
At present, there are three main classes of constraints: PropertyConstraints, InversePropertyConstraints, and NodeConstraints.
However, many constraints could have variants for each of these main classes.
Some sort of "mix-in" approach would eliminate the need to repeat each constraint in each class.
We also have the design point that each constraint is injected by virtue of the presence of its parameters. We need to specify this injection mechanism and make it uniformly apply to both built-in constraints and user-defined constraints.
Contents
- 1 Proposal 1
- 2 Proposal 2 (edited by Holger)
- 3 Proposal 3
- 4 Proposal 4
- 5 Meeting minutes SHACL metamodel discussion
Proposal 1
Author: aryman
The SHACL vocabulary defines terms that appear in shapes. These terms are used by people who write shapes.
The SHACL vocabulary also defines terms that may not appear in shapes. Some of these terms are used by people who write extension constraints. We refer to this part of the vocabulary as the metamodel. The metamodel may be used by SHACL processors. It is therefore desirable that the metamodel be easy to understand and process. It is also desirable for the metamodel to be consistent with RDFS modelling practices.
The main area in which we should simplify the metamodel is constraints. We have built-in constraints and extensions. Built-in constraints are defined by the SHACL spec. Extensions are defined by users and require an implementation in SPARQL or another extension language. The metamodel should define the built-in constraints and provide a mechanism for users to define extensions.
Goals for Simplification
We should simplify the vocabulary in the following areas:
- Eliminate the concept of Template since it is a misnomer and we can achieve the same results by annotating Constraint classes.
- Avoid metaclasses, and instead use a flat class/property/individual classification for all terms.
- Eliminate abstract classes since they are primarily an OO programming concept.
- Do not create a separate constraint class for each context in which a constraint may be used, i.e. property constraints, inverse property constraints, and focus node constraints. Instead add annotations to the Constraint class that specify where the constraint may be used.
The spec currently uses the concept of a Template and models both built-in and extension constraints as templates. I propose that we eliminate the concept of a Template. First of all, a template is a text processing concept in which one defines a textual skeleton for some artifact. The skeleton includes some slots where parameters can be filled in. However, the concept of template does not apply in our case. For SPARQL, we define a set of variables that can be used in the SPARQL query. We do this because SPARQL provides no call interface, e.g., functions or subroutines. Other languages such as JavaScript do provide functions, so the SHACL engine would pass the system variables into the function. There are no templates involved. Therefore Template is a misnomer. Second, we already have the concept of a Constraint, so we don't need another concept. We can simply associate the SPARQL or JavaScript with the constraint.
We should avoid the use of metaclasses, since these make the metamodel harder to understand. We should produce a model in which each resource can be classified as either a class, a property, or an individual.
Early versions of the metamodel included abstract classes. This may be good OO design but adds a lot of unnecessary terms to the vocabulary. We should avoid them completely.
Finally, the three different contexts in which a constraint may appear should be defined via annotations of a single Constraint subclass instead of requiring a separate subclass for each context. This will reduce the number of Constraint subclasses by almost a factor of three.
Constraint Contexts
I have used the term context in the preceding discussion but have not defined it precisely. Context is a key concept in SHACL. However, the specification treats it implicitly. We should spell out this concept in the specification.
By context, I mean the set of nodes that a constraint instance takes as input. For example, sh:MinCountConstraint counts the number of nodes in the context and reports a violation if the number of nodes is less than the value given by the sh:minCount parameter of the constraint instance. sh:PatternConstraint matches each node in the context against the regular expression given by the sh:pattern parameter and reports a violation if any of these nodes does not match the pattern.
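The two validation rules just described can be sketched in plain Python, treating a context as a list of node values. This is only an illustration: the function names, the representation of nodes as strings, and the use of Python's `re` module (whose regular expression dialect differs in details from the one SPARQL uses) are assumptions, not part of any SHACL draft.

```python
import re


def check_min_count(context, min_count):
    """Report a violation if the context holds fewer than min_count nodes."""
    if len(context) < min_count:
        return [f"found {len(context)} value(s), expected at least {min_count}"]
    return []


def check_pattern(context, pattern):
    """Report one violation for every node in the context that does not match."""
    regex = re.compile(pattern)
    return [
        f"{node} does not match {pattern}"
        for node in context
        if regex.search(str(node)) is None
    ]
```

Note that the count rule looks at the context as a whole, while the pattern rule looks at each node individually. This is why a count rule is of little use when the context always contains exactly one node.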
In general, a constraint applies a set of validation rules to a context. The context is determined by the focus node and the context type of the constraint. SHACL defines three types of context, which correspond to the properties sh:constraint, sh:property, and sh:inverseProperty. We have introduced the classes sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint, which are each subclasses of sh:Constraint, for these context types. The proposed definition of these classes is as follows:
sh:NodeConstraint is the set of all constraint instances in which the context is computed by taking just the focus node. The context in this case always consists of exactly one node.
sh:PropertyConstraint is the set of all constraint instances in which the context is computed by taking the set of all object nodes C of triples of the form (F, P, C) in the data graph where F is the focus node and P is the term given by the sh:predicate parameter of the constraint. The context in this case consists of zero or more nodes.
sh:InversePropertyConstraint is the set of all constraint instances in which the context is computed by taking the set of all subject nodes C of triples of the form (C, P, F) in the data graph where F is the focus node and P is the term given by the sh:predicate parameter of the constraint.
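A minimal sketch of the three context rules defined above, assuming the data graph is represented as a Python set of (subject, predicate, object) tuples of strings. The function name `compute_context` and the short context-type labels are hypothetical and chosen only for this illustration.

```python
def compute_context(context_type, focus, data_graph, predicate=None):
    """Return the set of context nodes for a focus node F.

    context_type is one of "node", "property", "inverseProperty"
    (illustrative labels for the three proposed context classes).
    """
    if context_type == "node":
        # Node context: always exactly the focus node itself.
        return {focus}
    if context_type == "property":
        # Objects C of triples (F, P, C).
        return {o for (s, p, o) in data_graph if s == focus and p == predicate}
    if context_type == "inverseProperty":
        # Subjects C of triples (C, P, F).
        return {s for (s, p, o) in data_graph if o == focus and p == predicate}
    raise ValueError(f"unknown context type: {context_type}")
```

Each call takes exactly one branch, which mirrors the requirement that every constraint instance has one unambiguous context rule.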
Every constraint instance MUST have an unambiguous context rule associated with it. Therefore sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint are mutually disjoint. It is conceivable that the WG may identify other useful context rules, e.g., those defined using SPARQL property paths. For now, let's assume we have just these three context types.
In this proposal, I introduced the constraint class annotation property sh:context that links a constraint class to the contexts in which it applies. For example, sh:MinCountConstraint is useless in the context sh:NodeConstraint, since the context always contains exactly one node. However, most of the constraints defined by SHACL do make sense in any context.
Holger asked, "Why use sh:context instead of rdfs:subClassOf?". The answer is that the context types are mutually disjoint so using the subclass relation is inappropriate. Recall that if A rdfs:subClassOf B, C then every instance of A is also an instance of B and C. But if B and C are disjoint then they have no instances in common. The relation between a constraint and its context rule is therefore not a subclass relation. Hence, we need to introduce a new property, sh:context, to express this relation.
Since each constraint is composed of a context rule and a set of validation rules that operate on the context, the metamodel might be clearer if we introduced subclasses of sh:Constraint as follows:
# context rules
sh:ContextConstraint rdfs:subClassOf sh:Constraint .
sh:NodeConstraint rdfs:subClassOf sh:ContextConstraint .
sh:PropertyConstraint rdfs:subClassOf sh:ContextConstraint .
sh:InversePropertyConstraint rdfs:subClassOf sh:ContextConstraint .

# validation rules
sh:ValidationConstraint rdfs:subClassOf sh:Constraint .
sh:MinCountConstraint rdfs:subClassOf sh:ValidationConstraint .
sh:MaxCountConstraint rdfs:subClassOf sh:ValidationConstraint .
sh:PatternConstraint rdfs:subClassOf sh:ValidationConstraint .
...
Proposed Metamodel
A general SHACL constraint is actually a conjunction of zero or more basic constraints. The presence of a basic constraint is indicated by the presence of its associated properties, which define input parameters that specialize the constraint. To illustrate this, consider the following example:
ex:AliceShape
    a sh:Shape ;
    sh:property [
        sh:predicate ex:name ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
        sh:nodeKind sh:Literal ;
        sh:pattern "^Alice" ;
        sh:flags "i"
    ] .
- the object (a blank node) of sh:property is an instance of the class sh:PropertyConstraint which is a subclass of sh:Constraint
- the presence of the sh:minCount property on the constraint instance indicates that the instance is also an instance of sh:MinCountConstraint which is also a subclass of sh:Constraint
- the object "1" of the sh:minCount property is a parameter that is used in the definition of sh:MinCountConstraint
- the presence of sh:maxCount implies the constraint instance is also an instance of the class sh:MaxCountConstraint parameterized by the value "1"
- the presence of sh:nodeKind implies the subclass sh:NodeKindConstraint
- the presence of sh:pattern implies the subclass sh:PatternConstraint
- the property sh:flags is an optional parameter used by sh:PatternConstraint
We can model these constraints as follows.
First consider sh:MinCountConstraint.
sh:MinCountConstraint
    a rdfs:Class ;
    rdfs:subClassOf sh:ValidationConstraint ;
    sh:parameter sh:minCount .
The sh:parameter property is a proposed class annotation property. It is multi-valued and indicates which parameters are associated with the constraint. If all of these parameters are present in a sh:Constraint instance, then this indicates that the instance is an instance of the associated sh:Constraint subclass, in this case sh:MinCountConstraint.
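The injection rule described above (an instance belongs to a constraint class when all of that class's declared parameters are present) can be sketched as a subset test. The dictionary below stands in for sh:parameter declarations. Only the sh:MinCountConstraint entry is taken from the declaration shown above; the other entries assume analogous declarations, and the function name is hypothetical.

```python
# Stand-in for sh:parameter annotations: constraint class -> required parameters.
# Entries other than sh:MinCountConstraint are assumed for illustration.
CONSTRAINT_PARAMETERS = {
    "sh:MinCountConstraint": {"sh:minCount"},
    "sh:MaxCountConstraint": {"sh:maxCount"},
    "sh:NodeKindConstraint": {"sh:nodeKind"},
    "sh:PatternConstraint": {"sh:pattern"},
}


def implied_constraint_classes(instance_properties, declarations=CONSTRAINT_PARAMETERS):
    """Return the constraint classes whose required parameters are all present."""
    present = set(instance_properties)
    return sorted(
        cls for cls, required in declarations.items() if required <= present
    )
```

Applied to the property names used in the ex:AliceShape example, this yields all four classes. A property such as sh:flags on its own implies nothing, because it is not a required parameter of any class in the table.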
Some constraints also have optional parameters, e.g., sh:PatternConstraint has the option sh:flags. If an optional parameter is present, then it modifies the definition of the constraint.
Optional parameters are declared by the proposed multivalued sh:option class annotation property.
sh:PatternConstraint
    a rdfs:Class ;
    rdfs:subClassOf sh:ValidationConstraint ;
    sh:parameter sh:pattern ;
    sh:option sh:flags .
Not all constraints apply to each of the three contexts. To declare where a constraint applies, use the proposed multivalued sh:context class annotation property.
sh:MinCountConstraint does not apply to the focus node context. sh:PatternConstraint applies to all three contexts.
sh:MinCountConstraint
    sh:context sh:PropertyConstraint , sh:InversePropertyConstraint .

sh:PatternConstraint
    sh:context sh:PropertyConstraint , sh:InversePropertyConstraint , sh:NodeConstraint .
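As a rough illustration of how a processor might consult sh:context annotations, the sketch below mirrors the two declarations above in a lookup table and checks whether a constraint class may be used in a given context. The table layout and the function name `applies_in` are assumptions for this example only.

```python
# Stand-in for the sh:context annotations shown above.
CONSTRAINT_CONTEXTS = {
    "sh:MinCountConstraint": {
        "sh:PropertyConstraint",
        "sh:InversePropertyConstraint",
    },
    "sh:PatternConstraint": {
        "sh:PropertyConstraint",
        "sh:InversePropertyConstraint",
        "sh:NodeConstraint",
    },
}


def applies_in(constraint_class, context_class, contexts=CONSTRAINT_CONTEXTS):
    """True if the constraint class declares the given context via sh:context."""
    return context_class in contexts.get(constraint_class, set())
```

A processor could use such a check to reject, or simply ignore, sh:minCount on a focus node constraint.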
The three class annotation properties (sh:parameter, sh:option, and sh:context) are all that are needed to declare the built-in constraints.
Of course, each constraint should also have other RDFS properties, especially rdfs:comment, to give a prose description of the constraint.
We can provide machine-processable information about the constraint parameters by creating shapes for the constraints. However, these do not replace the normative specification.
For extension constraints, we also need to provide the definitions in the extension language.
For SPARQL, use the following single-valued properties: sh:propSparql, sh:invPropSparql, and sh:nodeSparql.
We should also use these properties to provide compliant SPARQL implementations of the built-in constraints.
ex:MyConstraint
    a rdfs:Class ;
    rdfs:subClassOf sh:ValidationConstraint ;
    sh:parameter ex:myParameter ;
    sh:context sh:PropertyConstraint , sh:InversePropertyConstraint , sh:NodeConstraint ;
    sh:nodeSparql "SELECT ..." ;
    sh:propSparql "SELECT ..." ;
    sh:invPropSparql "SELECT ..." .
Summary
- sh:Constraint is the class of all constraints
- each built-in and extension constraint is a subclass of sh:Constraint
- there are three contexts for constraints: sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint
- the required and optional parameters for a constraint are declared using sh:parameter and sh:option
- the context for a constraint is declared using sh:context
- extension constraints implemented using SPARQL provide context-dependent implementations using sh:nodeSparql, sh:propSparql, and sh:invPropSparql
The following RDF terms have been either mentioned in the spec or have been discussed before:
- sh:Constraint
- sh:PropertyConstraint
- sh:InversePropertyConstraint
- sh:NodeConstraint
The following RDF terms are new terms associated with this proposal:
- sh:parameter
- sh:option
- sh:context
- sh:propSparql
- sh:invPropSparql
- sh:nodeSparql
- for each built-in constraint, its associated class, e.g., sh:MinCountConstraint, sh:PatternConstraint
The Shape of a Constraint Subclass
Holger asked: "How can we provide metadata about the shape of a subclass of sh:Constraint, i.e., How would an editor know to allow sh:parameter, etc., on sh:MinCountConstraint, sh:PatternConstraint, etc.?"
Holger proposed introducing a new metaclass sh:ConstraintType, making constraints instances of this instead of rdfs:Class, and making it the scope of a shape.
This would certainly work.
However, there is no need to introduce a new metaclass and use the sh:scopeClass mechanism because we can directly link any shape to any resource.
Of course, we need to define a shape. The WG resolved that shape information goes into a separate Turtle file, i.e., not included in shacl-vocab.ttl.
This requirement for a shape is common to any solution.
In Proposal 1, I refer to the properties of the subclasses of sh:Constraint as class annotations.
Let sh:ConstraintAnnotations be the associated shape. It looks like:
sh:ConstraintAnnotations
    a sh:Shape ;
    sh:property [ sh:predicate sh:parameter ; ... ] ;
    sh:property [ sh:predicate sh:option ; ... ] ;
    ...
For each annotated constraint we add a triple:
sh:ConstraintAnnotations sh:scopeNode sh:MinCountConstraint, sh:MaxCountConstraint, ..., sh:PatternConstraint.
Proposal 2 (edited by Holger)
THIS PROPOSAL WAS AN INTERMEDIATE STEP AND IS NOW WITHDRAWN, SEE PROPOSAL 3 FOR AN UPDATE.
In the absence of a full proposal, here are links to the relevant emails for now:
- http://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Nov/0163.html
- http://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Nov/0165.html
Here is a link to the vocabulary extension that I am working on:
Note that this has replaced the term Argument with Parameter, and Templates have been removed in favor of Validators (which are Parameterizable).
An example definition of a constraint type (same syntax for extensions, but this here is for the actual sh:equals constraint):
sh:EqualsConstraint
    a sh:ConstraintType ;
    rdfs:subClassOf sh:Constraint ;
    sh:parameter [ sh:predicate sh:equals ] ;
    sh:message "Value sets of {?predicate} and {?equals} must be equal" ;
    sh:propertyValidator [
        a sh:SPARQLValidator ;
        sh:sparql "SELECT ..." ;
    ] .

sh:PropertyConstraint rdfs:subClassOf sh:EqualsConstraint .
Another example: sh:class is reusing a node validation function (sh:hasClass)
sh:ClassConstraint
    a sh:ConstraintType ;
    rdfs:subClassOf sh:Constraint ;
    sh:parameter [ sh:predicate sh:class ] ;
    sh:message "Values must be instances of {?class}" ;
    sh:propertyValidator sh:hasClass ;
    sh:inversePropertyValidator sh:hasClass ;
    sh:nodeValidator sh:hasClass .

sh:PropertyConstraint rdfs:subClassOf sh:ClassConstraint .
sh:InversePropertyConstraint rdfs:subClassOf sh:ClassConstraint .
sh:NodeConstraint rdfs:subClassOf sh:ClassConstraint .
Differences from Arthur's Proposal 1
1) Metaclass or not: For the constraint types (such as sh:ClassConstraint) Arthur suggests to just use rdfs:Class. I suggest to use a subclass of rdfs:Class (e.g., called sh:ConstraintType) because this allows us to have rdfs:domain assignments for the various properties (e.g., sh:parameter and sh:propertyValidator). Such domain statements are beneficial to drive UI and set expectations about what properties such constraint types must have. The usual example is TopBraid-like input forms - if sh:ClassConstraint is just an rdfs:Class, how would I know which properties can be applied? An alternative to metaclasses might be to define a sh:Shape, with a suitable scope. The scope could be "all subclasses of sh:Constraint" but then it becomes very hard to use this metadata. In general, I don't see why we would want to avoid this one metaclass at all costs. There is nothing complex about it, but instead it makes the design more explicit. This feels like a matter of personal taste only.
2) Tree under sh:Constraint: We agree that the various constraint types (sh:ClassConstraint, etc.) should be rdfs:subClassOf sh:Constraint. How does that class relate to sh:PropertyConstraint, sh:InversePropertyConstraint and sh:NodeConstraint though? This is not clarified in Arthur's. I guess these are also subclasses of sh:Constraint (agreed). But then he introduced a new property sh:context where he could have simply used rdfs:subClassOf, and all problems solved:
sh:Constraint
    sh:ClassConstraint
        sh:PropertyConstraint
        sh:InversePropertyConstraint
    sh:PatternConstraint
        sh:PropertyConstraint
        sh:NodeConstraint
    ...
I understand some people have raised concerns against this subclassing, but it has several advantages. Relying on yet another "inheritance-like" mechanism such as sh:context is not going to be supported by tools and any other existing algorithm, so it creates unnecessary costs and barriers. It also would be much harder to validate the shapes graph, because the system wouldn't know to apply the constraints of sh:PatternConstraint to sh:PropertyConstraint instances.
3) Validators: This is hopefully easy to resolve. Arthur proposes directly linking a constraint type with a SPARQL string, but I suggest to introduce an intermediate class sh:Validator with one subclass sh:SPARQLValidator. This has two advantages:
- The validator is an object that can carry additional properties (e.g., we will need this for JavaScript support to point at function call + source file, and then there are annotation properties that Dimitris wanted, as well as other things like rdfs:comments or selectors for different platforms such as a marker for TopBraid-optimized queries or other metadata for execution).
- In Arthur's current proposal, each of nodeSparql, propSparql, invPropSparql need their own SELECT query. But in many cases the query is almost identical and only differs in the injection of the value to validate. In my design a validator may be a Function returning a boolean. Functions can be backed by a simple ASK and the engine turns this into the correct SELECT query based on the context.
4) Declaration of parameters: (BTW I agree that the name Template is no longer appropriate and Parameter is better than Argument). Arthur's design has sh:parameter pointing at properties directly. In my design, sh:parameter points at instances of sh:Parameter, which have additional values including most of the constraint properties such as sh:predicate, sh:datatype, sh:nodeKind, sh:name, sh:description as well as sh:optional. I do not believe that a single property is sufficient, even though it may appear to work for the core vocabulary. In my design (inherited from SPIN), the sh:Parameters are useful to validate a shapes graph. The fact that we are using rdfs:domain and rdfs:range does not scale to the web where namespaces and URIs are being reused in multiple classes. As you know it's impossible to use rdfs:domain for multiple classes nor to use rdfs:range for different value types depending on the context class. So in most cases, sh:parameter would need to be combined with a sh:property constraint which just repeats the same reference to the property, leading to a disconnect that is adding extra maintenance burden. Finally, parameters automatically have maxCount=1.
OTOH, the addition of sh:Parameter requires a bit of extra code to look for relevant properties, so this issue is not completely decided for me yet.
(BTW parameters are also used in functions, and there the parameters must be ordered, e.g., using sh:order).
(Also, if accepted, then sh:optional should be renamed to sh:optionalParameter.)
Proposal 3
This proposal has evolved from the discussion between Arthur, Simon and Holger on 2016-02-10.
Technical details for this proposal can be found in the Turtle file [shacl-vocab-hk.ttl], which is reasonably complete pending cosmetics and comments.
Constraints and Constraint Components
Each sh:Shape can have three constraint properties: sh:property, sh:inverseProperty, and sh:constraint.
The values of these properties are instances of the subclasses of sh:Constraint, which form the following hierarchy:
sh:Constraint
    sh:AbstractPropertyConstraint       # (defines sh:predicate)
        sh:PropertyConstraint           # (default type for sh:property)
        sh:InversePropertyConstraint    # (default type for sh:inverseProperty)
    sh:NodeConstraint                   # (default type for sh:constraint)
Extensions can define additional subclasses, such as sh:SPARQLConstraint.
There is also a class sh:Parameter which may or may not be a subclass of sh:AbstractPropertyConstraint (pending discussions).
In the core language, users only ever instantiate sh:PropertyConstraint, sh:InversePropertyConstraint, and sh:NodeConstraint.
However, these constraint instances can combine properties from multiple constraint components.
Example built-in constraint components are sh:ClassConstraintComponent and sh:PatternConstraintComponent.
Anyone can define their own constraint components using exactly the same mechanism as the built-in ones.
For example, a sh:PropertyConstraint may be
ex:MyShape
    sh:property [
        sh:predicate ex:myProperty ;
        sh:class ex:Person ;
        sh:pattern "^urn:something:[A-Z]*" ;
    ] .
A processor looking at the shape above can figure out what to do by going through the declared instances of the class sh:ConstraintComponent.
Each sh:ConstraintComponent defines a set of so-called parameters (such as sh:pattern), some of which may be optional (such as sh:flags).
Each sh:ConstraintComponent can also contain machine-readable instructions on how to evaluate a constraint based on the values of the parameters.
In this design, the core language is just one possible vocabulary among others; anyone can add their own constraint properties.
Here is an example from the Turtle file (without labels and comments):
sh:PatternConstraintComponent
    a sh:ConstraintComponent ;
    sh:scopeClass sh:NodeConstraint, sh:PropertyConstraint, sh:InversePropertyConstraint ;
    sh:parameter [
        sh:predicate sh:pattern ;
    ] ;
    sh:parameter [
        sh:predicate sh:flags ;
        sh:optional true ;
    ] .

sh:pattern
    a rdf:Property ;
    rdfs:range xsd:string .
Since each sh:ConstraintComponent defines a grouping of properties for certain nodes, we could regard them as shapes.
In the design above, the class sh:ConstraintComponent is a subclass of sh:Shape, and the values of sh:parameter are very similar to those of sh:property.
The difference is that parameters can have at most one value and do not have sh:minCount/sh:maxCount, but most other property characteristics also make a lot of sense for parameter declarations.
Tools and algorithms written for SHACL can be reused if parameters are declared this way.
For example, someone could define their own extension constraint component, to ensure that all values of a property are in a certain language:
ex:LanguageConstraintComponent
    a sh:ConstraintComponent ;
    sh:scopeClass sh:PropertyConstraint ;   # Only applies to sh:property
    sh:parameter [
        sh:predicate ex:language ;
        sh:name "language" ;
        sh:description "The language tag that all values of the property must have, e.g. 'en'." ;
        sh:datatype xsd:string ;
        sh:minLength 2 ;
        sh:maxLength 2 ;
        sh:pattern "[a-z][a-z]" ;
    ] .

ex:MyShape
    a sh:Shape ;
    sh:property [
        sh:predicate ex:germanLabel ;
        ex:language "de" ;
    ] .
OPTION a: We could use sh:context instead of sh:scopeClass above, if we don't want to declare constraint types as shapes:
ex:LanguageConstraintType
    a sh:ConstraintType ;
    sh:context sh:PropertyConstraint ;      # Only applies to sh:property
    sh:parameter [
        sh:predicate ex:language ;
    ] .
OPTION b: Or we could do the opposite and not have sh:parameter but instead use vanilla sh:property straight away:
ex:LanguageConstraintType
    a sh:ConstraintType ;
    sh:scopeClass sh:PropertyConstraint ;   # Only applies to sh:property
    sh:property [
        sh:predicate ex:language ;
    ] .
Validators
Apart from the declaration of parameters, the other part of a sh:ConstraintComponent is a set of validators.
A validator provides machine-readable instructions on how to produce validation results from the given parameter values (and the focus node).
For each possible context in which a sh:ConstraintComponent is used (sh:property, sh:inverseProperty, and sh:constraint), different validators can be used.
The properties sh:propertyValidator, sh:inversePropertyValidator, and sh:nodeValidator point at instances of sh:Validator:
sh:Validator
    sh:NodeValidationFunction    # (for simple boolean tests of all focus nodes)
    sh:SPARQLValidator           # (for arbitrary tests)
    ... other executable languages
Extending the example above:
ex:LanguageConstraintComponent
    a sh:ConstraintComponent ;
    sh:scopeClass sh:PropertyConstraint ;   # Only applies to sh:property
    sh:parameter [ sh:predicate ex:language ; ... ] ;
    sh:propertyValidator [
        a sh:SPARQLValidator ;
        sh:message "All values must have the language tag '$language'" ;
        sh:sparql "SELECT $this WHERE { $this $predicate ?value . FILTER (lang(?value) != $language) }" ;
    ] .
This design means that only a single ConstraintComponent needs to be declared for each of the three possible use cases.
Furthermore, multiple validators for different environments or executable languages can be provided.
In many cases (such as in the language example above), a simple boolean test over all values is sufficient. This proposal includes node validation functions (similar to the official draft), allowing a compact definition such as:
sh:ClassConstraintComponent
    a sh:ConstraintComponent ;
    sh:scopeClass sh:PropertyConstraint, sh:InversePropertyConstraint, sh:NodeConstraint ;
    sh:parameter [ sh:predicate sh:class ] ;
    sh:propertyValidator ex:hasClass ;
    sh:inversePropertyValidator ex:hasClass ;
    sh:nodeValidator ex:hasClass .
with complete code-reuse between all three contexts.
In the example above, ex:hasClass is declared as a sh:NodeValidationFunction which embeds a SPARQL ASK query for an input node.
The engine can figure out the rest from the shapes graph:
- For a given constraint (e.g., a sh:PropertyConstraint), find all applicable sh:ConstraintComponents using sh:scopeClass (or sh:context pending discussions)
- From those sh:ConstraintComponents select those where all non-optional parameters are present
- From those sh:ConstraintComponents, find the most suitable validator in the given context
- Execute the validator using the parameter values as input.
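The four engine steps listed above can be sketched as follows. This is a simplified illustration, not the behaviour of any existing SHACL engine: the `ConstraintComponent` dataclass, its field names, and both functions are hypothetical; validators are plain Python callables rather than SPARQL; and "most suitable validator" is reduced to a lookup keyed by the constraint class.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

# A validator takes the value nodes and the parameter values, returns violations.
Validator = Callable[[List[str], Dict[str, str]], List[str]]


@dataclass
class ConstraintComponent:
    name: str
    scope_classes: Set[str]                 # stand-in for sh:scopeClass
    required: Set[str]                      # non-optional parameters
    optional: Set[str] = field(default_factory=set)
    validators: Dict[str, Validator] = field(default_factory=dict)


def select_components(constraint_class, constraint_props, components):
    """Steps 1 and 2: filter by scope, then by presence of required parameters."""
    present = set(constraint_props)
    return [
        c for c in components
        if constraint_class in c.scope_classes and c.required <= present
    ]


def run_constraint(constraint_class, constraint_props, components, value_nodes):
    """Steps 3 and 4: pick the validator for this context and execute it."""
    results = []
    for comp in select_components(constraint_class, constraint_props, components):
        validator = comp.validators.get(constraint_class)
        if validator is None:
            continue  # no validator declared for this context
        params = {
            p: constraint_props[p]
            for p in comp.required | comp.optional
            if p in constraint_props
        }
        results.extend(validator(value_nodes, params))
    return results
```

Because selection is driven only by which parameters are present, built-in and extension components go through the same code path.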
Discussion
This design is trying to address both Arthur's and Holger's concerns from the discussion.
Arthur wants to avoid metaclasses where possible, wants no subclassing between sh:MinCountConstraint and sh:PropertyConstraint, and suggested using shapes for the mix-in mechanism.
Holger wants to make sure that describing constraint components uses the same mechanisms as other data structures so that constraints can be edited, analyzed, and validated with the same algorithms as other data.
Observing that the constraint components are never instantiated anyway, they don't need to be classes.
This is resolving the controversial issues of metaclasses and subclassing.
As an added bonus, this approach also provides a richer syntax to express parameters, e.g., their value types (and the sh:order for functions), resolving issue 4) from Holger above. The constraint components are shapes, which consistently ensures that "if a property constraint has values for sh:pattern, then these values must be xsd:strings". It however does not require that these properties MUST be present at all sh:PropertyConstraints, because they do not enforce a cardinality constraint. (The class sh:ConstraintComponent could ensure sh:maxCount=1 on all parameters, but that's optional).
- KC: I find the "constraint, constraint component, parameter" concepts 1) to be deeply non-intuitive (what is called a parameter here is what one would normally call a constraint) and 2) there seems to be a great deal of overlap between constraint components and parameters that I am not sure is necessary. cf "Constraint Component: sh:ClassConstraintComponent; Parameters: sh:class" or "Constraint Component: sh:ClassInConstraintComponent; Parameters:sh:classIn". They all go like that. Unless the constraint component classes are absolutely necessary for the functionality of the language, then I think it should be simplified. I would prefer a "constraint group" (what is now "constraint"), and "constraints" (what is now "parameter"). There can be conceptual types of constraint groups that aid the reader of the document; that doesn't mean that a class must be defined. Karen Coyle (talk) 17:34, 26 April 2016 (UTC)
- HK: Did you look at sh:pattern/sh:flags? The complexity is partially a by-product of scenarios in which multiple parameters are bundled together. This is particularly important for extensions.
- KC: Holger, I looked at section 3.4.3 sh:pattern, is that what you were referring to? That doesn't explain it to me, but I am assuming that this is a function of your application. The question then becomes whether there are other solutions that don't require this redundancy everywhere. I guess that becomes a question for Peter since he has a different application. I believe Dimitris also has an application, so I will ask him as well. Karen Coyle (talk) 17:00, 27 April 2016 (UTC)
- HK: I do not understand what you mean. There is nothing application-specific in the definition of sh:pattern. It was just one example among others that have multiple parameters. Qualified value shapes and closed shapes are others. Maybe move this to emails?
- pfps: Karen, you might want to target these comments on the SHACL document itself, as these concepts show up there as well. See "parameters of constraint components" in Section 2.3 of that document.
- KC: Holger, thanks, I will look at that. Peter, yes, it really stands out in the document. I find the "Constraint" area quite confusing. I tried to read through it to make sure that it used the terms constraint and constraint component consistently but got totally lost. I have suggested to Holger and Dimitris that we need definitions up front in the sections 2.3 and 3 (plus I think section 2.3 should be 3 and 3 should be 3.1...) to make this clearer. Karen Coyle (talk) 14:46, 27 April 2016 (UTC)
- HK: Karen, I have tried to use the example of the beginning of the document to introduce the key terminology (feedback appreciated): https://github.com/w3c/data-shapes/commit/b14f517fa93ebb121d193e3d635ec89db2c1bcca And yes, our goal is to use the terminology consistently. We may not have succeeded yet, so any specific pointers will help us make a better job.
Proposal 4
This proposal collapses shapes and constraints, and does away with the different kinds of constraints. More information can be found at Refactor and Refactor metamodel.
Shape designers create SHACL instances of sh:Shape. These instances use the core SHACL components to put constraints in shapes. Each such constraint takes the form of a single property on the shape. There is no need for any distinction between the various constraints that go in a shape.
sh:Shape a rdfs:Class .
An example shape (with complete typing) is given here:
ex:MyShape a sh:Shape ;
    sh:propValues ( ex:myProperty
        [ a sh:Shape ;
          sh:class ex:Person ;
          sh:in ( ex:Susan ex:Bill ) ] ) .
The extension language is driven by component templates, which are SHACL instances of sh:ComponentTemplate. These templates are also properties and shapes. The SPARQL code for a template is given by sh:sparqlTemplate.
sh:ComponentTemplate a rdfs:Class ;
    a rdf:Property ;
    a sh:Shape .
sh:sparqlTemplate a rdf:Property ;
    rdfs:range xs:string .
Component templates then look like:
sh:class a sh:ComponentTemplate ;
    sh:nodeKind sh:IRI ;
    sh:sparqlTemplate """ ... $this ... $parameter ... """ .
sh:in a sh:ComponentTemplate ;
    sh:list [ sh:or ( [ sh:nodeKind sh:IRI ] [ sh:nodeKind sh:Literal ] ) ] ;
    sh:sparqlTemplate """ ... $this GRAPH $parameterGraph { $parameter rdf:rest*/rdf:first ?possible } ... """ .
sh:nodeKind a sh:ComponentTemplate ;
    sh:in ( sh:IRI sh:Literal sh:BlankNode ) ;
    sh:sparqlTemplate """... $this ... $parameter ...""" .
The SPARQL code has access to an environment where $this is the focus node, $parameter is the object of the component triple, and $parameterGraph is the name of a graph containing the neighbourhood of the object. (This could be the entire shapes graph.) Alternatively, it would be possible to eliminate the parameter graph in favour of bindings to fillers of paths, roughly replacing the parameter graph query above with an argument whose initial bindings would be the fillers of the analogous path, written something like sh:propValues ((ex:inArgs (closure rdf:rest) rdf:first) ...).
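As a rough illustration of the environment described above, the following Python sketch fills the three variables into a template string. It is not part of the proposal: the helper name `instantiate`, the use of plain textual substitution (a real processor would more likely pre-bind the variables in its SPARQL engine), and the example IRIs are all assumptions made for illustration.

```python
import re

# Hypothetical helper, not defined by the proposal: textual substitution of
# the environment variables $this, $parameter and $parameterGraph.
# Assumption: every value is already serialized as a SPARQL term.
_VAR = re.compile(r"\$(this|parameterGraph|parameter)\b")


def instantiate(template: str, this: str, parameter: str, parameter_graph: str) -> str:
    """Replace each environment variable in the template with its value."""
    env = {
        "this": this,
        "parameter": parameter,
        "parameterGraph": parameter_graph,
    }
    return _VAR.sub(lambda match: env[match.group(1)], template)


# A cut-down version of the sh:in template shown above (illustrative only).
template = "ASK { GRAPH $parameterGraph { $parameter rdf:rest*/rdf:first $this } }"

query = instantiate(
    template,
    this="<http://example.org/Susan>",
    parameter="<http://example.org/allowedList>",
    parameter_graph="<http://example.org/shapes>",
)
print(query)
```

The sketch only shows which value ends up in which position; it says nothing about how the parameter graph itself would be computed.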
These templates are then used just like core SHACL components. If desired, the core SHACL components can be implemented as component templates.
Meeting minutes SHACL metamodel discussion
Admin
- Date: 10 February 2016
- Attendees: Arthur, Holger, Simon
- Agenda:
- Arthur reviews Proposal 1
- Holger reviews Proposal 2
- Discussion
- Four identified (major) differences between both proposals
- Technical issues concerning HTML generation
Difference 1: Defining a metaclass of all constraints
Arthur's POV
Arthur proposes to refer to the properties of the subclasses of sh:Constraint as class annotations which are specified in an associated shape, e.g.:
sh:ConstraintAnnotations a sh:Shape ;
    sh:property [ sh:predicate sh:parameter ; ... ] ;
    sh:property [ sh:predicate sh:option ; ... ] ;
    ...
For each constraint that has to provide those annotations, a triple referring to it must be added:
sh:ConstraintAnnotations sh:scopeNode sh:MinCountConstraint, sh:MaxCountConstraint, ..., sh:PatternConstraint.
Holger's POV
Holger suggests making constraints instances of a subclass of rdfs:Class (e.g. called sh:ConstraintType), which would allow rdfs:domain assignments for properties such as sh:propertyValidator. This metaclass would then support declarations such as:
sh:propertyValidator a rdf:Property ; rdfs:domain sh:ConstraintType .
sh:EqualsConstraint a sh:ConstraintType ; ... sh:propertyValidator [ ... ] .
Discussion
All agree that both approaches are in general feasible.
Holger points out that the sh:scopeNode triples required by Arthur's proposal impose an additional maintenance burden and are a potential source of errors, both of which could be addressed by using sh:scopeClass (i.e., a metaclass) instead. He also justifies the need for such a metaclass, and subsequently for domain statements, by emphasizing the benefits for building forms (mentioning TopBraid as an example).
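The maintenance concern can be illustrated with a toy example. The Python sketch below models a handful of triples as tuples and compares the nodes reached through hand-written sh:scopeNode triples with those reached through a single sh:scopeClass triple. The triple set, the function names, and the omission of subclass reasoning are simplifying assumptions, not part of either proposal.

```python
# Toy graph as (subject, predicate, object) tuples. The data is made up;
# sh:PatternConstraint is deliberately missing a sh:scopeNode triple.
triples = {
    ("sh:MinCountConstraint", "rdf:type", "sh:ConstraintType"),
    ("sh:MaxCountConstraint", "rdf:type", "sh:ConstraintType"),
    ("sh:PatternConstraint", "rdf:type", "sh:ConstraintType"),
    # scopeNode style: one triple per constraint, maintained by hand
    ("sh:ConstraintAnnotations", "sh:scopeNode", "sh:MinCountConstraint"),
    ("sh:ConstraintAnnotations", "sh:scopeNode", "sh:MaxCountConstraint"),
    # scopeClass style: a single triple pointing at the metaclass
    ("sh:ConstraintAnnotations", "sh:scopeClass", "sh:ConstraintType"),
}


def focus_nodes_by_scope_node(shape: str) -> set:
    """Nodes listed explicitly via sh:scopeNode."""
    return {o for s, p, o in triples if s == shape and p == "sh:scopeNode"}


def focus_nodes_by_scope_class(shape: str) -> set:
    """Direct instances of the classes named via sh:scopeClass."""
    classes = {o for s, p, o in triples if s == shape and p == "sh:scopeClass"}
    return {s for s, p, o in triples if p == "rdf:type" and o in classes}


by_node = focus_nodes_by_scope_node("sh:ConstraintAnnotations")
by_class = focus_nodes_by_scope_class("sh:ConstraintAnnotations")
# Constraints that the hand-maintained list forgot to mention.
forgotten = by_class - by_node
print(sorted(forgotten))
```

With sh:scopeClass, a newly added constraint type is covered as soon as it is typed as an instance of the metaclass, whereas the sh:scopeNode list has to be extended separately.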
Arthur refuses to accept justifications driven by ease of implementation. However, he acknowledges Holger's comment regarding scopeNode/scopeClass and clearly states that he won't strongly object (i.e., vote -1) to a proposal that includes the additional metaclass.
Difference 2: Subclassing between constraint types
That issue resulted in a long and intensive discussion on whether the context (i.e., either property/invproperty/node) of a constraint should be defined via subclassing (Holger's POV) or via sh:context (Arthur's POV); no agreement was reached.
Holger wants to make sure that describing constraint types uses the same mechanisms as other data structures so that constraints can be edited, analyzed and validated with the same algorithms as other data. He also argues that many implementations use subclass/hierarchy information for form building, and he is strongly against introducing yet another "inheritance-like" mechanism since rdfs:subClassOf could be used for that purpose.
Arthur's main argument is that context types are mutually disjoint, so using the subclass relation is inappropriate; hence a new property, sh:context, needs to be introduced to express this relation. He refuses to accept any justifications driven by ease of implementation and emphasizes the faulty use of subclassing in this scenario.
Remark: Holger has made a new Proposal that defines the metaclass of all constraints to be a shape and uses sh:scopeClass to refer from a specific constraint type to its contexts.
Difference 3: Linking validation implementations to constraint types
Arthur's POV
Arthur suggests using single-valued properties, such as sh:propSparql, sh:invPropSparql, and sh:nodeSparql (for SPARQL), to link constraints to their respective implementations.
ex:MyConstraint a rdfs:Class ;
    rdfs:subClassOf sh:Constraint ;
    [...]
    sh:nodeSparql "SELECT ..." ;
    sh:propSparql "SELECT ..." ;
    sh:invPropSparql "SELECT ..." .
Holger's POV
Holger suggests using an intermediate class sh:Validator with subclasses for the respective execution languages, such as sh:SPARQLValidator for SPARQL. Constraints are then linked to their validators using, e.g., sh:propertyValidator, sh:inversePropertyValidator, and sh:nodeValidator.
sh:EqualsConstraint a sh:ConstraintType ;
    rdfs:subClassOf sh:Constraint ;
    [...]
    sh:propertyValidator [
        a sh:SPARQLValidator ;
        sh:sparql "SELECT ..." ;
    ] .
Discussion
Holger argues that using validator instances instead of single-valued properties makes it possible to define additional properties that might be required for validation, e.g., required JS libraries or annotations/metadata specific to a certain execution.
Arthur argues that he wanted to start with the simplest possible solution that satisfies our requirements. He notes that we still have no information about the requirements of implementations that use JS as the execution language, so we can only make assumptions about them.
Holger notes that most of the SPARQL queries that are used for validation only differ in the injection of the value to validate, but are otherwise almost identical. So he proposes that one could define a sh:Function that shall serve as a validator, whereby a SHACL engine would then be responsible for using this validation function in the right context.
Arthur agrees with Holger's observation regarding duplicate code/information but is sceptical about the "context negotiation" that is responsible for injecting the validation function call into the respective context-dependent SPARQL query.
Holger mentions that the current SHACL spec is using such validation functions already and they have been proven to be very useful.
Arthur agrees but notes that validation functions cannot be applied to every type of constraint.
All agree.
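Holger's observation that the validation queries differ only in how the value is injected can be sketched as follows. This Python example wraps one validation expression in three context-specific patterns. The pattern strings, the `$predicate` variable, the `build_query` helper, and the string-length check are hypothetical and are not taken from either proposal or from the SHACL specification.

```python
# Hypothetical value-binding patterns, one per context. Each one binds
# ?value in a different way; everything else in the query is shared.
CONTEXT_PATTERNS = {
    "property": "$this $predicate ?value .",
    "inverseProperty": "?value $predicate $this .",
    "node": "BIND($this AS ?value)",
}


def build_query(context: str, validation_expr: str) -> str:
    """Wrap a single validation expression in the pattern for one context."""
    pattern = CONTEXT_PATTERNS[context]
    # Rows are returned for values that do NOT satisfy the expression.
    return (
        "SELECT $this ?value WHERE { "
        + pattern
        + " FILTER (!(" + validation_expr + ")) }"
    )


# The validation logic is written once (a made-up string-length check).
expr = "STRLEN(STR(?value)) >= 3"

queries = {context: build_query(context, expr) for context in CONTEXT_PATTERNS}
for context, text in queries.items():
    print(context, "->", text)
```

The "context negotiation" Arthur is sceptical about corresponds to the choice of pattern here. The sketch also reflects his caveat: a constraint whose check cannot be written as an expression over a single ?value does not fit this scheme.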
Difference 4: Declaration of parameters
Arthur's POV
The sh:parameter property is a proposed class annotation property. It is multi-valued and indicates which parameters are associated with the constraint.
Some constraints also have optional parameters, e.g., sh:PatternConstraint has the option sh:flags. If an optional parameter is present then it modifies the definition of the constraint. Optional parameters are declared by the proposed, multi-valued sh:option class annotation property.
sh:PatternConstraint a rdfs:Class ;
    rdfs:subClassOf sh:Constraint ;
    sh:parameter sh:pattern ;
    sh:option sh:flags .
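The following Python sketch shows one way a processor could read such a declaration, given the design point that a constraint is injected by the presence of its parameters. The `ConstraintType` class, the `applies` function, and the dictionary representation of a shape's properties are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConstraintType:
    """Hypothetical in-memory form of a constraint declaration."""
    name: str
    parameters: frozenset              # values of sh:parameter (required)
    options: frozenset = frozenset()   # values of sh:option (optional)


PATTERN = ConstraintType(
    name="sh:PatternConstraint",
    parameters=frozenset({"sh:pattern"}),
    options=frozenset({"sh:flags"}),
)


def applies(ctype: ConstraintType, shape_properties: dict):
    """Return the bound parameter values if the constraint is injected.

    The constraint applies only when every required parameter is present.
    Optional parameters are included when present, and unrelated properties
    are ignored. Returns None when the constraint does not apply.
    """
    if not ctype.parameters.issubset(shape_properties):
        return None
    wanted = ctype.parameters | ctype.options
    return {p: v for p, v in shape_properties.items() if p in wanted}


with_flags = applies(PATTERN, {"sh:pattern": "^a", "sh:flags": "i", "sh:minCount": 1})
flags_only = applies(PATTERN, {"sh:flags": "i"})
print(with_flags, flags_only)
```

Under this reading, sh:flags on its own never triggers the constraint. It only modifies a constraint that sh:pattern has already injected.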
Holger's POV
A sh:parameter points at instances of sh:Parameter, which have additional values including most of the constraint properties such as sh:predicate, sh:datatype, sh:nodeKind, sh:name, sh:description as well as sh:optional.
sh:EqualsConstraint a sh:ConstraintType ;
    rdfs:subClassOf sh:Constraint ;
    sh:parameter [ sh:predicate sh:equals ] ;
    [...]
Discussion
All agree that the term "Template" is obsolete, and that "arguments" should actually be referred to as "parameters".
Holger notes that in his proposal an additional type of constraint (e.g., sh:ParameterConstraint) would be required to handle those "more expressive" parameter definitions and mentions that he could live with Arthur's proposal.
He later proposed an alternative solution with a richer syntax for describing parameters, which has not been discussed yet.