Warning:
This wiki has been archived and is now read-only.

ISSUE-95: Metamodel simplifications

From RDF Data Shapes Working Group


CAUTION TURTLE AREA

The purpose of this page is to collect proposals for simplifying the SHACL metamodel. The less controversial parts of the metamodel have been copied from the SHACL specification and collected into a plain Turtle vocabulary file, shacl-vocab.ttl. This Turtle file is the official source for the vocabulary and is stored in our GitHub repository along with the SHACL specification. The Turtle file has been automatically converted to an HTML file, shacl-vocab.html. Edits MUST be made to the Turtle source file, not the generated HTML file.

One of the primary design goals is to reduce the amount of duplication within the specification in the area of constraints. At present, there are three main classes of constraints: PropertyConstraints, InversePropertyConstraints, and NodeConstraints. However, many constraints could have variants for each of these main classes. Some sort of "mix-in" approach would eliminate the need to repeat each constraint in each class.

We also have the design point that each constraint is injected by virtue of the presence of its parameters. We need to specify this injection mechanism and make it uniformly apply to both built-in constraints and user-defined constraints.

Proposal 1

Author: aryman

The SHACL vocabulary defines terms that appear in shapes. These terms are used by people who write shapes.

The SHACL vocabulary also defines terms that may not appear in shapes. Some of these terms are used by people who write extension constraints. We refer to this part of the vocabulary as the metamodel. The metamodel may be used by SHACL processors. It is therefore desirable that the metamodel be easy to understand and process. It is also desirable for the metamodel to be consistent with RDFS modelling practices.

The main area in which we should simplify the metamodel is constraints. We have built-in constraints and extensions. Built-in constraints are defined by the SHACL spec. Extensions are defined by users and require an implementation in SPARQL or another extension language. The metamodel should define the built-in constraints and provide a mechanism for users to define extensions.

Goals for Simplification

We should simplify the vocabulary in the following areas:

Eliminate the concept of Template since it is a misnomer and we can achieve the same results by annotating Constraint classes.
Avoid metaclasses, and instead use a flat class/property/individual classification for all terms.
Eliminate abstract classes since they are primarily an OO programming concept.
Do not create a separate constraint class for each context in which a constraint may be used, i.e. property constraints, inverse property constraints, and focus node constraints. Instead add annotations to the Constraint class that specify where the constraint may be used.

The spec currently uses the concept of a Template and models both built-in and extension constraints as templates. I propose that we eliminate the concept of a Template. First of all, a template is a text processing concept in which one defines a textual skeleton for some artifact. The skeleton includes some slots where parameters can be filled in. However, the concept of template does not apply in our case. For SPARQL, we define a set of variables that can be used in the SPARQL query. We do this because SPARQL provides no call interface, e.g., functions or subroutines. Other languages such as JavaScript do provide functions, so the SHACL engine would pass the system variables into the function. There are no templates involved. Therefore Template is a misnomer. Second, we already have the concept of a Constraint, so we don't need another concept. We can simply associate the SPARQL or JavaScript with the constraint.

We should avoid the use of metaclasses, since these make the metamodel harder to understand. We should produce a model in which each resource can be classified as either a class, a property, or an individual.

Early versions of the metamodel included abstract classes. This may be good OO design but adds a lot of unnecessary terms to the vocabulary. We should avoid them completely.

Finally, the three different contexts in which a constraint may appear should be defined via annotations of a single Constraint subclass instead of requiring a separate subclass for each context. This will reduce the number of Constraint subclasses by almost a factor of three.
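The following sketch makes the reduction concrete by contrasting the two styles for one constraint. The per-context class names in the first half (ex:MinCountPropertyConstraint and ex:MinCountInversePropertyConstraint) are hypothetical and do not come from the spec. The second half uses the sh:context annotation that this proposal introduces further down.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Style to avoid: one class per context (hypothetical names).
ex:MinCountPropertyConstraint
  a                rdfs:Class ;
  rdfs:subClassOf  sh:PropertyConstraint .
ex:MinCountInversePropertyConstraint
  a                rdfs:Class ;
  rdfs:subClassOf  sh:InversePropertyConstraint .

# Proposed style: a single class, annotated with the contexts where it applies.
sh:MinCountConstraint
  a                rdfs:Class ;
  rdfs:subClassOf  sh:Constraint ;
  sh:context       sh:PropertyConstraint ,
                   sh:InversePropertyConstraint .
```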

Constraint Contexts

I have used the term context in the preceding discussion but have not defined it precisely. Context is a key concept in SHACL. However, the specification treats it implicitly. We should spell out this concept in the specification.

By context, I mean the set of nodes that a constraint instance takes as input. For example, sh:MinCountConstraint counts the number of nodes in the context and reports a violation if the number of nodes is less than the value given by the sh:minCount parameter of the constraint instance. sh:PatternConstraint matches each node in the context against the regular expression given by the sh:pattern parameter and reports a violation if any of these nodes does not match the pattern.

In general, a constraint applies a set of validation rules to a context. The context is determined by the focus node and the context type of the constraint. SHACL defines three types of context, which correspond to the properties sh:constraint, sh:property, and sh:inverseProperty. We have introduced the classes sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint, which are each subclasses of sh:Constraint, for these context types. The proposed definition of these classes is as follows:

sh:NodeConstraint is the set of all constraint instances in which the context is computed by taking just the focus node. The context in this case always consists of exactly one node.

sh:PropertyConstraint is the set of all constraint instances in which the context is computed by taking the set of all object nodes C of triples of the form (F, P, C) in the data graph where F is the focus node and P is the term given by the sh:predicate parameter of the constraint. The context in this case consists of zero or more nodes.

sh:InversePropertyConstraint is the set of all constraint instances in which the context is computed by taking the set of all subject nodes C of triples of the form (C, P, F) in the data graph where F is the focus node and P is the term given by the sh:predicate parameter of the constraint. The context in this case consists of zero or more nodes.
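As an illustration, the following sketch uses all three context properties on one shape. The ex: names are hypothetical. For a focus node F:

- the sh:constraint value sees only F;
- the sh:property value sees every C with (F, ex:knows, C);
- the sh:inverseProperty value sees every C with (C, ex:knows, F).

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

ex:PersonShape
  a sh:Shape ;
  # Node context: just the focus node (always exactly one node).
  sh:constraint [
    sh:nodeKind sh:IRI
  ] ;
  # Property context: all objects of (focus, ex:knows, ?o).
  sh:property [
    sh:predicate ex:knows ;
    sh:nodeKind  sh:IRI
  ] ;
  # Inverse property context: all subjects of (?s, ex:knows, focus).
  sh:inverseProperty [
    sh:predicate ex:knows ;
    sh:minCount  1
  ] .
```

sh:minCount appears only in a property-style context here, because in the node context the count is always one.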

Every constraint instance MUST have an unambiguous context rule associated with it. Therefore sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint are mutually disjoint. It is conceivable that the WG may identify other useful context rules, e.g., those defined using SPARQL property paths. For now, let's assume we have just these three context types.

In this proposal, I introduce the constraint class annotation property sh:context, which links a constraint class to the contexts in which it applies. For example, sh:MinCountConstraint is useless in the context sh:NodeConstraint, since the context always contains exactly one node. However, most of the constraints defined by SHACL do make sense in any context.

Holger asked, "Why use sh:context instead of rdfs:subClassOf?". The answer is that the context types are mutually disjoint so using the subclass relation is inappropriate. Recall that if A rdfs:subClassOf B, C then every instance of A is also an instance of B and C. But if B and C are disjoint then they have no instances in common. The relation between a constraint and its context rule is therefore not a subclass relation. Hence, we need to introduce a new property, sh:context, to express this relation.

Since each constraint is composed of a context rule and a set of validation rules that operate on the context, the metamodel might be clearer if we introduced subclasses of sh:Constraint as follows:


 # context rules
 sh:ContextConstraint          rdfs:subClassOf  sh:Constraint .
 sh:NodeConstraint             rdfs:subClassOf  sh:ContextConstraint .
 sh:PropertyConstraint         rdfs:subClassOf  sh:ContextConstraint .
 sh:InversePropertyConstraint  rdfs:subClassOf  sh:ContextConstraint .

 # validation rules
 sh:ValidationConstraint  rdfs:subClassOf  sh:Constraint .
 sh:MinCountConstraint    rdfs:subClassOf  sh:ValidationConstraint .
 sh:MaxCountConstraint    rdfs:subClassOf  sh:ValidationConstraint .
 sh:PatternConstraint     rdfs:subClassOf  sh:ValidationConstraint .
 ...

Proposed Metamodel

A general SHACL constraint is actually a conjunction of zero or more basic constraints. The presence of a basic constraint is indicated by the presence of its associated properties, which define input parameters that specialize the constraint. To illustrate this, consider the following example:


 ex:AliceShape a sh:Shape ;
   sh:property [
     sh:predicate ex:name ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:nodeKind sh:Literal ;
     sh:pattern "^Alice" ;
     sh:flags "i"
   ] .

the object (a blank node) of sh:property is an instance of the class sh:PropertyConstraint which is a subclass of sh:Constraint
the presence of the sh:minCount property on the constraint instance indicates that the instance is also an instance of sh:MinCountConstraint which is also a subclass of sh:Constraint
the object "1" of the sh:minCount property is a parameter that is used in the definition of sh:MinCountConstraint
the presence of sh:maxCount implies the constraint instance is also an instance of the class sh:MaxCountConstraint parameterized by the value "1"
the presence of sh:nodeKind implies the subclass sh:NodeKindConstraint
the presence of sh:pattern implies the subclass sh:PatternConstraint
the property sh:flags is an optional parameter used by sh:PatternConstraint
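Written out explicitly, the bullets above amount to the following rdf:type statements about the blank node. This is only a sketch of what the presence of the parameters implies. The label _:c is a stand-in for the anonymous blank node, and a shapes author would not normally write these triples.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

ex:AliceShape a sh:Shape ;
  sh:property _:c .

# Context type, implied by the use of sh:property.
_:c a sh:PropertyConstraint .

# Constraint classes implied by the presence of their parameters.
_:c a sh:MinCountConstraint ,     # from sh:minCount 1
      sh:MaxCountConstraint ,     # from sh:maxCount 1
      sh:NodeKindConstraint ,     # from sh:nodeKind sh:Literal
      sh:PatternConstraint .      # from sh:pattern "^Alice" (option sh:flags "i")
```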

We can model these constraints as follows. First consider sh:MinCountConstraint.


 sh:MinCountConstraint
   a                 rdfs:Class ;
   rdfs:subClassOf   sh:ValidationConstraint ;
   sh:parameter      sh:minCount .

The sh:parameter property is a proposed class annotation property. It is multivalued and indicates which parameters are associated with the constraint. If all of these parameters are present in a sh:Constraint instance, then this indicates that the instance is an instance of the associated sh:Constraint subclass, in this case sh:MinCountConstraint.
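The "all parameters present" rule only matters for constraints with more than one required parameter. The sketch below uses a hypothetical ex:BetweenConstraint that declares two required parameters, ex:lowerBound and ex:upperBound. None of these ex: terms exist in SHACL.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical extension constraint with two required parameters.
ex:BetweenConstraint
  a                rdfs:Class ;
  rdfs:subClassOf  sh:ValidationConstraint ;
  sh:parameter     ex:lowerBound , ex:upperBound .

ex:ScoreShape a sh:Shape ;
  # Both parameters present: this constraint is an ex:BetweenConstraint.
  sh:property [
    sh:predicate  ex:score ;
    ex:lowerBound 0 ;
    ex:upperBound 100
  ] ;
  # Only one parameter present: ex:BetweenConstraint is not implied.
  sh:property [
    sh:predicate  ex:rank ;
    ex:lowerBound 1
  ] .
```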

Some constraints also have optional parameters, e.g., sh:PatternConstraint has the option sh:flags. If an optional parameter is present, then it modifies the definition of the constraint. Optional parameters are declared by the proposed multivalued sh:option class annotation property.


 sh:PatternConstraint 
   a                rdfs:Class ;
   rdfs:subClassOf  sh:ValidationConstraint ;
   sh:parameter     sh:pattern ;
   sh:option        sh:flags .

Not all constraints apply to each of the three contexts. To declare where a constraint applies, use the proposed multivalued sh:context class annotation property. sh:MinCountConstraint does not apply to the focus node context. sh:PatternConstraint applies to all three contexts.


 sh:MinCountConstraint  sh:context  sh:PropertyConstraint , 
                                    sh:InversePropertyConstraint .
 sh:PatternConstraint   sh:context  sh:PropertyConstraint , 
                                    sh:InversePropertyConstraint , 
                                    sh:NodeConstraint .

The three class annotation properties (sh:parameter, sh:option, and sh:context) are all that are needed to declare the built-in constraints. Of course, each constraint should also have other RDFS properties, especially rdfs:comment, to give a prose description of the constraint.

We can provide machine-processable information about the constraint parameters by creating shapes for the constraints. However, these do not replace the normative specification.

For extension constraints, we also need to provide the definitions in the extension language. For SPARQL, use the following single-valued properties: sh:propSparql, sh:invPropSparql, and sh:nodeSparql. We should also use these properties to provide compliant SPARQL implementations of the built-in constraints.


 ex:MyConstraint
   a                rdfs:Class ;
   rdfs:subClassOf  sh:ValidationConstraint ;
   sh:parameter     ex:myParameter ;
   sh:context       sh:PropertyConstraint , 
                    sh:InversePropertyConstraint , 
                    sh:NodeConstraint ;
   sh:nodeSparql    "SELECT ..." ;
   sh:propSparql    "SELECT ..." ;
   sh:invPropSparql "SELECT ..." .
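Here is a sketch of what these SELECT strings might contain, using the two SPARQL implementations of the built-in sh:MinCountConstraint. This proposal does not fix the variable conventions. The sketch therefore assumes, hypothetically, that:

- the engine pre-binds $this to the focus node;
- the engine pre-binds $predicate and $minCount to the constraint's parameter values;
- each returned row reports one violation.

No sh:nodeSparql is given, because sh:MinCountConstraint does not apply in the node context.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .

# Assumes $this, $predicate and $minCount are pre-bound by the engine.
sh:MinCountConstraint
  sh:propSparql """
    SELECT $this
    WHERE {
      OPTIONAL { $this $predicate ?value }
    }
    GROUP BY $this
    HAVING (COUNT(?value) < $minCount)
  """ ;
  sh:invPropSparql """
    SELECT $this
    WHERE {
      OPTIONAL { ?value $predicate $this }
    }
    GROUP BY $this
    HAVING (COUNT(?value) < $minCount)
  """ .
```

The two queries differ only in the direction of the triple pattern. This is the kind of duplication that the validator discussion in Proposal 2 below objects to.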

Summary

sh:Constraint is the class of all constraints
each built-in and extension constraint is a subclass of sh:Constraint
there are three contexts for constraints: sh:NodeConstraint, sh:PropertyConstraint, and sh:InversePropertyConstraint
the required and optional parameters for a constraint are declared using sh:parameter and sh:option
the context for a constraint is declared using sh:context
extension constraints implemented using SPARQL provide context-dependent implementations using sh:nodeSparql, sh:propSparql, and sh:invPropSparql

The following RDF terms have been either mentioned in the spec or have been discussed before:

sh:Constraint
sh:PropertyConstraint
sh:InversePropertyConstraint
sh:NodeConstraint

The following RDF terms are new terms associated with this proposal:

sh:parameter
sh:option
sh:context
sh:propSparql
sh:invPropSparql
sh:nodeSparql
for each built-in constraint, its associated class, e.g., sh:MinCountConstraint, sh:PatternConstraint

The Shape of a Constraint Subclass

Holger asked: "How can we provide metadata about the shape of a subclass of sh:Constraint, i.e., How would an editor know to allow sh:parameter, etc., on sh:MinCountConstraint, sh:PatternConstraint, etc.?" Holger proposed introducing a new metaclass sh:ConstraintType, making constraints instances of this instead of rdfs:Class, and making it the scope of a shape. This would certainly work. However, there is no need to introduce a new metaclass and use the sh:scopeClass mechanism because we can directly link any shape to any resource.

Of course, we need to define a shape. The WG resolved that shape information goes into a separate Turtle file, i.e., not included in shacl-vocab.ttl. This requirement for a shape is common to any solution. In Proposal 1, I refer to the properties of the subclasses of sh:Constraint as class annotations. Let sh:ConstraintAnnotations be the associated shape. It looks like:


 sh:ConstraintAnnotations a sh:Shape ;
   sh:property [
     sh:predicate sh:parameter ;
     ... ];
   sh:property [
     sh:predicate sh:option ;
     ... ];
   ...

For each annotated constraint we add a triple:


 sh:ConstraintAnnotations sh:scopeNode 
   sh:MinCountConstraint, 
   sh:MaxCountConstraint, 
   ..., 
   sh:PatternConstraint.
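One possible way to flesh out sh:ConstraintAnnotations is sketched below. The particular choices are assumptions for illustration, not WG decisions:

- sh:nodeKind sh:IRI for parameters and options;
- sh:in for the three context classes;
- sh:maxCount 1 with xsd:string for the single-valued SPARQL properties.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

sh:ConstraintAnnotations a sh:Shape ;
  # Required parameters are properties, identified by IRI.
  sh:property [
    sh:predicate sh:parameter ;
    sh:nodeKind  sh:IRI
  ] ;
  # Optional parameters, likewise.
  sh:property [
    sh:predicate sh:option ;
    sh:nodeKind  sh:IRI
  ] ;
  # Each context must be one of the three context types.
  sh:property [
    sh:predicate sh:context ;
    sh:in ( sh:NodeConstraint sh:PropertyConstraint sh:InversePropertyConstraint )
  ] ;
  # SPARQL implementations are single-valued strings (one of three shown).
  sh:property [
    sh:predicate sh:propSparql ;
    sh:datatype  xsd:string ;
    sh:maxCount  1
  ] .
```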

Proposal 2 (edited by Holger)

THIS PROPOSAL WAS AN INTERMEDIATE STEP AND IS NOW WITHDRAWN, SEE PROPOSAL 3 FOR AN UPDATE.

In the absence of a full proposal, here are links to the relevant emails for now:

Here is a link to the vocabulary extension that I am working on:

https://github.com/w3c/data-shapes/blob/gh-pages/shacl/shacl-vocab-hk.ttl

Note that this has replaced the term Argument with Parameter, and Templates have been removed in favor of Validators (which are Parameterizable).

An example definition of a constraint type (same syntax for extensions, but this here is for the actual sh:equals constraint):

   sh:EqualsConstraint
       a sh:ConstraintType ;
       rdfs:subClassOf sh:Constraint ;
       sh:parameter [ sh:predicate sh:equals ] ;
       sh:message "Value sets of {?predicate} and {?equals} must be equal" ;
       sh:propertyValidator [
           a sh:SPARQLValidator ;
           sh:sparql "SELECT ..." ;
       ] .
   
   sh:PropertyConstraint rdfs:subClassOf sh:EqualsConstraint .

Another example, in which sh:class reuses a node validation function (sh:hasClass):

   sh:ClassConstraint
       a sh:ConstraintType ;
       rdfs:subClassOf sh:Constraint ;
       sh:parameter [ sh:predicate sh:class ] ;
       sh:message "Values must be instances of {?class}" ;
       sh:propertyValidator sh:hasClass ;
       sh:inversePropertyValidator sh:hasClass ;
       sh:nodeValidator sh:hasClass .
   
   sh:PropertyConstraint rdfs:subClassOf sh:ClassConstraint .
   sh:InversePropertyConstraint rdfs:subClassOf sh:ClassConstraint .
   sh:NodeConstraint rdfs:subClassOf sh:ClassConstraint .

Differences from Arthur's Proposal 1

1) Metaclass or not: For the constraint types (such as sh:ClassConstraint) Arthur suggests just using rdfs:Class. I suggest using a subclass of rdfs:Class (e.g., called sh:ConstraintType) because this allows us to have rdfs:domain assignments for the various properties (e.g., sh:parameter and sh:propertyValidator). Such domain statements are beneficial for driving UIs and for setting expectations about which properties such constraint types must have. The usual example is TopBraid-like input forms: if sh:ClassConstraint is just an rdfs:Class, how would I know which properties can be applied? An alternative to metaclasses might be to define a sh:Shape with a suitable scope. The scope could be "all subclasses of sh:Constraint", but then it becomes very hard to use this metadata. In general, I don't see why we would want to avoid this one metaclass at all costs. There is nothing complex about it; instead it makes the design more explicit. This feels like a matter of personal taste only.

2) Tree under sh:Constraint: We agree that the various constraint types (sh:ClassConstraint, etc.) should be rdfs:subClassOf sh:Constraint. How do these classes relate to sh:PropertyConstraint, sh:InversePropertyConstraint and sh:NodeConstraint, though? This is not clarified in Arthur's proposal. I guess these are also subclasses of sh:Constraint (agreed). But then he introduced a new property sh:context where he could simply have used rdfs:subClassOf, and all problems would be solved:

sh:Constraint

   sh:ClassConstraint
       sh:PropertyConstraint
       sh:InversePropertyConstraint
   sh:PatternConstraint
       sh:PropertyConstraint
       sh:NodeConstraint ...

I understand some people have raised concerns against this subclassing, but it has several advantages. Relying on yet another "inheritance-like" mechanism such as sh:context is not going to be supported by tools and any other existing algorithm, so it creates unnecessary costs and barriers. It also would be much harder to validate the shapes graph, because the system wouldn't know to apply the constraints of sh:PatternConstraint to sh:PropertyConstraint instances.

3) Validators: This is hopefully easy to resolve. Arthur proposes directly linking a constraint type with a SPARQL string, but I suggest introducing an intermediate class sh:Validator with one subclass sh:SPARQLValidator. This has two advantages:

The validator is an object that can carry additional properties. For example, we will need this for JavaScript support, to point at a function call plus source file. There are also the annotation properties that Dimitris wanted, and other things such as rdfs:comments or selectors for different platforms (e.g., a marker for TopBraid-optimized queries or other metadata for execution).

In Arthur's current proposal, each of nodeSparql, propSparql, and invPropSparql needs its own SELECT query. But in many cases the query is almost identical and differs only in the injection of the value to validate. In my design a validator may be a Function returning a boolean. Functions can be backed by a simple ASK, and the engine turns this into the correct SELECT query based on the context.

4) Declaration of parameters: (BTW I agree that the name Template is no longer appropriate and Parameter is better than Argument). Arthur's design has sh:parameter pointing at properties directly. In my design, sh:parameter points at instances of sh:Parameter, which have additional values including most of the constraint properties such as sh:predicate, sh:datatype, sh:nodeKind, sh:name, sh:description as well as sh:optional. I do not believe that a single property is sufficient, even though it may appear to work for the core vocabulary. In my design (inherited from SPIN), the sh:Parameters are useful to validate a shapes graph. The fact that we are using rdfs:domain and rdfs:range does not scale to the web where namespaces and URIs are being reused in multiple classes. As you know it's impossible to use rdfs:domain for multiple classes nor to use rdfs:range for different value types depending on the context class. So in most cases, sh:parameter would need to be combined with a sh:property constraint which just repeats the same reference to the property, leading to a disconnect that is adding extra maintenance burden. Finally, parameters automatically have maxCount=1. OTOH, the addition of sh:Parameter requires a bit of extra code to look for relevant properties, so this issue is not completely decided for me yet. (BTW parameters are also used in functions, and there the parameters must be ordered, e.g., using sh:order). (Also, if accepted, then sh:optional should be renamed to sh:optionalParameter.)

Proposal 3

This proposal has evolved from the discussion between Arthur, Simon and Holger on 2016-02-10.

Technical details for this proposal can be found in the Turtle file [shacl-vocab-hk.ttl] which is reasonably complete pending cosmetics and comments.

Constraints and Constraint Components

Each sh:Shape can have three constraint properties: sh:property, sh:inverseProperty, and sh:constraint. The values of these properties are instances of the subclasses of sh:Constraint, which form the following hierarchy:

   sh:Constraint
       sh:AbstractPropertyConstraint        # (defines sh:predicate)
           sh:PropertyConstraint            # (default type for sh:property)
           sh:InversePropertyConstraint     # (default type for sh:inverseProperty)
       sh:NodeConstraint                    # (default type for sh:constraint)

Extensions can define additional subclasses, such as sh:SPARQLConstraint. There is also a class sh:Parameter which may or may not be a subclass of sh:AbstractPropertyConstraint (pending discussions).

In the core language, users only ever instantiate sh:PropertyConstraint, sh:InversePropertyConstraint, and sh:NodeConstraint. However, these constraint instances can combine properties from multiple constraint components. Example built-in constraint components are sh:ClassConstraintComponent and sh:PatternConstraintComponent. Anyone can define their own constraint components using exactly the same mechanism as the built-in ones. For example, a sh:PropertyConstraint may look like this:

   ex:MyShape
       sh:property [
           sh:predicate ex:myProperty ;
           sh:class ex:Person ;
           sh:pattern "^urn:something:[A-Z]*" ;
       ] .

A processor looking at the shape above can figure out what to do by going through the declared instances of the class sh:ConstraintComponent. Each sh:ConstraintComponent defines a set of so-called parameters (such as sh:pattern), some of which may be optional (such as sh:flags). Each sh:ConstraintComponent can also contain machine-readable instructions on how to evaluate a constraint based on the values of the parameters. In this design, the core language is just one possible vocabulary among others; anyone can add their own constraint properties.

Here is an example from the Turtle file (without labels and comments):

   sh:PatternConstraintComponent
       a sh:ConstraintComponent ;
       sh:scopeClass sh:NodeConstraint, sh:PropertyConstraint, sh:InversePropertyConstraint ;
       sh:parameter [
           sh:predicate sh:pattern ;
       ] ;
       sh:parameter [
           sh:predicate sh:flags ;
           sh:optional true ;
       ] .
   
   sh:pattern
       a rdf:Property ;
       rdfs:range xsd:string .

Since each sh:ConstraintComponent defines a grouping of properties for certain nodes, we could regard them as shapes. In the design above, the class sh:ConstraintComponent is a subclass of sh:Shape, and the values of sh:parameter are very similar to those of sh:property. The difference is that parameters can have at most one value and do not have sh:minCount/sh:maxCount, but most other property characteristics also make a lot of sense for parameter declarations. Tools and algorithms written for SHACL can be reused if parameters are declared this way.
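Because parameters are declared like property constraints, a generic SHACL validator could, in principle, check a shapes graph against sh:PatternConstraintComponent. Here is a minimal sketch, assuming the sh:pattern parameter declaration carried sh:datatype xsd:string. The ex:language parameter in the next example carries its datatype the same way, whereas the Turtle excerpt above states the datatype through rdfs:range instead. ex:BadShape and ex:code are hypothetical.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:  <http://example.org/ns#> .

# Assumed variant of the sh:pattern parameter declaration,
# replacing the one shown above, with an explicit datatype.
sh:PatternConstraintComponent
  sh:parameter [
    sh:predicate sh:pattern ;
    sh:datatype  xsd:string
  ] .

# A shapes-graph fragment that such a check would flag:
# the sh:pattern value is an integer, not an xsd:string.
ex:BadShape
  a sh:Shape ;
  sh:property [
    sh:predicate ex:code ;
    sh:pattern   42
  ] .
```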

For example, someone could define their own extension constraint component, to ensure that all values of a property are in a certain language:

   ex:LanguageConstraintComponent
       a sh:ConstraintComponent ;
       sh:scopeClass sh:PropertyConstraint ;    # Only applies to sh:property
       sh:parameter [
           sh:predicate ex:language ;
           sh:name "language" ;
           sh:description "The language tag that all values of the property must have, e.g. 'en'." ;
           sh:datatype xsd:string ;
           sh:minLength 2 ;
           sh:maxLength 2 ;
           sh:pattern "[a-z][a-z]" ;
       ] .
   
   ex:MyShape
       a sh:Shape ;
       sh:property [
           sh:predicate ex:germanLabel ;
           ex:language "de" ;
       ] .

OPTION a: We could use sh:context instead of sh:scopeClass above, if we don't want to declare constraint types as shapes:

   ex:LanguageConstraintType
       a sh:ConstraintType ;
       sh:context sh:PropertyConstraint ;    # Only applies to sh:property
       sh:parameter [
           sh:predicate ex:language ;
       ] .

OPTION b: Or we could do the opposite and not have sh:parameter but instead use vanilla sh:property straight away:

   ex:LanguageConstraintType
       a sh:ConstraintType ;
       sh:scopeClass sh:PropertyConstraint ;    # Only applies to sh:property
       sh:property [
           sh:predicate ex:language ;
       ] .

Validators

Apart from the declaration of parameters, the other part of a sh:ConstraintComponent is a set of validators. A validator provides machine-readable instructions on how to produce validation results from the given parameter values (and the focus node). For each possible context in which a sh:ConstraintComponent is used (sh:property, sh:inverseProperty, and sh:constraint), different validators can be used. The properties sh:propertyValidator, sh:inversePropertyValidator, and sh:nodeValidator point at instances of sh:Validator:

   sh:Validator
       sh:NodeValidationFunction               # (for simple boolean tests of all focus nodes)
       sh:SPARQLValidator                      # (for arbitrary tests)
       ... other executable languages

Extending the example above:

   ex:LanguageConstraintComponent
      a sh:ConstraintComponent ;
      sh:scopeClass sh:PropertyConstraint ;    # Only applies to sh:property
      sh:parameter [
          sh:predicate ex:language ;
          ...
      ] ;
      sh:propertyValidator [
          a sh:SPARQLValidator ;
          sh:message "All values must have the language tag '$language'" ;
          sh:sparql "SELECT $this WHERE { $this $predicate ?value . FILTER (lang(?value) != $language) }" ;
      ] .
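The following small data graph shows the validator in action. It assumes, hypothetically, that ex:MyShape from the earlier example is scoped to ex:alice (for example with sh:scopeNode, which that example omits). With $predicate bound to ex:germanLabel and $language to "de", the FILTER passes only for the second value. The query would therefore return ex:alice once.

```turtle
@prefix ex: <http://example.org/ns#> .

ex:alice
  ex:germanLabel "Hallo"@de ,   # lang() is "de": filtered out, no result
                 "Hello"@en .   # lang() is "en": produces a validation result
```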

This design means that a single ConstraintComponent declaration covers all three possible use cases. Furthermore, multiple validators for different environments or executable languages can be provided.

In many cases (such as in the language example above), a simple boolean test over all values is sufficient. This proposal includes node validation functions (similar to the official draft), allowing a compact definition such as:

   sh:ClassConstraintComponent
       a sh:ConstraintComponent ;
       sh:scopeClass sh:PropertyConstraint, sh:InversePropertyConstraint, sh:NodeConstraint ;
       sh:parameter [ sh:predicate sh:class ] ;
       sh:propertyValidator ex:hasClass ;
       sh:inversePropertyValidator ex:hasClass ;
       sh:nodeValidator ex:hasClass .

with complete code-reuse between all three contexts. In the example above, ex:hasClass is declared as a sh:NodeValidationFunction which embeds a SPARQL ASK query for an input node.
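The declaration of ex:hasClass itself is not shown on this page. Here is a sketch of what it might look like. It reuses sh:sparql from the SPARQL validator example above. Several details are assumptions rather than terms fixed by this proposal:

- the variable names $value and $class;
- the convention that a true result means the node passes.

The ASK string declares its own prefixes, because Turtle prefixes do not carry over into an embedded query.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

# Hypothetical: the engine binds $value to each node in the context and
# $class to the value of the sh:class parameter; true means the node passes.
ex:hasClass
  a sh:NodeValidationFunction ;
  sh:sparql """
    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    ASK { $value rdf:type/rdfs:subClassOf* $class }
  """ .
```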

The engine can figure out the rest from the shapes graph:

For a given constraint (e.g., a sh:PropertyConstraint), find all applicable sh:ConstraintComponents using sh:scopeClass (or sh:context pending discussions)
From those sh:ConstraintComponents select those where all non-optional parameters are present
From those sh:ConstraintComponents, find the most suitable validator in the given context
Execute the validator using the parameter values as input.
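Here is how the four steps might play out for the ex:MyShape example from the start of this proposal. The sketch writes the steps as Turtle comments around that constraint and makes the default type sh:PropertyConstraint explicit. It assumes that only the built-in components plus ex:LanguageConstraintComponent are declared.

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/ns#> .

ex:MyShape
  sh:property [
    a sh:PropertyConstraint ;   # default type for sh:property, written out
    sh:predicate ex:myProperty ;
    sh:class     ex:Person ;
    sh:pattern   "^urn:something:[A-Z]*"
  ] .

# Step 1: components whose sh:scopeClass includes sh:PropertyConstraint,
#         e.g. sh:ClassConstraintComponent, sh:PatternConstraintComponent,
#         ex:LanguageConstraintComponent, ...
# Step 2: keep those whose non-optional parameters are all present:
#         sh:ClassConstraintComponent    (sh:class is present)
#         sh:PatternConstraintComponent  (sh:pattern is present; sh:flags is optional)
#         ex:LanguageConstraintComponent is dropped (no ex:language value).
# Step 3: pick each remaining component's sh:propertyValidator,
#         e.g. ex:hasClass for sh:ClassConstraintComponent.
# Step 4: run each validator with the parameter values as input,
#         i.e. ex:Person for sh:class and "^urn:something:[A-Z]*" for sh:pattern.
```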

Discussion

This design tries to address both Arthur's and Holger's concerns from the discussion. Arthur wants to avoid metaclasses where possible and to have no subclassing between sh:MinCountConstraint and sh:PropertyConstraint, and he suggested using shapes for the mix-in mechanism. Holger wants to make sure that describing constraint components uses the same mechanisms as other data structures, so that constraints can be edited, analyzed, and validated with the same algorithms as other data. Since the constraint components are never instantiated anyway, they don't need to be classes. This resolves the controversial issues of metaclasses and subclassing.

As an added bonus, this approach also provides a richer syntax to express parameters, e.g., their value types (and the sh:order for functions), resolving issue 4) from Holger above. The constraint components are shapes, which consistently ensures that "if a property constraint has values for sh:pattern, then these values must be xsd:strings". However, it does not require these properties to be present on all sh:PropertyConstraints, because the components do not enforce a cardinality constraint. (The class sh:ConstraintComponent could ensure sh:maxCount=1 on all parameters, but that's optional.)

KC: I find the "constraint, constraint component, parameter" concepts 1) to be deeply non-intuitive (what is called a parameter here is what one would normally call a constraint) and 2) there seems to be a great deal of overlap between constraint components and parameters that I am not sure is necessary. cf "Constraint Component: sh:ClassConstraintComponent; Parameters: sh:class" or "Constraint Component: sh:ClassInConstraintComponent; Parameters:sh:classIn". They all go like that. Unless the constraint component classes are absolutely necessary for the functionality of the language, then I think it should be simplified. I would prefer a "constraint group" (what is now "constraint"), and "constraints" (what is now "parameter"). There can be conceptual types of constraint groups that aid the reader of the document; that doesn't mean that a class must be defined. Karen Coyle (talk) 17:34, 26 April 2016 (UTC)

HK: Did you look at sh:pattern/sh:flags? The complexity is partially a by-product of scenarios in which multiple parameters are bundled together. This is particularly important for extensions.

KC: Holger, I looked at section 3.4.3 sh:pattern, is that what you were referring to? That doesn't explain it to me, but I am assuming that this is a function of your application. The question then becomes whether there are other solutions that don't require this redundancy everywhere. I guess that becomes a question for Peter since he has a different application. I believe Dimitris also has an application, so I will ask him as well. Karen Coyle (talk) 17:00, 27 April 2016 (UTC)

HK: I do not understand what you mean. There is nothing application-specific in the definition of sh:pattern. It was just one example among others that have multiple parameters. Qualified value shapes and closed shapes are others. Maybe move this to emails?

pfps: Karen, you might want to target these comments on the SHACL document itself, as these concepts show up there as well. See "parameters of constraint components" in Section 2.3 of that document.

KC: Holger, thanks, I will look at that. Peter, yes, it really stands out in the document. I find the "Constraint" area quite confusing. I tried to read through it to make sure that it used the terms constraint and constraint component consistently but got totally lost. I have suggested to Holger and Dimitris that we need definitions up front in sections 2.3 and 3 (plus I think section 2.3 should be 3 and 3 should be 3.1...) to make this clearer. Karen Coyle (talk) 14:46, 27 April 2016 (UTC)

HK: Karen, I have tried to use the example at the beginning of the document to introduce the key terminology (feedback appreciated): https://github.com/w3c/data-shapes/commit/b14f517fa93ebb121d193e3d635ec89db2c1bcca And yes, our goal is to use the terminology consistently. We may not have succeeded yet, so any specific pointers will help us do a better job.

Proposal 4

This proposal collapses shapes and constraints, and does away with the different kinds of constraints. More information can be found at Refactor and Refactor metamodel.

Shape designers create SHACL instances of sh:Shape. These instances use the core SHACL components to put constraints in shapes. Each such constraint takes the form of a single property on the shape. There is no need for any distinction between the various constraints that go in a shape.

 sh:Shape a rdfs:Class .

An example shape (with complete typing) is given here:

 ex:MyShape a sh:Shape ;
   sh:propValues ( ex:myProperty
                   [ a sh:Shape ;
                     sh:class ex:Person ;
                     sh:in ( ex:Susan ex:Bill ) ] ) .

The extension language is driven by component templates, which are SHACL instances of sh:ComponentTemplate. These templates are also properties and shapes. The SPARQL code for a template is given by sh:sparqlTemplate.

 sh:ComponentTemplate a rdfs:Class ; rdfs:subClassOf rdf:Property , sh:Shape .
 sh:sparqlTemplate a rdf:Property ; rdfs:range xs:string .

Component templates then look like:

 sh:class a sh:ComponentTemplate ;
   sh:nodeKind sh:IRI ;
   sh:sparqlTemplate """ ... $this ... $parameter ... """ .

 sh:in a sh:ComponentTemplate ;
   sh:list [ sh:or ( [ sh:nodeKind sh:IRI ] [ sh:nodeKind sh:Literal ] ) ] ;
   sh:sparqlTemplate """ ... $this 
                         GRAPH $parameterGraph { $parameter rdf:rest*/rdf:first ?possible }
                         ... """ .

 sh:nodeKind a sh:ComponentTemplate ;
   sh:in ( sh:IRI sh:Literal sh:BlankNode ) ;
   sh:sparqlTemplate """... $this ... $parameter ...""" .

The SPARQL code has access to an environment where $this is the focus node, $parameter is the object of the component triple, and $parameterGraph is the name of a graph containing the neighbourhood of the object. (This could be the entire shapes graph.) Alternatively, it would be possible to eliminate the parameter graph in favour of bindings to fillers of paths, roughly replacing the parameter graph query above with an argument whose initial bindings would be the fillers of the analogous path, written something like sh:propValues ((ex:inArgs (closure rdf:rest) rdf:first) ...).
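As an illustration of these bindings, consider a shape whose only component triple is an sh:in triple, with the list written out node by node so that the neighbourhood of the object is visible. The names ex:NameShape, _:names and _:rest and the choice of focus node are assumptions for this sketch; the comments describe what the environment described above would supply, not the behaviour of any implemented engine.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# Shapes graph: the component triple is  ex:NameShape sh:in _:names .
ex:NameShape a sh:Shape ;
  sh:in _:names .

# The list the component triple points to. These rdf:first/rdf:rest
# triples are the neighbourhood of the object, i.e. what
# $parameterGraph must contain.
_:names rdf:first ex:Susan ; rdf:rest _:rest .
_:rest  rdf:first ex:Bill  ; rdf:rest rdf:nil .

# Validating the focus node ex:Susan against ex:NameShape would run the
# sh:in template with:
#   $this           = ex:Susan
#   $parameter      = _:names
#   $parameterGraph = a graph containing the four list triples above
# so that  $parameter rdf:rest*/rdf:first ?possible  binds ?possible
# to ex:Susan and ex:Bill.
```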

These templates are then used just like core SHACL components. If desired, the core SHACL components can be implemented as component templates.
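For example, a user-defined template would be declared and then used in the same way as sh:class or sh:in above. The template ex:minLength, the shape ex:CodeShape and the property ex:code are hypothetical, and the SPARQL body is left elided, as in the templates above. Because a template is itself a shape, the sh:datatype triple on it constrains what may appear as $parameter.

```turtle
@prefix sh:  <http://www.w3.org/ns/shacl#> .
@prefix ex:  <http://example.org/ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical user-defined component template; its own sh:datatype
# triple restricts the parameter value to integers.
ex:minLength a sh:ComponentTemplate ;
  sh:datatype xsd:integer ;
  sh:sparqlTemplate """ ... $this ... $parameter ... """ .

# Used exactly like a core component: ex:minLength is simply
# one more property on the nested shape.
ex:CodeShape a sh:Shape ;
  sh:propValues ( ex:code
                  [ a sh:Shape ;
                    ex:minLength 3 ] ) .
```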

Meeting minutes SHACL metamodel discussion

Admin

Date: 10 February 2016
Attendees: Arthur, Holger, Simon
Agenda:
- Arthur reviews Proposal 1
- Holger reviews Proposal 2
- Discussion
  - Four identified (major) differences between the two proposals
  - Technical issues concerning HTML generation

Difference 1: Defining a metaclass of all constraints

Arthur's POV

Arthur proposes treating the properties of the subclasses of sh:Constraint as class annotations, which are specified in an associated shape, e.g.:


 sh:ConstraintAnnotations a sh:Shape ;
   sh:property [
     sh:predicate sh:parameter ;
     ... ] ;
   sh:property [
     sh:predicate sh:option ;
     ... ] ;
   ...

For each constraint class that must provide these annotations, a triple referring to it must be added to the shape:


 sh:ConstraintAnnotations sh:scopeNode
   sh:MinCountConstraint,
   sh:MaxCountConstraint,
   ...,
   sh:PatternConstraint.

Holger's POV

Holger suggests making constraint types instances of a subclass of rdfs:Class (e.g., called sh:ConstraintType). This metaclass would allow rdfs:domain declarations for properties such as sh:propertyValidator:

 sh:propertyValidator
   a rdf:Property ;
   rdfs:domain sh:ConstraintType .

 sh:EqualsConstraint
   a sh:ConstraintType ;
   ...
   sh:propertyValidator [ ... ] .

Discussion

All agree that both approaches are feasible in general.

Holger points out that the sh:scopeNode triples required by Arthur's proposal impose an additional maintenance burden and are a potential source of errors, both of which could be avoided by using sh:scopeClass with the metaclass instead. He also argues for such a metaclass, and hence for rdfs:domain statements, by pointing to the benefits for building forms (citing TopBraid as an example).

Arthur does not accept justifications driven by ease of implementation; however, he acknowledges Holger's comment regarding scopeNode/scopeClass and states clearly that he would not strongly object (-1) to a proposal that includes the additional metaclass.
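The scopeNode/scopeClass point can be sketched by combining Holger's metaclass with Arthur's annotation shape. This pairing is an illustration of the discussion, not text taken from either proposal. Once every constraint type is typed as sh:ConstraintType, one sh:scopeClass triple replaces the enumerated sh:scopeNode list, and a newly added constraint type is covered as soon as it receives that type.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# The metaclass: its instances are themselves classes.
sh:ConstraintType a rdfs:Class ;
  rdfs:subClassOf rdfs:Class .

# Each constraint type states its metaclass once ...
sh:MinCountConstraint a sh:ConstraintType .
sh:MaxCountConstraint a sh:ConstraintType .
sh:PatternConstraint  a sh:ConstraintType .

# ... so a single scope triple targets all of them, instead of
# one sh:scopeNode value per constraint type.
sh:ConstraintAnnotations sh:scopeClass sh:ConstraintType .
```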

Difference 2: Subclassing between constraint types

This issue led to a long and intensive discussion about whether the context (i.e., property, inverse property, or node) of a constraint should be defined via subclassing (Holger's POV) or via sh:context (Arthur's POV); no agreement was reached.

Holger wants to make sure that constraint types are described with the same mechanisms as other data structures, so that constraints can be edited, analyzed and validated with the same algorithms as other data. He also argues that many implementations use subclass/hierarchy information for form building, and he is strongly against introducing yet another "inheritance-like" mechanism when rdfs:subClassOf could be used for that purpose.

Arthur's main argument is that the context types are mutually disjoint, so a constraint type that can be used in several contexts cannot sensibly be a subclass of each of them. Subclassing is therefore inappropriate, and a new property, sh:context, is needed to express this relation. He does not accept justifications driven by ease of implementation and emphasizes that subclassing is misused in this scenario.
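The two positions can be contrasted side by side. ex:MyConstraintA and ex:MyConstraintB are hypothetical constraint types given different names only so that both alternatives can sit in one file, and using the context class names as values of sh:context is an assumption made for this sketch.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Alternative A (Holger): the contexts are expressed as superclasses.
# If sh:PropertyConstraint and sh:NodeConstraint are disjoint, as Arthur
# argues, this class could have no instances at all.
ex:MyConstraintA
  rdfs:subClassOf sh:PropertyConstraint , sh:NodeConstraint .

# Alternative B (Arthur): sh:Constraint stays the only superclass and
# the contexts are listed with a dedicated property.
ex:MyConstraintB
  rdfs:subClassOf sh:Constraint ;
  sh:context sh:PropertyConstraint , sh:NodeConstraint .
```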

Remark: Holger has since written a new proposal that defines the metaclass of all constraints as a shape and uses sh:scopeClass to link a specific constraint type to its contexts.

Difference 3: Linking validation implementations to constraint types

Arthur's POV

Arthur suggests using single-valued properties, such as sh:propSparql, sh:invPropSparql, and sh:nodeSparql (for SPARQL), to link constraints to their respective implementations.

 ex:MyConstraint
   a rdfs:Class ;
   rdfs:subClassOf sh:Constraint ;
   [...]
   sh:nodeSparql "SELECT ..." ;
   sh:propSparql "SELECT ..." ;
   sh:invPropSparql "SELECT ..." .

Holger's POV

Holger suggests using an intermediate class, sh:Validator, with subclasses for the respective execution languages, such as sh:SPARQLValidator for SPARQL. Constraints are then linked to their validators using, e.g., sh:propertyValidator, sh:inversePropertyValidator, and sh:nodeValidator.


 sh:EqualsConstraint
   a sh:ConstraintType ;
   rdfs:subClassOf sh:Constraint ;
   [...]
   sh:propertyValidator [
       a sh:SPARQLValidator ;
       sh:sparql "SELECT ..." ;
   ] .

Discussion

Holger argues that using validator instances instead of single-valued properties makes it possible to attach additional properties that might be required for validation, e.g., required JavaScript libraries or annotations/metadata specific to a particular execution language.
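Holger's point can be illustrated with two validators attached to the same constraint type. Everything in the ex: namespace (ex:MyConstraint, ex:JSValidator, ex:jsLibrary, ex:jsFunctionName and the library URL) is hypothetical, since no JavaScript execution model had been agreed. With single-valued properties, each new language would instead need its own set of properties, one per context.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:MyConstraint
  a sh:ConstraintType ;
  rdfs:subClassOf sh:Constraint ;
  # SPARQL implementation, as in Holger's example above.
  sh:propertyValidator [
      a sh:SPARQLValidator ;
      sh:sparql "SELECT ..."
  ] ;
  # Hypothetical JavaScript implementation: the validator node leaves
  # room for extra, language-specific properties.
  sh:propertyValidator [
      a ex:JSValidator ;
      ex:jsLibrary <http://example.org/lib/my-validators.js> ;
      ex:jsFunctionName "validateMyConstraint"
  ] .
```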

Arthur argues that he wanted to start with the simplest possible solution that satisfies our requirements, and notes that we still have no insight into the requirements of implementations that use JavaScript as an execution language, so we can only make assumptions about them.

Holger notes that most of the SPARQL queries used for validation differ only in how the value to be validated is injected and are otherwise almost identical. He therefore proposes defining an sh:Function that serves as a validator; a SHACL engine would then be responsible for calling this validation function in the right context.

Arthur agrees with Holger's observation regarding duplicated code/information but is sceptical about the "context negotiation", i.e., the step that injects the validation function call into the respective context-dependent SPARQL query.

Holger mentions that the current SHACL spec already uses such validation functions and that they have proven very useful.

Arthur agrees but notes that validation functions cannot be applied to every type of constraint.

All agree.
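The validation-function idea can be sketched as follows. The function ex:hasMinLength, the linking property ex:validationFunction and the variable names $value and $minLength are hypothetical. sh:Function and sh:returnType are used in the way the drafts of the time suggest, but the exact calling convention is an assumption. The constraint type names only the function and leaves the context-dependent query to the engine.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix ex:   <http://example.org/ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Hypothetical validation function: true if $value is a literal whose
# lexical form has at least $minLength characters.
ex:hasMinLength
  a sh:Function ;
  sh:returnType xsd:boolean ;
  sh:sparql """
    ASK {
      FILTER (isLiteral($value) && STRLEN(STR($value)) >= $minLength)
    }
    """ .

# Hypothetical constraint type that names only the function. The engine,
# not the constraint author, would generate the property, inverse-property
# or node query that iterates over the values and calls ex:hasMinLength
# on each of them.
ex:MinLengthConstraint
  a sh:ConstraintType ;
  rdfs:subClassOf sh:Constraint ;
  ex:validationFunction ex:hasMinLength .
```

A per-value boolean test like this fits constraints that look at one value at a time, but not, for example, sh:minCount, which has to see all values together. That is one case Arthur's caveat may refer to.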

Difference 4: Declaration of parameters

Arthur's POV

The sh:parameter property is a proposed class annotation property. It is multi-valued and indicates which parameters are associated with the constraint.

Some constraints also have optional parameters; e.g., sh:PatternConstraint has the option sh:flags. If an optional parameter is present, it modifies the definition of the constraint. Optional parameters are declared with the proposed multi-valued class annotation property sh:option:

 sh:PatternConstraint
   a rdfs:Class ;
   rdfs:subClassOf sh:Constraint ;
   sh:parameter sh:pattern ;
   sh:option sh:flags .

Holger's POV

In Holger's proposal, sh:parameter points to instances of sh:Parameter, which can carry additional values, including most of the constraint properties such as sh:predicate, sh:datatype, sh:nodeKind, sh:name, and sh:description, as well as sh:optional:

 sh:EqualsConstraint
   a sh:ConstraintType ;
   rdfs:subClassOf sh:Constraint ;
   sh:parameter [ sh:predicate sh:equals ] ;
   [...]
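For a direct comparison with Arthur's sh:PatternConstraint example, the same constraint type might be declared in Holger's style as follows. Here sh:optional on the parameter node takes over the role of Arthur's separate sh:option property. The datatypes, names and descriptions are assumptions for this sketch.

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

sh:PatternConstraint
  a sh:ConstraintType ;
  rdfs:subClassOf sh:Constraint ;
  # Required parameter.
  sh:parameter [
      sh:predicate sh:pattern ;
      sh:datatype xsd:string ;
      sh:name "pattern" ;
      sh:description "The regular expression that each value must match."
  ] ;
  # Optional parameter, marked on the parameter node itself.
  sh:parameter [
      sh:predicate sh:flags ;
      sh:datatype xsd:string ;
      sh:optional true ;
      sh:name "flags" ;
      sh:description "Optional regular-expression flags, such as \"i\"."
  ] .
```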

Discussion

All agree that the term "Template" is obsolete and that "arguments" should be called "parameters".

Holger notes that in his proposal an additional type of constraint (e.g., sh:ParameterConstraint) would be required to handle those "more expressive" parameter definitions and mentions that he could live with Arthur's proposal.

He later proposed an alternative solution with a richer syntax for describing parameters, which has not yet been discussed.

Retrieved from "https://www.w3.org/2014/data-shapes/wiki/index.php?title=ISSUE-95:_Metamodel_simplifications&oldid=3306"
