Warning:
This wiki has been archived and is now read-only.

ISSUE-66: SHACL spec ill-founded due to non-convergence on data loops

From RDF Data Shapes Working Group

This page collects use cases for recursion in order to advance the resolution of ISSUE-66.

ISSUE-66 is about recursion in SHACL. A SHACL program is said to be recursive when a shape refers to itself, directly or indirectly. The SHACL spec must give a precise, well-founded meaning to all valid SHACL programs.

All non-recursive SHACL programs can be given a precise, well-founded meaning since they can be translated into a single SPARQL query. All recursive SHACL programs that do not use complex combinations of shapes (not, or, xor) can be given a meaning using the semantics of OSLC Resource Shapes (see Recursion in RDF Data Shape Languages). A larger class of well-founded recursive SHACL programs is proposed in Core SHACL Semantics.

The Working Group needs to decide on what constitutes a valid SHACL program. At one extreme, we can prohibit recursion. At the other extreme, we can allow very general types of recursion. Since allowing larger sets of valid recursive SHACL programs increases the complexity of the specification and the cost of implementations, the Working Group needs to see real-world use cases in which the use of recursion is either necessary or preferable to the alternatives.

When considering alternatives, the following criteria should be kept in mind:

  • Impact on the author of the SHACL program. Does the alternative impose a skill or effort burden on the author? It is a requirement that the Core SHACL language not require a knowledge of SPARQL.
  • Impact on the data source. Does the alternative require modification of the data source, e.g. the addition of type triples? Any requirement that the data source conform to some set of SHACL-specific guidelines will reduce the applicability of SHACL.

Contacts

Submitted by: aryman

It has been suggested that recursion can be avoided in many cases by associating a shape with the class of a resource. This example illustrates the common situation in which the data contains resources of a given class, but the constraints on the resources depend on their context and not just on their class.

Consider a web application that manages contact documents. Clients can create or modify contact documents via HTTP POST and PUT, and retrieve them via HTTP GET. When a client POSTs or PUTs a document, the application validates its content. If the document fails to validate, the request is rejected.

The application uses the following informally stated validation rules:

  1. The document has exactly one primary topic.
  2. The primary topic of a document is a person, known as the contact.
  3. The contact must have a name, email address, and telephone number.
  4. The contact may know some other people, known as associates.
  5. The document contains zero or more associates.
  6. Each associate must have a name.
  7. Each associate must be known by the contact.

In the RDF representation of contact documents, both the contact and the associates have just the RDF type foaf:Person, i.e. there are no role-specific types for the different kinds of people, e.g. ex:Contact and ex:Associate.

Here is a statement of the validation rules in terms of the RDF representation of contact documents.

  1. A valid contact document must have exactly one foaf:primaryTopic triple. The subject of the foaf:primaryTopic triple is the contact document URI. The contact document node is the focus node from which validation begins.
  2. The object of the foaf:primaryTopic triple is the contact. The contact must have an rdf:type triple with class foaf:Person.
  3. The contact must have exactly one name given by a foaf:name property, and exactly one email address given by a foaf:mbox property, and exactly one telephone number given by an ex:telephone property.
  4. The contact has zero or more foaf:knows properties which denote people known by the contact. These people are called associates.
  5. Each associate has an rdf:type property with class foaf:Person.
  6. Each associate must have exactly one name given by a foaf:name property.
  7. Each associate is known by just the contact.

The following Turtle listing is a valid contact document for Alice.

 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
 @prefix ex: <http://example.org/ns/contacts#>.
  
 @base <http://example.org/contacts/> .
  
 <alice> foaf:primaryTopic <alice#me> .
  
 <alice#me> a foaf:Person ;
   foaf:name "Alice Taylor" ;
   foaf:mbox <mailto:alice@example.org> ;
   ex:telephone "+1-416-555-1234" ;
   foaf:knows <bob#me>, <charlie#me> .
  
 <bob#me> a foaf:Person ;
   foaf:name "Bob Smith" .
  
 <charlie#me> a foaf:Person ;
   foaf:name "Charlie Jones" .
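
As an illustration, the seven RDF-level rules can be checked directly over a set of triples. The Python sketch below is one minimal reading of those rules, not a SHACL implementation. The function and variable names are hypothetical, literals are represented as plain strings, and rule 1 is assumed to mean that the graph contains exactly one foaf:primaryTopic triple whose subject is the document node.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/ns/contacts#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/contacts/"


def objects(graph, subject, predicate):
    """Return the objects of all triples matching (subject, predicate, ?o)."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def subjects(graph, predicate, obj):
    """Return the subjects of all triples matching (?s, predicate, obj)."""
    return [s for s, p, o in graph if p == predicate and o == obj]


def is_valid_contact_document(graph, document):
    """Check rules 1-7 starting from the contact document node."""
    # Rule 1 (assumed reading): exactly one foaf:primaryTopic triple in the
    # graph, and its subject is the document.
    topics = [(s, o) for s, p, o in graph if p == FOAF + "primaryTopic"]
    if len(topics) != 1 or topics[0][0] != document:
        return False
    contact = topics[0][1]
    # Rule 2: the contact is typed as foaf:Person.
    if FOAF + "Person" not in objects(graph, contact, RDF_TYPE):
        return False
    # Rule 3: exactly one name, mailbox, and telephone number.
    for predicate in (FOAF + "name", FOAF + "mbox", EX + "telephone"):
        if len(objects(graph, contact, predicate)) != 1:
            return False
    # Rule 4: zero or more associates, so there is nothing to count here.
    for associate in objects(graph, contact, FOAF + "knows"):
        # Rule 5: each associate is typed as foaf:Person.
        if FOAF + "Person" not in objects(graph, associate, RDF_TYPE):
            return False
        # Rule 6: each associate has exactly one name.
        if len(objects(graph, associate, FOAF + "name")) != 1:
            return False
        # Rule 7: each associate is known by just the contact.
        if set(subjects(graph, FOAF + "knows", associate)) != {contact}:
            return False
    return True


# The contact document for Alice, transcribed from the Turtle listing.
ALICE_DOC = BASE + "alice"
ALICE, BOB, CHARLIE = BASE + "alice#me", BASE + "bob#me", BASE + "charlie#me"
ALICE_GRAPH = {
    (ALICE_DOC, FOAF + "primaryTopic", ALICE),
    (ALICE, RDF_TYPE, FOAF + "Person"),
    (ALICE, FOAF + "name", "Alice Taylor"),
    (ALICE, FOAF + "mbox", "mailto:alice@example.org"),
    (ALICE, EX + "telephone", "+1-416-555-1234"),
    (ALICE, FOAF + "knows", BOB),
    (ALICE, FOAF + "knows", CHARLIE),
    (BOB, RDF_TYPE, FOAF + "Person"),
    (BOB, FOAF + "name", "Bob Smith"),
    (CHARLIE, RDF_TYPE, FOAF + "Person"),
    (CHARLIE, FOAF + "name", "Charlie Jones"),
}
```

Under these assumptions, is_valid_contact_document(ALICE_GRAPH, ALICE_DOC) accepts the listing, while adding a triple in which Charlie also knows Bob makes the check fail on rule 7.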

The following Turtle listing is a recursive SHACL shape that describes valid contact documents. The document itself has no RDF type, but has the shape ex:DocumentShape which refers to ex:ContactShape. ex:ContactShape refers to ex:DocumentShape and ex:AssociateShape. ex:AssociateShape refers to ex:ContactShape. Thus there are cycles in the shape references and therefore the shape is recursive.

 @prefix sh: <http://www.w3.org/ns/shacl#> .
 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
 @prefix ex: <http://example.org/ns/contacts#>.
 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
  
 ex:DocumentShape
   a sh:Shape ;
   # Exactly one foaf:primaryTopic value: a foaf:Person that has shape ex:ContactShape.
   sh:property [
     sh:predicate foaf:primaryTopic ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:valueClass foaf:Person ;
     sh:valueShape ex:ContactShape ;
   ] .
  
 ex:ContactShape
   a sh:Shape ;
   # The contact is the foaf:primaryTopic of exactly one subject, which has shape ex:DocumentShape.
   sh:inverseProperty [
     sh:predicate foaf:primaryTopic ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:valueShape ex:DocumentShape ;
   ] ;
   sh:property [
     sh:predicate foaf:name ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:datatype xsd:string ;
   ] ;
   sh:property [
     sh:predicate foaf:mbox ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:nodeKind sh:IRI ;
   ] ;
   sh:property [
     sh:predicate ex:telephone ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:datatype xsd:string ;
   ] ;
   # Every foaf:knows value is a foaf:Person that has shape ex:AssociateShape.
   sh:property [
     sh:predicate foaf:knows ;
     sh:valueClass foaf:Person ;
     sh:valueShape ex:AssociateShape ;
   ] .
  
 ex:AssociateShape
   a sh:Shape ;
   # Exactly one subject knows the associate: a foaf:Person that has shape ex:ContactShape.
   sh:inverseProperty [
     sh:predicate foaf:knows ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:valueClass foaf:Person ;
     sh:valueShape ex:ContactShape ;
   ] ;
   sh:property [
     sh:predicate foaf:name ;
     sh:minCount 1 ;
     sh:maxCount 1 ;
     sh:datatype xsd:string ;
   ] .

The use of shape references is intuitive and helps to make the validation rules clearer. The fact that the shape references are recursive causes no difficulties. This shape has a well-founded meaning and could probably be represented by a single SPARQL query. However, writing this SPARQL query is certainly beyond the skill level of the target SHACL author.
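
Whether a set of shapes is recursive can be determined mechanically from the shape references alone. The following Python sketch is only an illustration with hypothetical names: it records the sh:valueShape references of the three contact shapes as a dictionary and reports every shape that can reach itself through one or more references.

```python
# sh:valueShape references between the contact shapes described in this example.
SHAPE_REFERENCES = {
    "ex:DocumentShape": ["ex:ContactShape"],
    "ex:ContactShape": ["ex:DocumentShape", "ex:AssociateShape"],
    "ex:AssociateShape": ["ex:ContactShape"],
}


def reachable_from(shape, references):
    """Return every shape reachable from `shape` by one or more references."""
    seen = set()
    stack = list(references.get(shape, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(references.get(current, []))
    return seen


def recursive_shapes(references):
    """Return the shapes that refer to themselves, directly or indirectly."""
    return {
        shape
        for shape in references
        if shape in reachable_from(shape, references)
    }
```

For the references above, recursive_shapes(SHAPE_REFERENCES) contains all three shapes. A set of shapes with no references between them yields the empty set.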

Comments by Peter F. Patel-Schneider

This is not a suitable example, as the SHACL shapes do not implement the validation rules. There are several divergences, which I point out below:

1. The document has exactly one primary topic. - this appeals to something that SHACL does not define - there is no notion of a unique "the document" enforced by the SHACL shapes - either this rule or the SHACL shapes needs to be changed

2. The primary topic of a document is a person, known as the contact. - this appeals to something that SHACL does not define - the SHACL shapes say more than this [important]

3. The contact must have a name, email address, and telephone number. - the SHACL shapes say more than this [not important]

4. The contact may know some other people, known as associates. - this is too vague - are all the people that the contact person knows associates? - this is an example of where maximizing may not be a good idea

5. The document contains zero or more associates.

6. Each associate must have a name. - the SHACL shapes say more than this [not important]

7. Each associate must be known by the contact. - the SHACL shapes say more than this [important!]

For example, the following graph will not validate against the SHACL shapes, but should validate according to the validation rules.

 ex:d ex:primaryTopic ex:pt .
 ex:c rdf:type foaf:Person .
 ex:c foaf:name "ex:c" .
 ex:c foaf:name ex:mb .
 ex:c ex:telephone "NOYB" .
 ex:c foaf:knows ex:a .
 ex:a foaf:name "ex:a" .
 ex:b foaf:knows ex:a .

Reply by aryman

Peter, your comments seem to be more about the informality of the requirements than about recursion. I have provided a more precise writeup.

Reply by pfps

The SHACL shapes still do not implement the validation rules, starting with "exactly one foaf:primaryTopic triple". To make this a useful example the SHACL shapes must be correct. There may be a way to do this using advanced features of SHACL, but this has not been shown.

Workaround without recursion (Dimitris)

There might be some typos in the syntax, and the result also depends on how validation starts.

 @prefix sh: <http://www.w3.org/ns/shacl#> .
 @prefix foaf: <http://xmlns.com/foaf/0.1/> .
 @prefix ex: <http://example.org/ns/contacts#>.
 @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
   
 ex:DocumentShape
   a sh:Shape ;
   sh:property [
     sh:predicate foaf:primaryTopic ;
     sh:minCount 1;
     sh:maxCount 1;
     sh:valueClass foaf:Person ;
   ] .
  
 ex:ContactShape
   a sh:Shape ;
   sh:scopeClass foaf:Person ;
   sh:filterShape [
       sh:inverseProperty [
         sh:predicate foaf:primaryTopic ;
         sh:minCount 1;
         sh:maxCount 1;
       ] ;
   ] ;
   sh:property [
     sh:predicate foaf:name ;
     sh:minCount 1;
     sh:maxCount 1;
     sh:datatype xsd:string ;
   ] ;
   sh:property [
     sh:predicate foaf:mbox ;
     sh:minCount 1;
     sh:maxCount 1;
     sh:nodeKind sh:IRI ;
   ] ;
   sh:property [
     sh:predicate ex:telephone ;
     sh:minCount 1;
     sh:maxCount 1;
     sh:datatype xsd:string ;
   ] ;
   sh:property [
     sh:predicate foaf:knows ;
     sh:valueClass foaf:Person ;
   ] .
  
 ex:AssociateShape
   a sh:Shape ;
   sh:scopeClass foaf:Person ;
   sh:filterShape [
       sh:inverseProperty [
         sh:predicate foaf:knows ;
         sh:minCount 1;
         sh:maxCount 1;
       ] ;
   ] ;
   sh:property [
     sh:predicate foaf:name ;
     sh:minCount 1;
     sh:maxCount 1;
     sh:datatype xsd:string ;
   ] .

Comments (HK): Your solution does something different. When someone triggers validation of the root Document, Arthur's original shapes do validate the depending resources too. Your solution would only validate the root Document and requires validation of the complete graph to detect the other cases. I also do not believe your solution is intuitive: almost every language supports recursion, so why not SHACL? People would just expect that to work. Finally, I do not believe your workaround is generic or that it can be applied to other patterns. Maybe Arthur wants to modify the example data to drop the rdf:type triple, to highlight that these constraints are context-specific.

Comments (DK): Depending resources are also validated. ex:DocumentShape requires exactly one foaf:primaryTopic with foaf:Person. Since one must exist, the rest of the shapes do validate the depending resources.

Comments (HK): How so? Your example uses sh:valueClass only. For traversing depending resources you would need to use sh:valueShape.

Comments (DK): ex:DocumentShape requires exactly one instance of foaf:primaryTopic with foaf:Person as range. Assuming this constraint does not fail, a foaf:primaryTopic exists and ex:ContactShape is triggered to validate the main contact. If any foaf:knows exists the ex:AssociateShape is triggered and validates the associate contacts.

Comments (HK): No, ex:ContactShape is not triggered, neither is AssociateShape. This would only happen with sh:valueShape. Note that the starting point is the validateNodeAgainstShape operation, which only checks the start node plus its dependants.

Comment (pfps): There is no notion given as to how the validation is triggered. The writeup intimates that the triggering is global because it says that there is exactly one of something in the entire document.

Hierarchical Organizations

Suggested by Holger

schema.org has a base class schema:Organization with a property schema:subOrganization. An ex:NotForProfitOrganization is an Organization where ex:annualProfit is zero, and all subOrganizations are also not for profit. The task here is to verify whether ex:MarsupialRescueFoundation is a not-for-profit organization.

   schema:Organization
       a sh:ShapeClass ;
       sh:property [
           sh:predicate ex:annualProfit ;
           sh:datatype xsd:float ;
           sh:maxCount 1 ;
       ] ;
       sh:property [
           sh:predicate schema:subOrganization ;
           sh:valueClass schema:Organization ;
       ] .
   
   ex:NotForProfitOrganization
       a sh:Shape ;
       sh:property [
           sh:predicate ex:annualProfit ;
           sh:maxInclusive 0 ;
       ] ;
       sh:property [
           sh:predicate schema:subOrganization ;
           sh:valueShape ex:NotForProfitOrganization ;
       ] .
   
   ex:MarsupialRescueFoundation
      a schema:Organization ;
      ex:annualProfit 0 ;
      schema:subOrganization ex:KoalaRescueFoundation ;
      schema:subOrganization ex:QuokkaRescueFoundation .
   
   ex:KoalaRescueFoundation
      a schema:Organization ;
      ex:annualProfit 0 .
   
   ex:QuokkaRescueFoundation
      a schema:Organization ;
      ex:annualProfit 0 .


Comment by Peter F. Patel-Schneider

This is not a suitable example because there is no indication of what task is being performed nor what data it is to be performed on.

Answer (HK): Done. The instances are in the same graph, for simplicity.

Comment by Peter F. Patel-Schneider

OK, the task is to determine whether ex:MarsupialRescueFoundation meets the requirements defined above. (The requirements are given as if they were a description of the requirements for belonging to a class, leading to the obvious solution of using OWL, but let's try using SHACL.)

   ex:NotForProfitOrganization
       a sh:Shape ;
       sh:constraint [ a sh:SPARQLConstraint ;
         sh:sparql """
           SELECT $this
           WHERE { $this schema:subOrganization* ?subsidiary .
                   ?subsidiary ex:annualProfit ?profit .
                   FILTER ( ?profit > 0 ) }
            """ ;
       ] .

I think that this works, but SPARQL may not allow "empty" paths, in which case you would need a check that the organization itself had 0 profit and a slightly more complex property path.

If SPARQL property paths were allowed in SHACL core, then this would be much simpler.
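
The effect of the property path can be sketched outside SPARQL. The Python below is an illustration with hypothetical names, assuming the SPARQL 1.1 reading of `*` as zero or more steps, so that the organization itself is included in the closure. It reports a violation when any organization in the closure has a positive ex:annualProfit, which corresponds to the query returning a row.

```python
# The example organizations, transcribed from the instance data.
SUB_ORGANIZATIONS = {
    "ex:MarsupialRescueFoundation": [
        "ex:KoalaRescueFoundation",
        "ex:QuokkaRescueFoundation",
    ],
    "ex:KoalaRescueFoundation": [],
    "ex:QuokkaRescueFoundation": [],
}
ANNUAL_PROFIT = {
    "ex:MarsupialRescueFoundation": 0,
    "ex:KoalaRescueFoundation": 0,
    "ex:QuokkaRescueFoundation": 0,
}


def organization_closure(org, sub_organizations):
    """Return org plus everything reachable via zero or more subOrganization steps."""
    closure = set()
    stack = [org]
    while stack:
        current = stack.pop()
        if current not in closure:  # also guards against loops in the data
            closure.add(current)
            stack.extend(sub_organizations.get(current, []))
    return closure


def violates_not_for_profit(org, sub_organizations, annual_profit):
    """Return True if org or any (transitive) subOrganization has profit > 0."""
    return any(
        member in annual_profit and annual_profit[member] > 0
        for member in organization_closure(org, sub_organizations)
    )
```

With the data above, violates_not_for_profit("ex:MarsupialRescueFoundation", SUB_ORGANIZATIONS, ANNUAL_PROFIT) is False. Because visited organizations are remembered, the closure also terminates if the subOrganization data contains a loop.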

Regions, Countries and States

A simple example for context-sensitive constraints. The values of ex:neighbor are either states or countries, depending on whether they are used inside of a state or country. Below, ex:Australia is invalid, the others are valid. The type triples are not needed to determine those constraints.

 # Classes ----------------------
 ex:Region
   a sh:ShapeClass ;
   rdfs:subClassOf rdfs:Resource ;
   sh:property [
       rdfs:comment "A region adjacent to this region."^^xsd:string ;
       sh:nodeKind sh:IRI ;
       sh:predicate ex:neighbor ;
       sh:valueClass ex:Region ;
   ] ;
 .
 # Shapes -----------------------
 ex:CountryShape
   a sh:Shape ;
   sh:property [
       rdfs:comment "The neighboring countries"^^xsd:string ;
       sh:predicate ex:neighbor ;
       sh:valueShape ex:CountryShape ;
   ] ;
 .
 ex:StateShape
   a sh:Shape ;
   sh:property [
       rdfs:comment "The neighboring states"^^xsd:string ;
       sh:predicate ex:neighbor ;
       sh:valueShape ex:StateShape ;
   ] ;
 .
 # Instances ----------------
 ex:ACT
   a ex:Region ;
   ex:neighbor ex:NSW ;
   sh:nodeShape ex:StateShape ;
 .
 ex:NSW
   a ex:Region ;
   ex:neighbor ex:ACT ;
   sh:nodeShape ex:StateShape ;
 .
 ex:Australia
   a ex:Region ;
   ex:neighbor ex:ACT ;   # Violation!
   sh:nodeShape ex:CountryShape ;
 .

Comment by pfps

This example is very strange. The data we are given is in terms of being a member of the ex:Region class but validating against ex:CountryShape and ex:StateShape. This is completely unnatural. The data we would be given would probably be in terms of being a member of the ex:Country and ex:State classes, each of which being a subclass of ex:Region.

Why do ex:ACT and ex:NSW validate against ex:StateShape? Simply because they are neighbours of each other. If the input data included

ex:NSW sh:nodeShape ex:CountryShape .

ex:NSW would still validate. Therefore the example is useless.
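
The point about the data loop can be made concrete with a small sketch. The Python below is not the semantics of any SHACL draft. It is a hypothetical checker in which the treatment of a revisited (node, shape) pair is an explicit parameter, so that the effect of that choice on the ex:ACT / ex:NSW loop is visible. All names are illustrative.

```python
# ex:neighbor triples from the instance data.
NEIGHBOR = {
    "ex:ACT": ["ex:NSW"],
    "ex:NSW": ["ex:ACT"],
    "ex:Australia": ["ex:ACT"],
}

# Each shape only requires every ex:neighbor value to have the named shape.
NEIGHBOR_VALUE_SHAPE = {
    "ex:CountryShape": "ex:CountryShape",
    "ex:StateShape": "ex:StateShape",
}


def has_shape(node, shape, assume_on_loop=True, in_progress=frozenset()):
    """Check `node` against `shape`, following sh:valueShape references.

    `assume_on_loop` is the answer given when the same (node, shape) pair is
    reached again while it is still being checked. This is an assumption of
    the sketch, not something the shapes themselves determine.
    """
    if (node, shape) in in_progress:
        return assume_on_loop
    in_progress = in_progress | {(node, shape)}
    value_shape = NEIGHBOR_VALUE_SHAPE[shape]
    return all(
        has_shape(neighbor, value_shape, assume_on_loop, in_progress)
        for neighbor in NEIGHBOR.get(node, [])
    )
```

If a revisited pair is assumed to conform, ex:NSW has both ex:StateShape and ex:CountryShape, since the two shapes differ only in name. If a revisited pair is assumed not to conform, ex:NSW has neither shape. The outcome on the loop is therefore fixed by the chosen treatment of recursion rather than by the data.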

DBpedia Example (Data that we cannot change)

When data is in a format that we cannot change (e.g. dbpedia) then we cannot rely on rdf:type triples. Instead the validation needs to be triggered manually, based on Shapes. In the example below, the task is to verify sh:hasShape(dbpedia:Australia, ex:Country) and sh:hasShape(dbpedia:Canberra, ex:Capital).

   ex:Capital
       a sh:Shape ;
       sh:property [
           sh:predicate dbpedia-owl:country ;
           sh:maxCount 1 ;
           sh:minCount 1 ;
           sh:valueShape ex:Country ;
       ] ;
   .
   ex:Country
       a sh:Shape ;
       sh:property [
           sh:predicate dbpedia-owl:capital ;
           sh:minCount 1 ;
           sh:valueShape ex:Capital ;
       ] ;
   .
   
   dbpedia:Australia dbpedia-owl:capital dbpedia:Canberra .
   dbpedia:Canberra dbpedia-owl:country dbpedia:Australia .

Workaround without recursion (Dimitris)

   ex:Capital
       a sh:Shape ;
       sh:filterShape [
         sh:inverseProperty [
           sh:predicate dbpedia-owl:capital ;
           sh:minCount 1;
         ] ;
       ] ;
       sh:property [
           sh:predicate dbpedia-owl:country ;
           sh:maxCount 1 ;
           sh:minCount 1 ;
       ] ;
   .
   ex:Country
       a sh:Shape ;
       sh:filterShape [
         sh:inverseProperty [
           sh:predicate dbpedia-owl:country ;
           sh:minCount 1;
         ] ;
       ] ;
       sh:property [
           sh:predicate dbpedia-owl:capital ;
           sh:minCount 1 ;
       ] ;
   .

Comments (HK): As above, this doesn't do the same thing as my shapes: yours does not walk into the validation of the Country when you start at the Capital. You would need to run the validateGraph operation, which would traverse the whole graph; that is not feasible for something like DBpedia.

Comment by pfps

The shapes don't work right. They validate the following as well:

   dbpedia:NewZealand dbpedia-owl:capital dbpedia:Canberra .
   dbpedia:Australia dbpedia-owl:capital dbpedia:Wellington .
   dbpedia:Canberra dbpedia-owl:country dbpedia:Australia .
   dbpedia:Wellington dbpedia-owl:country dbpedia:NewZealand .

This is thus not a valid example.