
Shacl-sparql

From RDF Data Shapes Working Group

A SHACL Specification based on SPARQL

This is a proposal on how the meaning of the guts of SHACL can be directly based on unmodified and unextended SPARQL. The meaning of the non-SPARQL portion of SHACL is provided by a transformation into SPARQL queries, whose results are viewed as validation violations. This proposal specifies the meaning of SHACL in terms of SPARQL queries. There is no requirement that a SHACL implementation actually use a SPARQL engine, nor that a SHACL implementation that uses a SPARQL engine actually use the SPARQL queries described here. All that matters is that the same violations result.

This is not a complete proposal for SHACL. It does not cover generating human-readable error messages. It does not cover providing decorations for constructs in the non-SPARQL part of SHACL. It does, however, provide a full specification for determining whether and what validation violations are present in an RDF graph or dataset.

Several SHACL requirements, whether proposed, under consideration, or inherent, cannot be handled in SPARQL (namely recursive shapes and possibly also closed shapes) and so are not in this proposal. All approved SHACL requirements are either already in this proposal or can easily be added to it without changing its basic underpinnings.

Important Aspects of this Proposal

  • This is a constraint proposal, along the lines of OWL Constraints, Stardog ICV, SPIN, and RDF Unit.
  • Unmodified SPARQL provides the semantic foundations of SHACL.
    • A single SPARQL query is used for each constraint and its results are violations of the constraint.
    • Recursive shapes are not allowed
    • Closed shapes need to be expressed as SPARQL.
  • RDFS vocabulary is given its RDFS meaning.
  • Validation is driven by constraints, which are separate from classes.
    • Constraints are either just a SPARQL query or are the combination of a scope part and a shape part, which can either be SPARQL code or constructs in a simpler language.
    • For constraints that have a scope, only nodes that satisfy the scope and do not satisfy the shape are violations.
  • The simple language is chosen to isolate many users from the details of SPARQL and to allow for implementation without requiring a full SPARQL engine.
  • The validation API takes two arguments, an RDF graph that contains constraints and an RDF dataset that contains data to be validated. These two can be completely separate but the constraint graph can be one of the graphs in the dataset.
  • Everything else is details, particularly the form of the simple language.

Preliminaries

Throughout the text of this document IRIs are written in CURIE form using the following prefixes:

rdf = http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs = http://www.w3.org/2000/01/rdf-schema#
xsd = http://www.w3.org/2001/XMLSchema#
sh = http://www.w3.org/.../...

SHACL Core Language

SHACL is based on the SPARQL 1.1 Query Language. In SHACL, certain results of SPARQL 1.1 Query Language queries (hereafter SPARQL queries) on an RDF graph or dataset under the RDFS entailment regime are interpreted as constraint violations.

The different kinds of SPARQL query results are interpreted in different ways in SHACL. For a SELECT query, each solution mapping is interpreted as a separate violation of the constraint; if there are no solution mappings, the constraint is not violated. For a CONSTRUCT query, each RDFS instance of sh:Violation (a node whose denotation is in the class extension of sh:Violation in all RDFS models of the constructed graph) is a separate violation of the constraint; if there are no RDFS instances of sh:Violation in the constructed graph, the constraint is not violated. For an ASK query, a true result is interpreted as a violation of the constraint and a false result as no violation. (This interpretation makes ASK constraints similar to the other kinds of constraints.) DESCRIBE queries are not used in SHACL.
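The three interpretation rules can be sketched in Python as follows. This is a hedged illustration, not part of the proposal: the result representations (a list of solution mappings for SELECT, a bool for ASK, a set of CURIE triples for CONSTRUCT) and the function name violations are assumptions, sh:Violation is kept as a CURIE because the sh namespace is elided in this document, and no RDFS entailment is applied, so only explicit rdf:type sh:Violation triples are counted.

```python
from typing import Mapping, Union

# Assumed result representations (hypothetical, not tied to any SPARQL library):
#   SELECT    -> list of solution mappings (variable name -> value)
#   ASK       -> bool
#   CONSTRUCT -> set of (subject, predicate, object) triples written as CURIEs
Triple = tuple[str, str, str]
QueryResult = Union[list[Mapping[str, str]], bool, set[Triple]]


def violations(kind: str, result: QueryResult) -> list:
    """Return one list entry per violation of the constraint."""
    if kind == "SELECT":
        # Each solution mapping is a separate violation.
        return list(result)
    if kind == "ASK":
        # true is one violation, false is none.
        return [True] if result else []
    if kind == "CONSTRUCT":
        # Simplification: no RDFS entailment, so only nodes with an explicit
        # rdf:type sh:Violation triple in the constructed graph are counted.
        return sorted({s for (s, p, o) in result
                       if p == "rdf:type" and o == "sh:Violation"})
    raise ValueError(f"{kind} queries are not used in SHACL")
```

A real engine would replace the CONSTRUCT branch with an RDFS-aware instance check, so that instances of subclasses of sh:Violation are also counted.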

SHACL Constraint Language

SHACL constraints that are directly specified in the form of SPARQL queries can be directly evaluated as SPARQL queries. All that is needed is the SHACL constraint and an RDF graph or dataset. The result of the constraint is the result of the query according to the SPARQL 1.1 Query Language specification, and is interpreted as above.

SHACL constraints can also be encoded and collected in RDF graphs and extra information can be associated with constraints. Each node in such a graph that is an RDFS instance of sh:Constraint is the control node of a SHACL constraint. A SHACL engine takes as arguments an RDF graph containing SHACL constraints and an RDF graph or dataset containing information on which to evaluate the constraints. (There is no requirement that these two be different, but permitting them to be different allows for separating the constraints and the data.) A SHACL engine takes each node in the constraint graph that is an RDFS instance of sh:Constraint and evaluates the node's constraint query against the information RDF graph or dataset.
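A minimal sketch of that engine loop, limited to control nodes that carry a sh:query link (the SHACL Core Control Nodes defined below). It assumes graphs are Python sets of (subject, predicate, object) tuples written as CURIEs, and that run_query is a hypothetical caller-supplied hook that evaluates a query against the data graph or dataset and returns its violations. Only explicit rdf:type sh:Constraint triples are recognised; a real engine would apply RDFS entailment.

```python
from typing import Callable, Optional

Triple = tuple[str, str, str]


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the graph, sorted."""
    return sorted(o for (s, p, o) in graph if s == subject and p == predicate)


def validate(constraint_graph: set[Triple],
             run_query: Callable[[str], list]) -> list[tuple[str, Optional[str], object]]:
    """Evaluate each control node's sh:query; return (node, severity, violation) tuples."""
    results = []
    # Simplification: only explicit rdf:type sh:Constraint triples are recognised;
    # a real engine would use RDFS entailment here.
    nodes = sorted({s for (s, p, o) in constraint_graph
                    if p == "rdf:type" and o == "sh:Constraint"})
    for node in nodes:
        severities = objects(constraint_graph, node, "sh:severity")
        severity = severities[0] if severities else None  # first (sorted) value, if any
        for query in objects(constraint_graph, node, "sh:query"):
            # run_query is a hypothetical hook that evaluates the query against
            # the data graph or dataset and returns its violations.
            for violation in run_query(query):
                results.append((node, severity, violation))
    return results
```

Passing the data side only through run_query keeps the constraint graph and the data separate, as the text permits, while still allowing the same graph to serve as both.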

Note: This proposal uses RDFS classes to provide and organize some of its constructs and thus it uses the RDFS semantics for RDF graphs and for SPARQL queries. Using a simpler semantics would eliminate the ability to have shapes associated with superclasses also apply to their subclasses. However, if there is no use of the RDFS vocabulary (essentially here meaning rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range) in either the constraint graph or the data graph, then there would be little difference between the RDFS semantics and simpler semantics.

SHACL control nodes can have a sh:severity link to one or more of sh:fatalError, sh:error, or sh:warning, indicating the severity of any violations of the constraint.

SHACL Core Constraint Language

The simplest kind of SHACL control node has a sh:query link to a SPARQL query encoded as an RDF string literal and at most one sh:severity link as above. These nodes are called SHACL Core Control Nodes.

SHACL Extended Constraint Language

Other SHACL control nodes separate a constraint into three sections: a scope section, a shape section, and a reporting section. These SHACL control nodes, called SHACL Extended Control Nodes, must have exactly one of the ways below of specifying the scope, exactly one of the ways below of specifying the shape, at most one way of specifying reporting, and at most one sh:severity link.

Note: This proposal tries to be as representationally pure as possible. It uses typed literals for IRIs that end up as pieces of the SPARQL transformation. A version that is not so representationally pure would instead use RDF graph nodes that are IRIs directly and put these IRIs themselves into the transformation. The places that would have to be changed are marked like this.

The scope of a SHACL constraint is specified via

  • a sh:individualScope link to an IRI literal (an RDF literal with datatype xsd:anyURI),
  • a sh:classScope link to an IRI literal,
  • a sh:shapeScope link to a SHACL shape (see below), or
  • a sh:sparqlScope link to a string literal.

The shape of a SHACL constraint is specified via

  • a sh:shape link to a SHACL shape (see below), or
  • a sh:sparqlShape link to a string literal.

The reporting for a SHACL constraint is specified via

  • a sh:report link to a string literal.

These kinds of SHACL control nodes are handled by first constructing three parts of the SHACL constraint.

  • The scope portion of the constraint, <scope>, is
    • VALUES ?this { <IRI> } for a sh:individualScope link to "<IRI>"^^xsd:anyURI
    •  ?this rdf:type <IRI> . for a sh:classScope link to "<IRI>"^^xsd:anyURI
    • <shape> for a sh:shapeScope link to a node that encodes <shape> (see below)
    • <sparql> for a sh:sparqlScope link to "<sparql>"^^xsd:string
  • The shape portion of the constraint, <shape>, is
    • <shape> for a sh:shape link to a node that encodes <shape> (see below)
    • <sparql> for a sh:sparqlShape link to the string "<sparql>"^^xsd:string
  • The reporting portion of the constraint, <report>, is
    • SELECT ?this when there is no sh:report link
    • <sparql> for a sh:report link to "<sparql>"^^xsd:string

The constraint for a SHACL Extended Control Node is then constructed as

 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
 PREFIX sh: ... 
 <report> WHERE
 { <scope> MINUS { <shape> } }
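The construction can be sketched as string assembly in Python. The function names are hypothetical; the sh: prefix declaration is left out because its IRI is elided in this document; and for sh:shapeScope and sh:sparqlScope the caller is assumed to pass the already-expanded SPARQL text. Checking whether the parts are binding is left to the rules that follow.

```python
from typing import Optional

# The sh: prefix is omitted because its namespace IRI is elided in this document.
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>\n"
)


def scope_portion(link: str, value: str) -> str:
    """<scope> for one scope link; value is the IRI or the SPARQL text."""
    if link == "sh:individualScope":
        return f"VALUES ?this {{ <{value}> }}"
    if link == "sh:classScope":
        return f"?this rdf:type <{value}> ."
    if link in ("sh:sparqlScope", "sh:shapeScope"):
        # For sh:shapeScope the caller passes the already-expanded shape.
        return value
    raise ValueError(f"unknown scope link: {link}")


def extended_query(scope: str, shape: str, report: Optional[str] = None) -> str:
    """<report> WHERE { <scope> MINUS { <shape> } }, defaulting to SELECT ?this."""
    if report is None:
        report = "SELECT ?this"
    return f"{PREFIXES}{report} WHERE\n{{ {scope} MINUS {{ {shape} }} }}"
```

For example, a sh:classScope link to "http://example.org/Person"^^xsd:anyURI with no sh:report link yields a query ending in SELECT ?this WHERE { ?this rdf:type <http://example.org/Person> . MINUS { ... } }.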

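For constraints of SHACL Extended Control Nodes to have their intended meaning, their scope and shape portions have to create bindings for ?this rather than, for example, just performing filtering, because MINUS removes a solution only if it is compatible with, and shares at least one bound variable with, a solution of its right-hand side. To enforce this,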

  • the scope portion of the constraint must be binding, as defined below, and
  • if the shape component would not be binding, then it is replaced with the concatenation of the scope and shape components.

A scope coming from a sh:individualScope or sh:classScope link is binding. A scope or shape coming from a sh:sparqlScope link or a sh:sparqlShape link is binding if the SPARQL code would produce bindings for ?this. A scope or shape coming from a sh:shapeScope link or a sh:shape link is binding as defined in the section on the SHACL Simple Shape Language. (Alternatively, it is possible to just assume that using SPARQL directly means that the scope or shape is correctly formed and thus does bind if it is in a context that is binding.)
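The replacement rule for non-binding shapes can be sketched as below. The Portion type and the minus_parts function are hypothetical; the binding flag is assumed to be supplied by whatever produced the fragment (true for sh:individualScope and sh:classScope, computed by the Simple Shape Language rules for shapes, and asserted by the author for SPARQL fragments).

```python
from dataclasses import dataclass


@dataclass
class Portion:
    """A SPARQL fragment plus whether it binds ?this (rather than only filtering)."""
    text: str
    binding: bool


def minus_parts(scope: Portion, shape: Portion) -> tuple[str, str]:
    """Return the texts to use in  { <scope> MINUS { <shape> } }."""
    if not scope.binding:
        # A non-binding scope makes the constraint invalid.
        raise ValueError("the scope portion must be binding")
    if shape.binding:
        return scope.text, shape.text
    # Non-binding shape: prefix it with the scope so that ?this is bound
    # inside MINUS and the two sides share a variable.
    return scope.text, f"{scope.text} {shape.text}"
```

Repeating the scope inside MINUS costs an extra join, but it guarantees the shared ?this binding without relying on the entailment-dependent tricks in the note below.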

Note: It is possible to create bindings when needed by either adding in

?this rdf:type rdfs:Resource .

or

{ ?this ?p ?that . } UNION { ?that ?p ?this . }

The first requires the correct handling of a part of the RDFS entailment regime that might not be universally implemented by SPARQL engines. The second is inefficient and also requires DISTINCT.

SHACL Simple Shape Language

SHACL provides a vocabulary for generating the shape and scope portion of constraints, acting as a macro facility to ease the creation of simple constraints. In conjunction with the SHACL Extended Constraint Language this vocabulary permits the construction of many, but not all, constraints without needing to write SPARQL queries.

The SHACL Simple Shape Language does not provide any facilities that are not available by directly writing SPARQL. Therefore, what is in or not in the SHACL Simple Shape Language is not particularly important nor are the particulars of the mapping. What matters is that a desired intent can be captured in a SPARQL query or fragment of a SPARQL query.

A SHACL Shape Node is a node that is an RDFS instance of sh:Shape. Each SHACL shape encodes some SPARQL syntax, its shape, that can be used in SHACL constraints. A SHACL Shape Node must not refer to itself, either directly or indirectly.

SHACL Type and Individual Nodes

A SHACL Type Node is a SHACL Shape Node that has an sh:type link to an IRI Literal. The shape of a SHACL Type Node is

 ?this rdf:type <type> .

for an sh:type link to "<type>"^^xsd:anyURI. A SHACL Type Node is binding.

A SHACL Individual Node is a SHACL Shape Node that has a sh:individual link to an IRI literal. The shape of a SHACL Individual Node is

 VALUES ?this { <individual> }

for an sh:individual link to "<individual>"^^xsd:anyURI. A SHACL Individual Node is binding.
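A sketch of the two node kinds, including recovery of the IRI from an IRI literal. Literals are assumed to arrive as Turtle-like text such as "http://example.org/Person"^^xsd:anyURI, which is a simplification of real literal handling; the function names are hypothetical, and each shape function returns the shape text together with a flag recording that the node is binding.

```python
def iri_from_literal(literal: str) -> str:
    """Lexical form of an IRI literal written as "<iri>"^^xsd:anyURI."""
    suffix = '"^^xsd:anyURI'
    if not (literal.startswith('"') and literal.endswith(suffix)):
        raise ValueError(f"not an xsd:anyURI literal: {literal!r}")
    return literal[1:-len(suffix)]


def type_node_shape(type_literal: str) -> tuple[str, bool]:
    # A SHACL Type Node is binding.
    return f"?this rdf:type <{iri_from_literal(type_literal)}> .", True


def individual_node_shape(individual_literal: str) -> tuple[str, bool]:
    # A SHACL Individual Node is binding.
    return f"VALUES ?this {{ <{iri_from_literal(individual_literal)}> }}", True
```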

SHACL And and Or Nodes

A SHACL And Node is a SHACL Shape Node that has an sh:and link to a list of SHACL shape nodes, its conjuncts. The shape of a SHACL And Node is formed by concatenating the shapes of its conjuncts, in order. A SHACL And Node is binding if any of its conjuncts is binding.

A SHACL Or Node is a SHACL Shape Node that has an sh:or link to a list of SHACL shape nodes, its disjuncts. The shape of a SHACL Or Node is formed by enclosing the shape of each of its disjuncts in braces and joining the results, in order, with UNION. A SHACL Or Node is binding if all of its disjuncts are binding.
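The And and Or rules as a hypothetical Python sketch, with each shape represented as a (text, binding) pair. Each disjunct is wrapped in braces because SPARQL's UNION combines group graph patterns; rejecting empty lists is an assumption of the sketch.

```python
Shape = tuple[str, bool]  # (SPARQL text, whether it binds ?this)


def and_shape(conjuncts: list[Shape]) -> Shape:
    """Concatenate conjuncts in order; binding if any conjunct is binding."""
    if not conjuncts:
        raise ValueError("sh:and needs at least one conjunct")
    return " ".join(t for t, _ in conjuncts), any(b for _, b in conjuncts)


def or_shape(disjuncts: list[Shape]) -> Shape:
    """Join braced disjuncts with UNION; binding only if every disjunct is binding."""
    if not disjuncts:
        raise ValueError("sh:or needs at least one disjunct")
    text = " UNION ".join(f"{{ {t} }}" for t, _ in disjuncts)
    return text, all(b for _, b in disjuncts)
```

The asymmetry mirrors SPARQL: one binding conjunct binds ?this for the whole group, but a UNION branch that only filters leaves ?this unbound in its solutions.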

SHACL Property Nodes

Many SHACL Shape Nodes utilize an RDF property and are called SHACL Property Nodes. The property of a SHACL Property Node is specified by a sh:predicate or sh:inversePredicate link to an IRI literal. The <path> of a SHACL Property Node is <property> for a sh:predicate link to "<property>"^^xsd:anyURI and ^<property> for a sh:inversePredicate link to "<property>"^^xsd:anyURI.
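A minimal sketch of the <path> rule (the function name is hypothetical); the inverse form relies on the SPARQL 1.1 inverse property path operator ^.

```python
def property_path(link: str, iri: str) -> str:
    """<path> for a sh:predicate or sh:inversePredicate link to the IRI literal "iri"."""
    if link == "sh:predicate":
        return f"<{iri}>"
    if link == "sh:inversePredicate":
        # SPARQL 1.1 inverse path: ?this ^<p> ?V matches ?V <p> ?this
        return f"^<{iri}>"
    raise ValueError(f"not a property link: {link}")
```

For example, a sh:inversePredicate link to "http://example.org/offspring"^^xsd:anyURI yields a path that relates a node to its parents.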

A SHACL Property Node has one or more of the following property shape components, but at most one of each of them.

  • A has-value component, which is a sh:hasValue link to "<value>"^^xsd:anyURI and has component shape
?this <path> ?V . FILTER ( ?V = <value> )

where ?V is a fresh variable. A SHACL Property Node with a has-value component is binding.

  • An allowed-values component, which is a sh:allowedValues link to a list of IRI literals and has component shape

?this <path> ?V . FILTER ( ?V IN ( <value1> , <value2>, ..., <valuen> ) )

where ?V is a fresh variable and the <valuei> are the IRI literals in the list. A SHACL Property Node with an allowed-values component is binding.

  • A node-type component, which is a sh:nodeType link to a list of one or more of the following string literals: "IRI", "blank", "literal". The component shape for node-type components is

FILTER NOT EXISTS { ?this <path> ?V . FILTER ( ! ( <test> ) ) }

where ?V is a fresh variable and <test> is the SPARQL disjunction (||) of isIRI(?V) if "IRI" is in the list, isBlank(?V) if "blank" is in the list, and isLiteral(?V) if "literal" is in the list.

  • A value-type component, which is a sh:valueType link to "<valueType>"^^xsd:anyURI and has component shape
FILTER NOT EXISTS { ?this <path> ?V . 
   	             FILTER NOT EXISTS { ?V rdf:type <valueType> . } }

where ?V is a fresh variable.

  • A cardinality component, which is either a sh:cardinality link to a non-negative integer literal (whose value is <cardinality>) or one or both of a sh:minCardinality link to a non-negative integer literal (whose value is <minCardinality>) and a sh:maxCardinality link to a non-negative integer literal (whose value is <maxCardinality>). The component shape for cardinality components is the concatenation of the relevant fragments below

FILTER EXISTS { SELECT ?this WHERE { ?this <path> ?V } GROUP BY ?this 
  HAVING ( COUNT (DISTINCT ?V) = <cardinality> ) }
FILTER EXISTS { SELECT ?this WHERE { ?this <path> ?V } GROUP BY ?this 
  HAVING ( COUNT (DISTINCT ?V) >= <minCardinality> ) }
FILTER NOT EXISTS { SELECT ?this WHERE { ?this <path> ?V } GROUP BY ?this 
  HAVING ( COUNT (DISTINCT ?V) > <maxCardinality> ) }

where ?V is a fresh variable for each fragment. A node with no values for <path> forms no group, so the first two fragments are used only for positive values: a minimum cardinality of 0 contributes no fragment, and a cardinality of 0 contributes FILTER NOT EXISTS { ?this <path> ?V } in place of the first fragment. The maximum is checked by ruling out groups that are too large, so that nodes with no values satisfy it. If the cardinality component has a positive cardinality or minimum cardinality then

?this <path> ?V .

is also concatenated, where ?V is a fresh variable. If this is the case, then the SHACL Property Node is binding. Note that the use of subqueries here is likely to be inefficient. SHACL implementations that employ a SPARQL engine are encouraged to use more efficient equivalent constructs.

  • A value-shape component, which is a sh:valueShape link to a SHACL Shape Node and has component shape

FILTER NOT EXISTS { ?this <path> ?V . 
   FILTER NOT EXISTS { SELECT (?this AS ?V) WHERE { ?this ^(<path>) ?W . <shape> } } }

where ?V and ?W are fresh variables and <shape> is the shape of the SHACL Shape Node. The parentheses around <path> keep the inverse path legal when <path> is itself an inverse path. Because shapes are not allowed to refer to themselves either directly or indirectly, this expansion must terminate. Note that the use of a subquery may be inefficient, so SHACL implementations that employ a SPARQL engine are encouraged to use a more efficient construct.
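The has-value, allowed-values, node-type, value-type, and value-shape components can be sketched as below. This is an illustration under assumptions: <path> is passed in already expanded, fresh variables are generated as ?v0, ?v1, and so on from a module-level counter, the node-type tests are joined with || so that a value is accepted when it is any one of the listed kinds, and in the value-shape component the projection is written (?this AS ?V) as SPARQL 1.1 requires, with <path> parenthesised so that ^ can also be applied to an inverse path. All function names are hypothetical.

```python
import itertools

Shape = tuple[str, bool]  # (SPARQL text, whether it binds ?this)
_counter = itertools.count()


def fresh() -> str:
    """A new variable (?v0, ?v1, ...) assumed unused elsewhere in the query."""
    return f"?v{next(_counter)}"


def has_value(path: str, value: str) -> Shape:
    v = fresh()
    return f"?this {path} {v} . FILTER ( {v} = <{value}> )", True


def allowed_values(path: str, values: list[str]) -> Shape:
    v = fresh()
    listed = " , ".join(f"<{x}>" for x in values)
    return f"?this {path} {v} . FILTER ( {v} IN ( {listed} ) )", True


NODE_TESTS = {"IRI": "isIRI", "blank": "isBlank", "literal": "isLiteral"}


def node_type(path: str, kinds: list[str]) -> Shape:
    v = fresh()
    # A value passes if it is any one of the listed kinds.
    test = " || ".join(f"{NODE_TESTS[k]}({v})" for k in kinds)
    return f"FILTER NOT EXISTS {{ ?this {path} {v} . FILTER ( ! ( {test} ) ) }}", False


def value_type(path: str, cls: str) -> Shape:
    v = fresh()
    return (f"FILTER NOT EXISTS {{ ?this {path} {v} . "
            f"FILTER NOT EXISTS {{ {v} rdf:type <{cls}> . }} }}"), False


def value_shape(path: str, shape: str) -> Shape:
    v, w = fresh(), fresh()
    # (?this AS ?v) needs parentheses in SPARQL 1.1; ^( ... ) also works
    # when path is itself an inverse path.
    return (f"FILTER NOT EXISTS {{ ?this {path} {v} . "
            f"FILTER NOT EXISTS {{ SELECT (?this AS {v}) WHERE "
            f"{{ ?this ^({path}) {w} . {shape} }} }} }}"), False
```

Only the has-value and allowed-values components contain a bare triple pattern for ?this, which is why they are the binding ones; the others only filter.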

The shape of a SHACL Property Node is formed by concatenating each of the component shapes.
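Putting a Property Node together can be sketched as below. The sketch covers only the exact and minimum cardinality fragments, and only for positive counts, plus the extra triple that makes the node binding; maximum cardinality is omitted. Treating the whole Property Node as binding when any component is binding mirrors the And rule and is an assumption of this sketch, as are the function names and the ?c0, ?c1, ... variable scheme.

```python
import itertools
from typing import Optional

Shape = tuple[str, bool]  # (SPARQL text, whether it binds ?this)
_counter = itertools.count()


def fresh() -> str:
    """A new variable (?c0, ?c1, ...) assumed unused elsewhere in the query."""
    return f"?c{next(_counter)}"


def cardinality_shape(path: str, cardinality: Optional[int] = None,
                      min_cardinality: Optional[int] = None) -> Shape:
    """Exact/minimum cardinality fragments for positive counts, plus the binding triple."""
    parts = []
    for op, n in (("=", cardinality), (">=", min_cardinality)):
        if n is None:
            continue
        if n < 1:
            raise ValueError("this sketch only handles positive counts")
        v = fresh()
        parts.append(f"FILTER EXISTS {{ SELECT ?this WHERE {{ ?this {path} {v} }} "
                     f"GROUP BY ?this HAVING ( COUNT (DISTINCT {v}) {op} {n} ) }}")
    if not parts:
        return "", False
    # A positive (minimum) cardinality adds a triple pattern, which binds ?this.
    parts.append(f"?this {path} {fresh()} .")
    return " ".join(parts), True


def property_node_shape(components: list[Shape]) -> Shape:
    """Concatenate the component shapes; binding if any component is binding."""
    texts = [t for t, _ in components if t]
    return " ".join(texts), any(b for _, b in components)
```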

Other SHACL Shape Nodes

... add other kinds of SHACL Shape Nodes here as needed ...

SHACL Errors

If a node in an RDF graph is both a core control node and an extended control node the result of evaluating the graph is undefined. If a node in an RDF graph that is an RDFS instance of sh:Constraint is neither a SHACL Core Control Node nor a SHACL Extended Control Node the result of evaluating the graph is undefined. If a SHACL Shape Node in an RDF graph encodes more than one shape then the result of evaluating the graph is undefined. If a SHACL Shape Node in an RDF graph refers to itself either directly or indirectly then the result of evaluating the graph is undefined. SHACL engines should signal an error on such graphs.

If evaluation of a constraint would produce a SPARQL error the constraint is invalid. If the scope portion of a SHACL Extended Control Node is not binding the constraint is not valid. SHACL engines should signal an error for such constraints.
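The self-reference check can be sketched as a depth-first search. The refs mapping, from each shape node to the shape nodes it links to (for example through sh:and, sh:or, or sh:valueShape), is a hypothetical input that an engine would first extract from the constraint graph; the function name is also hypothetical. It returns one offending cycle, or None.

```python
from typing import Optional


def find_self_reference(refs: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle of shape nodes (first node repeated at the end), or None."""
    on_path: list[str] = []   # shape nodes on the current depth-first path
    done: set[str] = set()    # shape nodes fully explored without finding a cycle

    def visit(node: str) -> Optional[list[str]]:
        on_path.append(node)
        for target in refs.get(node, []):
            if target in on_path:
                return on_path[on_path.index(target):] + [target]
            if target not in done:
                cycle = visit(target)
                if cycle:
                    return cycle
        on_path.pop()
        done.add(node)
        return None

    for start in sorted(refs):
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return None
```

Shared sub-shapes (two shapes referring to the same third shape) are not self-references, so they are accepted; only a path that returns to a node already on it is reported.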

Example

Here is one example constraint written in several equivalent forms, each as a single constraint in an RDF graph; some forms report ?person instead of ?this. (The obvious prefix directives have to be added to make these legal Turtle.)

[ rdf:type sh:Constraint ;
 sh:severity sh:fatalError ;
 sh:query 
 """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> 
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> 
    PREFIX sh: ... 
    PREFIX ex: <http://example.org/> 
    SELECT ?person WHERE 
    { ?person rdf:type ex:Person .
      MINUS
      { ?person rdf:type ex:Person .
        FILTER NOT EXISTS { ?person ex:offspring ?v . 
          FILTER NOT EXISTS { ?v rdf:type ex:Person . } } } }"""
] .
[ rdf:type sh:Constraint ;
 sh:severity sh:fatalError ;
 sh:report "SELECT ?this" ;
 sh:classScope "http://example.org/Person"^^xsd:anyURI ;
 sh:sparqlShape
   """FILTER NOT EXISTS { ?this ex:offspring ?v . 
        FILTER NOT EXISTS { ?v rdf:type ex:Person . } }""" 
] .
[ rdf:type sh:Constraint ;
 sh:severity sh:fatalError ;
 sh:report "SELECT ?person" ;
 sh:sparqlScope "?person rdf:type <http://example.org/Person>" ;
 sh:sparqlShape
   """FILTER NOT EXISTS { ?person ex:offspring ?offspring . 
        FILTER NOT EXISTS { ?offspring rdf:type ex:Person . } }""" 
] .
[ rdf:type sh:Constraint ;
 sh:severity sh:fatalError ;
 sh:report """SELECT ?this""" ;
 sh:shapeScope [ rdf:type sh:Shape ;
     sh:predicate "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"^^xsd:anyURI
     sh:hasValue "http://example.org/Person"^^xsd:anyURI ] ;
 sh:shape [ rdf:type sh:Shape ;
     sh:predicate "http://example.org/offspring"^^xsd:anyURI ;
     sh:valueType "http://example.org/Person"^^xsd:anyURI ] 
] .


Here is a partial approximation of the issue example.

:IssueConstraint
 rdf:type sh:Constraint ;
 sh:severity sh:error ;
 sh:shapeScope [ rdf:type sh:Shape ;
     sh:predicate "http://example.org/state"^^xsd:anyURI ;
     sh:allowedValues ( "http://example.org/assigned"^^xsd:anyURI
                        "http://example.org/unassigned"^^xsd:anyURI ) ] ;
 sh:shape [ rdf:type sh:Shape ;
   sh:and
   ( [ rdf:type sh:Shape ;
       sh:predicate "http://example.org/reportedBy"^^xsd:anyURI ;
       sh:cardinality 1 ;
       sh:valueType "http://example.org/User"^^xsd:anyURI ]
     [ rdf:type sh:Shape ;
       sh:predicate "http://example.org/reportedOn"^^xsd:anyURI ;
       sh:cardinality 1 ;
       sh:valueDatatype "http://www.w3.org/2001/XMLSchema#dateTime"^^xsd:anyURI ]
     # does not include optional bit yet
     [ rdf:type sh:Shape ;
       sh:predicate "http://example.org/reportedOn"^^xsd:anyURI ;
       sh:minCardinality 0 ;
       sh:valueShape [ rdf:type sh:Shape ;
           sh:predicate "http://example.org/state"^^xsd:anyURI ;
           sh:allowedValues ( "http://example.org/assigned"^^xsd:anyURI
                              "http://example.org/unassigned"^^xsd:anyURI ) ] ] )
 ] .