Shape Expressions Definition

This document defines the operational semantics of the Shape Expressions language. The accompanying Shape Expressions Primer provides an informal introduction to the language.

Shape Expressions can be used to validate documents, communicate expected graph patterns for interfaces, and generate user interface forms and interface code.

Status of this Document

This document was developed in response to the W3C RDF Validation Workshop. The contributions are not associated with any W3C Working Group or any Recommendations-track work.

Abstract Syntax

The Shape Expressions language provides one surface syntax for a conceptual model, written here in set-builder notation:

RDF Abstract Syntax

Graph          ::= Set(Triple)
RDFTerm        ::= IRI | BlankNode | RDFLiteral
Triple         ::= (s:Subject, p:Predicate, o:Object)
Subject        ::= IRI | BlankNode
Predicate      ::= IRI
Object         ::= IRI | BlankNode | RDFLiteral
RDFLiteral     ::= typedLiteral | plainLiteral
typedLiteral   ::= (lexicalValue:String, datatype:IRI)
plainLiteral   ::= (lexicalValue:String, datatype:IRI, langtag:String) /* note this change since RDF 1.1 */

Shape Expression Abstract Syntax

Shape          ::= (label:Label, rule:Rule)
Rule           ::= ArcRule | GroupRule | AndRule | OrRule
AndRule        ::= (conjoints:Set(Rule))
OrRule         ::= (disjoints:Set(Rule))
GroupRule      ::= (rule:Rule, opt:Bool, a:Set(Action))
ArcRule        ::= (id:Label, n:NameClass, v:ValueClass, min:Integer, max:Integer, a:Set(Action))
Label          ::= IRI | BlankNode
IRIstem        ::= (i:IRI)
NameClass      ::= (NameTerm | NameWild | NameStem)
NameTerm       ::= (t:IRI)
NameWild       ::= (excl:Set(IRIstem))
NameStem       ::= (s:IRI)
ValueClass     ::= ValueType | ValueSet | ValueWild | ValueStem | ValueReference
ValueType      ::= type:IRI /* Note that rdf:Resource is a fixed IRI with a special behavior */
SetValue       ::= IRI | IRIstem | RDFLiteral
ValueSet       ::= (s:Set(SetValue))
ValueWild      ::= (excl:Set(IRIstem))
ValueStem      ::= (s:IRI)
ValueReference ::= (l:Label)
Action         ::= (label:Label, code:String)
Schema         ::= (rules:Set((Label, Rule)), start:Rule?)
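
As a non-normative illustration, part of the abstract syntax above could be modelled with Python dataclasses. The field names mirror the productions; representing IRIs and labels as plain strings, using None for an unbounded maximum, and omitting most NameClass and ValueClass variants are assumptions of this sketch, not part of the definition.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union

# Assumption: labels and IRIs are plain strings in this sketch.
Label = str


@dataclass(frozen=True)
class NameTerm:
    t: str


@dataclass(frozen=True)
class ValueReference:
    l: Label


@dataclass(frozen=True)
class Action:
    label: Label
    code: str


@dataclass(frozen=True)
class ArcRule:
    id: Label
    n: NameTerm              # NameWild and NameStem omitted for brevity
    v: ValueReference        # other ValueClass variants omitted for brevity
    min: int
    max: Optional[int]       # assumption: None stands for "unbounded"
    a: FrozenSet[Action] = frozenset()


@dataclass(frozen=True)
class AndRule:
    conjoints: FrozenSet["Rule"]


@dataclass(frozen=True)
class OrRule:
    disjoints: FrozenSet["Rule"]


@dataclass(frozen=True)
class GroupRule:
    rule: "Rule"
    opt: bool
    a: FrozenSet[Action] = frozenset()


Rule = Union[ArcRule, GroupRule, AndRule, OrRule]

# A rule requiring exactly one arc with a hypothetical ex:name predicate
# whose value conforms to a hypothetical <NameShape>.
name_arc = ArcRule(
    id="_:b0",
    n=NameTerm("http://example.org/name"),
    v=ValueReference("http://example.org/NameShape"),
    min=1,
    max=1,
)
person = AndRule(conjoints=frozenset({name_arc}))
```

Frozen dataclasses are hashable, which lets rules be stored in sets in the same way the abstract syntax uses Set(Rule).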

At the outermost level, validity is a binary operation resulting in either a pass or fail (abbreviated as "𝕡" and "𝕗" below). In order to mimic the behavior of regular expressions within optional groups (GroupRule with the optional flag set), we need to introduce two extra states: ∅ to indicate that there were 0 matches for an ArcRule with a non-zero minimum cardinality, and 𝕫 for 0 matches for an ArcRule with a zero minimum cardinality.

Within any nested AndRule (noting that all or none of the conjoints must match), a conjoint with a minimum cardinality and no corresponding triples is not evidence that the conjoint failed to match. The analog in regular expressions would be terms with 0 minimum cardinality appearing in an optional group. For example, "(a+b*c{2,3})?d" would match "abccd", "accd", and "d" but not "ad", "bd", or "ccd". See Appendix 1 for further exploration.
Within any nested OrRule (noting that only one disjoint may match), a disjoint with a minimum cardinality and no corresponding triples is not evidence that the disjoint failed to match. Passing two disjoints results in the error state ⅇ.
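
The regular-expression analogy above can be checked with a short Python sketch. It assumes the group is marked optional with a trailing "?", since that is what allows the bare string "d" to match.

```python
import re

# Assumption: the parenthesised group is optional (trailing "?").
pattern = re.compile(r"(a+b*c{2,3})?d")


def matches(s: str) -> bool:
    """True when the whole string matches the pattern."""
    return pattern.fullmatch(s) is not None


candidates = ["abccd", "accd", "d", "ad", "bd", "ccd"]
accepted = [s for s in candidates if matches(s)]
rejected = [s for s in candidates if not matches(s)]
```

Here "ad", "bd" and "ccd" are rejected because the group, once entered, needs at least one "a" and at least two "c" characters; the group must match completely or not at all.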

The required optionality context is passed as the second parameter to the validity⟦conj,inOpt,p,g⟧φ function (below), which returns one of (𝕡 𝕗 ∅ 𝕫 ⅇ) when in an optional context, and one of (𝕡 𝕗) otherwise.

OptValidity    ::= (𝕡|𝕗|∅|𝕫|ⅇ)

Shape Expression Functions

Actions are instructions outside of this specification which may have side effects, including indicating failure to match a rule:
dispatch       : Set(Action) → Bool

The findRule function selects a rule from the schema with a specified label:
findRule       : Label → Rule

There are two termMatches functions for determining whether an RDFTerm matches another RDFTerm or an IRIstem, respectively:
termMatches    : (RDFTerm, RDFTerm) → Bool
termMatches    : (RDFTerm, IRIstem) → Bool /* true if the RDFTerm is an IRI and starts with the characters in the IRIstem */
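
A minimal sketch of the two termMatches signatures in Python follows. The IRI, Literal and IRIstem classes are hypothetical stand-ins for the abstract syntax, and dispatching on the type of the second argument is one possible way to model the overloading.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IRI:
    value: str


@dataclass(frozen=True)
class Literal:
    lexical_value: str
    datatype: str


@dataclass(frozen=True)
class IRIstem:
    i: IRI


def term_matches(term, pattern) -> bool:
    """Compare a term against another term (equality) or an IRI stem (prefix)."""
    if isinstance(pattern, IRIstem):
        # Only IRIs can match a stem; literals never do in this sketch.
        return isinstance(term, IRI) and term.value.startswith(pattern.i.value)
    return term == pattern


stem = IRIstem(IRI("http://example.org/ns/"))
inside = term_matches(IRI("http://example.org/ns/name"), stem)
outside = term_matches(IRI("http://example.com/name"), stem)
```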

The following SPARQL functions are used to test and access RDF terms:
isIRI          : Object → Bool
isLiteral      : Object → Bool
isBlank        : Object → Bool
datatype       : RDFLiteral → IRI
subject        : Triple → RDFTerm
predicate      : Triple → IRI
object         : Triple → RDFTerm

Shape Expression Validation Semantics

Validation is a function of a rule in a schema φ and a pointed graph (a pair of a graph and a point in that graph), yielding a validation result:
validity⟦ , , ⟧φ   : (Rule,Node,Graph) → OptValidity
The four types of rule have the following specializations:

The validity of an AndRule with respect to a node in a graph is the combined validity of all of its members.
If the AndRule is flagged as optional, all of the rules must pass or all of the rules must fail.
validity⟦ , , ⟧φ   : (AndRule,Node,Graph) → OptValidity
validity⟦r,p,g⟧φ   = let vs  =  {conj:r.conjoints ⋅ validity⟦conj,o,p,g⟧φ} in
                       if (∃ v:vs | v = 𝕗)
                           𝕗
                       else if (∃ v1,v2:vs | v1 = 𝕡 ∧ v2 = 𝕫)
                           𝕗
                       else if (∄ v1,v2:vs | (v1 = 𝕡 ∧ v2 = 𝕫) )
                           ∅
                       else if (∃ v:vs | v = 𝕡 )
                           𝕡
                       else
                           𝕫


The validity of an OrRule with respect to a node in a graph is the validity of exactly one of its members:
validity⟦ , , ⟧φ   : (OrRule,Node,Graph) → OptValidity
validity⟦r,p,g⟧φ   = let vs = {disj:r.disjoints ⋅ validity⟦disj,o,p,g⟧φ} in
                       if (∀ v:vs | v = 𝕗)
                           𝕗
                       else if (μ v:vs | v = 𝕡)
                           𝕡
                       else if (μ v:vs | v = ∅)
                           ∅
                       else
                           𝕗
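
The OrRule listing above can be sketched in Python as a function over the disjoint results. This sketch assumes that μ means "exactly one" and that the results are kept as a list rather than a set, so that two passing disjoints remain distinguishable; the enumeration V is a hypothetical encoding of OptValidity.

```python
from enum import Enum


class V(Enum):
    PASS = "𝕡"
    FAIL = "𝕗"
    NONE = "∅"
    ZERO = "𝕫"
    ERROR = "ⅇ"


def combine_or(results):
    """Combine disjoint results, following the branches of the listing."""
    vs = list(results)
    if all(v is V.FAIL for v in vs):
        return V.FAIL
    if sum(v is V.PASS for v in vs) == 1:   # exactly one disjoint passed
        return V.PASS
    if sum(v is V.NONE for v in vs) == 1:   # exactly one indeterminate
        return V.NONE
    return V.FAIL
```

As in the listing, every remaining combination, including two passing disjoints, falls through to 𝕗.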

The validity of a GroupRule with respect to a node in a graph is the validity of its nested rule:
If the GroupRule is flagged as optional, it will always return a validity of true.
validity⟦ , , ⟧φ   : (GroupRule,Node,Graph) → OptValidity
validity⟦r,p,g⟧φ   = let v  =  validity⟦r.rule,o,p,g⟧φ in
                       if v ∈ {𝕗,∅}
                           v
                       else if (v = 𝕫)
                           if (o)
                               𝕫
                           else if (r.opt)
                               𝕡
                           else
                               𝕗
                       else if (dispatch(r.a))
                           𝕡
                       else
                           𝕗
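
The branches of the GroupRule listing can be traced with a small Python sketch. States are encoded as the strings used in this document; passing the action dispatch as a callable (here named dispatch_ok, a hypothetical name) is an assumption that keeps actions from running unless the last branch is reached.

```python
def group_validity(v, in_opt, group_opt, dispatch_ok=lambda: True):
    """v: result of the nested rule; in_opt: optionality context (o);
    group_opt: the GroupRule's own opt flag (r.opt)."""
    if v in ("𝕗", "∅"):
        return v
    if v == "𝕫":
        if in_opt:
            return "𝕫"
        return "𝕡" if group_opt else "𝕗"
    # The nested rule passed; the outcome now depends on the actions.
    return "𝕡" if dispatch_ok() else "𝕗"
```

For example, an optional group whose nested rule found no triples yields 𝕡 at the top level but propagates 𝕫 when it is itself nested in an optional context.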

An ArcRule is valid with respect to a node in a graph if none of the following produce an error:
  • Take the set of triples in the graph having a subject of point p.
  • Take the set of those triples which match the NameClass.
  • If this set is empty and the rule is evaluated in an optional context, return an empty match 𝕫 if the minimum cardinality is 0, or indeterminate ∅ otherwise.
  • If any of those triples fail to match ValueClass or fail the semantic action dispatch, the rule fails.
  • The success of the validation comes from the ValueClass (which, in the case of
    a ValueReference, will include the results of validating another shape expression in a non-optional context).
This set is then tested for cardinality conformance.
validity⟦ , , ⟧φ   : (ArcRule,Node,Graph) → OptValidity
validity⟦r,p,g⟧φ   = let fromPoint = {t:g | subject(t) = p} in
                       let matchName  =  {t:fromPoint | nameClass⟦r.n,predicate(t)⟧φ} in
                       if (o ∧ #matchName = 0)
                           if (r.min > 0)
                               ∅
                           else
                               𝕫
                       else if (#matchName < r.min ∨ #matchName > r.max)
                           𝕗
                       else
                           let resultSet = {
                               ∀ t:matchName
                                   let v = valueClass⟦r.v,t,g⟧φ in
                                       if (v = false)
                                           false
                                       else
                                           dispatch(r.a)
                           } in
                           if (∃ s:resultSet | s = false)
                               𝕗
                           else
                               𝕡
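
The ArcRule steps can be sketched in Python as follows. Triples are modelled as (subject, predicate, object) tuples of strings, the NameClass and ValueClass tests are passed in as hypothetical callables name_ok and value_ok, and None for max_count stands for an unbounded maximum; all of these are assumptions of the sketch.

```python
def arc_validity(graph, point, name_ok, value_ok, min_count, max_count,
                 in_opt, dispatch=lambda: True):
    """Return one of "𝕡", "𝕗", "∅", "𝕫" for an arc rule at node `point`."""
    from_point = [t for t in graph if t[0] == point]
    match_name = [t for t in from_point if name_ok(t[1])]
    if in_opt and not match_name:
        # No matching arcs in an optional context: indeterminate or empty.
        return "∅" if min_count > 0 else "𝕫"
    too_many = max_count is not None and len(match_name) > max_count
    if len(match_name) < min_count or too_many:
        return "𝕗"
    for t in match_name:
        # Each matching triple must satisfy the value test and the actions.
        if not value_ok(t) or not dispatch():
            return "𝕗"
    return "𝕡"


g = [("s", "p", "1"), ("s", "q", "2")]
is_p = lambda pred: pred == "p"
is_x = lambda pred: pred == "x"
```

With this graph, a rule on predicate "p" with cardinality {1,1} passes at node "s", while a rule on the absent predicate "x" yields ∅ or 𝕫 in an optional context depending on its minimum.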


Validation of a NameClass is a function of an IRI.
nameClass⟦n,i⟧φ    : (NameClass, IRI) → Bool
The three types of NameClass have the following specializations:

nameClass⟦n,i⟧φ    : (NameTerm, IRI) → Bool
nameClass⟦n,i⟧φ    = i == n.t

nameClass⟦n,i⟧φ    : (NameWild, IRI) → Bool
nameClass⟦n,i⟧φ    = ∄ e:n.excl | termMatches(i, e)

nameClass⟦n,i⟧φ    : (NameStem, IRI) → Bool
nameClass⟦n,i⟧φ    = termMatches(i, n.s)
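
The three NameClass specializations can be illustrated with a short Python sketch. IRIs and stems are plain strings here, and stem matching is modelled with str.startswith; both are assumptions of the sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NameTerm:
    t: str


@dataclass(frozen=True)
class NameWild:
    excl: FrozenSet[str]   # excluded IRI stems, as strings


@dataclass(frozen=True)
class NameStem:
    s: str


def name_class(n, i: str) -> bool:
    """Test predicate IRI `i` against name class `n`."""
    if isinstance(n, NameTerm):
        return i == n.t
    if isinstance(n, NameWild):
        # A wildcard matches anything not covered by an excluded stem.
        return not any(i.startswith(e) for e in n.excl)
    if isinstance(n, NameStem):
        return i.startswith(n.s)
    raise TypeError(f"not a NameClass: {n!r}")


wild = NameWild(frozenset({"http://example.org/private/"}))
```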



Validation of a ValueClass is a function of a Triple.
valueClass⟦v,t,g⟧φ : (ValueClass, Triple, Graph) → Bool
The five types of ValueClass have the following implementations:

valueClass⟦v,t,g⟧φ : (ValueType, Triple, Graph) → Bool
valueClass⟦v,t,g⟧φ = (v.type = rdf:Resource ∧ isIRI(object(t))) ∨
                         (isLiteral(object(t)) ∧ datatype(object(t)) = v.type)

valueClass⟦v,t,g⟧φ : (ValueSet, Triple, Graph) → Bool
valueClass⟦v,t,g⟧φ = ∃ e:v.s | object(t) = e

valueClass⟦v,t,g⟧φ : (ValueWild, Triple, Graph) → Bool
valueClass⟦v,t,g⟧φ = ∄ e:v.excl | termMatches(object(t), e)

valueClass⟦v,t,g⟧φ : (ValueStem, Triple, Graph) → Bool
valueClass⟦v,t,g⟧φ = termMatches(object(t), v.s)

valueClass⟦v,t,g⟧φ : (ValueReference, Triple, Graph) → Bool
valueClass⟦v,t,g⟧φ = if !(isIRI(object(t)) ∨ isBlank(object(t))

Start Rule and Pointed Graph (informative)

The Shape Expressions grammar provides a mechanism for a schema to provide a start rule. The Shape Expression Validation Semantics define a validation function for a rule and an RDF graph with a starting node (pointed graph). This rule may be supplied by the start rule in the schema. The starting node may be supplied by, for example, a media type parameter, a form parameter, or some other means. It is possible, though the process may be expensive and the validation results may be ambiguous, to iterate across the rules in a schema and the subject nodes in a graph to find those combinations which result in a successful validation.
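
The exhaustive search described above might look like the following Python sketch. The validate callable is a hypothetical stand-in for the validity function, rules is assumed to be a mapping from label to rule, and triples are (subject, predicate, object) tuples.

```python
def find_conforming(rules, graph, validate):
    """Return every (label, node) pair for which validation succeeds."""
    subjects = {s for (s, _p, _o) in graph}
    return sorted(
        (label, node)
        for label, rule in rules.items()
        for node in subjects
        if validate(rule, node, graph)
    )


# Toy example: a "rule" is just a required predicate name.
graph = [("alice", "name", "Alice"), ("bob", "age", "42")]
rules = {"HasName": "name", "HasAge": "age"}


def has_predicate(rule, node, g):
    return any(s == node and p == rule for (s, p, _o) in g)


pairs = find_conforming(rules, graph, has_predicate)
```

The cost grows with the product of the number of rules and the number of subject nodes, and a node may conform to more than one rule, which is the ambiguity the paragraph above warns about.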

Parsing Rules

The semantics of the productions and terminals from the Turtle specification are defined in Turtle Terse RDF Triple Language section 7 Parsing.

The following grammar rules are mapped to the abstract syntax:

Productions:
[1]	`ShExDoc`	::=	`statement*`
[2]	`statement`	::=	`directive | shape`
[3]	`directive`	::=	`sparqlPrefix | sparqlBase | "start" "=" ( label | typeSpec CODE* )` let b = a fresh blank node typeSpec.setLabel(b) new ArcRule()
[4]	`sparqlPrefix`	::=	`SPARQL_PREFIX PNAME_NS IRIREF`
[5]	`sparqlBase`	::=	`SPARQL_BASE IRIREF`
[6]	`shape`	::=	`label typeSpec CODE*` if (CODE.SIZE > 0) let r = new GroupRule(false, typeSpec) r.label = label else typeSpec.label = label
[7]	`typeSpec`	::=	`"{" OrExpression "}"`
[8]	`OrExpression`	::=	`l:AndExpression ( "|" r:AndExpression )*` if (r) new OrRule(l, r) else l
[9]	`AndExpression`	::=	`l:UnaryExpression ( "," r:UnaryExpression )*` if (r) new AndRule(l, r) else l
[10]	`UnaryExpression`	::=	`arc`
			`| "(" OrExpression ")" opt:"?"? CODE*` new GroupRule(OrExpression, opt, CODE)
[11]	`label`	::=	`iri | BlankNode`
[12]	`arc`	::=	`nameClass valueSpec default? repeatCount? properties? CODE*` new ArcRule(nameClass, valueSpec, default, repeatCount, properties, CODE)
[13]	`nameClass`	::=	`iriStem` NameStem(iriStem)
			`| "." exclusions` NameWild(exclusions)
			`| RDF_TYPE` NameTerm(rdf:type)
[14]	`valueSpec`	::=	`"@" label` ValueReference(label)
			`| typeSpec` let b = a fresh blank node typeSpec.setLabel(b)
			`| valueSet` new ValueSet(valueSet)
			`| object` new ValueType(object)
			`| exclusions` new ValueWild(exclusions)
[15]	`iriStem`	::=	`iri ("~")?`
[16]	`default`	::=	`"=" object`
[17]	`properties`	::=	`"[" iri object ( ";" ( iri object )? )* "]"`
[18]	`exclusions`	::=	`( "-" iriStem )*`
[19]	`repeatCount`	::=	`"*" | "+" | "?" | "{" INTEGER ( "," (INTEGER)? )? "}"`
[20]	`valueSet`	::=	`"(" (object)+ ")"`
[21]	`object`	::=	`iriStem | BlankNode | literal`
[22]	`literal`	::=	`RDFLiteral | NumericLiteral | BooleanLiteral`
[23]	`NumericLiteral`	::=	`INTEGER | DECIMAL | DOUBLE`
[24]	`RDFLiteral`	::=	`String ( LANGTAG | "^^" iri )?`
[25]	`BooleanLiteral`	::=	`"true" | "false"`
[26]	`String`	::=	`STRING_LITERAL1 | STRING_LITERAL2 | STRING_LITERAL_LONG1 | STRING_LITERAL_LONG2`
[27]	`iri`	::=	`IRIREF | PrefixedName`
[28]	`PrefixedName`	::=	`PNAME_LN | PNAME_NS`
[29]	`BlankNode`	::=	`BLANK_NODE_LABEL | ANON`
Terminals Not in Turtle:
[30]	<`CODE`>	::=	`"%" ( [a-zA-Z+#_] ([a-zA-Z0-9+#_])* )? "{" ( [^%\\] | "\\" "%" )* "%" "}"`
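
As an informal illustration, the CODE terminal can be transcribed into a Python regular expression. The transcription below is a sketch, not a normative definition; the name is_code is hypothetical.

```python
import re

# "%" optional-name "{" body "%}", where a literal "%" in the body
# must be escaped with a backslash.
CODE = re.compile(r"%(?:[a-zA-Z+#_][a-zA-Z0-9+#_]*)?\{(?:[^%\\]|\\%)*%\}")


def is_code(s: str) -> bool:
    """True when the whole string is a single CODE token."""
    return CODE.fullmatch(s) is not None


named = is_code("%js{ return true; %}")
anonymous = is_code("%{ x %}")
escaped = is_code("%js{ 50\\% %}")      # body contains backslash-percent
unescaped = is_code("%js{ 50% %}")      # bare "%" not followed by "}"
```

A bare "%" inside the body ends the token only when it is immediately followed by "}", which is why the unescaped case is rejected.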

Ordering (informative)

A Shape Expressions schema with no semantic actions can be treated as unordered. In practice, error messages can be much clearer, and extension functions much simpler to write, if the engine follows the lexical order when validating the members of a conjunction. Likewise, form and interface code generation will be more predictable and controllable if the Shape Expressions are processed in a specific order.