ShEx/Obsolete/ShEx

Obsolete - please see the ShEx GitHub wiki.

ShEx, or Shape Expressions (intro), is a language for expressing constraints on RDF graphs. It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles as well as logical connectives for disjunction and polymorphism. It is intended to:

validate RDF documents.
communicate expected graph patterns for interfaces.
generate user interface forms and interface code.
compile to SPARQL queries (except for cyclic grammars).

A W3C ShEx Demo validates data against a schema, compiles SPARQL queries for the schema and generates an RDF representation.
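
As a rough illustration of the "compile to SPARQL" goal, the sketch below turns a list of required predicates into a SELECT query that finds candidate subjects. The function name `shape_to_sparql` and the query layout are assumptions made for this example; this is not the output format of the W3C demo, and cardinalities, value classes and disjunctions are ignored.

```python
def shape_to_sparql(required_predicates):
    """Build a SPARQL query selecting subjects that have every listed predicate.

    required_predicates: list of full predicate IRIs (strings).
    Hypothetical helper: it only covers arcs that must occur at least once.
    """
    patterns = "\n".join(
        f"  ?s <{predicate}> ?o{index} ."
        for index, predicate in enumerate(required_predicates)
    )
    return f"SELECT DISTINCT ?s WHERE {{\n{patterns}\n}}"


# Two required arcs loosely modeled on <IssueShape>; the IRIs are placeholders.
query = shape_to_sparql([
    "http://ex.example/state",
    "http://ex.example/reportedBy",
])
print(query)
```

Each required arc becomes one triple pattern with its own object variable, so a subject is returned only if all arcs are present.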

Syntax

The ShEx syntax is modeled after RelaxNG Compact Syntax (RNC):

<IssueShape> {                            # An Issue shape
    :state ( :unassigned :assigned ),     #   has a state with 2 possible values
    :reportedBy @<UserShape>,             #   is reported by a user
    :reportedOn xsd:date,                 #   is reported on a date
    ( :reproducedBy @<UserShape>          #   can optionally have 2 properties
    , :reproducedOn xsd:date              #     reproducedBy/On
    )?,
    :related @<IssueShape>*               #   is related to several other issues
}

<UserShape> {                             # A user shape can have either
    ( foaf:name xsd:string                #  name or
    | foaf:givenName xsd:string+ ,        #  several given names and
      foaf:familyName xsd:string          #   family name
    ), 
    foaf:mbox shex:IRI ?                  # mbox Optional, any IRI
}

The previous example can be tested here (using RDFShape) and here (with Eric's fancy demo)

A ShEx schema can be written in two syntaxes: SHEXc (the ShEx compact syntax) and SHEX/RDF.

Semantics

ShEx, like RNC, is designed to be familiar to users of BNF and regular expressions. The conspicuous difference lies in what is matched: regular expressions correlate an ordered pattern of atomic characters and logical operators against an ordered sequence of characters, whereas Shape Expressions correlate an ordered pattern of pairs of predicate and object classes (called NameClass and ValueClass) and logical operators against an unordered set of arcs in a graph. The logical operators in Shape Expressions (grouping, conjunction, disjunction and cardinality constraints) are defined to behave as closely as possible to their counterparts in regular expressions and grammar languages like BNF.
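
To make the "unordered set of arcs" point concrete, here is a minimal sketch in Python. It assumes a much simplified model that is not taken from any ShEx implementation: one rule per predicate, each rule being a value test plus a minimum and maximum count, and a closed shape (arcs with unlisted predicates fail). The names `matches_shape` and `is_string` are hypothetical.

```python
from collections import Counter


def matches_shape(arcs, rules):
    """Check unordered (predicate, object) arcs against simple arc rules.

    rules maps predicate -> (value_test, min_count, max_count);
    max_count None means unbounded. Closed-shape assumption: an arc whose
    predicate has no rule makes the node fail.
    """
    counts = Counter()
    for predicate, obj in arcs:
        if predicate not in rules:
            return False
        value_test, _, _ = rules[predicate]
        if not value_test(obj):
            return False
        counts[predicate] += 1
    # Cardinality is checked on totals, so arc order never matters.
    for predicate, (_, lo, hi) in rules.items():
        n = counts[predicate]
        if n < lo or (hi is not None and n > hi):
            return False
    return True


def is_string(value):
    return isinstance(value, str)


# Second branch of <UserShape>: one or more given names, one family name.
user_rules = {
    "foaf:givenName": (is_string, 1, None),
    "foaf:familyName": (is_string, 1, 1),
}
arcs = [
    ("foaf:familyName", "Smith"),
    ("foaf:givenName", "Bob"),
    ("foaf:givenName", "Rob"),
]
print(matches_shape(arcs, user_rules))
print(matches_shape(arcs[::-1], user_rules))  # same arcs, different order
```

Because only per-predicate counts are compared with the cardinality bounds, reversing the arcs gives the same answer, which is the property that separates this from string matching.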

Recursive shapes (like <IssueShape>, which refers to itself through :related) are problematic for Shape Expressions, and their meaning is open to question. The semantics for Shape Expressions does not handle them well: evaluation can go into infinite loops, be non-deterministic, or even be paradoxical.
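
The infinite-loop risk can be seen in a small sketch. The code below is one possible workaround and not the semantics defined on this page: it assumes that a (node, shape) pair already under test is provisionally accepted when it is reached again. The data layout and the name `conforms` are hypothetical.

```python
def conforms(node, shape, graph, shapes, in_progress=None):
    """Check node against shape, tolerating cycles in the data.

    graph:  dict node -> list of (predicate, object)
    shapes: dict shape name -> list of (predicate, referenced shape or None,
            minimum count)
    Assumption of this sketch: a (node, shape) pair that is already being
    checked is taken to hold, which stops the recursion.
    """
    in_progress = set() if in_progress is None else in_progress
    key = (node, shape)
    if key in in_progress:
        return True
    in_progress.add(key)
    try:
        for predicate, ref, min_count in shapes[shape]:
            objects = [o for p, o in graph.get(node, []) if p == predicate]
            if len(objects) < min_count:
                return False
            if ref is not None:
                for obj in objects:
                    if not conforms(obj, ref, graph, shapes, in_progress):
                        return False
        return True
    finally:
        in_progress.discard(key)


shapes = {"IssueShape": [(":state", None, 1), (":related", "IssueShape", 0)]}
graph = {
    "i1": [(":state", ":assigned"), (":related", "i2")],
    "i2": [(":state", ":unassigned"), (":related", "i1")],  # cycle back to i1
    "i3": [(":state", ":assigned"), (":related", "i4")],
    "i4": [],  # no :state, so i4 (and therefore i3) fails
}
print(conforms("i1", "IssueShape", graph, shapes))
print(conforms("i3", "IssueShape", graph, shapes))
```

Without the `in_progress` set, the check on `i1` would call itself forever through `i2`. Provisional acceptance terminates, but it is only one of several possible readings of a recursive shape.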

For more details and test cases, see [1].

SHEXc Language Summary

feature	example	description
		Matching a Predicate to a NameClass
NameTerm	`ex:state`	The predicate of any matching triple is the same as the NameTerm IRI.
NameStem	`ex:~`	The predicate of any matching triple starts with the IRI.
NameAny	`. - rdf:type - ex:~`	A matching triple has any predicate except those NameTerms or NameStems excluded by the '-' operator.
		Matching an Object to a ValueClass
ValueType	`xsd:dateTime`	The object of any matching triple is of the type identified by the ValueType IRI (for example, a literal with datatype xsd:dateTime).
ValueSet	`(ex:unassigned ex:assigned)`	The object of any matching triple is one of the terms listed in the ValueSet.
ValueStem	`ex:~`	The object of any matching triple starts with the IRI.
ValueAny		A matching triple has any object except those terms or stems excluded by the '-' operator.
ValueReference	`@<UserShape>`	The object of a matching triple is an IRI or blank node, and that node is the subject of triples matching the referenced shape expression.
		Rule Types
ArcRule	`foaf:givenName xsd:string+`	A matching triple matches the NameTerm and the ValueTerm. Cardinality constraints apply.
AndRule	`foaf:givenName xsd:string,` `foaf:familyName xsd:string`	Each conjunct matches the input graph.
OrRule	`foaf:givenName xsd:string |` `foaf:name xsd:string`	Exactly one disjunct matches the input graph.
GroupRule	`(ex:reproducedBy @<EmployeeShape>,` `ex:reproducedOn xsd:dateTime)`	A matching triple matches the enclosed rule (here an AndRule). Cardinality constraints apply.
		Cardinality
?	`foaf:givenName xsd:string?`	rule must match 0 or 1 times.
+	`foaf:givenName xsd:string+`	rule must match 1 or more times.
*	`foaf:givenName xsd:string*`	rule must match 0 or more times.
{m}	`foaf:givenName xsd:string{3}`	rule must match m times.
{m,n}	`foaf:givenName xsd:string{3,5}`	rule must match at least m times and no more than n times.
	Cardinality constraints may appear after an ArcRule. A '?' may also appear after a GroupRule to indicate that it is optional. Any AndRule nested immediately inside the GroupRule must have every rule match or no rule match.
		Rule Inclusions
&RuleName	`& <PersonShape>`	Include the referenced rule in place of the include directive.
	Rule Inclusions may appear before a shape definition or inside of a definition. Before a shape definition, they signify the inclusion of the referenced rule ("included rule") at the beginning of the one being defined, as well as asserting that ValueReferences to the included rule accept the defined shape as well.
		Semantic Actions
%lang{ code %}	`%js{ return _.o.lex > report.lex; %}` `%sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %}`	Invoke semantic actions when a rule is satisfied.
	Semantic Actions may appear after an ArcRule, a GroupRule or a named Shape Expression. When used with validation, they are invoked only on valid pairs of a triple and a rule. Their use for interface validation is currently undefined.
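
Two rows of the summary table lend themselves to a short sketch: the cardinality suffixes and the NameTerm/NameStem/NameAny predicate tests. The Python below is illustrative only. The function names are hypothetical, predicates are compared as full IRI strings, and treating a missing suffix as "exactly one" is an assumption of the sketch.

```python
import re

# Suffixes from the Cardinality rows; "" (no suffix) is assumed to mean 1..1.
_FIXED = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}
_BRACES = re.compile(r"\{(\d+)(?:,(\d+))?\}")


def parse_cardinality(suffix):
    """Return (min, max) for a cardinality suffix; max None means unbounded."""
    if suffix in _FIXED:
        return _FIXED[suffix]
    match = _BRACES.fullmatch(suffix)
    if match is None:
        raise ValueError(f"unknown cardinality suffix: {suffix!r}")
    lo = int(match.group(1))
    hi = int(match.group(2)) if match.group(2) else lo  # {m} means exactly m
    return lo, hi


def name_matches(predicate, kind, iri=None, excluded_terms=(), excluded_stems=()):
    """Test a predicate IRI against a NameTerm, NameStem or NameAny."""
    if kind == "term":
        return predicate == iri
    if kind == "stem":
        return predicate.startswith(iri)
    if kind == "any":
        if predicate in excluded_terms:
            return False
        return not any(predicate.startswith(stem) for stem in excluded_stems)
    raise ValueError(f"unknown name class kind: {kind!r}")


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://ex.example/"  # placeholder namespace for the ex: prefix

print(parse_cardinality("{3,5}"))
print(name_matches(EX + "state", "stem", iri=EX))
# Corresponds to the NameAny example `. - rdf:type - ex:~`
print(name_matches(RDF_TYPE, "any", excluded_terms=(RDF_TYPE,), excluded_stems=(EX,)))
```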

SHEX/RDF format

The page ShEx/RDF serialization defines a SHEX/RDF schema that validates itself (work in progress).

Formal definitions

The semantics of ShEx are explained in several documents describing its formalisms:

ShEx Primer - introduction to ShEx with links to editable examples.
Denotational Semantics (compare to Relax NG Semantics)
Regular Bag Expressions
Z Notation
ShEx/OperationalSemantics Operational semantics inspired by Relax NG Semantics.

Implementations

The following implementations of Shape Expressions currently exist:

Fancy ShEx Demo

Formats: SHEXc
Language: JavaScript based
Algorithm: State based
Developer: Eric Prud'hommeaux

Live Demo Examples:

ShEx Demo - test data against a schema, generate SPARQL and Resource Shape for the schema.
GenX Demo - use ShEx semantic actions to translate RDF to XML.
multiple inheritance example - demo ShEx's polymorphism

JSShexTest

Formats: SHEX/RDF and SHEXc (partly)
Language: JavaScript based
Algorithm: State based
Developer: Jesse van Dam
Working version: [2].
Source code [3] (uses a local web server that can be started with ruby and accessed via localhost:4567).

For the validation process, see the validation code at [4] (easy to read). A further description can be found at ValidationCode.

RDFShape

Syntax: SHEXc (ShEx compact syntax) with some extensions like regex
Semantics: Open/Closed view of shapes
Developer: Jose Emilio Labra Gayo
Algorithm: Regular expression derivatives
Programming language: Scala
Extra features: Online RDF validator based on Shexcala
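
For readers unfamiliar with the derivative technique named above, the following is a generic Python sketch of derivatives adapted to unordered arcs. It is not RDFShape's or Shexcala's code (both are written in Scala), it matches predicates only, and the tuple encoding and function names are assumptions made for the example.

```python
# Expressions are tuples: ("empty",), ("fail",), ("arc", predicate),
# ("and", e1, e2), ("or", e1, e2), ("star", e).
EMPTY = ("empty",)
FAIL = ("fail",)


def nullable(e):
    """True if the expression accepts the empty set of arcs."""
    tag = e[0]
    if tag in ("empty", "star"):
        return True
    if tag == "or":
        return nullable(e[1]) or nullable(e[2])
    if tag == "and":
        return nullable(e[1]) and nullable(e[2])
    return False  # "fail" and "arc"


def deriv(e, predicate):
    """What remains to be matched after consuming one arc with this predicate."""
    tag = e[0]
    if tag == "arc":
        return EMPTY if e[1] == predicate else FAIL
    if tag == "or":
        return ("or", deriv(e[1], predicate), deriv(e[2], predicate))
    if tag == "and":
        # Arcs are unordered, so the arc may be consumed by either side.
        return ("or",
                ("and", deriv(e[1], predicate), e[2]),
                ("and", e[1], deriv(e[2], predicate)))
    if tag == "star":
        return ("and", deriv(e[1], predicate), e)
    return FAIL  # "empty" and "fail" cannot consume an arc


def matches(e, predicates):
    """Consume the arcs one by one, then ask whether nothing more is required."""
    for predicate in predicates:
        e = deriv(e, predicate)
    return nullable(e)


def plus(e):
    return ("and", e, ("star", e))


# Predicate skeleton of <UserShape>, without the optional foaf:mbox.
user = ("or",
        ("arc", "foaf:name"),
        ("and", plus(("arc", "foaf:givenName")), ("arc", "foaf:familyName")))

print(matches(user, ["foaf:familyName", "foaf:givenName", "foaf:givenName"]))
print(matches(user, ["foaf:name", "foaf:familyName"]))
```

The sketch performs no simplification of intermediate expressions, so they grow with every arc consumed; a practical implementation would prune `FAIL` branches as it goes.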

Shexcala

Syntax: ShEx compact syntax
Semantics: Closed and Open view of shapes based on Iovka's proposal
Developer: Jose Emilio Labra Gayo
Algorithm: Regular expression derivatives and backtracking (by selection)
Programming language: Scala
Extra features: Negation, Reverse arcs, language tags, regexps

Haws

Syntax: Abstract syntax
Semantics: Closed shapes based on operational semantics
Developer: Jose Emilio Labra Gayo
Algorithm: Backtracking
Programming language: Haskell

Test cases

A test script created by Eric Prud'hommeaux, which uses simplified semantics to test the matching logic, can be found at [5].

The SHEX test suite is defined in a standardized format that can be found at [6], and the official set of test cases can be found at [7].

SHEX/RDF-based test cases, which are so far (todo) included only in Jesse van Dam's scripts, can be found at [8].

Examples

A separate page contains some simple examples using ShEx.

The following is a list of examples that employ ShEx:

Publications about ShEx

Complexity and Expressiveness of ShEx for RDF, In International Conference on Database Theory (ICDT) 2015. With S. Staworko, J. E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, and H. Solbrig. PDF
Towards an RDF validation language based on Regular Expression derivatives, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Slawek Staworko and Harold Solbrig. PDF Slides
Shape Expressions: An RDF validation and transformation language, Eric Prud'hommeaux, Jose Emilio Labra Gayo, Harold Solbrig, 10th International Conference on Semantic Systems, Sept. 2014, Leipzig, Germany, PDF Slides
Validating and Describing Linked Data Portals using RDF Shape Expressions, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Harold Solbrig, 1st Workshop on Linked Data Quality, Sept. 2014, Leipzig, Germany, PDF Slides

Proposed Features

UNIQUE

Proposed for 1.1

A UNIQUE constraint takes an optional scope (FOCUS|GRAPH, default: FOCUS) and one or more predicates, e.g.:

 <T> {
   :fname LITERAL,
   :lname LITERAL,
   :title LITERAL+,
   :homepage IRI
   UNIQUE(GRAPH, :fname, :lname)
   UNIQUE(LANGTAG(:title))
   UNIQUE(GRAPH, :homepage)
 }

UNIQUEs can appear arbitrarily nested in expressions:

 <PersonShape> {
     foaf:givenName .,
     foaf:familyName .
     UNIQUE(foaf:givenName, foaf:familyName)
   | foaf:name .
     UNIQUE(foaf:name)
 }

UNIQUEs scoped to the FOCUS node can be dispatched immediately. Those scoped to the GRAPH or DATASET must have their values noted and associated with the UNIQUE constraint, with any conflict detected when a new value is inserted.
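
The bookkeeping for a GRAPH-scoped UNIQUE can be sketched as an index from value tuples to the node that first supplied them. This is a hedged illustration, not part of the proposal: the class name `UniqueIndex` is hypothetical, and each predicate is assumed to have exactly one value per focus node.

```python
class UniqueIndex:
    """Track the value combinations seen for one UNIQUE(GRAPH, ...) constraint."""

    def __init__(self, predicates):
        self.predicates = tuple(predicates)
        self.seen = {}  # tuple of values -> focus node that first used it

    def insert(self, focus, values):
        """Record the values of a focus node.

        values: dict predicate -> single value (an assumption of this sketch).
        Returns the conflicting node, or None if the combination is new or
        belongs to the same focus node.
        """
        key = tuple(values[p] for p in self.predicates)
        owner = self.seen.setdefault(key, focus)
        return None if owner == focus else owner


# Corresponds to UNIQUE(GRAPH, :fname, :lname) from the first example.
index = UniqueIndex([":fname", ":lname"])
print(index.insert("<s1>", {":fname": "Bob", ":lname": "Smith"}))
print(index.insert("<s2>", {":fname": "Bob", ":lname": "Smith"}))  # clashes with <s1>
print(index.insert("<s3>", {":fname": "Bob", ":lname": "Jones"}))
```

The conflict is reported at insertion time, so validation of a large graph does not need a second pass over the collected values.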

Shortcomings

It's possible we'd want uniques that span shapes, e.g. if the following data were permissible:

 { <s1> :code "1234"; :dept [ :code "5678" ] .
   <s2> :code "1234"; :dept [ :code "8765" ] }

but this were not:

 { <s1> :code "1234"; :dept [ :code "5678" ] .
   <s2> :code "1234"; :dept [ :code "5678" ] }

There's no way to stipulate uniqueness across repeated properties, e.g. if we wanted to make sure that creators from a.example were unique in the graph but creators from b.example were not:

 schema:
 <S> { :creator PATTERN "^http://a\\.example/",
       :creator PATTERN "^http://b\\.example/"
 }

 failing data:
 { <s1> :creator <http://a.example/1> ; :creator <http://b.example/2> .
   <s2> :creator <http://a.example/3> ; :creator <http://b.example/2> .
 }

Alternative Syntax - on shape

The UNIQUE constraints could go on the shape.

 <T>
   UNIQUE(GRAPH, :fname, :lname)
   UNIQUE(LANGTAG(:title))
   UNIQUE(GRAPH, :homepage)
   {
     :fname LITERAL,
     :lname LITERAL,
     :title LITERAL+,
     :homepage IRI
   }

This makes evaluation of predicates in disjuncts weird, e.g.

 <PersonShape> UNIQUE(foaf:givenName, foaf:familyName) UNIQUE(foaf:name)
   { foaf:givenName ., foaf:familyName . | foaf:name . }

where the semantics for enforcing UNIQUE(foaf:givenName, foaf:familyName) over the data

 { <s> foaf:name "Bob Smith" . }

are a bit weird and "NULL-y".

GRAPH constraints

Proposed for 1.1

Use Cases

Enable validation outside of a single named graph:

Dataset:

 Default graph: { <s> :lookInGraph <G1> }
 <G1>: { <s> :p2 :o2 }

Schema:

 <S> { :lookInGraph GRAPH{ @<GShape> } }
 <GShape> { :p2 . }
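
One possible reading of this use case is sketched below in Python. The proposal does not spell out which node is tested inside the named graph, so the sketch assumes the same focus node is re-checked there; the function name and the dataset layout are likewise hypothetical.

```python
def graph_constraint_holds(dataset, focus, link_predicate, required_predicate):
    """Follow link_predicate from focus to graph names, then look inside them.

    dataset: {"default": [(s, p, o), ...], "named": {graph name: [(s, p, o), ...]}}
    Assumption: the focus node itself must carry required_predicate in every
    named graph it points to, and it must point to at least one graph.
    """
    graph_names = [
        o for s, p, o in dataset["default"]
        if s == focus and p == link_predicate
    ]
    if not graph_names:
        return False
    return all(
        any(s == focus and p == required_predicate
            for s, p, _ in dataset["named"].get(name, []))
        for name in graph_names
    )


# The dataset from the use case above.
dataset = {
    "default": [("<s>", ":lookInGraph", "<G1>")],
    "named": {"<G1>": [("<s>", ":p2", ":o2")]},
}
print(graph_constraint_holds(dataset, "<s>", ":lookInGraph", ":p2"))
```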

Discussion

See here for the currently ongoing discussions

See Discussion SHEX format for a list of other discussion topics

See here for a comparison between ShEx and OWL