ShEx/Obsolete/ShEx

From Semantic Web Standards

Obsolete - please see the ShEx github wiki



ShEx, or Shape Expressions, is a language for expressing constraints on RDF graphs. It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles, as well as logical connectives for disjunction and polymorphism. It is intended to:

  • validate RDF documents.
  • communicate expected graph patterns for interfaces.
  • generate user interface forms and interface code.
  • compile to SPARQL queries (except for cyclic grammars).

A W3C ShEx Demo validates data against a schema, compiles SPARQL queries for the schema and generates an RDF representation.
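To give a feel for what compiling a shape to SPARQL can look like, here is a minimal Python sketch. It is not the compilation used by the W3C demo: it only covers the cardinality of plain arcs on one node, ignores value classes, and the function name `compile_cardinality_ask` and the example IRIs are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# One arc constraint: (predicate IRI, minimum count, maximum count or None if unbounded).
Arc = Tuple[str, int, Optional[int]]


def compile_cardinality_ask(node_iri: str, arcs: List[Arc]) -> str:
    """Return a SPARQL 1.1 ASK query that checks predicate counts on one node."""
    lines = ["ASK {"]
    for i, (predicate, minimum, maximum) in enumerate(arcs):
        # Each subquery counts the triples of one predicate on the node.
        lines.append(
            f"  {{ SELECT (COUNT(*) AS ?c{i}) "
            f"WHERE {{ <{node_iri}> <{predicate}> ?o{i} }} }}"
        )
        bounds = [f"?c{i} >= {minimum}"]
        if maximum is not None:
            bounds.append(f"?c{i} <= {maximum}")
        lines.append(f"  FILTER ({' && '.join(bounds)})")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical node and predicates, loosely following the IssueShape idea.
query = compile_cardinality_ask(
    "http://example.org/issue1",
    [
        ("http://example.org/reportedBy", 1, 1),   # exactly one reporter
        ("http://example.org/related", 0, None),   # any number of related issues
    ],
)
print(query)
```

A cyclic grammar has no finite expansion of this kind, which is consistent with the exception noted in the list above.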

Syntax

The ShEx syntax is modeled after RelaxNG Compact Syntax (RNC):

<IssueShape> {                            # An Issue shape
    :state ( :unassigned :assigned ),     #   has a state with 2 possible values
    :reportedBy @<UserShape>,             #   is reported by a user
    :reportedOn xsd:date,                 #   is reported on a date
    ( :reproducedBy @<UserShape>          #   can optionally have 2 properties
    , :reproducedOn xsd:date              #     reproducedBy/On
    )?,
    :related @<IssueShape>*               #   is related to several other issues
}

<UserShape> {                             # A user shape can have either
    ( foaf:name xsd:string                #  name or
    | foaf:givenName xsd:string+ ,        #  several given names and
      foaf:familyName xsd:string          #   family name
    ), 
    foaf:mbox shex:IRI ?                  # mbox Optional, any IRI
}

The previous example can be tested with RDFShape or with Eric Prud'hommeaux's Fancy ShEx Demo.

A ShEx schema can be written in two syntaxes: SHEXc (SHEX compact format) and SHEX/RDF.

Semantics

ShEx (and RNC) are designed to be familiar to users of BNF and regular expressions. The conspicuous difference is that regular expressions correlate an ordered pattern of atomic characters and logical operators against an ordered sequence of characters, whereas Shape Expressions correlate an ordered pattern of pairs of predicate and object classes (called NameClass and ValueClass) and logical operators against an unordered set of arcs in a graph. The logical operators in Shape Expressions (grouping, conjunction, disjunction and cardinality constraints) are defined to behave as closely as possible to their counterparts in regular expressions and grammar languages like BNF.
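The point about matching against an unordered set of arcs can be sketched in Python. This is a simplified illustration, not the ShEx matching algorithm: it handles only a conjunction of arc rules, ignores arcs that no rule mentions, and the names `matches_and_rule`, `is_state` and `any_value` are hypothetical.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
# ArcRule: (predicate, value test, minimum count, maximum count or None).
ArcRule = Tuple[str, Callable[[str], bool], int, Optional[int]]


def matches_and_rule(arcs: Iterable[Triple], rules: List[ArcRule]) -> bool:
    """Check a node's arcs against a conjunction of arc rules, ignoring order."""
    arcs = list(arcs)
    for predicate, value_ok, minimum, maximum in rules:
        # Count the arcs that satisfy both the name test and the value test.
        count = sum(1 for _, p, o in arcs if p == predicate and value_ok(o))
        if count < minimum or (maximum is not None and count > maximum):
            return False
    return True


def is_state(obj: str) -> bool:
    """Value set ( :unassigned :assigned )."""
    return obj in {":unassigned", ":assigned"}


def any_value(obj: str) -> bool:
    """Accept any object."""
    return True


rules = [(":state", is_state, 1, 1), (":related", any_value, 0, None)]
arcs = [("<i1>", ":related", "<i2>"), ("<i1>", ":state", ":assigned")]

print(matches_and_rule(arcs, rules))            # True
print(matches_and_rule(reversed(arcs), rules))  # True: same arcs, other order
```

Because the check counts arcs per rule instead of consuming them in sequence, the order in which the triples are listed has no effect on the result.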

Recursive shapes (like <IssueShape>) are problematic for Shape Expressions. The meanings of such shapes are open to question. The semantics for Shape Expressions does not handle them well, going into infinite loops, or being non-deterministic, or even being paradoxical.
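The infinite-loop problem can be reproduced with a small Python sketch. The data and function names below are hypothetical, and the guarded variant shows only one possible way to cut the cycle (treating a node that is already being checked as conforming). It is an assumption for illustration, not the semantics defined for Shape Expressions.

```python
from typing import Dict, List, Set

# Hypothetical data: each issue lists the issues it is :related to.
related: Dict[str, List[str]] = {"i1": ["i2"], "i2": ["i1"]}


def conforms_naive(node: str) -> bool:
    """Follow :related @<IssueShape>* with no cycle detection."""
    return all(conforms_naive(other) for other in related.get(node, []))


def conforms_guarded(node: str, in_progress: Set[str]) -> bool:
    """Same check, but treat a node that is already being checked as conforming."""
    if node in in_progress:
        return True  # assumption: revisiting a node is not a failure
    in_progress.add(node)
    return all(conforms_guarded(other, in_progress) for other in related.get(node, []))


try:
    conforms_naive("i1")
except RecursionError:
    print("naive check does not terminate on cyclic data")

print(conforms_guarded("i1", set()))  # True
```

Choosing `True` for a revisited node is exactly the kind of decision that makes the meaning of recursive shapes open to question: returning `False` instead would reject the same data.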

For more details and test cases, see [1].

SHEXc Language Summary

Matching a Predicate to a NameClass

| feature | example | description |
|---|---|---|
| NameTerm | ex:state | The predicate of any matching triple is the same as the NameTerm IRI. |
| NameStem | ex:~ | The predicate of any matching triple starts with the IRI. |
| NameAny | . - rdf:type - ex:~ | A matching triple has any predicate except those NameTerms or NameStems excluded by the '-' operator. |

Matching an Object to a ValueClass

| feature | example | description |
|---|---|---|
| ValueType | xsd:dateTime | The object of any matching triple has the type named by the ValueType IRI (for example, a literal with datatype xsd:dateTime). |
| ValueSet | (ex:unassigned ex:assigned) | The object of any matching triple is one of the terms listed in the ValueSet. |
| ValueStem | ex:~ | The object of any matching triple starts with the IRI. |
| ValueAny | | A matching triple has any object except those terms or stems excluded by the '-' operator. |
| ValueReference | @<UserShape> | The object of a matching triple is an IRI or blank node, and that node is the subject of triples matching the referenced shape expression. |

Rule Types

| feature | example | description |
|---|---|---|
| ArcRule | foaf:givenName xsd:string+ | A matching triple matches the NameClass and the ValueClass. Cardinality constraints apply. |
| AndRule | foaf:givenName xsd:string, foaf:familyName xsd:string | Each conjunct matches the input graph. |
| OrRule | foaf:givenName xsd:string \| foaf:name xsd:string | Exactly one disjunct matches the input graph. |
| GroupRule | (ex:reproducedBy @<EmployeeShape>, ex:reproducedOn xsd:dateTime) | A matching triple matches the enclosed rule (here an AndRule). Cardinality constraints apply. |

Cardinality

| feature | example | description |
|---|---|---|
| ? | foaf:givenName xsd:string? | Rule must match 0 or 1 times. |
| + | foaf:givenName xsd:string+ | Rule must match 1 or more times. |
| * | foaf:givenName xsd:string* | Rule must match 0 or more times. |
| {m} | foaf:givenName xsd:string{3} | Rule must match m times. |
| {m,n} | foaf:givenName xsd:string{3,5} | Rule must match at least m times and no more than n times. |

Cardinality constraints may appear after an ArcRule. A '?' may also appear after a GroupRule to indicate that it is optional. Any AndRule nested immediately inside the GroupRule must have every rule match or no rule match.

Rule Inclusions

| feature | example | description |
|---|---|---|
| &RuleName | & <PersonShape> | Include the referenced rule in place of the include directive. |

Rule Inclusions may appear before a shape definition or inside of a definition. Before a shape definition, they signify the inclusion of the referenced rule ("included rule") at the beginning of the one being defined, as well as asserting that ValueReferences to the included rule accept the defined shape as well.

Semantic Actions

| feature | example | description |
|---|---|---|
| %lang{ code %} | %js{ return _.o.lex > report.lex; %} | Invoke semantic actions when a rule is satisfied. |
| | %sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %} | |

Semantic Actions may appear after an ArcRule, a GroupRule or a named Shape Expression. When used with validation, they are invoked only on valid pairs of a triple and a rule. Their use for interface validation is currently undefined.
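The NameClass tests and cardinality markers summarized above can be restated as a few small Python functions. This is a sketch under stated assumptions: predicates are compared as full IRI strings, and the function names are hypothetical and not taken from any ShEx implementation.

```python
from typing import Iterable, Optional, Tuple


def name_term(iri: str, predicate: str) -> bool:
    """NameTerm: the predicate is exactly the given IRI."""
    return predicate == iri


def name_stem(stem: str, predicate: str) -> bool:
    """NameStem: the predicate starts with the given IRI."""
    return predicate.startswith(stem)


def name_any(
    predicate: str,
    excluded_terms: Iterable[str] = (),
    excluded_stems: Iterable[str] = (),
) -> bool:
    """NameAny: any predicate except the excluded terms and stems."""
    if predicate in list(excluded_terms):
        return False
    return not any(predicate.startswith(stem) for stem in excluded_stems)


def cardinality_bounds(marker: str) -> Tuple[int, Optional[int]]:
    """Translate ?, +, *, {m} or {m,n} into (minimum, maximum or None if unbounded)."""
    fixed = {"?": (0, 1), "+": (1, None), "*": (0, None)}
    if marker in fixed:
        return fixed[marker]
    parts = marker.strip("{}").split(",")
    return int(parts[0]), int(parts[-1])


EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Mirrors the NameAny example ". - rdf:type - ex:~".
print(name_any(RDF_TYPE, excluded_terms=[RDF_TYPE], excluded_stems=[EX]))  # False
print(name_any("http://xmlns.com/foaf/0.1/name", [RDF_TYPE], [EX]))        # True
print(cardinality_bounds("{3,5}"))                                         # (3, 5)
```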

SHEX/RDF format

The ShEx/RDF serialization page defines a SHEX/RDF schema that validates itself (work in progress).

Formal definitions

The semantics of ShEx have been explained and formalized in several documents.

Implementations

There are currently the following implementations of Shape Expressions:

Fancy ShEx Demo

  • Formats: SHEXc
  • Language: JavaScript based
  • Algorithm: State based
  • Developer: Eric Prud'hommeaux

Live Demo Examples:

JSShexTest

  • Formats: SHEX/RDF and SHEXc(partly)
  • Language: JavaScript based
  • Algorithm: State based
  • Developer: Jesse van Dam
  • Working version: [2].
  • Source code [3] (uses a local web server that can be started with ruby and accessed via localhost:4567).

The validation process is implemented in [4], which is easy to read. A further description can be found at ValidationCode.

RDFShape

  • Syntax: ShExc (ShEx compact syntax) with some extensions like regex
  • Semantics: Open/Closed view of shapes
  • Developer: Jose Emilio Labra Gayo
  • Algorithm: Regular expression derivatives
  • Programming language: Scala
  • Extra features: Online RDF validator based on Shexcala

Shexcala

  • Syntax: ShEx compact syntax
  • Semantics: Closed and Open view of shapes based on Iovka's proposal
  • Developer: Jose Emilio Labra Gayo
  • Algorithm: Regular expression derivatives and backtracking (by selection)
  • Programming language: Scala
  • Extra features: Negation, Reverse arcs, language tags, regexps

Haws

  • Syntax: Abstract syntax
  • Semantics: Closed shapes based on operational semantics
  • Developer: Jose Emilio Labra Gayo
  • Algorithm: Backtracking
  • Programming language: Haskell

Test cases

A test script created by Eric Prud'hommeaux, which uses simplified semantics to test the matching logic, can be found at [5].

The SHEX test suite is defined in a standardized format that can be found at [6], and the official set of test cases can be found at [7].

SHEX/RDF-based test cases, so far (todo) included only in Jesse van Dam's scripts, can be found at [8].

Examples

A separate page contains some simple examples using ShEx.

The following examples employ ShEx:

Publications about ShEx

  • Complexity and Expressiveness of ShEx for RDF, S. Staworko, I. Boneva, J. E. Labra Gayo, S. Hym, E. G. Prud'hommeaux, and H. Solbrig, International Conference on Database Theory (ICDT), 2015. PDF
  • Towards an RDF validation language based on Regular Expression derivatives, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Slawek Staworko and Harold Solbrig. PDF, Slides
  • Shape Expressions: An RDF validation and transformation language, Eric Prud'hommeaux, Jose Emilio Labra Gayo, Harold Solbrig, 10th International Conference on Semantic Systems, Sept. 2014, Leipzig, Germany. PDF, Slides
  • Validating and Describing Linked Data Portals using RDF Shape Expressions, Jose Emilio Labra Gayo, Eric Prud'hommeaux, Harold Solbrig, 1st Workshop on Linked Data Quality, Sept. 2014, Leipzig, Germany. PDF, Slides

Proposed Features

UNIQUE

Proposed for 1.1

A UNIQUE constraint takes an optional scope (FOCUS|GRAPH, default: FOCUS) and one or more predicates, e.g.:

 <T> {
   :fname LITERAL,
   :lname LITERAL,
   :title LITERAL+,
   :homepage IRI
   UNIQUE(GRAPH, :fname, :lname)
   UNIQUE(LANGTAG(:title))
   UNIQUE(GRAPH, :homepage)
 }

UNIQUEs can appear arbitrarily nested in expressions:

 <PersonShape> {
     foaf:givenName .,
     foaf:familyName .
     UNIQUE(foaf:givenName, foaf:familyName)
   | foaf:name .
     UNIQUE(foaf:name)
 }

UNIQUE constraints scoped to the focus node (FOCUS) can be dispatched immediately. Those scoped to the GRAPH or DATASET must have their values noted and associated with the UNIQUE constraint, noting any possible conflicts during insertion.
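The bookkeeping described for graph-scoped constraints can be sketched as follows. This is a minimal illustration, assuming each focus node has exactly one value per predicate; the class name `UniqueIndex` and its methods are hypothetical and not part of the proposal.

```python
from typing import Dict, List, Sequence, Tuple


class UniqueIndex:
    """Remembers the value tuples seen for one UNIQUE(GRAPH, ...) constraint."""

    def __init__(self, predicates: Sequence[str]) -> None:
        self.predicates = tuple(predicates)
        # Maps each value tuple to the first focus node that used it.
        self.seen: Dict[Tuple[str, ...], str] = {}
        # Each conflict is (first node, later node, shared value tuple).
        self.conflicts: List[Tuple[str, str, Tuple[str, ...]]] = []

    def note(self, focus: str, values: Dict[str, str]) -> None:
        """Record the values of one focus node and note any conflict."""
        key = tuple(values[p] for p in self.predicates)
        first = self.seen.setdefault(key, focus)
        if first != focus:
            self.conflicts.append((first, focus, key))


index = UniqueIndex([":fname", ":lname"])
index.note("<s1>", {":fname": "Bob", ":lname": "Smith"})
index.note("<s2>", {":fname": "Bob", ":lname": "Jones"})
index.note("<s3>", {":fname": "Bob", ":lname": "Smith"})
print(index.conflicts)  # [('<s1>', '<s3>', ('Bob', 'Smith'))]
```

A FOCUS-scoped constraint needs no such index, since all the values it compares belong to the node currently being validated.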


Shortcomings

It's possible we'd want uniques that span shapes, e.g. if the following data were permissible:

 { <s1> :code "1234"; :dept [ :code "5678" ] .
   <s2> :code "1234"; :dept [ :code "8765" ] }

but this were not:

 { <s1> :code "1234"; :dept [ :code "5678" ] .
   <s2> :code "1234"; :dept [ :code "5678" ] }

There's no way to stipulate uniqueness across repeated properties, e.g. if we wanted to make sure that creators from a.example were unique in the graph but creators from b.example were not:

 schema:
 <S> { :creator PATTERN "^http://a\\.example/",
       :creator PATTERN "^http://b\\.example/"
 }
 failing data:
 { <s1> :creator <http://a.example/1> ; :creator <http://b.example/2> .
   <s2> :creator <http://a.example/3> ; :creator <http://b.example/2> .
 }

Alternative Syntax - on shape

The UNIQUE constraints could go on the shape.

 <T>
   UNIQUE(GRAPH, :fname, :lname)
   UNIQUE(LANGTAG(:title))
   UNIQUE(GRAPH, :homepage)
   {
     :fname LITERAL,
     :lname LITERAL,
     :title LITERAL+,
     :homepage IRI
   }

This makes evaluation of predicates in disjuncts weird, e.g.

 <PersonShape> UNIQUE(foaf:givenName, foaf:familyName) UNIQUE(foaf:name)
   { foaf:givenName ., foaf:familyName . | foaf:name . }

where the semantics for enforcing UNIQUE(foaf:givenName, foaf:familyName) over the data

 { <s> foaf:name "Bob Smith" . }

are a bit weird and "NULL-y".

GRAPH constraints

Proposed for 1.1

Use Cases

Enable validation outside of a single named graph:

Dataset:

 Default graph: { <s> :lookInGraph <G1> }
 <G1>: { <s> :p2 :o2 }

Schema:

 <S> { :lookInGraph GRAPH{ @<GShape> } }
 <GShape> { :p2 . }
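One possible reading of this use case is sketched below in Python: the object of :lookInGraph names a graph, and the same focus node is then checked against <GShape> inside that graph. This interpretation, the dictionary layout of the dataset and the function names are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# The dataset from the use case; the key None stands for the default graph.
dataset: Dict[Optional[str], List[Triple]] = {
    None: [("<s>", ":lookInGraph", "<G1>")],
    "<G1>": [("<s>", ":p2", ":o2")],
}


def matches_gshape(graph: List[Triple], node: str) -> bool:
    """<GShape> { :p2 . }: the node has exactly one :p2 arc in the given graph."""
    return sum(1 for s, p, _ in graph if s == node and p == ":p2") == 1


def matches_s(node: str) -> bool:
    """<S>: one :lookInGraph arc naming a graph in which the node matches <GShape>."""
    targets = [o for s, p, o in dataset[None] if s == node and p == ":lookInGraph"]
    if len(targets) != 1:
        return False
    # An unknown graph name is treated as an empty graph.
    return matches_gshape(dataset.get(targets[0], []), node)


print(matches_s("<s>"))  # True
```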

Discussion

See here for the currently ongoing discussions

See Discussion SHEX format for a list of other discussion topics

See here for a comparison between ShEx and OWL