ShEx

ShEx, or Shape Expressions (intro), is a language for expressing constraints on RDF graphs. It includes the cardinality constraints from OSLC Resource Shapes and Dublin Core Description Set Profiles, as well as logical connectives for disjunction and polymorphism. It is intended to:

  • validate RDF documents.
  • communicate expected graph patterns for interfaces.
  • generate user interface forms and interface code.
  • compile to SPARQL queries (except for cyclic grammars).

A W3C ShEx Demo validates data against a schema, compiles SPARQL queries for the schema and generates an RDF representation.

Syntax

The ShEx syntax is modeled after RelaxNG Compact Syntax (RNC):

 <AppUserShape> {                    # <AppUserShape> has:
     (                               # either
        foaf:name xsd:string         #   a FOAF name
      |                              #  or
        foaf:givenName xsd:string+,  #   one or more givenNames
        foaf:familyName xsd:string), #   and one familyName.
     foaf:mbox IRI                   # one FOAF mbox.
 }

Semantics

ShEx and RNC are designed to be familiar to users of BNF and regular expressions. The conspicuous difference is that regular expressions correlate an ordered pattern of atomic characters and logical operators against an ordered sequence of characters, whereas Shape Expressions correlate an ordered pattern of pairs of predicate and object classes (called NameClass and ValueClass) and logical operators against an unordered set of arcs in a graph. The logical operators in Shape Expressions (grouping, conjunction, disjunction and cardinality constraints) are defined to behave as closely as possible to their counterparts in regular expressions and grammar languages like BNF.
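
The contrast with ordered matching can be illustrated with a minimal Python sketch of the <AppUserShape> example from the Syntax section. It is not the ShEx implementation: the rule encoding and the function names (and_rule_matches, matches_app_user_shape) are hypothetical, ValueClass checks such as xsd:string and IRI are omitted, and arcs with other predicates are assumed to be ignored.

```python
from collections import Counter

FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical encoding of <AppUserShape>: each arc rule is
# (predicate, min_count, max_count); max_count None means unbounded.
NAME_ONLY = [(FOAF + "name", 1, 1)]
GIVEN_AND_FAMILY = [(FOAF + "givenName", 1, None), (FOAF + "familyName", 1, 1)]
MBOX = [(FOAF + "mbox", 1, 1)]


def and_rule_matches(rules, counts):
    """Every arc rule in the conjunction is within its cardinality bounds."""
    return all(
        low <= counts[pred] and (high is None or counts[pred] <= high)
        for pred, low, high in rules
    )


def matches_app_user_shape(arcs):
    """arcs: unordered collection of (predicate, object) pairs of one subject."""
    # Counting per predicate discards order, so any permutation gives the same result.
    counts = Counter(pred for pred, _ in arcs)
    # OrRule: exactly one of the two alternatives may match.
    alternatives = [
        and_rule_matches(NAME_ONLY, counts),
        and_rule_matches(GIVEN_AND_FAMILY, counts),
    ]
    return sum(alternatives) == 1 and and_rule_matches(MBOX, counts)


arcs = [
    (FOAF + "givenName", "Alice"),
    (FOAF + "mbox", "mailto:alice@example.org"),
    (FOAF + "familyName", "Smith"),
    (FOAF + "givenName", "Ann"),
]
print(matches_app_user_shape(arcs))        # True
print(matches_app_user_shape(arcs[::-1]))  # True: arc order is irrelevant
```

Counting arcs per predicate is only adequate here because each predicate appears in one rule; a general matcher would need to assign arcs to rules.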


RDF serialization

The ShEx/RDF serialization page defines an RDF schema for ShEx using ShEx itself, so that the schema validates itself (work in progress).

Issues

language for formal semantics

ShEx has been documented with a denotational semantics, in Z notation and as a Regular Bag Expression. The target audience is used to, e.g., the Relax NG semantics.

greedy matching

Should there be a "greedy" semantics so there's only one solution to:

schema:

 start=<a>
   <a> { <p1> . }
   <b> & <a> { <p2> . }

data:

 <s> <p1> 1 .
 <s> <p2> 2 .

Note that the live example validates as both <a> and <b> because invoking <a> implies all derived types, i.e.

((<p1> .)|
 (& <a>,
  <p2> ("2"^^<http://www.w3.org/2001/XMLSchema#integer>)))

One can always use VIRTUAL to prevent an ancestor from providing a valid solution.

schema:

 start=<a>
   VIRTUAL <a> { <p1> . }
   <b> & <a> { <p2> . }
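
The effect of VIRTUAL on the set of solutions can be sketched in Python. This is an illustration under stated assumptions, not the validator's algorithm: the schema encoding, make_schema, required_predicates, derives_from and solutions are all hypothetical names, object values are ignored, and extra arcs are assumed not to invalidate a shape.

```python
# Hypothetical encoding of the schema above: each shape lists the predicates
# it requires, the shape it includes with '&' (if any), and a VIRTUAL flag.
def make_schema(a_is_virtual):
    return {
        "a": {"requires": {"p1"}, "includes": None, "virtual": a_is_virtual},
        "b": {"requires": {"p2"}, "includes": "a", "virtual": False},
    }


def required_predicates(schema, name):
    """Predicates required by a shape, including those of included shapes."""
    shape = schema[name]
    required = set(shape["requires"])
    if shape["includes"] is not None:
        required |= required_predicates(schema, shape["includes"])
    return required


def derives_from(schema, name, ancestor):
    """True if `name` is `ancestor` or includes it, directly or transitively."""
    while name is not None:
        if name == ancestor:
            return True
        name = schema[name]["includes"]
    return False


def solutions(schema, start, predicates):
    """Non-virtual shapes derived from `start` that the node's predicates satisfy."""
    return sorted(
        name
        for name, shape in schema.items()
        if derives_from(schema, name, start)
        and not shape["virtual"]
        and required_predicates(schema, name) <= predicates
    )


node_predicates = {"p1", "p2"}  # <s> <p1> 1 .  <s> <p2> 2 .
print(solutions(make_schema(False), "a", node_predicates))  # ['a', 'b']
print(solutions(make_schema(True), "a", node_predicates))   # ['b']
```

A greedy semantics, if adopted, would amount to choosing a single entry from the first list rather than reporting both.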

hierarchical punctuation

Should the inclusion character be &, : (as OO folks are used to) or something else? & could also be reserved for the intersection of two other shapes.

Language Summary

feature         example                               description

Matching a Predicate to a NameClass
NameTerm        ex:state                              The predicate of any matching triple is the same as the NameTerm IRI.
NameStem        ex:~                                  The predicate of any matching triple starts with the IRI.
NameAny         . - rdf:type - ex:~                   A matching triple has any predicate except the NameTerms or NameStems excluded by the '-' operator.

Matching an Object to a ValueClass
ValueType       xsd:dateTime                          The object of any matching triple has the type given by the ValueType IRI.
ValueSet        (ex:unassigned ex:assigned)           The object of any matching triple is one of the terms listed in the ValueSet.
ValueStem       ex:~                                  The object of any matching triple starts with the IRI.
ValueAny                                              A matching triple has any object except those terms or stems excluded by the '-' operator.
ValueReference  @<UserShape>                          The object of a matching triple is an IRI or blank node, and that node is the subject of triples matching the referenced shape expression.

Rule Types
ArcRule         foaf:givenName xsd:string+            A matching triple matches the NameClass and the ValueClass. Cardinality constraints apply.
AndRule         foaf:givenName xsd:string,            Each conjunct matches the input graph.
                foaf:familyName xsd:string
OrRule          foaf:givenName xsd:string |           Exactly one disjunct matches the input graph.
                foaf:name xsd:string
GroupRule       (ex:reproducedBy @<EmployeeShape>,    A matching triple matches the enclosed rule (here an AndRule). Cardinality constraints apply.
                 ex:reproducedOn xsd:dateTime)

Cardinality
?               foaf:givenName xsd:string?            rule must match 0 or 1 times.
+               foaf:givenName xsd:string+            rule must match 1 or more times.
*               foaf:givenName xsd:string*            rule must match 0 or more times.
{m}             foaf:givenName xsd:string{3}          rule must match exactly m times.
{m,n}           foaf:givenName xsd:string{3,5}        rule must match at least m times and no more than n times.

Cardinality constraints may appear after an ArcRule. A '?' may also appear after a GroupRule to indicate that it is optional. In any AndRule nested immediately inside the GroupRule, either every rule must match or no rule may match.

Rule Inclusions
&RuleName       & <PersonShape>                       Include the referenced rule in place of the include directive.

Rule Inclusions may appear before a shape definition or inside of a definition. Before a shape definition, they signify the inclusion of the referenced rule ("included rule") at the beginning of the one being defined, and also assert that ValueReferences to the included rule accept the defined shape as well.

Semantic Actions
%lang{ code %}  %js{ return _.o.lex > report.lex; %}  Invoke semantic actions when a rule is satisfied.
                %sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %}

Semantic Actions may appear after an ArcRule, a GroupRule or a named Shape Expression. When used with validation, they are invoked only on valid pairs of a triple and a rule. Their use for interface validation is currently undefined.
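
The NameClass and cardinality entries of the summary can be read as small predicates and bounds. The Python sketch below is one possible reading, not part of ShEx: name_term, name_stem, name_any and cardinality_bounds are hypothetical helpers, the ex: namespace IRI is an assumed placeholder, and a rule with no cardinality suffix is assumed to match exactly once.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # assumed expansion of the ex: prefix


def name_term(iri):
    """NameTerm: the predicate equals the given IRI."""
    return lambda predicate: predicate == iri


def name_stem(stem):
    """NameStem: the predicate starts with the given IRI."""
    return lambda predicate: predicate.startswith(stem)


def name_any(*excluded):
    """NameAny: any predicate not accepted by one of the excluded matchers."""
    return lambda predicate: not any(m(predicate) for m in excluded)


def cardinality_bounds(suffix):
    """Map a cardinality suffix to (min, max); max None means unbounded."""
    fixed = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}
    if suffix in fixed:
        return fixed[suffix]
    m = re.fullmatch(r"\{(\d+)(?:,(\d+))?\}", suffix)
    if m is None:
        raise ValueError(f"unrecognized cardinality: {suffix!r}")
    low = int(m.group(1))
    high = int(m.group(2)) if m.group(2) else low
    return (low, high)


# The NameAny example from the summary: . - rdf:type - ex:~
matcher = name_any(name_term(RDF_TYPE), name_stem(EX))
print(matcher("http://xmlns.com/foaf/0.1/name"))  # True
print(matcher(RDF_TYPE))                          # False
print(matcher(EX + "state"))                      # False
print(cardinality_bounds("{3,5}"))                # (3, 5)
```

The ValueStem and ValueAny entries could be modelled the same way by applying the matchers to objects instead of predicates.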

Matching process

See ValidationCode for a working set of test cases that validates an RDF database against a ShEx definition and works with the RDF serialization of ShEx.

Resources

Current Discussion

See the current discussion on defining the ShEx standard.