Copyright © 2014 W3C. This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
Shape Expressions associate RDF graphs with labeled patterns called "shapes". Shapes can be used for validation, documentation and transformation of RDF data.
Shape Expressions Primer is a general introduction to the Shape Expressions language. The concepts in this document are linked to the normative definitions in Shape Expressions Definition.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document describes the Shape Expressions language developed as a community effort. It is being submitted to W3C so that it can inform the development of a future RDF Data Shape specification.
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
Most data expression languages have an associated constraints language.
For instance, SQL has DDL, a language with expressions like CREATE TABLE "Issue" ("state" ENUM("unassigned", "assigned"), "reportedBy" (FOREIGN KEY "reportedBy" REFERENCES "User"("ID")...)
Likewise, XML has W3C XML Schema and RelaxNG to define data structure.
Shape Expressions is intended to perform the same function for RDF graphs.
Shape Expressions can be used to validate documents, communicate expected graph patterns for interfaces, and generate user interface forms and interface code.
The syntax and semantics of Shape Expressions are designed to be familiar to users of regular expressions.
The conspicuous difference is that regular expressions correlate an ordered pattern of atomic characters and logical operators against an ordered sequence of characters, whereas Shape Expressions correlate an ordered pattern of pairs of predicate and object classes (called NameClass and ValueClass) and logical operators against an unordered set of arcs in a graph.
The logical operators in Shape Expressions, grouping, conjunction, disjunction and cardinality constraints, are defined to behave as closely as possible to their counterparts in regular expressions and grammar languages like BNF.
The examples in this document can be used in an online demo. Links to the demo are indicated with a demoref class. Most of the document will focus on an annotated issue tracking example. An accompanying Examples document lists the pre-built examples and describes the demo user interface.
Most RDF languages have adopted the SPARQL conventions of BASE and PREFIX declarations.
The behavior of these is detailed in Turtle section 2.4, IRIs, and is briefly described here.
The PREFIX directive associates a short string with a long URI called a namespace.
These are used when writing URIs with the form prefix:localName.
The URI denoted by this is the concatenation of the namespace associated with the prefix and the part to the right of the first ':' ("localName" in the above example).
Our example uses BASE and PREFIX directives:
BASE <http://base.example/#>
PREFIX ex: <http://ex.example/#>
PREFIX foaf: <http://foaf.example/#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
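The prefix mechanism described above can be illustrated with a minimal Python sketch. The function name `expand_prefixed_name` and the dictionary layout are assumptions made for illustration; they are not part of Shape Expressions or of any particular library. Only PREFIX expansion is shown; BASE resolution is not covered.

```python
# Prefix table copied from the PREFIX directives of the running example.
PREFIXES = {
    "ex": "http://ex.example/#",
    "foaf": "http://foaf.example/#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand_prefixed_name(name: str, prefixes: dict = PREFIXES) -> str:
    """Concatenate the namespace bound to the prefix with the local name."""
    # Split at the first ':' only; anything after it is the local name.
    prefix, local_name = name.split(":", 1)
    return prefixes[prefix] + local_name
```

For example, `expand_prefixed_name("ex:state")` yields `http://ex.example/#state`.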
Some grammar languages provide a starting point for validating documents or generating forms.
In Shape Expressions, the starting point is specified by the start keyword.
start = <IssueShape>
This directive says to start with the <IssueShape>, which is really <http://base.example/#IssueShape> because of the BASE directive above.
It is not necessary to identify a particular node in the graph for validation operations.
Nor is it necessary to provide a starting point for all operations.
For instance, generating a sequence of forms obviously needs to start somewhere, but some documents can be validated by optimistically testing each shape expression against each node in the graph.
This exhaustive search is more expensive and raises the possibility that a document validates in a way that the author of the document did not intend.
This document treats the more constrained scenario with a starting point in both the graph and the schema.
A shape expression is a labeled pattern for a set of RDF Triples with a common subject. Syntactically, it's a pairing of a label, which is an IRI or a blank node, and a rule inside a pair of "{" "}". Typically, this rule is a conjunction of constraints separated by ',':
<IssueShape> {
  ex:state (ex:unassigned ex:assigned),
  ex:reportedBy @<UserShape>,
  ex:reportedOn xsd:dateTime,
  ( ex:reproducedBy @<EmployeeShape>,
    ex:reproducedOn xsd:dateTime )?,
  ex:related @<IssueShape>*
}
The rules in the above example have links to sections describing their interpretation.
ex:state
The first constraint above, ex:state (ex:unassigned ex:assigned), specifies that the ex:state attribute must be one of the values ex:unassigned or ex:assigned.
The first part, ex:state, is called the NameClass, and identifies a class of RDF predicates.
The second part, (ex:unassigned ex:assigned), is called the ValueClass, and identifies a class of RDF objects.
Together, they form an ArcRule.
In this ArcRule, the NameClass is a pfIRI and the ValueClass is a ValueSet.
When used for validation, these combine to say that for some node in a graph to conform to an <IssueShape>, it must have exactly one ex:state with a value of ex:unassigned or ex:assigned.
When used for interface definition, this constraint could produce a form input with a selection for state of either "unassigned" or "assigned".
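The validation reading of this ArcRule can be sketched in Python. This is an illustration under simplifying assumptions, not how any ShEx implementation works: triples are plain tuples of strings, prefixed names stand in for full IRIs, and the function name `matches_value_set_arc` is hypothetical.

```python
from typing import Iterable, Set, Tuple

# (subject, predicate, object); prefixed names are used as opaque strings here.
Triple = Tuple[str, str, str]


def matches_value_set_arc(
    triples: Iterable[Triple], node: str, predicate: str, allowed: Set[str]
) -> bool:
    """True when `node` has exactly one `predicate` arc whose object is allowed."""
    objects = [o for s, p, o in triples if s == node and p == predicate]
    return len(objects) == 1 and objects[0] in allowed


# Hypothetical sample data: issue1 conforms, issue2 has two states,
# issue3 has a state outside the ValueSet.
GRAPH = [
    ("issue1", "ex:state", "ex:unassigned"),
    ("issue2", "ex:state", "ex:unassigned"),
    ("issue2", "ex:state", "ex:assigned"),
    ("issue3", "ex:state", "ex:closed"),
]
STATES = {"ex:unassigned", "ex:assigned"}
```

Calling `matches_value_set_arc(GRAPH, "issue1", "ex:state", STATES)` succeeds, while the same call for `issue2` or `issue3` fails on cardinality and on the value set respectively.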
ex:reportedBy
The second ArcRule (constraint) above, ex:reportedBy @<UserShape>, asserts that the object of the ex:reportedBy property conforms to another labeled shape expression called <UserShape>.
This is a ValueReference to a shape expression described below.
As with ex:state above, the cardinality is exactly one.
ex:reportedOn
The third ArcRule above, ex:reportedOn xsd:dateTime, asserts that the object of the ex:reportedOn property is of type xsd:dateTime.
ShEx supports the same set of W3C XML Schema datatypes as does SPARQL.
Unlike SPARQL, ShEx validates the lexical representation of these datatypes, so the object of the ex:reportedOn property is tested against the XML Schema definition for dateTime.
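A lexical check of this kind might look like the following Python sketch. The regular expression is modeled on the dateTime lexical representation in XML Schema and should be treated as an approximation: it does not, for instance, reject February 30. The names `DATETIME_LEXICAL` and `is_datetime_lexical` are assumptions for illustration.

```python
import re

# Approximate lexical pattern for xsd:dateTime: date, 'T', time, optional zone.
DATETIME_LEXICAL = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"          # year, at least four digits
    r"-(0[1-9]|1[0-2])"                      # month 01..12
    r"-(0[1-9]|[12][0-9]|3[01])"             # day 01..31 (not checked per month)
    r"T(([01][0-9]|2[0-3]):[0-5][0-9]:[0-5][0-9](\.[0-9]+)?|24:00:00(\.0+)?)"
    r"(Z|[+-]((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"  # optional timezone
)


def is_datetime_lexical(lexical: str) -> bool:
    """True when the whole string has the shape of an xsd:dateTime literal."""
    return DATETIME_LEXICAL.fullmatch(lexical) is not None
```

Under this sketch, `"2013-01-23T10:18:00"` is accepted and a bare date such as `"2013-01-23"` is rejected.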
(rule1, rule2)?
The fourth constraint above, ( ex:reproducedBy @<EmployeeShape>, ex:reproducedOn xsd:dateTime )?, is a GroupRule.
The '?' says that the cardinality is 0 or 1.
Together they assert that there may be both an ex:reproducedBy and an ex:reproducedOn, or neither.
The enclosed rules are ArcRules, similar to the ex:reportedBy and ex:reportedOn ArcRules above.
The <EmployeeShape> shape expression is described below.
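The "both or neither" behavior of an optional group can be sketched in Python as follows. The sketch only counts arcs by predicate and ignores the ValueClass of each enclosed rule; the tuple representation and the name `optional_group_matches` are assumptions for illustration.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def optional_group_matches(
    triples: Sequence[Triple], node: str, predicates: Sequence[str]
) -> bool:
    """A '?' group of single-cardinality arcs: all present once, or all absent."""
    counts = [
        sum(1 for s, p, _ in triples if s == node and p == predicate)
        for predicate in predicates
    ]
    neither = all(count == 0 for count in counts)
    both = all(count == 1 for count in counts)
    return neither or both


# Hypothetical sample data: issue1 has both arcs, issue2 has only one of them.
GROUP = ["ex:reproducedBy", "ex:reproducedOn"]
DATA = [
    ("issue1", "ex:reproducedBy", "emp1"),
    ("issue1", "ex:reproducedOn", "2013-01-23T11:00:00"),
    ("issue2", "ex:reproducedBy", "emp1"),
]
```

Here `issue1` (both arcs) and a node with neither arc satisfy the group, while `issue2` does not.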
ex:related @<IssueShape>
The last constraint above, ex:related @<IssueShape>*, is an example of a cyclic rule: a constraint inside <IssueShape> refers back to <IssueShape> itself.
The '*' says that there may be any number of related issues, including 0.
<UserShape>
The following UserShape is referenced by the IssueShape above. Where the above labeled shape expression describes the data related to issues in some issue tracker, this captures the information about the users of that system.
<UserShape> {
  ( foaf:name xsd:string
  | foaf:givenName xsd:string+, foaf:familyName xsd:string ),
  foaf:mbox rdf:Resource
}
rule1 | rule2
The first constraint above, foaf:name xsd:string | foaf:givenName xsd:string+, foaf:familyName xsd:string, is an XorRule.
The rule on the right side of the '|' is another conjunction.
Together they assert that a user of the system has either a foaf:name or at least one foaf:givenName (+ means a cardinality of one or more) and exactly one foaf:familyName.
RDF 1.1 established that the datatype of a plain literal is xsd:string, so either [] foaf:name "Bob Smith" . or [] foaf:name "Bob Smith"^^xsd:string . would match the first disjoint.
The second disjoint is a conjunction of foaf:givenName xsd:string+, foaf:familyName xsd:string.
The foaf:givenName property has a cardinality of one or more, so the following graph would match: [] foaf:givenName "Robert", "Bob", "Bobby", "Robbie" ; foaf:familyName "Smith" .
A disjunction of n disjoints requires that the data match exactly one of the disjoints.
A graph like [] foaf:name "Bob Smith" ; foaf:givenName "Bob" ; foaf:familyName "Smith" would be invalid because two disjoints match at once.
This implies that interfaces generated from an XorRule offer a choice to, in this example, supply either a full name or one or more given names and a family name.
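The "exactly one disjoint" requirement can be sketched in Python. The sketch counts arcs by predicate only, ignores datatypes, and does not consider triples left over after matching; the name `name_choice_matches` and the tuple representation are assumptions for illustration.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def name_choice_matches(triples: Sequence[Triple], node: str) -> bool:
    """Exclusive choice: a full name, or given name(s) plus one family name."""

    def count(predicate: str) -> int:
        return sum(1 for s, p, _ in triples if s == node and p == predicate)

    full_name = count("foaf:name") == 1
    split_name = count("foaf:givenName") >= 1 and count("foaf:familyName") == 1
    # Exactly one of the two disjoints may match.
    return full_name != split_name


# Hypothetical sample data mirroring the graphs discussed above.
USERS = [
    ("u1", "foaf:name", "Bob Smith"),
    ("u2", "foaf:givenName", "Robert"),
    ("u2", "foaf:givenName", "Bob"),
    ("u2", "foaf:familyName", "Smith"),
    ("u3", "foaf:name", "Bob Smith"),
    ("u3", "foaf:givenName", "Bob"),
    ("u3", "foaf:familyName", "Smith"),
]
```

In this data `u1` and `u2` each satisfy one disjoint, while `u3` satisfies both and is therefore rejected.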
foaf:mbox rdf:Resource
The last constraint above, foaf:mbox rdf:Resource, uses a special type called rdf:Resource.
Recall that ex:reportedOn and ex:reproducedOn specified that the object was a literal of type xsd:dateTime.
A type of rdf:Resource means that the object is an IRI instead of a literal.
<EmployeeShape>
The EmployeeShape below is referenced by the IssueShape above and included here for completeness. It does not introduce any new features of the language.
<EmployeeShape> {
  foaf:givenName xsd:string+,
  foaf:familyName xsd:string,
  foaf:phone rdf:Resource+,
  foaf:mbox rdf:Resource
}
The <IssueShape> example also includes a GroupRule with a cardinality of 0 or 1:
( ex:reproducedBy @<EmployeeShape>, ex:reproducedOn xsd:dateTime )?
As explained above, this requires that the data have either both of those triples or neither.
Cardinality constraints may appear on ArcRules and GroupRules.
They may be expressed as one of (?, +, *) or as one or two integers in {}s.
If there is only one number in {}s, the minimum cardinality is that number and the maximum is unconstrained.
An employee record which permitted from one to three given names would look like
foaf:givenName xsd:string{1,3}
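The cardinality markers can be read as a pair of bounds, as in the Python sketch below. It covers the shorthand markers and the two-integer form; the one-integer form is left out of this sketch. The helper names `cardinality_bounds` and `within` are assumptions for illustration.

```python
import re
from typing import Optional, Tuple

Bounds = Tuple[int, Optional[int]]  # (minimum, maximum); None means unbounded


def cardinality_bounds(marker: str) -> Bounds:
    """Translate '', '?', '+', '*' or '{m,n}' into (minimum, maximum)."""
    shorthand = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}
    if marker in shorthand:
        return shorthand[marker]
    match = re.fullmatch(r"\{(\d+),(\d+)\}", marker)
    if match is None:
        raise ValueError(f"unsupported cardinality marker: {marker!r}")
    return int(match.group(1)), int(match.group(2))


def within(count: int, bounds: Bounds) -> bool:
    """True when `count` occurrences satisfy the bounds."""
    minimum, maximum = bounds
    return count >= minimum and (maximum is None or count <= maximum)
```

With `{1,3}`, two given names are within bounds and four are not.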
It is frequently useful to reuse or extend a shape.
For instance, if both the <UserShape> and <EmployeeShape> permitted the same alternatives for specifying a name and email address, these could be factored into a separate shape called a <PersonShape>:
<PersonShape> {
  ( foaf:name xsd:string
  | foaf:givenName xsd:string+, foaf:familyName xsd:string ),
  foaf:mbox rdf:Resource
}
<UserShape> {
  & <PersonShape>
}
<EmployeeShape> {
  & <PersonShape>,
  foaf:phone rdf:Resource+
}
In this example, the <UserShape> provides no additional constraints beyond those of the included <PersonShape>.
We may have several derivatives of <PersonShape>, any of which could provide an <IssueShape>'s ex:reportedBy value.
We can signify this by changing <IssueShape> to have ex:reportedBy @<PersonShape> and defining sub-shapes of <PersonShape>.
This is done with an inclusion directive before the shape definition.
We may not want the base <PersonShape> to satisfy any ValueReferences directly, instead requiring only derivatives of <PersonShape>.
This is accomplished by labeling <PersonShape> VIRTUAL per the hierarchy example:
<IssueShape> {
  ex:state (ex:unassigned ex:assigned),
  ex:reportedBy @<PersonShape>
  # ...
}
VIRTUAL <PersonShape> {
  ( foaf:name xsd:string
  | foaf:givenName xsd:string+, foaf:familyName xsd:string ),
  foaf:mbox rdf:Resource
}
<UserShape> & <PersonShape> {
  # additional User properties
}
<EmployeeShape> & <PersonShape> {
  foaf:phone rdf:Resource+
  # additional Employee properties
}
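One way to picture which shapes may satisfy a ValueReference under this scheme is the Python sketch below. It assumes a hypothetical in-memory form of the schema (a mapping from each shape label to the labels included before its definition, plus a set of VIRTUAL labels) and is not drawn from any ShEx implementation.

```python
from typing import Dict, List, Set

# Hypothetical in-memory view of the hierarchy example above.
PARENTS: Dict[str, List[str]] = {
    "IssueShape": [],
    "PersonShape": [],
    "UserShape": ["PersonShape"],
    "EmployeeShape": ["PersonShape"],
}
VIRTUAL: Set[str] = {"PersonShape"}


def derives_from(label: str, ancestor: str, parents: Dict[str, List[str]]) -> bool:
    """True when `label` is `ancestor` or includes it, directly or transitively."""
    stack, seen = [label], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:  # guard against cyclic inclusions
            seen.add(current)
            stack.extend(parents.get(current, []))
    return False


def shapes_satisfying(
    reference: str, parents: Dict[str, List[str]], virtual: Set[str]
) -> Set[str]:
    """Non-virtual shapes that a ValueReference to `reference` would accept."""
    return {
        label
        for label in parents
        if label not in virtual and derives_from(label, reference, parents)
    }
```

With `PersonShape` marked VIRTUAL, a reference to it is satisfied only by `UserShape` and `EmployeeShape`; without the marking, `PersonShape` itself would also qualify.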
Should there be a "greedy" directive to accept only the variant which touches the most triples? An alternative is to say that parent classes are "closed" in the sense that no other properties may appear on a subject matched by that shape.
The <IssueShape> example above includes both ex:reportedOn and ex:reproducedOn dateTimes.
It would be reasonable in the interest of data quality to ensure that the ex:reproducedOn dateTime, if present, were temporally after the ex:reportedOn dateTime.
While ShEx itself has no built-in functionality for comparing dateTimes, specific extensions may offer that functionality.
The example below (failed semantic action validation) includes semantic actions to test date order in either JavaScript or SPARQL:
ex:reportedOn xsd:dateTime
    %js{ report = _.o; return true; %},
(ex:reproducedBy @<EmployeeShape>,
 ex:reproducedOn xsd:dateTime
    %js{ return _.o.lex > report.lex; %}
    %sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %}
)
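The date-order test itself can be sketched outside of any semantic action mechanism. The JavaScript action above compares lexical forms as strings; the Python sketch below instead parses the values before comparing. It assumes both literals omit a timezone, and the function name `reproduced_after_reported` is hypothetical.

```python
from datetime import datetime


def reproduced_after_reported(reported_on: str, reproduced_on: str) -> bool:
    """True when the reproduction dateTime is strictly later than the report."""
    # Assumes ISO 8601 lexical forms without a timezone, e.g. 2013-01-23T10:18:00.
    reported = datetime.fromisoformat(reported_on)
    reproduced = datetime.fromisoformat(reproduced_on)
    return reproduced > reported
```

An issue reported at `2013-01-23T10:18:00` and reproduced at `2013-01-23T11:00:00` passes; swapping the two values fails.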
Semantic actions may also be used to generate schema-specific parsers or tools. Below is an excerpt of a tool that uses a DOM tree to translate the Issue example into an XML document:
<IssueShape> {
  ex:state (ex:unassigned ex:assigned)
    %js{
      doc = _.createDocument('http://ex.example/xml', 'Issue', undefined);
      issue = doc.documentElement;
      issue.setAttribute('id', _.s.lex.substr(17));
      state = doc.createElementNS('http://ex.example/xml', 'state');
      state.textContent = _.o.lex.substr(17);
      issue.appendChild(state);
    %},
  ex:reportedBy @<UserShape>,
  …
}
%js{ console.log(new XMLSerializer().serializeToString(doc)); %}
This example relies on a particular invocation for semantic actions, but illustrates the power in the extensibility mechanism.
The IssueShape tutorial above is oriented towards a particular use case where the schema will use a very explicit set of predicates and accept no others. Shape Expressions is also useful for describing interfaces or graph patterns where any predicates are allowed except those in controlled namespaces. For example, some systems like Annotea reserved the assertion of dc:creator arcs for the system to maintain provenance information. The language summary below includes language features to describe such an interface.
feature | example | description |
---|---|---|
Matching a Predicate to a NameClass | ||
pfIRI | ex:state | The predicate of any matching triple is the same as the pfIRI IRI. |
pfIRI | ex:~ | The predicate of any matching triple starts with the IRI. |
pfWild | . - rdf:type - ex:~ | A matching triple has any predicate except those terms or stems excluded by the '-' operator. |
Matching an Object to a ValueClass | ||
ValueType | xsd:dateTime | The object of any matching triple is of the type identified by the ValueType IRI. |
ValueSet | (ex:unassigned ex:assigned) | The object of any matching triple is one of the terms listed in the ValueSet. |
ValueStem | ex:~ | The object of any matching triple starts with the IRI. |
ValueWild | . - rdf:type - ex:~ | A matching triple has any object except those terms or stems excluded by the '-' operator. |
ValueReference | @<UserShape> | The object of a matching triple is an IRI or blank node and that node is the subject of triples matching the referenced shape expression. |
Rule Types | ||
ArcRule | foaf:givenName xsd:string+ | A matching triple matches the pfIRI and the ValueTerm. Cardinality constraints apply. |
AndRule | foaf:givenName xsd:string, foaf:familyName xsd:string | Each conjoint matches the input graph. |
XorRule | foaf:givenName xsd:string \| foaf:name xsd:string | Exactly one disjoint matches the input graph. |
GroupRule | (ex:reproducedBy @<EmployeeShape>, ex:reproducedOn xsd:dateTime) | A matching triple matches the enclosed rule (here an AndRule). Cardinality constraints apply. |
Cardinality | ||
? | foaf:givenName xsd:string? | rule must match 0 or 1 times. |
+ | foaf:givenName xsd:string+ | rule must match 1 or more times. |
* | foaf:givenName xsd:string* | rule must match 0 or more times. |
{m} | foaf:givenName xsd:string{3} | rule must match m times. |
{m,n} | foaf:givenName xsd:string{3,5} | rule must match at least m times and no more than n times. |
Cardinality constraints may appear after an ArcRule. A '?' may also appear after a GroupRule to indicate that it is optional. Any AndRule nested immediately inside the GroupRule must have every rule match or no rule match. | ||
Rule Inclusions | ||
&RuleName | & <PersonShape> | Include the referenced rule in place of the include directive. |
Rule Inclusions may appear before a shape definition or inside of a definition. Before a shape definition, they signify the inclusion of the referenced rule ("included rule") at the beginning of the one being defined, as well as asserting that ValueReferences to the included rule accept the defined shape as well. | ||
Semantic Actions | ||
%lang{ code %} | %js{ return _.o.lex > report.lex; %} %sparql{ ?s ex:reportedOn ?rpt . FILTER (?o > ?rpt) %} | Invoke semantic actions when a rule is satisfied. |
Semantic Actions may appear after an ArcRule, a GroupRule or a named Shape Expression. When used with validation, they are invoked only on valid pairs of a triple and a rule. Their use for interface validation is currently undefined. |
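The three ways of matching a predicate to a NameClass summarized in the table can be sketched in Python. Full IRIs are compared as plain strings, and the function names `matches_term`, `matches_stem` and `matches_wild` are assumptions for illustration rather than names from the language definition.

```python
from typing import Iterable

EX = "http://ex.example/#"
FOAF = "http://foaf.example/#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def matches_term(predicate: str, iri: str) -> bool:
    """Term form (ex:state): the predicate is the same as the IRI."""
    return predicate == iri


def matches_stem(predicate: str, stem: str) -> bool:
    """Stem form (ex:~): the predicate starts with the IRI."""
    return predicate.startswith(stem)


def matches_wild(
    predicate: str,
    excluded_terms: Iterable[str] = (),
    excluded_stems: Iterable[str] = (),
) -> bool:
    """Wildcard form (. - rdf:type - ex:~): anything not excluded by '-'."""
    if predicate in set(excluded_terms):
        return False
    return not any(predicate.startswith(stem) for stem in excluded_stems)
```

For the pattern `. - rdf:type - ex:~`, calling `matches_wild(FOAF + "name", [RDF_TYPE], [EX])` accepts the predicate, while `RDF_TYPE` and any predicate in the `ex:` namespace are rejected.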