SHACL and ShEx

RDF Data Shapes WG F2F
19 May, 2015

http://www.w3.org/2015/Talks/0519-shacl-egp/

Problem Statement

Useful data needs consistent structure:

Generators/consumers detect errors.
XML Schema analogy proposes support for:

@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<issue7> a :Issue , :SecurityIssue ;
    :state :unassigned ;
    :reportedBy <user6> , <user2> ; # cardinality 1
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime ;
    :assignedTo <user2>, <user1> ;
    :assignedOn "2012-11-31T23:57:00"^^xsd:dateTime ;
                       # reproduced before being reported
    :related <issue4>, <issue3>, <issue2> .
                       # referenced issues not included

<issue4> # a ???         missing type arc
    :state :unsinged ; # misspelled
    # :reportedBy ??? -  missing
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime .

<user2> a foaf:Person ;
    foaf:givenName "Alice" ;
    foaf:familyName "Smith" ;
    foaf:phone <tel:+1.555.222.2222> ;
    foaf:mbox <mailto:alice@example.com> .

<user6> a foaf:Agent ; # should be foaf:Person
    foaf:givenName "Bob" ; # foaf:familyName "???" - missing
    foaf:phone <tel:+.555.222.2222> ; # malformed tel: URL
    foaf:mbox <mailto:alice@example.com> .

Strategy

Outline:

What's in ShEx

What is Shape Expressions?

PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX myco: <http://myco.example/#>

start = <IssueShape>

<IssueShape> {
    issue:status (issue:unassigned issue:assigned),
    issue:reportedBy @<UserShape>,
    issue:reportedOn xsd:dateTime,
    ( issue:assignedTo @<EmployeeShape>,
      issue:assignedOn xsd:dateTime )?
}

<UserShape> {
    (foaf:name LITERAL
     | foaf:givenName LITERAL+,
       foaf:familyName LITERAL),
    foaf:mbox IRI
}

<EmployeeShape> {
    foaf:page (myco:Employee~),
    foaf:givenName LITERAL+,
    foaf:familyName LITERAL,
    foaf:phone IRI*,
    foaf:mbox IRI
}

User example

What does my user profile look like?

PREFIX foaf: <http://xmlns.com/foaf/>

<UserShape> {
    (foaf:name LITERAL
     | foaf:givenName LITERAL+,
       foaf:familyName LITERAL),
    foaf:mbox IRI
}

compare with other schema languages...

RelaxNG Compact Syntax

    (element foaf:name { xsd:string }
     | (element foaf:givenName { xsd:string }+,
        element foaf:familyName { xsd:string })),
    element foaf:mbox { xsd:anyURI }

Regex

(N|(G+F))M
NM
GFM
GGGFM
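
The regular expression can be tried directly. The following is a minimal Python sketch, assuming the letters stand for the arcs of the user shape (N for foaf:name, G for foaf:givenName, F for foaf:familyName, M for foaf:mbox); the helper name matches_user_shape is hypothetical. Note that the string form imposes an order on the arcs that an RDF graph does not have.

```python
import re

# Assumed reading of the letters: N = foaf:name, G = foaf:givenName,
# F = foaf:familyName, M = foaf:mbox.
USER_SHAPE = re.compile(r"(N|(G+F))M")


def matches_user_shape(arcs: str) -> bool:
    """Return True when the whole string of arc letters fits the pattern."""
    return USER_SHAPE.fullmatch(arcs) is not None


if __name__ == "__main__":
    for candidate in ["NM", "GFM", "GGGFM", "GM", "NGFM"]:
        print(candidate, matches_user_shape(candidate))
```

The three strings listed on the slide match; "GM" (no family name) and "NGFM" (both alternatives at once) do not.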

W3C XML Schema

  <xs:complexType name="UserContent">
    <xs:sequence>
      <xs:choice>
        <xs:element name="name" type="xs:string"/>
        <xs:sequence>
          <xs:element maxOccurs="unbounded" name="givenName" type="xs:string"/>
          <xs:element name="familyName" type="xs:string"/>
        </xs:sequence>
      </xs:choice>
      <xs:element name="mbox" type="xs:anyURI"/>
    </xs:sequence>
  </xs:complexType>

Compilation to SPARQL

PREFIX foaf: <http://xmlns.com/foaf/>

<UserShape> {
    (foaf:name LITERAL
     | foaf:givenName LITERAL+,
       foaf:familyName LITERAL),
    foaf:mbox IRI
}
PREFIX foaf: <http://xmlns.com/foaf/>
ASK {
    { SELECT ?UserShape WHERE {
        {
            { SELECT ?UserShape {                             
              ?UserShape foaf:name ?o .                       
            } GROUP BY ?UserShape HAVING (COUNT(*)=1)}        
            { SELECT ?UserShape {                             
              ?UserShape foaf:name ?o . FILTER (isLiteral(?o))
            } GROUP BY ?UserShape HAVING (COUNT(*)=1)}        
        } UNION {
            { SELECT ?UserShape (COUNT(*) AS ?UserShape_c0) {      
              ?UserShape foaf:givenName ?o .                       
            } GROUP BY ?UserShape HAVING (COUNT(*)>=1)}            
            { SELECT ?UserShape (COUNT(*) AS ?UserShape_c1) {      
              ?UserShape foaf:givenName ?o . FILTER (isLiteral(?o))
            } GROUP BY ?UserShape HAVING (COUNT(*)>=1)}            
            FILTER (?UserShape_c0 = ?UserShape_c1)                 
            { SELECT ?UserShape {                                   
              ?UserShape foaf:familyName ?o .                       
            } GROUP BY ?UserShape HAVING (COUNT(*)=1)}              
            { SELECT ?UserShape {                                   
              ?UserShape foaf:familyName ?o . FILTER (isLiteral(?o))
            } GROUP BY ?UserShape HAVING (COUNT(*)=1)}              
        }
    } GROUP BY ?UserShape HAVING (COUNT(*) = 1)}
    { SELECT ?UserShape {                         
      ?UserShape foaf:mbox ?o .                   
    } GROUP BY ?UserShape HAVING (COUNT(*)=1)}    
    { SELECT ?UserShape {                         
      ?UserShape foaf:mbox ?o . FILTER (isIRI(?o))
    } GROUP BY ?UserShape HAVING (COUNT(*)=1)}    
}
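
The generated query follows a regular pattern: for each arc, one subquery counts every value and a second counts the values that pass the node test, and the outer HAVING clause admits exactly one branch of the UNION. A minimal Python sketch of that logic, assuming a node is represented as a dictionary from predicate to a list of (kind, value) pairs; this representation and the function names are illustrative, not part of any ShEx implementation:

```python
def is_literal(value):
    return value[0] == "literal"


def is_iri(value):
    return value[0] == "iri"


def arc_ok(values, is_valid, min_count=1, max_count=1):
    """Mirror the paired subqueries: the count of all values must be within
    the cardinality bounds, and every value must pass the node test."""
    total = len(values)
    valid = sum(1 for v in values if is_valid(v))
    within = total >= min_count and (max_count is None or total <= max_count)
    return within and valid == total


def fits_user_shape(node):
    name = arc_ok(node.get("foaf:name", []), is_literal)
    given_family = (
        arc_ok(node.get("foaf:givenName", []), is_literal, 1, None)
        and arc_ok(node.get("foaf:familyName", []), is_literal)
    )
    # Outer HAVING (COUNT(*) = 1): exactly one branch of the UNION may match.
    one_branch = name != given_family
    return one_branch and arc_ok(node.get("foaf:mbox", []), is_iri)


alice = {
    "foaf:givenName": [("literal", "Alice")],
    "foaf:familyName": [("literal", "Smith")],
    "foaf:mbox": [("iri", "mailto:alice@example.com")],
}
```

Here fits_user_shape(alice) holds; adding a foaf:name to the same node makes both branches match, so the check fails.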

Triple Constraints

Using datatypes, RDF node kinds, referenced shapes, value sets:

tripleConstraint

valueClass

groupShapeConstr

Value sets

A value set is a set of possible values.

ex:mood ("happy" "sad" "indigo")
ex:mood (mood:happy mood:sad mood:indigo)
ex:mood (mood:~)
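
The three forms above (literals, IRIs, and an IRI stem) can be read as membership tests. A minimal Python sketch, assuming values are compared as plain strings and that the mood: prefix expands to the hypothetical namespace below; neither assumption comes from the slides:

```python
# Hypothetical expansion of the mood: prefix, for illustration only.
MOOD_NS = "http://mood.example/"


def in_value_set(value, allowed=(), stems=()):
    """True when the value is listed explicitly or starts with an IRI stem."""
    return value in allowed or any(value.startswith(stem) for stem in stems)


literal_moods = {"happy", "sad", "indigo"}
iri_moods = {MOOD_NS + "happy", MOOD_NS + "sad", MOOD_NS + "indigo"}
```

With these definitions, "angry" is outside the literal set, and MOOD_NS + "angry" is outside the enumerated IRI set but inside the stem (mood:~).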

IRI stems and exclusions

IRIs in predicates and value sets can have:

PREFIX annot: <http://www.w3.org/annotea/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

start={
  annot:context LITERAL,
  dc:~ - dc:author - dc:creator .*
}
PREFIX annot: <http://www.w3.org/annotea/ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

<Annot1>
  annot:context "xpath...";
  dc:abstract "stuff" ;
  dc:audience "9606" ;
  dc:description """some
long
description""" .
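
The predicate pattern dc:~ - dc:author - dc:creator can be read as "any IRI in the dc: namespace except the two excluded ones". A minimal Python sketch of that reading, assuming exclusions are exact IRIs (whether exclusions may themselves be stems is not stated here); the function name is hypothetical:

```python
DC = "http://purl.org/dc/elements/1.1/"


def matches_stem(iri, stem, exclusions=()):
    """True when the IRI starts with the stem and is not explicitly excluded."""
    return iri.startswith(stem) and iri not in exclusions


excluded = {DC + "author", DC + "creator"}
predicates = [DC + "abstract", DC + "audience", DC + "description", DC + "creator"]

# Keep only the predicates admitted by the stem-with-exclusions pattern.
admitted = [p for p in predicates if matches_stem(p, DC, excluded)]
```

The three dc: predicates used by <Annot1> are admitted; dc:creator is filtered out.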

Reverse arcs

Test arcs coming into an object.

PREFIX issue: <http://ex.example/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:status (issue:assigned issue:resolved ),
    issue:reportedOn xsd:dateTime,
    issue:assignedOn xsd:dateTime,
    issue:related @<RefdIssueShape>
}

<RefdIssueShape> {
    issue:name LITERAL,
    ^issue:related @<IssueShape>
}
PREFIX issue: <http://ex.example/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<Issue1>
    issue:status     issue:assigned ;
    issue:reportedOn "2013-01-23T10:18:00"^^xsd:dateTime ;
    issue:assignedOn "2013-01-23T11:00:00"^^xsd:dateTime ;
    issue:related    <Issue3> .

<Issue3>
    issue:name "smokey" .
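
A reverse arc asks which subjects point at a node rather than which objects it points to. A minimal Python sketch over the example data, assuming triples are held as (subject, predicate, object) tuples with shortened node names; the helper names are hypothetical:

```python
EX = "http://ex.example/"

# A subset of the example graph, as (subject, predicate, object) tuples.
triples = {
    ("Issue1", EX + "status", EX + "assigned"),
    ("Issue1", EX + "related", "Issue3"),
    ("Issue3", EX + "name", "smokey"),
}


def outgoing(graph, node, predicate):
    """Objects reached from node by the predicate (forward arcs)."""
    return sorted(o for s, p, o in graph if s == node and p == predicate)


def incoming(graph, node, predicate):
    """Subjects that reach node by the predicate (reverse arcs)."""
    return sorted(s for s, p, o in graph if o == node and p == predicate)
```

For <Issue3>, incoming(triples, "Issue3", EX + "related") yields ["Issue1"], which is the arc the ^issue:related constraint tests.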

Closed shapes

Use case: storing application data in a

"If I tell you X, will you understand it?"

Closed shapes

What about re-used nodes?

Closed shapes:

PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:reportedBy @<UserShape>,
    issue:assignedTo @<EmployeeShape>?
}

<UserShape> {
    foaf:name LITERAL
    foaf:mbox IRI
}

<EmployeeShape> {
    foaf:givenName LITERAL+,
    foaf:familyName LITERAL,
    foaf:mbox IRI
}

<User2> fits multiple shapes:

PREFIX ex: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<Issue1>
    ex:reportedBy   <User2> ;
    ex:assignedTo   <User2> .

<User2>
    foaf:name "Bob Smith" ;
    foaf:givenName "Bob" ;
    foaf:familyName "Smith" ;
    foaf:mbox <mailto:bob@example.org> .
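
The effect of closing the shapes on <User2> can be sketched in Python. This is a simplified illustration, assuming a closed shape rejects any node with predicates the shape does not mention, and checking only predicate presence (cardinalities and value types are ignored); the function name is hypothetical:

```python
FOAF = "http://xmlns.com/foaf/"

# Predicates mentioned by each shape in the schema above.
USER_SHAPE = {FOAF + "name", FOAF + "mbox"}
EMPLOYEE_SHAPE = {FOAF + "givenName", FOAF + "familyName", FOAF + "mbox"}

# Predicates present on <User2> in the example data.
user2 = {FOAF + "name", FOAF + "givenName", FOAF + "familyName", FOAF + "mbox"}


def fits(node_predicates, shape_predicates, closed=False):
    """Open: the node needs every predicate of the shape.
    Closed: it must also have no predicate outside the shape."""
    has_required = shape_predicates <= node_predicates
    no_extras = node_predicates <= shape_predicates
    return has_required and (no_extras or not closed)
```

Read as open shapes, <User2> fits both <UserShape> and <EmployeeShape>; read as closed shapes, it fits neither, because each shape sees the other's predicates as extras.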

Semantic actions

For extensibility:

For actions:

API-native

Provides purpose-fit expressivity with minimal implementation cost.

JavaScript API actions embedded in %js{ … %}.

PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:state (issue:unassigned issue:assigned),
    issue:reportedBy @<UserShape>,
    issue:reportedOn xsd:dateTime,
    (issue:reproducedBy @<EmployeeShape>,
     issue:reproducedOn xsd:dateTime
        %js{ return _.o > _["issue:reportedOn"]; %}
    )?,
    issue:related @<IssueShape>*
}
PREFIX ex: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<Issue1>
    ex:state        ex:unassigned ;
    ex:reportedBy   <User2> ;
    ex:reportedOn   "2013-01-23T10:18:00"^^xsd:dateTime ;
    ex:reproducedBy <Thompson.J> ;
    ex:reproducedOn "2013-01-23T10:00:00"^^xsd:dateTime .
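
The action attached to issue:reproducedOn appears intended to require that an issue is reproduced after it is reported. A minimal Python sketch of that comparison on the <Issue1> data, assuming the xsd:dateTime lexical forms are parsed as naive ISO 8601 timestamps; the function name is hypothetical:

```python
from datetime import datetime

# Lexical values taken from <Issue1> above.
reported_on = datetime.fromisoformat("2013-01-23T10:18:00")
reproduced_on = datetime.fromisoformat("2013-01-23T10:00:00")


def reproduced_after_reported(reproduced, reported):
    """The check the semantic action is assumed to express."""
    return reproduced > reported


if __name__ == "__main__":
    print(reproduced_after_reported(reproduced_on, reported_on))
```

<Issue1> is reproduced at 10:00 but reported at 10:18, so the check fails even though every arc has the right datatype.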

SPARQL

Isn't there an RDF query language?

%sparql{ … %} can complement our %js{ … %} actions.

PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:state (issue:unassigned issue:assigned),
    issue:reportedBy @<UserShape>,
    issue:reportedOn xsd:dateTime,
    (issue:reproducedBy @<EmployeeShape>,
     issue:reproducedOn xsd:dateTime
        %js{ return _.o > _["issue:reportedOn"]; %}
        %sparql{ ?s issue:reportedOn ?rpt . FILTER (?o > ?rpt) %}
    )?,
    issue:related @<IssueShape>*
}
PREFIX ex: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

<Issue1>
    ex:state        ex:unassigned ;
    ex:reportedBy   <User2> ;
    ex:reportedOn   "2013-01-23T10:18:00"^^xsd:dateTime ;
    ex:reproducedBy <Thompson.J> ;
    ex:reproducedOn "2013-01-23T10:00:00"^^xsd:dateTime .

SPARQL vs. SPARQL

%sparqlA{ … %} inherits BASE and PREFIX. %sparqlB{ … %} does not.

PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:reportedOn xsd:dateTime,
    issue:reproducedOn xsd:dateTime
      %sparqlA{ ?s issue:reportedOn ?rpt . FILTER (?o > ?rpt) %}
}
PREFIX issue: <http://ex.example/>
PREFIX foaf: <http://xmlns.com/foaf/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

start = <IssueShape>

<IssueShape> {
    issue:reportedOn xsd:dateTime,
    issue:reproducedOn xsd:dateTime
}
%sparqlB{
  BASE  <http://example.issue/ns/issue#>
  PREFIX issue: <http://example.issue/ns/issue#>
  ASK {
    ?this issue:reportedOn ?rpt . FILTER (?o > ?rpt)
  }
%}

Documents

Semantic Foundation

  1. well-defined, implementation-independent semantics.
    covers permutations of:
    • recursion (valueShape)
    • negation
    • oneOf
    • someOf
    • closed shapes
    • closed schemas
    • repetitions, e.g. one or more of (User or Group)
    • multi-occurrence, i.e. same predicate more than once
  2. covers use cases XML users expect and JSON users will grow to want.
  3. has clear and predictable results, which matters if we have a SHACL 1.1.
  4. encourages examination and feedback from language validation experts.

Example SPARQL extension

The RDF representations are basically the same as the SPIN proposal.

The ShEx syntax can include the body of SPARQL actions:

<ProvidedCHO> {
    a :ProvidedCHO,
    edm:aggregatedCHO IRI %sparql{ FILTER (?s = ?o) %}
}

DC Use Case #2

#2 For every CHO or ore:Proxy dc:coverage or dc:subject or dc:type 
#  and dcterms:spatial; dc:language for text, dc:title or dc:description, 
#  must be present (DC:R-68) (W3:R-5.2)
<CHOShape> {
  ( ore:Proxy . 
   | dc:coverage . 
   | dc:subject . 
   | dc:type . ),
  dc:language xsd:string?,
  dc:title xsd:string?,
  dc:description xsd:string?
}