W3C

Examples of RDF Validation


Abstract

Most data representation languages used in conventional settings offer some sort of input validation, ranging from parsing grammars for domain-specific languages to XML Schema or RelaxNG for XML structures. The open-world assumption underlying RDF languages makes validation more difficult, and less complete, than in other data formats. A variety of approaches exist that partially address this, and further development of validation tools and protocols could greatly enhance the uptake of RDF.

This document is intended to provide ideas and inspiration for the W3C Workshop on RDF Validation. The use cases, techniques and technologies listed here do not constrain the Workshop.

Status of This Document

This is a draft by W3C staff members. This document is not endorsed by the W3C or its member companies.


Introduction

Like XML, RDF has schema languages to describe the structure of RDF instance data. Unlike XML Schema, RDF Schema is generally interpreted as supplementing rather than validating RDF data.

Uses

Example: Issue Tracking

Taking as a use case an issue tracking database, we have interrelated issues reported by people. A simple class model uses one common vocabulary (FOAF) and one domain-specific vocabulary:

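[Figure: class diagram. :Issue has the properties :state, :reportedOn, :reproducedOn and :related, and is linked to foaf:Person by :reportedBy and :reproducedBy. foaf:Person has the properties foaf:givenName, foaf:familyName, foaf:phone and foaf:mbox.]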

Sample instance data in Turtle representation (with errors) will help illustrate our validation requirements:

@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<issue7> a :Issue , :SecurityIssue ;
    :state :unassigned ;
    :reportedBy <user6> , <user2> ; # only one reportedBy permitted
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime ;
    :reproducedBy <user2>, <user1> ;
    :reproducedOn "2012-11-30T23:57:00"^^xsd:dateTime ; # reproduced before being reported
    :related <issue4>, <issue3>, <issue2> . # referenced issues not included

<issue4> # a ??? - missing type arc
    :state :unsinged ; # misspelled term in value set.
    # :reportedBy ??? - missing required property
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime .

<user2> a foaf:Person ;
    foaf:givenName "Alice" ;
    foaf:familyName "Smith" ;
    foaf:phone <tel:+1.555.222.2222> ;
    foaf:mbox <mailto:alice@example.com> .

<user6> a foaf:Agent ; # should be foaf:Person
    foaf:givenName "Bob" ; # foaf:familyName "???" - missing required property
    foaf:phone <tel:+.555.222.2222> ; # malformed tel: URL
    foaf:mbox <mailto:alice@example.com> .

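The above errors include:

  • a missing type arc (<issue4>) and an incorrect type (<user6> is a foaf:Agent rather than a foaf:Person);
  • a misspelled term in a value set (:unsinged);
  • missing required properties (:reportedBy on <issue4>, foaf:familyName on <user6>);
  • too many values for a property (<issue7> has two :reportedBy arcs where only one is permitted);
  • an inconsistent sequence of dates (<issue7> was reproduced before it was reported);
  • references to issues that are not included in the data (<issue3>, <issue2>);
  • a malformed tel: URL (<user6>).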

In the open world, there could always be more information supplying properties or referents. With languages like OWL, it's even possible that a value could be asserted to be equivalent to one in a value set. Many validation use cases require closing the world and reporting errors over the information provided to the validator.
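
A minimal sketch of what closing the world means in practice, assuming the graph is held as a plain Python set of (subject, predicate, object) tuples with prefixed names written as strings. The helper names and the reduced data set are illustrative assumptions, not part of any validation tool.

```python
# A reduced, hypothetical excerpt of the sample data as (s, p, o) tuples.
TRIPLES = {
    ("issue7", "rdf:type", ":Issue"),
    ("issue7", ":reportedBy", "user6"),
    ("issue7", ":related", "issue4"),
    ("issue7", ":related", "issue3"),
    ("issue4", ":state", ":unsinged"),
    ("user6", "rdf:type", "foaf:Agent"),
}

def missing_required(triples, subject, required):
    """Closed-world check: a required property absent from the graph is an error."""
    present = {p for s, p, _ in triples if s == subject}
    return sorted(set(required) - present)

def dangling_referents(triples, predicate):
    """Objects of `predicate` that never appear as the subject of any triple."""
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, p, o in triples
                   if p == predicate and o not in subjects})

print(missing_required(TRIPLES, "issue4", [":state", ":reportedBy"]))
print(dangling_referents(TRIPLES, ":related"))
```

Under open-world semantics neither result would count as an error, since the missing triples might simply be stated elsewhere.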

Approaches

Different approaches to validation offer different trade-offs among expressivity, simplicity and predictability. Below are various ways to represent and enforce the "valid" schema.

SPARQL

SPARQL has been used to validate data, e.g. to test the results of parsing RDFa (see RDFa Test Harness). The following SPARQL query produces a table of validation results, testing for a representative sample of the identified validation errors:

PREFIX : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?issue
  (if(BOUND(?t), "passed", "missing") AS ?typeArc)
  (if(BOUND(?state) && (?state=:unassigned || ?state=:assigned),
      "passed", "invalid") AS ?stateValue)
  (if(BOUND(?reportedBy), "passed", "missing") AS ?reportedByArc)
  (if(BOUND(?reportedOn), "passed", "missing") AS ?reportedOnArc)
  (if(!BOUND(?reportedByCount), "expected 1, got 0", 
   if(?reportedByCount=1, "passed", 
      CONCAT("expected 1, got ", STR(?reportedByCount)))) AS ?reportedByArcCount)
  (if(!BOUND(?reproducedOn) || ?reproducedOn > ?reportedOn,
      "passed", "bad sequence") AS ?reproducedOnSequence)
  (if(BOUND(?reportedOn), "passed", "missing") AS ?reportedOnArc)
  (if(BOUND(?missingRelatedIssuesStr), ?missingRelatedIssuesStr, "passed")
      AS ?missingRelatedIssues)

  WHERE {

  # Get all viable :Issues by use of related predicates.
  { SELECT DISTINCT ?issue WHERE {
           { ?issue a :Issue }
     UNION { ?issue :reportedBy|:reportedOn|:reproducedBy|:reproducedOn|:related ?rprt }
    }
  }

  # Test for a type arc and state.
  OPTIONAL { ?issue a ?t FILTER (?t = :Issue) }
  OPTIONAL { ?issue :state ?state }

  # Must have 1 reportedBy.
  OPTIONAL { SELECT ?issue
      (SAMPLE(?reportedBy1) AS ?reportedBy)
      (COUNT(?reportedBy1) AS ?reportedByCount)
     WHERE {
      OPTIONAL { ?issue :reportedBy ?reportedBy1 }
    } GROUP BY ?issue
  }
  OPTIONAL { ?issue :reportedOn ?reportedOn }
  OPTIONAL { ?issue :reproducedBy ?reproducedBy }
  OPTIONAL { ?issue :reproducedOn ?reproducedOn }

  # All :related issues must be known entities.
  OPTIONAL {
    SELECT ?issue
      (GROUP_CONCAT(CONCAT("<", STR(?referent), ">"))
       AS ?missingRelatedIssuesStr) {

          # List of missing issues related to ?issue.
          SELECT ?issue ?referent
             (SUM(if(BOUND(?referentP), 1, 0)) AS ?referentCount)
           WHERE {
            ?issue :related ?referent
               OPTIONAL { ?referent ?referentP ?referentO }
          } GROUP BY ?issue ?referent
          HAVING (SUM(if(BOUND(?referentP), 1, 0)) = 0)
    } GROUP BY ?issue
  }
}

The query results associate pass/fail/error messages with validation tests for each tested entity:

?issue | ?typeArc | ?stateValue | ?reportedByArc | ?reportedOnArc | ?reportedByArcCount | ?reproducedOnSequence | ?reportedOnArc | ?missingRelatedIssues
<issue7> | "passed" | "passed" | "passed" | "passed" | "expected 1, got 2" | "bad sequence" | "passed" | "<issue3> <issue2>"
<issue4> | "missing" | "invalid" | "passed" | "passed" | "expected 1, got 0" | "passed" | "passed" | "passed"

OWL Conventional Use

The Web Ontology Language offers a fairly complex language for declaring restrictions on the use of predicates on a given type, as well as the equivalence or disjointness of given resources and classes.

OWL - Validation with OWL DL

OWL DL implements a description logic, selected for its theoretical computability. It is mostly used to inform designers when their conception of a class is theoretically unsatisfiable. It is also used to test instance data to determine if elements have mutually incompatible types, both datatypes and OWL classes. The open-world assumption makes it somewhat more tedious to declare valid forms than in, e.g., XML Schema.

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .


:Issue a owl:Class ;
    rdfs:subClassOf
        [ owl:onProperty :state        ; owl:cardinality 1 ] ,
        [ owl:onProperty :reportedBy   ; owl:cardinality 1 ] ,
        [ owl:onProperty :reportedOn   ; owl:cardinality 1 ] ,
        [ owl:onProperty :reproducedBy ; owl:minCardinality 0 ] ,
        [ owl:onProperty :reproducedOn ; owl:minCardinality 0 ] ,
        [ owl:onProperty :related      ; owl:minCardinality 0 ] .

:state        a owl:ObjectProperty ,
                owl:FunctionalProperty ; rdfs:domain :Issue ; rdfs:range :ValidState .
:related      a owl:ObjectProperty     ; rdfs:domain :Issue ; rdfs:range :Issue .
:reportedBy   a owl:ObjectProperty     ; rdfs:domain :Issue ; rdfs:range foaf:Person .
:reportedOn   a owl:DatatypeProperty   ; rdfs:domain :Issue ; rdfs:range xsd:dateTime .
:reproducedBy a owl:ObjectProperty     ; rdfs:domain :Issue ; rdfs:range foaf:Person .
:reproducedOn a owl:DatatypeProperty   ; rdfs:domain :Issue ; rdfs:range xsd:dateTime .
:ValidState owl:oneOf ( :unassigned :assigned ) .

foaf:Person rdfs:subClassOf foaf:Agent ;
    rdfs:subClassOf
        [ owl:onProperty foaf:givenName  ; owl:minCardinality 1 ] ,
        [ owl:onProperty foaf:familyName ; owl:cardinality 1 ] ,
        [ owl:onProperty foaf:phone      ; owl:minCardinality 0 ] ,
        [ owl:onProperty foaf:mbox       ; owl:cardinality 1 ] .

foaf:givenName  a owl:DatatypeProperty ; rdfs:domain foaf:Person ; rdfs:range xsd:string .
foaf:familyName a owl:DatatypeProperty ; rdfs:domain foaf:Person ; rdfs:range xsd:string .
foaf:phone      a owl:DatatypeProperty ; rdfs:domain foaf:Person ; rdfs:range xsd:anyURI .
foaf:mbox       a owl:ObjectProperty   ; rdfs:domain foaf:Person ; rdfs:range rdfs:Resource .


[ a owl:AllDisjointClasses ;
  owl:members ( :Issue foaf:Agent ) ] .

[ a owl:AllDisjointProperties ;
  owl:members ( :state :related
                :reportedBy :reportedOn
                :reproducedOn ) ] .

[ a owl:AllDifferent ;
  owl:members ( <issue3> <issue4>  <issue7>
                <user2> <user6>
                :unassigned :assigned :unsinged ) ] .

The above OWL axioms catch two of the identified validation errors:

  • <issue7> :reportedBy <user6> , <user2> .
    :Issue expects exactly 1 :reportedBy . <user6> and <user2> are different individuals.
  • <issue4> :state :unsinged .
    :state has a range of ( :unassigned :assigned ) . :unsinged , :unassigned and :assigned are different individuals.
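
Both findings depend on the owl:AllDifferent axiom. The sketch below illustrates why, under stated assumptions: it is not how a DL reasoner works internally, and the function name and the representation of the different-individuals set are hypothetical.

```python
def exceeds_max_one(values, all_different):
    """Return True only if two values are asserted to be different individuals.

    Without that assertion an open-world reasoner may conclude that the
    values name the same individual, so no violation is reported.
    """
    known_distinct = {v for v in values if v in all_different}
    return len(known_distinct) >= 2

reporters = ["user6", "user2"]
print(exceeds_max_one(reporters, {"user2", "user6"}))  # distinctness asserted
print(exceeds_max_one(reporters, set()))               # distinctness unknown
```

A unique name assumption amounts to treating every pair of distinct identifiers as if it were listed in such an axiom.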

OWL - Validation with UNA and Closed-World

The majority of RDF data presumes that different identifiers imply different entities. Further, a validation must be performed on a given set of inputs so validation may be considered "closed-world" at that point. Any requirements on information not expected to be complete at that point would be written out of that validation, perhaps to be included in a later validation.

Tools like TrOWL and Stardog offer this sort of reasoning and, applied to the above example, report the resulting constraint violations.

Evren Sirin (Clark & Parsia) did a thorough workup of this example which shows the validation output from Stardog.

Resource Shapes

OSLC Resource Shape provides a vocabulary for describing the properties of typed nodes.

@prefix ex: <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix rs: <http://www.w3.org/2012/12/rdf-val/SOTA-RS#> .
@prefix : <http://open-services.net/ns/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#IssueShape> a :ResourceShape ;
    :property [
        :name "state" ; :propertyDefinition ex:state ;
        :valueType xsd:string ;
        :occurs :Exactly-one ;
    ] ;

    :property [
        :name "reportedBy" ; :propertyDefinition ex:reportedBy ;
        :range foaf:Person ;
        :valueShape <#UserShape> ;
        :occurs :Exactly-one ;
        :representation :Either 
    ] ;

    :property [
        :name "reportedOn" ; :propertyDefinition ex:reportedOn ;
        :valueType xsd:dateTime ;
        :occurs :Exactly-one ;
    ] ;

    :property [
        :name "reproducedBy" ; :propertyDefinition ex:reproducedBy ;
        :range foaf:Person ;
        :valueShape <#UserShape> ;
        :occurs :Zero-or-many ;
        :representation :Either 
    ] ;

    :property [
        :name "reproducedOn" ; :propertyDefinition ex:reproducedOn ;
        :valueType xsd:dateTime ;
        :occurs :Zero-or-many
    ] ;

    :property [
        :name "related" ; :propertyDefinition ex:related ;
        :range ex:Issue ;
        :valueShape <#IssueShape> ;
        :occurs :Zero-or-many ;
        :representation :Either 
    ] ;
.
<#UserShape> a :ResourceShape ;
    :property [
        :name "givenName" ; :propertyDefinition foaf:givenName ;
        :valueType xsd:string ;
        :occurs :One-or-many ;
    ] ;

    :property [
        :name "familyName" ; :propertyDefinition foaf:familyName ;
        :valueType xsd:string ;
        :occurs :Exactly-one ;
    ] ;

    :property [
        :name "phone" ; :propertyDefinition foaf:phone ;
        :range rdfs:Resource ;
        :occurs :Zero-or-many ;
    ] ;

    :property [
        :name "mbox" ; :propertyDefinition foaf:mbox ;
        :range rdfs:Resource ;
        :occurs :Exactly-one ;
    ] ;
.

Note that the range identifies the RDF type of the object of a property while its valueShape locates the Resource Shape. This permits re-use of common vocabularies, e.g. FOAF for users, and even context-sensitive rules, for instance if the user who reproduced an issue had different validation constraints than the user who reported one.
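
As a rough illustration of how the occurs values in a shape might be enforced, here is a minimal sketch. It assumes a node is represented as a dict from property name to a list of values and a shape as a dict from property name to an occurs keyword; the function name and data layout are assumptions, not an OSLC API.

```python
# Occurrence keywords mapped to (minimum, maximum); None means unbounded.
OCCURS = {
    "Exactly-one": (1, 1),
    "Zero-or-one": (0, 1),
    "Zero-or-many": (0, None),
    "One-or-many": (1, None),
}

# Hypothetical in-memory form of <#UserShape>.
USER_SHAPE = {
    "foaf:givenName": "One-or-many",
    "foaf:familyName": "Exactly-one",
    "foaf:phone": "Zero-or-many",
    "foaf:mbox": "Exactly-one",
}

def check_shape(node, shape):
    """Return one message per property whose value count violates the shape."""
    errors = []
    for prop, occurs in shape.items():
        minimum, maximum = OCCURS[occurs]
        count = len(node.get(prop, []))
        if count < minimum or (maximum is not None and count > maximum):
            errors.append(f"{prop}: expected {occurs}, got {count}")
    return errors

user6 = {
    "foaf:givenName": ["Bob"],
    "foaf:phone": ["tel:+.555.222.2222"],
    "foaf:mbox": ["mailto:alice@example.com"],
}
print(check_shape(user6, USER_SHAPE))
```

A fuller checker would also follow valueShape to validate the referenced node against its own shape, which is where context-sensitive rules would come in.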

SPARQL Inferencing Notation (SPIN)

SPIN's spin:constraint property connects a class to a query which validates an instance of that class, e.g. this excerpt from Holger Knublauch's write-up of SPIN validating this use case:

:Issue a owl:Class ;
  spin:constraint [ a sp:Ask ;
      sp:text """# Issue was reproduced before being reported
ASK WHERE {
    ?this <http://www.w3.org/2012/12/rdf-val/SOTA-ex#reproducedOn> ?reproducedOn .
    ?this <http://www.w3.org/2012/12/rdf-val/SOTA-ex#reportedOn> ?reportedOn .
    FILTER (?reproducedOn < ?reportedOn) .
}"""^^xsd:string ;             # returning TRUE signals an error.
    ] ;
  spin:constraint [ a spl:ObjectCountPropertyConstraint ;
      arg:maxCount 1 ;
      arg:property :reportedBy ;
    ] ;
        

The constraint on :reportedBy at the bottom provides an additional cardinality constraint. See spl:Attribute and spl:ObjectCountPropertyConstraint in the supporting library, e.g.

ASK {
  { FILTER (
      ?minCount &&
      spin:objectCount(spin:_this, ?predicate) < ?minCount) }
  UNION
  { FILTER (
      ?maxCount &&
      spin:objectCount(spin:_this, ?predicate) > ?maxCount) }
  UNION
  { FILTER (BOUND(?valueType)) 
    spin:_this ?predicate ?value
    FILTER (!spl:instanceOf(?value, ?valueType)) }
}        
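
The pattern of attaching constraints to a class, with a true result signalling a violation, can be mimicked in a few lines. This is only a sketch under stated assumptions: the graph is a dict from node to a dict of single-valued properties, the dates are illustrative, and none of the names below belong to SPIN.

```python
from datetime import datetime

def reproduced_before_reported(graph, this):
    """Plays the role of the ASK query: True signals a violation."""
    reported = graph[this].get(":reportedOn")
    reproduced = graph[this].get(":reproducedOn")
    if reported is None or reproduced is None:
        return False
    return datetime.fromisoformat(reproduced) < datetime.fromisoformat(reported)

# Constraints are looked up by class, as spin:constraint attaches them to a class.
CONSTRAINTS = {":Issue": [reproduced_before_reported]}

def check_all(graph):
    """Run every constraint of each node's class and collect the violations."""
    violations = []
    for node, props in graph.items():
        for constraint in CONSTRAINTS.get(props.get("rdf:type"), []):
            if constraint(graph, node):
                violations.append((node, constraint.__name__))
    return sorted(violations)

GRAPH = {
    "issue7": {"rdf:type": ":Issue",
               ":reportedOn": "2012-12-31T23:57:00",
               ":reproducedOn": "2012-11-30T23:57:00"},
    "issue4": {":reportedOn": "2012-12-31T23:57:00"},  # no type arc
}
print(check_all(GRAPH))
```

Because constraints are found through the node's class, the untyped issue4 is never checked in this sketch.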

A SPIN engine may then evaluate those constraint checks for one or all instances of those classes via sets of rules like:

# Assign a type arc where there is none.
rdfs:Resource
  spin:rule [
      rdf:type sp:Construct ;
      sp:text """CONSTRUCT {
    ?instance a ?domain .
}
WHERE {
    ?property <http://www.w3.org/2000/01/rdf-schema#domain> ?domain .
    ?instance ?property ?anyValue .
    FILTER NOT EXISTS {
        ?instance a ?anyType .
    } .
}"""^^xsd:string ;
    ] ;
.
        

These can be combined with OWL restrictions interpreted with closed world and unique name assumptions (see the earlier discussion of OWL - Validation with UNA and Closed-World).

:state a owl:ObjectProperty ;
  rdfs:domain :Issue ;
  rdfs:label "state"^^xsd:string ;
  rdfs:range :ValidState .

:ValidState a owl:Class ; rdfs:subClassOf owl:Thing .
:assigned a :ValidState .
:unassigned a :ValidState .
        

Specialized Grammar

It is theoretically possible, though as yet unimplemented, to define RDF patterns as a BNF-like grammar. RelaxNG compact syntax demonstrates the applicability of such a grammar to a non-character-based model. Such a syntax could capture the expressivity of something like Resource Shapes. Here's an example inspired by RelaxNG Compact Syntax and SPARQL.

PREFIX ex: <http://www.w3.org/2012/12/rdf-val/SOTA-ex#>
PREFIX rs: <http://www.w3.org/2012/12/rdf-val/SOTA-RS#>
PREFIX : <http://open-services.net/ns/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

IssueShape (
    ex:state ( xsd:string )
    ex:reportedBy ( foaf:Person @UserShape )
    ex:reportedOn ( xsd:dateTime )
    ex:reproducedBy ( foaf:Person @UserShape )*
    ex:reproducedOn ( xsd:dateTime )*
    ex:related ( ex:Issue @IssueShape )*
)

UserShape (
    foaf:givenName ( xsd:string )+
    foaf:familyName ( xsd:string )
    foaf:phone ( rdfs:Resource )*
    foaf:mbox ( rdfs:Resource )
)
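
Since no implementation of such a grammar exists, the following is purely a sketch of how one property rule of the notation above might be read. The regular expression, the function name and the "?" suffix (zero or one) are assumptions that go beyond the example.

```python
import re

# predicate ( type [@Shape] ) [cardinality suffix]
RULE = re.compile(r"^\s*(\S+)\s*\(\s*(\S+)(?:\s+@(\S+))?\s*\)\s*([*+?]?)\s*$")

# Suffix mapped to (minimum, maximum); None means unbounded.
CARDINALITY = {"": (1, 1), "?": (0, 1), "*": (0, None), "+": (1, None)}

def parse_rule(line):
    """Parse one property rule into a dict, or raise ValueError."""
    match = RULE.match(line)
    if match is None:
        raise ValueError(f"not a property rule: {line!r}")
    predicate, value_type, shape, suffix = match.groups()
    minimum, maximum = CARDINALITY[suffix]
    return {"predicate": predicate, "type": value_type,
            "shape": shape, "min": minimum, "max": maximum}

print(parse_rule("    ex:reproducedBy ( foaf:Person @UserShape )*"))
print(parse_rule("    ex:state ( xsd:string )"))
```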

Possible Features

Here are some ideas for features that people have mentioned so far:

RDF Schema with no Inference

RDFS defines the types expected for the subject or object of a predicate. It would be possible to insist that all nodes in "valid" documents list their types. This approach to validation would report an error every time RDF Schema inference added a new type. This would make some errors moderately visible to the user.
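
A minimal sketch of this idea, assuming triples are (subject, predicate, object) tuples and the schema is a dict from predicate to a (domain, range) pair. Subclass reasoning is deliberately left out, and all names are illustrative.

```python
def inferred_type_errors(triples, schema):
    """Report every (node, type) pair that RDFS inference would have to add."""
    stated = {(s, o) for s, p, o in triples if p == "rdf:type"}
    errors = set()
    for s, p, o in triples:
        domain, range_ = schema.get(p, (None, None))
        if domain is not None and (s, domain) not in stated:
            errors.add((s, domain))
        if range_ is not None and (o, range_) not in stated:
            errors.add((o, range_))
    return sorted(errors)

SCHEMA = {":reportedBy": (":Issue", "foaf:Person")}
TRIPLES = {
    ("issue7", "rdf:type", ":Issue"),
    ("issue7", ":reportedBy", "user6"),
    ("issue7", ":reportedBy", "user2"),
    ("user2", "rdf:type", "foaf:Person"),
    ("user6", "rdf:type", "foaf:Agent"),
}
print(inferred_type_errors(TRIPLES, SCHEMA))
```

Here the range of :reportedBy would add foaf:Person to user6, which surfaces the mistyped user.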

Eyeball

While a schema alone doesn't tell us about missing properties, we can assume the schema to be a complete list of the available terms. Eyeball uses parsed schemas to spell-check the properties and class names in an RDF graph. Eyeball also sanity checks the schema implied by the document for IRI validity and reachability (e.g. no file: URLs), language tag validity, and adherence to prefix conventions (don't use dc: to refer to FOAF).
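
The spell-checking idea can be sketched with Python's standard difflib module. This is not Eyeball's implementation: the function name, the vocabulary list and the misspelled :reportdBy term are assumptions made for illustration.

```python
import difflib

def unknown_terms(used, vocabulary):
    """Map each term missing from the vocabulary to its closest known spelling."""
    return {term: difflib.get_close_matches(term, vocabulary, n=1)
            for term in sorted(used) if term not in vocabulary}

VOCABULARY = [":state", ":reportedBy", ":reportedOn",
              ":reproducedBy", ":reproducedOn", ":related"]
USED = {":state", ":reportedOn", ":reportdBy"}

print(unknown_terms(USED, VOCABULARY))
```

Treating the schema as a closed list of terms is what makes an unknown property an error rather than merely new information.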

Change Log

Raw CVS log:


    $Log: SOTA.html,v $
    Revision 1.27  2014-07-23 08:47:32  eric
    + grounding of Holger's text from 01 Jul 2013

    Revision 1.26  2014-07-01 14:41:52  eric
    ~ typo per mid:898698235.575556.1404224901187.open-xchange@oxweb03.eigbox.net

    Revision 1.25  2014-07-01 10:09:08  eric
    + anchor for Stardog output

    Revision 1.24  2014-04-20 21:37:00  eric
    ~ jose labra gayo said the crow's feet were reversed

    Revision 1.23  2013-09-08 07:55:24  eric
    + Eyeball

    Revision 1.22  2013-06-19 09:53:24  eric
    ...

    Revision 1.21  2013-06-19 09:52:50  eric
    ...

    Revision 1.20  2013-05-24 15:12:30  eric
    ~ simplify the validation DSL

    Revision 1.19  2013-05-17 14:57:28  eric
    ~ mid:20130517144957.GI13487@w3.org

    Revision 1.18  2013-05-16 20:28:22  eric
    ...

    Revision 1.17  2013-05-16 20:26:33  eric
    ~ incorporated Jiao Tao's input on OWL+CWA+UNA -- mid:CAMt86XY6bvVh5b8X9xME89EXJRFGw-dn8r5LytsPr0uXrRXwng@mail.gmail.com

    Revision 1.16  2013-04-30 11:37:00  eric
    ~ fixed colliding id

    Revision 1.15  2013-04-25 12:02:30  eric
    ~ fix anchors

    Revision 1.14  2013-04-07 05:33:20  eric
    + Specialized Grammar example

    Revision 1.13  2013-04-06 23:41:09  eric
    ~ anchor for SPARQL validation results

    Revision 1.12  2013-04-06 23:33:36  eric
    + Resource Shapes example

    Revision 1.11  2013-03-29 19:56:00  eric
    ~ feedback from Tom Baker mid:20130329145600.GA25399@julius

    Revision 1.10  2013-03-26 17:32:02  eric
    ~ clarify that SOTA does not limit scope

    Revision 1.9  2013-03-20 21:35:55  eric
    + BIBFRAME reference

    Revision 1.8  2013-03-20 21:30:10  eric
    + DC Application Profile
    + DC Singapore Framework

    Revision 1.7  2013-03-14 11:58:41  eric
    ~ line up turtle

    Revision 1.6  2013-03-14 11:51:17  eric
    ~ line up turtle

    Revision 1.5  2013-03-12 19:56:11  eric
    + more references

    Revision 1.4  2013-03-12 18:50:56  eric
    + OWL example

    Revision 1.3  2013-03-12 15:36:54  eric
    ~ spruced up SPARQL validation to serve as a template (and to show complexity)

    Revision 1.2  2013-03-12 13:19:45  eric
    + examples

    Revision 1.1  2012-12-11 17:38:41  eric
    CREATED

    Revision 1.6  2012-12-03 18:22:50  eric
    + placeholder for SPARQL ASK

    Revision 1.5  2012-12-03 18:19:39  eric
    ~ moved SOTD to top

    Revision 1.4  2012-12-03 17:53:56  eric
    ...