Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
Most data representation languages used in conventional settings offer some sort of input validation, ranging from parsing grammars for domain-specific languages to XML Schema or RelaxNG for XML structures. The open-world constraints placed on RDF languages make validation more difficult and less complete than in other data formats. A variety of approaches exist to partially address this, and further development of validation tools and protocols could greatly enhance the uptake of RDF.
This document is intended to provide ideas and inspiration for the W3C Workshop on RDF Validation. The use cases, techniques and technologies listed here do not constrain the Workshop.
This is a draft by W3C staff members. This document is not endorsed by the W3C or its member companies.
Like XML, RDF has schema languages to describe the structure of RDF instance data. Unlike XML Schema, RDF Schema is generally interpreted as supplementing rather than validating RDF data.
Taking as a use case an issue tracking database, we have interrelated issues reported by people. A simple class model uses one common vocabulary (FOAF) and one domain-specific vocabulary:
Sample instance data in Turtle representation (with errors) will help illustrate our validation requirements:
@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<issue7> a :Issue , :SecurityIssue ;
    :state :unassigned ;
    :reportedBy <user6> , <user2> ; # only one reportedBy permitted
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime ;
    :reproducedBy <user2>, <user1> ;
    :reproducedOn "2012-11-31T23:57:00"^^xsd:dateTime ; # reproduced before being reported
    :related <issue4>, <issue3>, <issue2> . # referenced issues not included

<issue4> # a ??? - missing type arc
    :state :unsinged ; # misspelled term in value set.
    # :reportedBy ??? - missing required property
    :reportedOn "2012-12-31T23:57:00"^^xsd:dateTime .

<user2> a foaf:Person ;
    foaf:givenName "Alice" ;
    foaf:familyName "Smith" ;
    foaf:phone <tel:+1.555.222.2222> ;
    foaf:mbox <mailto:alice@example.com> .

<user6> a foaf:Agent ; # should be foaf:Person
    foaf:givenName "Bob" ;
    # foaf:familyName "???" - missing required property
    foaf:phone <tel:+.555.222.2222> ; # malformed tel: URL
    foaf:mbox <mailto:alice@example.com> .
The above errors include:
In the open world, there could always be more information supplying properties or referents. With languages like OWL, it's even possible that a value could be asserted to be equivalent to one in a value set. Many validation use cases require closing the world and reporting errors over the information provided to the validator.
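As a rough illustration of closing the world, the sketch below treats a small set of triples as the complete input and reports the :related targets that never occur as a subject. The tuple representation, the shortened node names and the function name missing_referents are assumptions made for this example; they are not part of any RDF library.

```python
# Triples as (subject, predicate, object) tuples: a hypothetical in-memory
# stand-in for an RDF graph, using short names instead of full IRIs.
EX = "http://www.w3.org/2012/12/rdf-val/SOTA-ex#"

TRIPLES = {
    ("issue7", EX + "related", "issue4"),
    ("issue7", EX + "related", "issue3"),
    ("issue7", EX + "related", "issue2"),
    ("issue4", EX + "state", EX + "unsinged"),
}


def missing_referents(triples, predicate):
    """Return objects of `predicate` that never appear as a subject.

    Closed-world reading: a node with no outgoing arcs in the supplied
    triples is reported as missing, even though an open-world reader
    would allow that more data exists elsewhere.
    """
    subjects = {s for s, _, _ in triples}
    return sorted({o for _, p, o in triples if p == predicate and o not in subjects})


if __name__ == "__main__":
    print(missing_referents(TRIPLES, EX + "related"))
```

The check only makes sense relative to the triples handed to it, which is the point of the paragraph above: the validator reports on the information provided, not on everything that might be true.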
Different approaches to validation will result in different value in terms of expressivity, simplicity and predictability. Below are various ways to represent and enforce the "valid" schema.
SPARQL has been used to validate data, e.g. to test the results of parsing RDFa (see RDFa Test Harness). This SPARQL query produces a table showing the validation results testing a representative sample of the identified validation errors:
PREFIX : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?issue
  (if(BOUND(?t), "passed", "missing") AS ?typeArc)
  (if(BOUND(?state) && (?state=:unassigned || ?state=:assigned), "passed", "invalid") AS ?stateValue)
  (if(BOUND(?reportedBy), "passed", "missing") AS ?reportedByArc)
  (if(BOUND(?reportedOn), "passed", "missing") AS ?reportedOnArc)
  (if(!BOUND(?reportedByCount), "expected 1, got 0",
      if(?reportedByCount=1, "passed", CONCAT("expected 1, got ", STR(?reportedByCount)))) AS ?reportedByArcCount)
  (if(!BOUND(?reproducedOn) || ?reproducedOn > ?reportedOn, "passed", "bad sequence") AS ?reproducedOnSequence)
  (if(BOUND(?missingRelatedIssuesStr), ?missingRelatedIssuesStr, "passed") AS ?missingRelatedIssues)
WHERE {
  # Get all viable :Issues by use of related predicates.
  { SELECT DISTINCT ?issue WHERE {
      { ?issue a :Issue }
      UNION
      { ?issue :reportedBy|:reportedOn|:reproducedBy|:reproducedOn|:related ?rprt }
  } }

  # Test for a type arc and state.
  OPTIONAL { ?issue a ?t FILTER (?t = :Issue) }
  OPTIONAL { ?issue :state ?state }

  # Must have 1 reportedBy.
  OPTIONAL {
    SELECT ?issue (SAMPLE(?reportedBy1) AS ?reportedBy) (COUNT(?reportedBy1) AS ?reportedByCount)
    WHERE {
      OPTIONAL { ?issue :reportedBy ?reportedBy1 }
    } GROUP BY ?issue
  }
  OPTIONAL { ?issue :reportedOn ?reportedOn }
  OPTIONAL { ?issue :reproducedBy ?reproducedBy }
  OPTIONAL { ?issue :reproducedOn ?reproducedOn }

  # All :related issues must be known entities.
  OPTIONAL {
    SELECT ?issue (GROUP_CONCAT(CONCAT("<", STR(?referent), ">")) AS ?missingRelatedIssuesStr) {
      # List of missing issues related to ?issue.
      SELECT ?issue ?referent (SUM(if(BOUND(?referentP), 1, 0)) AS ?referentCount)
      WHERE {
        ?issue :related ?referent
        OPTIONAL { ?referent ?referentP ?referentO }
      } GROUP BY ?issue ?referent
      HAVING (SUM(if(BOUND(?referentP), 1, 0)) = 0)
    } GROUP BY ?issue
  }
}
The query results associate pass/fail/error messages with validation tests for each tested entity:
?issue | ?typeArc | ?stateValue | ?reportedByArc | ?reportedOnArc | ?reportedByArcCount | ?reproducedOnSequence | ?missingRelatedIssues |
---|---|---|---|---|---|---|---|
<issue7> | "passed" | "passed" | "passed" | "passed" | "expected 1, got 2" | "bad sequence" | "<issue3> <issue2>" |
<issue4> | "missing" | "invalid" | "passed" | "passed" | "expected 1, got 0" | "passed" | "passed" |
The Web Ontology Language offers a fairly complex language for declaring restrictions on the use of predicates on a given type, as well as the equivalence or disjointness of given resources and classes.
OWL DL implements a description logic, selected for its theoretical computability. It is mostly used to inform designers when their conception of a class is theoretically unsatisfiable. It is also used to test instance data to determine whether elements have mutually incompatible types, both datatypes and OWL classes. The open-world assumption makes it somewhat more tedious to declare valid forms than in e.g. XML Schema.
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Issue a owl:Class ;
    rdfs:subClassOf
        [ owl:onProperty :state ; owl:cardinality 1 ] ,
        [ owl:onProperty :reportedBy ; owl:cardinality 1 ] ,
        [ owl:onProperty :reportedOn ; owl:cardinality 1 ] ,
        [ owl:onProperty :reproducedBy ; owl:minCardinality 0 ] ,
        [ owl:onProperty :reproducedOn ; owl:minCardinality 0 ] ,
        [ owl:onProperty :related ; owl:minCardinality 0 ] .

:state a owl:ObjectProperty , owl:FunctionalProperty ;
    rdfs:domain :Issue ; rdfs:range :ValidState .
:related a owl:ObjectProperty ;
    rdfs:domain :Issue ; rdfs:range :Issue .
:reportedBy a owl:ObjectProperty ;
    rdfs:domain :Issue ; rdfs:range foaf:Person .
:reportedOn a owl:DatatypeProperty ;
    rdfs:domain :Issue ; rdfs:range xsd:dateTime .
:reproducedBy a owl:ObjectProperty ;
    rdfs:domain :Issue ; rdfs:range foaf:Person .
:reproducedOn a owl:DatatypeProperty ;
    rdfs:domain :Issue ; rdfs:range xsd:dateTime .

:ValidState owl:oneOf ( :unassigned :assigned ) .

foaf:Person rdfs:subClassOf foaf:Agent ;
    rdfs:subClassOf
        [ owl:onProperty foaf:givenName ; owl:minCardinality 1 ] ,
        [ owl:onProperty foaf:familyName ; owl:cardinality 1 ] ,
        [ owl:onProperty foaf:phone ; owl:minCardinality 0 ] ,
        [ owl:onProperty foaf:mbox ; owl:cardinality 1 ] .

foaf:givenName a owl:DatatypeProperty ;
    rdfs:domain foaf:Person ; rdfs:range xsd:string .
foaf:familyName a owl:DatatypeProperty ;
    rdfs:domain foaf:Person ; rdfs:range xsd:string .
foaf:phone a owl:DatatypeProperty ;
    rdfs:domain foaf:Person ; rdfs:range xsd:anyURI .
foaf:mbox a owl:ObjectProperty ;
    rdfs:domain foaf:Person ; rdfs:range rdfs:Resource .

[ a owl:AllDisjointClasses ;
  owl:members ( :Issue foaf:Agent ) ] .
[ a owl:AllDisjointProperties ;
  owl:members ( :state :related :reportedBy :reportedOn :reproducedOn ) ] .
[ a owl:AllDifferent ;
  owl:members ( <issue3> <issue4> <issue7> <user2> <user6> :unassigned :assigned :unsinged ) ] .
The above OWL axioms catch two of the identified validation errors:
The majority of RDF data presumes that different identifiers imply different entities. Further, a validation must be performed on a given set of inputs so validation may be considered "closed-world" at that point. Any requirements on information not expected to be complete at that point would be written out of that validation, perhaps to be included in a later validation.
Tools like TrOWL and Stardog offer this sort of reasoning. Applied to the above example, they'd find errors like:
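- cardinality violations (e.g. two values for :reportedBy)
- missing referents (e.g. :issue2)
- values outside the value set (e.g. :unsinged where :unassigned was intended)
- missing properties (e.g. :reportedBy, missing foaf:familyName) and type arcs (e.g. foaf:Agent)

Evren Sirin (Clark & Parsia) did a thorough workup of this example which shows the validation output from Stardog.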
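The unique-name and closed-world reading described above can be sketched in a few lines: distinct identifiers count as distinct values, and a property absent from the input counts as zero occurrences. This is only an illustration under those assumptions; the tuple-based triples and the function name cardinality_violations are hypothetical, and this is not how TrOWL or Stardog are implemented.

```python
from collections import defaultdict


def cardinality_violations(triples, predicate, subjects, min_count=1, max_count=1):
    """Map each offending subject to the number of distinct values found.

    Unique name assumption: two different identifiers are two values.
    Closed world: a subject with no matching triple has zero values.
    """
    values = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            values[s].add(o)
    report = {}
    for s in subjects:
        count = len(values[s])
        if not (min_count <= count <= max_count):
            report[s] = count
    return report


TRIPLES = [
    ("issue7", ":reportedBy", "user6"),
    ("issue7", ":reportedBy", "user2"),
    ("issue4", ":state", ":unsinged"),
]

if __name__ == "__main__":
    for subject, count in cardinality_violations(
        TRIPLES, ":reportedBy", ["issue7", "issue4"]
    ).items():
        print(f"{subject}: expected 1, got {count}")
```

Under the open-world reading neither result would be an error: user6 and user2 could denote the same person, and issue4 could have a reporter recorded elsewhere.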
OSLC Resource Shape provides a vocabulary for describing the properties of typed nodes.
@prefix ex: <http://www.w3.org/2012/12/rdf-val/SOTA-ex#> .
@prefix rs: <http://www.w3.org/2012/12/rdf-val/SOTA-RS#> .
@prefix : <http://open-services.net/ns/core#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

<#IssueShape> a :ResourceShape ;
    :property [ :name "state" ;
                :propertyDefinition ex:state ;
                :valueType xsd:string ;
                :occurs :Exactly-one ;
              ] ;
    :property [ :name "reportedBy" ;
                :propertyDefinition ex:reportedBy ;
                :range foaf:Person ;
                :valueShape <#UserShape> ;
                :occurs :Exactly-one ;
                :representation :Either
              ] ;
    :property [ :name "reportedOn" ;
                :propertyDefinition ex:reportedOn ;
                :valueType xsd:dateTime ;
                :occurs :Exactly-one ;
              ] ;
    :property [ :name "reproducedBy" ;
                :propertyDefinition ex:reproducedBy ;
                :range foaf:Person ;
                :valueShape <#UserShape> ;
                :occurs :Zero-or-many ;
                :representation :Either
              ] ;
    :property [ :name "reproducedOn" ;
                :propertyDefinition ex:reproducedOn ;
                :valueType xsd:dateTime ;
                :occurs :Zero-or-many
              ] ;
    :property [ :name "related" ;
                :propertyDefinition ex:related ;
                :range ex:Issue ;
                :valueShape <#IssueShape> ;
                :occurs :Zero-or-many ;
                :representation :Either
              ] ;
    .

<#UserShape> a :ResourceShape ;
    :property [ :name "givenName" ;
                :propertyDefinition foaf:givenName ;
                :valueType xsd:string ;
                :occurs :One-or-many ;
              ] ;
    :property [ :name "familyName" ;
                :propertyDefinition foaf:familyName ;
                :valueType xsd:string ;
                :occurs :Exactly-one ;
              ] ;
    :property [ :name "phone" ;
                :propertyDefinition foaf:phone ;
                :range rdfs:Resource ;
                :occurs :Zero-or-many ;
              ] ;
    :property [ :name "mbox" ;
                :propertyDefinition foaf:mbox ;
                :range rdfs:Resource ;
                :occurs :Exactly-one ;
              ] ;
    .
Note that the :range identifies the RDF type of the object of a property, while its :valueShape locates the Resource Shape.
This permits re-use of common vocabularies, e.g. FOAF for users, and even context-sensitive rules, for instance if the user who reproduced an issue had different validation constraints than the user who reported one.
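To make the role of the occurs values concrete, here is a minimal sketch of how a checker might interpret them. The dictionary encoding of the user shape, the node representation and the function name check_shape are assumptions for illustration only; they do not reproduce any OSLC implementation, and value types and nested shapes are ignored.

```python
# Occurrence terms from the shape above, mapped to (minimum, maximum);
# None means "no upper bound".
OCCURS = {
    "Exactly-one": (1, 1),
    "Zero-or-one": (0, 1),
    "Zero-or-many": (0, None),
    "One-or-many": (1, None),
}

# Hypothetical plain-Python rendering of the user shape: property -> occurs.
USER_SHAPE = {
    "foaf:givenName": "One-or-many",
    "foaf:familyName": "Exactly-one",
    "foaf:phone": "Zero-or-many",
    "foaf:mbox": "Exactly-one",
}


def check_shape(node, shape):
    """Return one message per property whose value count breaks `occurs`.

    `node` maps property names to lists of values.
    """
    errors = []
    for prop, occurs in shape.items():
        low, high = OCCURS[occurs]
        count = len(node.get(prop, []))
        if count < low or (high is not None and count > high):
            errors.append(f"{prop}: expected {occurs}, got {count}")
    return errors


USER6 = {
    "foaf:givenName": ["Bob"],
    "foaf:phone": ["tel:+.555.222.2222"],
    "foaf:mbox": ["mailto:alice@example.com"],
}

if __name__ == "__main__":
    print(check_shape(USER6, USER_SHAPE))
```

Because the shape is passed in as an argument, the same node could be checked against a different shape depending on which property led to it, which is the context sensitivity mentioned above.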
SPIN's spin:constraint property connects a class to a query which validates an instance of that class, e.g. this excerpt from Holger Knublauch's write-up of SPIN validating this use case:
:Issue a owl:Class ;
    spin:constraint [ a sp:Ask ;
        sp:text """# Issue was reproduced before being reported
            ASK WHERE {
                ?this <http://www.w3.org/2012/12/rdf-val/SOTA-ex#reproducedOn> ?reproducedOn .
                ?this <http://www.w3.org/2012/12/rdf-val/SOTA-ex#reportedOn> ?reportedOn .
                FILTER (?reproducedOn < ?reportedOn) .
            }"""^^xsd:string ; # returning TRUE signals an error.
    ] ;
    spin:constraint [ a spl:ObjectCountPropertyConstraint ;
        arg:maxCount 1 ;
        arg:property :reportedBy ;
    ] ;
The constraint on :reportedBy at the bottom provides an additional cardinality constraint. See spl:Attribute and spl:ObjectCountPropertyConstraint in the supporting library, e.g.
ASK {
    {
        FILTER ( ?minCount && spin:objectCount(spin:_this, ?predicate) < ?minCount)
    } UNION {
        FILTER ( ?maxCount && spin:objectCount(spin:_this, ?predicate) > ?maxCount)
    } UNION {
        FILTER (BOUND(?valueType))
        spin:_this ?predicate ?value
        FILTER (!spl:instanceOf(?value, ?valueType)) # a value of the wrong type is a violation
    }
}
A SPIN engine may then evaluate those constraint checks for one or all instances of those classes via sets of rules like:
# Assign a type arc where there is none.
rdfs:Resource
    spin:rule [
        rdf:type sp:Construct ;
        sp:text """CONSTRUCT {
                ?instance a ?domain .
            }
            WHERE {
                ?property <http://www.w3.org/2000/01/rdf-schema#domain> ?domain .
                ?instance ?property ?anyValue .
                FILTER NOT EXISTS {
                    ?instance a ?anyType .
                } .
            }"""^^xsd:string ;
    ] ;
.
These can be combined with OWL restrictions interpreted with closed world and unique name assumptions (see the earlier discussion of OWL - Validation with UNA and Closed-World).
:state a owl:ObjectProperty ;
    rdfs:domain :Issue ;
    rdfs:label "state"^^xsd:string ;
    rdfs:range :ValidState .

:ValidState a owl:Class ;
    rdfs:subClassOf owl:Thing .

:assigned a :ValidState .
:unassigned a :ValidState .
It is theoretically possible, though yet unimplemented, to define RDF patterns as a BNF-like grammar. RelaxNG compact syntax demonstrates the applicability of such a grammar to a non-character-based model. Such a syntax could capture the expressivity of something like Resource Shapes. Here's an example inspired by RelaxNG Compact Syntax and SPARQL.
PREFIX ex: <http://www.w3.org/2012/12/rdf-val/SOTA-ex#>
PREFIX rs: <http://www.w3.org/2012/12/rdf-val/SOTA-RS#>
PREFIX : <http://open-services.net/ns/core#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

IssueShape (
    ex:state ( xsd:string )
    ex:reportedBy ( foaf:Person @UserShape )
    ex:reportedOn ( xsd:dateTime )
    ex:reproducedBy ( foaf:Person @UserShape )*
    ex:reproducedOn ( xsd:dateTime )*
    ex:related ( ex:Issue @IssueShape )*
)

UserShape (
    foaf:givenName ( xsd:string )+
    foaf:familyName ( xsd:string )
    foaf:phone ( rdfs:Resource )*
    foaf:mbox ( rdfs:Resource )
)
Here are some ideas for features that people have mentioned so far:
RDFS defines the types expected for the subject or object of a predicate. It would be possible to insist that all nodes in "valid" documents list their types. This approach to validation would report an error every time RDF Schema inference added a new type, which would make some errors moderately visible to the user.
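A minimal sketch of that idea, restricted to rdfs:domain: every type that domain inference would add, and that the data does not already state, is reported as an error. The tuple-based triples, the domains table and the function name undeclared_types are assumptions for this example; subclass reasoning and rdfs:range are deliberately left out.

```python
RDF_TYPE = "rdf:type"


def undeclared_types(triples, domains):
    """Return (node, type) pairs that rdfs:domain inference would add.

    `domains` maps a predicate to its rdfs:domain. A pair is reported
    when a node uses the predicate without declaring that type itself.
    """
    declared = {(s, o) for s, p, o in triples if p == RDF_TYPE}
    errors = set()
    for s, p, _ in triples:
        domain = domains.get(p)
        if domain is not None and (s, domain) not in declared:
            errors.add((s, domain))
    return sorted(errors)


DOMAINS = {":state": ":Issue", ":reportedOn": ":Issue"}

TRIPLES = [
    ("issue7", RDF_TYPE, ":Issue"),
    ("issue7", ":state", ":unassigned"),
    ("issue4", ":state", ":unsinged"),
    ("issue4", ":reportedOn", "2012-12-31T23:57:00"),
]

if __name__ == "__main__":
    print(undeclared_types(TRIPLES, DOMAINS))
```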
While schema-alone doesn't tell us about missing properties, we can assume the schema to be a complete list of the available terms. Eyeball uses parsed schemas to spell-check the properties and class names in an RDF graph. Eyeball also sanity checks the schema implied by the document for IRI validity and reachability (e.g. no file: URLs), language tag validity, and adherence to prefix conventions (don't use dc: to refer to FOAF).
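The spell-checking idea can be approximated as follows: any term not found in the schema's vocabulary is flagged, together with the closest known term if one is similar enough. This sketch uses Python's difflib for the similarity test, which is a stand-in chosen for illustration and not a description of how Eyeball works; the vocabulary list and the function name unknown_terms are likewise assumptions.

```python
import difflib

# Hypothetical vocabulary, as if extracted from a parsed schema.
KNOWN_TERMS = [
    ":state", ":reportedBy", ":reportedOn",
    ":reproducedBy", ":reproducedOn", ":related",
]


def unknown_terms(used, known, cutoff=0.8):
    """Map each term absent from `known` to a list with its closest match.

    The list is empty when no known term is at least `cutoff` similar.
    """
    report = {}
    for term in sorted(set(used) - set(known)):
        report[term] = difflib.get_close_matches(term, known, n=1, cutoff=cutoff)
    return report


if __name__ == "__main__":
    used = [":state", ":reporetedBy", ":severity"]
    for term, guess in unknown_terms(used, KNOWN_TERMS).items():
        hint = f" (did you mean {guess[0]}?)" if guess else ""
        print(f"unknown term {term}{hint}")
```

Treating the schema as a complete list of terms is itself a closed-world assumption, so a term flagged this way may simply come from a vocabulary the checker was not given.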