LDOM Algorithm

From RDF Data Shapes Working Group

This is a first attempt at formalizing how LDOM works internally. For now the description is informal and uses pseudo-code.

Overview

LDOM currently supports three main operations:

  • checkNode(?node): Returns all constraint violations for a given node
  • checkNodeAgainstShape(?node, ?shape): Returns all constraint violations for a given node against a given shape
  • checkGraph(): Returns all constraint violations for the current graph

The implicit arguments to these operations are:

  • A dataset (SPARQL named graphs)
  • The default graph to operate on
  • A ShapeSelector providing ?shapeSelectionPath and ?shapeExtensionPath
  • The included ldom:Contexts (defaults to the default context)
  • The excluded ldom:Contexts (defaults to empty)

(TODO: Clarify the logic that takes the default graph and expands its ldom:includes and ldom:libraries so that all relevant constraint and class declarations are visible to the engine.)

The result of every operation is a new graph containing instances of ldom:ConstraintViolation, possibly with details such as ldom:message, ldom:root and ldom:path, and with rdf:types such as ldom:Error and ldom:Warning.

There is a set called ?constraintProperties consisting of

  • ldom:constraint
  • ldom:property
  • ldom:inverseProperty
  • ldom:argument

SPARQL engines also need to implement the following:

  • The ability to execute ldom:Functions (recursive SPARQL queries); details to be written up
  • Some built-in functions, especially ldom:hasShape, which maps to checkNodeAgainstShape here

checkNode

Arguments:

  •  ?node - the RDF node to check the constraints of

   forEach ?shape matching (?node ?shapeSelectionPath ?shape)
       checkNodeAgainstShape(?node, ?shape) -> add to result graph
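The loop above can be sketched in Python. This is a minimal illustration, not the engine's actual code: it assumes the graph is a plain list of (subject, predicate, object) tuples and that ?shapeSelectionPath is a single predicate (rdf:type) rather than a general path. The helper names objects, check_node and report_shape are made up for this sketch.

```python
# Triples are plain (subject, predicate, object) tuples; a real engine would
# query an RDF store instead. All function names here are illustrative.

def objects(graph, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

def check_node(graph, node, check_node_against_shape,
               shape_selection_path="rdf:type"):
    """Collect the violations of `node` against every shape it selects."""
    results = []
    for shape in objects(graph, node, shape_selection_path):
        # Corresponds to "checkNodeAgainstShape(...) -> add to result graph"
        results.extend(check_node_against_shape(graph, node, shape))
    return results

graph = [
    ("ex:Instance", "rdf:type", "ex:SubClass"),
    ("ex:Instance", "ex:property", 8),
]

# Stand-in checker that reports one dummy entry per visited shape.
def report_shape(graph, node, shape):
    return [(node, shape)]

visited = check_node(graph, "ex:Instance", report_shape)
print(visited)
```

Passing the shape checker as a parameter keeps the sketch self-contained; in a real engine it would be the checkNodeAgainstShape operation described next.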

checkNodeAgainstShape

Arguments:

  •  ?node - the RDF node to check the constraints of
  •  ?shape - the class/shape defining the constraints to check

   forEach ?constraintProperty in ?constraintProperties
       forEach ?s := ?shape and its super-shapes found via ?shapeExtensionPath
           forEach ?constraint := ?constraintProperty at ?s
               checkNodeAgainstConstraint(?node, ?constraint) -> add to result graph
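As a rough Python sketch of the three nested loops, the snippet below gathers the constraints that apply to a shape. It assumes a tuple-based graph, a single-predicate ?shapeExtensionPath (rdfs:subClassOf), and blank-node labels such as _:notBlankConstraint that are invented for the illustration. The function names are hypothetical as well.

```python
CONSTRAINT_PROPERTIES = [
    "ldom:constraint",
    "ldom:property",
    "ldom:inverseProperty",
    "ldom:argument",
]

def objects(graph, subject, predicate):
    """Return all objects of triples matching (subject, predicate, ?o)."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]

def shape_and_super_shapes(graph, shape, extension_path="rdfs:subClassOf"):
    """Return `shape` followed by its super-shapes, each listed once."""
    seen = []
    stack = [shape]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in the extension hierarchy
        seen.append(current)
        stack.extend(objects(graph, current, extension_path))
    return seen

def constraints_for(graph, shape):
    """Collect constraints in the same iteration order as the pseudo-code."""
    found = []
    for constraint_property in CONSTRAINT_PROPERTIES:
        for s in shape_and_super_shapes(graph, shape):
            found.extend(objects(graph, s, constraint_property))
    return found

graph = [
    ("ex:SubClass", "rdfs:subClassOf", "ex:SuperClass"),
    ("ex:SuperClass", "ldom:property", "_:maxInclusiveConstraint"),
    ("ex:SubClass", "ldom:constraint", "_:notBlankConstraint"),
]

print(constraints_for(graph, "ex:SubClass"))
```

Each collected constraint would then be handed to checkNodeAgainstConstraint together with ?node.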

checkNodeAgainstConstraint

Arguments:

  •  ?node - the RDF node to check the constraint of
  •  ?constraint - the LDOM constraint to check
  •  ?includedContexts - (optional) the list of ldom:Contexts to include
  •  ?excludedContexts - (optional) the list of ldom:Contexts to exclude

   TODO: details on how the SPARQL query produces the ldom:ConstraintViolations.
   This is rather straightforward code that involves
   - doing nothing if the ldom:context of ?constraint does not match ?includedContexts/?excludedContexts
   - if ?constraint is a template, using the ldom:sparql from the template
   - otherwise using the ldom:sparql of the ?constraint directly (native SPARQL constraint)
   - pre-binding ?node to the variable ?this
   - for ASK queries: if the result is true, producing a violation, using fields like ldom:message
   - for SELECT queries: producing a violation for each result set row
   - for CONSTRUCT queries: copying all constructed triples into the result graph
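The context check and the handling of the three query forms might look like the following Python sketch. It does not execute SPARQL: the query result is passed in as a boolean (ASK), a list of binding rows (SELECT) or a list of triples (CONSTRUCT). The matching rule in context_matches is an assumption, since the page does not yet define how contexts are compared, and "default-context" is a placeholder string rather than an LDOM term.

```python
DEFAULT_CONTEXT = "default-context"  # placeholder, not an actual LDOM term

def context_matches(constraint_context, included=(DEFAULT_CONTEXT,), excluded=()):
    """Assumed rule: the context must be included and must not be excluded."""
    return constraint_context in included and constraint_context not in excluded

def violations_from_result(query_form, result, message=None):
    """Turn a raw query result into a list of violations (or triples)."""
    if query_form == "ASK":
        # `result` is a boolean; true signals a violation.
        return [{"message": message}] if result else []
    if query_form == "SELECT":
        # `result` is a list of rows (variable bindings); one violation per row.
        return [dict(row) for row in result]
    if query_form == "CONSTRUCT":
        # `result` is a list of triples, copied through unchanged.
        return list(result)
    raise ValueError(f"unsupported query form: {query_form}")

ask = violations_from_result(
    "ASK", True, message="Instances of SubClass must not be blank nodes.")
select = violations_from_result(
    "SELECT", [{"root": "ex:Instance", "path": "ex:property"}])
print(ask, select)
```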

checkGraph

   forEach ?shape that has any constraints (via the ?constraintProperties)
       forEach ?constraint defined at ?shape
           checkConstraintForShape(?shape, ?constraint)
   
   checkGlobalConstraints()

checkConstraintForShape

Arguments:

  •  ?shape - the shape holding the constraints
  •  ?constraint - the ldom:Constraint to check
  •  ?includedContexts - (optional) the list of ldom:Contexts to include
  •  ?excludedContexts - (optional) the list of ldom:Contexts to exclude

   if the ldom:context of ?constraint matches ?includedContexts/?excludedContexts
       forEach ?s that is in the transitive* ?shapeExtensionPath of ?shape
           TODO: Take the SPARQL query (or queries, if it's a template with superclasses)
           and execute each one, injecting a clause that binds ?this to every node
           that reaches the current ?s via ?shapeSelectionPath.
           ASK and SELECT queries are first converted to CONSTRUCT; then the CONSTRUCTs are run.
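One naive way to picture the clause injection is plain string rewriting, sketched below in Python. This is only an illustration under simplifying assumptions: the query is a SELECT or ASK whose first "{" opens the WHERE clause, and ?shapeSelectionPath is rdf:type (written "a"). A real engine would more likely rewrite the parsed query than its text, and the function name inject_this_binding is made up here.

```python
def inject_this_binding(query, shape, shape_selection_path="a"):
    """Insert a triple pattern binding ?this right after the first '{'.

    Assumes a SELECT or ASK query, where the first brace opens the WHERE
    clause. A CONSTRUCT query would need different handling because its
    template comes first.
    """
    head, brace, body = query.partition("{")
    if not brace:
        raise ValueError("query has no group graph pattern")
    return f"{head}{{ ?this {shape_selection_path} {shape} .{body}"

original = "SELECT (?this AS ?root) WHERE { FILTER isBlank(?this) }"
rewritten = inject_this_binding(original, "ex:SubClass")
print(rewritten)
```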

checkGlobalConstraints

   forEach ?constraint := instance of ldom:GlobalConstraint
       See checkConstraintForShape, but without the binding of ?this and
       without depending on instances at all.

Example: checkGraph

Assuming we have the following graph:

   ex:SuperClass
       a rdfs:Class ;
       ldom:property [
           ldom:predicate ex:property ;
           ldom:maxInclusive 10 ;
       ] .
   
   ex:SubClass
       a rdfs:Class ;
       rdfs:subClassOf ex:SuperClass ;
       ldom:constraint [
           ldom:message "Instances of SubClass must not be blank nodes." ;
           ldom:sparql "SELECT (?this AS ?root) WHERE { FILTER isBlank(?this) }" ;
       ] .
   
   ex:Instance
       a ex:SubClass ;
       ex:property 8 .

In this example, ?shapeSelectionPath is rdf:type and ?shapeExtensionPath is rdfs:subClassOf.

The operation to check the whole graph would walk through all classes that have constraints attached to them. For each of those classes (and their subclasses) that have instances, it would (roughly) execute the following two queries. The first is the ldom:sparql query behind the ldom:maxInclusive check; the second is the ldom:sparql of the constraint declared at ex:SubClass. Each matching row of these queries would produce an ldom:Error object:

   # ?maxInclusive is pre-bound to 10
   # ?predicate is pre-bound to ex:property
   SELECT (?this AS ?root) (?predicate AS ?path) ?value ?message
   WHERE {
       ?this a ex:SubClass .
       ?this ?predicate ?value .
       FILTER (?value > ?maxInclusive) .
       BIND (CONCAT("The values of ", ldom:propertyLabel(?predicate, ?this),
           " must be less than or equal to ", ldom:label(?maxInclusive),
           " but found ", ldom:label(?value)) AS ?message) .
   }
   
   SELECT ("Instances of SubClass must not be blank nodes" AS ?message) (?this AS ?root)
   WHERE {
       ?this a ex:SubClass .
       FILTER isBlank(?this) .
   }
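To make the effect of the first query concrete, here is a small Python stand-in that evaluates the same maxInclusive condition over the example data without a SPARQL engine. It is a sketch only: triples are tuples, the message uses the raw predicate and values instead of ldom:propertyLabel and ldom:label, and the function name max_inclusive_violations is invented for the illustration.

```python
def max_inclusive_violations(graph, shape, predicate, max_inclusive):
    """Mimic the maxInclusive query: one violation per value above the limit."""
    violations = []
    # Corresponds to the injected pattern "?this a ex:SubClass"
    instances = [s for (s, p, o) in graph if p == "rdf:type" and o == shape]
    for this in instances:
        for (s, p, value) in graph:
            # Corresponds to "?this ?predicate ?value" plus the FILTER
            if s == this and p == predicate and value > max_inclusive:
                violations.append({
                    "root": this,
                    "path": predicate,
                    "value": value,
                    "message": (f"The values of {predicate} must be less than "
                                f"or equal to {max_inclusive} but found {value}"),
                })
    return violations

graph = [
    ("ex:Instance", "rdf:type", "ex:SubClass"),
    ("ex:Instance", "ex:property", 8),
]

# ex:Instance has the value 8, which does not exceed 10, so nothing is reported.
print(max_inclusive_violations(graph, "ex:SubClass", "ex:property", 10))
```

With the example graph the result is empty; lowering the limit below 8, or adding an instance with a larger value, would yield one entry per offending value.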

Example: checkNode

If someone calls checkNode(ex:Instance), the engine would create basically the same queries, but instead of injecting the first line into each WHERE clause it would pre-bind ?this to ex:Instance:

   # ?this is pre-bound to ex:Instance
   # ?maxInclusive is pre-bound to 10
   # ?predicate is pre-bound to ex:property
   SELECT (?this AS ?root) (?predicate AS ?path) ?value ?message
   WHERE {
       ?this ?predicate ?value .
       FILTER (?value > ?maxInclusive) .
       BIND (CONCAT("The values of ", ldom:propertyLabel(?predicate, ?this),
           " must be less than or equal to ", ldom:label(?maxInclusive),
           " but found ", ldom:label(?value)) AS ?message) .
   }
   
   # ?this is pre-bound to ex:Instance
   SELECT ("Instances of SubClass must not be blank nodes" AS ?message) (?this AS ?root)
   WHERE {
       FILTER isBlank(?this) .
   }