ValidationPseudoCode

From Semantic Web Standards

Obsolete - please see ShEx Semantics


SHEX validates its input by validating the subjects in RDF against the SHAPES.

This document is based on [1],[2],[3].

The matching of an Subject against a Shape is based on the behavior used in a regular expression, with the following differences (Eric)

  1. Shape Expression rules match elements in a unordered set (of triples) while regular expressions match elements in a ordered sequence (of characters). Since no order exists in the object, are all matching triples and associated objects of the subject matched and not only the next character in the sequence.
  2. Arc rules in Shape Expressions discriminate between "not found" and "found with the wrong value", whereas in a regular expression there is only "not found"
NOTE: definition for root element has to be added
NOTE: definition for closed/open shapes has to be added
NOTE: semantic actions has still to be added 

Structure overview

The complete SHEX template contain one or more Shapes.

A shape is a 'And' 'Rule group' A shape can only be referred by a Shape Property whereas a 'Rule Group' can not

Rule groups

Each 'Rule group' contains zero or more rules, which can be either a

  • 'Rule groups' named as a sub group
  • 'Rule arcs' named as a shape property

Each 'Rule group' can have the following cardinality

  • 1, match must be found and satisfied
  • ?, match may be satisfied, but if found and the values mismatch then its a failure (see 'OrRuleGroup' truth table below)
NOTE (Eric): The cardinality of groups is limited to a maximum of one because the associations of repeated groups get lost when represented in a set,
e.g {Bob, 23, Sue, 18} because indistinguishable from {18, 23, Bob, Sue}.) 

Two different types of 'Rule groups' exists

  1. And rule groups, each rule must be satisfied
  2. Or rule groups, only one of rules must be satisfied

See group truth tables below for further detail

Rule ARCs

Five different types of 'Rule ARC' exists

  1. Value Type, the value type of the object must comply with one of the given value type(s)
  2. Enumeration Set, the value must be equal to (one off) the given value(s)
  3. Shape Property, the value must be a RDF subject and the subject should comply with (one of) the given shape(s)
  4. Stem Property, the value must be a IRI and the IRI should comply to the filter given.
  5. Any, any value is accepted
NOTE: For Value Type, I allow for matching against multiple value types. (E.G. the property 'age' may be either a Integer or a String)  

For Shape Property, I allow for matching against one of the shapes. (E.G. the property 'address' may be either a Dutch or American address)

Each 'Rule Arc' can have the following cardinality

  • 1, only 1 occurrence must be found and satisfied
  • ?, only 1 occurrence may be found, but if found and the values mismatch then its a failure (see 'OrRuleGroup' truth table below)
  • +, on least 1 occurrence must be found, but multiple are allowed, however all values must match otherwise its a failure
  • *, any number occurrences may be found, however all values must match otherwise its a failure
  • {N,M}, number of occurrences found must between N(cardinalityMin) and M(cardinalityMax) and all values must match otherwise its a failure

Properties of types used in pseudo code

A RuleGroup has the following properties

  • rules, all rules within a RuleGroup, which can be either a sub group(type = RuleGroup) or a shape property (type = ArcRule)
  • cardinality, which can be either '1' or '?'
  • type, which is either OrGroup or AndGroup
    • truthTable, each type has it own truthTable (see below)

A ArcRule has the following properties

  • cardinality, which can be either '1', '?','+','*','{N,M}'
  • cardinalityMin, min count of occurrences
  • cardinalitymax, max count of occurrences
  • type, which can be either EnumerationSet,ValueType,ValueProperty,StemProperty,AnyType
    • matchValue, perform the match of a value against the ArcRule, true->match and false->no match

A EnumerationSet(subclass of ArcRule) has the following property

  • allowedValues, a list of allowed values

A ValueType(subclass of ArcRule) has the following property

  • allowedTypes, a list of allowed types

A ValueProperty(subclass of ArcRule) has the following property

  • shapes, a list of references to shapes of which one must match

A StemProperty(subclass of ArcRule) has the following property

  • pattern, the pattern to which the object IRI must match

Pseudo code

The pseudo code below matches a Subject against a Rule group(which also includes Shapes).

NOTE: that the pseudo code contain 2 types of recursion. One for the sub groups and one for the Shape Property.

The result of matchRuleGroup and matchArc can be either

  • NONE = 0, no result found while expected
  • OPEN = 1, no result found however arc/rule group is optional
  • FAIL = 2, result found but one or more values are mismatching
  • PASS = 3, result found and all values are matching
//Peform the matching of all the rules in the group, the 'truthTable' table is used to combine them (see truth tables below)
matchRuleGroup = function(subject,ruleGroup)
{
  res = null;
  for(item in ruleGroup.rules)
  {
    if(item isOfType RuleGroup) {
      //recursively call matchRuleGroup for each sub group
      //NOTE first type of recursion (subgroup recursion)
      lastRes = matchRuleGroup(subject,item){N,M}
    }
    else if(item isOfType Arc) {
      lastRes = matchArc(subject,item)
    }
    if(res == null) {
      res = lastRes;
    }
    else {
      //use the appropriate evaluation function (OrGroup or AndGroup)
      res = evaluate(res,lastRes,ruleGroup.type.truthTable);
    }
  }
  if(res == NONE && ruleGroup.cardinality == '?')
    return OPEN;
  return res;   
} 

evaluate = function(l,r,truthTable)
{
  //Combine subsequent result with the previous one using the truth table (see below)
  return truthTable[l][r];
}

matchArc = function(subject,arc)
{
  //retrieve all values for a given predicate in the subject
  values = getValues(subject,arc.predicate) 
  //check if occurrence is correct
  if(values.length == 0)
    return arc.cardinality == '?' || arc.cardinality == '*' ? OPEN : NONE
  if(values.length > 1 && (arc.cardinality == '1' || arc.cardinality == '+')
    return FAIL; //failed on multiplicity
  if(arc.cardinality == '{N,M}' && (values.length < arc.cardinalityMin || values.length > arc.cardinalityMax)
    return FAIL; //NOTE when length == 0 NONE was returned
  //each value should match other wise its a fail
  for(value in values)
  {
    //use the matchValue function of the arc type
    if(arc.type.matchValue(value,arc) == false)
      return FAIL
  }
  return PASS  
}

getValues = function(subject,predicate) { 
  ...
  //Return all values for the given predicate
  >SELECT ?values
  >WHERE
  >{
  >  $subject $predicate ?values .
  >}
  return values
}

The different ArcType matchValue functions

EnumerationSet.matchValue = function(value,arc)
{
  //check if value in contained in one of the 'allowed values' given in the 'arc' rule
  return arc.allowedValues.contains(value)
}

ValueType.matchValue = function(value,arc)
{
  //get data type of the value
  type = typeOf(value)
  //check if the data type in the list of allowed data types
  return arc.allowedTypes.contains(type)
}

ValueProperty.matchValue = function(value,arc)
{
  //must be a subject so its must be an IRI
  if(!isIRI(value))
    return false
  //must match against one the shapes given in the Arc rule (most commonly only one given)
  for(shape in arc.shapes)
  { 
    //recursively call the rule matching with the new subject and shape as rule group
    //NOTE the difference with the subgroup recursion!
    res = matchRuleGroup(value,shape)
    if(res == PASS) // || res == OPEN 
      return true;
  }
  return false;
}

StemProperty.matchValue = function(value,arc)
{
  //details to be defined further
  if(!isIRI(value))
    return false
  iri = IRI(value)
  ruturn arc.pattern.match(iri)
}

AnyType.matchValue = function(value,arc)
{
  //details to be defined further
  //any value allowed
  return true
}

RuleGroup types truth tables

Each RuleGroup type(AndRuleGroup and OrRuleGroup) have their own truth tables.

AndRuleGroup.truthTable = ...
Seq NONE OPEN FAIL PASS
NONE NONE NONE FAIL FAIL
OPEN NONE OPEN FAIL PASS
FAIL FAIL FAIL FAIL FAIL
PASS FAIL PASS FAIL PASS

(taken from [4])

OrRuleGroup.truthTable = ...
Seq NONE OPEN FAIL PASS
NONE NONE OPEN FAIL PASS
OPEN OPEN OPEN FAIL PASS
FAIL FAIL FAIL FAIL PASS*
PASS PASS PASS PASS* PASS
*(changed after testing) Note in a or rule group one of the elements must match. If one passes no matter what the other results do is will pass,
hoever it something is optional and gives a fail it will make other NONE and OPEN elemetns fail. In an and rule group however an optional element
that fails will cause an overall failure.  


DISCUSSION: Is that the required behavior

Comparison to REGEX

The pseudo code can be simplified to compare with regular expressions. Comparing to normal regular expression there are two differences

  1. No order exist in SHEX so any character at any location is matches. So pattern "ab+c*" will match for example "abc", "cba", "ccbab", but not "abababac".
  2. SHEX can find a occurrence of the predicate but the values can mismatch. This is can be mimicked by putting a capitol letter. So pattern "ab+c*" will not match any of the sequences "Abc", "cBa", "CcBab". (The capital letter mimic a failed value match -> matchValue return false)

Simplifying the pseudo above will give the following pseudo code.

NOTE: The second recursion via the Shape property is lost.

The ArcRule has now the following properties

  • cardinality, which can be either '1', '?','+','*','{N,M}'
  • cardinalityMin, min count of occurrences
  • cardinalitymax, max count of occurrences
  • letter, letter to be matched
//entry function
matchAll = function(string,shape)
{
  res = matchRuleGroup(string,shape)
  if(res != PASS)
    res = FAIL
  return res 
}

//Peform the matching of all the rules in the group, the 'truthTable' table is used to combine them (see truth tables below)
matchRuleGroup = function(string,ruleGroup)
{
  res = null;
  for(item in ruleGroup.rules)
  {
    if(item isOfType RuleGroup) {
      //recursively call matchRuleGroup for each sub group
      //NOTE only first type of recursion exist (subgroup recursion)
      lastRes = matchRuleGroup(string,item){N,M}
    }
    else if(item isOfType Arc) {
      lastRes = matchArc(string,item)
    }
    if(res == null) {
      res = lastRes;
    }
    else {
      //use the appropriate evaluation function (OrGroup or AndGroup)
      res = evaluate(res,lastRes,ruleGroup.type.truthTable);
    }
  }
  if(res == NONE && ruleGroup.cardinality == '?')
    return OPEN;
  return res;   
}

evaluate = function(l,r,truthTable)
{
  //Combine subsequent result with the previous one using the truth table (see below)
  return truthTable[l][r];
}

matchArc = function(string,arc)
{
  //retrieve all values for a given predicate in the subject
  values = getValues(string,arc.letter) 
  //check if occurrence is correct
  if(values.length == 0)
    return arc.cardinality == '?' || arc.cardinality == '*' ? OPEN : NONE
  if(values.length > 1 && (arc.cardinality == '1' || arc.cardinality == '+')
    return FAIL; //failed on multiplicity
  if(arc.cardinality == '{N,M}' && (values.length < arc.cardinalityMin || values.length > arc.cardinalityMax)
    return FAIL; //NOTE when length == 0 NONE was returned
  //each value should match other wise its a fail
  for(value in values)
  {
    if(matchValue(value) == false)
      return FAIL
  }
  return PASS  
}

getValues = function(string,letter) { 
  toRet = []
  //lowerCase: mimicked capital letter should match the pattern letter
  //find all matching letter in the complete string
  for(char in lowerCase(string))
  {
    if(char == letter)
      toRet.push(char)
  }
  return toRet;
}

matchValue = function(value)
{
  //mimic a failed match when letter is a Capitol letter
  if(isCapitol(value))
    return false
  return true;
}

AndRuleGroup.truthTable = ... //same as in full version
OrRuleGroup.truthTable = ... //same as in full version

TODO update the js code in [5]

NOTE: Using this pseudo code definition then it is allowed to define rule twice. When defining a rule twice, the secondly defined rule can make the shape only more stringent.
For example when defining a pattern like "a+ba", then the second 'a' forces the cardinality of 'a' to be one. So pattern "a+ba" will match "ab" but not "aba" or "aab".