Shape Expressions Evaluation Logic

Abstract

Shape Expressions can be represented as regular expressions. The semantics of Shape Expression connectives Xor, Seq and Opt were chosen to make Shape Expressions behave identically to regular expressions. An additional function, Evl, reduces the possible results of evaluation to a simple pass or fail.

Regular Expressions and Text Sequences

Shape Expressions follws regular expression behavior with two notable differences:

  1. Shape Expression rules match elements in a set (of triples) while regular expressions match elements in a sequence (of characters).
  2. Arc rules in Shape Expressions discriminate between "not found" and "found with the wrong value".

One can map a Shape Expression and a data graph to a regular expression and a sequence of characters. This regular expression language is a subset of conventional regular expressions:

[1] RegExpr  ::= sequence disjoint*
[2] disjoint ::= "|" sequence
[3] sequence ::= GrpOrChr GrpOrChr*
[4] GrpOrChr ::= group | char count?
[5] group    ::= "(" RegExpr ")" "?"?
[6] count    ::= [?+*]
[7] char     ::= [^\\|*+?()] | "\\" [\\|*+?()]

EvaluationLogic-regex.PEG.js

(Production [5] limites the cardinality of groups to a maximum of one because the associations of repeated groups get lost when represented in a set, e.g {Bob, 23, Sue, 18} because indistinguishable from {18, 23, Bob, Sue}.)

Shape Expression Arc rules are assigned a novel lower-case letter which is added to the regular expression. For each triple matching that arc rule, that character is added to the sequence of characters. For each triple matching that ArcRule's NameClass but not the ValueClass, the upper-case form of that character is added to the sequence. For example, the ArcRule <http://x.example/p1> (1 2) could be assigned the latter 'a'. Triples

<http://x.example/s1> <http://x.example/p1> 1 . # a
<http://x.example/s1> <http://x.example/p1> 3 . # A
<http://x.example/s1> <http://x.example/p2> 2 . # not covered by the ArcRule

in the data group would be represented as the sequence 'aA'.

The shape expressions are traversed "depth-first" from the start rule. Each sequence and disjunction is explored in lexical order, ArcRules with ReferenceValues explore the ArcRule first, then the referenced rule. Below is an example of letters assigned to Shape Expression rules producing the regular expression 'abc*d+':

start = <Shape1>
<Shape1> {
    <p1> (1 2) ,      # a
    <p2> @<Shape2> ,  # b
    <p3> xsd:string + # d
}
<Shape2> {
    <p4> xsd:integer * # c
}

A data graph

<X> <p3> "Bob", "Sue" . # dd
<X> <p1> 1 .            # a
<X> <p2> <Y> .          # b
<Y> <p4> 8, 9 .         # cc

would be represented as the sequence 'abccdd', which matches the regular expression.

Truth Tables for Connectives

The Seq and Xor functions evaluate their component rules to compute a result. Simplifying these rules to a pair of a left and a right component rules allows us to build truth tables which capture the semantics. We use these symbol assignments for the "truth" states:

For the algebraic opperations Seq and Xor, the evaluation of any pair of values can be described with a diagonally symmetric truth tables. Opt and Evl have mappings from their soul argument to a result.

Seq(l, r) = if(l=𝕖|r=𝕖) return 𝕖
            else if(l=𝕗|r=𝕗) return 𝕗
            else if(l=p∧r=∅) return 𝕗
            else if(l=∅∧r=p) return 𝕗
            else if(l=𝕡|r=𝕡) return p
            else if(l=∅|r=∅) return ∅
            else return 𝕫
Seq 𝕫 𝕗 𝕡
𝕗 𝕗
𝕫 𝕫 𝕗 𝕡
𝕗 𝕗 𝕗 𝕗 𝕗
𝕡 𝕗 𝕡 𝕗 𝕡

Xor(l, r)  = if(l=𝕖|r=𝕖) return 𝕖
            else if(l=𝕡∧r=𝕡) return 𝕖
            else if(l=𝕡|r=𝕡) return p
            else if(l=𝕫|r=𝕫) return 𝕫
            else if(l=∅|r=∅) return ∅
            else return 𝕗
Xor 𝕫 𝕗 𝕡
𝕫 𝕡
𝕫 𝕫 𝕫 𝕫 𝕡
𝕗 𝕫 𝕗 𝕡
𝕡 𝕡 𝕡 𝕡

Opt(a)    = if(a=∅|a=𝕫) return 𝕫;
            return a
Opt
𝕫
𝕫 𝕫
𝕗 𝕗
𝕡 𝕡

Evl(a)    = if(a=𝕡|a=𝕫) return 𝕡;
            return 𝕗
Evl
𝕗
𝕫 𝕡
𝕗 𝕗
𝕡 𝕡
𝕗

Cuts on ArcRule, Seq and Xor

Noting that the Seq truth table above shows that there is no result that can be combined with with 𝕗 (failure) that results in anything other than 𝕗; we can return from evaluation as soon as one is detected. Xor has a similar rule where any two 𝕡 create an error state which ensures that the match will fail. Likewise, for any Arc with cardinality > 1, the first 𝕗 will ensure that the result will be 𝕗. These can also benefit from a shortcut return.

Execute tests

This interface uses the above Regular Expression model of Shape Expressions to evaluate explore every combination of arguments. The tests are grouped into one set which minimally covers the truth tables. Following are tests which explore more complex patterns. These tests which are expressed at the top of EvaluationAsRegex.js. Finally, these tests can be mechanically translated to a test manifest and the supporting files.

cut, simplify AST view as ShEx tests