Shape Expressions can be represented as regular expressions.
The semantics of Shape Expression connectives `Xor`

, `Seq`

and `Opt`

were chosen to make Shape Expressions behave identically to regular expressions.
An additional function, `Evl`

, reduces the possible results of evaluation to a simple pass or fail.

Shape Expressions follws regular expression behavior with two notable differences:

- Shape Expression rules match elements in a set (of triples) while regular expressions match elements in a sequence (of characters).
`Arc`

rules in Shape Expressions discriminate between "not found" and "found with the wrong value".

One can map a Shape Expression and a data graph to a regular expression and a sequence of characters. This regular expression language is a subset of conventional regular expressions:

[1] RegExpr ::= sequence disjoint* [2] disjoint ::= "|" sequence [3] sequence ::= GrpOrChr GrpOrChr* [4] GrpOrChr ::= group | char count? [5] group ::= "(" RegExpr ")" "?"? [6] count ::= [?+*] [7] char ::= [^\\|*+?()] | "\\" [\\|*+?()]

^{EvaluationLogic-regex.PEG.js}

(Production [5] limites the cardinality of groups to a maximum of one because the associations of repeated groups get lost when represented in a set, e.g {Bob, 23, Sue, 18} because indistinguishable from {18, 23, Bob, Sue}.)

Shape Expression Arc rules are assigned a novel lower-case letter which is added to the regular expression.
For each triple matching that arc rule, that character is added to the sequence of characters.
For each triple matching that ArcRule's NameClass but not the ValueClass, the upper-case form of that character is added to the sequence.
For example, the ArcRule `<http://x.example/p1> (1 2)`

could be assigned the latter '*a*'.
Triples

<http://x.example/s1> <http://x.example/p1> 1 . #a<http://x.example/s1> <http://x.example/p1> 3 . #A<http://x.example/s1> <http://x.example/p2> 2 . # not covered by the ArcRule

in the data group would be represented as the sequence '*aA*'.

The shape expressions are traversed "depth-first" from the start rule.
Each sequence and disjunction is explored in lexical order, ArcRules with ReferenceValues explore the ArcRule first, then the referenced rule.
Below is an example of letters assigned to Shape Expression rules producing the regular expression '*abc*d+*':

start = <Shape1> <Shape1> { <p1> (1 2) , #a<p2> @<Shape2> , #b<p3> xsd:string + #d} <Shape2> { <p4> xsd:integer * #c}

A data graph

<X> <p3> "Bob", "Sue" . #dd<X> <p1> 1 . #a<X> <p2> <Y> . #b<Y> <p4> 8, 9 . #cc

would be represented as the sequence '*abccdd*', which matches the regular expression.

The `Seq`

and `Xor`

functions evaluate their component rules to compute a result.
Simplifying these rules to a pair of a left and a right component rules allows us to build truth tables which capture the semantics.
We use these symbol assignments for the "truth" states:

- ∅:no matches on non-0 min cardinality
- 𝕫:no matches on 0 min cardinality
- 𝕗:mismatched value
- 𝕡:match
- 𝕖:more than one disjoint passed

For the algebraic opperations `Seq`

and `Xor`

, the evaluation of any pair of values can be described with a diagonally symmetric truth tables. `Opt`

and `Evl`

have mappings from their soul argument to a result.

Seq(l, r) = if(l=𝕖|r=𝕖) return 𝕖 else if(l=𝕗|r=𝕗) return 𝕗 else if(l=p∧r=∅) return 𝕗 else if(l=∅∧r=p) return 𝕗 else if(l=𝕡|r=𝕡) return p else if(l=∅|r=∅) return ∅ else return 𝕫

Seq | ∅ | 𝕫 | 𝕗 | 𝕡 | ⅇ |
---|---|---|---|---|---|

∅ | ∅ | ∅ | 𝕗 | 𝕗 | ⅇ |

𝕫 | ∅ | 𝕫 | 𝕗 | 𝕡 | ⅇ |

𝕗 | 𝕗 | 𝕗 | 𝕗 | 𝕗 | ⅇ |

𝕡 | 𝕗 | 𝕡 | 𝕗 | 𝕡 | ⅇ |

ⅇ | ⅇ | ⅇ | ⅇ | ⅇ | ⅇ |

Xor(l, r) = if(l=𝕖|r=𝕖) return 𝕖 else if(l=𝕡∧r=𝕡) return 𝕖 else if(l=𝕡|r=𝕡) return p else if(l=𝕫|r=𝕫) return 𝕫 else if(l=∅|r=∅) return ∅ else return 𝕗

Xor | ∅ | 𝕫 | 𝕗 | 𝕡 | ⅇ |
---|---|---|---|---|---|

∅ | ∅ | 𝕫 | ∅ | 𝕡 | ⅇ |

𝕫 | 𝕫 | 𝕫 | 𝕫 | 𝕡 | ⅇ |

𝕗 | ∅ | 𝕫 | 𝕗 | 𝕡 | ⅇ |

𝕡 | 𝕡 | 𝕡 | 𝕡 | ⅇ | ⅇ |

ⅇ | ⅇ | ⅇ | ⅇ | ⅇ | ⅇ |

Opt(a) = if(a=∅|a=𝕫) return 𝕫; return a

Opt | |
---|---|

∅ | 𝕫 |

𝕫 | 𝕫 |

𝕗 | 𝕗 |

𝕡 | 𝕡 |

ⅇ | ⅇ |

Evl(a) = if(a=𝕡|a=𝕫) return 𝕡; return 𝕗

Evl | |
---|---|

∅ | 𝕗 |

𝕫 | 𝕡 |

𝕗 | 𝕗 |

𝕡 | 𝕡 |

ⅇ | 𝕗 |

Noting that the `Seq`

truth table above shows that there is no result that can be combined with with `𝕗`

(failure) that results in anything other than `𝕗`

; we can return from evaluation as soon as one is detected.
`Xor`

has a similar rule where any two `𝕡`

create an error state which ensures that the match will fail.
Likewise, for any `Arc`

with cardinality > 1, the first `𝕗`

will ensure that the result will be `𝕗`

.
These can also benefit from a shortcut return.

This interface uses the above Regular Expression model of Shape Expressions to evaluate explore every combination of arguments. The tests are grouped into one set which minimally covers the truth tables. Following are tests which explore more complex patterns. These tests which are expressed at the top of EvaluationAsRegex.js. Finally, these tests can be mechanically translated to a test manifest and the supporting files.

cut,
simplify AST
view as ShEx tests