Difference between revisions of "ShEx/CurrentDiscussion"

From Semantic Web Standards

Revision as of 13:22, 16 April 2014

Current Discussion and work

Since there are many topics to discuss, we think it best to select a few to focus on.

To support these discussions, 2 test applications have been created; see [1] and ValidationCode.

The second script only supports the RDF Shex description; support for parsing the Shex syntax has to be added.

Also, support for closed shapes has not been added yet.

The following items have been put into focus.

Naming of the standard, currently named SHape EXpression (SHEX)

The current name, however, makes people think of shapes in geometry. We think we should come up with a better name.

RDF Schema would be a nice name; however, this name is already in use by another project.

Anyone is welcome to suggest names here.

  • (jesse) I would like to suggest Graph Schema, but then we must also capture schema matching for ordered graphs

Discussion about the 'or' operation (inclusive|exclusive)

This discussion relates to the definition of the And and Or truth tables. See the tables in [2] and ValidationCode#Truth_tables for the current definition in the validation script.

The definition of the And table seems to be OK; however, the definition of the Or table raises discussion.

For this discussion we were thinking of 2 types of or, 'inclusive (OR)' and 'exclusive (XOR)'. For now we have chosen the exclusive (XOR) type, which matches the one in RELAX NG.

For this discussion we defined the following use cases, each with an associated SHEX definition (containing the solution we think it should have).

1. A user_name and given_name must be given

USER {
  user_name xsd:string
  given_name xsd:string
}

2. A user_name, given_name or family_name must be given, but not a combination of them and not more than one

USER {
  (user_name xsd:string | given_name xsd:string | family_name xsd:string)
}

3. A user_name or given_name must be given; both may be given, but not more than one of each

USER {
  ((user_name xsd:string,
    given_name xsd:string?) | 
   given_name xsd:string)
}

4. A user_name or given_name must be given; any number and any combination are allowed

USER {
  ((user_name xsd:string+,
    given_name xsd:string*) | 
   given_name xsd:string+)
}

5. One or more user_name's or one or more given_name's must be given, but a combination of user_name and given_name is not allowed

USER {
  (user_name xsd:string+ | given_name xsd:string+)
}

6. Either a name must be given, or a set consisting of one or more givenNames with exactly one familyName

USER {
 foaf:name xsd:string | (foaf:givenName xsd:string+, foaf:familyName xsd:string)
}

If we give the following data to use case 2, using the Or table as given in the validation code

<user1> user_name "p13t"
<user1> given_name "jan"

Then it will pass instead of failing. This is because both lines are accepted by the or rule group (user_name xsd:string | given_name xsd:string | family_name xsd:string) without increasing the cardinality to 2.

To overcome this, the cardinality would have to be increased by 1. However, an or group cannot have a cardinality of more than 1: an or group can contain an and rule group, for which a cardinality higher than 1 is impossible, because items cannot be grouped in an unordered structure. Making special exceptions to support this anyway would make things complex and break the mathematical logic.

So, to keep the implementation simple, instead of increasing the cardinality by one we extended the or rule group table with an error state, which is triggered when there is a pass upon a pass (1+1 = cardinality 2 = error). So the or operator is an exclusive one. This is the same as in RELAX NG, with the difference that the exclusive or item cannot have a cardinality of more than 1.
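The "pass upon a pass" rule can be illustrated with a minimal Python sketch. It is not taken from the validation script: the function name or_group_state is hypothetical, triples are modelled as plain (subject, predicate, object) tuples, and every branch is assumed to be a single arc with cardinality 1.

```python
# Hypothetical sketch of the exclusive or rule; not the validation script's code.

def or_group_state(branch_predicates, triples):
    """Return PASS, FAIL or ERROR for an or group of single-arc branches."""
    passes = 0
    for predicate in branch_predicates:
        # Assumption: a branch passes when exactly one triple uses its predicate.
        matches = [t for t in triples if t[1] == predicate]
        if len(matches) == 1:
            passes += 1
    if passes == 0:
        return "FAIL"
    if passes == 1:
        return "PASS"
    return "ERROR"  # a pass upon a pass: cardinality 2


use_case_2 = ["user_name", "given_name", "family_name"]
data = [("user1", "user_name", "p13t"), ("user1", "given_name", "jan")]
print(or_group_state(use_case_2, data))  # ERROR
```

With the data above, two branches pass, so the group ends in the error state instead of silently passing.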

Using this solution we can solve most use cases; however, use case 3 cannot be solved for the following data

ex:user1 ex:user_name "p13t" ;
ex:given_name "jan" .

It should pass; however, both sides of the OrGroup pass, resulting in an 'ERROR' = fail state. To overcome this problem, 2 solutions can be proposed

  1. Add a mechanism that excludes any triples already matched by an item that passed from being matched by any later item in the group
  • - this makes the matching algorithm more complex
  2. Add the ! operator feature; this might be needed because SHEX is based on open matching (related to the open versus closed discussion)
  • - maybe this could result in logically complex SHEX definitions
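The first proposal can be sketched in Python as follows. This is only an illustration under assumptions: arcs are modelled as hypothetical (predicate, min, max) tuples, each or branch is an and group of such arcs, and the names match_branch and or_group do not come from the validation script.

```python
# Hypothetical sketch of proposal 1; names and data model are assumptions.

def match_branch(arcs, triples):
    """Match an and group of (predicate, min, max) arcs against the triples.

    Returns (passed, matched_triples).
    """
    matched = []
    for predicate, low, high in arcs:
        hits = [t for t in triples if t[1] == predicate]
        if not (low <= len(hits) <= high):
            return False, []
        matched.extend(hits)
    return True, matched


def or_group(branches, triples, exclude_matched):
    """Evaluate an exclusive or group, optionally removing matched triples."""
    remaining = list(triples)
    passes = 0
    for arcs in branches:
        passed, matched = match_branch(arcs, remaining)
        if passed:
            passes += 1
            if exclude_matched:
                # Triples consumed by a passing branch are hidden from later branches.
                remaining = [t for t in remaining if t not in matched]
    if passes == 0:
        return "FAIL"
    return "PASS" if passes == 1 else "ERROR"


use_case_3 = [
    [("user_name", 1, 1), ("given_name", 0, 1)],  # user_name, given_name?
    [("given_name", 1, 1)],                       # given_name
]
data = [("user1", "user_name", "p13t"), ("user1", "given_name", "jan")]
print(or_group(use_case_3, data, exclude_matched=False))  # ERROR
print(or_group(use_case_3, data, exclude_matched=True))   # PASS
```

In this sketch the result depends on the order of the branches, which is one concrete way in which the matching algorithm becomes more complex.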

Note that use case 4 can be solved with "(user_name xsd:string | given_name xsd:string)+" in RELAX NG; however, we have the cardinality limitation of 1, so it has to be solved in a similar way as use case 3.

If someone can think of another case that cannot be solved with the exclusive or, please add it as use case 7.

So the final logic tables become as follows, taken from Eric's definition

OrRuleGroup.truthTable = ...
Seq    NONE   OPEN   FAIL   PASS   ERROR
NONE   NONE   OPEN   NONE   PASS   ERROR
OPEN   OPEN   OPEN   OPEN   PASS   ERROR
FAIL   NONE   OPEN   FAIL   PASS   ERROR
PASS   PASS   PASS   PASS   ERROR  ERROR
ERROR  ERROR  ERROR  ERROR  ERROR  ERROR

AndRuleGroup.truthTable = ...
Seq    NONE   OPEN   FAIL   PASS   ERROR
NONE   NONE   NONE   FAIL   FAIL*  ERROR
OPEN   NONE   OPEN   FAIL   PASS   ERROR
FAIL   FAIL   FAIL   FAIL   FAIL   ERROR
PASS   FAIL*  PASS   FAIL   PASS   ERROR
ERROR  ERROR  ERROR  ERROR  ERROR  ERROR
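As an illustration, the two tables can be written down as lookup dictionaries and applied to a sequence of rule states. This is a minimal Python sketch, not the validation script: the names build_table and combine are hypothetical, the fold starts from the first state in the list (an assumption, since no initial state is given here), and FAIL* is treated as plain FAIL because the meaning of the asterisk is not defined on this page.

```python
from functools import reduce

STATES = ["NONE", "OPEN", "FAIL", "PASS", "ERROR"]


def build_table(rows):
    """Turn one text row per left-hand state into a {(left, right): result} dict."""
    return {
        (left, right): result
        for left, row in zip(STATES, rows)
        for right, result in zip(STATES, row.split())
    }


OR_TABLE = build_table([
    "NONE OPEN NONE PASS ERROR",
    "OPEN OPEN OPEN PASS ERROR",
    "NONE OPEN FAIL PASS ERROR",
    "PASS PASS PASS ERROR ERROR",
    "ERROR ERROR ERROR ERROR ERROR",
])

AND_TABLE = build_table([
    "NONE NONE FAIL FAIL ERROR",   # FAIL* treated as FAIL (assumption)
    "NONE OPEN FAIL PASS ERROR",
    "FAIL FAIL FAIL FAIL ERROR",
    "FAIL PASS FAIL PASS ERROR",   # FAIL* treated as FAIL (assumption)
    "ERROR ERROR ERROR ERROR ERROR",
])


def combine(table, states):
    """Fold a non-empty list of states pairwise through the given table."""
    return reduce(lambda left, right: table[(left, right)], states)


print(combine(OR_TABLE, ["PASS", "FAIL", "FAIL"]))  # PASS
print(combine(OR_TABLE, ["PASS", "PASS"]))          # ERROR
print(combine(AND_TABLE, ["PASS", "OPEN"]))         # PASS
```

Both tables are symmetric, so in this sketch it does not matter whether the rows or the columns are read as the left-hand state.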

(work in progress) The use cases have been included as test cases in the validation script on GitHub; however, they are still failing, as the script has to be updated.

TODO

  • include tests from eric into my (jesse) validation tests
  • add more

Discussion on closed/open shapes and exclusion of matched triples

There has already been a discussion about open and closed shapes and about excluding already matched triples from being rematched by any following rule. This discussion is important, as it could relate to the previous discussion.

When matching a shape to a piece of RDF data, the matched triples can be excluded from further matching against any other rules. However, this would make it impossible to redefine an arc and make it more strict, as the stricter rule cannot match the triples already matched by the less strict rule. Such redefinition is especially useful for defining the allowed values of the rdf:type predicate.

A shape can be defined as either open or closed. An open shape matches a subject if all rules in the shape pass; not all triples have to be matched. For a closed shape, however, each triple of the subject has to be matched. When a shape is defined as closed, it cannot be further extended.
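The difference between open and closed shapes can be sketched in Python as below. This is a simplified illustration under assumptions: a rule is reduced to "a triple with this predicate must be present", the function name validate_shape is hypothetical, and the closed=False default is only a choice made for this sketch.

```python
# Hypothetical sketch of open versus closed matching; not the validation script.

def validate_shape(rule_predicates, triples, closed=False):
    """Check the triples of one subject against a shape.

    Open: every rule must pass, extra triples are ignored.
    Closed: every rule must pass and every triple must be matched by a rule.
    """
    present = {predicate for _, predicate, _ in triples}
    rules_pass = all(p in present for p in rule_predicates)
    if not closed:
        return rules_pass
    return rules_pass and present <= set(rule_predicates)


shape = ["user_name", "given_name"]
data = [
    ("user1", "user_name", "p13t"),
    ("user1", "given_name", "jan"),
    ("user1", "email", "jan@example.org"),  # not covered by any rule
]
print(validate_shape(shape, data))               # True
print(validate_shape(shape, data, closed=True))  # False
```

The extra email triple is ignored by the open shape but makes the closed shape fail.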

In my (jesse) opinion it would be best if a shape is open by default and can be defined as closed.

References to And and Or rule groups

The ValidationCode script is based on the RDF Shex format, which allows referencing named Or and And rule groups. However, this is not yet possible in the SHEX syntax. In the current RDF Shex definition there is a difference between the ResourceShape and the AndRuleGroup. A ResourceShape is an extension of the AndRuleGroup. Only a ResourceShape can be referenced by a ShapeArc, whereas an AndRuleGroup cannot. A ResourceShape, however, must have an occurrence of exactly one.

*Discussion point: Should we have a separate ResourceShape and AndRuleGroup, or should these be merged into one?