Difference between revisions of "ShEx/CurrentDiscussion"

From Semantic Web Standards

Revision as of 09:35, 15 April 2014

Current Discussion and work

Since there are many topics to discuss, we think it best to select a view to focus on.

To support these discussions, two test applications have been created; see [1] and ValidationCode.

The second script only supports the RDF ShEx description; support for parsing the ShEx syntax still has to be added.

Support for closed shapes has not been added yet either.

The following items have been put into focus.

Naming of the standard, currently named SHape EXpression (SHEX)

The current name, however, makes people think of shapes in geometry. We think we should come up with a better name.

RDF Schema would be a nice name; however, this name has already been used by another project.

Anyone is welcome to suggest names here.

  • (jesse) I would like to suggest Graph Schema, but then we must also capture schema matching for ordered graphs

discussion about 'or' operation (inclusive|exclusive)

This discussion relates to the definition of the And and Or truth tables. See the tables in [2] and ValidationCode#Truth_tables for the current definition in the validation script.

The definition of the And table seems to be OK; however, the definition of the Or table raises discussion.

For this discussion we were considering two types of or, 'inclusive (OR)' and 'exclusive (XOR)'. For now I have chosen the exclusive (XOR) type, which matches the one in RELAX NG.

In support of this discussion, see the description of the XML Schema choice element [3].

For this discussion we defined the following use cases, each with its associated SHEX definition (with the solution I think it should contain).

1. A user_name and given_name must be given

USER {
  user_name xsd:string
  given_name xsd:string
}

2. A user_name, given_name or family_name must be given, but not a combination of them and not more than one

USER {
  (user_name xsd:string | given_name xsd:string | family_name xsd:string)
}

3. A user_name or given_name must be given; both may be given, but not more than one of each

USER {
  ((user_name xsd:string,
    given_name xsd:string?) | 
   (user_name xsd:string?,
    given_name xsd:string))
}

4. A user_name or given_name must be given; any number and any combination are allowed

USER {
  (user_name xsd:string | given_name xsd:string)+
}

5. One or more user_names or one or more given_names must be given, but a combination of user_name and given_name is not allowed

USER {
  (user_name xsd:string+ | given_name xsd:string+)
}

6. Either a name must be given, or a set consisting of one or more givenNames and exactly one familyName

USER {
 foaf:name xsd:string | (foaf:givenName xsd:string+, foaf:familyName xsd:string)
}

If we give the following data to use case 2, using the OR table as given in the validation code:

<user1> user_name "p13t"
<user1> given_name "jan"

then it will pass instead of failing. This is because both lines are accepted by the (user_name xsd:string | given_name xsd:string | family_name xsd:string) or rule group without increasing its cardinality to 2.

To overcome this, the cardinality has to be increased by 1 whenever an item is found that is NONE, OPEN or PASS. The count starts at 0. So an extra table has to be created that tells when the cardinality count should be increased. It would look like:

OrRuleGroup.addCardinalityTable = ...
Seq   NONE  OPEN  FAIL  PASS
NONE  0     1     0     1
OPEN  0     1     0     1
FAIL  0     1     0     1
PASS  0     1     0     1
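
As a rough illustration, the table could be encoded as a lookup keyed by the state so far (row) and the result of the next item (column). The Python below is only a sketch and is not taken from the validation script: the names `ADD_CARDINALITY` and `count_cardinality` are hypothetical, the row/column reading of the table is an assumption, and so is the idea that an alternative without matching triples reports NONE. It follows the table as printed, where only the OPEN and PASS columns add 1.

```python
STATES = ("NONE", "OPEN", "FAIL", "PASS")

# Increment per result of the next item (the columns of the table).
# Every row of the table as printed is identical, so the rows are kept
# only to mirror the table layout.
COLUMN_INCREMENT = {"NONE": 0, "OPEN": 1, "FAIL": 0, "PASS": 1}
ADD_CARDINALITY = {row: dict(COLUMN_INCREMENT) for row in STATES}


def count_cardinality(item_results, state="NONE"):
    """Sum the table increments for a sequence of item results, starting at 0.

    The state is not updated between items, because the Or truth table
    that would update it is not reproduced here (assumption of this sketch).
    """
    count = 0
    for result in item_results:
        count += ADD_CARDINALITY[state][result]
    return count


# Use case 2 with the two triples above: user_name and given_name both pass,
# family_name finds nothing (assumed to be reported as NONE).
print(count_cardinality(["PASS", "PASS", "NONE"]))  # 2
```

With a count of 2, an or rule group that allows exactly one alternative would reject the data.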

Now use case 2 will fail on the given data. However, use case 3 will now always fail, as both AndRuleGroups will pass and the cardinality becomes 2. To overcome this, a mechanism must be included that temporarily excludes triples that have already been used to satisfy any of the arcs or groups that already passed in the OrRuleGroup. So in use case 3, any triples matched to satisfy the first And rule group

(user_name xsd:string, given_name xsd:string?)

are temporarily disabled when performing the match for the second And rule group

(user_name xsd:string?, given_name xsd:string)

so that it no longer passes based on the triples matched by the first And rule group. The count then stays at 1 and the outer OrRuleGroup's cardinality check passes.
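
The effect of excluding matched triples can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the validation script: the functions `match_and_group` and `or_group_cardinality` are hypothetical, arcs are written as (predicate, min, max) tuples, datatypes such as xsd:string are not checked, and an alternative that matches no triples at all is not counted.

```python
def match_and_group(arcs, triples):
    """Return the triples an And rule group matches, or None if an arc fails.

    arcs:    list of (predicate, min_count, max_count)
    triples: list of (predicate, value) still available for matching
    """
    matched = []
    for predicate, min_count, max_count in arcs:
        found = [t for t in triples if t[0] == predicate]
        if not min_count <= len(found) <= max_count:
            return None
        matched.extend(found)
    return matched


def or_group_cardinality(alternatives, triples, exclude_matched):
    """Count how many alternatives of an Or rule group pass."""
    available = list(triples)
    count = 0
    for arcs in alternatives:
        matched = match_and_group(arcs, available)
        if not matched:  # failed, or matched nothing at all
            continue
        count += 1
        if exclude_matched:
            # Temporarily hide the triples used by this alternative.
            available = [t for t in available if t not in matched]
    return count


# Use case 3: (user_name, given_name?) | (user_name?, given_name)
use_case_3 = [
    [("user_name", 1, 1), ("given_name", 0, 1)],
    [("user_name", 0, 1), ("given_name", 1, 1)],
]
data = [("user_name", "p13t"), ("given_name", "jan")]

print(or_group_cardinality(use_case_3, data, exclude_matched=False))  # 2
print(or_group_cardinality(use_case_3, data, exclude_matched=True))   # 1
```

Without exclusion both alternatives pass and the count becomes 2; with exclusion the second alternative finds no given_name left to match, so the count stays at 1.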

The exclusion of triples is related to open and closed shapes and to the discussion about excluding triples from being rematched by any following rules in the shape. I (jesse) personally prefer to allow rematching of triples, as it allows for rules that are stricter than the match in the 'parent' shape. Please see the discussion below.

Note that making an item in an OrRuleGroup optional does not add any extra meaning.

The use cases have been included as test cases in the validation script on GitHub; however, they are still failing, as the script has to be updated.

TODO

  • update OrRuleGroup so that it can have occurs +, * and {N,M}
  • update the Validate script to include these solutions
  • add more

Discussion on closed/open shapes and exclusion of matched triples

There has already been a discussion about open and closed shapes and about excluding already matched triples from being rematched by any following rule. This discussion is important as it relates to the previous one.

When matching a shape to a piece of RDF data, the matched triples can be excluded from further matching against any other rules. However, this would make it impossible to redefine an arc and make it stricter, as the new rule cannot match the triples already matched by the less strict rule. Redefining is especially useful for defining the allowed values of the rdf:type predicate.

A shape can be defined as either open or closed. An open shape matches a subject if all rules in the shape pass; however, not all triples have to be matched. For a closed shape, each triple of the subject has to be matched. When a shape is defined as closed, it cannot be further extended.

In my (jesse) opinion, it would be best if a shape is open by default and can be defined as closed.
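
The difference between open and closed shapes described above can be made concrete with a small Python sketch. It is an illustration only: the function `validate` and the rule format (predicate mapped to a (min, max) pair) are hypothetical, datatypes are not checked, and the default of `closed=False` simply follows the preference stated above.

```python
def validate(rules, triples, closed=False):
    """Check the triples of one subject against a shape.

    rules:   dict mapping predicate -> (min_count, max_count)
    triples: list of (predicate, value) for the subject
    closed:  if True, every triple must be matched by some rule
    """
    counts = {}
    for predicate, _value in triples:
        counts[predicate] = counts.get(predicate, 0) + 1

    # Open and closed shapes both require every rule to pass.
    rules_pass = all(
        low <= counts.get(predicate, 0) <= high
        for predicate, (low, high) in rules.items()
    )
    if not closed:
        return rules_pass

    # A closed shape additionally rejects triples that no rule matches.
    all_matched = all(predicate in rules for predicate in counts)
    return rules_pass and all_matched


shape = {"user_name": (1, 1)}
data = [("user_name", "p13t"), ("given_name", "jan")]

print(validate(shape, data))               # True
print(validate(shape, data, closed=True))  # False
```

The open shape accepts the subject because its only rule passes, while the closed shape rejects it because the given_name triple is not matched by any rule.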

References to And and Or rule groups

The ValidationCode script is based on the RDF ShEx format, which allows referencing named Or and And rule groups. However, this is not yet possible in the SHEX syntax. In the current RDF ShEx definition there is a difference between the ResourceShape and the AndRuleGroup. A ResourceShape is an extension of the AndRuleGroup. Only a ResourceShape can be referenced by a ShapeArc, whereas an AndRuleGroup cannot. A ResourceShape, however, must have an occurrence of exactly one.

*Discussion point: Should we have separate ResourceShape and AndRuleGroup, or should these be merged into one?