Discussion SHEX format

From Semantic Web Standards

Obsolete - please see ShEx Semantics

General discussion points on the SHEX format

Discussion 1, possibility to allow for multiple types

When I would like, for example, to define a property to be either a string or an integer, or a reference to be either a Person or a Pet, I have to define it as

ex:Person {                 
   (ex:age xsd:string | ex:age xsd:int)?,       
   (ex:friend @xsd:Person | ex:friend @xsd:Pet)*
}

I think it would be nice if you could define such cases with

ex:Person {                 
   ex:age xsd:string | xsd:int?,   
   ex:friend @xsd:Person | @xsd:Pet*
}
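One way to read the proposed shorthand is as pure syntactic sugar that expands back into the long form. The following is only a minimal sketch of that expansion in Python; the function name expand and its arguments are hypothetical and not part of any SHEX implementation.

```python
def expand(predicate, value_types, cardinality=""):
    """Expand a shorthand rule into the long form that repeats the predicate.

    predicate, value_types and cardinality are hypothetical names for this sketch.
    """
    alternatives = " | ".join(f"{predicate} {t}" for t in value_types)
    if len(value_types) > 1:
        # Several alternatives: group them so the cardinality applies to the whole group.
        return f"({alternatives}){cardinality}"
    return f"{alternatives}{cardinality}"


print(expand("ex:age", ["xsd:string", "xsd:int"], "?"))
print(expand("ex:friend", ["@xsd:Person", "@xsd:Pet"], "*"))
```

Under this reading the cardinality marker binds to the whole group of alternatives, not only to the last one.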

Discussion 2, redefining and including other SHEX definitions

The RELAX NG compact syntax allows for including other definitions, and it allows for redefining elements in existing definitions. I think it would be nice if SHEX supported something similar.

Redefining rules

In the following example the foaf:friend property is defined twice, once in ex:Person and again in the derived ex:PetLover

ex:Person
{
  ex:age xsd:int,
  foaf:friend @xsd:Person* 
}
ex:PetLover & ex:Person
{
  foaf:friend @xsd:Pet*
} 

With the current implementation a redefinition means that both rules must be satisfied, which is good.

However, in some cases you would like to either

  1. indicate that either the first or the new rule should apply (what we want in this example)
  2. redefine the rule completely
  3. remove the rule

This would look like

ex:Person
{
  ex:age xsd:int,
  foaf:friend @xsd:Person* 
}
ex:PetLover & ex:Person
{
  |= foaf:friend @xsd:Pet* #define an or relationship with the other rule property
  = foaf:friend @xsd:Pet* #completely redefine the rule property
  - foaf:friend #remove the rule property
}
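The intended effect of the three operators can be sketched in Python. This is only an illustration under simplifying assumptions: a shape is modeled as a mapping from a predicate to a list of alternative value types, cardinalities are ignored, and the function name apply_redefinitions is hypothetical.

```python
def apply_redefinitions(base, changes):
    """Return a new shape built from `base` plus (operator, predicate, value_type) changes."""
    # Copy the base shape so the original definition stays untouched.
    derived = {pred: list(alts) for pred, alts in base.items()}
    for op, pred, value_type in changes:
        if op == "|=":
            # Or relationship: accept the inherited types or the new one.
            derived.setdefault(pred, []).append(value_type)
        elif op == "=":
            # Complete redefinition: only the new type is accepted.
            derived[pred] = [value_type]
        elif op == "-":
            # Removal: the derived shape no longer constrains the predicate.
            derived.pop(pred, None)
        else:
            raise ValueError(f"unknown operator: {op}")
    return derived


person = {"ex:age": ["xsd:int"], "foaf:friend": ["@xsd:Person"]}

print(apply_redefinitions(person, [("|=", "foaf:friend", "@xsd:Pet")]))
print(apply_redefinitions(person, [("=", "foaf:friend", "@xsd:Pet")]))
print(apply_redefinitions(person, [("-", "foaf:friend", None)]))
```

In practice only one of the three operators would be used per property; the sketch applies each one separately to the same base shape.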

Inclusion

One SHEX file should be able to include another one if defined within a namespace. This could look something like

SHEX ex:persons #define namespace which can then be extended
{
  ex:Person
  {
    ex:age xsd:string,
    foaf:friend @xsd:Person* 
  }
}

included by a second namespace

@include persons1.sh #include file and load all namespaces specified in this file
SHEX ex:personsv2 
{
  use ex:persons #use namespace ex:persons 
  ex:Person
  {
    (ex:age xsd:string | ex:age xsd:int)?,
    = ex:age xsd:int
  }
}

Named groups

To be able to update or redefine subgroups, they should be named. Being able to name them and define them separately, without them directly being a Resource Shape, would solve the problem of the RDF serialization format having a higher expressive power than the SHEX language definition. See RDF_serialization Overview and discussion.

This would look something like

#defining an or rule which is not a ResourceShape and cannot be referenced by a ShapeProperty, but can be used within a ruleGroup or ResourceShape
SHEX ex:persons
{
  @someOrRule { 
    prop1 xsd:string,
    prop2 xsd:string
  }
 
  ex:Person & someOrRule
  {
    @ageOrGroup(ex:age xsd:string | ex:age xsd:int)?,  
  }
}

Which can then be redefined with something like

SHEX ex:persons2
{
  use ex:persons
  @ageOrGroup {
    =ex:age xsd:int #redefine the age property to int and remove the optional
  }
}

Discussion 3, ordered shapes

RDF is inherently unordered; however, ordered lists or bags can be defined, and SHEX expressions can also readily be applied to graph databases. So I think it would be nice to build support for this. If we would like to support ordered structures, we need a way to indicate whether the order must be satisfied or whether any order is allowed.

So instead of including a subgroup by using the following statement

ex:Person & someOrRule { ... } 

it would be nicer to use the following statement, so that it's possible to define the order when needed.

ex:Person {
  prop1 xsd:string,
  & someOrRule,
  prop2 xsd:string
}

Several different items have to be defined when adding support for ordered structures in combination with the possibility to redefine shapes.

Discussion 4, occurs, multiplicity and reverse multiplicity

The ShapeProperty defines the occurs property; I would like to name this multiplicity (as they do in UML).

In my opinion it's important to have support not only for forward multiplicity but also for reverse multiplicity. There are two reasons why it is nice to have this

  1. more and important information about the structure is captured
  2. it's possible to define the 'tree' needed for the complete shape expression set. Everything can be canonicalized by following all the 0..1 to 0..N relationships, which allows for defining a nice serialization into RDF/XML or JSON-LD, or for making a nice visualization in a tree viewer.

This could for example be specified with the following syntax.

ex:Person {
  friends ex:Person*-* #the -* gives the reverse multiplicity
  lover ex:Person1-1 #a Person can only have one lover
  owns ex:Car*-1 #a car can only have one owner
}
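To illustrate what a validator would have to check for the two directions, here is a minimal Python sketch. It only covers upper bounds, models the data as plain (subject, predicate, object) tuples, and uses the hypothetical names violations, max_per_subject and max_per_object; none of this is taken from an existing SHEX implementation.

```python
from collections import Counter


def violations(triples, predicate, max_per_subject=None, max_per_object=None):
    """List nodes that exceed the forward or reverse upper bound; None means unbounded."""
    # Forward multiplicity counts objects per subject, reverse counts subjects per object.
    per_subject = Counter(s for s, p, o in triples if p == predicate)
    per_object = Counter(o for s, p, o in triples if p == predicate)
    found = []
    if max_per_subject is not None:
        found += [("forward", n) for n, c in sorted(per_subject.items()) if c > max_per_subject]
    if max_per_object is not None:
        found += [("reverse", n) for n, c in sorted(per_object.items()) if c > max_per_object]
    return found


triples = [
    ("ex:alice", "owns", "ex:car1"),
    ("ex:alice", "owns", "ex:car2"),
    ("ex:bob", "owns", "ex:car2"),
]

# owns ex:Car*-1: any number of cars per person, at most one owner per car.
print(violations(triples, "owns", max_per_object=1))
```

Here ex:car2 is reported because it has two owners, while ex:alice owning two cars is allowed by the forward multiplicity *.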

Discussion 5, enumeration values, full class support

In my opinion it would be nice to support not only simple enumeration ranges, but also full classifications such as the Gene Ontology. Classes/terms/enumeration values within this 'extended' enumeration can define concepts that are subconcepts of other classes/terms/enumeration values.

The following is related to this and forms part of a bigger discussion about semantics.

We could define the following property.

se:subConceptOf a rdf:Property ;
  rdfs:subPropertyOf rdfs:subClassOf ;
  rdfs:comment "Defines the concept identified by this term to be a sub class/concept of the concept identified by the super term" .

Doing so will make it possible to make a distinction (that at least I personally make) between concepts that do not clearly have instances with properties

ex:ImmuneReactionToTumorCell rdfs:subClassOf ex:ImmuneReaction .
ex:Love rdfs:subClassOf ex:Emotion .

and concepts that do clearly have instances with properties

ex:Female rdfs:subClassOf ex:Human .
ex:Boat rdfs:subClassOf ex:Vehicle .

This solution would resolve the ambiguity that exists for me (maybe not for everyone) in the rdfs:subClassOf predicate.

Discussion 6, not and xor rulegroups

The SHEX expression uses an open matching assumption, meaning that any extra properties present that are not defined are ignored and listed as remaining triples. However, in some cases you would like to denote either

  1. that certain triples should not be present
  2. that only one of multiple options should be present, like an XOR operation

This could look something like

ex:Item {
  !prop1 #indicate there should be no prop1 property
  (propA xsd:int ^ propB xsd:int) #either propA or propB must be present, but not both at the same time
}
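The two proposed constraints of ex:Item can be sketched as a small Python check. This is a simplified illustration: it only looks at which predicates are present on a node and ignores the xsd:int value constraint, and the function name check_item is hypothetical.

```python
def check_item(properties):
    """Check the ex:Item sketch against the set of predicate names present on a node."""
    errors = []
    # !prop1: the negated property must be absent.
    if "prop1" in properties:
        errors.append("prop1 must not be present")
    # propA ^ propB: the rule fails when both are present or both are absent.
    if ("propA" in properties) == ("propB" in properties):
        errors.append("exactly one of propA and propB must be present")
    return errors


print(check_item({"propA"}))
print(check_item({"propA", "propB", "prop1"}))
```

Other properties are still ignored under the open matching assumption, so a node with propA and an unrelated property passes.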

Discussion 7, greedy matching

Should there be a "greedy" semantics so there's only one solution to:

schema:

 start=<a>
   <a> { <p1> . }
   <b> & <a> { <p2> . }

data:

 <s> <p1> 1 .
 <s> <p2> 2 .

Note that the live example validates as both <a> and <b> because invoking <a> implies all derived types, i.e.

((<p1> .)|
 (& <a>,
  <p2> ("2"^^<http://www.w3.org/2001/XMLSchema#integer>)))

One can always use VIRTUAL to prevent an ancestor from providing a valid solution.

schema:

 start=<a>
   VIRTUAL <a> { <p1> . }
   <b> & <a> { <p2> . }
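The question can be made concrete with a small Python sketch. It is only a model under assumptions: shapes are reduced to a set of required predicates plus a parent, matching is open (extra triples are ignored), and "greedy" is read as "prefer the solution that accounts for the most predicates", which is just one possible interpretation. The names SHAPES, solutions and greedy_solution are hypothetical.

```python
SHAPES = {
    "a": {"parent": None, "required": {"p1"}, "virtual": False},
    "b": {"parent": "a", "required": {"p2"}, "virtual": False},
}


def required_predicates(shapes, name):
    """Collect the predicates a shape requires, including those inherited from ancestors."""
    required = set()
    while name is not None:
        required |= shapes[name]["required"]
        name = shapes[name]["parent"]
    return required


def is_derived_from(shapes, name, ancestor):
    """True if `name` is `ancestor` itself or derives from it."""
    while name is not None:
        if name == ancestor:
            return True
        name = shapes[name]["parent"]
    return False


def solutions(shapes, start, predicates):
    """All non-virtual shapes reachable from `start` that the node satisfies (open matching)."""
    return sorted(
        name
        for name, shape in shapes.items()
        if is_derived_from(shapes, name, start)
        and not shape["virtual"]
        and required_predicates(shapes, name) <= predicates
    )


def greedy_solution(shapes, start, predicates):
    """Pick the single solution that accounts for the most predicates, if any."""
    found = solutions(shapes, start, predicates)
    return max(found, key=lambda n: len(required_predicates(shapes, n)), default=None)


data = {"p1", "p2"}  # predicates present on <s>
virtual_a = {**SHAPES, "a": {**SHAPES["a"], "virtual": True}}

print(solutions(SHAPES, "a", data))        # without greedy semantics
print(greedy_solution(SHAPES, "a", data))  # with the assumed greedy semantics
print(solutions(virtual_a, "a", data))     # with <a> declared VIRTUAL
```

In this model the greedy reading and the VIRTUAL declaration both leave <b> as the only solution for the example data, but VIRTUAL also prevents <a> from matching nodes that only have <p1>.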

Discussion 8, hierarchical punctuation

Should the inclusion character be &, : (as OO folks are used to) or something else? & could also be reserved for the intersection of two other shapes.