
ShapeRequirements


This page provides a (currently short) description of requirements for constraints/shapes relevant to the RDF Data Shapes Working Group and their status.

Process

Requirements can be: 1) proposed, 2) under consideration, 3) approved, or 4) rejected.

Anyone can propose a requirement. Each requirement needs a short description, the name or initials of the person who proposed it, and possibly an example in some constraint/shape technology.

Once a requirement has been proposed, it has to be endorsed by at least three persons to move to being under consideration [1]. Requirements under consideration will then qualify for full attention by the WG, which will decide how to dispose of them.

The status of a requirement is indicated by one of the following:

  • Status: Proposed
  • Status: Under consideration
  • Status: Approved: <link to relevant resolution>
  • Status: Rejected

To endorse a requirement, just add your name or initials to the Votes line. If you're the third person, change the status from "Proposed" to "Under consideration" using the above markup.

Requirements

Higher-Level Language

Constraints/shapes shall be specifiable in a higher-level language with (1) definitional capabilities, such as rolling up and naming macros, and (2) control infrastructure, e.g. for recursion.

Concise Language

Constraints/shapes shall be specifiable in a concise language.

  • Status: Approved: F2F1 meeting 30 October 2014.
  • Derived from: Dublin Core Requirement 184.
  • Votes: ArthurRyman +0, HaroldSolbrig +1
  • Comment (hs): While this is a tad vague, we have three different uses for shapes:
    • Shapes as constraints: Determine whether an RDF Graph/Dataset meets the requirements asserted in a shape
    • Shapes as a query language: Use shape definitions to return a subset of a graph that meets the shape requirements. Note that this is not intended to be the same as SPARQL (although, arguably, SPARQL could be a "compiled" language), as the purpose is to isolate data sets that have the same form and data types -- it is possible that the referents may be quite different.
    • Shapes as documentation: Declare in a concise, unambiguous way the structuring rules that are in place in a given data set or triple store. It is this third use that leads to the "concise language" requirement.

Addressability

Collections of constraints/shapes may be addressable and discoverable. Individual constraints/shapes may be addressable and discoverable.

Annotations

Constraints/shapes may incorporate extra information that does not affect validation. It shall be possible to search for constraints/shapes with particular extra information.

Constraints/Shapes on Properties

Association of Class with Shape

There will be a property to connect a class to a shape, with the implication that every instance of that class must conform to that shape.

  • Status: Under consideration
  • Derived from: S3, S10, S11, S12, S13, S15, S19, S20, S29, S36
  • Votes: HK, DTA, SSt:0.5, ericP:+1, pfps:-0.5, labra: -1
  • Comment (pfps): What does it mean that a property is associated with a class? RDF does not have anything like this.
  • Answer (HK): RDF does not have this yet, but many user stories express the desire to have a way to tell agents which properties are recommended/required to be used for instances of a given class, e.g. to create input forms.
  • Answer (HK): I am open to that, if we can come up with an agreement on what Shape means. I am for example perfectly happy to have these "classes" be untyped and anonymous nodes, i.e. they are just some resource to which constraints can be attached. But the term "class" must be a subset of "shape" then, and here we re-enter a discussion that we didn't conclude a few weeks ago.
  • Objection (pfps): A constraint is something that can fail, I think. How would these constraints fail? (How would these shapes be unshapely?)
  • Answer (HK): This particular item here (Declaration of Member Properties at Classes) is really just a grouping parent for the more specialized requirements below. By itself, it doesn't have a condition that could fail, but it is rather a structural design principle that is needed to organize constraints.
  • Comment (SSt): Supporting this requirement as being a "grouping parent req." as HK mentioned.
  • Comment (hs): I also vote +1 to Eric's proposal. An obvious and common use of shapes would be the ability to say that all elements of type X should conform to shape S, but we also need to support the ability to define shapes independent of the associated type.
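
As a non-normative sketch of how such an association could be consumed, assume a hypothetical linking property ex:constrainedBy in an invented namespace <http://example.org/ns#>; a SPARQL engine could then enumerate which instances must conform to which shape:

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/ns#>
    # Given triples like "ex:Person ex:constrainedBy ex:PersonShape",
    # pair every instance (including instances of subclasses)
    # with the shape it must conform to.
    SELECT ?instance ?shape
    WHERE {
      ?class ex:constrainedBy ?shape .
      ?instance rdf:type/rdfs:subClassOf* ?class .
    }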

Property Min/Max Cardinality

The stated values for a property may be limited by minimum/maximum cardinality, with typical patterns being [0..1], [1..1], [0..*] and [1..*].

  • Status: Under consideration
  • Derived from: S10, S11, S13, S19, S20
  • Votes: HK, DTA, pfps, SSt:+1, ericP:+1, KC:+1, labra: +1
  • Comment: (KC) We have cases where there is min/max on types/graphs; there must be one instance of classX in each graph. Is that covered by this requirement?
  • Answer (HK): No, but the ability to express "there must be one instance of classX" is covered by a combination of other requirements: Complex constraints, aggregations (for counting), function macros and named graphs. However, I did notice that we have not yet recorded the requirement to have "global" (aka "static") constraints. I have added such an entry as "Static Constraints" below. You may want to add another user story to support this requirement.
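
As an illustration only (not a proposed syntax), a [1..1] cardinality on a hypothetical ex:name property of ex:Person instances could be checked with two SPARQL queries that each return the violating instances:

    PREFIX ex: <http://example.org/ns#>
    # Min cardinality 1: instances with no ex:name at all.
    SELECT ?person
    WHERE {
      ?person a ex:Person .
      FILTER NOT EXISTS { ?person ex:name ?name }
    }

    # Max cardinality 1: instances with more than one ex:name.
    SELECT ?person (COUNT(?name) AS ?count)
    WHERE { ?person a ex:Person ; ex:name ?name }
    GROUP BY ?person
    HAVING (COUNT(?name) > 1)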

Property Datatype

The values of a property may be limited by their datatype, such as xsd:string or xsd:date.

  • Status: Under consideration
  • Derived from: S10, S11, S13, S19, S20
  • Votes: HK, pfps, SSt:+1, KC:+1, ericP:-1
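
A minimal SPARQL sketch of a datatype check, with ex:birthDate as an invented example property:

    PREFIX ex:  <http://example.org/ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    # Flag values of ex:birthDate that are not xsd:date literals.
    SELECT ?s ?value
    WHERE {
      ?s ex:birthDate ?value .
      FILTER (!isLiteral(?value) || datatype(?value) != xsd:date)
    }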

Property Type

The values of a property on instances of a class may be limited by their RDF type, such as ex:Person (or a subclass thereof).

  • Status: Under consideration
  • Derived from: S10, S11, S13, S19, S20
  • Votes: HK, pfps, SSt:+1, KC+1, ericP:-.5
  • Objection (ericP): redundant against #Association of Class with Shape
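
A sketch of a value-type check in SPARQL, assuming an invented ex:parent property; the subclass traversal mirrors the "(or a subclass thereof)" wording above:

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/ns#>
    # Flag ex:parent values that are not instances of ex:Person or a subclass.
    SELECT ?s ?value
    WHERE {
      ?s ex:parent ?value .
      FILTER NOT EXISTS { ?value rdf:type/rdfs:subClassOf* ex:Person }
    }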

Property's RDF Node Type (e.g. only IRIs are allowed)

The values of a property may be limited by their RDF node type, e.g. IRI, BlankNode, Literal, or BlankNodeOrIRI (for completeness we may want to support all 7 combinations including Node as parent).

  • Status: Under consideration
  • Derived from: S8
  • Votes: HK, ericP:+1, pfps:-1, SSt:+1, KC:+0, hs:+1
  • Comment: (KC) Is the only difference between this requirement and the one above the ability to require blank nodes? Because this seems to me to be a requirement that allows multiple value types, and those types are "OR"d. This could be useful in cases where the value is: "select from this list, or provide a new value not from the list."
  • Answer (HK): Node type and value type are orthogonal to each other, and restrict another dimension of the RDF data model (value type is about rdf:type triples, while node type is about the kind of RDF node). To create a union of classes, it is already possible to create a shared superclass (e.g. PersonOrOrganization which has Person and Organization as subclasses). The issue about "or provide a new value not from the list" looks to me like something that needs to be solved by the UI tool, which could create a new instance on the fly, but does not look like a structural constraint to me.
  • Answer (HK): Yes that sounds good. I had the term "Property" in all sibling requirements, so for consistency I have changed it to "Declaration of Property's RDF Node Type" if that's OK. Maybe we can also use a different term altogether, such as "Node Kind" but I am not sure what is established.
  • Objection (pfps): There needs to be more work on this requirement. How does it interact with entailment, or skolemization, or ...?
  • Answer (HK): Ok, let's try to get this sorted out. I believe that entailment is orthogonal to this requirement, and would impact almost any other requirement too. I believe we have an open ISSUE to discuss entailment in general. I would define the behavior of this using the SPARQL built-ins isIRI, isBlank, isLiteral. If a database decides to turn a blank node into an IRI (skolemization?) on the fly then it violates the semantics of SPARQL. Overall, I don't believe you really object to the requirement (there are User Stories about this), but rather to the solution, and we need to look into the details of that once we have a proposal.
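
Following HK's pointer to the SPARQL built-ins, a node-kind check could look like this sketch (ex:homepage is an invented example property):

    PREFIX ex: <http://example.org/ns#>
    # Node-kind check: values of ex:homepage must be IRIs.
    SELECT ?s ?value
    WHERE {
      ?s ex:homepage ?value .
      FILTER (!isIRI(?value))
    }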

Property Default Value

It should be possible to provide a default value for a given property, e.g. so that input forms can be pre-populated and to insert a required property that is missing in a web service call.

  • Status: Proposed
  • Derived from: S11, S19, S20
  • Votes: HK, SSt:+0.5, ericP:+.5, pfps:-1, KC+1, labra: 0
  • Objection (pfps): I think this needs to be considered by the working group, but I would vote against it in its current form.
  • Comment (SSt): What's the issue with its current form?
  • Comment (pfps): I'm against constraints that are not constraints. How would this fail?
  • Answer (HK): I agree this is not a condition that can fail. However, I also think the user stories and the discussions that led to the formation of this group (e.g. the Resource Shapes stories) have expressed the desire to not only have constraints as conditions, but also as ways to express structural information that can be used to populate input forms, call web services etc. Filling in a default value if no other value has been given sounds like a very useful thing to have from this perspective, and is a requirement that we have encountered very frequently in customer situations. For example SPIN itself uses spl:defaultValue internally, to fill in unspecified arguments of template calls. (Based on your -1 I have changed the status back to Proposed).
  • Comment (SSt): I second pfps's concerns about being "non-failable" but also agree with HK's answer. Maybe we should discuss the more general problem of having useful requirements which are not able to fail per se in the next call.
  • Comment (labra): Adding default values to the input graph that we are validating means that we are changing that input graph. I think it would increase the complexity of the solutions, because we should consider at which moment we add those default values, and it would be more difficult to have a declarative way to reason about the whole process of validation. Anyway, if people think it is really necessary, I would not oppose it.
  • Answer (HK): Absolutely: this requirement is only about using default values to inform things like input forms or web services, but not to automatically "infer" any new triples from this info while constraints are evaluated. This would indeed require a difficult infrastructure and have serious performance issues. I have clarified this above. If you are satisfied, please remove your comment and my response, and possibly adjust your vote.

Property Labels

It should be possible to provide human-readable labels of a property in a shape, not just globally for the rdf:Property. Multiple languages should be supported.

  • Status: Proposed
  • Derived from: S11, S19
  • Votes: HK, ericP:0, pfps:-1, SSt:0, KC+1
  • Comment (pfps): I would need some indication of how this is a constraint before I would vote for it. I also do not understand what rdf:Property is doing here.
  • Answer (HK): This is not a constraint in the sense of a checkable condition. However, the goal of this spec is to provide *structural* constraints, and labels and comments are structurally useful for building user interfaces and documenting the intent of a property.
  • Comment (SSt): So basically, if property P is used within a constraint C for class C1 and within a constraint C' for class C2 one should be able to define labels for its representation in C as well as C' (which might differ from each other)?
  • Answer (HK): Yes. An example would be ex:code to have label "ISO code" in class 1 and just "code" in class 2.
  • Objection (pfps): This doesn't seem to have anything to do with constraints. I think that this can be covered for constraints by having descriptive messages coming from constraint violations.
  • Answer (HK): Again, this will become clearer once we look at specific solution languages. If the language has a mechanism to attach properties to classes (such as oslc:property) then it makes sense to reuse the same (blank) node to also hold the label - from a user's point of view.

Property Comment

It should be possible to provide human-readable descriptions of the role of a property in a shape, not just globally using triples that have the rdf:Property as subject. Multiple languages should be supported.

  • Status: Proposed
  • Derived from: S11, S19
  • Votes: HK, ericP:-.5, pfps:-1, KC+1
  • Comment (pfps): I would need some indication of how this is a constraint before I would vote for it. I also do not understand what rdf:Property is doing here.
  • Answer (HK): See my answer on labels above. With rdf:Property I was referring to triples that have the property itself as the subject, e.g. ex:name rdfs:label "name". Clarified above.
  • Objection (pfps): This doesn't seem to have anything to do with constraints. I think that this can be covered for constraints by having descriptive messages coming from constraint violations.
  • Answer (HK): Again, this will become clearer once we look at specific solution languages. If the language has a mechanism to attach properties to classes (such as oslc:property) then it makes sense to reuse the same (blank) node to also hold a comment - from a user's point of view.

Datatype Property Facets

For datatype properties it should be possible to define frequently needed "facets" to drive user interfaces and validate input against simple conditions, including min/max value, regular expressions, string length etc., similar to the facets of XSD datatypes.

  • Status: Proposed
  • Derived from: S3, S11, S12, S13, S19, S20, S29
  • Votes: HK, ericP:-1, labra: 0, KC:+0
  • Comment (pfps): Is this the *ability* to do this or just the ability to do it *easily*?
  • Answer (HK): We could enumerate all the facets that we want to cover, including those from XML Schema (and later OWL 2 datatypes). I wanted to keep it short for now. Whether something is easy is relative, so for now it's more about the ability itself.
  • Comment (DTA): I am confused by these requirements. It seems as if the same property will be displayed and will behave differently when it is used for a member of one class than when used for a member of another. This seems to be what we have normally thought of as two different properties. Am I confusing what is going on here?
  • Answer (HK): This resembles local owl:Restrictions more than global rdfs:ranges. You can have multiple owl:Restrictions with different value types in different classes, and this requirement is about similar scenarios. In other words, the same property IRI can appear in different classes.
  • Objection (ericP): I'd like to treat facets as separate reqs. I've voted +1 for min/max above; I'd vote 0 for regex and strlen.
  • Answer (HK): I don't understand why you voted -1 for the overall ticket. If you want to split the current requirement, then please feel free to edit this wiki page and create new tickets. Then you should be able to vote +1 for those that you support and leave the others on 0.

Property Value Enumerations

Shapes will provide exhaustive enumerations of the valid values (both literals and other nodes).

  • Status: Under consideration
  • Derived from: S3, S11, S37
  • Votes: HK, DTA, KC+1, ericP:+1, pfps:+1, SSt:+1, labra: +1, hs:+1
  • Comment (DTA): I am supporting this with the understanding that a solution might be as simple as using something like a skos:Concept to mean a dedicated picklist for this situation (i.e., I am reading this as a requirement, not as a spec for a solution).
  • Comment (KC) Note that S37 provides cases not explicitly mentioned here.
  • Answer (HK): Yes, S37 has additional requirements that are covered elsewhere. If you have uncovered requirements from S37, please feel free to add them as new entries.
  • Comment (KC): The "exhaustive enumeration" bothers me, because we work with "enumerations" in the hundreds of thousands. Can I indicate that the value space must come from one or more vocabularies as defined by their URI pattern? The DTA use of skos:Concept can sometimes help, but not all lists are SKOS lists. (e.g. Geonames, DBpedia)
  • Answer (HK): Restricting by graph or namespace is not covered by this requirement here. It could be expressed using the Complex constraints (assuming they are in SPARQL). All of the "Declarations of..." requirements are simply syntactic sugar for the most commonly needed patterns. If you believe that restricting by graph or namespace is common enough then please feel free to record it as an additional requirement in the same category as this ticket.
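
A minimal SPARQL sketch of an enumeration check, with invented ex: terms:

    PREFIX ex: <http://example.org/ns#>
    # Flag ex:status values outside the enumerated set.
    SELECT ?s ?value
    WHERE {
      ?s ex:status ?value .
      FILTER (?value NOT IN (ex:Draft, ex:Approved, ex:Rejected))
    }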

Properties Used in Inverse Direction

Shapes will provide a syntax to require arcs into a node.

  • Status: Under consideration
  • Derived from: S36
  • Votes: HK, pfps, SSt:+1, ericP:+1, KC:0, labra: +1, hs:+1
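
A sketch of an inverse-direction constraint in SPARQL, using invented ex: terms: every invoice must have at least one incoming arc.

    PREFIX ex: <http://example.org/ns#>
    # Require an incoming ex:issuedInvoice arc for every invoice node.
    SELECT ?invoice
    WHERE {
      ?invoice a ex:Invoice .
      FILTER NOT EXISTS { ?order ex:issuedInvoice ?invoice }
    }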

Primary Key

It is often useful to state that a given (datatype) property is the "primary key" of a class, so that the system can enforce uniqueness and also automatically build URIs from user input and from data imported from relational databases or spreadsheets.

  • Status: Proposed
  • Derived from: S5, S25
  • Votes: HK, SSt:+0.5, ericP:0, labra: -0.5, KC:0
  • Comment (pfps): What is a primary key? Is there any constraint that derives from this, or should this just be for any key?
  • Comment (pfps): Is there a constraint component of the building-URIs part of this requirement?
  • Answer (HK): This is all detailed at https://www.w3.org/2014/data-shapes/wiki/Primary_Keys_with_URI_Pattern (see the SPIN snippet at the end) and I would not want to repeat all this info here.
  • Comment (pfps): What makes a primary key different from a key?
  • Answer (HK): I am not sure if the term "key" is clear enough, but sure this could be changed.
  • Comment (ericP): The use cases I've seen involve quite a lot of complexity because the key is not only unique to a particular graph, but instead to a business process which spans many graphs and datasets, e.g. medical terminology codes, UPC codes, etc.
  • Answer (HK): Yes there are complex cases where no simple design pattern is sufficient. Those would need to be expressed as "complex constraint". If a key must be unique across multiple graphs then the graph declaring the constraint needs to reference the other graphs (e.g. via an include).
  • Objection (labra): Although I think adding primary keys may be interesting for simple cases, doing it right may increase the complexity of the solution as we should consider cases like compound keys, uniqueness, null values, key-references, etc.
  • Answer (HK): Yes, this ticket here would only cover the basic case of a single property. More complex cases can be represented using new templates that anybody can create and publish.
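
The uniqueness half of this requirement can be sketched in SPARQL (invented ex: terms; the URI-building half is detailed on the wiki page linked above):

    PREFIX ex: <http://example.org/ns#>
    # Two distinct products must never share the same ex:productCode.
    SELECT ?key ?a ?b
    WHERE {
      ?a a ex:Product ; ex:productCode ?key .
      ?b a ex:Product ; ex:productCode ?key .
      FILTER (?a != ?b)
    }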

Complex Constraints

Constraint Extensions

Shapes will have a defined extension mechanism enabling other languages to provide supplementary constraints (e.g. basic graph patterns, string and mathematical operations, and comparison of multiple values).

  • Status: Under consideration
  • Derived from: S5, S21, S22, S23, S25, S26, S27, S30
  • Votes: HK, SSt:+1, KC:+1, labra: 0
  • Comment (labra): I agree with some of the "complex constraints", but I am not so sure about others

Expressivity: Basic Graph Patterns

Many constraints require matching patterns within the graph, often represented via linked triple patterns (SPO) and property paths. This requires variable bindings for matching, so that multiple values can be compared with each other.

  • Status: Proposed
  • Derived from: S1, S5, S17, S21, S22, S23, S26, S27, S30
  • Votes: HK, SSt:+1, pfps:0.5, KC+0, labra: -0.5, ericP:-1
  • Comment (pfps): Is there a constraint component of the variable bindings part of this requirement?
  • Answer (HK): I wanted to express that many constraints require comparing multiple values (added).
  • Comment (pfps): Is "variable binding" the requirement, or just being able to compare multiple values?
  • Answer (HK): I guess it's the ability to pick a node from one place in the graph and compare it with another node in another place in the graph.
  • Comment (pfps): As long as the requirement is not directly for variable bindings, but instead for being able to compare then I'm fine with this.
  • Comment (SSt): So e.g. a constraint should be able to check whether two particular values of ex:name are identical?
  • Answer (HK): Yes, checking that two instances have the same value is a classical use case here, but many others too.
  • Objection (labra): The requirement looks too SPARQL-specific. Is there some simple motivating example? I was looking at the list of user stories that are mentioned and it was not clear.
  • Comment (pfps): If pattern is not read as "SPARQL BGP" then pattern is probably OK, but I agree that some more neutral language would be better.
  • Answer (HK): I have no suggestion for a more neutral language. I really just want the equivalent of SPARQL BGPs, and we have plenty of practical examples almost everywhere.
  • Objection (ericP): +1 to labra: PROPOSE: Withdraw: redundant against #Constraint Extensions
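
A sketch of the kind of linked triple patterns meant here, using invented ex: terms: a shared variable binds nodes from two places in the graph so they can be compared.

    PREFIX ex: <http://example.org/ns#>
    # A person's employer must list that person back as an employee.
    SELECT ?person ?company
    WHERE {
      ?person ex:worksFor ?company .
      FILTER NOT EXISTS { ?company ex:employee ?person }
    }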

Expressivity: Non-Existence of Patterns

Shapes will constrain on the absence as well as the presence of arcs.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S1, S2, S22, S23
  • Votes: HK, pfps, SSt:+1, ericP:+1, KC+1, labra: +1, hs:+1
  • Comment (pfps): I'm reading this as being about the absence of information, such as having no fillers, or at most one filler.
  • Answer (HK): It's basically FILTER NOT EXISTS in SPARQL.
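
Following HK's remark, a sketch with FILTER NOT EXISTS (invented ex: terms):

    PREFIX ex: <http://example.org/ns#>
    # Absence check: employment records with an end date but no start date.
    SELECT ?e
    WHERE {
      ?e a ex:Employment ; ex:endDate ?end .
      FILTER NOT EXISTS { ?e ex:startDate ?start }
    }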

Expressivity: String Operations

Shapes will provide string and URI manipulation.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5, S23
  • Votes: HK, SSt:+1, DK+1, labra: 0, KC:0
  • Comment (pfps): If S5 is going to be appealed to in this requirement, the story should provide an example of string operations in the story list.
  • Answer (HK): The EPIM_ReportingHub already had some examples, but I have added details.
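
A sketch of string/URI manipulation in SPARQL, with an invented URI pattern:

    PREFIX ex: <http://example.org/ns#>
    # A product's IRI must be the fixed base plus its product code.
    SELECT ?product
    WHERE {
      ?product a ex:Product ; ex:productCode ?code .
      FILTER (STR(?product) != CONCAT("http://example.org/product/", STR(?code)))
    }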

Expressivity: Language Tags

Shapes will constrain on presence and uniqueness of language tags.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S21
  • Votes: HK, SSt:+1, KC+1, labra:+1, DK:+1
  • Comment (labra): I voted +1 to have mechanisms to compare language tags although I think the last phrase "Also to produce multi-lingual error messages" should be a separate requirement.
  • Answer (HK): Yes good point, I have added a sentence about multi-lingual constraint messages to the corresponding ticket below.
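
A sketch of presence and uniqueness checks on language tags, here against SKOS labels:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    # Presence: every concept needs an English prefLabel.
    SELECT ?concept
    WHERE {
      ?concept a skos:Concept .
      FILTER NOT EXISTS {
        ?concept skos:prefLabel ?label .
        FILTER (langMatches(lang(?label), "en"))
      }
    }

    # Uniqueness: at most one prefLabel per language tag.
    SELECT ?concept ?lang (COUNT(?label) AS ?n)
    WHERE { ?concept skos:prefLabel ?label }
    GROUP BY ?concept (lang(?label) AS ?lang)
    HAVING (COUNT(?label) > 1)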

Expressivity: Mathematical Operations

Shapes will provide numeric manipulation.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5
  • Votes: HK, SSt:+1, DK+1, KC+1, labra: 0, hs:0
  • Comment (pfps): If S5 is going to be appealed to in this requirement, the story should provide an example of mathematical operations in the story list.
  • Answer (HK): Added an example of the + operator to the EPIM ReportingHub page. If you want, I can request adding a new story on QUDT unit conversion or the Rectangle example mentioned above. But the need for these things should be obvious.
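
A sketch of a numeric check using the + operator, in the spirit of the EPIM example (invented ex: terms):

    PREFIX ex: <http://example.org/ns#>
    # An invoice total must equal net amount plus tax.
    SELECT ?invoice
    WHERE {
      ?invoice ex:net ?net ; ex:tax ?tax ; ex:total ?total .
      FILTER (?total != ?net + ?tax)
    }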

Expressivity: Literal Value Comparison

Shapes will provide numeric and date comparison.
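
A minimal sketch of a date comparison in SPARQL (invented ex: terms):

    PREFIX ex: <http://example.org/ns#>
    # An event must not end before it starts.
    SELECT ?event
    WHERE {
      ?event ex:startDate ?start ; ex:endDate ?end .
      FILTER (?end < ?start)
    }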

Expressivity: Logical Operators

The language should make it possible to express the basic logical operators intersection, union and negation of conditions.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5, S26, S33
  • Votes: HK, pfps, SSt:+1, KC+1, labra: 0, hs:+1

Expressivity: Transitive Property Traversal

Some constraints need to be able to traverse a property transitively, especially rdfs:subClassOf but also other parent-child relationships.

  • Status: Under consideration
  • Derived from: S16, S23, S26
  • Votes: HK, pfps, SSt:+1, KC+1, labra: -0.5, ericP: -0.5
  • Comment (labra): I am not sure if this is about adding some kind of inference or just traversing a property as in SPARQL property paths.
  • Answer (HK): This is about SPARQL property paths only, no inferencing in the sense of adding triples.
  • Objection (labra): It should be clearer whether we are talking only about "transitive property traversal" or about general property paths as in SPARQL. Adding general property paths could increase the complexity of the solution too much, as described in this paper.
  • Answer (HK): The ticket is only mentioning transitive property traversal, so I believe you can withdraw your objection. Having said this, I currently expect the group to agree on SPARQL for the formalism, so the actual specification may include any SPARQL-based path.
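
A sketch of transitive traversal with a SPARQL property path (ex:Agent is an invented class):

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/ns#>
    # Every declared type must be ex:Agent or a (transitive) subclass of it.
    SELECT ?x ?type
    WHERE {
      ?x a ?type .
      FILTER NOT EXISTS { ?type rdfs:subClassOf* ex:Agent }
    }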

Expressivity: Aggregations

Some constraints require aggregating multiple values, especially via COUNT, MIN and MAX.

  • Status: Under consideration
  • Derived from: S22
  • Votes: HK, SSt:+0.5, DK, KC-0, labra: -0.5, ericP: -1
  • Comment (pfps): If S22 is going to be appealed to in this requirement, the story list should have an example of the needed expressivity.
  • Answer (HK): The linked page http://www.w3.org/TR/vocab-data-cube/#wf-rules already has examples for MIN and COUNT (scroll down to IC-12 and IC-17).
  • Comment (SSt): It may be difficult to link to values that should be aggregated without having some sort of nested querying?
  • Answer (HK): Nesting is, for example, covered by user-defined functions (like in SPIN).
  • Objection (labra): Adding aggregations would probably increase the complexity of the language, especially if they have to be covered by user-defined functions.
  • Answer (HK): Aggregations are needed to count property values, so if we want the language to be self-describing then we need to support them. Also, they are part of SPARQL already so all we need to agree on would be to allow any SPARQL query, adding zero implementation costs.
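
A sketch of an aggregation constraint with COUNT, loosely modeled on the Data Cube integrity checks cited above (invented ex: terms):

    PREFIX ex: <http://example.org/ns#>
    # Every observation must have exactly one ex:measure value.
    SELECT ?obs (COUNT(?m) AS ?n)
    WHERE {
      ?obs a ex:Observation .
      OPTIONAL { ?obs ex:measure ?m }
    }
    GROUP BY ?obs
    HAVING (COUNT(?m) != 1)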

Expressivity: Named Graphs

Some constraints require looking up information from other named graphs, for example to verify that certain values exist in a controlled vocabulary or background knowledge. This information is usually not explicitly imported into the query graph, and having all sub-graphs in the default query graph would be too inefficient.

  • Status: Proposed
  • Derived from: S5
  • Votes: HK, SSt:+0.5
  • Comment (pfps): This may be about graphs that are accessible via URLs, not named graphs.
  • Answer (HK): No, it's about named graphs, independently of where they are retrieved from. In the EPIM use case, a named graph of background knowledge (NPD Fact Pages) lived in a separate file on the server's hard drive, but that's not an important consideration.
  • Comment (KC): I would support a requirement for graphs that are accessible via URLs, named or not. This is needed in my area.
  • Answer (HK): The mechanisms to look up graphs are already covered by general Linked Data principles and the SPARQL SERVICE keyword in particular.
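
A sketch of a named-graph lookup in SPARQL; the graph name ex:countryVocabulary is invented:

    PREFIX ex: <http://example.org/ns#>
    # Every ex:country value must exist in the controlled vocabulary graph.
    SELECT ?s ?country
    WHERE {
      ?s ex:country ?country .
      FILTER NOT EXISTS {
        GRAPH ex:countryVocabulary { ?country a ex:Country }
      }
    }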

Expressivity: Closed Shapes

Some data recipients will not act as generic triple stores. "Closed shapes" reject any graph that has triples that do not match something in the shapes. (The control can probably be applied to the whole schema rather than individual shapes. At least, there's no use case or implementation experience to the contrary.)

  • Examples: issue report.
  • Status: Under consideration
  • Derived from: S4
  • Votes: ericP, SSt:+1, labra: +1
  • Comment (SSt): So basically letting the validation fail if (crucial) information is missing? Maybe we would then need some sort of mechanism to indicate "must have" info and "if present then has shape xyz" information.
  • Comment (HK): I think this requirement can already be represented by "Non-Existence of Patterns" (FILTER NOT EXISTS).
  • Comment (HK): I did not vote yet, but I am not very positive about this requirement. One of the main features of RDF is extensibility. Not allowing extra annotation properties etc. to be added is probably taking the closed world a bit too far. Why can't the tools that process the incoming data just ignore any triples that have no meaning to them? Most of the "closing off" already happens on a per-property level (e.g. maxCardinality). To make a stronger point, maybe you can point at more user stories.
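
For illustration, a closed-shape check could be approximated in SPARQL by whitelisting predicates (invented ex: terms); this is one reading of the requirement, not a proposed design:

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/ns#>
    # Report any predicate on a Person outside the explicitly allowed set.
    SELECT ?s ?p
    WHERE {
      ?s a ex:Person ; ?p ?o .
      FILTER (?p NOT IN (rdf:type, ex:name, ex:email))
    }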

Macro-Language Features

The language should enable the definition of macros as shortcuts to recurring patterns, and enable inexperienced users to define rich constraints. Macros should be high-level terms that improve overall readability, separation of concerns and maintainability. This overlaps with the already approved "Higher-Level Language".

  • Status: Under consideration
  • Derived from: S5, S7, S16, S21, S27, S28, S32
  • Votes: HK, pfps, SSt:+1, KC+1, labra: 0
  • Comment (KC): with the caveat that we need to determine if these macro features are duplicated in modularization.
  • Comment (labra): It should be clearer what you mean by "macros"; sometimes it seems a macro is the same as a shape in ShEx, while at other times it looks like something else.

Shape Addressability

It should be possible to encapsulate a group of constraints (a Shape) into a named entity, so that the Shape can be reused in multiple places, also across the Web.

  • Status: Under consideration
  • Derived from: S7, S16, S28
  • Votes: HK, pfps, SSt:+1, KC+1, labra: +1
  • Comment (labra): This requirement is very similar to the Addressability requirement that has already been approved.

Function and Property Macros

In order to support maintainable and readable constraints, it should be possible to encapsulate recurring patterns into named entities such as functions and dynamically computed properties. This requirement is orthogonal to almost every user story.

  • Status: Under consideration
  • Derived from: S5, S16, S28
  • Votes: HK, pfps, SSt:+1, KC+1, labra: -0.5, ericP: -0.5
  • Objection (labra): It is not clear from the name of the requirement and the description what the recurring pattern being encapsulated is; it looks similar to a Shape as in ShEx, but it could be clarified, maybe with some example.
  • Answer (HK): Here I was trying to abstract away the concept of user-defined SPIN/LDOM functions to extend the expressivity and maintainability of SPARQL queries.
  • Objection (ericP): Why do we need this and #Constraint_Macros?
  • Answer (HK): Constraint macros would only allow users to reuse a complete constraint (e.g. a complete SPARQL query). The user-defined functions are useful for any SPARQL query, including direct SPARQL constraints. Furthermore, having functions allows us to define more readable queries ourselves, e.g. when we formalize the semantics of ldom:maxCount in SPARQL (using a function like ldom:valueCount). Finally, functions allow users to express certain recursive use cases.

Constraint Macros

Some constraint patterns recur with only slight modifications. Example: the SKOS constraint that multiple label properties must be pairwise disjoint. The language should make it possible to encapsulate such recurring patterns in a parameterizable form. Examples include SPIN/LDOM Templates.
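
A sketch of such a template body, instantiated here for skos:prefLabel and skos:altLabel; in an actual macro the two properties would be parameters:

    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    # Pairwise disjointness: the same value must not appear
    # as both prefLabel and altLabel of one concept.
    SELECT ?concept ?label
    WHERE {
      ?concept skos:prefLabel ?label .
      ?concept skos:altLabel  ?label .
    }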

Nested Constraint Macros

It should be possible to combine the high-level terms of the constraint language into larger expressions using nested constraints. Examples of this include ShEx, Resource Shapes' oslc:valueShape and owl:allValuesFrom.

  • Status: Under consideration
  • Derived from: S32, S33
  • Votes: HK, pfps, SSt:+1, KC+1, labra: +0.5
  • Comment (pfps): This doesn't look much like a macro facility, just the ability to have nesting within constraints.
  • Answer (HK): The reason why I have organized this as a Macro feature is that this is about reusing high-level macros (e.g. implemented as SPIN Templates) instead of writing low-level constraints (e.g. in SPARQL). SPARQL already has "nesting" built-in, so this ticket is about reusing higher level terms such as min/max cardinality in a ShEx-like data structure.
  • Comment (KC): but I think this duplicates some of the requirements for modularity, so I think we should clarify this and eliminate duplicates.
  • Comment (labra): I agree with the concept of Nested definitions but I think the requirement should be called "nested shapes". In that case, I think it is important to be able to describe nested definitions like having a "Course" shape with a property ":student" that links to a "Student" shape, and a "Student" shape with a property ":course" that links to Courses.
  • Answer (HK): This requires a firmer definition of what a Shape is. To me it is a collection of constraints at a given node. But the shape of something can also be expressed via SPARQL queries (BGPs etc). So in this ticket I wanted to be clear that we are allowing Templates to be nested to build :valueShape-like expression trees.



Inheritance of Constraints

According to the general semantics of RDF Schema (and the intuition of most users), any instance of a subclass is also an instance of a superclass. Therefore, any constraints on the superclass also need to apply to the subclasses. Subclasses can only further constrain, i.e. narrow down, inherited constraints.

  • Status: Proposed
  • Derived from: S2, S5, S10, S11, S19, S20, S24, S25, S27, S28, S29
  • Votes: HK, KC+1, labra: -0.5, SSt: +1
  • Comment (pfps): I think that we need to be careful when talking about inheritance. We could instead just say that constraints on classes apply to all instances of the class, particularly including instances of subclasses.
  • Answer (HK): You may need to convince me about this one. I think the interpretation of rdfs:subClassOf (e.g. in OWL 2 RL) is quite consistent with the usual meaning of inheritance in an OO sense: all instances of subclasses are also instances of superclasses. Why do you think the term inheritance is problematic? I am not talking about the non-inheritance of rdfs:domain here.
  • Comment (pfps): Inheritance is generally used to mean the sort of inheritance that you get in OO languages, where overriding and priority become important. If these aspects of inheritance are not wanted, then I think a different word should be used.
  • Answer (HK): Looking at http://en.wikipedia.org/wiki/Inheritance_%28object-oriented_programming%29 I think this matches quite well, while "subtyping" does not match well (we want behavior to be inherited). Technically, our approach would not have a syntactic means to "override" something. Instead, the system would always execute all constraints from superclasses, and instances need to fulfill the intersection of all constraints. As a consequence, people can use de-facto overriding to narrow down the cardinality or value type of a property in a subclass.
  • Comment (KC): I vote +1 for this, but substituting Peter's suggested wording.
  • Objection (labra): After reading the description I think this requirement is more about a set of constraints (or shape) that extends another set of constraints (another shape), rather than about classes and subclasses as in OWL. As there is another issue pending about separating types/classes from shapes, I think we should change the description to reflect that difference.
  • Comment (SSt): +1 for the general intent of this requirement.

Global Constraints

It should be possible to specify constraint conditions that need to be checked "globally" for a whole graph, without referring to a specific set of resources or class. In programming languages such global entities are often called "static", but "global" is probably better known.

  • Status: Under consideration
  • Derived from: S35
  • Votes: HK, pfps, KC +1, labra: 0, SSt: +1
  • Comment (labra): I think this requirement is more about the process of how to select which nodes are being validated (all the nodes vs the nodes that belong to some class). Maybe, we could have a different requirement about how to select the nodes that will be validated.
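
A sketch of a global constraint as a SPARQL ASK over the whole graph, echoing the "one instance of classX per graph" example discussed under cardinality above (ex:Configuration is invented); true means the graph violates the condition:

    PREFIX ex: <http://example.org/ns#>
    # The graph must contain exactly one ex:Configuration instance.
    ASK {
      { SELECT (COUNT(?x) AS ?n) WHERE { ?x a ex:Configuration } }
      FILTER (?n != 1)
    }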

Vocabulary for Constraint Violations

Instead of just reporting yes/no, the language needs to be able to return more meaningful messages including severity levels, human-readable error descriptions and pointers at specific patterns in the graph.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S3, S34 (and almost every other User Story)
  • Votes: HK, DK, pfps, SSt:+1, KC, labra: 0.5
  • Comment (DK): I am not sure if the group header is representative, would something like "Supported Decorations for constraints" fit better? Vocabulary is too general and can cover many other requirements.
  • Comment (DK): minor description revision suggestion: "The constraint language should facilitate the reporting of more meaningful [...]"
  • Comment (labra): Maybe we should rewrite this requirement to state that instead of yes/no, the language needs to return more meaningful information about the validation process. That "meaningful information" could be something like the PSVI (Post-Schema-Validation Infoset) from XML Schema, which could later be converted to human-readable messages. We could generate a PSVG (Post-Schema-Validation Graph), which could be the input RDF graph enriched so that nodes carry properties about their shapes.
  • Answer (HK): The language should be easy to use, so adding a layer of complexity between user message and the engine may backfire. But for nested shapes (e.g. something is violated deep within a tree), then a path to that should certainly be possible. In my current SPIN experiments, I added a property that links a parent constraint violation with children that provide details.
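
A sketch of how a violation could be described as RDF rather than a bare yes/no, reusing the ldom:root/ldom:path/ldom:value properties HK mentions elsewhere on this page; ldom:message and the ldom: namespace URI are placeholders:

    PREFIX ex:   <http://example.org/ns#>
    PREFIX ldom: <http://example.org/ldom#>  # placeholder namespace
    # Construct a violation resource with pointers into the graph.
    CONSTRUCT {
      _:v a ldom:ConstraintViolation ;
          ldom:root    ?person ;
          ldom:path    ex:child ;
          ldom:value   ?person ;
          ldom:message "A person cannot have itself as a child" .
    }
    WHERE {
      ?person ex:child ?person .
    }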

Severity Levels

The language should include at least the following severity levels: Warning, Error, Fatal Error (Fatal means evaluation can stop) but should allow the creation of other levels as desired, through subclassing. Maybe we also need Info/Debug?

  • Status: Under consideration
  • Derived from: S3
  • Votes: HK, DK+1, SSt:+0.5, KC:+1, labra: -0.5
  • Comment (DK): I have a few existing use cases for "info" e.g. in an owl:SymmetricProperty we check that statements about the involved resources are made in both directions.
  • Comment (KC): I think it would be better to indicate that there will be the capability for severity levels; that declaration of these will be part of the development of the validation profile; and perhaps that a small number will be included as defaults (but I'm not sure why that is necessary).
  • Answer (HK): The design that I have in mind would have a hard-coded class :ConstraintViolation and any number of subclasses, including :Error, :Warning etc. Users are free to add more subclasses, but it would make everyone's life easier if we agreed on a fixed vocabulary for the most commonly needed levels.
  • Objection (labra): I think the constraint validation process can be a yes/no process in order to simplify the solution. However, we could model severity levels with different shapes or by adding some meta-information about the constraints which could indicate whether not satisfying them would imply a Warning, Error, Fatal Error, etc.
  • Answer (HK): Yes/no would not be sufficient. In the LDOM draft, each constraint may have a field ldom:level that indicates the severity. Maybe this would address your requirement. However I feel you are already talking about a solution while here we are just collecting requirements.
  • Comment (labra): I was trying to avoid talking about a solution and in fact, I think what I proposed was the same as you: To add some meta-information (the field ldom:level, for example) that indicates the severity. But at the end, the validation process returns a yes/no on that constraint. What I meant is that the validation process could remain as yes/no, while the constraints are the ones that have the meta-information about their severity.
  • Comment (HK): Some constraints require dynamically picking the property that caused the violation, dynamically constructing the error message, and dynamically identifying the root and end node that caused the violation. Therefore LDOM supports ASK, SELECT or CONSTRUCT constraints, and only having YES/NO is not sufficient. With ASK and SELECT, the level can be declared for all constructed violations, but for CONSTRUCT this is not easily possible.
  • Comment (labra): I think you are now talking about a solution :), anyway, I think what you propose is precisely what I called before as a Post-Schema-Validation-Graph, i.e. some information that is dynamically generated by the validation processor and that can be returned by it. So the validation process returns YES/NO with a data structure that contains information about the validation. That information should be machine processable and could easily be converted to human-readable messages.

Human-readable Violation Messages

The language should make it possible for constraint checks to create human-readable violation messages that can be either created explicitly by the user or generated dynamically from the constraint definition. It should be possible to create such messages in multiple languages.

  • Status: Under consideration
  • Derived from: S3
  • Votes: HK, DK+1, pfps, SSt:+1, KC, labra: -0.5
  • Objection (labra): I would prefer that the result of the validation process was some structure that could be processed by machines. This structure could later be converted to human-readable violation messages in multiple languages. I am not sure if we should provide such a conversion as a requirement.
  • Answer (HK): The machine-processable bit is covered by the other requirements, e.g. the ability to point at a path and a specific node. That should be enough to, for example, highlight a property value on a form. The ability to create a human-readable message however is still needed and cannot be generalized, because such messages are often ontology-specific (e.g. "Company Z is not the operator of wellbore XY" in EPIM). How could that be machine-generated, and why the overhead if we can simply concatenate a string?
  • Comment (labra): To clarify my point of view, I would vote in favour of a separation of concerns here. We could define the data structure that the validation process returns (I would vote +1 for this), and the conversion from that data structure to human-readable messages (I would vote 0 for this)
  • Answer (HK): It is clear that from a violation data structure, a tool can provide generic human-readable messages. For example, ldom:root=ex:MyPerson, ldom:path=ex:child, ldom:value=ex:MyPerson could produce: "Constraint violation at ex:MyPerson: invalid value ex:MyPerson for property ex:child". This is easy. However, some details cannot be machine-generated, e.g. "A person cannot have itself as a child". However, such things can be implemented in a reusable way using Templates (here: a template to check irreflexivity). Does this clarify it?

Constraint Violations should point at Specific Nodes

The language should make it possible for authors of constraint checks to produce pointers at specific nodes and graph fragments that caused the violation. Typical examples of such information includes the starting point (root node), a path from the root, and specific values that caused the problem.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S3
  • Votes: HK, DK+1, KC:+1, SSt:+1, labra: +1
  • Comment: (KC) This sounds to me like a requirement for expressivity of error messages - that each error needs to be able to return a message specific to the violation found. I don't see this as limited to naming a node.
  • Answer (HK): Yes, the ability to produce context-specific error messages is important and covered by "Human-readable Violation Messages" above. However, error messages are usually not machine-readable. This requirement here is to help automated tools to point at the specific nodes that are broken, e.g. to highlight them in user interfaces or suggest auto-corrections.

Modularization

The language should support organizing constraints in different groups, modules or graphs, and provide mechanisms to allow modules to point to each other.

  • Status: Proposed
  • Derived from: (see individual tickets below)
  • Votes: HK, KC +.5
  • Comment (KC): DCMI is contemplating whether we need to require that the language can describe sub-graphs. This may be already covered in the requirement on "complex constraints." If not, a story would be very helpful here.
  • Answer (HK): Maybe Expressivity: Named Graphs is related to your use cases?
  • Comment (SSt): Would something like: "validate to true if the group of UI constraints validates to true" be an example for that requirement?
  • Answer (HK): This is rather another grouping requirement that only exists as parent of others. We probably should not vote on this req itself.

Organizing Constraints in Named Graphs

The language should support using the standard linked data concept of named graphs (datasets) to organize constraints. Such named graphs have a URI that is resolvable in the context of the application (e.g. on the public web via HTTP). Applications may define their own look-up mechanism to resolve such named graphs (e.g. to local database graphs or files). This includes the ability to separate a domain model from constraints.

  • Status: Proposed
  • Derived from: S5, S6, S7, S13, S15, S20, S24, S28
  • Votes: HK
  • Comment (pfps): Are named graphs a standard part of LD? If this requirement can stand independently of LD then why bother to appeal to LD.
  • Answer (HK): Yes, named graphs are a standard part of Linked Data in RDF 1.1 and SPARQL datasets. So maybe we do not need to repeat this requirement here. I just thought it should be explicitly covered because it is relevant to so many stories, and a potential solution to the context-sensitivity required by many user stories.
  • Comment (pfps): I don't think that this is really about named graphs at all. I think that it is about being able to organize constraints in web-accessible documents and to incorporate them by URL.
  • Answer (HK): Web-accessibility is a part of the overall named graph requirement, but not key because many applications re-direct named graph look ups, e.g. to local databases or files.

Including Named Graphs for Query Evaluation

The language should support including named graphs (similar to owl:imports) so that all constraints from the (transitively) included graphs are also applied for evaluation. Conceptually, all included graphs are a union graph that becomes the default query graph of the constraint evaluation.

  • Status: Proposed
  • Derived from: S5, S6, S13, S20, S24
  • Votes: HK
  • Comment (pfps): I don't think that this is about named graphs.
  • Answer (HK): Why not?
  • Comment (pfps): What is query evaluation and the query graph?
  • Answer (HK): These are SPARQL terms, and given that SPARQL has been recommended as the back-bone of this WG so many times I hope that people understand these terms. In OWL it would be the graph that the class expressions are evaluated (= queried) against.

Efficiency of the validation process

The efficiency and complexity of the language should be taken into account. At least, it should be possible to identify some profiles of the language with minimum complexity.

  • Status: Proposed
  • Derived from: S34 (plus many other scenarios with large graphs)
  • Votes: labra: +1, HK 0
  • (Withdrawn) Objection (HK): This risks repeating the mistakes of OWL DL. In our experience it would be a huge mistake to limit the expressivity of the language only because of theoretical worst-case performance. The whole expressivity of SPARQL should be accessible. The ability to identify subsets that offer certain performance guarantees is already covered by the Profiles requirement. I would not mind having researchers work on optimized profiles, but the overall language needs to be unconstrained. It is the responsibility of the constraint authors to make sure that queries respond in reasonable time (just like with any programming language).
  • Comment (labra): I think performance and efficiency of the solution should be taken into account. In the description I deliberately omitted a particular complexity boundary, only said that we could at least take into account the efficiency of the solution as a parameter and even if the language is converted to SPARQL, we could try to define a subset or profile that had better performance. I think this requirement is compatible with the "profiles" requirement and it in fact departs from the requirement that it should work on large databases.
  • Answer (HK): I sympathize with your view point and believe that interesting research will be possible on different profiles of the overall scope. However, I cannot live with the current wording "its complexity should be minimal". This sounds like you want to prune the expressivity (of SPARQL) for everyone, and in our experience this risks repeating the OWL DL mistake, which was to design a language based on theoretical performance without covering real-world requirements. In other words, if there is a choice between expressivity and performance guarantees, then I would always favor expressivity. Even with high expressivity, it is perfectly fine to create efficient queries.
  • Comment (labra): So I really think we agree on the issue. Would you agree with it if we replace "its complexity should be minimal" with "its complexity should be taken into account"? My main point is that we could try to define profiles which could offer better performance than just SPARQL. For example, if we keep only structural constraints and leave the SPARQL queries apart, we could have a profile that offered good performance and at the same time was expressive enough for some application scenarios.
  • Answer (HK): I would withdraw my objection if we change the wording to something like "Some profiles of the language should have minimum complexity." This applies both to performance and to the complexity of the algorithms needed to make sense of the declarations. For example, a profile may only include plain property definitions with min/max/valueType etc that is trivial to parse by JavaScript based UI tools. In the context of LDOM I guess any such profile would exclude the full expressivity of SPARQL, but be entirely based on templates, and those templates could have a ldom:profile property to link to the URI of known profiles.
  • Answer (labra): I have changed the wording.
  • Answer (HK): Ok, I have withdrawn my objection as discussed. I suggest we leave the discussion here for historical purposes.

Execution on large databases

The language should be efficient to execute on large databases, so that the execution engine can exploit native optimizations of the database. Some data that is needed for execution (such as the constraint definitions themselves, macros and functions) may not be present in each graph in the database. Therefore, it should be possible to separate the graphs needed at constraint evaluation time from those graphs that hold the complete definition of the constraint checking context. A possible solution would be to have another kind of include mechanism that links a data graph with (macro) libraries. Another way is to have some on-demand validation system.

  • Status: Proposed
  • Derived from: S34 (plus many other scenarios with large graphs)
  • Votes: HK, labra: 0
  • Comment (pfps): Justification should be in the stories, not here, I think.
  • Answer (HK): I don't think so, some stories are just formulating a high-level requirement, while here I am trying to drill into an implementable solution.
  • Comment (labra): I think this requirement could be separated into several ones. One could be just that "the language should be efficient", saying that the complexity of the language should be identified and minimized. I would vote +1 for that one. Another requirement could be about the execution of the validation process against databases, which could say that the validation process could be run on demand and take into account native optimizations from the database. I would vote 0 for that one, as it would probably be out of the scope of the WG.
  • Answer (HK): Feel free to make those edits and adjust your vote.
  • Comment (labra): I separated the original requirement into two and removed my objection but kept the comments. I also changed the name, the original name was: Possible separation of Query and Library Graphs

Profiles

The language should include a notion of profiles, so that certain applications with limited features can only use certain elements of the overall language.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S11, S19, S32
  • Votes: HK, KC:+1, SSt:+1, labra: +1
  • Comment: (KC) DCMI will probably attempt to select a "core" set of constraints for simple applications. I assume this requirement means that one can implement only the aspects of the language that are needed for the function being addressed.
  • Answer (HK): Yes, a profile might be implemented as a collection of constraint templates (e.g. those under "Declaration of property..."). Each profile could have its own URI and applications could declare which profile(s) they support by pointing at those URIs, and engines could produce warnings if unsupported constraints are found in the model.
  • Comment (labra): this requirement could be separated into two. The first one would be about having the notion of profiles, and the second one about separating structural constraints from complex constraints. I would vote +1 for the notion of profiles. With regards to the separation of structural from complex constraints, I would also vote +1 if it was clear that it is independent from SPARQL.
  • Answer (HK): Please go ahead with splitting the requirement.
  • Comment (labra): Done

Grouping Constraints into Contexts

The language should make it possible to organize constraints so that they are applicable in certain contexts only. For example, application A may want to add constraints that do not apply for the more general application B. One approach would be to "tag" constraints with the URI of a context resource, and have the execution engine accept a context parameter to instruct it which constraints to ignore. Contexts could be organized into their own hierarchy and details would need to be worked out.

Separation of structural from complex constraints

The language should separate structural constraints from more complex constraints (like arbitrary SPARQL or nested constraint expressions) so that certain light-weight applications can validate the constraints without a full SPARQL processor.

  • Status: Under consideration
  • Derived from: S11, S19, S32
  • Votes: HK, KC:+1, SSt:+1, labra: +1, pfps: -1

Evaluating Constraints for a Single Node Only

It should be possible to validate constraints on a single node in a graph. This may be impossible to implement 100% correctly, because sometimes a change to a resource invalidates conditions in a very different place in the graph. However, the language could propose a framework that identifies those constraints that SHOULD be checked when a given node is evaluated, e.g. by following its rdf:type and the superclasses of that.

  • Status: Under consideration
  • Derived from: (Orthogonal to basically all stories)
  • Votes: HK, KC -1, SSt:+1, labra: +1
  • Comment (pfps): Having a story for this, or an example from a current story, would be useful.
  • Answer (HK): You could take any story about UI forms, where a given instance is being validated while other instances may exist on the client's graph.
  • Objection (KC): I assume that one can create a "shape" that is limited to a single node without this requirement. If that's not the case, I will change my vote.
  • Answer (HK): This requirement is not about declaring constraints but how to use them at runtime. For example, a constraint may be declared for all instances of Person (attached to the class Person). Now, when an application runs the validator because it has a single Person instance on a form, it should not have to validate all instances at once. Instead it should be possible to start the evaluation process just for that single instance (much faster etc). The requirement is about a mechanism to locally scope constraints for a given resource only - as is done via the ?this variable in SPIN.
  • Comment (labra): This one and the static constraints requirement are about the selection process of which nodes to validate. I think we should group them in a new requirement or a set of requirements about how to select which node to evaluate. I added a new sub-section titled "Selection of nodes".
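
A sketch of single-node evaluation in the SPIN style HK describes, where the engine pre-binds ?this to the node being validated; VALUES stands in for that pre-binding here, and all ex: terms are invented:

    PREFIX ex: <http://example.org/ns#>
    # Evaluate an example condition for one instance only.
    ASK {
      VALUES ?this { ex:JohnDoe }     # the single node under validation
      ?this ex:child ?this            # condition: no one is their own child
    }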

Selection of nodes

There must be some mechanism to select which nodes are going to be validated/constrained. Some possibilities: global selection (all the nodes in the RDF graph), nodes by type (the instances of some class), and specific nodes.

  • Votes: AR +1

Global selection

It should be possible to select all the RDF nodes in a graph for validation. This is similar to the Global Constraints requirement.

  • Status: Proposed
  • Derived from: S35
  • Votes: labra +1, HK +1, SSt +1
  • Comment (HK): My +1 is assuming that we are talking about "executing all constraints for a whole graph", not just nodes. Nodes may be split across multiple graphs, and furthermore we are really talking about arbitrary triple matches, not just individual subjects.

Selection by type

It should be possible to have some mechanism to select the nodes that are instances of some class for validation.

  • Status: Proposed
  • Derived from: (Orthogonal to basically all stories)
  • Votes: labra 0, HK +1, SSt +1
  • Comment (labra): Although I prefer to clearly separate shapes from classes, I would not oppose to have some way to declare that the instances of some class should have some shape so the processor would select those nodes and check if they have the corresponding shape.
  • Answer (HK): The solution looks simple to me. "Shape" would be the superclass of "Class". Shapes have constraints attached to them, and by inheritance classes can also have such constraints. Another approach would be to allow the URIs of a shape to also be reused as a class, i.e. have both rdf:types ldom:Shape and rdfs:Class. In any case, the important bit is that selection needs to work by types.

Selection by single node

It should be possible to select a single RDF node for validation.

Evolutionary Adoption Path

The standard should provide a reasonable evolutionary path from existing systems and data models into the new closed constraint checking world.

  • Status: Proposed
  • Votes: HK +1, DTA +1
  • Comment (SSt): You mean such a path should be covered in some documentation/document of the standard, if I'm not mistaken?
  • Answer (HK): I think this high-level requirement is more than about documentation. It should be a technical answer to the question of what to do with existing (linked) data, and existing ontologies. The goal here is to lower the cost of adoption through interoperability with existing solutions.

External

The requirements from Dublin Core are, in some sense, all unofficial requirements for the RDF Shapes Working Group. At some time, each of them might be considered by the working group.

Eric Prud'hommeaux created a tool to show a Hierarchical View of the Dublin Core Requirements.

Paper by Bosch, Eckert: Requirements on RDF Constraint Formulation and Validation (DC 2014): https://github.com/boschthomas/PhD/raw/master/publications/Papers%20in%20Conference%20Proceedings/Bosch%2C%20Eckert.%20Requirements%20on%20RDF%20Constraint%20Formulation%20and%20Validation%20(DC%202014).pdf