
Requirements

From RDF Data Shapes Working Group

This page describes the requirements for constraints/shapes relevant to the RDF Data Shapes Working Group and their status.

Process

Requirements can be: 1) proposed, 2) under consideration, 3) approved, or 4) rejected.

Anyone can propose a requirement. Each requirement needs a short description, the name or initials of the person who proposed it, and possibly an example in some constraint/shape technology.

Once a requirement has been proposed, it has to be endorsed by at least three persons to move to being under consideration. Requirements under consideration then qualify for full attention by the WG, which will decide how to dispose of them.

The status of a requirement is indicated by one of the following:

  • Status: Proposed
  • Status: Under consideration
  • Status: Approved: <link to relevant resolution>
  • Status: Rejected

To endorse a requirement, just add your name or initials to the Votes line. If you're the third person, change the status from "Proposed" to "Under consideration" using the above markup.

Requirements

Higher-Level Language

Constraints/shapes shall be specifiable in a higher-level language with 1. definitional capabilities, such as rolling up and naming recurring patterns as macros, and 2. control infrastructure for, e.g., recursion.

Concise Language

Constraints/shapes shall be specifiable in a concise language.

  • Status: Approved: F2F1 meeting 30 October 2014.
  • Derived from: Dublin Core Requirement 184.
  • Votes: ArthurRyman +0, HaroldSolbrig +1, michel: +1
  • Comment (hs): While this is a tad vague, we have three different uses for shapes:
    • Shapes as constraints: Determine whether an RDF Graph/Dataset meets the requirements asserted in a shape
    • Shapes as a query language: Use shape definitions to return a subset of a graph that meets the shape requirements. Note that this is not intended to be the same as SPARQL (although, arguably, SPARQL could serve as a "compiled" language), as the purpose is to isolate data sets that have the same form and data types -- the referents may well be quite different.
    • Shapes as documentation: Declare in a concise, unambiguous way the structuring rules that are in place in a given data set or triple store. It is this third use that leads to the "concise language" requirement.

Addressability

Collections of constraints/shapes may be addressable and discoverable. Individual constraints/shapes may be addressable and discoverable.

Annotations

Constraints/shapes may incorporate extra information that does not affect validation. It shall be possible to search for constraints/shapes with particular extra information.

Constraints/Shapes on Properties

Association of Class with Shape

There must be an "easy" way of associating a shape with a class, meaning that nodes in a graph that are instances of that class must conform to that shape.
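
For illustration, one possible Turtle encoding of such an association; the shx: vocabulary used here and in the sketches below is purely hypothetical, not an agreed design:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .   # hypothetical shapes vocabulary

    # Every instance of ex:Person must conform to ex:PersonShape
    ex:Person  shx:shape  ex:PersonShape .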

  • Status: Approved: F2F2 meeting 17 February 2015.
  • Derived from: S3, S10, S11, S12, S13, S15, S19, S20, S29, S36
  • Votes: labra: +1, Dimitris: +1, SimonSteyskal: +1, hknublau: +1, cygri: +1, kcoyle: +1, TallTed: +1, hsolbrig_: +1, ericP: +1, iovka: +1, ArthurRyman: +1, michel: +1

Property Min/Max Cardinality

The stated values for a property may be limited by minimum/maximum cardinality, with typical patterns being [0..1], [1..1], [0..*] and [1..*].
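
A hedged Turtle sketch (hypothetical shx: terms) of a [1..1] cardinality on ex:name:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Nodes matching ex:PersonShape must have exactly one ex:name value
    ex:PersonShape shx:property [
        shx:predicate ex:name ;
        shx:minCount  1 ;
        shx:maxCount  1
    ] .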

  • Status: Approved Telecon 12 February 2015
  • Derived from: S10, S11, S13, S19, S20
  • Votes: HK, DTA, pfps, SSt:+1, ericP:+1, KC:+1, labra: +1, michel: +1
  • Comment: (KC) We have cases where there is a min/max on types/graphs, e.g. there must be one instance of classX in each graph. Is that covered by this requirement?
  • Answer (HK): No, but the ability to express "there must be one instance of classX" is covered by a combination of other requirements: Complex constraints, aggregations (for counting), function macros and named graphs. However, I did notice that we have not yet recorded the requirement to have "global" (aka "static") constraints. I have added such an entry as "Static Constraints" below. You may want to add another user story to support this requirement.

Property Datatype

The values of a property may be limited to be an RDF Literal with a stated datatype, such as xsd:string or xsd:date.
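
Illustrative sketch (hypothetical shx: terms) restricting ex:birthDate to literals of datatype xsd:date:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Values of ex:birthDate must be literals with datatype xsd:date
    ex:PersonShape shx:property [
        shx:predicate ex:birthDate ;
        shx:datatype  xsd:date
    ] .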

  • Status: Approved: F2F2 meeting 17 February 2015.
  • Derived from: S10, S11, S13, S19, S20
  • Votes: HK, pfps: -1, SSt:+1, KC:+1, ericP:-1, michel: +1
  • Objection (ericP): Class vs. Shape. PROPOSE: The shapes language will include constraints on datatypes of RDF Literals.
  • Comment (pfps): That's a different kind of constraint. This one would allow both "1"^^xsd:int and "1"^^xsd:byte as acceptable for xsd:integer where Eric's proposal would not.
  • Objection (pfps): A recent edit to the document has changed this requirement from one I agree with to one I don't. The two versions are elaborated below.
Property Datatype Literal

The object of triples for a property may be limited to be an RDF Literal with a stated datatype, such as xsd:string or xsd:integer. The literal "1"^^xsd:int would *not* match against xsd:integer.

Property Datatype Value

The value of triples for a property may be limited to be an RDF Literal belonging to a datatype, such as xsd:string or xsd:integer. The literal "1"^^xsd:int *would* match against xsd:integer.

Property Type

The values of a property may be limited by their type, e.g., all children have to be of type person.
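
A possible Turtle sketch (hypothetical shx: terms) for the children example:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # All values of ex:child must be instances of ex:Person
    ex:PersonShape shx:property [
        shx:predicate ex:child ;
        shx:valueType ex:Person
    ] .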

Property's RDF Node Type (e.g. only IRIs are allowed)

The values of a property on instances of a class may be limited by their RDF node type, e.g. IRI, BlankNode, Literal, or BlankNodeOrIRI (for completeness we may want to support all 7 combinations including Node as parent).
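
Sketch (hypothetical shx: terms) restricting the node kind of a property's values:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Values of ex:homepage must be IRIs (neither blank nodes nor literals)
    ex:PersonShape shx:property [
        shx:predicate ex:homepage ;
        shx:nodeKind  shx:IRI
    ] .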

  • Status: Approved: F2F2 meeting 17 February 2015.
  • Derived from: S8
  • Votes: HK, ericP:+1, pfps:-0, SSt:+1, KC:+0, hs:+1, michel: +0
  • Comment: (KC) Is the only difference between this requirement and the one above the ability to require blank nodes? Because this seems to me to be a requirement that allows multiple value types, and those types are "OR"d. This could be useful in cases where the value is: "select from this list, or provide a new value not from the list."
  • Answer (HK): Node type and value type are orthogonal to each other, and restrict another dimension of the RDF data model (value type is about rdf:type triples, while node type is about the kind of RDF node). To create a union of classes, it is already possible to create a shared superclass (e.g. PersonOrOrganization which has Person and Organization as subclasses). The issue about "or provide a new value not from the list" looks to me like something that needs to be solved by the UI tool, which could create a new instance on the fly, but does not look like a structural constraint to me.
  • Comment (ericP): Propose to call this "Declarations of RDF Node Type" to clarify which of the three uses of "type" we mean.
  • Answer (HK): Yes that sounds good. I had the term "Property" in all sibling requirements, so for consistency I have changed it to "Declaration of Property's RDF Node Type" if that's OK. Maybe we can also use a different term altogether, such as "Node Kind" but I am not sure what is established.
  • Comment (pfps): I don't think that this is a good idea, but I'm not voting against it.
  • Answer (HK): Ok, let's try to get this sorted out. I believe that entailment is orthogonal to this requirement, and would impact almost any other requirement too. I believe we have an open ISSUE to discuss entailment in general. I would define the behavior of this using the SPARQL built-ins isIRI, isBlank, isLiteral. If a database decides to turn a blank node into an IRI (skolemization?) on the fly then it violates the semantics of SPARQL. Overall, I don't believe you really object to the requirement (there are User Stories about this), but rather to the solution, and we need to look into the details of that once we have a proposal.

*Defunct/Moved* (was: Property Default Value)

See Property Default Value

*Defunct/Moved* (was: Property Labels at Shape)

See Property Labels at Shape

*Defunct/Moved* (was: Property Comment in a Shape)

See Property Comment in a Shape

Datatype Property Facets

For datatype properties it should be easy to define frequently needed "facets" to drive user interfaces and validate input against simple conditions. The sub-requirements below should be voted upon separately.

  • Comment (pfps): Is this the *ability* to do this or just the ability to do it *easily*?
  • Answer (HK): We could enumerate all the facets that we want to cover, including those from XML Schema (and later OWL 2 datatypes). I wanted to keep it short for now. Whether something is easy is relative, so for now it's more about the ability itself.
  • Comment (DTA): I am confused by these requirements. It seems as if the same property will be displayed and will behave differently when it is used for a member of one class than when it is used for a member of another. This seems to be what we have normally thought of as two different properties. Am I confusing what is going on here?
  • Answer (HK): This resembles local owl:Restrictions more than global rdfs:ranges. You can have multiple owl:Restrictions with different value types in different classes, and this requirement is about similar scenarios. In other words, the same property IRI can appear in different classes.
  • Objection (ericP): I'd like to treat facets as separate reqs. I've voted +1 for min/max above; I'd vote 0 for regex and strlen.
  • Answer (HK): Split into individual requirements.
  • Comment (pfps): Is this the ability to do it or the ability to write it down easily?
  • Answer (HK): I have clarified that this is about "easy", i.e. part of the high level vocabulary. The vote here is about the general requirement, independent of whether it becomes "core" or "high-level" vocabulary in the end.
Datatype Property Facets: min/max values

Similar to the XSD facets xsd:minInclusive/xsd:maxInclusive and xsd:minExclusive/xsd:maxExclusive.
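
Sketch (hypothetical shx: facet names) of a numeric range on ex:age:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Values of ex:age must lie between 0 and 150 (inclusive)
    ex:PersonShape shx:property [
        shx:predicate    ex:age ;
        shx:minInclusive 0 ;
        shx:maxInclusive 150
    ] .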

  • Status: Approved Teleconf 12 March 2015
  • Derived from: S3, S11, S12, S13, S19, S20, S29
  • Votes: HK, ericP:+1, labra: 0, KC:+0, AR: +1, DK: +1, SSt: +1, michel: +1
Datatype Property Facets: regular expression patterns

Pattern matching against regular expressions (xsd:pattern)

  • Status: Approved Teleconf 12 March 2015
  • Derived from: S3, S11, S12, S13, S19, S20, S29
  • Votes: HK, ericP:0, labra: 0, KC:+0, AR: +1, DK: +1, SSt: +1, michel: +1
Datatype Property Facets: string length

Restrictions on the length of string values, similar to xsd:minLength/xsd:maxLength.

  • Status: Approved Teleconf 12 March 2015
  • Derived from: S3, S11, S12, S13, S19, S20, S29
  • Votes: HK, ericP:0, labra: 0, KC:+0, AR: +1, DK: +1, SSt: +1, michel: +1

Property Value Enumerations

Shapes will provide exhaustive enumerations of the valid values (literals and IRIs).
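
Sketch (hypothetical shx: terms) of an enumeration of allowed values:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # ex:maritalStatus must have one of the listed IRIs as its value
    ex:PersonShape shx:property [
        shx:predicate     ex:maritalStatus ;
        shx:allowedValues ( ex:Single ex:Married ex:Divorced ex:Widowed )
    ] .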

  • Status: Approved Telecon 12 February 2015
  • Derived from: S3, S11, S37
  • Votes: HK, DTA, KC+1, ericP:+1, pfps:+1, SSt:+1, labra: +1, hs:+1, michel: +1
  • Comment (DTA): I am supporting this with the understanding that a solution might be as simple as using something like a skos:Concept to mean a dedicated picklist for this situation (i.e., I am reading this as a requirement, not as a spec for a solution).
  • Comment (KC) Note that S37 provides cases not explicitly mentioned here.
  • Answer (HK): Yes, S37 has additional requirements that are covered elsewhere. If you have uncovered requirements from S37, please feel free to add them as new entries.
  • Comment (KC): The "exhaustive enumeration" bothers me, because we work with "enumerations" in the hundreds of thousands. Can I indicate that the value space must come from one or more vocabularies as defined by their URI pattern? The DTA use of skos:Concept can sometimes help, but not all lists are SKOS lists. (e.g. Geonames, DBpedia)
  • Answer (HK): Restricting by graph or namespace is not covered by this requirement here. It could be expressed using the Complex constraints (assuming they are in SPARQL). All of the "Declarations of..." requirements are simply syntactic sugar for the most commonly needed patterns. If you believe that restricting by graph or namespace is common enough then please feel free to record it as an additional requirement in the same category as this ticket here.
  • Warning (pfps): This requirement is underspecified for objects that are literals. Is it about values or literals (i.e., does "1"^^xsd:int match "1"^^xsd:integer)? The wording can be read either way ("valid value" vs "literal").

Properties Used in Inverse Direction

Shapes can have constraints where the tested node is the object of a triple.
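
Sketch (hypothetical shx: terms) of a constraint applied in the inverse direction:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # The node being validated must be the object of at least one ex:member triple
    ex:PersonShape shx:inverseProperty [
        shx:predicate ex:member ;
        shx:minCount  1
    ] .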

  • Status: Approved Telecon 12 February 2015
  • Derived from: S36
  • Votes: HK, pfps, SSt:+1, ericP:+1, KC:0, labra: +1, hs:+1, michel: +1

Primary Keys

It is often useful to state that a given (datatype) property is the "primary key" of a class, so that the system can enforce uniqueness and also automatically build URIs from user input and data imported from relational databases or spreadsheets.

  • Status: Proposed
  • Derived from: S5, S25
  • Votes: HK, SSt:+0.5, ericP:0, labra: -0.5, KC:0, DK: +0.5, michel: +0
  • Comment (pfps): What is a primary key? Is there any constraint that derives from this, or should this just be for any key?
  • Comment (pfps): Is there a constraint component of the building URIs part of this requirement?
  • Answer (HK): This is all detailed at https://www.w3.org/2014/data-shapes/wiki/Primary_Keys_with_URI_Pattern (see the SPIN snippet at the end) and I would not want to repeat all this info here.
  • Comment (pfps): What makes a primary key different from a key?
  • Answer (HK): I am not sure if the term "key" is clear enough, but sure this could be changed.
  • Comment (ericP): The use cases I've seen involve quite a lot of complexity because the key is not only unique to a particular graph, but instead to a business process which spans many graphs and datasets, e.g. medical terminology codes, UPC codes, etc.
  • Answer (HK): Yes there are complex cases where no simple design pattern is sufficient. Those would need to be expressed as "complex constraint". If a key must be unique across multiple graphs then the graph declaring the constraint needs to reference the other graphs (e.g. via an include).
  • Objection (labra): Although I think adding primary keys may be interesting for simple cases, doing it right may increase the complexity of the solution as we should consider cases like compound keys, uniqueness, null values, key-references, etc.
  • Answer (HK): Yes, this ticket here would only cover the basic case of a single property. More complex cases can be represented using new templates that anybody can create and publish.

Language Tags

SHACL must have a mechanism to constrain the language tags of RDF literals.

  • Status: Approved [Telecon 23 March 2015]
  • Derived from: Requirement 6.4

Complex Constraints

The language should allow users to implement constraints that check complex conditions, with an expressivity as covered by the following sub-requirements (e.g. basic graph patterns, string and mathematical operations and comparison of multiple values).
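
As a non-normative illustration of the SPARQL-based flavour of such constraints (SPIN-style; the shx: terms and the ?this convention are assumptions, not agreed design):

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # ?this denotes the node being validated; the SELECT returns violating nodes
    ex:EventShape shx:constraint [
        shx:message "The start date must not be later than the end date." ;
        shx:sparql """
            PREFIX ex: <http://example.org/ns#>
            SELECT ?this
            WHERE {
                ?this ex:startDate ?start ;
                      ex:endDate   ?end .
                FILTER (?start > ?end)
            }
            """
    ] .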

  • Status: Approved Telecon 12 February 2015
  • Derived from: S5, S21, S22, S23, S25, S26, S27, S30
  • Votes: HK, SSt:+1, KC:+1, labra: 0
  • Comment (labra): I agree with some of the "complex constraints", but I am not so sure about others

Expressivity: Patterns

Many constraints require matching patterns within the graph, often represented via linked triple patterns (SPO) and property paths. This requires variable bindings for matching, so that multiple values can be compared with each other.

  • Status: Proposed
  • Derived from: S1, S5, S17, S21, S22, S23, S26, S27, S30
  • Votes: HK, DTA, SSt:+1, pfps:0.5, KC+0, labra: -0.5, ericP: -1
  • Comment (pfps): Is there a constraint component of the variable bindings part of this requirement?
  • Answer (HK): I wanted to express that many constraints require comparing multiple values (added).
  • Comment (pfps): Is "variable binding" the requirement, or just being able to compare multiple values?
  • Answer (HK): I guess it's the ability to pick a node from one place in the graph and compare it with another node in another place in the graph.
  • Comment (pfps): As long as the requirement is not directly for variable bindings, but instead for being able to compare then I'm fine with this.
  • Comment (SSt): So e.g. a constraint should be able to check whether two particular values of ex:name are identical?
  • Answer (HK): Yes, checking that two instances have the same value is a classical use case here, but many others too.
  • Objection (labra): The requirement looks too SPARQL-specific. Is there some simple motivating example? I was looking at the list of user stories that are mentioned and it was not clear.
  • Comment (pfps): If pattern is not read as "SPARQL BGP" then pattern is probably OK, but I agree that some more neutral language would be better.
  • Answer (HK): I have no suggestion for a more neutral language. I really just want the equivalent of SPARQL BGPs, and we have plenty of practical examples almost everywhere.
  • Objection (ericP): Is this a requirement for the core language or an extension? I believe we should only be expressing requirements on the core language, and this is beyond any proposals.
  • Answer (HK): If by "extensions" you mean SPARQL, then this requirement is in the extensions. But to me, SPARQL is a core feature, even if not all engines have to fully support it. This catalogue of requirements is about *everything*, not just specific profiles of the overall language.
  • Comment (HK): I have changed the title from Basic Graph Pattern to Pattern, due to lack of better words. Does this address the concern that this is too SPARQL-specific?

Expressivity: Non-Existence of Patterns

Many constraints require that a certain pattern does not exist in the graph.
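
Illustrative SPARQL-based sketch (hypothetical shx: wrapper), following the absence-of-information reading and the FILTER NOT EXISTS remark in the discussion below:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Report events that have an end date but lack a start date
    ex:EventShape shx:constraint [
        shx:sparql """
            PREFIX ex: <http://example.org/ns#>
            SELECT ?this
            WHERE {
                ?this ex:endDate ?end .
                FILTER NOT EXISTS { ?this ex:startDate ?start }
            }
            """
    ] .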

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S1, S2, S22, S23
  • Votes: HK, pfps, SSt:+1, ericP:+1, KC+1, labra: +1, hs:+1, michel: +1
  • Comment (pfps): I'm reading this as being about the absence of information, such as having no fillers, or at most one filler.
  • Answer (HK): It's basically FILTER NOT EXISTS in SPARQL.

Expressivity: String Operations

Some constraints require building new strings out of other strings, and building new URIs out of other values.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5, S23
  • Votes: HK, SSt:+1, DK+1, labra: 0, KC:0, michel: +1
  • Comment (pfps): If S5 is going to be appealed to in this requirement, the story should provide an example of string operations in the story list.
  • Answer (HK): The EPIM_ReportingHub story already had some examples, but I have added details.

Expressivity: Language Tags

Some constraints require comparing language tags of RDF literals, e.g. to check that no language is used more than once per property.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S21
  • Votes: HK, SSt:+1, KC+1, labra:+1, DK:+1, michel: +1
  • Comment (labra): I voted +1 to have mechanisms to compare language tags although I think the last phrase "Also to produce multi-lingual error messages" should be a separate requirement.
  • Answer (HK): Yes good point, I have added a sentence about multi-lingual constraint messages to the corresponding ticket below, and removed the sub-sentence on error messages.

Expressivity: Mathematical Operations

Some constraints require mathematical calculations and comparisons, e.g. area = width * height.
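
SPARQL-based sketch (hypothetical shx: wrapper) for the rectangle example:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Report rectangles whose stated area does not equal width * height
    ex:RectangleShape shx:constraint [
        shx:sparql """
            PREFIX ex: <http://example.org/ns#>
            SELECT ?this
            WHERE {
                ?this ex:width  ?w ;
                      ex:height ?h ;
                      ex:area   ?a .
                FILTER (?a != ?w * ?h)
            }
            """
    ] .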

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5
  • Votes: HK, SSt:+1, DK+1, KC+1, labra: 0, hs:0, michel: +0
  • Comment (pfps): If S5 is going to be appealed to in this requirement, the story should provide an example of mathematical operations in the story list.
  • Answer (HK): Added an example of the + operator to the EPIM ReportingHub page. If you want, I can request adding a new story on QUDT unit conversion or the Rectangle example mentioned above. But the need for these things should be obvious.

Expressivity: Literal Value Comparison

Some constraints require operators such as <, <=, != etc, either against constants or other values that are dynamically retrieved at query time. Includes date/time comparison.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5, S21, S22, S23, S27
  • Votes: HK, SSt:+1, KC+1, labra: 0, michel: +1

Expressivity: Logical Operators

The language should make it possible to express the basic logical operators intersection, union and negation of conditions.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S5, S26, S33
  • Votes: HK, pfps, SSt:+1, KC+1, labra: 0, hs:+1, michel: +1

Expressivity: Transitive Traversal of Properties

Some constraints need to be able to traverse a property transitively, such as parent-child or partOf relationships.
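
SPARQL-based sketch (hypothetical shx: wrapper) using a property path for transitive traversal:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Report resources that are (transitively) part of themselves
    ex:PartShape shx:constraint [
        shx:sparql """
            PREFIX ex: <http://example.org/ns#>
            SELECT ?this
            WHERE { ?this ex:partOf+ ?this }
            """
    ] .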

  • Status: Approved: F2F2 meeting 17 February 2015.
  • Derived from: S16, S23, S26
  • Votes: HK, pfps, SSt:+1, KC+1, labra: 0, ericP: 0, DK: +1, michel: +1

Expressivity: Aggregations

Some constraints require aggregating multiple values, especially via the SPARQL operations COUNT, MIN and MAX or equivalents in other languages.
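
SPARQL-based sketch (hypothetical shx: wrapper) using COUNT; the class, property and threshold are made up for illustration:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Report surveys with fewer than three recorded answers
    ex:SurveyShape shx:constraint [
        shx:sparql """
            PREFIX ex: <http://example.org/ns#>
            SELECT ?this (COUNT(?answer) AS ?answerCount)
            WHERE {
                ?this a ex:Survey .
                OPTIONAL { ?this ex:answer ?answer }
            }
            GROUP BY ?this
            HAVING (COUNT(?answer) < 3)
            """
    ] .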

  • Status: Proposed
  • Derived from: S22
  • Votes: HK, SSt:+0.5, DK, KC-0, labra: -0.5, ericP: -1, michel: -0.5
  • Comment (pfps): If S22 is going to be appealed to in this requirement, the story list should have an example of the needed expressivity.
  • Answer (HK): The linked page http://www.w3.org/TR/vocab-data-cube/#wf-rules already has examples for MIN and COUNT (scroll down to IC-12 and IC-17).
  • Comment (SSt): It may be difficult to link to the values that should be aggregated without having some sort of nested querying?
  • Answer (HK): Nesting is, for example, covered by user-defined functions (like in SPIN).
  • Objection (labra): Adding Aggregations would probably increase the complexity of the language, especially if they have to be covered by user-defined functions
  • Answer (HK): Aggregations are needed to count property values, so if we want the language to be self-describing then we need to support them. Also, they are part of SPARQL already so all we need to agree on would be to allow any SPARQL query, adding zero implementation costs.
  • Objection (ericP): The justifications for this requirement focus on SPARQL and not on the core language.
  • Answer (HK): Did my rewording resolve this? Otherwise, please make a suggestion on how to rewrite this.
  • Answer (eric): I propose annotations to distinguish functionality that is available in the core vocabulary from what is available via e.g. a SPARQL or SPIN extension.
  • Answer (HK): I'd be fine with having such annotations. My understanding was that these would be everything under "Complex constraints", but this has been reorganized in the meantime.

Expressivity: Named Graphs

Some constraints require looking up information from other named graphs, for example to verify that certain values exist in a controlled vocabulary or background knowledge. This information is usually not explicitly imported into the query graph, and having all sub-graphs in the default query graph would be too inefficient.

  • Status: Proposed
  • Derived from: S5
  • Votes: HK, SSt:+0.5
  • Comment (pfps): This may be about graphs that are accessible via URLs, not named graphs.
  • Answer (HK): No, it's about named graphs, independently of where they are retrieved from. In the EPIM use case, a named graph of background knowledge (NPD Fact Pages) lived in a separate file on the server's hard drive, but that's not an important consideration.
  • Comment (KC): I would support a requirement for graphs that are accessible via URLs, named or not. This is needed in my area.
  • Answer (HK): The mechanisms to look up graphs are already covered by general Linked Data principles and the SPARQL SERVICE keyword in particular.

Expressivity: Closed Shapes

Some data recipients will not act as generic triple stores. "Closed shapes" identify triples not matched by a property constraint in a shape. A few uses of closed shapes: a client tests that every triple being sent to a server will be accepted/processed; a server rejects any document with unexpected triples; a server accepts and ignores unexpected triples and returns a list of dropped triples to the client. (The control can probably be applied to the whole schema rather than individual shapes. At least, there's no use case or implementation experience to the contrary.)

  • Examples: - simple - issue report.
  • Status: Approved: Telcon 30 April 2015.
  • Derived from: S4
  • Votes: ericP, SSt:+1, labra: +1, pfps: -1, AR: -1, Ted: +1, michel: +1
  • Comment (SSt): So basically letting the validation fail, if (crucial) information is missing? Maybe we would then need some sort of mechanism to indicate "must have" info and "if present then has shape xyz" information.
  • Comment (HK): I think this requirement can already be represented by "Non-Existence of Patterns" (FILTER NOT EXISTS).
  • Answer (ericP): I propose that this be in the core vocabulary.
  • Comment (HK): I did not vote yet, but I am not very positive about this requirement. One of the main features of RDF is extensibility. Not allowing users to add extra annotation properties etc. is probably taking the closed world a bit too far. Why can't the tools that process the incoming data just ignore any triples that have no meaning to them? Most of the "closing off" already happens on a per-property level (e.g. maxCardinality). To make a stronger point, maybe you can point at more user stories.
  • Answer (ericP): Most services available to the public at large control which triples they will accept. Examples include LDP services backed by conventional stores or state in some process, social web applications, health and disease monitoring sites, etc. These services should be able to both publish and validate data against shapes that accept only certain triples. Deciding that RDF is about extensibility writes off these use cases.
  • Comment (AR): REST APIs should be designed to ignore unknown content so that systems can evolve gracefully, and not require all parts of system to be upgraded in lock step.
  • Objection (pfps): There needs to be a definition of this. As far as I know, the only viable definition is for the algebraic version of shape expressions, which is very different from the current proposals for SHACL.
  • Comment (AR): I would reverse my vote if the Requirement was reworded to say that "unknown" content should be reported, rather than rejected
  • Comment (HK): We first need a precise definition of how this is supposed to work.
  • Comment (AR): +1. The revised wording is fine with me.

Expressivity: Checking for well-formed rdf:Lists

RDF's list vocabulary (rdf:List and the associated terms rdf:first, rdf:rest, rdf:nil) provides a mechanism for expressing closed, ordered collections in RDF. But it also provides many opportunities for creating “malformed” structures that are unlikely to be processed as intended and consistently by recipients, such as lists with gaps, branches, cycles, or a missing tail. Such “malformed” lists may occasionally be useful, but data consumers may want to express that they are unable to handle them. There shall be a concise construct for expressing that a list must be well-formed. (As a starting point for discussion, let us assume that only the kind of lists produced by the SPARQL/Turtle (a b c) construct are well-formed.)

  • Status: Approved: Telecon 14 May 2015.
  • Derived from: S26, S42
  • Votes: RC +1, KC +1, …
  • Discussion: …

Expressivity: Placing constraints on the values of rdf:Lists

RDF's list vocabulary (rdf:List and the associated terms rdf:first, rdf:rest, rdf:nil) provides a mechanism for expressing closed, ordered collections in RDF. Many applications of rdf:List require that the members of the list meet certain conditions. There shall be a way of applying the constraints that we can express for normal properties (require a certain rdf:type, require a certain shape, require a certain datatype, require a certain node kind, etc.) to the members of rdf:Lists.

  • Status: Approved: Telecon 14 May 2015.
  • Derived from: S26, S42
  • Votes: RC +1, KC +.5…
  • Discussion: …

Macro-Language Features

The language should enable the definition of macros as shortcuts for recurring patterns, and enable inexperienced users to define rich constraints. Macros should be high-level terms that improve overall readability, separation of concerns and maintainability. This overlaps with the already approved "Higher-Level Language".

  • Status: Approved Telecon 12 February 2015
  • Derived from: S5, S7, S16, S21, S27, S28, S32
  • Votes: HK, pfps, SSt:+1, KC+1, labra: 0
  • Comment (KC): with the caveat that we need to determine if these macro features are duplicated in modularization.
  • Comment (labra): It should be more clear what you mean by "macros", sometimes it seems a macro is the same as a shape in ShEx, while others it looks like something else.

Named Shapes

It should be possible to encapsulate a group of constraints (a Shape) into a named entity, so that the Shape can be reused in multiple places, also across the Web.

  • Status: Approved Telecon 12 February 2015
  • Derived from: S7, S16, S28
  • Votes: HK, pfps, SSt:+1, KC+1, labra: +1, ericP: +1, michel: +1
  • Comment (labra): This requirement is very similar to the Addressability requirement that has already been approved.
  • Comment (ericP): Note to editors, please keep this right next to #Addressability
  • Comment (pfps): Addressability does not imply the ability to reuse.
  • Comment (ericP): I guess, but it's hard to see how one would avoid it with a schema written as a graph.

Function and Property Macros

In order to support maintainable and readable constraints, it should be possible to encapsulate recurring patterns into named entities such as functions and dynamically computed properties. This requirement is orthogonal to almost every user story. It includes a vocabulary to share function definitions.

  • Status: Approved: Telecon 09 April 2015
  • Derived from: S5, S16, S28
  • Votes: HK, SSt, DTA:+1, KC+1, labra: -0.5, ericP: -0.5, pfps: -0, TEd +1
  • Objection (labra): It is not clear from the name of the requirement and the description what is the recurring pattern that is encapsulated, it looks similar to a Shape as in ShEx, but it could be clarified, maybe with some example.
  • Answer (HK): Here I was trying to abstract away the concept of user-defined SPIN/LDOM functions to extend the expressivity and maintainability of SPARQL queries.
  • Objection (ericP): Why do we need this and #Constraint_Macros?
  • Answer (HK): Constraint macros would only allow users to reuse a complete constraint (e.g. a complete SPARQL query). The user-defined functions are useful for any SPARQL query, including direct SPARQL constraints. Furthermore, having functions allows us to define more readable queries ourselves, e.g. when we formalize the semantics of ldom:maxCount in SPARQL (using a function like ldom:valueCount). Finally, functions allow users to express certain recursive use cases.
  • Comment (ericP): This seems to be raising the expressivity of the core vocabulary a lot.
  • Answer (HK): Yes, and raising the expressivity is a very good thing. Note that these LDOM functions are only relevant from a SPARQL point of view, so they would not be in (your current definition of) the "core" language.
  • Comment (pfps): Is this the inclusion of some function or property macro-expansion vocabulary, or a macro-expansion language that allows the arbitrary definition of things like functions?
  • Answer (HK): The former. A possible design is in the current SHACL spec draft. My main goal is to help create more maintainable SPARQL queries.
  • Comment (pfps): The SHACL spec draft appears to be the second, i.e., a macro-expansion language that allows for arbitrary definition of things like functions. This appears to be of marginal utility and introduces a large amount of complexity even if the expansions are just SPARQL.
  • Answer (HK): I would obviously be OK if we limit this macro facility to SPARQL functions only, although we do have examples where people want to define SPARQL functions backed by JavaScript code. Would this change your opinion?
  • Comment (RC): I'm trying to understand how this is different from the “template mechanism” widely discussed elsewhere. The “template mechanism” encapsulates a validation rule, written in an extension language such as SPARQL, that makes a pass/fail decision. The mechanism requested here, on the other hand, encapsulates a computation rule, written in an extension language such as SPARQL, that returns a computed value for later use. Is that correct?
  • Question (RC): This requests the ability to encapsulate a computation into a reusable entity. Where would that entity be used/invoked? As a (custom) SPARQL function in an extension template? As a function in some other, non-SPARQL place? In a computed property? All of the above?
  • Answer (HK): The functions here could be used in the WHERE clauses of any SPARQL-based constraint or template. Their role is similar to any structured programming language, where you encapsulate reusable building blocks into functions. We and our customers use such functions all over the place. The declaration of functions as currently drafted is a bit more general than SPARQL, because some platforms may want to support SPARQL functions that encapsulate JavaScript (in TopBraid we do). However the latter aspect is not important and would compromise interoperability, so I'd be happy to make this a SPARQL-only feature for now.

Constraint Macros

Some constraint patterns are recurring with only slight modifications. Example: SKOS constraints that multiple properties must be pairwise disjoint. The language should make it possible to encapsulate such recurring patterns in a parameterizable form. Examples include SPIN/LDOM Templates.

Nested Constraint Macros

It should be possible to combine the high-level terms of the constraint language into larger expressions using nested constraints. Examples of this include ShEx, Resource Shapes' oslc:valueShape and owl:allValuesFrom.
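
Sketch (hypothetical shx: terms, mirroring the oslc:valueShape idea) of nesting one shape inside another:

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # Values of ex:address must themselves conform to ex:AddressShape
    ex:PersonShape shx:property [
        shx:predicate  ex:address ;
        shx:valueShape ex:AddressShape
    ] .

    ex:AddressShape shx:property [
        shx:predicate ex:postalCode ;
        shx:minCount  1
    ] .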

  • Status: Approved Telecon 12 February 2015
  • Derived from: S32, S33
  • Votes: HK, pfps, SSt:+1, KC+1, labra: +0.5
  • Comment (pfps): This doesn't look much like a macro facility, just the ability to have nesting within constraints.
  • Answer (HK): The reason why I have organized this as a Macro feature is that this is about reusing high-level macros (e.g. implemented as SPIN Templates) instead of writing low-level constraints (e.g. in SPARQL). SPARQL already has "nesting" built-in, so this ticket is about reusing higher level terms such as min/max cardinality in a ShEx-like data structure.
  • Comment (KC): but I think this duplicates some of the requirements for modularity, so I think we should clarify this and eliminate duplicates.
  • Comment (labra): I agree with the concept of Nested definitions but I think the requirement should be called "nested shapes". In that case, I think it is important to be able to describe nested definitions like having a "Course" shape with a property ":student" that links to a "Student" shape, and a "Student" shape with a property ":course" that links to Courses.
  • Answer (HK): This requires a firmer definition of what a Shape is. To me it's a collection of constraints at a given node. But the shape of something can also be expressed via SPARQL queries (BGPs etc). So in this ticket I wanted to be clear that we are allowing Templates to be nested to build :valueShape-like expression trees.

Specialization of Shapes

It should be possible to specialize/extend shapes so that the constraints defined for a more general (super) shape also apply to the specialized (sub) shape. Sub-shapes can only narrow down, i.e. further constrain.

  • Status: Approved Teleconf 26 March 2015
  • Derived from: S2, S5, S10, S11, S19, S20, S24, S25, S27, S28, S29
  • Votes: HK +1, KC+1, labra: -0.5, SSt: +1, pfps: -0.5, michel: +1
  • Comment (HK): The following comments are about a previous version of this requirement, which talked about OO-like inheritance.
  • Comment (pfps): I think that we need to be careful when talking about inheritance. We could instead just say that constraints on classes apply to all instances of the class, particularly including instances of subclasses.
  • Answer (HK): You may need to convince me about this one. I think the interpretation of rdfs:subClassOf (e.g. in OWL 2 RL) is quite consistent with the usual meaning of inheritance in an OO sense: all instances of subclasses are also instances of superclasses. Why do you think the term inheritance is problematic? I am not talking about the non-inheritance of rdfs:domain here.
  • Comment (pfps): Inheritance is generally used to mean the sort of inheritance that you get in OO languages, where overriding and priority become important. If these aspects of inheritance are not wanted (and I think that they are not) then I think a different word should be used.
  • Answer (HK): Looking at http://en.wikipedia.org/wiki/Inheritance_%28object-oriented_programming%29 I think this matches quite well, while "subtyping" does not match well (we want behavior to be inherited). Technically, our approach would not have a syntactic means to "override" something. Instead, the system would always execute all constraints from superclasses, and instances need to fulfill the intersection of all constraints. As a consequence, people can use de-facto overriding to narrow down the cardinality or value type of a property in a subclass.
  • Comment (KC): I vote +1 for this, but substituting Peter's suggested wording.
  • Objection (labra): After reading the description I think this requirement is more about a set of constraints (or shape) that extends another set of constraints (another shape), rather than about classes and subclasses as in OWL. As there is another issue pending about separating types/classes from shapes, I think we should change the description to reflect that difference.
  • Comment (SSt): +1 for the general intent of this requirement.
  • Comment (HK): I have changed the wording to talk about "Specialization of Shapes" instead of "Inheritance of Constraints".

Abstract Shapes

It should be possible to mark certain shapes as "abstract" to indicate that the Shape shall not be referenced/"instantiated" directly. Abstract shapes may serve as super-shape of other shapes only.

  • Status: Proposed
  • Votes: HK +1, pfps -1
  • Objection (pfps): This requires that shapes be instantiable, which I don't agree with. It also requires a generalization relationship between shapes that I also do not agree with.

Final Shapes

It should be possible to mark certain shapes as "final" to indicate that the Shape should not have any sub-shapes.

  • Status: Proposed
  • Votes: HK +1, pfps -1
  • Objection (pfps): This requires that shapes have a generalization taxonomy which I do not agree with. Even if there was a generalization taxonomy for shapes, I do not think that this is useful.

Private Shapes

It should be possible to mark certain shapes as "private" to indicate that the Shape should not be used outside of its defining namespace.

  • Status: Proposed
  • Votes: HK +1, pfps -1
  • Objection (pfps): This requires a lot of machinery that I do not agree with. Even if this machinery was present I do not think that access notions like "private" are useful. I also don't see a supporting user story for this requirement.

Global Constraints

It should be possible to specify constraint conditions that need to be checked "globally" for a whole graph, without referring to a specific set of resources or class. In programming languages such global entities are often called "static", but "global" is probably better known.

  • Status: Approved Telecon 12 February 2015
  • Derived from: S35
  • Votes: HK, pfps, KC +1, labra: 0, SSt: +1, michel: +1
  • Comment (labra): I think this requirement is more about the process of how to select which nodes are being validated (all the nodes vs the nodes that belong to some class). Maybe, we could have a different requirement about how to select the nodes that will be validated.

Vocabulary for Constraint Violations

Instead of just reporting yes/no, the language needs to be able to return more meaningful messages including severity levels, human-readable error descriptions and pointers at specific patterns in the graph.
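
A sketch of what a returned violation might look like (hypothetical shx: vocabulary; the property names are placeholders, not agreed design):

    @prefix ex:  <http://example.org/ns#> .
    @prefix shx: <http://example.org/shapes#> .

    # One violation as it might be reported by a validation engine
    [] a shx:ConstraintViolation ;
        shx:severity shx:Error ;
        shx:root     ex:JoeDoe ;
        shx:path     ex:birthDate ;
        shx:value    "not-a-date" ;
        shx:message  "Value of ex:birthDate must be an xsd:date literal."@en .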

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S3, S34 (and almost every other User Story)
  • Votes: HK, DK, pfps, SSt:+1, KC, labra: 0.5, michel: +1
  • Comment (DK): I am not sure if the group header is representative, would something like "Supported Decorations for constraints" fit better? Vocabulary is too general and can cover many other requirements.
  • Comment (DK): minor description revision suggestion: "The constraint language should facilitate the reporting of more meaningful [...]"
  • Comment (labra): Maybe we should rewrite this requirement to state that instead of yes/no, the language needs to return more meaningful information about the validation process. That "meaningful information" could be something like the PSVI (Post-Schema-Validation Infoset) from XML Schema, which could later be converted to human-readable messages. We could generate a PSVG (Post-Schema Validation Graph), which could be the input RDF graph enriched so that the nodes contain properties about their shapes.
  • Answer (HK): The language should be easy to use, so adding a layer of complexity between user message and the engine may backfire. But for nested shapes (e.g. when something is violated deep within a tree), a path to the violation should certainly be possible. In my current SPIN experiments, I added a property that links a parent constraint violation with children that provide details.

Severity Levels

The language should allow the creation of error responses that can include severity levels as desired.

  • Status: Approved: F2F2 meeting 17 February 2015.
  • Derived from: S3
  • Votes: HK, DK+1, SSt:+0.5, KC:+1, labra: -0.5, ericP: -0.5, michel: +1
  • Comment (DK): I have a few existing use cases for "info" e.g. in an owl:SymmetricProperty we check that statements about the involved resources are made in both directions.
  • Comment (KC): I think it would be better to indicate that there will be the capability for severity levels; that declaration of these will be part of the development of the validation profile; and perhaps that a small number will be included as defaults (but I'm not sure why that is necessary).
  • Answer (HK): The design that I have in mind would have a hard-coded class :ConstraintViolation and any number of subclasses, including :Error, :Warning etc. Users are free to add more subclasses, but it would make everyone's life easier if we agreed on a fixed vocabulary for the most commonly needed levels.
  • Objection (labra): I think the constraint validation process can be a yes/no process in order to simplify the solution. However, we could model severity levels with different shapes or by adding some meta-information about the constraints which could indicate if not satisfying them would imply a Warning, Error, Fatal error, etc.
  • Answer (HK): Yes/no would not be sufficient. In the LDOM draft, each constraint may have a field ldom:level that indicates the severity. Maybe this would address your requirement. However I feel you are already talking about a solution while here we are just collecting requirements.
  • Comment (labra): I was trying to avoid talking about a solution and in fact, I think what I proposed was the same as you: To add some meta-information (the field ldom:level, for example) that indicates the severity. But at the end, the validation process returns a yes/no on that constraint. What I meant is that the validation process could remain as yes/no, while the constraints are the ones that have the meta-information about their severity.
  • Comment (HK): Some constraints require dynamically picking the property that caused the violation, dynamically constructing error messages, and dynamically identifying the root and end node that caused the violation. Therefore LDOM supports ASK, SELECT or CONSTRUCT constraints, and only having YES/NO is not sufficient. With ASK and SELECT, the level can be declared for all constructed violations, but for CONSTRUCT this is not easily possible.
  • Comment (labra): I think you are now talking about a solution :), anyway, I think what you propose is precisely what I called before as a Post-Schema-Validation-Graph, i.e. some information that is dynamically generated by the validation processor and that can be returned by it. So the validation process returns YES/NO with a data structure that contains information about the validation. That information should be machine processable and could easily be converted to human-readable messages.
  • Objection (ericP): Having levels could be nice but I don't see it as a requirement. Attaching requirement levels to specific properties doesn't address cases where one attaches different levels to a missing property vs. a property with a deprecated datatype or pairs of mutually exclusive properties whining at some severity. All of these are better met by having whole schemas associated with different warning or conformance levels.
  • Answer (HK): Nobody said that levels are attached to properties - they are part of each constraint violation that gets reported.
  • Question (ericP): So these are not specified in the core shapes vocabulary?
  • Answer (HK): The core vocabulary should include the macro mechanism, e.g. the ability to define new LDOM templates (regardless of their executable ldom:sparql implementation). When you define a template, you can specify the warning level using ldom:level. This allows engines to check in advance whether they need to execute the template at all, and to execute any FatalErrors first etc. So it must be part of the core vocabulary.

Human-readable Violation Messages

The language should make it possible for constraint checks to create human-readable violation messages that can be either created explicitly by the user or generated dynamically from constraint definition. It should be possible to create such messages in multiple languages.

  • Status: Approved: Telecon 2 April 2015
  • Derived from: S3
  • Votes: HK, DTA, DK+1, pfps, SSt:+1, KC, labra: -0.5, ericP: -1, Ted: +1, Arthur: +1, michel: +1
  • Objection (labra): I would prefer that the result of the validation process was some structure that could be processed by machines. This structure could later be converted to human-readable violation messages in multiple languages. I am not sure if we should provide such a conversion as a requirement.
  • Answer (HK): The machine-processable bit is covered by the other requirements, e.g. the ability to point at a path and a specific node. That should be enough to, for example, highlight a property value on a form. The ability to create a human-readable message however is still needed and cannot be generalized, because such messages are often ontology-specific (e.g. "Company Z is not the operator of wellbore XY" in EPIM). How could that be machine-generated, and why the overhead if we can simply concatenate a string?
  • Comment (labra): To clarify my point of view, I would vote in favour of a separation of concerns here. We could define the data structure that the validation process returns (I would vote +1 for this), and the conversion from that data structure to human-readable messages (I would vote 0 for this)
  • Answer (HK): It is clear that from a violation data structure, a tool can provide generic human-readable messages. For example: ldom:root=ex:MyPerson, ldom:path=ex:child, ldom:value=ex:MyPerson could produce: "Constraint violation at ex:MyPerson: Invalid value ex:MyPerson for property ex:child". This is easy. However, some details cannot be machine generated, e.g. "A person cannot have itself as a child". However, such things can be implemented in a reusable way using Templates (here: a template to check irreflexivity). Does this clarify it?
  • Objection (ericP): I don't see why we'd require this in the core language, nor how we'd provide it.
  • Answer (HK): But this is already solved in SPIN/LDOM. The SPARQL queries can produce an ldom:message string dynamically, or the constraint itself has an ldom:message triple attached to it. That's all that is needed, it's easy to implement and greatly improves the user experience. You seem to be talking about "core language" often, but it's unclear what you mean by that. Is this some lite dialect of the full spec?
  • Answer (ericP): I believe we share an understanding that not every implementation needs SPARQL. Core is the stuff that every implementation needs.
  • Answer (HK): Again, you are voting -1 assuming that we only vote for the core language. But we vote for the complete language. And here, even the core language requires ldom:message because it can be directly attached to ldom:Templates which will be in the core language.
  • Comment (cygri): https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Mar/0249.html
  • Answer (RC): To ericP: This can be provided by a mechanism that substitutes placeholders such as ?root and ?path (or {resource} and {property}, as in my linked real-world example) in a string template.

Constraint Violations should point at Specific Nodes

The language should make it possible for authors of constraint checks to produce pointers at specific nodes and graph fragments that caused the violation. Typical examples of such information includes the starting point (root node), a path from the root, and specific values that caused the problem.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S3
  • Votes: HK, DK+1, KC:+1, SSt:+1, labra: +1, michel: +1
  • Comment: (KC) This sounds to me like a requirement for expressivity of error messages - that each error needs to be able to return a message specific to the violation found. I don't see this as limited to naming a node.
  • Answer (HK): Yes, the ability to produce context-specific error messages is important and covered by "Human-readable Violation Messages" above. However, error messages are usually not machine-readable. This requirement here is to help automated tools to point at the specific nodes that are broken, e.g. to highlight them in user interfaces or suggest auto-corrections.

Constraint Violations Reporting Details

The language should make it possible to provide different levels for constraint violation reporting details. Typical examples of such levels are violations of specific nodes (rec 2.10.3), violations aggregated at the shape or shape facet level or with the option to provide the number of violations per shape or shape facet.

  • Status: Under consideration
  • Derived from: S34, S46
  • Votes: DK+1, pfps -0.5, HK +1, michel: +1
  • Also see email discussion: https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Feb/0344.html
  • Comment (pfps): I would like to see what sort of mechanism would be proposed to satisfy this requirement.
  • Comment (DK): For the high level vocabulary it can be done transparently by the validation engine; the problem arises in the custom SPARQL queries. In that case the only requirement is to annotate at least the variable corresponding to the erroneous RDF node so that the validation engine can convert the query to e.g. "SELECT (COUNT (DISTINCT ?variable))" and just get the number of violations and not the complete list of errors. For a more complete proposal see https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Feb/0359.html
  • Comment (pfps): It seems to me that the only issue here is the "efficiently". Any reporting back of individual violations (e.g., using SPARQL SELECT) can be post-processed to produce aggregated or reduced reports. It further seems to me that this can only be a minor improvement, and thus not something that needs to be added to the standard. As well, custom SPARQL queries can easily include the aggregation.
  • Comment (DK): In my opinion it is a huge improvement in reporting flexibility and speed, especially on big datasets. Another issue is that we should decouple the results from the shapes. Hard-coding aggregations in the shape will make the shape applicable to only one specific case and one reporting format; the same applies to all CONSTRUCT queries.
  • Comment (HK): I think this is worth thinking about and should therefore at least reach the state "Under Consideration", although we may only get to the details of that design later in the process.
  • Comment (DK): I removed 'efficiently' from the text and placed it in the following requirement (violations must be countable). So this is about the expressiveness of the reporting vocabulary.

Constraint Violations must be Countable

The language should make it possible to efficiently provide the number of violations for a shape or a shape facet.

Modularization

The language should support organizing constraints in different groups, modules or graphs, and provide mechanisms to allow modules to point to each other.

  • Status: Proposed
  • Derived from: (see individual tickets below)
  • Votes: HK, KC +.5, pfps 0, michel: +1
  • Comment (KC): DCMI is contemplating whether we need to require that the language can describe sub-graphs. This may be already covered in the requirement on "complex constraints." If not, a story would be very helpful here.
  • Answer (HK): Maybe Expressivity: Named Graphs is related to your use cases?
  • Comment (SSt): Would something like: "validate to true if the group of UI constraints validates to true" be an example for that requirement?
  • Answer (HK): This is rather another grouping requirement that only exists as parent of others. We probably should not vote on this req itself.
  • Comment (pfps): Is this supposed to be a group of requirements? It doesn't seem to me that the member requirements are all related to this requirement.

Organizing Constraints in Named Graphs

The language should support using the standard linked data concept of named graphs (datasets) to organize constraints. Such named graphs have a URI that is resolvable in the context of the application (e.g. on the public web via HTTP). Applications may define their own look-up mechanism to resolve such named graphs (e.g. to local database graphs or files). This includes the ability to separate a domain model from constraints.

  • Status: Proposed
  • Derived from: S5, S6, S7, S13, S15, S20, S24, S28
  • Votes: HK, pfps 0, michel: +1
  • Comment (pfps): Are named graphs a standard part of LD? If this requirement can stand independently of LD then why bother to appeal to LD.
  • Answer (HK): Yes, named graphs are a standard part of Linked Data in RDF 1.1 and SPARQL datasets. So maybe we do not need to repeat this requirement here. I just thought it should be explicitly covered because it is relevant to so many stories, and a potential solution to the context-sensitivity required by many user stories.
  • Comment (pfps): I don't think that this is really about named graphs at all. I think that it is about being able to organize constraints in web-accessible documents and to incorporate them by URL.
  • Answer (HK): Web-accessibility is a part of the overall named graph requirement, but not key because many applications re-direct named graph look ups, e.g. to local databases or files.

Including Named Graphs for Query Evaluation

The language should support including named graphs (similar to owl:imports) so that all constraints from the (transitively) included graphs are also applied for evaluation. Conceptually, all included graphs are a union graph that becomes the default query graph of the constraint evaluation.

  • Status: Proposed
  • Derived from: S5, S6, S13, S20, S24
  • Votes: HK, DTA+1, pfps -1
  • Comment (pfps): I don't think that this is about named graphs.
  • Answer (HK): Why not?
  • Comment (pfps): What is query evaluation and the query graph?
  • Answer (HK): These are SPARQL terms, and given that SPARQL has been recommended as the back-bone of this WG so many times I hope that people understand these terms. In OWL it would be the graph that the class expressions are evaluated (= queried) against.
  • Objection (pfps): owl:imports goes out and gets documents identified by a URL. These documents are conceptually available by using web-accessing mechanisms, not named graphs.
  • Comment (HK): Peter, are you able to propose a re-wording that would work for you? I think this is an important requirement to allow people to point from their instance data to a file containing the SHACL definitions. "Named Graph" may indeed not be the right term, yet many systems keep local copies of URL-based files in their workspaces and these workspaces typically form the dataset.
  • Comment (pfps): Just use owl:imports. Don't say anything about it in SHACL.
  • Comment (HK): I think this issue requires an official solution. It may work to reuse the IRI of the owl:imports property, but I would not want to make SHACL depend on OWL because the property has different semantics there. I believe there is also a need to distinguish between imports into the query graph and imports of graphs that only contain shape declarations, for performance considerations. I therefore suggested sh:include and sh:library for these two kinds of graph references. Maybe it's too early to talk about this topic yet - it will only become clear when we look at larger examples.

Efficiency of the validation process

The efficiency and complexity of the language should be taken into account. At least, it should be possible to identify some profiles of the language with minimum complexity.

  • Status: Under consideration
  • Derived from: S34 (plus many other scenarios with large graphs)
  • Votes: labra: +1, HK 0, ericP: +1, iovka: +1
  • (Withdrawn) Objection (HK): This risks repeating the mistakes of OWL DL. In our experience it would be a huge mistake to limit the expressivity of the language only because of theoretical worst-case performance. The whole expressivity of SPARQL should be accessible. The ability to identify subsets that offer certain performance guarantees is already covered by the Profiles requirement. I would not mind having researchers work on optimized profiles, but the overall language needs to be unconstrained. It is the responsibility of the constraint authors to make sure that queries respond in reasonable time (just like with any programming language).
  • Comment (labra): I think performance and efficiency of the solution should be taken into account. In the description I deliberately omitted a particular complexity boundary and only said that we could at least take the efficiency of the solution into account as a parameter; even if the language is converted to SPARQL, we could try to define a subset or profile that has better performance. I think this requirement is compatible with the "profiles" requirement and in fact follows from the requirement that it should work on large databases.
  • Answer (HK): I sympathize with your view point and believe that interesting research will be possible on different profiles of the overall scope. However, I cannot live with the current wording "its complexity should be minimal". This sounds like you want to prune the expressivity (of SPARQL) for everyone, and in our experience this risks repeating the OWL DL mistake, which was to design a language based on theoretical performance without covering real-world requirements. In other words, if there is a choice between expressivity and performance guarantees, then I would always favor expressivity. Even with high expressivity, it is perfectly fine to create efficient queries.
  • Comment (labra): So I really think we agree on the issue. Would you agree with it if we replace "its complexity should be minimal" with "its complexity should be taken into account"? My main point is that we could try to define profiles which could offer better performance than just SPARQL. For example, if we keep only structural constraints and leave the SPARQL queries apart, we could have a profile that offered good performance and at the same time was expressive enough for some application scenarios.
  • Answer (HK): I would withdraw my objection if we change the wording to something like "Some profiles of the language should have minimum complexity." This applies both to performance and to the complexity of the algorithms needed to make sense of the declarations. For example, a profile may only include plain property definitions with min/max/valueType etc that is trivial to parse by JavaScript based UI tools. In the context of LDOM I guess any such profile would exclude the full expressivity of SPARQL, but be entirely based on templates, and those templates could have a ldom:profile property to link to the URI of known profiles.
  • Answer (labra): I have changed the wording.
  • Answer (HK): Ok, I have withdrawn my objection as discussed. I suggest we leave the discussion here for historical purposes.

Execution on large databases

The language should be efficient to execute on large databases, so that the execution engine can exploit native optimizations from the database. Some data that is needed for execution (such as the constraint definitions themselves, macros and functions) may not be present in each graph in the database. Therefore, it should be possible to separate the graphs needed at constraint evaluation time from those graphs that hold the complete definition of the constraint checking context. A possible solution would be to have another kind of include mechanism that links a data graph with (macro) libraries. Another way is to have some on-demand validation system.
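
One way to picture this separation (sh:include and sh:library are the hypothetical property names suggested in the previous requirement's discussion) is sketched below:

    @prefix sh: <http://example.org/shapes#> .   # placeholder namespace

    <http://example.org/data>
        # graphs whose triples become part of the query graph at evaluation time
        sh:include <http://example.org/reference-data> ;
        # graphs that only supply shape, macro and function definitions
        # and are deliberately kept out of the query graph
        sh:library <http://example.org/constraint-library> .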

  • Status: Proposed
  • Derived from: S34 (plus many other scenarios with large graphs)
  • Votes: HK, labra: 0, pfps 0, DK +1
  • Comment (pfps): Justification should be in the stories, not here, I think.
  • Answer (HK): I don't think so, some stories are just formulating a high-level requirement, while here I am trying to drill into an implementable solution.
  • Comment (labra): I think this requirement could be separated into several ones. One could be just "The language should be efficient", saying that the complexity of the language should be clear and minimized. I mean, we should look for some solution whose complexity could be identified and minimized. I would vote +1 for this one. Another requirement could be about the execution of the validation process against databases, which could say that the validation process could be run on demand, taking into account native optimizations from the database. I would vote 0 for this one, as it would probably be out of the scope of the WG.
  • Answer (HK): Feel free to make those edits and adjust your vote.
  • Comment (labra): I separated the original requirement into two and removed my objection but kept the comments. I also changed the name, the original name was: Possible separation of Query and Library Graphs

Profiles

The language should include a notion of profiles, so that certain applications with limited features can only use certain elements of the overall language.

  • Status: Approved: Telecon 29 January 2015
  • Derived from: S11, S19, S32
  • Votes: HK, KC:+1, SSt:+1, labra: +1, michel: +1
  • Comment (KC): DCMI will probably attempt to select a "core" set of constraints for simple applications. I assume this requirement means that one can implement only the aspects of the language that are needed for the function being addressed.
  • Answer (HK): Yes, a profile might be implemented as a collection of constraint templates (e.g. those under "Declaration of property..."). Each profile could have its own URI and applications could declare which profile(s) they support by pointing at those URIs, and engines could produce warnings if unsupported constraints are found in the model.
  • Comment (labra): this requirement could be separated in two. The first one would be about having the notion of profiles, and the second one about separating structural constraints from complex constraints. I would vote +1 to the notion of profiles. With regards to the separation of structural from complex constraints, I would also vote +1 if it was clear that it is independent from SPARQL.
  • Answer (HK): Please go ahead with splitting the requirement.
  • Comment (labra): Done

Grouping Constraints into Contexts

The language should make it possible to organize constraints so that they are applicable in certain contexts only. For example, application A may want to add constraints that do not apply for the more general application B. One approach would be to "tag" constraints with the URI of a context resource, and have the execution engine accept a context parameter to instruct it which constraints to ignore. Contexts could be organized into their own hierarchy and details would need to be worked out.
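
A sketch of the tagging approach described above (sh:context and sh:appliesToClass are hypothetical properties used only for illustration):

    @prefix ex: <http://example.org/ns#> .
    @prefix sh: <http://example.org/shapes#> .   # placeholder namespace

    ex:StrictPersonShape a sh:Shape ;
        sh:appliesToClass ex:Person ;
        sh:context ex:ApplicationA .   # only checked when the engine is invoked for this context

An engine invoked with the context parameter ex:ApplicationB would then ignore ex:StrictPersonShape.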

  • Status: Proposed
  • Derived from: S7 (S4?), S13, S20, S24
  • Votes: HK +1, DK +1, pfps -0.5
  • Comment (HK): One version of this idea was proposed in an email by Jerven Bolleman.
  • Comment (DK): This idea was actually proposed earlier, in the RDFUnit constraint discovery mechanism.
  • Comment (pfps): This doesn't seem like a good idea to me. I expect that it will lead to confusion when reading documents containing multiple contexts and multiple constraints.
  • Comment (HK): Some of the use cases may be covered by a scoping mechanism that allows to declare pre-conditions.

Separation of structural from complex constraints

There shall be a core language or SHACL profile that excludes any support for constraints defined via embedded SPARQL queries or other complex lower-level expressions. This is so that lightweight applications can validate constraints without requiring a SPARQL processor or similar subsystem.
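
To illustrate the intended split (all property names here are placeholders resembling draft proposals, not final vocabulary): the first constraint below is purely structural and would fall into the core profile, while the second embeds a SPARQL query and would require the full language.

    @prefix ex:  <http://example.org/ns#> .
    @prefix sh:  <http://example.org/shapes#> .   # placeholder namespace
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    # Structural constraint: cardinality and datatype only, no SPARQL processor needed
    ex:PersonShape sh:property [
        sh:predicate ex:name ;
        sh:minCount 1 ;
        sh:datatype xsd:string
    ] .

    # Complex constraint: embedded SPARQL, excluded from the core profile
    ex:PersonShape sh:constraint [
        sh:sparql """SELECT ?this WHERE {
            ?this <http://example.org/ns#birthDate> ?date .
            FILTER (?date > NOW())
        }"""
    ] .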

  • Status: Approved Teleconf 5 March 2015
  • Derived from: S11, S19, S32
  • Votes: HK, KC:+1, SSt:+1, labra: +1, pfps: 0, AR: +1, michel: +1
  • Prior Objection (pfps): I don't see a reason why the split is between structural and complex constraints. Is there any indication that structural constraints will be easy or that complex constraints will be hard? The revised version is OK by me.

Evaluating Constraints for a Single Node Only

It should be possible to validate constraints on a single node in a graph. This may be impossible to implement 100% correctly, because sometimes a change to a resource invalidates conditions in a very different place in the graph. However, the language could propose a framework that identifies those constraints that SHOULD be checked when a given node is evaluated, e.g. by following its rdf:type and the superclasses of that. This would include validating shacl:valueShape but not shacl:valueType.
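
As an illustration of the runtime mechanism (see the mention of SPIN's ?this variable in the answer below), a constraint attached to ex:Person could be evaluated as a SPARQL query whose ?this variable normally ranges over all instances; to validate a single node, the engine simply pre-binds ?this:

    # Sketch: returns instances of ex:Person violating "must have at least one ex:name"
    SELECT ?this
    WHERE {
      ?this a <http://example.org/ns#Person> .
      FILTER NOT EXISTS { ?this <http://example.org/ns#name> ?name }
    }
    # To evaluate only one node, the engine pre-binds ?this, e.g. by injecting
    # VALUES ?this { <http://example.org/data#alice> }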

  • Status: Approved Teleconf 26 March 2015
  • Derived from: (Orthogonal to basically all stories)
  • Votes: HK, KC -1, SSt:+1, labra: +1, pfps: 0.5, michel: +0.5
  • Comment (pfps): Having a story for this, or an example from a current story, would be useful.
  • Answer (HK): You could take any story about UI forms, where a given instance is being validated while other instances may exist on the client's graph.
  • Comment (KC): I assume that one can create a "shape" that is limited to a single node without this requirement. If that's not the case, I will change my vote.
  • Answer (HK): This requirement is not about declaring constraints but how to use them at runtime. For example, a constraint may be declared for all instances of Person (attached to the class Person). Now, when an application runs the validator because it has a single Person instance on a form, it should not have to validate all instances at once. Instead it should be possible to start the evaluation process just for that single instance (much faster etc). The requirement is about a mechanism to locally scope constraints for a given resource only - as is done via the ?this variable in SPIN.
  • Comment (labra): This one and the static constraints requirement are about the selection process of which nodes to validate. I think we should group them in a new requirement or a set of requirements about how to select which node to evaluate. I added a new sub-section titled "Selection of nodes".
  • Prior Objection (pfps): If this is different from specifying the scope of a constraint then I think that it is unnecessary.
  • Comment (RC): Proposed rewording: “Where a constraint expression is applied to multiple nodes (e.g., to all members of a class), the design of the language must make it possible that an implementation may isolate a single node and perform validation only for that node, even if other nodes are present in scope. For example, in a data editing UI, when a user has edited a single instance in a large graph, the system should not be required to revalidate all instances.”
  • Comment (pfps): It appears to me that this could just be "keep track of changes to the graph and only rerun the constraints that might have a different result from last time". This would be fine by me.

Selection of nodes

There must be some mechanism to select which nodes are going to be validated/constrained. Some possibilities: global selection (all the nodes in the RDF graph), nodes by type (the instances of some class), and specific nodes.

  • Votes: AR +1, pfps +1, michel: +1
  • Comment (pfps): I'm assuming that these requirements are about specifying scope for shapes. I am against providing two mechanisms, so if this is in addition to scope then I object to the entire section.

Select whole graph

It should be possible to select all the RDF nodes in a graph for validation. This is similar to the Global Constraints requirement.

  • Status: Approved Teleconf 12 March 2015
  • Derived from: S35
  • Votes: labra +1, HK +1, SSt +1, pfps +1, michel: +1
  • Comment (HK): My +1 is assuming that we are talking about "executing all constraints for a whole graph", not just nodes. Nodes may be split across multiple graphs, and furthermore we are really talking about arbitrary triple matches, not just individual subjects.
  • Comment (pfps): All nodes, or all nodes excluding literals?
  • Answer (HK): I guess it would be excluding literals, and both rdf:type and sh:nodeShape could only work with IRIs or blank nodes anyway.

Selection by type

It should be possible to have some mechanism to select the nodes that are instances of some class for validation. The scope-based version of this is required by story S2.
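
A sketch of how an engine could compute this selection in SPARQL, including instances of subclasses (whether subclass reasoning is required is an open design question):

    PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    # Focus nodes: all instances of ex:Person, directly or via a subclass
    SELECT DISTINCT ?this
    WHERE {
      ?this rdf:type/rdfs:subClassOf* <http://example.org/ns#Person> .
    }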

  • Status: Approved Teleconf 12 March 2015
  • Derived from: (Orthogonal to basically all stories)
  • Votes: labra 0, HK +1, SSt +1, pfps +1, DTA +1, DK +1, michel: +1
  • Comment (labra): Although I prefer to clearly separate shapes from classes, I would not oppose having some way to declare that the instances of some class should have some shape, so that the processor would select those nodes and check whether they have the corresponding shape.
  • Answer (HK): The solution looks simple to me. "Shape" would be the superclass of "Class". Shapes have constraints attached to them, and by inheritance classes can also have such constraints. Another approach would be to allow the URIs of a shape to also be reused as a class, i.e. have both rdf:types ldom:Shape and rdfs:Class. In any case, the important bit is that selection needs to work by types.

Selection by single node

It should be possible to select a single RDF node for validation.

Selection by expression

It should be possible to select the nodes that satisfy an expression for validation. For example, it should be possible to select the nodes that are the subject of a particular property as the scope of a shape, i.e., selecting the nodes that have a child. The limits of the supported expressive power are open here, but this could cover having the scope be the nodes that belong to a shape.
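
For the "nodes that have a child" example, the selection expression might amount to a query like the following sketch (how such expressions would actually be written in the language is exactly what is left open here):

    # Scope: every node that has at least one value for ex:child
    SELECT DISTINCT ?this
    WHERE {
      ?this <http://example.org/ns#child> ?anyChild .
    }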

  • Status: Proposed
  • Votes: pfps +1, KC +1, SSt +1, HK -1
  • Comment (HK): I don't like the term "selection" in this context. If a generic shape matching mechanism were allowed to select a shape for resources, in the same way that class and instance-based selectors are used, then we may run into serious performance and scoping problems. It would basically require walking through every resource in a graph to see if it matches the pre-condition. A more realistic solution would be to treat these scope shapes as a filter condition that can be used in conjunction with the other selectors, e.g. for all instances of class X, apply certain constraints only to those instances that also happen to match a certain pre-condition. My suggestion is to replace "select" with "filter" for this requirement.
  • Comment (KC) Holger, I think you have moved from requirement to implementation in your comment. The requirement seems suitably general to me, but making it even more general we could say that SHACL allows you to designate the nodes that...etc, which then doesn't imply any particular implementation method. Personally, I'm ok with how it is, but wouldn't want it to be tied to a specific method of implementation.
  • Comment (HK) Karen, yes we could approve rather general requirements and then hope for the best that they can actually be implemented. The problems are then delayed for now, but WG members may come back saying "but we approved this and your draft doesn't cover it". I'd rather be careful with this particular one, because there is quite a difference between "selecting" and "filtering" and my draft intentionally only covers filtering (via sh:scopeShape). Selecting is an active process that the engine needs to do, while filtering is narrower and only applies to a previous selection. It also has an impact on how these various selectors could be combined, and I believe Dimitris' ISSUE-49 touches on the same problems. If someone can explain how this is supposed to work, I'd be much happier - this isn't about implementation, but I want to know how a system can find the applicable shapes if the starting point is a given resource to validate. The only possibility that I see here is to walk through all defined shapes and validate their scope, and this is quite an overhead. Meanwhile we should IMHO only approve a less ambitious version of this requirement.
  • Comment (pfps): In a SPARQL-based SHACL, the expressions would be shapes and the implementation would just add the shape (or its translation to SPARQL) to the generated SPARQL query that returns the violations. This is just like the situation with class-based selection, so for a simple translation instead of SELECT ?this WHERE { ?this rdf:type ex:class. MINUS [shape translation] } you would generate SELECT ?this WHERE { [selection translation] MINUS [shape translation] }. For more complex translations a similar technique should suffice.
  • Objection (HK): I cannot support this general solution because I don't see how this can work. See https://www.w3.org/2014/data-shapes/track/issues/62 . I would be OK with making this a filter, not a selector.

Evolutionary Adoption Path

The standard should provide a reasonable evolutionary path from existing systems and data models into the new closed constraint checking world. In particular, users should be able to use information from their already existing data, esp. the existing rdf:type triples, to determine which constraints need to apply.

  • Status: Proposed
  • Votes: HK +1, DTA +1, pfps -1, iovka -1
  • Comment(SSt): You mean such a path should be covered in some documentation/document of the standard, if I'm not mistaken?
  • Answer (HK): I think this high-level requirement is more than about documentation. It should be a technical answer to the question of what to do with existing (linked) data, and existing ontologies. The goal here is to lower the cost of adoption through interoperability with existing solutions.
  • Objection (pfps): This requirement is too vague in its current form. It could be satisfied vacuously - there is this new constraints checking mechanism that is a new thing that can be done. No one is obligated to use it; it doesn't change any data; the evolutionary path is to do nothing.
  • Answer (HK): I have tried to clarify what I meant with this requirement - mostly to make sure that people are not forced to duplicate their rdf:type triples with something like sh:nodeShape. I can probably withdraw this requirement once "Selection by type" is approved.
  • Comment (pfps): If all this is about is the ability to use existing RDF *data* (i.e., not SPIN or Stardog ICV or ShExC or whatever, but definitely including rdf:type triples and RDFS ontologies) then I'm all for it.
  • Answer (HK): Yes it's only about the ability to be compatible to existing instance data and RDFS class hierarchies. Does this change your vote (we need +3 to make this under consideration)?
  • Objection (iovka): This requirement is too vague and impossible to satisfy, except if it is given a scope: what are the "existing systems", the "users", and the "data" targeted? It is impossible to do this for all existing systems, all users and all their data.

Non-Validation Requirements

  • Comment (pfps): Looking at all this is making me very sorry that I did not object to having UI concerns in the charter of the working group.

Property Default Value

It should be possible to provide a default value for a given property, e.g. so that input forms can be pre-populated. This requirement is not about using default values as "inferred" triples at run-time.
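
A sketch of how this annotation might look (sh:defaultValue is a hypothetical property, named after SPIN's spl:defaultValue mentioned in the discussion below; it would only inform UIs and would never add triples to the data):

    @prefix ex:  <http://example.org/ns#> .
    @prefix sh:  <http://example.org/shapes#> .   # placeholder namespace
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

    ex:PersonShape sh:property [
        sh:predicate ex:country ;
        sh:datatype xsd:string ;
        sh:defaultValue "Australia"    # pre-populates input forms; not used during validation
    ] .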

  • Status: Approved Teleconf 19 March 2015
  • Derived from: S11, S19, S20
  • Votes: HK, SSt:+0.5, ericP:+.5, pfps:0, KC+1, labra: 0
  • Comment (pfps): This appears to be about non-validation stuff and thus needs to be moved.
  • Comment (SSt): What's the issue with its current form?
  • Comment (pfps): I'm against constraints that are not constraints. How would this fail?
  • Answer (HK): I agree this is not a condition that can fail. However, I also think the user stories and the discussions that led to the formation of this group (e.g. the Resource Shapes stories) have expressed the desire to have constraints not only as conditions, but also as ways to express structural information that can be used to populate input forms, call web services etc. Filling in a default value if no other value has been given sounds like a very useful thing to have from this perspective, and is a requirement that we have encountered very frequently in customer situations. For example SPIN itself uses spl:defaultValue internally, to fill in unspecified arguments of template calls. (Based on your -1 I have changed the status back to Proposed).
  • Comment (SSt): I second pfps's concerns about being "non-failable" but also agree with HK's answer. Maybe we should discuss in the next call the more general problem of having useful requirements which are not able to fail per se.
  • Comment (labra): Adding default values to the input graph that we are validating means that we are changing that input graph. I think it would increase the complexity of the solutions, because we should consider at which moment we add those default values, and it would be more difficult to have a declarative way to reason about the whole process of validation. Anyway, if people think it is really necessary, I would not oppose it.
  • Answer (HK): Absolutely: this requirement is only about using default values to inform things like input forms or web services, but not to automatically "infer" any new triples from this info while constraints are evaluated. This would indeed require a difficult infrastructure and have serious performance issues. I have clarified this above. If you are satisfied, please remove your comment and my response, and possibly adjust your vote.
  • Prior Objection (pfps): As previously stated this would be used to add missing values at web service invocation time. This is actually adding inferred triples at run time. The two parts of the description need to be reconciled.
  • Answer (HK): I have removed the sub-sentence about web services from the synopsis of this requirement. It is now only about suggested default values on input forms. Custom applications may use this info for other purposes, but these are outside of the scope here. I hope this addresses your objection.

Property Labels at Shape

It should be possible to provide human-readable labels of a property in the context of a shape, intended for human consumption such as documentation or UI, not just globally for the rdf:Property. Multiple languages should be supported.
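
A sketch of a shape-local, multilingual label (whether the label is attached via rdfs:label or a dedicated property is an open design choice; the EPIM npd:id case from the comment below is used as the example):

    @prefix ex:   <http://example.org/ns#> .
    @prefix sh:   <http://example.org/shapes#> .   # placeholder namespace
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:WellBoreShape sh:property [
        sh:predicate ex:id ;
        rdfs:label "well bore id"@en ;   # label valid only in the context of this shape
        rdfs:label "Bohrloch-ID"@de      # multiple languages supported
    ] .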

  • Status: Approved Teleconf 19 March 2015
  • Derived from: S11, S19
  • Votes: HK, ericP:0, pfps:-0.5, SSt:0, KC+1, AR: +1
  • Comment (pfps): This does not seem to be helpful. What good would such labels be?
  • Comment (HK): One advantage is usability: the same constraint can be used to fully describe a property, without having to fall back to a global property triple in some other part of the document. This is important, e.g. when such shape/class declarations are serialized in Turtle and JSON. Another advantage is that sometimes the label of a property at a shape/class is different from any potentially global label. We have examples for this in the EPIM project (npd:id should sometimes be displayed as "well bore id" and sometimes as "facility id").

Property Comment in a Shape

It should be possible to provide human-readable descriptions of the role of a property in the context of a shape, not just globally using triples that have the rdf:Property as subject. Multiple languages should be supported.
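
Using the DCAT example from the comments below as an illustration (the attachment mechanism shown is a sketch, not agreed syntax):

    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/ns#> .
    @prefix sh:   <http://example.org/shapes#> .   # placeholder namespace
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

    ex:CatalogShape sh:property [
        sh:predicate dct:title ;
        # locally overrides the global description of dct:title for this shape only
        rdfs:comment "A name given to the catalog."@en
    ] .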

  • Status: Approved Teleconf 19 March 2015
  • Derived from: S11, S19
  • Votes: HK, pfps:-0.5, KC+1, AR: +1
  • Comment (HK): A good example of why this is needed is DCAT. This reuses many properties from external namespaces such as Dublin Core, but needs to override the global rdfs:comment at those properties with local text. Currently these local comments only live in the HTML spec of DCAT, but cannot be used in form builders etc. Schema.org is another example where the same property URI is used in several classes. Not having such a facility would limit what form builders can do.
  • Comment (pfps): An example showing why DCAT needs to override rdfs:comment values is needed here.
  • Answer (HK): For example look at the description of dct:title at http://www.w3.org/TR/vocab-dcat/#class-catalog, which is "A name given to the catalog". There are many more examples if you scroll down on that page.

External

The requirements from Dublin Core are, in some sense, all unofficial requirements for the RDF Data Shapes Working Group. At some time, each of them might be considered by the working group.

Eric Prud'hommeaux created a tool to show a Hierarchical View of the Dublin Core Requirements.

Paper by Bosch, Eckert: Requirements on RDF Constraint Formulation and Validation (DC 2014): https://github.com/boschthomas/PhD/raw/master/publications/Papers%20in%20Conference%20Proceedings/Bosch%2C%20Eckert.%20Requirements%20on%20RDF%20Constraint%20Formulation%20and%20Validation%20(DC%202014).pdf