Warning:
This wiki has been archived and is now read-only.

User Stories

From RDF Data Shapes Working Group


Status of this Document

Per the decision of 18 Dec 2014, new user stories are added only by WG decision. These stories are neither final nor approved; the WG will continue to refine them.

Stakeholders

  • Data modelers
  • software developers (because they may be doing ontology-driven applications)
  • clinical informaticians
  • data creators
  • data stewards
  • systems analysts/analysts
  • data scientists
  • data re-users
  • ontology modelers
  • user interface (UI) developers
  • anyone on the web creating web pages?
  • API designers
  • API consumers (not necessarily RDF knowledgeable)
  • tool developers (consume shapes as metadata)
  • Business Analysts
  • Devices/tools/services in the IoT
  • system security engineers
  • people with non-RDF legacy systems
  • data aggregators
  • data migration engineers
  • test engineers (software testers but also application conformance testers)
  • integration test engineers
  • W3C standards creators
  • data quality engineers
  • reference data managers

Themes

T1: Recursive Structures (S4, S9, S21)

T2: Model/Data Validation (S1, S2, S21)

T3: Dataset Partitions (S17, S18)

T4: Compliance/Governance (S21)

T5: Closed-world recognition for access control (S5), for partial ontology import (S6)

T7: Value Validation (S11)

T8: Interoperability (S12, S14)

T9: Nuanced validation (S3)

Stories

S1: The model's Broken!

Created by: Dean Allemang

Validate RDFS (maybe also OWL) models

The basic issue here is to ensure that the right kind of information is given for each property (or class) in the model, for example, to require that each property has to have a domain, or that classes have to be explicitly stated to be under some decomposition.

Input data: the RDF representation of an RDFS (or OWL) ontology

Input ontology: the ontology that represents RDFS (or OWL) syntax

OWL constraints (Stardog ICV)

Each property has to have a specified domain that is a class:

  rdf:Property <= exists rdfs:domain rdfs:Class

Each class has to be specified to be under the top-level decomposition:

  rdfs:Class <= { rdfs:Class, [and the other built-in classes] } union fills rdfs:subClassOf { ex:Endurant, ex:Perdurant }

Note: Because this story works with the built-in RDF, RDFS, and OWL vocabulary, the prohibition of using this vocabulary in OWL axioms would have to be lifted.

SPIN

Example: Each property has to have a domain

   rdf:Property
       spin:constraint [
           sp:text "ASK { NOT EXISTS { ?this rdfs:domain ?anyDomain } }"
       ]

SHACL-SPARQL

Example: Each property has to have a domain.

 [ sh:classScope rdf:Property ;
   sh:shape [ sh:predicate rdfs:domain ; sh:minCardinality 1 ] ] 

S2: What's the name of that person?

Created by: Peter F. Patel-Schneider

For a tool that builds a list of names for named entity resolution of people to work correctly, every person has to have one or more names specified, each of which is a string. Constraints can be used to verify that particular sets of data have such names for each person.

Before an RDF graph was fed into the tool the constraints would be run over the graph. Only if the constraints were not violated would the tool actually be run. The constraints would be separate from the ontology involved, as other uses of the same ontology would not necessarily have the requirement that all instances of person have their names specified.

The envisioned calling setup would be something like

verify(<graph containing information about people>,<person ontology>,<constraints>)

This is probably the simplest user story, both in terms of setup and requirements. It illustrates the basic idea of checking to see whether for some RDF graph all nodes that belong to some type have the right kind of information specified for them. The same story can be repeated for other tools that need particular information specified for the tool to work correctly.


OWL constraints (Stardog ICV)

 Person <= exists name xsd:string & all name xsd:string

SPIN

   ex:Person
       spin:constraint [
           sp:text "ASK { FILTER NOT EXISTS { ?this ex:name ?anyName } }" 
       ] ;
       spin:constraint [
           sp:text "ASK { ?this ex:name ?name . FILTER (datatype(?name) != xsd:string) }" 
        ] .

or using Resource Shapes as a SPIN template

   ex:Person spin:constraint [  # or oslc:property 
       a oslc:Property ;
       oslc:propertyDefinition ex:name ;
       oslc:occurs oslc:One-or-many ;
       oslc:valueType xsd:string ;
   ]

ShExC

 my:PersonShape { ex:name xsd:string }

Many different mechanisms may initiate validation, e.g. a user interface instruction or a SPARQL query selecting nodes by matching an oslc:describes or oslc:resourceShape property:

 my:PersonShape oslc:describes ex:Person .

SHACL-SPARQL

 [ sh:classScope ex:Person ;
   sh:shape [ sh:and ( [ sh:predicate ex:name ; sh:valueType xsd:string ]
                       [ sh:predicate ex:name ; sh:minCardinality 1 ] ) ] ]

S3: Communicating back to users, kindly

Rather than giving a single yes/no answer, which discourages users and rejects a lot of data, provide a range of responses that inform users of ways they could improve their data, while still accepting all but the truly unusable data. This requires levels of "validation".

SPIN

In TopBraid EVN (a web-based data entry tool), we have instance edit forms with an OK button. When OK is pressed, a server callback is made to verify whether any constraints have been violated. If violations exist, they are presented to the user and, depending on the severity and server settings, the user may continue without fixing the errors. SPIN can represent constraints at various severity levels (see also http://spinrdf.org/spin.html#spin-constraint-construct):

  • spin:Fatal: We can stop checking immediately, no way to continue
  • spin:Error: Something should really be fixed
  • spin:Warning: Just report it back to the user but don't block them
  • spin:Info: Just to print something out, e.g. for debugging.

Here is an example in SPIN, using the CONSTRUCT notation to produce a constraint violation; additional properties could be attached to each report, including pointers to the triple that is causing the issue. Not shown here, SPIN even has the ability to point to an INSERT/DELETE update query that fixes a violation.

   kennedys:Person
      spin:constraint
         [ a       sp:Construct ;
           sp:text """
               CONSTRUCT {
                   _:violation a spin:ConstraintViolation ;
                        spin:violationRoot ?this ;
                        spin:violationPath kennedys:spouse ;
                        spin:violationValue ?spouse ;
                        spin:violationLevel spin:Warning ;
                        rdfs:label "Same-sex marriage not permitted (in this model)"
               }
               WHERE {
                   ?this kennedys:spouse ?spouse .
                   ?this kennedys:gender ?gender .
                   ?spouse kennedys:gender ?spouseGender .
                   FILTER (?gender = ?spouseGender) .
               }"""
         ] .

S4: Issue repository

Created by: Eric Prud'hommeaux

An LDP Container <http://PendingIssues> accepts an IssueShape with a status of "assigned" or "unassigned". The LDP Container is an interface to a service storing data in a conventional relational database. The shapes are "closed" in that the system rejects documents with any triples for which it has no storage. The shapes validation process (initiated by the receiving system or a sender checking) rejects any document with "extraneous" triples.

Any node in the graph may serve multiple roles, e.g. the same node may include properties for a SubmittingUser and for an AssignedEmployee.

Later the issue gets resolved and is available at <http://OldIssues> without acquiring new type arcs. The constraints for <http://PendingIssues> are different from those for Issues at <http://OldIssues>.

ShEx

An LDP Container <http://PendingIssues> accepts an IssueShape with a status of "assigned" or "unassigned".

 <http://mumble/Issue1> ex:status ex:assigned . # not resolved

Later the issue gets resolved and is available at <http://OldIssues> without acquiring new type arcs. The constraints for Issues at <http://PendingIssues>

 <PendingIssuesShape> { ex:status (ex:unassigned ex:assigned) }

are different from those for Issues at <http://OldIssues>

 <OldIssuesShape> { ex:status (ex:resolved) }

OWL constraints (Stardog ICV)

 issue & status="assigned" <= [constraints for assigned issues]
 issue & status="resolved" <= [constraints for resolved issues]

SPIN

In SPIN, such scenarios can be expressed by injecting pre-conditions into the constraint, e.g.

   ex:Issue
       spin:constraint [
           sp:text """
               # This constraint applies with status "assigned" only
               ASK {
                   ?this ex:status "assigned" .
                   ... the actual tests
               }"""
       ]

S5: Closed-world recognition (EPIM ReportingHub)

Created by: Holger Knublauch

EPIM Project: petroleum operators on the Norwegian continental shelf need to produce environmental reports on what chemicals were discharged into the sea and what gases were emitted into the air. There is a need for access rules governing which operators can see which data from which oil and gas fields, and for complex constraints to run during the import of XML files.

This is an example of very complex constraints that require many features that are present in SPARQL to represent model-specific scenarios, including the comparison of incoming values against a controlled fact base, transformations from literal values to URIs, string operations, date comparisons etc.

Details: EPIM ReportingHub

SPIN was used to represent and evaluate those constraints. User-defined SPIN functions were used to make those complex queries maintainable.

S6: Partial ontology import - (withdrawn)

Originally created by Ralph Hodgson, edited by HK

Note: This story was not about constraints, but described a solution to the "different shapes in different contexts" scenario. Since this solution does not require new features, the story was regarded as out of scope from a requirements perspective and was moved into its own Wiki page.

S7: Different shapes at different times, or different access at the same time. - withdrawn

Created by: Eric Prud'hommeaux

<need more here>; For me it sounds like tying shapes to metainformation (i.e. annotating), so certain constraints are only valid e.g. in certain time periods/intervals. Definitely a nice to have but most likely not formalizable in a "light-weight" manner. If shapes are not explicitly bound to owl/rdfs it is easier to decouple them.

S8: Checking RDF node type

Created by: Holger Knublauch

Status: Suggested

It is often necessary or desirable to check whether certain property values (or, more generally, RDF nodes) are of a specific node type (IRI, BlankNode or Literal, and all combinations thereof). Often the intention is to state that a given property shall have only IRI values, not blank nodes.

Two examples from the VoID namespace, void:dataDump and void:exampleResource, declare an rdfs:range of rdfs:Resource, but the intention is to support only IRI resources.

The DCAT namespace has similar examples where only IRI nodes are permitted: dcat:landingPage, dcat:accessURL and dcat:downloadURL. The declared ranges are foaf:Document or rdfs:Resource. The foaf:Document case is interesting: it shows that people might want to specify both the value's class (foaf:Document) and its node type (URI only).

SPARQL includes the built-ins isIRI, isBlank, isLiteral for those checks.
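
For example, the void:dataDump case could be checked with an ASK query in the style of the SPIN constraints above (a sketch; ?this is assumed to be bound to each void:Dataset being validated, and a result of true signals a violation):

 ASK {
   ?this void:dataDump ?dump .
   FILTER (!isIRI(?dump))
 }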

PFPS: This appears to not be about the values for properties but is instead about which kind of RDF node appears at particular points in an RDF graph. This kind of restriction could be very fragile, as triple stores might make meaning-preserving changes (or changes that only make a minor difference) to RDF graphs that would nevertheless change the validity of this kind of restriction.

Arthur Ryman: This is useful. OSLC Resource Shapes has a similar capability. You can specify whether a node is inline (oslc:Inline, i.e. its triples are in the graph, which means it could be a blank node or a URI) or external (oslc:Reference, which implies the node is a URI).

S9: Contract time intervals

Created by: Dean Allemang

In the OMG time ontology adopted by FIBO, an end date *exists* but may not be specified. Some contracts (bonds) have an end date.

Note that the restriction refinement is in a subclass.

pfps: Having a date that exists but might not be specified does not appear to be a constraint. The OWL constraints below require that all contracts have exactly one time interval provided. For bonds, this time interval has to have an end date.

Possible change (pfps): An ontology may state that instances of a class have a value for a property. Subclasses may be associated with a constraint that requires that there is a provided value for the property. For example, in the OMG time ontology adopted by FIBO every contract has to have an end date. A set of constraints may require that bonds (a subclass of contracts) have specified end dates without requiring that all contracts have specified end dates.

Accepted change (Tthibodeau): An ontology may state that instances of a class have a value for a property. Subclasses may be associated with a constraint that requires that there is a provided value for the property. For example, in the OMG time ontology adopted by FIBO every contract has to have an end date. A shape (set of constraints) may require that bonds (a subclass of contracts) have specified end dates without requiring that all contracts have specified end dates.

OWL constraints (Stardog ICV)

RDFS ontology:

 ex:Bond rdfs:subClassOf ex:Contract .
 ex:valid rdfs:domain ex:Contract .
 ex:valid rdfs:range ex:TimeInterval .
 ex:endTime rdfs:domain ex:TimeInterval .
 ex:endTime rdfs:range xsd:date .

Constraints:

ex:Contract <= =1 ex:valid ex:TimeInterval
ex:Bond <= all ex:valid ( exists ex:endTime xsd:date )

The first constraint says that every instance of ex:Contract has to have a provided value for ex:valid that is an ex:TimeInterval. The second constraint says that every provided value for ex:valid for every instance of ex:Bond has to have a provided value for ex:endTime that is an xsd:date.

(HK: Looks easy to represent in SPIN but I do not understand the syntax above, so I cannot provide an example at this stage)

DTA: The ICV seems to me to address the spirit of this story - that is, for Contract, it is required that there be a TimeInterval, whereas for Bond, there is (also) a requirement that this interval have an end date (for every such interval). This seems to me to address the example in the story. It also supposes that the constraint from the top class (Contract) holds for the subclass (Bond), which is likewise in the spirit of the story.

What does this mean for system behavior? That a bond with an interval that lacks an end date would signal an error, or a form created from this shape would have a required "end date" field for Bond, whereas the form for Contract would have the field, but it would not be required.
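
The second constraint can also be sketched as a SPARQL ASK query in the style of the SPIN examples elsewhere in this document (a sketch; ?this is assumed to be bound to each ex:Bond, and a result of true signals a violation):

 ASK {
   ?this ex:valid ?interval .
   FILTER NOT EXISTS {
     ?interval ex:endTime ?end .
     FILTER (datatype(?end) = xsd:date)
   }
 }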

S10: card >= 0

Created by: Dean Allemang

Mention a property in a card>= 0 restriction, just to indicate an expectation that it will (or might) be there without requiring that it be there

There is a class in FIBO called IncorporatedCompany, which is a subclass of a bunch of restrictions. Many of them are of the form:

fibo-be-oac-cpty:hasControllingInterestParty min 0 fibo-be-oac-cctl:VotingShareholder

i.e., a qualified cardinality of min 0.


What exactly does this mean? (logically, it is meaningless. Right?) I have an email in to some other FIBO ontologists, but here are some things I think it should mean:

1) If we build a form for an IncorporatedCompany, there should be a field in that form for hasControllingInterestParty. The field should be pre-populated (e.g., with a drop-down) with known VotingShareholder s. We won't draw any inferences about the things here (as we would have done, if we had said min=1 or more)

2) (this one is still too vague, I'm asking for some clarification from others) If we receive a payload describing an IncorporatedCompany, and it has values for hasControllingInterestParty, then at least one of them should be known to be a VotingShareholder. I'm not sure what the appropriate behavior is if this doesn't hold. If we wanted a hard error, we should have said min 1.

OWL constraints (Stardog ICV)

Ontology:

ex:name rdfs:domain ex:Person .
ex:Person rdf:type rdfs:Class .

Constraint:

ex:Person <= >=0 name

SPIN

In SPIN with the OSLC ontology, this could look like the following (note that oslc:property is a sub-property of spin:constraint):

   ex:Person
       oslc:property [
           a oslc:Property ; 
           oslc:propertyDefinition ex:name ;
           oslc:occurs oslc:One-or-more
       ]

ShExC

ShExC uses regex chars '?', '*', '+' to indicate cardinality (the '.' means we don't care about the object type):

 my:PersonShape { ex:name . + }

S11: Model-Driven UI constraints

It is useful to build input forms and perform validation of permissible values in user interfaces via a model-driven approach, where the model provides information about the possible values for properties.

The major requirement here is a declarative model of

  • which properties are relevant for a given class/instance
  • what is the value type of those properties
  • what is the valid cardinality (min/maxCount)
  • what is the interval of valid literal values (min/maxValue)
  • any other metadata typically needed to build forms with input widgets

A meta-requirement here is to be able to make use of the information above without having to run something like SPARQL queries, i.e. the model should be sufficiently high level so that all kinds of tools can use that information. However, at the same time there are many advanced constraints that need to be validated (either on server or client) before a form can be submitted. These constraints are not necessarily "structural" information, but rather executable code that returns error messages.
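
A declarative description along these lines can be sketched with the OSLC Resource Shapes terms already used in other stories (a sketch; the property names are illustrative, and OSLC itself does not define min/maxValue facets, so those would need an extension):

   ex:Person
       oslc:property [
           a oslc:Property ;
           oslc:name "name" ;
           oslc:propertyDefinition ex:name ;
           oslc:occurs oslc:Exactly-one ;
           oslc:valueType xsd:string
       ] ,
       [
           a oslc:Property ;
           oslc:name "gender" ;
           oslc:propertyDefinition ex:gender ;
           oslc:occurs oslc:Zero-or-one ;
           oslc:allowedValue ex:female , ex:male , ex:other
       ] .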

A number of solutions and applications have been deployed that use SPIN to check constraints on permissible values in user interfaces. This overcomes the software debt that arises when hand-written JavaScript validation gets out of sync with the underlying models.

Details about an existing implementation in TopBraid: Ontology-Driven Forms.

S12: App Interoperability

Created by: Sandro Hawke

To briefly rephrase the user story: there is one application (e.g. Cimba) which stores application state in RDF. It currently queries and modifies that state using HTTP GET and PUT operations on RDF Sources, but we have another version being developed that uses SPARQL to query and modify the data. The question is: how do we communicate the shape of the data this application reads and writes to other developers who want to make compatible applications? We want to say: as long as your data is of this form, Cimba will read it properly. We also want to say: Cimba may write data of any of these forms, so to be interoperable, your application will need to read and correctly process all of them.

For example, the query Cimba should use to show the 10 most recent posts by people being followed looks something like this:

SELECT ?date ?content ?owner WHERE {
  [USER] space:storage ?storage .
  ?storage ldp:contains ?chanSpace .
  ?chanSpace a mblog:ChannelSpace .
  ?chanSpace ldp:contains ?subList .
  ?subList a mblog:SubscriptionList .
  ?subList ldp:contains ?subscr .
  ?subscr a mblog:Subscription .
  ?subscr mblog:toChannel ?chan .
  ?post a mblog:Post .
  ?chan ldp:contains ?post .
  ?post mblog:content ?content ;
        dct:created ?date ;
        mblog:owner ?owner .
}
ORDER BY DESC(?date)
LIMIT 10

S13: Specification and validation of metadata templates for immunological experiments

Created by: HIPC Consortium

Contributed by: Michel Dumontier

Systems Biology is playing an increasingly important role in unraveling the complexity of human immune responses. A key aspect of this approach involves the analysis and integration of data from a multiplicity of high-throughput immune profiling methods to understand (and eventually predict) the immunological response to infection and vaccination under diverse conditions. To this end, the Human Immunology Project Consortium (HIPC) was established by the National Institute of Allergy and Infectious Diseases (NIAID) of the US National Institutes of Health (NIH). This consortium generates a wide variety of phenotypic and molecular data from well-characterized patient cohorts, including genome-wide expression profiling, high-dimensional flow cytometry and serum cytokine concentrations. The adoption and adherence to data standards is critical to enable data integration across HIPC centers, and facilitate data re-use by the wider scientific community.

In collaboration with ImmPort, we have developed a set of spreadsheet-based templates to capture the metadata associated with experimental results such as Flow Cytometry results and Multiplex Bead Array Assay (MBAA) results. These templates contain metadata elements that are either required or optional; importantly, they restrict the value of each field to specific datatypes (e.g. string, integer, decimal, date), possibly further constrained by length or a regular expression pattern, or limited to specific categorical values or to terminology trees/class expressions of a target ontology, especially those drawn from existing ontologies such as the Cell Ontology (CL) and the Protein Ontology (PO). Once filled out, these spreadsheets are programmatically validated. The values are then stored in a database and are used to power web applications and application programming interfaces.
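
Purely as an illustration of the kind of check involved (a sketch; hipc:cellPopulation is a hypothetical template field, and obo:CL_0000000 is the Cell Ontology root class "cell"), a field restricted to Cell Ontology terms could be validated with an ASK query of the form:

 ASK {
   ?this hipc:cellPopulation ?value .
   FILTER NOT EXISTS { ?value rdfs:subClassOf* obo:CL_0000000 }
 }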

Given the rapid change in the kinds of experiments performed and the evolving requirements concerning relevant metadata, it is crucial that a language to define these metadata constraints enable us to define different sets of metadata fields and values sets in a modular manner. In addition to HIPC, there are other immunology consortia that might involve different requirements as to how data templates should be defined according to specific needs. It should be relatively straightforward to substitute one set of shape expressions for another. It is also important that the shapes themselves are versioned and the results of validation record the version of the shape expression. It should be possible to validate data using any set of developed shapes.

Ideally, the shapes language should be readable by computers in order to automatically generate template forms with restriction to specified values. Moreover, libraries and tools to construct and validate templates and their instance data should be readily available.

S14: Quality assurance for object reconciliation

Contributors: David Martin and Richard Cyganiak

(David Martin's original text, as well as the discussion that led up to my revision, are archived here: User:Rcygania2/S14 Revision —cygri)

In data integration activities, tools such as Silk or Limes may be used to discover entity coreferences. Entity coreferences are pairs of different identifiers, often in different datasets, that refer to the same entity. Detected coreferences are often recorded as owl:sameAs triples. This may be a step in an object reconciliation pipeline.

It would be nice if shapes could flexibly state conditions by which to check that the identity of objects has been correctly recorded; that is, to check conditions under which a same-as link should be present between two identifiers, or conversely, to check conditions for mis-identified same-as links.

For example (movies domain):

  • If source1.movie.title is highly similar (by some widely adopted string similarity function, perhaps plugged in through an extension interface) to source2.film.title and source1.movie.release-date.year is identical to source2.film.initial-release, then an owl:sameAs triple should be present
  • If source1.movie.title is identical to source2.film.title and source1.movie.release-date.year is within two years of source2.film.initial-release, then an owl:sameAs triple should be present
  • If source1.movie.directors has the same set of values as source2.film.directed-by AND source1.movie.title is highly similar to source2.film.title, then an owl:sameAs triple should be present

The intent here is not that the validation process should produce the expected owl:sameAs triples. We assume that some other tool or process has already produced these triples. The purpose of these validation rules is to perform quality assurance, or sanity checks, on the output of these other tools or processes. Thus, the quality or completeness of the generated linkset could be assessed.
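
As an illustration, the second rule could be checked with a SPARQL query along the following lines (a sketch; the m1:/m2: property names are assumptions standing in for the two sources' vocabularies, and the first and third rules would additionally need a string-similarity extension function):

 # Report pairs that satisfy rule 2 but have no recorded owl:sameAs link.
 SELECT ?movie ?film
 WHERE {
   ?movie m1:title ?title ;
          m1:releaseYear ?year1 .
   ?film  m2:title ?title ;
          m2:initialRelease ?year2 .
   FILTER (ABS(?year1 - ?year2) <= 2)
   FILTER NOT EXISTS { ?movie owl:sameAs ?film }
 }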

We note however that object reconciliation tools could be driven by constraints like those given above. So potentially, an object reconciliation tool and a validator could use the same input constraints. Thus, this story straddles the boundaries between constraint checking and inference.

S15: Validation of Dataset Descriptions

Created by: Michel Dumontier

Access to consistent, high-quality metadata is critical to finding, understanding, exchanging, and reusing scientific data. The W3C Health Care and Life Sciences Interest Group (HCLSIG) has developed consensus among participating stakeholders on key metadata elements and their value sets for the description of HCLS datasets. This specification [1], written as a W3C Note, meets key functional requirements, reuses existing vocabularies, and is expressed using the Resource Description Framework (RDF). It provides guidance for minimal data description, versioning, provenance, and statistics. We would like to use RDF Shapes to specify these constraints and validate the correctness of HCLS dataset descriptions.

The specification defines a three-component model for summary-, version-, and distribution-level descriptions. Each component has access to a specific set of metadata elements, and these are specified as MUST, SHOULD, MAY, or MUST NOT, so there are different conformance criteria for each level. Metadata values are either unconstrained rdfs:Literals, constrained rdfs:Literals, URIs following a specified URI pattern, instances of a specified URI-identified type, or a disjunction of URI-specified types.

keywords: context-sensitive constraints, cardinality constraints

[1] http://tinyurl.com/hcls-dataset-description

Cardinalities and ranges are covered by all existing proposals, so I guess the interesting bit here is how to represent that certain constraints only apply in certain contexts ("levels: summary, version, distribution").

SPIN

In SPIN this could be represented in several ways, but the easiest might be to put multiple class definitions into different named graphs, e.g. a different named graph for the summary level than for the version level. It is unclear how the notion of "level" can be sufficiently generalized. We could also introduce a meta-property spin:context that can be attached to any spin:constraint to define pre-conditions that need to be met before the constraint is evaluated. This context could also be a SPARQL expression, e.g. to call a SPIN function that looks at trigger triples in the current query graph.

S16: Constraints and controlled reasoning. We need both! (and macro- or rule mechanisms)

Created by: Simon Steyskal and Axel Polleres

A use case we faced recently, and have discussed in [1], revolved around the integration of distributed configurations (i.e. object-oriented models) with RDFS and SPARQL. By using Semantic Web technologies for this task we:

  1. aimed to provide a convenient way to perform certain tasks on the global view such as:
    1. querying it (and thus all underlying local schemas)
    2. perform constraint checks (i.e. checking integrity constraints)
    3. perform reasoning or consistency checks.
  2. wanted to leverage the use of SWT in configuration management.

[1] https://ai.wu.ac.at/~polleres/publications/sche-etal-2014ConfigWS.pdf

Assuming UNA and CWA

For this particular use-case we had to assume both Unique Name Assumption (UNA) and Closed World Assumption (CWA) for our ontologies, since the models (i.e. configurations) they were derived from were generated by product configurators which impose both UNA and CWA.

Since neither RDFS nor OWL imposes UNA/CWA, we had to come up with some workarounds, which were basically:

UNA 2.0: all URIs are treated as different unless explicitly stated otherwise by owl:sameAs ("UNA 2.0" because, in general, if two URIs are different and the ontology containing them is assumed to obey the UNA, then they cannot be connected via owl:sameAs).

CWA: we assume to know every existing individual of the local configurations and the directly connected individuals from other local configurations; thus the absence of a certain individual in a local configuration means that it does not exist.

SPARQL and UNA

As mentioned earlier, we used SPARQL to perform query tasks on the global schema as well as to check simple integrity constraints by translating e.g. cardinality restrictions into ASK queries.

One major problem that arose from our workaround to impose UNA was that SPARQL is unaware of the special semantics of owl:sameAs. This matters especially when using counting aggregates, where one usually wants to count the number of real-world objects and not the number of URIs referring to them.

As an example we defined two SPARQL queries which should count the number of subnets of a certain system [p.5 Figure 8,1]:

Listing 1: Query without special treatment of sameAs:

SELECT (COUNT(DISTINCT ?subnet) AS ?numberofsubnets)
WHERE {
 ?subnet a ontoSys:Subnet .
}
# result: numberofsubnets = 3 

Listing 2: Query with special sameAs treatment (chooses the lexicographically first element as the representative of the equivalence class):

SELECT (COUNT(DISTINCT ?first) AS ?numberofsubnets)
WHERE {
 ?subnet a ontoSys:Subnet .
 # subquery: choose the lexicographically first element of each sameAs equivalence class
 {
   SELECT ?subnet ?first
   WHERE {
     ?subnet ((owl:sameAs|^owl:sameAs)*) ?first .
     OPTIONAL {
       ?notfirst ((owl:sameAs|^owl:sameAs)*) ?first .
       FILTER (STR(?notfirst) < STR(?first))
     }
     FILTER (!BOUND(?notfirst))
   }
 }
}
# result: numberofsubnets = 1

Obviously Listing 2 is far uglier than Listing 1, especially due to the nasty path expressions that are necessary to traverse potential owl:sameAs chains. Other approaches, such as replacing those chains with pivot identifiers in a pre-processing step, are not feasible since we actually want to keep the different identifiers separate in the data for particular use cases.

Some thoughts...

  1. A macro or rule mechanism that makes certain paths or parts reusable in constraints might help to keep constraint expressions tight and clean.
  2. We have to consider both constraints AND controlled reasoning!
  3. We also note that controlled reasoning, i.e. specifying the particular inference rules that should be considered, may relate to the (postponed) SPARQL feature http://www.w3.org/2009/sparql/wiki/Feature:ParameterizedInference


SPIN

SPIN provides a mechanism to declare new "magic properties" (http://spinrdf.org/spin.html#spin-magic) that can encapsulate complex logic into a more maintainable syntax. One such magic property could represent the owl:sameAs logic you mention above. However, magic properties are not part of the SPARQL standard, so we would need to officially extend SPARQL for this to be workable.

(SS: Those magic properties seem to fit the request of a "macro- or rule mechanism for certain paths" quite nicely. Thanks for the hint!)

S17: Specify subsets of data

Created by: Dean Allemang (for Dave McComb)
Heavily edited by: --Harold Solbrig (talk) 18:37, 17 March 2015 (UTC)

The medical community has an interest in the notion of "archetypes" that are expressed as abstract constraints on a reference model. The reference model describes the largest set of possible instances of a given collection of data, and the archetypes then constrain this set of instances by restricting cardinality, types, value ranges, etc. One way to implement archetype models would be through RDF and SHACL, where the reference model would be viewed as the "constraints" -- the set of rules that are used to validate incoming data and to document data set validity.

The archetypes, however, would serve the additional purpose of defining "instance subsets". The archetypes identify filters/queries that would allow a user to return a set of shapes that meet certain criteria such as abnormal values, co-occurrence, etc. They could also act as filters, funneling incoming instances to secondary processes where necessary.

It should be noted that the primary representation for archetypes in the medical community will probably not be SHACL -- they will be using Archetype Definition Language (ADL) (or the UML equivalent, AML) and/or profiles, with SHACL being a translation.


S18: Scope of Export - (withdrawn)

Created by: David Martin

Starting from a given KB object (individual), I want to export a bunch of related stuff. Use shapes to specify the paths / conditions by which the stuff to be exported can be selected.

(HK: Is this identical to S17? Needs more details).

(DM: Yes, I agree. This can be viewed as a special case of S17 – the case where one or more objects are given, which can be used as starting-points for determining the desired subset of a KB. (And in fact, the examples given in S17 already apply to this special case, except I was thinking of instances whereas those examples refer to classes.) So yes, S17 and S18 should be merged. Also, I grant this is not very constraint-like. But it is a very common use case, in my experience, whose solution would have excellent practical value.)

S19: Query Builder

Created by: Nick Crossley

Various tools are contributing data to a triple store. A Query Builder wants to know the permitted or likely shapes of the data over which the generated queries must run, so that the end user can be presented with a nice interface prompting for likely predicates and values. Since the data is dynamic, this is not necessarily the same as the shape that could be reverse engineered from the existing data. The Query Builder and the data-producing tools are not provided by the same team - the Query Builder team has very limited control over the data being produced. The source of the data might not provide the necessary shape information, so we need a way for the Query Builder team (or a third party) to be able to provide the shape data independently. See also Ontology-Driven Forms and S11.


PFPS: Having an example would be useful.

S20: Creation Shapes

Created by: Nick Crossley

A client creating a new resource by posting to a Linked Data Platform Container [2] wants to know the acceptable properties and their values, including which ones are mandatory and which optional. Note that this creation shape is not necessarily the same as the shape of the resource post-creation - the server may transform some values, add new properties, etc. [2] http://www.w3.org/TR/ldp/#ldpc

See the ongoing discussion at http://lists.w3.org/Archives/Public/public-data-shapes-wg/2014Nov/0160.html with hints at a solution based on named graphs. Other solutions with stand-alone shapes have been proposed, as well as an option to select constraints based on decorations (annotations).

pfps: Having an example here would be useful.

S21: SKOS Constraints

Created by: Holger Knublauch

The well-known SKOS vocabulary defines constraints that are outside of the expressivity of current ontology languages. They can be expressed using SPARQL built-ins, e.g. via SPIN. Examples include

  • make sure that a resource has at most one preferred label for a given language
  • preferred labels and alternative labels must be disjoint

Details: SKOS Constraints
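
The first of these constraints can be sketched as an ASK query in the style used elsewhere in this document (a sketch; ?this is assumed to be bound to each skos:Concept, and a result of true signals a violation):

 ASK {
   ?this skos:prefLabel ?label1 , ?label2 .
   FILTER (?label1 != ?label2 && lang(?label1) = lang(?label2))
 }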

(DCMI doubles down on this one; we have exactly these constraints, specifically on SKOS.)

S22: RDF Data Cube Constraints

Created by: Holger Knublauch

The Data Cube Vocabulary provides a means to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. While the bulk of the vocabulary is defined as an RDF Schema, it also includes integrity constraints:

http://www.w3.org/TR/vocab-data-cube/#wf-rules

Each integrity constraint is expressed as narrative prose and, where possible, a SPARQL ASK query or query template. If the ASK query is applied to an RDF graph then it will return true if that graph contains one or more Data Cube instances which violate the corresponding constraint.

Using SPARQL queries to express the integrity constraints does not imply that integrity checking must be performed this way. Implementations are free to use alternative query formulations or alternative implementation techniques to perform equivalent checks.

A discussion thread can be found here: http://lists.w3.org/Archives/Public/public-rdf-shapes/2014Aug/0055.html


Example: "Every qb:DataStructureDefinition must include at least one declared measure"

   ASK {
       ?dsd a qb:DataStructureDefinition .
       FILTER NOT EXISTS { ?dsd qb:component [qb:componentProperty [a qb:MeasureProperty]] }
   }

In SPIN:

   qb:DataStructureDefinition
       spin:constraint [
           a sp:Ask ;
           sp:text "ASK { FILTER NOT EXISTS { ?this qb:component [qb:componentProperty [a qb:MeasureProperty]] } }"
       ]

S23: schema.org Constraints

Created by: Holger Knublauch

Developers at Google have created a validation tool for the well-known schema.org vocabulary for use in Google Search, Google Now and Gmail. They have found that what may seem like a potentially infinite number of possible constraints can be represented quite succinctly using existing standards like the SPARQL query language and serialized as RDF.

http://www.w3.org/2001/sw/wiki/images/0/00/SimpleApplication-SpecificConstraintsforRDFModels.pdf

Example: Boarding passes will only be shown in Google Now for flights which occur at a future date:

Solution from the Google Paper (JSON-LD), replacing boardingTime with departureTime:

   {
       "@context": {...},
       "@id": "schema:FlightReservation",
       "constraints": [
           {
               "context": "schema:reservationFor",
               "constraint": "ASK WHERE {?s schema:departureTime ?t. FILTER(?t > NOW())}",
               "severity": "warning",
               "message": "A future date is required to show a boarding pass.",
           }
       ]
   }

In SPIN this would look similar (in Turtle, using syntactic sugar supported from TopBraid 4.6 onwards so that ASK can be used instead of CONSTRUCT):

   schema:FlightReservation
       spin:constraint [
           a sp:Ask ;
           sp:text "ASK { ?this schema:reservationFor/schema:departureTime ?t . FILTER (?t <= NOW()) }" ;
           spin:violationPath schema:reservationFor ;
           spin:violationLevel spin:Warning ;
           rdfs:label "A future date is required to show a boarding pass." ;
       ] .

Other example constraints for schema.org:

  • On schema:Person: Children cannot contain cycles, Children must be born after the parent, deathDate must be after birthDate
  • On schema:GeoCoordinates: longitude must be between -180 and 180, latitude between -90 and 90
  • On various: email address must match a certain regular expression
  • On schema:priceCurrency, currenciesAccepted: Currency code must be from a given controlled vocabulary
  • On schema:children, colleagues, follows, knows, parents, relatedTo, siblings, spouse, subEvents, superEvents: Irreflexivity
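
The schema:GeoCoordinates constraint from the list above, sketched in the same SPIN style (a sketch; it assumes numeric literal values for latitude and longitude):

   schema:GeoCoordinates
       spin:constraint [
           a sp:Ask ;
           sp:text """ASK {
               ?this schema:latitude ?lat ;
                     schema:longitude ?long .
               FILTER (?lat < -90 || ?lat > 90 || ?long < -180 || ?long > 180)
           }""" ;
           rdfs:label "Latitude must be within [-90, 90] and longitude within [-180, 180]." ;
       ] .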

S24: Open Content Model

Created by: Arthur Ryman

See Open Content Model Example for a detailed example.

Suppose there is a need to integrate similar information from multiple applications and that the application owners have gotten together and defined an RDF representation for this information. However, since the applications have some differences, the application owners can only agree on those data items that are common to all applications. The defined RDF representation includes the common data items, and allows the presence of other undefined data items in order to accommodate differences among the applications. In this situation, the RDF representation is said to have an open content model. In fact, one of the attractive features of RDF technology is that it readily enables open content models.

For example, the OSLC Change Management (CM) specification specifies a very minimal representation for Change Requests (e.g. bug reports). A large software development organization may use several different change management tools, e.g. Bugzilla, Jira, and ClearQuest, each with their own proprietary resource format. The OSLC CM specification provides a way to process change management information in a uniform way, independently of the tool that hosts it. However, there may also be interesting differences in the type of information hosted by each tool. OSLC therefore specifies an open content model which allows implementations to extend the base representation with additional content. This content is represented as additional RDF properties on the resources. Furthermore, it is very common for change management tools to partition their resources into defined projects which restrict who can access the resources and which define custom attributes on the resources. Here the term custom attribute refers to an attribute that is not defined out-of-the-box in the tool. The tool administrators customize the tool by defining custom attributes, typically on a per-project basis. For example, one project might add a customer reference number while another might add a boolean flag indicating if there is an impact to the online documentation. These custom attributes also appear as additional RDF properties of the resources.

OSLC specifications typically define one or more RDF types. For example, the RDF type for change requests is oslc_cm:ChangeRequest where the prefix oslc_cm is <http://open-services.net/ns/cm#>. The RDF representation of an OSLC change request contains a triple that defines its type as oslc_cm:ChangeRequest, triples that define RDF properties as described in the OSLC CM specification, and additional triples that correspond to tool-specific or project-specific custom attributes. Note that the addition of custom attributes does not require the definition of a new RDF type. Furthermore the RDF properties used to represent custom attributes may come from any RDF vocabulary. In fact, tool administrators are encouraged to reuse existing RDF properties rather than define synonyms.

Since the shape of a resource may depend on the tool that hosts it, or the project that hosts it within a tool, but the RDF type of the resource may not depend on the tool or project, there is in general no way to navigate to the shape given only its RDF type. The OSLC Resource Shapes specification provides two mechanisms for navigating to the appropriate shape. First, the RDF property oslc:resourceShape where oslc: is <http://open-services.net/ns/core#> may be used to link a tool or project description to a shape resource. Second, the RDF property oslc:instanceShape may be used to link a resource to its shape.
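
A minimal sketch of the second mechanism (the resource and shape URIs are illustrative):

   <https://tool.example.org/project1/bug42>
       a oslc_cm:ChangeRequest ;
       oslc:instanceShape <https://tool.example.org/project1/shapes/defect> .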

PFPS: I'm having a hard time determining just what is supposed to come from this story. Taken literally, it appears to be stating a need for a particular RDF property, by name, but that's not something that should be coming from these stories. A more natural need here is the ability to get to different constraints or shapes depending on something other than the type of resources. If so, then this story is subsumed by S4.

Arthur: @PFPS yes, S4 is relevant. However it lacks details about how shapes are associated with REST APIs. I am going to describe how OSLC achieves this. The W3C shape specification must not preclude this, or better still, it should define a way to achieve it. I am creating a detailed example, Open Content Model Example.

kcoyle: Arthur's case sounds like cases that we have in the library/cultural heritage world. We can't rely always on rdf:type because different applications may take very different views of the same data. One application may consider title + author + subject to be conceptually "of class work". Another application may consider title + author + subject + language + musicalKey to be a work. These are a:Work and b:Work. However, adhering to the ontology definitions means not being able to operate over combinations of this data. Therefore, the graphs defined by classes, as intended in RDF, may not be the best entry points for this data. Instead, graphs will need to be derived "opportunistically" in order to allow communities with different views to share data. This may mean violating each other's "data integrity" in order to share; and using profiles rather than RDF/OWL ontologies to "read" the data. [Arthur, let me know if I'm completely missing your point here.]

Arthur: @kcoyle, you get my point. In an environment of diverse stakeholders there is always a tension between conformity, which has the benefit of promoting interoperability, versus differentiation, which enables satisfying local requirements or competing on the basis of value-added features. Some central body defines a standard to the extent required for interoperability, but allows for local customization. However, I disagree with you on the point of violating each other's data integrity. That would defeat interoperability. To be concrete, suppose you need to collect data from multiple sources and query it. The data from each source should still conform to the standard so that queries, aggregation, etc. are possible and meaningful. e.g. If the standard says that the value of a length property must be metres, you better not give it in yards. Concerning rdf:type, I like the Linked Data viewpoint which says that to get information about a thing, you dereference its URI. Therefore it does not seem aligned with Linked Data to :import a definition of a type. The authoritative definition of a type should be obtained from its creator host via dereferencing its type URI. Uses of the type should be consistent with its authoritative definition. Therefore, in order to be widely usable, the definition must not bring in a lot of baggage. The same goes for properties. Applications should be able to compose types and properties into information resources, and the constraints on those resources should be expressible through an orthogonal mechanism such as a shape.

S25: Primary Keys with URI Patterns

Created by: Holger Knublauch

It is very common to have a single property that uniquely identifies instances of a given class. For example, when you import legacy data from a spreadsheet, it should be possible to automatically produce URIs based on a given primary key column. The proposed solution here is to define a standard vocabulary to represent the primary key and a suitable URI pattern. This information can then be used both for constraint checking of existing instances, and to construct new (valid) instances. One requirement here is advanced string processing, including the ability to turn a partial URI and a literal value into a new URI.
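
A sketch of the constraint-checking half, as an ASK query in the style of the other stories (ex:Product instances, the ex:productID key property, and the base URI are illustrative assumptions):

 # Flags instances whose URI does not follow the primary-key URI pattern.
 ASK {
   ?this ex:productID ?key .
   FILTER (STR(?this) != CONCAT("http://example.org/product/", ENCODE_FOR_URI(STR(?key))))
 }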

Details: Primary Keys with URI Pattern

S26: rdf:Lists and ordered data

Created by: Axel Polleres Modified by: Karen Coyle

Can we express validation of rdf:Lists in our framework? This is more than just a stress test: a variation of it can be used to check whether all members of a list have certain characteristics.

Libraries have a number of resources that are issued in ordered series. Any library may own or have access to some parts of the series, either sequential or with broken sequences. The list may be very long, and it is often necessary to display the list of items in order. The order can be nicely numerical, or not. Another ordered list use case is that of authors on academic journal articles. For reasons of attribution (and promotion!), the order of authors in article publishing can be significant. This is not a computable order (e.g. alphabetical by name). There are probably other cases, but essentially there will definitely be a need to have ordered lists for some data. Validation could be: the list must have a beginning and end; there can be/cannot be gaps in the list.

(kcoyle: Aside: I have great trepidation at tackling lists because of the complexity of the use cases in my community. I'd like feedback on this issue.)

Details: rdf:List Stresstest

PFPS: I like this as a stress test, but it's not a story. Perhaps someone can turn it into a story, but otherwise it should be moved elsewhere (maybe the end of this document).

HK: A variation of this is very well a real story. We often have the requirement to formalize that a given rdf:List should only have values of certain types in it. It's a bit like with Java generics, where you can write List<Person> to parameterize a generic List class. This is currently missing from the RDF syntax, but could be represented as an additional constraint on a property that has rdf:List value type.

PFPS: How about someone then put the story information at the beginning of the section?

HK: Ok, done.
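
The variation HK describes above can be sketched with a SPARQL 1.1 property path (ex:authorList and ex:Person are illustrative names; a result of true signals a violation):

 ASK {
   ?this ex:authorList ?list .
   ?list rdf:rest*/rdf:first ?member .
   FILTER NOT EXISTS { ?member a ex:Person }
 }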

S27: Relationships between values of multiple properties

Created by: Holger Knublauch

It is quite a common pattern to have relationships between the values of multiple properties. A typical example is "Start date must be before end date" or "All children must be born after their parents". This information can be used to validate user input on forms and incoming data in web services.
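
The first example can be sketched as an ASK constraint in the style used throughout this document (ex:startDate and ex:endDate are illustrative property names; a result of true signals a violation):

 ASK {
   ?this ex:startDate ?start ;
         ex:endDate ?end .
   FILTER (?start > ?end)
 }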

Story: (kc) Cultural heritage data is created in a distributed way, so when data is gathered together in a single aggregation, quite a bit of checking must be done. One of the key aspects of CH data is the identification of persons and subjects, in particular relating them to historical contexts. For persons, a key context is their own birth and death dates; for events, there is often a date range representing a beginning and end of the event. In addition, there are cultural heritage objects that exist over a span of time (serial publications, for example). In each of these cases, it is desirable to validate the relationship of the values of properties that have temporal or other ordered characteristics.

Details: Relationships between values of different properties

pfps: kc's story appears to fit the bill here.

S28: Self-Describing Linked Data Resources

Created by: Holger Knublauch

In Linked Data, related information is accessed by URI dereferencing. The information that is accessible this way may represent facts about a particular resource, but also typing information for the resource. The types can themselves be used in a similar way to find the ontology describing the resource. It should be possible to use these same mechanisms to find constraints on the information provided about the resource.

For example, the ontology could include constraints or could point to another document that includes constraints. Or the first document accessed might include constraints or point to another document that includes constraints.

(Old version: This is probably the default requirement from a Linked Data perspective: Given a resource URI, tell me all you know about it. The standard procedure is to look up the URI to retrieve the triples for this URI. The next step in RDF/OWL is to look for rdf:type triples and then follow those URIs to look up the class definitions. In OWL, those class definitions often carry owl:Restrictions. In SPIN, those class definitions would carry spin:constraints.)

DCMI story: For some properties there is a requirement that the value IRI resolve to a resource that is a skos:Concept. The resource value is not limited to a particular SKOS concept scheme.

S29: Describing interoperable, hypermedia-driven Web APIs (with Hydra)

Created by: Holger Knublauch

Hydra http://www.hydra-cg.com/ is a lightweight vocabulary to create hypermedia-driven Web APIs. By specifying a number of concepts commonly used in Web APIs it enables the creation of generic API clients. The Hydra core vocabulary can be used to define classes and "supported properties" which carry additional metadata such as whether the property is required and whether it is read-only.

This feels very similar to the OSLC Resource Shapes story and uses similar constructs. It is also possible to express the supported properties as a SPIN constraint check, as implemented here: http://topbraid.org/spin/spinhydra

S30: PROV Constraints

Created by: Holger Knublauch

The PROV Family of Documents http://www.w3.org/TR/prov-overview/ defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. One of these documents is a library of Constraints http://www.w3.org/TR/2013/REC-prov-constraints-20130430/ which defines valid PROV instances. The actual validation process is quite complex and requires a normalization step that can be compared to rules. Various implementations of this validation process exist, including a set of SPARQL INSERT/SELECT queries sequenced by a Python script (https://github.com/pgroth/prov-check/blob/master/provcheck/provconstraints.py), an implementation in Java (https://provenance.ecs.soton.ac.uk/validator/view/validator.html) and in Prolog (https://github.com/jamescheney/prov-constraints). Stardog also defines an "archetype" for PROV, which seems to be implemented in SPARQL using their ICV engine (http://docs.stardog.com/admin/#sd-Archetypes).

PFPS: It would be useful to pull out a few examples from this story to show what expressive power is needed for this story.
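
As one illustration, one of the simpler ordering checks (an activity must not start after it ends) could be sketched as an ASK query in the style of the other stories; note that the full constraint library also involves normalization and inference steps that go beyond a single query:

 ASK {
   ?this a prov:Activity ;
         prov:startedAtTime ?start ;
         prov:endedAtTime ?end .
   FILTER (?start > ?end)
 }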

S31: LDP: POST content to Container of a certain shape

Similar to S29

Created by: Steve Speicher

Some simple LDP server implementations may be based on lightweight app server technology and only deal with JSON(-LD) and Turtle representations for their LDP RDF Sources (LDP-RS), on top of an existing application, say Bugzilla. As a client implementer, I may have a simple JavaScript application that consumes and produces JSON-LD. I want a way to programmatically provide the end-user with a simple form to create new resources, and also a way to potentially pre-fill this form based on data from the current context.

LDP defines some behavior for when a POST to an ldp:Container fails, by outlining expected status codes and additional hints that can be found either in the response body of the HTTP POST request or in a response header (such as a Link relation of "http://www.w3.org/ns/ldp#constrainedBy"). A client can proactively request the headers (instead of trying the POST and having it fail) by performing an HTTP HEAD or OPTIONS request on the container URL and inspecting the link relation for "constrainedBy". Typical constraints are: a) not necessarily based on type; b) sometimes limited to the action of creation, and may not apply to other states of the resource.

The current gap is that whatever is at the end of the "constrainedBy" link could be anything: HTML, OSLC Resource Shapes, SPIN. The LDP WG discussed the need for something a bit more formalized and deferred making any recommendation, looking to apply these requirements to the Data Shapes work. Once that work matures and meets the requirements, LDP could then recommend it.

PFPS: This appears to be similar to S11 and S29. However, this does talk about particular surface forms of the RDF graph, which may go beyond what constraints or shapes are supposed to do.

S32: Non-SPARQL based solution to express constraints between different properties

Created by: Anamitra Bhattacharyya

Consider the case of clients consuming RDF resources and interfacing with an LDP container that need to work in a disconnected mode (the client being a worker's mobile device in a work zone with no connectivity). The client needs to allow workers to create entries locally on the device to mark completion of different stages of the work. These entries will be synched up with the LDP container at a later time, when the device regains connectivity. Prior to that, while the client is in disconnected mode, the client software needs to perform a range of validations on the user's entries to reduce the probability of an invalid entry.

In addition to the basic data type/required/cardinality "stand alone" validations, the client needs to validate constraints between different properties:

  1. start time must be less than end time
  2. if end time is not specified, the status of the "work" should be "In Progress"
  3. if status is "Complete", end time is required.

The client side does not have access to any triple store/LDP container. If these validations can be expressed in a higher-level language that makes it simpler for clients to implement them, constraint systems will be useful in more places.
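A non-SPARQL sketch of constraint 1 above, in the draft SHACL style used elsewhere on this page: sh:lessThan is a hypothetical property-pair facet that compares the values of two properties of the same focus node, which is the kind of construct this story calls for. Constraints 2 and 3 are conditional and would additionally need some if/then mechanism.

 ex:WorkEntryShape
   a sh:Shape ;
   sh:property [
     sh:predicate ex:startTime ;
     sh:datatype xsd:dateTime ;
     sh:maxCount 1 ;
     sh:lessThan ex:endTime ;    # hypothetical: value must be less than the value of ex:endTime
   ] .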

S33: Structural validation for queriability

Created by: Eric Prud'hommeaux

Patient data (all data) is frequently full of structural errors. Statistical queries over malformed data lead to misinterpretation and inaccurate conclusions. Shapes can be used to sequester well-formed data for simpler analysis.

Consider a schema where a medical procedure should have no more than one outcome. Accidental double entry occurs when e.g. a clinician and her assistant both enter outcomes into the database:

 _:Bob :hadIntervention [
     :performedProcedure [ a bridg:PerformedProcedure ;
                           :definedBy [ :coding term:MarrowTransplant ; :location terms:Manubrium ] ];
     :assessmentTest     [ a bridg:PerformedObservation ;
                           :definedBy [ :coding term:TumorMarkerTest ; :evaluator <LabX> ] ;
                           :result    [ :coding term:ImprovedToNormal ; :assessedBy clinic:doctor7 ],
                                      [ :coding term:ImprovedToNormal ; :assessedBy clinic:doctor7 ]
                         ]
 ] .

The obvious SPARQL query on this will improperly weight this as two positive outcomes:

 SELECT ?location ?result (COUNT(*) AS ?count)
 WHERE {
   ?who :hadIntervention [
       :performedProcedure [ :definedBy [ :coding term:MarrowTransplant ; :location ?location ] ];
       :assessmentTest     [ :definedBy [ :coding term:TumorMarkerTest ] ;
                             :result    [ :coding ?result ] ]
                         ]
 } GROUP BY ?result ?location

(This is a slight simplification for the sake of readability. In practice, an auxiliary hierarchy identifies multiple codes as positive outcomes, e.g. term:ImprovedToNormal and term2:ClinicalCure, but the effect is the same as described here.)

Shapes can be used to select the subset of the data which will not introduce erroneous results.

ShExC schema

 my:Well-formedPatient {
     :hadIntervention {
         :performedProcedure { :definedBy { :coding IRI , :location IRI } } ,
         :assessmentTest     { :definedBy { :coding IRI , :evaluator IRI} ,
                               :result    { :coding IRI } }
     }
 }

Resource Shapes

 ex:Well-formedPatient a rs:ResourceShape ;
     rs:property [
         rs:occurs rs:Exactly-one ;
         rs:propertyDefinition :hadIntervention ;
         rs:valueShape [ a rs:ResourceShape ;
             rs:property [
                 rs:occurs rs:Exactly-one ;
                 rs:propertyDefinition :performedProcedure ;
                 rs:valueShape [ a rs:ResourceShape ;
                     rs:property [
                         rs:occurs rs:Exactly-one ;
                         rs:propertyDefinition :definedBy ;
                         rs:valueShape [ a rs:ResourceShape ;
                             rs:property [
                                 rs:propertyDefinition :coding ;
                                 rs:valueType shex:IRI ;
                                 rs:occurs rs:Exactly-one ;
                             ] ;
                             rs:property [
                                 rs:propertyDefinition :location ;
                                 rs:valueType shex:IRI ;
                                 rs:occurs rs:Exactly-one ;
                             ]
                         ] ;
                     ]
                 ] ;
             ] ;
             rs:property [
                 rs:propertyDefinition :assessmentTest ;
                 rs:valueShape [ a rs:ResourceShape ;
                     rs:property [
                         rs:occurs rs:Exactly-one ;
                         rs:propertyDefinition :definedBy ;
                         rs:valueShape [ a rs:ResourceShape ;
                             rs:property [
                                 rs:propertyDefinition :coding ;
                                 rs:valueType shex:IRI ;
                                 rs:occurs rs:Exactly-one ;
                             ] ;
                             rs:property [
                                 rs:propertyDefinition :evaluator ;
                                 rs:valueType shex:IRI ;
                                 rs:occurs rs:Exactly-one ;
                             ]
                         ] ;
                     ] ;
                     rs:property [
                         rs:occurs rs:Exactly-one ;
                         rs:propertyDefinition :result ;
                         rs:valueShape [ a rs:ResourceShape ;
                             rs:property [
                                 rs:occurs rs:Exactly-one ;
                                 rs:propertyDefinition :coding ;
                                 rs:valueType shex:IRI ;
                             ]
                         ] ;
                     ]
                 ] ;
             ]
         ] ;
     ] .

(HK: Slightly reformatted, dropped rs:name which isn't specified by ShExC example either).

SPIN

(HK: a SPIN syntax could look almost exactly like the Resource Shapes example above, because RS can be expressed in a SPIN-compliant way).

S34: Large-scale dataset validation

Created by: Dimitris Kontokostas

A publisher has a very large RDF database (millions or billions of triples) and wants to define multiple shapes for the data that will be checked at regular intervals. To make this process effective, the validation must be able to run within a reasonable time-span, and the validation engine must be flexible enough to provide different levels of detail in the violation results. The levels can range from the specific nodes that violate a shape facet, to the success or failure of a shape facet, to aggregated violations per shape facet, possibly along with an error prevalence.

Applying a shape to a large database can return thousands or millions of violations, and it is not efficient to look at all erroneous RDF nodes one by one. In addition, all violations for a specific facet can often be attributed to a specific mapping or source code function. An expected workflow in this case is that the maintainer runs a validation asking for aggregated violations per shape facet along with a sample of (e.g. 10) specific nodes. Having this higher-level overview along with the sample data, the maintainer can choose the order in which she will address the errors.
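Assuming the validation results are themselves available as RDF (the v: vocabulary below is invented for illustration), the aggregation step of this workflow might be sketched in SPARQL as follows; SAMPLE returns a single example node, whereas an engine following this story would return a configurable number of them.

 PREFIX v: <http://example.org/validation#>
 SELECT ?constraint (COUNT(?node) AS ?violations) (SAMPLE(?node) AS ?exampleNode)
 WHERE {
   ?result v:sourceConstraint ?constraint ;   # the shape facet that was violated
           v:focusNode        ?node .         # the node that violated it
 }
 GROUP BY ?constraint
 ORDER BY DESC(?violations)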

S35: Describe disconnected graphs

Created by: Arthur Ryman

Response to ISSUE-19

This user story is motivated by Linked Data and how information resources are created (e.g. via HTTP POST) or modified (e.g. via HTTP PUT). In these situations, the body of the HTTP request has an RDF content type (RDF/XML, Turtle, JSON-LD, etc.). The server typically needs to verify that the body of the request satisfies some application-specific constraints. If the request does not satisfy the constraints then the server will fail the request and respond with 400 Bad Request or some similar response.

This user story draws attention to the fact that RDF content is in general a graph. The concept of RDF graph is defined in http://www.w3.org/TR/rdf11-concepts/#section-rdf-graph. A general RDF graph may not be connected and in fact disconnected RDF graphs do appear in real-world Linked Data specifications. Therefore, the output of this workgroup must support the description of constraints on general RDF graphs, connected or not.

Response to ISSUE-18

Some of the proposed solutions (Resource Shapes, ShEx, SPIN) appear to have an implicit assumption that the only RDF graphs of interest to this workgroup are like programming language data structures in the sense that there is a distinguished root node which is the subject of triples that define either literal properties or links to other subjects, which may in turn have literal properties or links to further subjects, and so forth. The implication is that all the nodes of interest are connected to the root node. Therefore, these proposals are incapable of describing disconnected graphs. The point of this user story is to provide evidence that disconnected graphs are of interest. It also attempts to make the point that the output of this workgroup should be applicable to general RDF graphs and not just some subset of graphs that follows some popular design pattern.

The example is taken from a specification related to access control. A conformant access control service must host an access control list resource that supports HTTP GET requests. The response to an HTTP GET request must have a response body whose content type is application/ld+json, i.e. JSON-LD. An example is given below. In this example, there is a distinguished root node, i.e. the node of type acc:AccessContextList, but it is not connected to the other nodes of interest, i.e. the nodes of type acc:AccessContext.

An informal specification for valid RDF graphs is as follows: "Let X be the URI of an access control list information resource. Its RDF graph must contain X as a resource node. X must have type acc:AccessContextList. X must have a string-valued dcterms:title property and a string-valued dcterms:description property. In addition, the graph may contain zero or more other resource nodes (URIs) of type acc:AccessContext. Each of these other nodes must have a string-valued dcterms:title property and a string-valued dcterms:description property. The graph may contain other triples."

This user story does not propose that a shape language must be able to distinguish between connected and disconnected graphs.
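For illustration, the informal specification above can be sketched with class-scoped shapes in the style of the SHACL-SPARQL example in S1; because the scope is a class rather than a path from a root node, the disconnected acc:AccessContext nodes are still reached. The facet names are assumptions, not settled syntax.

 [ sh:classScope acc:AccessContextList ;
   sh:shape [
     sh:property [ sh:predicate dcterms:title ;       sh:datatype xsd:string ; sh:minCount 1 ] ;
     sh:property [ sh:predicate dcterms:description ; sh:datatype xsd:string ; sh:minCount 1 ]
   ] ] .

 [ sh:classScope acc:AccessContext ;
   sh:shape [
     sh:property [ sh:predicate dcterms:title ;       sh:datatype xsd:string ; sh:minCount 1 ] ;
     sh:property [ sh:predicate dcterms:description ; sh:datatype xsd:string ; sh:minCount 1 ]
   ] ] .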

Original Text of User Story

In general, the RDF representation of an information resource may be a disconnected graph in the sense that the set of nodes in the graph may be partitioned into two disjoint subsets A and B such that there is no undirected path that starts in A and ends in B. The shape language must be able to describe such graphs. For example, consider the following JSON-LD representation of the Access Context List resource specified in OSLC Indexable Linked Data Provider Specification V2.0:

{
  "@context": {
    "acc": "http://open-services.net/ns/core/acc#",
    "id": "@id",
    "type": "@type",
    "title": "http://purl.org/dc/terms/title",
    "description": "http://purl.org/dc/terms/description"
  },
  "@graph": [{
     "id": "https://a.example.com/acclist",
     "type": "acc:AccessContextList"
    }, {
     "id": "https://a.example.com/acclist#alpha",
     "type": "acc:AccessContext",
     "title": "Alpha",
     "description": "Resources for Alpha project"
    }, {
     "id": "https://a.example.com/acclist#beta",
     "type": "acc:AccessContext",
     "title": "Beta",
     "description": "Resources for Beta project"
  }]
}
 

There is no path from the acc:AccessContextList node to either of the acc:AccessContext nodes. There is an implicit containment relation of acc:AccessContext nodes in the acc:AccessContextList by virtue of these nodes being in the same information resource. However, the designers of this representation were attempting to eliminate clutter and appeal to Javascript developers, so they did not define explicit containment triples.

Question (ericP): Is this an example?

S36: Support use of inverse properties

Created by: Arthur Ryman

In some cases the best RDF representation of a property-value pair may reuse a pre-existing property in which the described resource is the object and the property value is the subject. The reuse of properties is a best practice for enabling data interoperability. The fact that a pre-existing property might have the opposite direction should not be used as a justification for the creation of a new inverse property. In fact, the existence of both inverse and direct properties makes writing efficient queries more difficult since both the inverse and the direct property must be included in the query.

For example, suppose we are describing test cases and want to express the relations between test cases and the requirements that they validate. Further suppose that there is a pre-existing vocabulary for requirements that defines the property ex:isValidatedBy which asserts that the subject is validated by the object. In this case there is no need to define the inverse property ex:validates. Instead the representation of test case resources should use ex:isValidatedBy with the test case as the object and the requirement as the subject.

This situation cannot be described by the current OSLC Shapes specification because that specification has a directional bias. OSLC Shapes describe properties of a given subject node, so inverse properties cannot be used. The OSLC Shape submission proposes a possible solution. See http://www.w3.org/Submission/shapes/#inverse-properties

[HK: Thanks for this story - this is common indeed. You propose a flag "isInverse" on oslc:Property but I think this isn't the best solution as the facets for an inverse property are different from those in the forward direction (e.g. they can only be object properties so all datatype facets don't apply). Instead, I would introduce a new system property :inverseProperty in addition to :property.]

Arthur: @HK, I agree that :inverseProperty is better than adding :isInverse to :property. Since :isInverse would be an extension to the OSLC shape spec, there is a danger that some clients would ignore it and do the wrong thing silently. It is better to have them complain that :property is missing.
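To make the proposal concrete, a sketch using a separate system property (called sh:inverseProperty here, a hypothetical name) rather than an isInverse flag could look like this:

 ex:TestCaseShape
   a sh:Shape ;
   sh:inverseProperty [                  # hypothetical: constrains triples in which the focus node is the object
     sh:predicate ex:isValidatedBy ;
     sh:valueType ex:Requirement ;
     sh:minCount 1 ;
   ] .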

S37: Defining allowed/required values

by Karen Coyle

The cultural heritage community has a large number of lists that control values for particular properties. These are similar to the DCMItypes, but some are quite extensive (>200 types of roles for Agents in relation to resources). There is also a concept of "authorities" which control the identities of people, places, subjects, organizations and even resources themselves. Many of these lists are centralized in major agencies (Library of Congress, Getty Art & Architecture Archive, National Library of Medicine, and national libraries throughout the world). Not all have been defined in RDF or RDF/SKOS, but those that have can be identified by their IRI domain name and pattern. Validation tools need to restrict or check usage according to the rules of the agency creating and sharing the data. Some patterns of needed validation are:

1) must be an IRI (not a literal)

2) must be an IRI matching this pattern (e.g. http://id.loc.gov/authorities/names/)

3) must be an IRI matching one of >1 patterns

4) must be a (any) literal

5) must be one of these literals ("red" "blue" "green")

6) must be a typed literal of this type (e.g. XML dataType)

7) literal must have a language code

Some of these are conditional: for resources of type:A, property:P has allowed values a,b,c,f.
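A few of these patterns, sketched in the draft SHACL style used elsewhere on this page; the facet names (sh:nodeKind, sh:pattern, sh:in) are assumptions about the eventual vocabulary, and the property IRIs are only illustrative.

 ex:AgentShape
   a sh:Shape ;
   sh:property [
     sh:predicate dcterms:creator ;
     sh:nodeKind sh:IRI ;                                       # 1) must be an IRI
     sh:pattern "^http://id\\.loc\\.gov/authorities/names/" ;   # 2) IRI matching this pattern
   ] ;
   sh:property [
     sh:predicate ex:color ;
     sh:in ( "red" "blue" "green" ) ;                           # 5) one of these literals
   ] ;
   sh:property [
     sh:predicate dcterms:date ;
     sh:datatype xsd:date ;                                     # 6) typed literal of this type
   ] .

Patterns 4 and 7 (any literal, literal with a language code) would need node-kind and language facets along the same lines.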

S38: Describing and Validating Linked Data portals

by Jose Emilio Labra Gayo

status: Approved on 5 February 2015 Telecon

A small company specializes in the development of linked data portals. The contents of those portals usually consist of statistical data that comes from Excel sheets and can easily be mapped to RDF Data Cube observations.

The company needs a way to describe the schema of the RDF graphs that need to be generated from the Excel sheets, which will also be published as a SPARQL endpoint. Notice that those linked data portals could contain observations, which will usually be instances of qb:Observation but may carry different properties.

Some constraints could be, for example, that any observation has only one floating point value, or that any observation refers to one geographical area, one year, one indicator and one dataset; that those datasets refer to organizations; and that those organizations have one rdfs:label property in English, another in French, and another in Spanish, etc.
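For illustration, the observation constraints might be sketched in the draft SHACL style used on this page. Prefixes for the Data Cube and SDMX vocabularies are assumed, ex:indicator is a placeholder, and the per-language rdfs:label constraint on organizations would need an additional language facet.

 ex:ObservationShape
   a sh:Shape ;
   sh:property [ sh:predicate sdmx-measure:obsValue    ; sh:datatype xsd:float ; sh:minCount 1 ; sh:maxCount 1 ] ;
   sh:property [ sh:predicate sdmx-dimension:refArea   ; sh:minCount 1 ; sh:maxCount 1 ] ;
   sh:property [ sh:predicate sdmx-dimension:refPeriod ; sh:minCount 1 ; sh:maxCount 1 ] ;
   sh:property [ sh:predicate ex:indicator             ; sh:minCount 1 ; sh:maxCount 1 ] ;
   sh:property [ sh:predicate qb:dataSet               ; sh:minCount 1 ; sh:maxCount 1 ] .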

In this context, the company is looking for a solution that can be easily understood by the team of developers, who are familiar with OO programming languages, relational databases, XML technologies and some basic RDF, but not with other semantic web technologies like SPARQL, OWL, etc.

The company also wants some solution that can be published and understood by external semantic web developers so they can easily know how to query the SPARQL endpoint.

There is also a need for the solution to be machine-processable, so that the contents of the linked data portal can be validated automatically.

Finally, the company would like to compare the schemas employed by the different linked data portals so they can check the differences between the RDF nodes that appear in those portals and even create new applications on top of the data aggregated by those portals.

The company would also like to encourage third-party companies to reuse the data available in those data portals, so there could be third-party applications on top of them which could, for example, visualize or compare the different observations, create faceted browsers, search engines, etc. To that end, those third-party companies need some way to query the schemas available in those portals and build those applications from those schemas.

S39: Arbitrary Cardinality

by Eric Prud'hommeaux

status: Approved on 26 March 2015 Telecon

Some clinical data requires specific cardinality constraints (see the sketch after this list), e.g.

  • zero or one (optional) birth date.
  • zero or more lab tests.
  • one active patient marker.
  • one or more emergency contacts.
  • two biological parents.
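Using only the sh:minCount/sh:maxCount facets that already appear in the draft examples on this page, these read roughly as follows (the property IRIs are illustrative):

 ex:PatientShape
   a sh:Shape ;
   sh:property [ sh:predicate ex:birthDate        ; sh:maxCount 1 ] ;                  # zero or one
   sh:property [ sh:predicate ex:labTest          ] ;                                  # zero or more
   sh:property [ sh:predicate ex:activePatient    ; sh:minCount 1 ; sh:maxCount 1 ] ;  # exactly one
   sh:property [ sh:predicate ex:emergencyContact ; sh:minCount 1 ] ;                  # one or more
   sh:property [ sh:predicate ex:biologicalParent ; sh:minCount 2 ; sh:maxCount 2 ] .  # exactly two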

S40: Describing Inline Content versus References

status: Approved on 30 April 2015 Telecon

Created by: Karen Coyle

This example is from the library world. Many of the object values that we use are URIs that point to nationally or internationally shared lists (of person names, of subjects, of document types....). The URIs are carried in the instance data, but the data referenced by the URI (the graph with that URI as the subject) is either online, or stored locally in a separate file (with a GET capability). In some cases the URI must be de-referenced to perform validation; in other cases, de-referencing isn't needed or is considered too costly for a low-value property. If nothing else, we need to be able to indicate which URIs need de-referencing.

Previous Version

Created by: Arthur Ryman

As a shape author I want to be able to describe some properties as being remote references to other documents as opposed to describing content of the current document so that users and tools may use this information without incurring an obligation for a validator to actually validate the remote documents.

A web document may link to other web documents. For example, a bug report may link to failing test cases, e.g. using the property ex:relatedTestCase. Suppose that there is a shape for bug reports and a shape for test cases. The shape for bug reports should state that documents referred to by ex:relatedTestCase links are described by the test case shape since this information is useful for query builders and other tools. However, when validating that a bug report satisfies the constraints stated in the bug report shape, the validator should not be required to GET the test cases since that could be costly, require authentication, etc.

Suppose an RDF graph contains a triple (S, P, O) where O is the URI of a resource. Sometimes O is itself the subject of other triples, i.e. these other triples are inline in the graph, and sometimes it is not, i.e. O is only a reference to another graph.

For example, consider an RDF graph that describes a table of data. The graph contains nodes that correspond to the table, its rows, and its cells, i.e. the contents of the table are inlined in the graph.

In contrast, suppose an RDF graph describes a failed test case and it contains a triple that links the failed test case to a bug report, but no further information about the bug report is contained in the graph. The URI of the bug report is a reference to another graph.

In the case of a reference, it is up to the application to associate a graph with a URI. In Linked Data this association is done by sending an HTTP GET request for the URI. In a SPARQL RDF dataset, the association may be done by using the URI as the name of a named graph in the dataset. The specific mechanism used to associate a referenced URI with an RDF graph should be outside the scope of this working group.

(FYI, The OSLC Resource Shape submission describes this situation using the property oslc:representation which has the allowed values oslc:Inline, oslc:Reference, and oslc:Either.)
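For example, a Resource Shape fragment using oslc:representation might read as follows; the shape and property IRIs are illustrative, and the facet names follow the OSLC submission.

 ex:BugReportShape a oslc:ResourceShape ;
     oslc:property [
         oslc:propertyDefinition ex:relatedTestCase ;
         oslc:valueShape ex:TestCaseShape ;
         oslc:representation oslc:Reference   # described in another document; not validated inline
     ] .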

  • Comment: I read this as saying that client and server won't be able to agree on a method of establishing the graph where the description of a referenced resource resides when oslc:Reference is used. To me this means that oslc:Reference is mostly useless for validation, and seems only useful as human-readable documentation. Is that correct? —Richard
  • (Arthur) @Richard, Not totally useless. oslc:Reference says that there SHOULD NOT be additional triples about the referenced resource in the graph. It would therefore be a warning or informational message if the graph contained additional triples. A validator could detect this. For example, suppose a graph described a collection of things (e.g. a Linked Data Platform Container) and each member of the collection was given by its URI, but had no other properties in the graph. Or suppose the graph described a bug report and had a set of links to related bugs, but did not contain any other properties of the related bugs. In terms of expressive power, SHACL could achieve this by giving the sh:valueShape of each linked resource, and not including any properties in that shape.
    • Objection: If the spec says there SHOULD NOT be any additional triples, then the presence of additional triples is most definitely not an error. That is the difference between MUST NOT and SHOULD NOT. Also, what exactly is an “additional triple”? If a SHACL processor is supposed to detect and act on such triples, then this notion requires a proper definition (which, I presume, would be something akin to closed shapes). I would have no objection if the only conformance requirement for a referenced shape is that SHACL processors MAY check the reference using some unspecified mechanism. —Richard
    • (Arthur) @Richard you are correct. The processor should report a lower severity condition, e.g. a warning or an informational message. I am a big fan of robustness. I have updated the text.
  • Comment: (pfps) If there are no constraint aspects of this story then it needs to be connected to one of the other purposes of SHACL.
  • (Arthur) @pfps There are constraint aspects. For Inline content the referenced node may have a Shape associated with it and that Shape must be validated on the content present in the graph. For Referenced content, the content must not be in the graph.
  • Comment: (pfps) I'm still confused as to what is supposed to be happening. Is it something like: if there is a special tag then there should not be (other) information on a node?
  • (Arthur) @pfps Yes. Suppose G is a graph, N is a node in G, S is a shape, and that we are checking if (G,N) satisfies S. Suppose that S describes a property P as being "inline" and as pointing to a node that has shape S'. Suppose that G contains the triple (N,P,N'). Then (G,N') must satisfy S', i.e. the triples required by S' must also be in G. However, if S describes P as being a "reference", then it is not required that (G,N') satisfy S'. In this case S asserts that if you get the graph G' that is associated with N' (e.g. by HTTP GET or the local filesystem or as a named graph in a dataset) then (G',N') satisfies S'. There is no requirement that a processor actually get G' and validate it, although it might choose to do so. The ability to annotate P as "reference" is useful when describing a Linked Data API since it tells API users what the application expects but does not require the application to actually validate the referenced resource since that might be expensive. If an API user lies about the referenced resource then this might lead to a failure later, but the application should be designed robustly because the Web is not guaranteed to be globally consistent.
  • Comment (pfps): The original description had nodes be inline or reference. Now this appears to have switched to properties being inline or reference. Which is it?
  • Comment (pfps): So inline has no effect, and the sole effect of reference is to turn off checking?
  • Comment (pfps): Where did the requirement that a reference resource have no local information go?
  • Comment: (kc) I'll toss in an example from the library world. Many of the object values that we use are URIs that point to nationally or internationally shared lists (of person names, of subjects, of document types....). The URIs are carried in the instance data, but the data referenced by the URI (the graph with that URI as the subject) is either online, or stored locally in a separate file (with a GET capability). In some cases the URI must be de-referenced to perform validation; in other cases, de-referencing isn't needed or is considered too costly for a low-value property. If nothing else, we need to be able to indicate which URIs need de-referencing. I think this fits with Arthur's story. If not... nevermind.
  • (Arthur) @kc Yes, this is exactly what I'm talking about.
  • (Arthur) @pfps, Inline has an effect on validation. It states that the object of a link is expected to have triples associated with it in the current graph, i.e. those triples are inlined in the graph. If there is a shape associated with the object, then the associated triples must satisfy the shape. Reference also has an effect on validation. Reference states that the triples associated with the object are not expected to be in the current graph. Therefore, any shape associated with the object should not be validated on the current graph. If there are triples associated with the graph, then they are "unknown" content.
  • Comment (pfps): But that's just saying that inline does the normal thing, i.e., it has no effect, and reference does that weird thing. So, inline has no effect.
  • Comment (pfps): Karen has proposed a story that somewhat matches up with the original non-story. Is Karen's input going to be the story? I'm fine with that.
  • Comment (Arthur): I've moved Karen's example into the main text as the story and have made my example read more like a user story.

S41: Validating schema.org instances against model and metamodel

Created by: Richard Cyganiak

Status: Approved on 26 March 2015 Telecon

This user story focuses on the validation of schema.org instances against the constraints expressed in the schema.org model and metamodel. (The related user story, “S23: schema.org Constraints”, as well as Google's submission to the workshop, focus on domain-specific constraints attached to specific schema.org classes and properties, and not on the model and metamodel.)

A processor for our validation language should be able to accept a schema.org instance as well as the schema.org model, expressed in an RDF syntax, as inputs (perhaps as separate named graphs), and validate the instance against the model.

  • domainIncludes/rangeIncludes: In schema.org, properties can be associated with multiple types via the “domainIncludes” and “rangeIncludes” properties. The semantics is that the domain/range consist of the union of these types (rather than the intersection, as with the “domain” and “range” properties in RDFS). Validation requires that the subject and object of a triple can be compared against a set of types given in the model graph, and a validation error would be raised if the subject/object is not an instance of one of these types, or of one of their subtypes. (A SPARQL sketch of this check appears after this list.)
  • Datatypes and plain literals: In schema.org, properties may be associated with datatypes, but literals in instance data are always plain (string) literals. In other words, a property may be typed as a date property, but the date would be given as a plain literal, not as a xsd:date typed literal. Examples of named datatypes in schema.org include: ISO 8601 dates and datetimes; xsd:time; boolean “True” and “False”; integers. For validation, the language should be able to make use of annotations on the properties. For example, if we have { :thing schema:date "value" }, it should be possible to write a validation rule that depends on a “rangeIncludes” annotation on the schema:date property. As each named datatype is used many times throughout the model, it would also be good if the regular expression (or similar mechanism) for the datatype wouldn't have to be repeated for each property that uses the datatype, but could be referred to by reference, or by rule.
  • Conformance levels: Processing of schema.org by the major search engines tends to be quite permissive. For example, often, where an “Organization” instance is expected according to the model, a “Text” literal with the organization's name is sufficient. This could be treated as a warning/notice. Also, some literal properties contain markup recommendations such as for the “price” property: Putting “USD” into a separate currency property is preferred over sticking “$” into the numeric price literal. Again, values like “$99” could be treated as warnings.
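The domainIncludes check from the first bullet could be sketched directly in SPARQL as follows; the graph names are placeholders, and the use of rdfs:subClassOf for schema.org subtyping is an assumption about how the model graph is expressed.

 PREFIX schema: <http://schema.org/>
 PREFIX rdfs:   <http://www.w3.org/2000/01/rdf-schema#>
 # Report subjects that use a property outside every declared domainIncludes type.
 SELECT ?s ?p
 WHERE {
   GRAPH <urn:example:instance> { ?s ?p ?o }
   GRAPH <urn:example:model>    { ?p schema:domainIncludes ?anyDomain }
   FILTER NOT EXISTS {
     GRAPH <urn:example:model> {
       ?p schema:domainIncludes ?domain .
       ?type rdfs:subClassOf* ?domain .
     }
     GRAPH <urn:example:instance> { ?s a ?type }
   }
 }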

S42: Constraining RDF graphs for better mapping to JSON

Created by: Richard Cyganiak

Status: Approved on 2 April 2015 Telecon

In client-side application development and in integrating between RDF-based systems and JSON-based APIs, the problem of mapping between the RDF data model and the JSON data model recurs.

In the unconstrained RDF data model, there are too many variations to map arbitrary RDF graphs cleanly to JSON. By selecting an RDF vocabulary that covers the desired JSON structure, and using Shapes to express constraints over the vocabulary, the mapping could be made sensible and predictable. In other words, Shapes could be used to constrain RDF graphs in a way that gives them a well-defined isomorphic mapping to some JSON model. As a side effect, we also get better UI for these constrained RDF graphs.

This gives rise to a number of requirements (a sketch follows the list):

  • MaxCardinality 1: Properties with a maximum cardinality of 1 can be mapped easily to keys in JSON objects.
  • Support for RDF lists: Ordered lists are a standard feature of JSON. A clean mapping requires the ability to declare that the value of a property must be an rdf:List, and the ability to place constraints on the members of the list (e.g., be of a certain class or have a certain shape). This has UI advantages too. Knowing that an RDF property is an ordered multi-valued property calls for specific UI widgets. (Partially already covered in S26)
  • Maximum one string literal per language: For i18n-capable applications, “one string per language” is an important kind of value. In RDF, this shows up simply as a multi-valued string property. An example is skos:prefLabel. In JSON, the natural representation of this is an object with language codes as keys and the string literals as values. This pattern has special support in JSON-LD for example (“language maps”). Again, if we can declare this constraint on a property, we can use better UI widgets and better API access.
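A sketch of these three requirements in the draft SHACL style of this page; sh:memberShape and sh:uniqueLang are hypothetical facets, named here only to make the requirements concrete.

 ex:ArticleShape
   a sh:Shape ;
   sh:property [                        # maps to a single-valued JSON key
     sh:predicate ex:headline ;
     sh:maxCount 1 ;
   ] ;
   sh:property [                        # maps to a JSON array
     sh:predicate ex:authors ;
     sh:valueType rdf:List ;
     sh:memberShape ex:PersonShape ;    # hypothetical: every list member must match ex:PersonShape
     sh:maxCount 1 ;
   ] ;
   sh:property [                        # maps to a JSON-LD language map
     sh:predicate skos:prefLabel ;
     sh:uniqueLang true ;               # hypothetical: at most one string literal per language tag
   ] .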

Arthur Ryman: Richard, I agree that JSON is important. However, there are already a couple of representations of RDF as JSON, including the W3C recommendation JSON-LD. This requirement sounds like yet a third representation, with the difference that it only applies to a subset of RDF. Can you precisely define this subset of RDF? Is this really the way to go, or would it be better to use an explicit RDF-JSON mapping language along the lines of R2RML? FYI, at IBM I worked on an XML-RDF mapping language inspired by R2RML.

RC: Arthur, this story is not about defining a representation of RDF in JSON, and it is not about JSON-LD. If you have an existing JSON model, and want to have a mapping between that model and RDF (using whatever mapping approach), then certain constraints are going to apply to the resulting model on the RDF side, because the natural constraints of the JSON data model will carry through that mapping process. This story is about being able to express these constraints on the RDF side. This would be orthogonal to something like “JSON-R2RML”; the story is that I want to be able to represent the constraints on the RDF output of a “JSON-R2RML” mapping (plus I want a two-way mapping, while R2RML-style mappings are only one way, to RDF).

AR: Richard, so conceptually the shape would be generated from the mapping. <tangent>You are of course the R2RML authority, but although R2RML looks one-way, don't you feel it is actually sort of two-way since it supports conversion of SPARQL into SQL?</tangent>

RC: <tangent>If you can transform data one way, then you can always transform queries the other way, assuming query languages of appropriate expressivity are available on both ends. But R2RML isn't designed to allow transforming data both ways. You can't do SPARQL Update over R2RML.</tangent>

S43: Using Property Paths for Property Value Comparison

Created by: Simon Steyskal

Status: TBD

One of the big challenges in distributed environments is to keep several local schemas that model the same domain (probably with different view-points) in sync with each other. To keep schemas in sync, there usually exists a weaving model/schema that links concepts of its LHS to concepts of its RHS, i.e. the weaving model takes care of linking overlapping concepts together. If one of the local schemas now makes changes that affect other schemas too, the weaving model has to detect the resulting inconsistencies.

For example, consider the following ontologies o1 and o2 with their weaving model wo, which links individuals of type o1:Cable to ones of type o2:Connection, and their respective individuals:

The three shape blocks below correspond, in order, to the LHS (o1), the weaving model (wo) and the RHS (o2); their respective instance data follows the shapes.
o1:CableShape
 a sh:Shape ;
 sh:property [
   sh:predicate o1:bandwidth;
   sh:datatype xsd:int;
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ] .  
wo:Cable2ConnectionShape
 a sh:Shape ;
 sh:property [
   sh:predicate wo:LHS;
   sh:valueType o1:Cable;
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ];
 sh:property [
   sh:predicate wo:RHS;
   sh:valueType o2:Connection;
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ] .  
o2:ConnectionShape
 a sh:Shape ;
 sh:property [
   sh:predicate o2:speed;
   sh:datatype xsd:int;
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ] .  
o1:Cable1 o1:bandwidth 42 .
o1:Cable2 o1:bandwidth 10 .

wo:Cable2Conn1 
   wo:LHS o1:Cable1 ;
   wo:RHS o2:Conn1 .
wo:Cable2Conn2 
   wo:LHS o1:Cable2 ;
   wo:RHS o2:Conn2 .

o2:Conn1 o2:speed 42 .
o2:Conn2 o2:speed 11 .

The constraints imposed on wo:Cable2ConnectionShape currently only take care of the value types and cardinality of LHS/RHS. But in some situations we might also want to state that certain property values must be eq/leq/geq/neq/... compared to others of the same focus node (e.g. in the previous example: o1:bandwidth and o2:speed of cables/connections linked via individuals of type wo:Cable2ConnectionShape must have the same value). This would require the possibility to refer to those values instead of using "fixed" ones.

Maybe like this:

wo:Cable2ConnectionShape
 a sh:Shape ;
 sh:property [
   sh:predicate wo:LHS;
   sh:valueShape [
     sh:property [
       sh:predicate o1:bandwidth ;
       sh:hasValue this/wo:RHS/o2:speed ;
     ]
   ];
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ];
 sh:property [
   sh:predicate wo:RHS;
   sh:valueShape [
     sh:property [
       sh:predicate o2:speed;
       sh:hasValue this/wo:LHS/o1:bandwidth ;
     ]
   ];
   sh:minCount 1 ;
   sh:maxCount 1 ;
 ] .  

where e.g. this/wo:LHS/o1:bandwidth refers to the value of o1:bandwidth of the individuals linked via wo:LHS of the focus node (I included this to allow self-referencing for S44). This functionality would make it possible to detect the inconsistency of individual wo:Cable2Conn2, where a bandwidth of 10 is linked to a speed of 11.

Note: I'm aware of the fact that this use case can be realized using e.g. SHACL Templates. Just wanted to start a discussion on whether this should be included into SHACL core.

Handling zero-or-more Results

I guess we can take the pragmatic approach of saying that:

  • if all results of the property path fulfill the constraint, the constraint holds
  • if not all results of the property path fulfill the constraint, the constraint does not hold
  • if no results of the property path fulfill the constraint, the constraint does not hold
  • if no results were obtainable, the constraint does not hold

S44: Including sh:notEqual as additional Datatype Property Constraint

Created by: Simon Steyskal

Status: TBD

This use case, i.e. the idea of including sh:notEqual as additional datatype property constraint, was primarily motivated by the comparison of functionalities of both OCL and SHACL.

Sometimes it might be useful/easier to state what values are not allowed for a certain property rather than defining the opposite.

E.g. a person cannot be a parent of itself:

ex:PersonShape
 a sh:Shape ;
 sh:property [
   sh:predicate ex:parent;
   sh:notEqual this;
 ] .  

or children of a person cannot be their parents:

ex:PersonShape
 a sh:Shape ;
 sh:property [
   sh:predicate ex:children;
   sh:notEqual this/ex:parent;
 ] .  

Note: See S43 for a description of property paths for value comparison.

S45: Linked Data Update via HTTP GET and PUT

Created by: Arthur Ryman

Status: Approved on 4 June 2015 Telecon

As a client of a Linked Data application, I need to know the constraints on the data so I can update resources. The data is in an RDF format. I retrieve the data via HTTP GET, edit it, validate it, then modify the resource via HTTP PUT. I need to know how to validate the data before I send the HTTP PUT request.

For example, information about the constraints that the application enforces could be provided by linking the data to the shape via a triple in the data. If the data IRI is X and the shape IRI is Y then a link such as (X sh:hasShape Y) would work. Y could be a resource hosted anywhere on the web.

  • Comment (KC): This could interact with the DCMI concept of application profiles [1], which are sets of constraints that can be exchanged between data providers and consumers. One DCMI community member states that "I'd like to go one step further, though, and say that the server can simply restrict to a specific profile, but also that clients and servers can negotiate which profile to use."
  • Reply (AR): @KC your comment sounds like 1) you understand this user story, and 2) you believe it may also be useful for DCMI application profiles. I believe the situation you are describing is not exactly the same though since you suggest that the client and server could negotiate which application profile to use. If we assume that an application profile is like a shape, then the server would have to advertise a set of available shapes. The client would have to indicate the preferred shape in the GET request by some means, e.g. an HTTP header or query parameter. The server would respond with the preferred RDF content that included a link to the actual shape.

Comment (pfps): I voted for this on the assumption that user stories are stories that are somehow related to SHACL, not that SHACL has to cover all aspects of the user story. If this assumption is not correct, I would need more information about this story to vote for it.

S46: Software regression testing with SHACL

Created by: Dimitris Kontokostas

Status: Approved on 4 June 2015 Telecon

As an RDF software & data developer I need to define constraints for the data I generate with my software. What I am interested in is seeing which constraints succeed or fail, and storing the results in a database. When a previously successful test fails, that is generally an indication of a software regression.

In these cases I am not interested in storing detailed violation instances, as most of the time I work with sample or mock data that are subject to change and are not directly comparable. What can instead be persisted are the actual constraints (shapes or shape facets), and I need a standardized way to store the status of each constraint as true/false, or with additional metadata (e.g. error count or prevalence), for a specific validation run.

S47: Clinical data constraints

Created by: Eric Prud'hommeaux

Status: Approved on 4 June 2015 Telecon

Clinical information systems reuse general predicates for observations and relationships between observations. For example, a blood pressure is an observation with two constituent observations: systolic and diastolic. Likewise, an APGAR observation is a constellation of nine observations. Definition of these data elements requires repeated constraints on the same predicate, analogous to OWL qualified cardinality constraints.

<X> a :Observation;
      Observation:interpretation [
         CodeableConcept:coding [ Coding:system <http://hl7.org/fhir/v2/0078>; Coding:code "L"^^fhir:code; ];
         CodeableConcept:text "low";
      ];
      Observation:identifier <blahblahblahX>;
      Observation:subject [ ResourceReference:reference "Patient/example"; ];
      Observation:related [
         Observation:related_type "has-component"^^fhir:code;
         Observation:related_target [
            ResourceReference:reference "http://acme.org/ehr/observations/34252345234-s";
         ]
      ];
      Observation:related [
         Observation:related_type "has-component"^^fhir:code;
         Observation:related_target [
            ResourceReference:reference "http://acme.org/ehr/observations/34252345234-d";
         ]
      ]
.

<http://acme.org/ehr/observations/34252345234-s>  a :Observation;
     # CT code for systolic
     # value
.

<http://acme.org/ehr/observations/34252345234-d>  a :Observation;
     # CT code for diastolic
     # value
.
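Repeating the same predicate with different value shapes, in the draft SHACL style of this page, makes the qualified-cardinality requirement explicit; whether and how a repeated sh:predicate is interpreted this way is exactly what this story asks the language to settle (the shapes named below are illustrative).

 ex:BloodPressureShape
   a sh:Shape ;
   sh:property [
     sh:predicate Observation:related ;
     sh:valueShape ex:SystolicComponentShape ;    # exactly one systolic component
     sh:minCount 1 ; sh:maxCount 1 ;
   ] ;
   sh:property [
     sh:predicate Observation:related ;
     sh:valueShape ex:DiastolicComponentShape ;   # exactly one diastolic component
     sh:minCount 1 ; sh:maxCount 1 ;
   ] .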

Another common case involves restrictions on common data elements involving components, e.g. a prescription with a requested substance administration and optional diagnostic indicators.

Example in O-RIM:

eg:BobzPrescription
  rim:Act.ClassCode "ACT" ;
  rim:Act.moodCode "RQO" ;
  rim:Act.outboundRelationship
    [ rim:ActRelationship.type "COMP" ;  # 1..1 requested substance administrations
      rim:ActRelationship.target
        [ rim:Act.classCode "SBADM" ;
          rim:Act.moodCode "RQO" ;
          # material identifier by code and codeSystem.
          # dosage instructions.  E.g. 10mg, 3 times per day
        ]
    ],
    [ rim:ActRelationship.type "RSON" ;  # 0..* diagnostic indicators
      rim:ActRelationship.target
        [ rim:Act.classCode "OBS" ;
          rim:Act.moodCode "EVN" ;
          # E.g. code/system for diagnosis of diabetes
        ]
    ]
.

This is an elaborate ruse to introduce a requirement for multi-occurrence.

S48: Capturing precise business practices

Created by: Eric Prud'hommeaux

Status: Proposed

While many shared ontologies are general enough to meet many use cases, realistic descriptions of workflows are precise enough to identify exact choices in the input. This calls for operators like OneOf to state that, for a given input, a program will respond to exactly one choice in a disjunction. For example, a search engine scraping RDFa descriptions of event information will display store hours either from specialized restaurant hours or from more general event hours, but not both.

Document formatting examples