
RDF Validation Workshop Shared Whiteboard

This is a lightly-htmlized copy of the shared whiteboard used during the workshop. This document is for posterity; there is no expectation that it will be maintained beyond having anchors added for easy reference.

Table of Contents:

Use Cases
Requirements

Description versus Validation

Services:
Data guidance for REST APIs  
Arthur: Generate REST API Documentation - see tables in OSLC specs for expected level of detail
Orient users as to what form of data would be considered "valid".

Use case: Validation of a given RDF Dataset (Evren Sirin)
Alice has an RDF database with several named graphs and triples. She wants to make sure that the triples in this RDF dataset are valid with respect to a set of integrity constraints. She also wants to make sure that any changes (insert/delete) to the database will not cause the RDF dataset to become invalid.
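A minimal sketch of what Alice's check might look like, modeling the dataset as named graphs of (subject, predicate, object) triples and an integrity constraint as a Python predicate. All names and the dict-of-sets representation here are illustrative assumptions, not part of any standard or proposal:

```python
# Hypothetical sketch: an RDF dataset as a dict of named graphs,
# each graph a set of (subject, predicate, object) triples.
dataset = {
    "urn:graph:people": {
        ("ex:alice", "rdf:type", "ex:Person"),
        ("ex:alice", "ex:name", '"Alice"'),
        ("ex:bob", "rdf:type", "ex:Person"),
        # ex:bob is missing the required ex:name property
    },
}

def constraint_every_person_has_name(graph):
    """Integrity constraint: every ex:Person must have an ex:name."""
    persons = {s for (s, p, o) in graph if p == "rdf:type" and o == "ex:Person"}
    named = {s for (s, p, o) in graph if p == "ex:name"}
    return persons - named  # set of violating subjects

def validate(dataset, constraints):
    """Run each constraint over each named graph; collect violations."""
    report = []
    for graph_name, graph in dataset.items():
        for check in constraints:
            for subject in sorted(check(graph)):
                report.append((graph_name, check.__name__, subject))
    return report

violations = validate(dataset, [constraint_every_person_has_name])
# ex:bob is flagged; a pre-commit hook could reject the offending insert/delete
```

The same shape of check could equally be phrased as a SPARQL query over the dataset; the point is only that the constraint is declared once and re-run on every change.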

   * Help debug generators/exporters of RDF data.
   * Auto-fill to prompt users for valid input.
   * Associate patterns with semantic actions (à la Yacc) to execute code based on input data.

Data Profile:
Save data consumers from malformed input.
Arthur: Provide metadata for query builders (available properties and values)
Arthur: Generate valid test data - I want to generate large volumes of realistic data so I can stress test and performance test an application
Inform users of attributes they can search/match on (esp. for REST interfaces or contextual search forms for web interfaces) [Mark Harrison]
 - What is in my dataset/SPARQL endpoint?
 - What can others (not) add to my dataset?
 Capture the consensus of a community about the content of data
 

Use Case: Easy Data Generation by non-experts (PhilA, channeling Paul Davidson)
Paul is working on a project that requires data about the different departments and services in his local authority. In order to gather the data, he builds an online tool for his colleagues to use. It presents users with a series of input forms, each of which includes some pre-populated lists of possible answers to some questions. The output conforms to a particular data model, the definition of which is encoded in a discrete file which can be shared.

Use case: Data Portal (PhilA) (aka Linked Data Platform SteveS)
Vassilios is responsible for a portal through which a diverse collection of documents is made available. Documents may be hosted in the portal or elsewhere but in order for the documents to be discoverable, the metadata about each one must include certain fields, the values of some of which must be from a defined SKOS Concept Scheme. Vassilios needs to be able to confirm that metadata about each document collection conforms to the data model before including it in the portal.
 * Variant of "Data Portal": Vassilios wants to map data from a diversity of sources to a common target that captures a meaningful subset of information common to the sources.
 * description of what one GETs
 * description of what one PUTs

    Helping non-experts participate in application development
Generic clients for REST APIs

Use Case: Versioning (PhilA channeling Dave Reynolds)
Dave creates data according to a defined profile that includes reference to an external SKOS Concept Scheme, hosted by a third party. At a later date, his data is apparently deemed invalid simply because the external SKOS Concept Scheme has been updated. It's important therefore to be able to provide version information about the description.

Use Case: Model Driven Architecture
Mayo Clinic has published a terminology access and update specification.  The abstract representation is UML, and a semi-automated transformation has been used to create an XML Schema and WADL implementation.  The user community wants to be able to consume and update the data using RDF syntax. Mayo needs to publish a specification that describes  the "shape" of the RDF that the users can expect and/or what is considered acceptable input.

Use Case: RDF Content Specification
The Ontology Meta Vocabulary specification describes a set of ontologies and a structure that describes how these ontologies should be used to provide a metadata set about an ontology.  The OMV currently uses UML models and text to describe what constitutes a "valid" description.  How this information is transformed into RDF and how validation test cases are generated are left as an exercise to the reader.  A formal, machine processable language is needed.

Use Case: Library catalogs (Tom)
The library community has strong ideas about the content of catalog data.  For example, Functional Requirements for Bibliographic Records (FRBR) conceptualizes a book at varying levels of generalization ranging from an abstract Work to a specific Item on the shelf.  Each level is described with specific properties. FRBR levels have been defined as disjoint OWL classes, with properties tied to these classes as domains and ranges and with exact cardinality constraints on relationships between classes, but this strong semantic commitment compromises the usefulness of the properties and classes in an LOD context.  The requirements for quality control not actually met with such a descriptive ontology could be met with unconstrained properties and non-disjoint classes used with validatable constraints.

Use Case: Combine selected constraints from multiple existing ontologies to instantiate a new validation scenario for my new triple store / application.- Tim Cole

LDP User Stories : https://dvcs.w3.org/hg/ldpwg/raw-file/default/ldp-ucr.html#user-stories SteveS
LDP Use Cases : https://dvcs.w3.org/hg/ldpwg/raw-file/default/ldp-ucr.html#use-cases  SteveS


Use Case: Statistical data (Jose Labra)
Statistical data. Publish and validate statistical data. Describe that a dataset complies with some vocabularies like RDF Data Cube or Computex. Validate the shape of the data and some computations. Ensure that the data follows a given pattern and sign the data. This is important for some organizations, like government data publishers. Track the computations expressed in RDF to their sources. Example: a value is the mean of other values.
More ambitious goal, validate streams of RDF data that are continuously published by government agencies. 

Use Case: Government Agency? (Jose Labra)
Needs to publish RDF data and ensure that it is valid (adding signatures and so on). [Note: This use case may be repeated or included in other use cases like Data Portal]

Use Case: Test Before Submission (PhilA)
Nikos has created some data with the intention of submitting it for inclusion in the Bubbles platform. Before submitting, he wants to ensure that his data conforms to the data description published by the Bubbles operators.

Requirements

prioritized:

  1. declarative definition of the structure of a graph for validation and description
  2. extensible to address specialized use cases
  3. there will be a mechanism to associate descriptions with data

categories of requirements:

WHAT - Types of constraints
1. Property paths: what properties are required?
19. Ability to express mandatory vs optional vs conditional attributes/predicates [Mark H]
2. Cardinality constraints
3. Disjointness
4. Are values within enumerated lists? (code values) [Mark Harrison]
34. options / groups .. (what does this mean? - like in xml schema i guess:))  Isn't this same as #4?
Open vs closed world considerations?
14 Literal range constraints, including membership in a controlled vocabulary [see also 4]
15. URI pattern constraints (check for typos)
15bis. - check that HTTP URIs can be dereferenced, without HTTP 404 errors [Mark H]
16. Datatype computations: Arithmetic (sums, averages, etc.), date/time comparisons / calcs, etc.)
17. Validate a dataset against one (or more) profiles (related with [7])
  Example of profile: RDF Data Cube
20. Constraints should work with reasoning (a la entailment regimes in SPARQL 1.1)
23. Validate for uniqueness (of what?) analogous to XML identity constraints, another kind of cardinality, ...?-Tim Cole
TRANS - 25 Restrict addition of new S/P and/or O and various combinations (Bounds document)
TRANS - 28. read-only / system-only property - app can't change O, ie dc:created
32. Capable to constrain property values according to provenance
35. Handle RDF Collections. Max/min size, type of elements, etc.
36. Ability to handle closed (only this) and open (at least this) constraints
37. Handle Named graphs
38. Express negative constraints. Example. This type of resource must not have some property...
39. Validate the literal objects based on datatype
40. Validate the object (or subject) based on rdf:type of resource (like range)
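Several of the constraint types above (enumerated code values in 4 and 14, cardinality in 2, required vs optional in 19) can be illustrated as declarative rules over triples. The sketch below is only an illustration of the kind of checks meant, with invented example data and rule names, not a proposed syntax:

```python
# Illustrative declarative constraints over (s, p, o) triples; not a proposed syntax.
graph = {
    ("ex:obs1", "ex:status", "ex:Approved"),
    ("ex:obs1", "ex:value", "42"),
    ("ex:obs2", "ex:status", "ex:Bogus"),  # not in the enumerated list (req. 4)
    ("ex:obs2", "ex:value", "7"),
    ("ex:obs2", "ex:value", "8"),          # violates max cardinality 1 (req. 2)
}

constraints = [
    # (predicate, rule name, test over the list of values for one subject)
    ("ex:status", "enumerated", lambda vs: all(v in {"ex:Approved", "ex:Rejected"} for v in vs)),
    ("ex:value",  "max-one",    lambda vs: len(vs) <= 1),
]

def check(graph, constraints):
    """Apply each (predicate, name, rule) to every subject; report failures."""
    subjects = {s for (s, _, _) in graph}
    errors = []
    for s in sorted(subjects):
        for pred, name, rule in constraints:
            values = [o for (s2, p, o) in graph if s2 == s and p == pred]
            if not rule(values):
                errors.append((s, pred, name))
    return errors

errors = check(graph, constraints)
```

A real language would need the open vs closed distinction (req. 36) and the other constraint types above; this only shows why a declarative, per-predicate formulation keeps simple cases simple (req. 11).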

HOW - Style
6. Transformable to and from UML (w/ possible loss)
8. Transformable to and from XML Schema (w/ possible loss)
9. Representable as RDF
5. human readability
5 1/4. And concise for common kinds of constraints
5½. machine readability (for publishing interface)
7. RDF validation against domain-specific models in some formats [related with No. 4-5 ]
11. Keep simple cases simple - declarative preferred
12. extensible (to do what?)
    17a.  Declare that a profile extends another profile (related with [12])
13. Transformation from object constraint language (or other constraint languages) to SPARQL [related with No. 8 and 24]
21. Modular
24 Leverage on existing technologies whenever possible (XML Schema, SPARQL, OWL)
26 Ability to be used to drive user interface form presentation [Mark H]
29. re-use schemas/ontologies in different documents/graphs with different validation requirements (implies divorcing the identifier for the validation schema from the class schema).
30. Be able to be conditional on rdf:type  
33. by-ref or by-val properties
45. compatible with SPARQL
46. Ability to combine constraint profiles by reference

Misc:
18. computable - efficiently
22. Validation Results should be RDF with: message, level, ref to rule and faulty data
           22a. Leverage on EARL vocabulary?
           22b. Identify graphs and validation rules (maybe use PROV and graph identifiers (see http://tw.rpi.edu/web/doc/mccusker2012ipaw )) 
           22c. Indication of severity level (errors vs warnings) [Mark H]
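Requirement 22 (validation results as RDF, with message, severity, a reference to the rule, and the faulty data) can be sketched as follows. The ex:report vocabulary used here is invented purely for illustration; 22a suggests reusing EARL instead:

```python
def result_triples(report_id, rule_uri, focus_node, message, severity):
    """Render one validation result as (s, p, o) triples.
    The ex:report vocabulary below is invented for illustration;
    requirement 22a suggests reusing the EARL vocabulary instead."""
    return [
        (report_id, "rdf:type", "ex:ValidationResult"),
        (report_id, "ex:sourceRule", rule_uri),          # ref to rule
        (report_id, "ex:focusNode", focus_node),         # faulty data
        (report_id, "ex:message", f'"{message}"'),       # human-readable message
        (report_id, "ex:severity", severity),            # ex:Error vs ex:Warning (22c)
    ]

triples = result_triples(
    "ex:result1", "ex:rule-max-one-name", "ex:bob",
    "ex:Person must have exactly one ex:name", "ex:Error",
)
```

Because the report itself is RDF, it can carry PROV links back to the validated graph and the rule that fired (22b).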
