RDF Validation Workshop: Practical Assurances for Quality RDF Data

Important Dates

7 July 2013:
Deadline for position papers

10 July 2013:
Acceptance notification sent

14 July 2013:
Program released

2 September 2013, noon ET:
Registration deadline (for lunches and name tags)

Host

W3C/MIT

Workshop Sponsors

Become a sponsor

While the Semantic Web has demonstrated considerable value for collaborative contributions to data, adoption in many mission-critical environments requires data to conform to specified patterns. This need for interface definitions spans domains. For instance, validation in a banking context shares many requirements with quality assurance of linked clinical data. Systems like Linked Open Data, which don't have formal interface specifications, share these validation needs. Development of standards and tools to meet these requirements can greatly increase the utility and ubiquity of Semantic Web data.

Most data representation languages used in conventional settings offer some sort of input validation, ranging from parsing grammars for domain-specific languages to XML Schema, RelaxNG or Schematron for XML structures. While the distributed nature of RDF affects the notion of "validity", tool chains still need to be established to publish interface definitions and ensure data integrity.

This workshop combines discussion of use cases for data validation and interface definition on the Semantic Web with development of technologies to enable those use cases.

Background

The open world requirements for Semantic Web vocabularies, particularly RDF Schema and OWL, lead to powerful data assertions which work with varying levels of inference and integration. These constraints have so far limited the expressivity of "schema" languages, making them useful mostly for inference rather than validation. For instance, there is no standard way to assert that, for the purposes of populating a corporate directory, every foaf:Person must include a foaf:mbox, or that, for a health record, a person's height must be less than 193 centimeters.
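To make the two constraints above concrete, here is a minimal sketch of what a closed-world check might look like. It is only an illustration under stated assumptions: triples are modelled as plain Python tuples rather than parsed with an RDF library, prefixed names are treated as ordinary strings, and the property ex:heightCm and the function names are hypothetical, not part of FOAF or any standard.

```python
# Triples are plain (subject, predicate, object) tuples; prefixed names are
# ordinary strings here, not resolved IRIs. ex:heightCm is a hypothetical
# property invented for this illustration.
RDF_TYPE = "rdf:type"


def subjects_of_type(triples, cls):
    """Return the set of subjects explicitly typed as cls."""
    return {s for s, p, o in triples if p == RDF_TYPE and o == cls}


def values(triples, subject, predicate):
    """Return all objects stated for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def validate_directory(triples):
    """Closed-world check: list each foaf:Person that has no foaf:mbox."""
    return sorted(
        person
        for person in subjects_of_type(triples, "foaf:Person")
        if not values(triples, person, "foaf:mbox")
    )


def validate_heights(triples, max_cm=193):
    """List each foaf:Person with a height value that is not below max_cm."""
    return sorted(
        person
        for person in subjects_of_type(triples, "foaf:Person")
        for height in values(triples, person, "ex:heightCm")
        if not height < max_cm
    )


data = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "ex:heightCm", 170),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "ex:heightCm", 201),
]

missing_mbox = validate_directory(data)
too_tall = validate_heights(data)
print(missing_mbox)  # ['ex:bob']
print(too_tall)      # ['ex:bob']
```

Note that the check treats a missing foaf:mbox as an error, which is a closed-world reading of the data. Under the open world assumption of RDF Schema and OWL, the same absence would simply mean the mailbox is unknown.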

Necessity has driven individuals, organizations and standards bodies to develop a variety of approaches, ranging from SPARQL ASK queries to detailed description logic modeling in OWL. While experts in these tools may be able to easily develop such tool chains, the community as a whole, including potential Semantic Web users, will benefit greatly from validation standards and commodity tools which implement them.

While the Linked Open Data cloud keeps expanding, the inconsistent quality of its data limits its use. Indeed, organizations cannot directly use the LOD cloud. Instead, they must shield themselves from the inconsistencies it contains by using a local replica which they have curated using custom-made processes. The curation process is expensive, must be repeated every time the local replica is refreshed, and does not benefit anyone else.

A standard way of validating RDF data would allow for a more streamlined, and therefore cheaper, curation process and make it possible to share validation rules for others to use. These shared rules could then be used to develop a corpus of validation schemas that could collectively be used to improve the LOD cloud.

Goals

The goal of this workshop is threefold:

Identify requirements, including trade-offs between expressivity and complexity.
Categorize applicable existing tools and standards, including those that aren't explicitly intended for validation.
Propose and ideally demonstrate tools which extend the state of the art, testing capabilities against identified requirements.

An accompanying Examples of RDF Validation document is intended to provide ideas and inspiration for the Workshop, but is not intended to constrain the Workshop's scope.

The outcome of this workshop will be reported to the current related working groups (Linked Data Platform and RDF-WG) and may be used as input for chartering other work.

Workshop topics

Validation standards must address conventional requirements, as well as those brought by users who have so far not been able to adopt Semantic Web tools:

Structural Validation: presence or absence of particular predicates and graph structures.
Value Validation: testing values to see that they fit in a valid range.
and more: which the workshop will reveal.
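
As a rough illustration of how structural and value validation differ, the sketch below drives both from one declarative table of constraints. Everything in it is an assumption made for the example: the rule keys (min_count, max_count, max_exclusive), the property ex:heightCm, and the representation of triples as plain Python tuples are hypothetical and do not correspond to any existing schema language.

```python
# Hypothetical constraint table: predicate -> rule. The rule keys and the
# ex: names are invented for this sketch and come from no standard.
CONSTRAINTS = {
    "foaf:mbox": {"min_count": 1},
    "ex:heightCm": {"max_count": 1, "max_exclusive": 193},
}


def check_node(triples, node, constraints):
    """Return a list of human-readable violations for one subject node."""
    violations = []
    for predicate, rule in constraints.items():
        objs = [o for s, p, o in triples if s == node and p == predicate]

        # Structural validation: how many times the predicate occurs.
        min_count = rule.get("min_count", 0)
        if len(objs) < min_count:
            violations.append(
                f"{predicate}: expected at least {min_count}, found {len(objs)}"
            )
        max_count = rule.get("max_count")
        if max_count is not None and len(objs) > max_count:
            violations.append(
                f"{predicate}: expected at most {max_count}, found {len(objs)}"
            )

        # Value validation: each value must lie strictly below the bound.
        bound = rule.get("max_exclusive")
        if bound is not None:
            for value in objs:
                if not value < bound:
                    violations.append(f"{predicate}: {value} is not below {bound}")
    return violations


graph = [
    ("ex:carol", "foaf:mbox", "mailto:carol@example.org"),
    ("ex:carol", "ex:heightCm", 165),
    ("ex:dave", "ex:heightCm", 250),
]

carol_report = check_node(graph, "ex:carol", CONSTRAINTS)
dave_report = check_node(graph, "ex:dave", CONSTRAINTS)
print(carol_report)  # []
print(dave_report)
```

Keeping the rules in data rather than in code is one possible design choice: a table like this could in principle be published and shared alongside the data it describes, which is the kind of schema distribution listed among the topics below.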

Topics for position papers may include, but are not limited to:

usage scenarios, e.g. data deployment or input validation.
schema expressivity, e.g. SPARQL ASK compared to a grammar language like XML Schema or RelaxNG.
schema distribution.
distributed validation in collaborative environments.
performance.
management of schema evolution.

Expected participants

To ensure productive discussions, the Workshop will include sessions which are primarily technical, but grounded in business needs.

We invite representatives to submit papers that help us bring together knowledge in the following topic areas:

Use cases — areas where RDF is currently deployed or would be deployed with the advent of widely-available validation tools.
Existing solutions — uses of conventional technology and tools to address validation.
New approaches — novel ways to specify validation constraints or new tools to enforce them.