RDF validation and transformation using Shape Expressions

ESWC Tutorial proposal

Status of this document

This is a private document. Please do not publicize.

Abstract:

200 words maximum, for inclusion on the ESWC 2015 Web site.

Industrial, scientific and clinical uptake of RDF has been hampered by the lack of a high-level validation language analogous to XML Schema or JSON Schema. Shape Expressions (ShEx) provides an intuitive, high-level representation of an RDF graph structure. Based on patterns from regular expressions and the RELAX NG Compact Syntax, the Shape Expressions Compact syntax (ShExC) is like a grammar for RDF data. Uses include:

This tutorial introduces the concepts behind RDF graph description and validation and the landscape of tools being used for validation. We will then cover use of existing ShEx implementations for:

Like the popular SPARQL by Example tutorial, this tutorial includes step-by-step instructions with examples followed by exercises. Participants can download validation tools to use locally or use web-based interfaces like RDFShape or the W3C ShEx Workbench.

Tutorial Description:

objectives of the tutorial and relevance to ESWC 2015; information about the scope and level of detail of the material to be covered; intended audiences; learning objectives; practical sessions.

Introduction:

RDF is growing in popularity for both data transfer (LDP) and data storage/recall (SPARQL). In both of these capacities, it is important to describe and verify conformance with a particular graph structure. While the Semantic Web is an environment where anybody can say anything about any topic, we still need to make sure that clinical, genetic, manufacturing, etc. databases capture data in a predictable way.

When we record or exchange data, programs or human operators are expected to synthesize and interpret that data. In some cases, the data will not be stored in a generic triple/quad store but in a conventional relational database. To avoid silently losing data, it is then not enough for the data to include a specified structure; all of the data must be described by that structure.
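The difference between data that merely includes a structure and data that is entirely described by it can be sketched in a few lines of Python. This is an illustration only, not ShEx and not any existing tool's API: the triples are plain tuples, and the predicate names, the REQUIRED set, and the helper functions are all hypothetical.

```python
# Hypothetical data: triples as (subject, predicate, object) tuples.
# REQUIRED stands in for the predicates an assumed structure demands.
REQUIRED = {"foaf:name", "foaf:mbox"}

def check_node(triples, node, required=REQUIRED, closed=False):
    """Return (missing, extra) predicate sets for `node`.

    With closed=False only required predicates are checked (open reading).
    With closed=True any predicate outside `required` is reported as extra.
    """
    present = {p for s, p, _ in triples if s == node}
    missing = required - present
    extra = present - required if closed else set()
    return missing, extra

def conforms(triples, node, required=REQUIRED, closed=False):
    """True when nothing is missing and, if closed, nothing is extra."""
    missing, extra = check_node(triples, node, required, closed)
    return not missing and not extra

triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:alice", "ex:shoeSize", "38"),  # not covered by the structure
]
```

Under the open reading ex:alice conforms, because both required predicates are present. Under the closed reading she does not, because ex:shoeSize would have no place in a relational table built from the structure and would be lost on storage.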

Non-RDF data storage systems offer and rely on schemas both to provide quality assurance to users and to enable efficient storage and static query analysis for optimization. SQL's Data Definition Language completely constrains what may appear in an SQL database (with minor exceptions, such as databases that don't ensure homogeneity within a column). Use of W3C XML Schema and RELAX NG with XML typically involves validation on data creation and ingestion. Even JSON Schema is growing in popularity as that developer community recognizes the need for basic structural description.

RDF, and graph stores in general, don't demand draconian restrictions like SQL's, but operate more like XML, where the basic language allows any structural construct but specific applications impose further practical demands. To meet these needs for RDF, industry, academia, and standards bodies have devoted resources to developing an interchangeable, publishable standard for structural schemas.

OSLC created Resource Shapes; TopQuadrant has constraints built into the SPARQL Inferencing Notation (SPIN); Clark & Parsia offers a closed-world interpretation of OWL called OWL ICV in their Stardog product.

W3C held a workshop on RDF Validation and convened a W3C RDF Data Shapes Working Group to unify these efforts and create a practical language for defining and enforcing RDF structural constraints.

Goals

Participants will understand which use cases RDF validation does and does not meet. They will see how RDF validation works in Resource Shapes, OWL/ICV and SPIN. Hands-on experience will leave them comfortable using existing ShEx tools to solve practical needs in communicating schemas and verifying instance data conformance.

For use cases not met by conventional RDF validation, we will show how the Shape Expressions extension mechanism enables application-level (business logic) validation, such as asserting that one date must be later than another or that a literal may be no longer than N characters. Participants will see how these same extension mechanisms enable a simple, declarative way to map RDF graphs to other formats including JSON and XML (implemented in RDFShape and the W3C ShEx Workbench) and even to RDF graphs in other ontologies. Time permitting, we will dive into the details of writing these extensions and show how a trivial one for normalizing graphs can be implemented in approximately 20 lines of code.
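To give a flavour of the two application-level checks named above, here is a minimal Python sketch. It does not use the Shape Expressions extension mechanism or any ShEx implementation's API; the functions, the ex: predicates, and the sample triples are hypothetical, and dates are assumed to be ISO 8601 strings.

```python
from datetime import date

def check_dates_ordered(triples, node, earlier_pred, later_pred):
    """True if every `earlier_pred` date on `node` precedes every `later_pred` date."""
    def dates(pred):
        return [date.fromisoformat(o)
                for s, p, o in triples if s == node and p == pred]
    return all(a < b for a in dates(earlier_pred) for b in dates(later_pred))

def check_max_length(triples, node, pred, n):
    """True if every `pred` literal on `node` is at most `n` characters long."""
    return all(len(o) <= n for s, p, o in triples if s == node and p == pred)

# Hypothetical instance data as (subject, predicate, object) tuples.
triples = [
    ("ex:issue1", "ex:reportedOn", "2015-01-09"),
    ("ex:issue1", "ex:reproducedOn", "2015-01-10"),
    ("ex:issue1", "rdfs:label", "Crash on load"),
]

ordered = check_dates_ordered(triples, "ex:issue1",
                              "ex:reportedOn", "ex:reproducedOn")
short_enough = check_max_length(triples, "ex:issue1", "rdfs:label", 20)
```

Checks of this kind compare values across several triples or inspect a literal's content, which is why they are treated here as business logic layered on top of structural validation.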

Audience

The audience should be comfortable either with using git and a JVM or a JavaScript VM like Node.js, or with just their web browser. A rudimentary knowledge of RDF and Turtle is expected. Like SPARQL by Example, this tutorial is intended to introduce the audience to a language that is new to them.

Logistics

Tutorial length:

Half or full day, including preferences.

Half day, perhaps following some other tutorial which introduces some data which can be used in example exercises.

Previous versions or related tutorials:

other venues of this or similar tutorials (potentially by a different team of tutors), motivation for offering the tutorial again, at the ESWC2015.

This tutorial was given once to about 25 people at SWAT4LS (promo, slides). Several people used ShEx at the hackathon two days later.

A description of the ShEx language was presented at the Semantics Conference, where it received the Best Paper Award. At the 1st International Workshop on Linked Data Quality, we presented an application of Shape Expressions to describe and validate linked data portals. Finally, the complexity of the language will be presented at the International Conference on Database Theory.

Tutoring team:

short bios of the presenters, including previous training and public speaking experience (e.g., teaching in English, conference presentations, tutorials etc.)

Jose Labra Gayo
Jose Emilio Labra is an Associate Professor at the University of Oviedo, Spain. He is the main researcher of the WESO research group. He is a member of the RDF Data Shapes Working Group and chairs the Best Practices for Multilingual Linked Open Data Community Group. He implemented a Shape Expressions library in Scala called ShExcala (http://labra.github.io/ShExcala/) and maintains an online RDF validator service called RDFShape (http://rdfshape.weso.es).
To reviewers: Jose has been qualified to teach in English by the Vice-Rector Office for Research and International Excellence Campus of the University of Oviedo, and he teaches a Software Architecture course in English at the School of Computer Science Engineering of that university. He has also given numerous presentations in English at several conferences. Further details of his curriculum vitae are available on his web page.
Eric Prud'hommeaux
"ericP" has been the W3C staff contact for the Health Care and Life Sciences Interest Group, RDF Data Shapes (RDF Validation), LDP, RDF 1.1, SPARQL 1.1, RDB2RDF, SPARQL 1.0, SAWSDL and XML Protocol Working Groups. He has developed and designed multiple languages, including a significant contribution to SPARQL and ShExC (Shape Expressions Compact syntax). He developed the Fancy ShEx Demo to promote understanding about and exploitation of ShExC.
To reviewers: Eric is a native English speaker and has given tens of presentations and tutorials, including many SPARQL tutorials. He is relatively adept at engaging audiences in their area of interest or expertise and is borderline tyrannical about making observers participate in hands-on exercises.

CVS Log

    $Log: Proposal.html,v $
    Revision 1.5  2015/01/10 09:20:12  eric
    ~ synched with pdf version

    Revision 1.4  2015-01-09 12:18:34  eric
    + to reviewers section of bio

    Revision 1.3  2015-01-09 12:18:20  eric
    + to reviewers section of bio

    Revision 1.2  2015-01-09 11:31:59  eric
    + to reviewers section of bio

    Revision 1.1  2015-01-09 09:40:36  eric
    CREATED