This is a private document. Please do not publicize.
200 words maximum, for inclusion on the ESWC 2015 Web site.
Industrial, scientific and clinical uptake of RDF has been hampered by the lack of a high-level validation language analogous to XML Schema or JSON Schema. Shape Expressions (ShEx) provides an intuitive, high-level representation of an RDF graph structure. Based on patterns from regular expressions and RELAX NG Compact Syntax, the ShEx compact syntax (ShExC) is like a grammar for RDF data. Uses include:
This tutorial introduces the concepts behind RDF graph description and validation and the landscape of tools being used for validation. We will then cover use of existing ShEx implementations for:
Like the popular SPARQL by Example tutorial, this tutorial includes step-by-step instructions with examples followed by exercises. Participants can download validation tools to use locally or use web-based interfaces like RDFShape or W3C ShEx Workbench.
objectives of the tutorial and relevance to ESWC 2015; information about the scope and level of detail of the material to be covered; intended audiences; learning objectives; practical sessions.
RDF is growing in popularity for both data transfer (LDP) and data storage/recall (SPARQL). In both of these capacities, it is important to describe and verify conformance with a particular graph structure. While the Semantic Web is an environment where anybody can say anything about any topic, we still need to make sure that clinical, genetic, manufacturing, etc. databases capture data in a predictable way.
When we record or exchange data, programs or human operators are expected to synthesize and interpret it. In some cases, the data will not be stored in a generic triple/quad store but in a conventional relational database. To avoid silently losing data in that case, it is not enough that the data include a specified structure; all of the data must be described by that structure.
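The two requirements above (the data includes the specified structure, and nothing falls outside it) can be illustrated with a minimal Python sketch. It is not ShEx syntax and not how any ShEx implementation works; the `ex:` predicates, the `ISSUE_SHAPE` table and the `check_closed` function are hypothetical names chosen only for this illustration.

```python
# Hypothetical, simplified "shape": predicate -> (min, max) cardinality.
# None as the maximum means unbounded. Illustration only, not ShEx.
ISSUE_SHAPE = {
    "ex:reportedBy": (1, 1),
    "ex:status": (1, 1),
    "ex:relatedTo": (0, None),
}


def check_closed(triples, node, shape):
    """Return a list of problems found for `node` against `shape`.

    Two kinds of problem are reported:
      1. a predicate required by the shape has the wrong cardinality;
      2. the node carries a predicate the shape does not describe,
         which is the data that would be silently lost on storage.
    """
    # Count how often each predicate occurs on the focus node.
    counts = {}
    for s, p, _o in triples:
        if s == node:
            counts[p] = counts.get(p, 0) + 1

    problems = []
    # Requirement 1: the data includes the specified structure.
    for p, (lo, hi) in shape.items():
        n = counts.get(p, 0)
        if n < lo or (hi is not None and n > hi):
            upper = "*" if hi is None else hi
            problems.append(f"{p}: expected {lo}..{upper}, found {n}")
    # Requirement 2: all of the data is described by the structure.
    for p in counts:
        if p not in shape:
            problems.append(f"{p}: not described by the shape")
    return problems


# Sample data: the third triple uses a predicate the shape does not cover.
SAMPLE = [
    ("ex:issue1", "ex:reportedBy", "ex:alice"),
    ("ex:issue1", "ex:status", "ex:open"),
    ("ex:issue1", "ex:priority", "high"),
]

if __name__ == "__main__":
    for problem in check_closed(SAMPLE, "ex:issue1", ISSUE_SHAPE):
        print(problem)
```

A schema language expresses the same two checks declaratively, so that they can be published and exchanged rather than buried in application code.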
Non-RDF data storage systems offer and rely on schemas both to offer quality assurance to users and to enable efficient storage and static query analysis for optimization. SQL's Data Definition Language completely constrains what may appear in an SQL database (with minor exceptions, such as databases that don't enforce homogeneity within a column). Use of W3C XML Schema and RELAX NG with XML typically involves validation on data creation and ingestion. Even JSON Schema is growing in popularity as that developer community recognizes the need for basic structural description.
RDF, and graph stores in general, don't demand draconian restrictions like those of SQL, but operate more like XML, where the basic language allows any structural construct and specific applications impose further practical demands. To meet these needs for RDF, industry, academia, and standards bodies have devoted resources to the development of an interchangeable and publishable standard for structural schemas.
OSLC created Resource Shapes; TopQuadrant has constraints built into the SPARQL Inferencing Notation (SPIN); Clark & Parsia offers a closed-world interpretation of OWL called OWL ICV in their Stardog product.
W3C held a workshop on RDF Validation and convened the W3C RDF Data Shapes Working Group to unify these efforts and create a practical language for defining and enforcing RDF structural constraints.
Participants will understand which use cases are met and not met by RDF validation. They will see how RDF validation works in Resource Shapes, OWL ICV and SPIN. Hands-on experience will leave them comfortable using existing ShEx tools to solve practical needs in communicating schemas and verifying instance data conformance.
For use cases not met by conventional RDF validation, we will show how the Shape Expressions extension mechanism enables application-level (business logic) validation, such as asserting that one date must be later than another or that a literal may be no longer than N characters. Participants will see how these same extension mechanisms enable a simple, declarative way to map RDF graphs to other formats including JSON and XML (implemented in RDFShape and W3C ShEx Workbench) and even to RDF graphs in other ontologies. Time permitting, we will dive into the details of writing these extensions and show how a trivial one for normalizing graphs can be implemented in approximately 20 lines of code.
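To make the two business-logic examples concrete, here is a minimal Python sketch of the checks themselves: one date must be later than another, and a literal may be no longer than N characters. It does not use the Shape Expressions extension mechanism or any ShEx API; the function name `check_business_rules`, its parameters and the default limit of 80 characters are assumptions made for illustration.

```python
from datetime import date


def check_business_rules(reported, resolved, title, max_title_len=80):
    """Apply two application-level checks to literal values.

    `reported` and `resolved` are ISO 8601 date strings (YYYY-MM-DD);
    `title` is a plain string literal. All names and the default
    length limit are hypothetical, chosen only for this sketch.
    """
    problems = []

    # Rule 1: one date must be later than another.
    reported_on = date.fromisoformat(reported)
    resolved_on = date.fromisoformat(resolved)
    if resolved_on <= reported_on:
        problems.append(
            f"resolved date {resolved} is not later than reported date {reported}"
        )

    # Rule 2: a literal may be no longer than N characters.
    if len(title) > max_title_len:
        problems.append(
            f"title has {len(title)} characters, limit is {max_title_len}"
        )

    return problems


if __name__ == "__main__":
    # Dates are in the wrong order and the title exceeds the limit.
    for problem in check_business_rules("2015-01-10", "2015-01-09", "x" * 81):
        print(problem)
```

Checks like these depend on comparing or measuring values rather than on graph structure alone, which is why they fall outside conventional structural validation.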
The audience should be comfortable either with using git and a JVM or a JavaScript runtime like Node, or with just their web browser. A rudimentary knowledge of RDF and Turtle is expected. Like SPARQL by Example, this tutorial is intended to introduce the audience to a language that is new to them.
Instruction is grounded in examples executed in convenient tools:
Half or full day, including preferences.
Half day, perhaps following some other tutorial which introduces some data which can be used in example exercises.
other venues of this or similar tutorials (potentially by a different team of tutors), motivation for offering the tutorial again, at the ESWC2015.
This tutorial was given once, to ~25 people, at SWAT4LS (promo, slides). Several people used ShEx at the hackathon two days later.
A description of the ShEx language was presented at the Semantics Conference, where it received the Best Paper Award. At the 1st International Workshop on Linked Data Quality, we presented an application of Shape Expressions to describe and validate linked data portals. Finally, the complexity of the language will be presented at the International Conference on Database Theory.
short bios of the presenters, including previous training and public speaking experience (e.g., teaching in English, conference presentations, tutorials etc.)
$Log: Proposal.html,v $ Revision 1.5 2015/01/10 09:20:12 eric ~ synched with pdf version Revision 1.4 2015-01-09 12:18:34 eric + to reviewers section of bio Revision 1.3 2015-01-09 12:18:20 eric + to reviewers section of bio Revision 1.2 2015-01-09 11:31:59 eric + to reviewers section of bio Revision 1.1 2015-01-09 09:40:36 eric CREATED