The provenance data model (third working draft)

The Provenance Working Group began its activities with a charter naming some 17 concepts relevant to provenance, such as resource, process execution, use, derivation, and version.

In the first three months leading up to our first face-to-face meeting, we debated definitions for these concepts. Importantly for the group's social cohesion, this gave members a common vocabulary with which to communicate.

Following the first face-to-face meeting, the editors were tasked with producing a concrete document against which the group could formally raise issues and make concrete proposals. In October, this document was released as a First Public Working Draft. We were aware of its limitations, but it served an important purpose: it set the direction and scope of the model we were proposing to standardize.

Since then, the group has worked hard at rationalising the concepts of the PROV data model. Key highlights include:

  • introduction of the notion of responsibility, which may be assigned to agents, for the activities they participated in
  • a better characterisation of derivation, which represents, for example, the transformation of a raw data set into linked data
  • the ability for the model to track how collections of data evolve
  • a relation expressing that two different descriptions relate, in some way, to the same thing in the world
  • definition of a set of constraints, which allow humans and reasoners to determine whether a set of provenance assertions makes sense
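To make the derivation and responsibility relations, and the idea of checking whether a set of assertions makes sense, a little more concrete, here is a minimal Python sketch. All class and field names are hypothetical: the relation lists loosely mirror PROV's derivation (wasDerivedFrom), association (wasAssociatedWith) and responsibility (actedOnBehalfOf), and the single check is a toy stand-in for the draft's far richer constraints, not a rendering of them.

```python
from dataclasses import dataclass, field

@dataclass
class ProvGraph:
    """Hypothetical in-memory provenance graph (not the draft's definitions)."""
    entities: set = field(default_factory=set)
    activities: set = field(default_factory=set)
    agents: set = field(default_factory=set)
    derivations: list = field(default_factory=list)   # (derived entity, source entity)
    associations: list = field(default_factory=list)  # (activity, agent)
    delegations: list = field(default_factory=list)   # (agent, agent responsible for it)

    def violations(self):
        """Toy check: every relation may only mention declared nodes."""
        problems = []
        for derived, source in self.derivations:
            problems += [f"unknown entity {e}" for e in (derived, source)
                         if e not in self.entities]
        for activity, agent in self.associations:
            if activity not in self.activities:
                problems.append(f"unknown activity {activity}")
            if agent not in self.agents:
                problems.append(f"unknown agent {agent}")
        for agent, responsible in self.delegations:
            problems += [f"unknown agent {a}" for a in (agent, responsible)
                         if a not in self.agents]
        return problems

# A raw data set converted into linked data by Alice, acting on behalf of her lab.
g = ProvGraph()
g.entities |= {"ex:rawData", "ex:linkedData"}
g.activities.add("ex:convert")
g.agents |= {"ex:alice", "ex:lab"}
g.derivations.append(("ex:linkedData", "ex:rawData"))
g.associations.append(("ex:convert", "ex:alice"))
g.delegations.append(("ex:alice", "ex:lab"))
print(g.violations())  # []

# A derivation mentioning an undeclared entity is flagged.
g.derivations.append(("ex:report", "ex:linkedData"))
print(g.violations())  # ['unknown entity ex:report']
```

Keeping relations as plain tuples makes the check a simple scan; a real reasoner would also consider ordering in time, generation and usage events, and so on.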

The third working draft includes these changes. We feel that the data model has now reached a degree of stability, and that from now on every release should be synchronised with the PROV ontology definition and the PROV primer.

At our second face-to-face meeting, we debated intensively what the identifiers of the model denote. A challenge with provenance (as with any form of metadata) is that it may no longer be valid if its subject changes. To make provenance assertions robust, a partial state of the subject has to be characterised in terms of time and attributes, and the provenance of that partial state expressed.

However, much current practice simply identifies the subject of provenance with a URI, saying nothing about the state of the identified resource. The PROV-WG has therefore decided to present the data model in a way that supports this common usage. A separate document will propose an upgrade path: to produce a more robust form of provenance, extra assertions can make explicit the extent to which provenance assertions retain their interpretation when their subjects change.
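The difference between a bare URI and a characterised partial state can be illustrated with a small sketch. It assumes a hypothetical Characterization record that fixes some attributes of a resource over a time interval; provenance attached to such a record is read as applying only while the observed resource still matches it. This is one possible reading of the idea, not the upgrade path the working group will propose.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

@dataclass(frozen=True)
class Characterization:
    """Hypothetical record: a resource as it was, with some attributes fixed,
    during an interval of time."""
    resource: str                             # URI of the (possibly changing) resource
    attributes: Tuple[Tuple[str, str], ...]   # (name, value) pairs asserted to hold
    start: datetime
    end: Optional[datetime] = None            # None means "no known end yet"

def still_describes(c, observed, at):
    """Return True if provenance attached to `c` can still be read as being
    about `c.resource` observed at time `at` with attribute dict `observed`."""
    in_interval = c.start <= at and (c.end is None or at <= c.end)
    attrs_hold = all(observed.get(name) == value for name, value in c.attributes)
    return in_interval and attrs_hold

report_v1 = Characterization(
    "http://example.org/report",
    (("version", "v1"),),
    start=datetime(2011, 12, 1, tzinfo=timezone.utc),
)
now = datetime(2012, 1, 10, tzinfo=timezone.utc)
print(still_describes(report_v1, {"version": "v1", "lang": "en"}, now))  # True
print(still_describes(report_v1, {"version": "v2"}, now))                # False
```

Provenance attached only to the bare URI "http://example.org/report" would have no way to tell these two cases apart, which is exactly the gap the extra assertions are meant to close.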

Work on the fourth working draft has already begun; when complete, I will blog again about it.

New W3C Validation Service with RDFa 1.1 and microdata

W3C has launched a new HTML validation site, the Nu Markup Validation Service. From a Semantic Web point of view, it is worth noting that by default the validator validates HTML5 with RDFa 1.1 Lite and with microdata, the two simpler syntaxes for adding structured data to HTML. Users can also choose to validate with full RDFa 1.1 (instead of RDFa Lite) as well as microdata.

Report: Provenance Working Group 2nd Face-to-Face Meeting

The Provenance Working Group has just held its second F2F meeting, where we made substantial progress on a number of issues in creating a way to interchange provenance on the Web. We wanted to let the community know where we are and where we are going.

Overall, we have a good set of first drafts of the PROV family of documents, but there is still some way to go in bringing them in line with each other and presenting them so that they are useful to the developer and user communities. This meeting focused on how we can make rapid progress toward that goal.

Simplify, Simplify, Simplify
We have heard the feedback on our first working drafts and have been simplifying the PROV Data Model. Our mantra has been simplify, simplify, simplify. You can see some of the results in our recent 3rd Working Draft of PROV-DM. But there is still work to do to get the document where we want it to be, in particular in how its constructs and concepts are explained.

While most of the constructs will remain the same, at the F2F meeting, we agreed to simplify the notion of account (a container for provenance) to focus on the use case of provenance of provenance.

Two Broad Use Cases
At the meeting, it became clear that one of the hard parts of devising a common interchange format was being able to support the group’s two broad use cases:

  1. The ability to use the PROV vocabulary to make provenance statements about existing things on the Web. Think, for example, of adding simple provenance metadata (e.g., authorship) to a web page.
  2. The ability to exchange PROV information between provenance systems where a static or fixed view of data is key. This is common in current provenance-tracking systems. Think of exchanging information between two version control systems or two scientific workflow systems.

This realization helped the group think about how best to explain PROV. Since PROV supports both use cases, we will aim to first explain how to use it in the broad case and then describe how it can be used in cases that require a more exacting view.

Working with Dublin Core
In the working group, we are always mindful of existing provenance vocabularies on the Web. In particular, we are excited that Kai Eckert will be leading a best-practice document on how PROV works with Dublin Core, one of the most widely used provenance vocabularies on the Web.

The Community
If you’re interested in PROV, we encourage you to begin with the PROV-Primer, the best place to get an understanding of PROV. Over the next two months, we’ll be producing updated working drafts, and we hope to have a complete set ready for the community to review. In the meantime, we are always interested in your input. If you’re using PROV now, please let us know.

New RDFa Drafts Published

The W3C RDF Web Applications Working Group has published three Last Call Working Drafts today:

  • RDFa Core 1.1
  • RDFa Lite 1.1
  • XHTML+RDFa 1.1

Together, these documents outline the vision for RDFa in a variety of XML and HTML-based Web markup languages. RDFa Core 1.1 specifies the core syntax and processing rules for RDFa 1.1 and how the language is intended to be used in XML documents or in HTML. RDFa Lite 1.1 provides a simple subset of RDFa for novice Web authors. XHTML+RDFa 1.1 specifies the usage of RDFa in the XHTML markup language.
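RDFa Lite 1.1 is built around a handful of attributes (vocab, typeof, property, resource and prefix). The Python sketch below handles only the first three, using the standard library's html.parser, to show how they combine into triples. It ignores resource, prefix, void elements and most of the RDFa Core processing rules, so it is an illustration rather than a conforming processor; the RDFaLiteSketch class and its blank-node labels (_:b0, _:b1, ...) are hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = {"meta", "link", "img", "br", "hr", "input"}  # elements without end tags

class RDFaLiteSketch(HTMLParser):
    """Toy extractor for vocab/typeof/property only; not an RDFa processor."""

    def __init__(self):
        super().__init__()
        self.vocab = ""      # vocabulary IRI currently in scope
        self.stack = []      # one entry per open (non-void) element
        self.triples = []
        self._bnodes = 0

    def _subject(self):
        # Nearest enclosing element that introduced a typed resource.
        for entry in reversed(self.stack):
            if entry["subject"] is not None:
                return entry["subject"]
        return None

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # this sketch ignores void elements entirely
        a = dict(attrs)
        entry = {"vocab": self.vocab, "subject": None, "collect": None}
        if a.get("vocab"):
            self.vocab = a["vocab"]
        parent = self._subject()
        if a.get("typeof"):
            entry["subject"] = f"_:b{self._bnodes}"
            self._bnodes += 1
            self.triples.append((entry["subject"], RDF_TYPE, self.vocab + a["typeof"]))
            if a.get("property") and parent:
                # property + typeof: link the enclosing resource to the new one
                self.triples.append((parent, self.vocab + a["property"], entry["subject"]))
        elif a.get("property") and parent:
            # plain property: its value is the element's text content
            entry["collect"] = (parent, self.vocab + a["property"], [])
        self.stack.append(entry)

    def handle_data(self, data):
        for entry in self.stack:
            if entry["collect"] is not None:
                entry["collect"][2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry["collect"] is not None:
            subject, predicate, parts = entry["collect"]
            self.triples.append((subject, predicate, "".join(parts).strip()))
        self.vocab = entry["vocab"]  # leave this element's vocab scope

page = """<div vocab="http://schema.org/" typeof="Person">
  <span property="name">Alice</span>
</div>"""
parser = RDFaLiteSketch()
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```

The element stack lets each property find its subject in the nearest enclosing typeof, which is the core idea RDFa Lite authors need to grasp.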

A number of improvements have been made to RDFa 1.1 over the past year through close work with Google, Microsoft, Yahoo! and other search engine developers. Public review and comments have led to further refinements to the language that ease the learning curve for beginning Web authors.

The release of these documents as Last Call Working Drafts is a signal to the public that the Working Group believes that all of the technical requirements, public comments and reported issues have been addressed. It is also an open invitation to the general public to review and provide feedback on the finalization of this technology via the RDF Web Applications Working Group mailing list, by 21 February.

Workshop Report: W3C Linked Enterprise Data Workshop

W3C today published the final report of the Linked Enterprise Data Workshop, hosted by W3C on 6-7 December in Cambridge, MA, USA. The workshop gave the community a way to meet and discuss some of the challenges of deploying applications that rely on the principles of Linked Data. The presentations covered many topics, ranging from the benefits that a set of additional conventions would bring, to specific technical issues such as dealing with the reality that URLs sometimes change, the need for a more robust security model, and specific gaps in the current set of standards.

Workshop participants agreed that W3C should create a Working Group to define a “Linked Data Platform”. This is expected to be an enumeration of the specifications that constitute Linked Data, plus some small additional specifications covering specific functionality such as pagination. We anticipate that a draft charter will be available in the coming weeks.

Drafts Published by the W3C HTML Data Task Force: HTML Data Guide and Microdata to RDF transform

The HTML Data Task Force of the W3C Semantic Web Interest Group has published two documents today:

  • The HTML Data Guide aims to help publishers and consumers of HTML data. With several syntaxes (microformats, microdata, RDFa) and vocabularies (schema.org, Dublin Core, microformat vocabularies, etc.) to choose from, it provides guidance on deciding what to choose in a way that meets the publisher’s or consumer’s needs.
  • Microdata to RDF describes processing rules that may be used to extract RDF from an HTML document containing microdata.
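To give a feel for what such processing rules do, here is a toy extractor for microdata's itemscope, itemtype, itemid and itemprop attributes, built on Python's html.parser. It is not the draft's algorithm: in particular, the property_uri helper assumes that a relative itemprop name is appended to the namespace of the enclosing item's type (which works for schema.org-style type URIs), whereas the document defines its own, more detailed rules for generating property URIs. All names in the code are hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOID = {"meta", "link", "img", "br", "hr", "input"}  # elements without end tags

def property_uri(item_type, name):
    """Assumed rule, not the draft's: keep absolute names; otherwise append the
    name to the item type's namespace (text up to its last '/' or '#')."""
    if ":" in name:
        return name
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[:cut + 1] + name

class MicrodataSketch(HTMLParser):
    """Toy microdata-to-triples extractor; not a conforming processor."""

    def __init__(self):
        super().__init__()
        self.stack = []      # per open element: item it introduces, text collector
        self.triples = []
        self._bnodes = 0

    def _item(self):
        # (subject, itemtype) of the nearest enclosing item, if any.
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self._item()
        prop = a.get("itemprop")
        entry = {"item": None, "collect": None}
        if "itemscope" in a:
            if a.get("itemid"):
                subject = a["itemid"]
            else:
                subject = f"_:b{self._bnodes}"
                self._bnodes += 1
            item_type = a.get("itemtype") or ""
            if item_type:
                self.triples.append((subject, RDF_TYPE, item_type))
            if prop and parent:
                # nested item used as a property value of the enclosing item
                self.triples.append((parent[0], property_uri(parent[1], prop), subject))
            entry["item"] = (subject, item_type)
        elif prop and parent:
            predicate = property_uri(parent[1], prop)
            if "content" in a:          # e.g. <meta itemprop=... content=...>
                self.triples.append((parent[0], predicate, a["content"]))
            else:
                entry["collect"] = (parent[0], predicate, [])
        if tag not in VOID:
            self.stack.append(entry)

    def handle_data(self, data):
        for entry in self.stack:
            if entry["collect"] is not None:
                entry["collect"][2].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        entry = self.stack.pop()
        if entry["collect"] is not None:
            subject, predicate, parts = entry["collect"]
            self.triples.append((subject, predicate, "".join(parts).strip()))

page = """<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Alice</span>
  <meta itemprop="email" content="alice@example.org">
</div>"""
parser = MicrodataSketch()
parser.feed(page)
parser.close()
for triple in parser.triples:
    print(triple)
```

The hard part the draft has to settle, and this sketch glosses over, is exactly how bare names like "name" become full property URIs.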

Both documents are Working Drafts, with the goal of publishing final versions as Interest Group Notes. Comments and feedback are welcome; please send them to the public-html-data-tf@w3.org mailing list.

Feedback Welcome: An Overview of the Provenance (PROV) family of specs

Knowing how, where, when and why content was produced is an important part of making a trustworthy web. However, it is often difficult to interchange this provenance information between systems. For example, it is often hard to locate provenance information for a web page. Even when it can be located, it is often available only as text; and when it is available in a structured form, it does not use a common terminology, making it difficult to create software that can leverage this information.

The Provenance Working Group was charted to help address these limitations. The group has been working diligently to create a family of specifications (called PROV) that allow for the interchange of provenance. The group is looking for your feedback. This post provides an overview of the various working drafts that have been published and should help you find your way around.

The set of specs at this point addresses two aspects of provenance interoperability introduced above:

  • provenance access
  • provenance representation

PROV-AQ: Provenance Access and Query addresses how to make provenance information for Web resources available and how to retrieve it. The document specifies how to use existing Web technologies such as HTTP, link headers, and SPARQL to accomplish this. Where possible, the specification attempts to be agnostic to the format of the provenance being accessed.
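As an illustration of the link-header approach, a client that has fetched a Web resource could look in the HTTP Link header for the location of its provenance. The sketch below parses a Link header value with regular expressions and returns the targets carrying a given relation, together with any anchor parameter (which, under the Web Linking RFC, names the resource the link is about instead of the requested one). The relation name "provenance" and the example URIs are assumptions for illustration; check PROV-AQ for the exact relation it specifies. The parser also skips less common forms such as parameters without a value.

```python
import re

# One link-value: <target> followed by zero or more ;name=value parameters.
_LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w*-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM = re.compile(r';\s*([\w*-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')

def links_with_rel(header, rel):
    """Return (target, anchor) pairs for link-values whose rel includes `rel`.

    anchor is None when the link-value has no anchor parameter, in which
    case the link is about the resource that was requested.
    """
    found = []
    for target, params in _LINK_VALUE.findall(header):
        attrs = {}
        for name, quoted, bare in _PARAM.findall(params):
            attrs[name.lower()] = quoted if quoted else bare
        if rel in attrs.get("rel", "").split():
            found.append((target, attrs.get("anchor")))
    return found

# Hypothetical response header; the relation name "provenance" is an assumption.
header = ('<http://example.org/prov/page1>; rel="provenance"; '
          'anchor="http://example.org/page1", '
          '<http://example.org/style.css>; rel="stylesheet"')
print(links_with_rel(header, "provenance"))
```

In practice the header string would come from the Link field of an HTTP response; since a resource may advertise several links, the function returns all matches rather than the first.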

Once some provenance is obtained, it is important for the information to be understandable in a machine-interpretable fashion. The Working Group has defined a data model (PROV-DM) that provides facilities for representing the entities, people and activities involved in producing a piece of data or thing in the world. The data model is domain-agnostic and has well-defined extensibility points. Importantly, the data model has a corresponding OWL ontology (PROV-O) that encodes PROV-DM. PROV-O is envisioned as specifying the serialization used for exchanging provenance information.
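Because PROV-O encodes PROV-DM in OWL, provenance can be exchanged as ordinary RDF. As a minimal sketch, the function below writes a few PROV statements as N-Triples. It assumes the namespace http://www.w3.org/ns/prov# and the terms Entity, wasDerivedFrom, wasGeneratedBy and used as they appear in the final PROV-O Recommendation; the draft ontology of the time may have named them differently. The to_ntriples helper and the example.org IRIs are hypothetical.

```python
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def to_ntriples(statements):
    """Serialize (subject, term, object) tuples as N-Triples lines.

    `term` is a local name in the PROV namespace; the special term "type"
    means rdf:type with a PROV class name as object. Subjects and other
    objects must be absolute IRIs (this sketch emits no literals).
    """
    lines = []
    for subject, term, obj in statements:
        if term == "type":
            predicate, obj = RDF_TYPE, PROV + obj
        else:
            predicate = PROV + term
        lines.append(f"<{subject}> <{predicate}> <{obj}> .")
    return "\n".join(lines)

EX = "http://example.org/"
# A raw data set transformed into linked data by a conversion activity.
out = to_ntriples([
    (EX + "linkedData", "type", "Entity"),
    (EX + "linkedData", "wasDerivedFrom", EX + "rawData"),
    (EX + "linkedData", "wasGeneratedBy", EX + "convert"),
    (EX + "convert", "used", EX + "rawData"),
])
print(out)
```

N-Triples is used here only because it needs no library; any RDF serialization would carry the same statements.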

To help orient users of PROV-O and PROV-DM, the working group has developed a primer (PROV-Primer) that introduces the core constructs of the data model and provides examples using PROV-O. It is recommended that users and reviewers of the specification begin with the primer before moving to the ontology or data model.

The group is looking for feedback of all types: Would you expose provenance using PROV-AQ? Can you represent your provenance information using the PROV-O data model? Does PROV-O integrate well with your Linked Data or other Semantic Web infrastructure?

Let us know what you think.

The PROV family of specifications:

  • PROV-Primer
  • PROV-DM
  • PROV-O
  • PROV-AQ: Provenance Access and Query

Paul Groth and Luc Moreau on behalf of the PROV-WG

Provenance Access and Query and Provenance Primer Documents published

The W3C Provenance Working Group has published two new documents:

  • PROV-AQ: Provenance Access and Query
  • PROV-Primer

Both documents are First Public Working Drafts; feedback and comments are welcome! Please use the public-prov-comments@w3.org mailing list to send your comments.

Publication of the SPARQL 1.1 2nd Last Call Working Drafts

The W3C SPARQL Working Group has published the (second) Last Call Working Drafts of the following SPARQL 1.1 documents:

  • SPARQL 1.1 Update defines an update language for RDF graphs.
  • SPARQL 1.1 Service Description defines a vocabulary and discovery mechanism for describing the capabilities of a SPARQL endpoint.
  • SPARQL 1.1 Query Language adds support for aggregates, subqueries, projected expressions, and negation to the SPARQL query language.
  • SPARQL 1.1 Protocol describes a means for conveying SPARQL queries and updates to a SPARQL processing service and returning the results via HTTP to the entity that requested them.
  • SPARQL 1.1 Entailment Regimes defines conditions under which SPARQL queries can be used with entailment regimes such as RDF, RDF Schema, OWL, or RIF.
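The Protocol and Query Language items above can be illustrated together. The sketch below builds, without sending, two HTTP requests with Python's urllib: a query sent via GET in a `query` parameter, and an update sent as a URL-encoded POST in an `update` parameter, two of the operations the SPARQL 1.1 Protocol describes. The query uses an aggregate (COUNT with GROUP BY) and negation (FILTER NOT EXISTS). The endpoint URL is hypothetical, and the prov: terms are borrowed from the PROV-O namespace purely as an example vocabulary.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint

# Counts, per agent, the associated activities that have no recorded end time.
QUERY = """PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?agent (COUNT(?activity) AS ?open)
WHERE {
  ?activity prov:wasAssociatedWith ?agent .
  FILTER NOT EXISTS { ?activity prov:endedAtTime ?end }
}
GROUP BY ?agent"""

def query_request(endpoint, query):
    """Build (without sending) a query-via-GET request."""
    return Request(endpoint + "?" + urlencode({"query": query}),
                   headers={"Accept": "application/sparql-results+json"},
                   method="GET")

def update_request(endpoint, update):
    """Build (without sending) an update-via-URL-encoded-POST request."""
    body = urlencode({"update": update}).encode("utf-8")
    return Request(endpoint, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

q = query_request(ENDPOINT, QUERY)
u = update_request(ENDPOINT, "INSERT DATA { <http://example.org/a> "
                             "<http://www.w3.org/ns/prov#used> <http://example.org/b> }")
print(q.get_method(), q.full_url.split("?")[0])  # GET http://example.org/sparql
print(u.get_method(), u.full_url)                # POST http://example.org/sparql
```

Against a real endpoint, passing either request to urllib.request.urlopen would send it; the response format depends on what the service supports.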

Review comments are welcome through 6 February; please use the dedicated mailing list: public-sparql-dev@w3.org.