ACTION-5: Connect with Stephen Cresswell (UK-TSO) regarding provenance
- State: closed
- Person: John Erickson
- Due on: September 8, 2011
- Created on: September 1, 2011
- Related emails: Provenance and GLD (Action 5) (from olyerickson@gmail.com on 2011-09-16)
 
Related notes:
* Note that Stephen's name has been misspelled in GLD minutes; corrected 15 Sep 2011
* Related resources:
** Provenance in Publication of Legislation - ESIWiki http://bit.ly/o1247B
* Additional related provenance and Stephen Cresswell resources:
** "Reasoning over provenance graphs" - OPMV Google Group http://bit.ly/oBUYiV
** W3C Provenance Working Group - http://www.w3.org/2011/prov/wiki/Main_Page
** Open Provenance Model Vocabulary Specification - http://bit.ly/r2nAbn
** Proof Markup Language (PML) Primer - http://bit.ly/qkE3iD
* Additional resources related to OPMV and modelling legislative workflow:
** "Issues arising from legislation publication" - OPMV Google Group http://bit.ly/r2lDUA
Notes from Stephen Cresswell email to John Erickson:
Cc'd Eric Stephan (prov WG connection task force)
<snip>
SC: I think the reason why my name came up was that I'm on the provenance
working group, and I approached the GLD chairs a few weeks ago as part
of the outreach effort (the connection task force of the provenance WG),
and was trying to discover other people's requirements and use cases to
feed back to the provenance WG.  That seemed like it would be easy, as I
saw that the GLD WG charter stated an intention to provide those, but it
didn't get much response.
JSE: Are there more details of the specifics of the legislation project?
SC: For the provenance of legislation, there is a very brief overview here:
 http://wiki.esi.ac.uk/Provenance_in_Publication_of_Legislation
To clarify my role in this, I work for The Stationery Office (TSO),
which is a contractor to the UK government on legislation.gov.uk and
many other projects.  The requirement for provenance on the legislation
has been set out by John Sheridan at The National Archives (TNA) in the
UK (and he is a member of the GLD WG).  Jeni Tennison (who is a consultant
for TSO working closely with TNA) is a key figure in the development of
legislation.gov.uk and influenced the development (by Jun Zhao of Oxford
University) and adoption of OPMV for this purpose.  My role has been to
work through the legislation workflows to find the best ways to model
this in OPMV to support the kind of queries that we want to do, which to
some extent involves specializing the OPMV concepts, and requires some
use of reasoning on the provenance graphs.  It will be at least a few
months before new legislation starts to appear with associated
provenance information.
JSE: Have there been discussions about how these processes might be
applied to govt data (esp. in the context of data.gov.uk)?
SC: I think it is quite far from being handled in a consistent way across
data.gov.uk.  OPMV is favoured among the linked data folk I work with,
but it is fairly new and could be overtaken by the new standard.  As far
as I know, on data.gov.uk it has mainly been used for describing
transformations into RDF.
At TSO, we host a number of data.gov.uk datasets in our triplestore.
Typically, these are converted to RDF from, e.g., CSV by application of
a number of steps in a conversion/loading pipeline.  We aim to capture
these steps by integrating automatic provenance capture into our data
loading pipelines.  There is not much of this provenance information
coming through in the currently published data.
 http://openup.tso.co.uk/blog/our-thoughts-provenance
Apart from describing what we did to the data, there are lots of other
important things that should be said, but often are more difficult to
capture, e.g. concerning its provenance before we processed it, and when
a new release of the data is due.
JSE: How does this work relate to other standards work, including W3C's
Provenance WG <http://www.w3.org/2011/prov/wiki/Main_Page>, the OPM
Vocabulary <http://open-biomed.sourceforge.net/opmv/ns.html>, etc.?
SC: Our work made use of the OPMV.  That is a provenance ontology based on
the OPM abstract model, but it is a "lightweight" one.  OPMV was
developed by Jun Zhao after Jeni Tennison criticised the original OPM
ontology:
 http://www.jenitennison.com/blog/node/142
Currently, there is a "heavyweight" ontology (OPMO) and a "lightweight"
one (OPMV), and they have been brought into alignment with one another.
In the future, we plan to make use of the standard (currently called
PROV) created by the provenance WG.  The very same issues which led to
separate OPMO and OPMV are still alive and running in PROV (maybe Tim
has solved this).
Note that the provenance WG adopted a running example involving
government data to illustrate the concepts:
 http://www.w3.org/2011/prov/wiki/ProvenanceExample
JSE: Would you be interested in being a guest on one of our conference
calls in the future (assuming there was a fit)?
SC: Well, I'm willing to do it in principle, but I have already been a guest
on the conference call on 18th August, and I think it might be better to
get a more eloquent spokesperson from the provenance WG, if you want the
best story on provenance in general.
However, I'm still keen to hear from anyone working with government data
with ideas about their requirements for provenance.
Stephen Cresswell
</snip>
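Illustrative sketch (not from the email): the idea of capturing provenance automatically inside a CSV-to-RDF conversion/loading pipeline, and then reasoning over the resulting graph, might look roughly like the Python below. This is not TSO's implementation. The OPMV namespace URI and the term names (Artifact, Process, used, wasGeneratedBy, wasDerivedFrom) are assumptions that should be checked against the OPMV specification. The example.org URIs, the ProvenanceLog class, and the two pipeline steps are hypothetical names invented for this sketch.

```python
import csv
import io

OPMV = "http://purl.org/net/opmv/ns#"   # assumed OPMV namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"              # hypothetical base URI


class ProvenanceLog:
    """Collects (subject, predicate, object) triples as pipeline steps run."""

    def __init__(self):
        self.triples = []
        self._count = 0

    def new_artifact(self):
        """Mint a URI for a data artifact and record its type."""
        self._count += 1
        uri = f"{EX}artifact/{self._count}"
        self.triples.append((uri, RDF_TYPE, OPMV + "Artifact"))
        return uri

    def run_step(self, step, data, input_uri):
        """Apply `step` to `data`; record the process and its input/output."""
        process = f"{EX}process/{step.__name__}/{self._count}"
        result = step(data)
        output_uri = self.new_artifact()
        self.triples += [
            (process, RDF_TYPE, OPMV + "Process"),
            (process, OPMV + "used", input_uri),
            (output_uri, OPMV + "wasGeneratedBy", process),
            (output_uri, OPMV + "wasDerivedFrom", input_uri),
        ]
        return result, output_uri

    def lineage(self, uri):
        """Follow wasDerivedFrom links transitively back to the source."""
        derived = {s: o for s, p, o in self.triples
                   if p == OPMV + "wasDerivedFrom"}
        chain = []
        while uri in derived:
            uri = derived[uri]
            chain.append(uri)
        return chain


def parse_csv(text):
    """Pipeline step 1: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_to_triples(rows):
    """Pipeline step 2: turn each cell into a (row, column, value) triple."""
    return [
        (f"{EX}row/{i}", f"{EX}col/{key}", value)
        for i, row in enumerate(rows)
        for key, value in row.items()
    ]


log = ProvenanceLog()
source = log.new_artifact()  # stands for the original CSV file
rows, rows_uri = log.run_step(parse_csv, "name,pop\nLeeds,750000\n", source)
data, rdf_uri = log.run_step(rows_to_triples, rows, rows_uri)

# Walk the derivation chain from the RDF output back to the CSV source.
print(log.lineage(rdf_uri))
```

Because each step is run through run_step, the provenance triples are produced as a side effect of loading rather than written by hand afterwards. The lineage helper is a very small stand-in for the kind of reasoning over provenance graphs mentioned in the email: it only follows a single chain of wasDerivedFrom links.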