Warning:
This wiki has been archived and is now read-only.

Provenance ping-backs

From Provenance WG Wiki

If party D derives an entity by using an existing entity provided by an "upstream" party U, it would be useful to have a common mechanism for party D to inform party U about the derivation. Establishing this mechanism permits provenance consumers to trace forward in addition to backward in time.

Examples

Example 1 - Creating Linked Data out of a CSV

Party U: epa.gov offers a facility listing for Missouri

Party D: rpi.edu offers an RDF file on http://logd.tw.rpi.edu

Party U would benefit from this information:

<http://logd.tw.rpi.edu/...dump.ttl> 
    prov:alternateOf <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip>;
    prov:tracedTo    <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .

<http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip>
    prov:alternateOf <https://explore.data.gov/download/uu7d-z828/CSV> .

<https://explore.data.gov/download/uu7d-z828/CSV>
    irw:redirectsTo <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .

Example 2 - Rehosting an RDF file in a separate SPARQL endpoint

Party U: rpi.edu

Party D: ox.ac.uk loads RPI's RDF file into their own endpoint.

Party U would benefit from this information:

<http://ox.ac.uk/instances/endpointNamedGraph32> prov:tracedTo <http://logd.tw.rpi.edu/...dump.ttl>  .

Example 3 - An application is built from querying a SPARQL endpoint

Party U: ox.ac.uk

Party D: my.name/blog/2012-03-23

Party U would benefit from this information:

<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo <http://ox.ac.uk/instances/endpointNamedGraph32> .

Benefits

1) When somebody visits data.gov, they could be pointed directly to http://my.name/blog/2012-03-23 to see one existing perspective on the data being offered.

2) data.gov, RPI, and Oxford can quantify their contribution and return on investment: "1 blog(s) used results of our labor"

How to inform upstream parties about derivations

Tim's proposal

Parties that serve provenance for entities may accept HTTP POSTs to the URIs of those entities. The body of each POST should contain some PROV serialization (PROV-O, PROV-JSON, PROV-XML, etc.) that mentions the URI being posted to. In the examples below, the Content-Length values are placeholders rather than actual body lengths.
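On the receiving side, the requirement that the posted content mention the target URI could be checked roughly as follows. This is a minimal sketch, not part of the proposal: the function name mentions_target is hypothetical, and the regular expression only looks for IRIs in angle brackets. A real server would parse the body with an RDF library before trusting it.

```python
import re

# Matches IRI references written as <...> in Turtle text (a simplification:
# it does not resolve prefixed names or relative IRIs).
IRI_PATTERN = re.compile(r"<([^<>\s]+)>")


def mentions_target(body: str, target_uri: str) -> bool:
    """Return True if target_uri appears as an IRI reference in body."""
    return target_uri in IRI_PATTERN.findall(body)


if __name__ == "__main__":
    posted = (
        "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
        "<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo "
        "<http://ox.ac.uk/instances/endpointNamedGraph32> .\n"
    )
    print(mentions_target(posted, "http://ox.ac.uk/instances/endpointNamedGraph32"))
```

A server might reject a POST for which this check fails, since the content would say nothing about the entity it was sent to.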

The blogger (or, any crawler that finds the provenance in the blog) could tell Oxford:

POST /instances/endpointNamedGraph32 HTTP/1.1
Host: ox.ac.uk
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo <http://ox.ac.uk/instances/endpointNamedGraph32> .
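A client could assemble a request like the one above with Python's standard library. The sketch below only builds the request object and does not send it. The helper name build_pingback is hypothetical, and the choice of a single prov:tracedTo statement as the body simply mirrors the example.

```python
from urllib.request import Request

PROV_NS = "http://www.w3.org/ns/prov#"


def build_pingback(derived_uri: str, upstream_uri: str) -> Request:
    """Build (but do not send) a POST telling upstream_uri about derived_uri."""
    body = (
        f"@prefix prov: <{PROV_NS}> .\n"
        f"<{derived_uri}> prov:tracedTo <{upstream_uri}> .\n"
    ).encode("utf-8")
    # The request is addressed to the upstream entity's own URI,
    # and the body mentions that same URI.
    return Request(
        upstream_uri,
        data=body,
        method="POST",
        headers={"Content-Type": "text/turtle"},
    )


if __name__ == "__main__":
    req = build_pingback(
        "http://my.name/blog/2012-03-23#chart_2",
        "http://ox.ac.uk/instances/endpointNamedGraph32",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send it; the library fills in the Host and Content-Length headers at that point.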

Then Oxford could tell RPI:

POST /...dump.ttl HTTP/1.1
Host: logd.tw.rpi.edu
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://ox.ac.uk/instances/endpointNamedGraph32> prov:tracedTo <http://logd.tw.rpi.edu/...dump.ttl> .


Then RPI could tell data.gov:

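POST /download/uu7d-z828/CSV HTTP/1.1
Host: explore.data.gov
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://logd.tw.rpi.edu/...dump.ttl>
    prov:tracedTo <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .
<http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip>
    prov:alternateOf <https://explore.data.gov/download/uu7d-z828/CSV> .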

Then data.gov could tell any visitor to their page:

"Hey, check out http://my.name/blog/2012-03-23 because it uses this data"
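To recommend the blog, an upstream party needs to follow the chain forward from its own entity. The sketch below assumes the party has somehow collected the prov:tracedTo statements from the examples (for instance by receiving them or by crawling the downstream parties) and holds them as (derived, upstream) pairs. The function name downstream_of and the in-memory list are illustrative only, and the walk treats tracedTo as transitive.

```python
from collections import defaultdict, deque

ZIP = "http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip"
DUMP = "http://logd.tw.rpi.edu/...dump.ttl"
GRAPH = "http://ox.ac.uk/instances/endpointNamedGraph32"
CHART = "http://my.name/blog/2012-03-23#chart_2"

# (derived, upstream) pairs, one per prov:tracedTo statement in the examples.
STATEMENTS = [(DUMP, ZIP), (GRAPH, DUMP), (CHART, GRAPH)]


def downstream_of(source, traced_to_pairs):
    """Return every entity that traces back to source, directly or indirectly."""
    direct = defaultdict(set)  # upstream -> entities derived directly from it
    for derived, upstream in traced_to_pairs:
        direct[upstream].add(derived)
    found = set()
    queue = deque([source])
    while queue:  # breadth-first walk forward along the derivations
        current = queue.popleft()
        for derived in direct.get(current, ()):
            if derived not in found and derived != source:
                found.add(derived)
                queue.append(derived)
    return found


if __name__ == "__main__":
    for uri in sorted(downstream_of(ZIP, STATEMENTS)):
        print(uri)
```

Counting the members of the returned set gives the kind of figure mentioned under Benefits, such as how many downstream entities used a dataset.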

Graham's proposal (example)

Resources:


Oxford retrieves the file from RPI. Alongside the provenance link defined by PROV-AQ, the response carries a provenance pingback address in an additional Link header:

C: GET http://logd.tw.rpi.edu/...dump.ttl

S: Link: <http://logd.tw.rpi.edu/...dump-provenance.ttl>; rel=prov:provenance
S: Link: <http://logd.tw.rpi.edu/...dump-prov-pingback.ttl>; rel=prov:prov-pingback
 :
(resource)
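A client could pick the pingback address out of the response headers along these lines. This is a rough sketch under simplifying assumptions: each Link header value holds a single link, target URIs contain no semicolons, and the relation name prov:prov-pingback is taken literally from the example above. The function names are hypothetical.

```python
def parse_link(value: str):
    """Split one Link header value into (target URI, parameter dict)."""
    target_part, *param_parts = value.split(";")
    target = target_part.strip().lstrip("<").rstrip(">")
    params = {}
    for part in param_parts:
        name, sep, val = part.partition("=")
        if sep:
            params[name.strip().lower()] = val.strip().strip('"')
    return target, params


def find_links(link_values, rel):
    """Return the targets of all links whose rel parameter includes rel."""
    targets = []
    for value in link_values:
        target, params = parse_link(value)
        # A rel parameter may list several space-separated relation types.
        if rel in params.get("rel", "").split():
            targets.append(target)
    return targets


if __name__ == "__main__":
    headers = [
        "<http://logd.tw.rpi.edu/...dump-provenance.ttl>; rel=prov:provenance",
        "<http://logd.tw.rpi.edu/...dump-prov-pingback.ttl>; rel=prov:prov-pingback",
    ]
    print(find_links(headers, "prov:prov-pingback"))
```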

Later, once the new resource has been published, Oxford POSTs a provenance description that references the RPI published resource to the pingback URI provided by RPI. The server response to the POST MAY include links to other provenance-related information:

C: POST http://logd.tw.rpi.edu/...dump-prov-pingback.ttl
C: Content-type: text/turtle
 :
C:
C: <http://ox.ac.uk/instances/endpointNamedGraph32> prov:tracedTo <http://logd.tw.rpi.edu/...dump.ttl>  .

S: 200 Thanks!
S: Link: <http://logd.tw.rpi.edu/...dump-provenance.ttl>; 
   rel=prov:provenance; anchor="http://logd.tw.rpi.edu/...dump.ttl"
S: Link: <http://logd.tw.rpi.edu/...dump-forward-provenance.ttl>; 
   rel=prov:provenance; anchor="http://logd.tw.rpi.edu/...dump.ttl"

Note that, in the above example, the Link headers returned contain explicit anchor parameters with the URI of the original resource. Without these anchors, the links would relate the supplied provenance URIs to the pingback resource used as the POST request URI.
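The effect of the anchor parameter can be illustrated with a small helper that works out which resource a returned link is about. This is a sketch only: the name link_context is hypothetical, and the regular expression handles just the simple quoted or unquoted anchor forms seen in the example.

```python
import re

# Finds an anchor parameter such as: ; anchor="http://example.org/x"
ANCHOR_RE = re.compile(r';\s*anchor\s*=\s*"?([^";]+)"?', re.IGNORECASE)


def link_context(link_value: str, request_uri: str) -> str:
    """Return the URI that a Link header value describes.

    An explicit anchor wins; otherwise the link is about the resource
    that was requested (here, the pingback URI that was POSTed to).
    """
    match = ANCHOR_RE.search(link_value)
    return match.group(1).strip() if match else request_uri


if __name__ == "__main__":
    pingback = "http://logd.tw.rpi.edu/...dump-prov-pingback.ttl"
    with_anchor = (
        "<http://logd.tw.rpi.edu/...dump-provenance.ttl>; "
        'rel=prov:provenance; anchor="http://logd.tw.rpi.edu/...dump.ttl"'
    )
    without_anchor = "<http://logd.tw.rpi.edu/...dump-provenance.ttl>; rel=prov:provenance"
    print(link_context(with_anchor, pingback))
    print(link_context(without_anchor, pingback))
```

With the anchor, the first call yields the URI of the original dump; without it, the second call falls back to the pingback URI, which is the mismatch the note warns about.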