Difference between revisions of "Provenance ping-backs"

From Provenance WG Wiki

Revision as of 20:58, 13 November 2012

If party D derives an entity by using an existing entity provided by an "upstream" party U, it would be useful to have a common mechanism for party D to inform party U about the derivation. Establishing this mechanism permits provenance consumers to trace forward in addition to backward in time.

Examples

Example 1 - Creating Linked Data out of a CSV

Party U: epa.gov offers facility listing for Missouri

Party D: rpi.edu offers an RDF file on http://logd.tw.rpi.edu

Party U would benefit from this information:

<http://logd.tw.rpi.edu/...dump.ttl> 
    prov:alternateOf <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip>;
    prov:tracedTo    <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .

<http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip>
    prov:alternateOf <https://explore.data.gov/download/uu7d-z828/CSV> .

<https://explore.data.gov/download/uu7d-z828/CSV>
    irw:redirectsTo <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .

Example 2 - Rehosting an RDF file in a separate SPARQL endpoint

Party U: rpi.edu

Party D: ox.ac.uk loads RPI's RDF file into their own endpoint.

Party U would benefit from this information:

<http://ox.ac.uk/instances/endpointNamedGraph32> prov:tracedTo <http://logd.tw.rpi.edu/...dump.ttl>  .

Example 3 - An application is built from querying a SPARQL endpoint

Party U: ox.ac.uk

Party D: my.name/blog/2012-03-23

Party U would benefit from this information:

<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo <http://ox.ac.uk/instances/endpointNamedGraph32> .
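
To illustrate what "tracing forward" across these three examples could look like once the statements are collected in one place, here is a minimal Python sketch. The `downstream_of` helper and the in-memory list of pairs are hypothetical, introduced only for illustration; the URIs (including the elided `...dump.ttl`) are copied as-is from the examples.

```python
from collections import defaultdict, deque

# (downstream, upstream) pairs copied from the prov:tracedTo statements
# in Examples 1-3. The "..." in the RPI URI is elided in the original.
TRACED_TO = [
    ("http://logd.tw.rpi.edu/...dump.ttl",
     "http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip"),
    ("http://ox.ac.uk/instances/endpointNamedGraph32",
     "http://logd.tw.rpi.edu/...dump.ttl"),
    ("http://my.name/blog/2012-03-23#chart_2",
     "http://ox.ac.uk/instances/endpointNamedGraph32"),
]


def downstream_of(entity, traced_to):
    """Return every entity that traces back to `entity`, nearest first.

    Hypothetical helper: a breadth-first walk over reversed tracedTo edges.
    """
    derived_from = defaultdict(list)  # upstream -> [downstream, ...]
    for downstream, upstream in traced_to:
        derived_from[upstream].append(downstream)

    found, queue = [], deque([entity])
    while queue:
        for candidate in derived_from[queue.popleft()]:
            if candidate not in found:
                found.append(candidate)
                queue.append(candidate)
    return found


EPA_ZIP = TRACED_TO[0][1]
for uri in downstream_of(EPA_ZIP, TRACED_TO):
    print(uri)
```

Starting from the EPA file, the walk reaches the RPI dump, then the Oxford named graph, then the blog chart, which is the forward trace that no single party can compute unless downstream parties report back.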

Benefits

1) When somebody visits data.gov, they could be pointed directly to http://my.name/blog/2012-03-23 to see one existing perspective on the data being offered.

2) data.gov, RPI, and Oxford can quantify their contribution and return on investment: "1 blog(s) used results of our labor"

How to inform upstream parties about derivations

Tim's proposal

Parties that serve provenance for entities may accept HTTP POST requests to the URIs of those entities. The body of each request should contain some PROV serialization (PROV-O, PROV-JSON, PROV-XML, etc.) that mentions the URI being posted to.

The blogger (or, any crawler that finds the provenance in the blog) could tell Oxford:

POST /instances/endpointNamedGraph32 HTTP/1.1
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo <http://ox.ac.uk/instances/endpointNamedGraph32> .
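
A downstream party could assemble such a request programmatically. The following is a minimal sketch using Python's standard `urllib.request`; the `build_pingback` function name is hypothetical, and the sketch assumes the upstream party accepts `text/turtle` POSTs as proposed here. It only builds the request, computing `Content-Length` from the actual body, and does not send it.

```python
import urllib.request

PROV_NS = "http://www.w3.org/ns/prov#"


def build_pingback(downstream_uri, upstream_uri):
    """Build (but do not send) a POST to upstream_uri announcing a derivation.

    Hypothetical helper: the body is a single prov:tracedTo statement in
    Turtle that mentions the URI being posted to.
    """
    body = (
        f"@prefix prov: <{PROV_NS}> .\n"
        f"<{downstream_uri}> prov:tracedTo <{upstream_uri}> .\n"
    ).encode("utf-8")
    return urllib.request.Request(
        upstream_uri,
        data=body,
        method="POST",
        headers={
            "Content-Type": "text/turtle",
            "Content-Length": str(len(body)),  # byte length of the real body
        },
    )


req = build_pingback(
    "http://my.name/blog/2012-03-23#chart_2",
    "http://ox.ac.uk/instances/endpointNamedGraph32",
)
print(req.get_method(), req.selector)
print(req.data.decode("utf-8"))
# urllib.request.urlopen(req) would send it, if the upstream party
# actually accepts such POSTs.
```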

Then Oxford could tell RPI:

POST /...dump.ttl HTTP/1.1
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://ox.ac.uk/instances/endpointNamedGraph32> prov:tracedTo <http://logd.tw.rpi.edu/...dump.ttl> .


Then RPI could tell data.gov:

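POST /download/uu7d-z828/CSV HTTP/1.1
Content-Type: text/turtle
Content-Length: 32

@prefix prov: <http://www.w3.org/ns/prov#> .
<http://logd.tw.rpi.edu/...dump.ttl> prov:tracedTo <http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> .
<http://www.epa.gov/enviro/html/frs_demo/geospatial_data/state_files/state_combined_mo.zip> prov:alternateOf <https://explore.data.gov/download/uu7d-z828/CSV> .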

Then data.gov could tell any visitor to their page:

"Hey, check out http://my.name/blog/2012-03-23 because it uses this data"
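
On the receiving side, an upstream party would need to check that a posted body really mentions the URI it was posted to before advertising the downstream work. The sketch below is one possible way to do that in Python; `downstream_mentions` and `visitor_notice` are hypothetical names, and the regular expression is a deliberately naive stand-in for a real Turtle parser that only recognizes the single-statement form used in the examples above.

```python
import re

# Naive pattern for one "<s> prov:tracedTo <o> ." statement. This is an
# assumption for illustration; a real receiver would parse the Turtle properly.
TRACED_TO = re.compile(r"<([^>]+)>\s+prov:tracedTo\s+<([^>]+)>\s*\.")


def downstream_mentions(body, posted_to):
    """Return the entities that the posted body says trace to `posted_to`."""
    return [s for s, o in TRACED_TO.findall(body) if o == posted_to]


def visitor_notice(downstream_uri):
    """Phrase the hint shown to visitors, dropping any #fragment."""
    page = downstream_uri.split("#", 1)[0]
    return f"Hey, check out {page} because it uses this data"


posted_body = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "<http://my.name/blog/2012-03-23#chart_2> prov:tracedTo "
    "<http://ox.ac.uk/instances/endpointNamedGraph32> .\n"
)
for uri in downstream_mentions(
    posted_body, "http://ox.ac.uk/instances/endpointNamedGraph32"
):
    print(visitor_notice(uri))
```

A body that does not mention the posted-to URI yields an empty list, so the receiver can ignore it rather than record an unrelated claim.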