Difference between revisions of "PROV-FAQ"

From Semantic Web Standards
Revision as of 19:19, 21 March 2013

Here are some answers to frequently asked questions about PROV and using it.

What is the relationship between Dublin Core and PROV?

The two specifications are complementary. Dublin Core contains many provenance-related terms, many of which are more specific than those provided by PROV. For those terms, there is a mapping from Dublin Core Metadata Terms to PROV provided by a best practice document. If your system understands PROV, it will be able to understand the provenance terms of Dublin Core.

How can I define a sub activity?

PROV contains the notion of an activity. Some people have asked how to model sub-activities. We suggest using dcterms:hasPart.


When should I use prov:Agent subtypes?

If you want to denote that some Person, Organization or Software Agent has responsibility for an activity or entity, use the subtypes of prov:Agent. Otherwise, we suggest that you use other ontologies such as FOAF.

Examples of Provenance

Provenance of a Car

The prov-dm document gives the example of a car, moved from Boston to Cambridge (see Example 5 in the CR document). For this car, we identify multiple entities exposing various facets of the thing: Joe's car, Joe's car in Boston, and Joe's car in Cambridge.

entity(joe-car)
entity(joe-car-boston, [prov:location="boston"])
entity(joe-car-cambridge, [prov:location="cambridge"])
alternateOf(joe-car-boston,joe-car-cambridge)
specializationOf(joe-car-cambridge, joe-car)
specializationOf(joe-car-boston, joe-car)

Joe-car-cambridge begins to exist when the car arrives in Cambridge, and joe-car-boston ceases to exist (invalidation) once it leaves Boston. So joe-car-cambridge's generation time is defined as the time at which it arrives in Cambridge.

Provenance of Flour

If a change in a resource's state is to be documented in PROV, then multiple entities are used, e.g. the flour before and after baking. If it is not documented, then only one entity is required. There is no notion of a change which is "documented but not significant", because it is unclear what significance would mean in general, other than the decision to model and document the change. As before, a general, mutable "flour" entity can exist that is connected to the flour before and after baking using prov:specializationOf. For example:

ex:baked prov:used ex:flour1
ex:flour2 prov:wasGeneratedBy ex:baked
ex:flour2 prov:wasDerivedFrom ex:flour1
ex:flour1 prov:specializationOf ex:flour
ex:flour2 prov:specializationOf ex:flour

Why doesn't PROV use FOAF?

TODO: Tim Lebo

Access and query - arbitrary data

@@TODO: text lifted from what was section 3.4 of PROV-AQ

If a resource is represented using a data format other than HTML or RDF, and no URI for the resource is known, provenance discovery becomes trickier to achieve. This specification does not define a specific mechanism for such arbitrary resources, but this section discusses some of the options that might be considered.

For formats which have provision for including metadata within the file (e.g. JPEG images, PDF documents, etc.), use the format-specific metadata to include a target-URI, provenance-URI and/or service-URI. Format-specific metadata provision might also be used to include provenance information directly in the resource.

Use a generic packaging format that can combine an arbitrary data file with a separate metadata file in a known format such as RDF.


Access and Query - alternatives

@@TODO

The following text copied from PROV-AQ "Best Practices"

Using SPARQL for provenance queries

Simply identifying and retrieving provenance information as a resource on the Web may not always meet the requirements of a particular application or service, e.g.:

  • the resource for which provenance information is required is not identified by a known URI
  • the provenance information for a resource is not directly identified by a known URI
  • a requirement to access provenance information for a number of distinct but related resources in a single atomic operation
  • etc.

A provenance query service provides an alternative way to access provenance information and/or provenance-URIs. An application will need a URI for the provenance query service, and some relevant information about the resource whose provenance is to be accessed.

The details of a provenance query service are an implementation choice, but for interoperability between different providers and users we recommend use of SPARQL [RDF-SPARQL-PROTOCOL] [RDF-SPARQL-QUERY]. The query service URI would then be the URI of a SPARQL protocol service (http://www.w3.org/TR/rdf-sparql-protocol/#conformant-sparql-protocol-service), often referred to as a "SPARQL endpoint". The following subsections provide examples for what are considered to be some plausible common scenarios for using SPARQL, and are not intended to cover all possibilities.

A SPARQL protocol service description may be published using the SPARQL 1.1 Service Description vocabulary [SPARQL-SD].

The following subsections illustrate use cases for querying a SPARQL-based provenance query service.


Find a provenance-URI given a target-URI

If the requester has a target-URI, a simple SPARQL query may be used to return the corresponding provenance-URI. E.g., if the original resource has a target-URI http://example.org/resource:

  PREFIX prov: <http://www.w3.org/ns/prov#>
  SELECT ?provenance_uri WHERE
  {
    <http://example.org/resource> prov:hasProvenance ?provenance_uri
  }
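
As a minimal sketch of how a client might prepare this request, the snippet below builds the query for a given target-URI and encodes it into a GET URL using the `query` parameter of the SPARQL protocol. The endpoint URL `http://example.org/sparql` and the function names are assumptions made for illustration; the snippet does not validate the target-URI or send the request.

```python
from urllib.parse import urlencode


def provenance_uri_query(target_uri):
    """Build the SPARQL query that looks up provenance-URIs for target_uri."""
    return (
        "PREFIX prov: <http://www.w3.org/ns/prov#>\n"
        "SELECT ?provenance_uri WHERE {\n"
        f"  <{target_uri}> prov:hasProvenance ?provenance_uri\n"
        "}"
    )


def query_url(endpoint, target_uri):
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": provenance_uri_query(target_uri)})


# Hypothetical endpoint, used only for illustration.
url = query_url("http://example.org/sparql", "http://example.org/resource")
print(url)
```

The resulting URL could then be fetched with any HTTP client; the result format depends on the service and on the Accept header sent.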
            

Find Provenance-URI given identifying information about a resource

If the requester has identifying information that is not the URI of the original resource, then they will need to construct a more elaborate query to locate a resource description and obtain its provenance-URI(s). The nature of identifying information that can be used in this way will depend upon the third party service used, further definition of which is out of scope for this specification. For example, a query for a document identified by a DOI, say 1234.5678, using the PRISM vocabulary [PRISM] might look like this:

  PREFIX prov: <http://www.w3.org/ns/prov#>
  PREFIX prism: <http://prismstandard.org/namespaces/basic/2.0/>
  SELECT ?provenance_uri WHERE
  {
    [ prism:doi "1234.5678" ] prov:hasProvenance ?provenance_uri
  }
            


Obtain provenance information directly given a target-URI

This scenario retrieves provenance information directly given the URI of a resource, and may be useful where the provenance information has not been assigned a specific URI, or when the calling application is interested only in specific elements of provenance information.

If the original resource has a URI http://example.org/resource, a SPARQL query for provenance information might look like this:

  PREFIX prov: <http://www.w3.org/ns/prov#>
  SELECT ?generationStartTime WHERE {
      <http://example.org/resource> prov:wasGeneratedBy ?activity .
      ?activity prov:startedAtTime ?generationStartTime .
  }
            

This query extracts a "generation start time" for an artifact by following links to the start time of the activity which generated it.


Incremental Provenance Retrieval

Provenance information may be large. While this specification does not define how to implement scalable provenance systems, it does allow for publishers to make available provenance in an incremental fashion. We now discuss two possibilities for incremental provenance retrieval.

Via Web Retrieval

Publishers are not required to publish all the provenance information associated with a given resource at a particular provenance-URI. The amount of provenance information exposed is application dependent. However, it is possible to retrieve provenance incrementally (i.e. walk the provenance graph) by progressively looking up provenance information using HTTP. The pattern is as follows:

  1. For a given resource (resource-URI), retrieve its associated provenance-URI-1 and its associated target-URI-1 using a returned HTTP Link: header field
  2. Dereference provenance-URI-1
  3. Navigate the provenance information
  4. When reaching a dead-end during navigation, that is on encountering a reference to a resource (target-URI-2) with no provided provenance information, find its provenance-URI and continue from Step 2. (Note: an HTTP HEAD request for target-URI-2 may be used to obtain the Link: headers without retrieving the resource representation.)
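
The pattern above can be sketched roughly as follows. This is an illustration under simplifying assumptions, not a defined API: `find_provenance_uri` and `dereference` are hypothetical callables standing in for the Link: header lookup and the HTTP retrieval, provenance is reduced to plain (subject, predicate, object) tuples, and every object of a statement is treated as a resource worth following. The in-memory tables at the end replace real HTTP requests.

```python
def walk_provenance(start_uri, find_provenance_uri, dereference, max_targets=100):
    """Collect provenance statements reachable from start_uri.

    find_provenance_uri(target_uri) returns a provenance-URI or None
    (steps 1 and 4); dereference(provenance_uri) returns a list of
    (subject, predicate, object) tuples (step 2).
    """
    seen_targets, seen_documents, statements = set(), set(), []
    pending = [start_uri]
    while pending and len(seen_targets) < max_targets:
        target = pending.pop()
        if target in seen_targets:
            continue
        seen_targets.add(target)
        provenance_uri = find_provenance_uri(target)
        if provenance_uri is None or provenance_uri in seen_documents:
            continue  # dead end, or provenance already retrieved
        seen_documents.add(provenance_uri)
        # Step 3: navigate the statements and queue resources not yet visited.
        for subject, predicate, obj in dereference(provenance_uri):
            statements.append((subject, predicate, obj))
            if obj not in seen_targets:
                pending.append(obj)
    return statements


# Hypothetical in-memory data standing in for HTTP lookups.
links = {"ex:report": "ex:prov1", "ex:data": "ex:prov2"}
documents = {
    "ex:prov1": [("ex:report", "prov:wasDerivedFrom", "ex:data")],
    "ex:prov2": [("ex:data", "prov:wasDerivedFrom", "ex:raw")],
}
result = walk_provenance("ex:report", links.get, lambda uri: documents.get(uri, []))
print(result)
```

The `max_targets` bound is a design choice to keep the walk finite on large or cyclic graphs; a real client would likely also filter which predicates to follow.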

To reduce the overhead of multiple HTTP requests, provenance information publishers are encouraged to link entities to their associated provenance information using the prov:hasProvenance predicate. Thus, the same pattern above applies, except that instead of having to retrieve a new Link header field, one can immediately access the resource's associated provenance.

The same approach can be adopted when using the provenance service API. However, instead of performing an HTTP HEAD or GET against a resource, one queries the provenance service using the given target-URI.


Via SPARQL Queries

Provenance information may be made available using a SPARQL endpoint [RDF-SPARQL-PROTOCOL] [RDF-SPARQL-QUERY]. Using SPARQL queries, provenance can be selectively retrieved using combinations of filters and/or path queries.

Can PROV-XML use GRDDL?

PROV-XML documents can include a reference to a GRDDL transformation, which -- when invoked with the document as input -- will produce a PROV-O representation of the PROV-XML. Include the xmlns:grddl and grddl:transformation attributes below, and point to any appropriate XSL stylesheet (the one shown is for demonstration purposes and is incomplete).

<prov:document
  xmlns:prov="http://www.w3.org/ns/prov#"
  xmlns:grddl="http://www.w3.org/2003/g/data-view#"
  grddl:transformation="https://raw.github.com/timrdf/provenanceweb/master/src/provx2o.xsl">
  <!-- PROV-XML content goes here -->
</prov:document>

The WG could add a default XSL within the w3.org/ns... Adopters are encouraged to specify whichever @transformation they wish (including their own tailored to their flavor of prov).

Validity and Bundles

The following example shows a document containing two bundles. This document is invalid because the first bundle contains a cycle of derivations.

document
 prefix ex <http://example.org/>

 bundle ex:b1
   entity(ex:e)
   wasDerivedFrom(ex:e,ex:e)
 endBundle

 bundle ex:b2
   entity(ex:e1)
   entity(ex:e2)
   wasDerivedFrom(ex:e2,ex:e1)
 endBundle
endDocument

In this second example, the document is valid, since the validity of each bundle is established independently of the other bundle. While there is a cycle of derivations, this cycle is not present in any single bundle.

document
 prefix ex <http://example.org/>

 bundle ex:b1
   entity(ex:e1)
   entity(ex:e2)
   wasDerivedFrom(ex:e1,ex:e2)
 endBundle

 bundle ex:b2
   entity(ex:e1)
   entity(ex:e2)
   wasDerivedFrom(ex:e2,ex:e1)
 endBundle
endDocument


Finally, in the third example, the document is not syntactically correct, since descriptions in bundle ex:b2 refer to prefixes ex1 and ex2, which are not defined in bundle ex:b2 or at the top level.


document
 prefix ex <http://example.org/>

 bundle ex:b1
   prefix ex1 <http://example.org/>

   entity(ex1:e1)
   entity(ex1:e2)
   wasDerivedFrom(ex1:e1,ex1:e2)
 endBundle

 bundle ex:b2
   entity(ex1:e1)
   entity(ex2:e2)
   wasDerivedFrom(ex2:e2,ex2:e1)
 endBundle
endDocument
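
The per-bundle check illustrated by the first two examples can be sketched as below. This is only an illustration: it looks at derivation cycles and nothing else, so it is far from a full validity check, and the dictionary representation of bundles and the function names are assumptions made for the sketch.

```python
def has_derivation_cycle(derivations):
    """Return True if the (derived, source) pairs contain a cycle."""
    graph = {}
    for derived, source in derivations:
        graph.setdefault(derived, set()).add(source)

    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # reached a node already on the current path
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


def document_is_valid(bundles):
    """Each bundle is checked on its own; cycles across bundles are ignored."""
    return not any(has_derivation_cycle(d) for d in bundles.values())


# Derivations from the first and second example documents.
first = {"ex:b1": [("ex:e", "ex:e")], "ex:b2": [("ex:e2", "ex:e1")]}
second = {"ex:b1": [("ex:e1", "ex:e2")], "ex:b2": [("ex:e2", "ex:e1")]}
print(document_is_valid(first))   # False
print(document_is_valid(second))  # True
```

Merging the two bundles of the second document into one list of derivations would make `has_derivation_cycle` return True, which is why the bundle boundary matters.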

Can I infer Derivation from Usage and Generation?

No! Consider the trivial, contrived example of an activity X that uses three input entities, a, b and c, and generates two output entities: one whose value is the sum of the first two inputs (d = a+b), and a second whose value is the sum of the last two (e = b+c). You could express that:

used(X, a)
used(X, b)
used(X, c)
wasGeneratedBy(d, X)
wasGeneratedBy(e, X)

Given our description, we would also assert that:

wasDerivedFrom(d, a)
wasDerivedFrom(d, b)
wasDerivedFrom(e, b)
wasDerivedFrom(e, c)

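But you could not infer these derivations from the Usages and Generations for activity X alone, because the same Usages and Generations would equally support the following, which do not hold: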

wasDerivedFrom(d, c)

or:

wasDerivedFrom(e, a)
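
To make the point concrete, here is a small sketch of what goes wrong with a naive rule. The rule "every generated entity is derived from every used entity" is a hypothetical inference, not something PROV defines; the tuples simply mirror the statements above.

```python
from itertools import product

# Statements from the example, as (first argument, second argument) pairs.
used = {("X", "a"), ("X", "b"), ("X", "c")}
generated_by = {("d", "X"), ("e", "X")}

# Derivations asserted from knowing what X computes (d = a+b, e = b+c).
asserted = {("d", "a"), ("d", "b"), ("e", "b"), ("e", "c")}


def naive_derivations(used, generated_by):
    """Hypothetical (incorrect) rule: each output derives from each input."""
    return {
        (output, source)
        for (output, gen_activity), (use_activity, source) in product(generated_by, used)
        if gen_activity == use_activity
    }


spurious = naive_derivations(used, generated_by) - asserted
print(sorted(spurious))  # [('d', 'c'), ('e', 'a')]
```

The naive rule produces exactly the two derivations that do not hold, which is why derivations have to be asserted explicitly.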