Provenance Example described with the Provenance Vocabulary

From Provenance WG Wiki

This page attempts to describe the ProvenanceExample using the Provenance Vocabulary (and some other vocabularies that the Provenance Vocabulary was designed to be used with).

Preliminaries

The description is presented using the human-friendly Turtle syntax for RDF data.

The URI prefixes used in the description are:

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
@prefix irw:     <http://www.ontologydesignpatterns.org/ont/web/irw.owl#> .

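@prefix prv:      <http://purl.org/net/provenance/ns#> .
@prefix prvFiles: <http://purl.org/net/provenance/files#> .

# The ex: prefix is used throughout the description but was not declared here;
# the namespace below is an assumption.
@prefix ex: <http://example.org/> .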

Description of the Processing steps

This description is divided into parts that correspond to the processing steps listed in the ProvenanceExample. In addition to the description, each part notes which aspects of the processing step cannot be expressed with the Provenance Vocabulary.

Processing step 1

government (gov) converts data (d1) to RDF (f1) at time (t1)

ex:gov rdf:type prv:HumanActor , foaf:Organization .
ex:d1 rdf:type prv:DataItem .
ex:f1 rdf:type prv:DataItem ;
      prv:createdBy [ rdf:type prv:DataCreation ;
                      prv:usedData ex:d1 ;
                      prv:performedBy ex:gov ;
                      prv:performedAt "... t1 ..."^^xsd:dateTime  ] .

Processing step 2

government (gov) generates provenance information (prov) regarding RDF (f1)

ex:prov rdf:type prv:DataItem ;
        prv:createdBy [ rdf:type prv:DataCreation ;
                        prv:performedBy ex:gov ] .

not expressible:

  • ex:prov is a representation of provenance-related metadata
  • ex:prov is about ex:f1 (possibly this could be expressed by making ex:prov a Named Graph that contains all the RDF triples listed above)

Processing step 3

government (gov) publishes RDF data (f1) along with its provenance (prov) on a portal with a license (li1); the RDF data is now available as a Web resource (r1)

ex:gov rdf:type prv:DataPublisher .
ex:portal rdf:type prv:DataProvidingService ;
          prv:usedBy ex:gov .
ex:r1 rdf:type irw:WebResource ;
      irw:isIdentifiedBy [ rdf:type irw:URI ;
                           irw:hasURIString "http://example.org/r1"^^xsd:anyURI ] .

not expressible:

  • license information
  • The Provenance Vocabulary cannot be used to simply describe the fact of publishing an artifact (without describing that something or someone retrieved the thing that has been published). However, some of this information can be found in the description of the next processing step.

Processing step 4

analyst (alice) downloads a Turtle serialization (lcp1) of the resource (r1) from the government portal

ex:alice rdf:type prv:HumanActor , foaf:Person .

ex:f1 prv:containedBy ex:f1_and_prov .    # Here I assume that i) prov is an RDF-based
ex:prov prv:containedBy ex:f1_and_prov .  # provenance description and that ii) f1 and prov
ex:f1_and_prov rdf:type prv:DataItem ;    # are "merged" into a combined set of RDF triples.
               prv:serializedBy ex:lcp1 .

ex:lcp1 rdf:type prv:File ;
        irw:isEncodedIn <http://www.iana.org/assignments/media-types/text/turtle> ;
        prv:retrievedBy [ rdf:type prv:DataAccess ;
                          prv:accessedService ex:portal ;
                          prv:accessedResource ex:r1 ;
                          prv:performedBy ex:alice ] .

issues with the example itself:

  • Alice cannot download lcp1 directly; she must use HTTP client software to do so. However, the provenance description represents what is given in the example.
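
If the HTTP client were to be made explicit, one option would be to reuse the pattern from processing step 5, where a prv:NonHumanActor is operated by a person. The following is only a sketch under that assumption: ex:client is a hypothetical name that does not appear in the ProvenanceExample, and the ex: namespace is assumed.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prv: <http://purl.org/net/provenance/ns#> .
@prefix ex:  <http://example.org/> .   # assumed namespace

# ex:client is a hypothetical resource standing for the HTTP client software.
ex:client rdf:type prv:NonHumanActor ;
          prv:operatedBy ex:alice .

# Same data access as above, but performed by the client instead of by Alice.
ex:lcp1 rdf:type prv:File ;
        prv:retrievedBy [ rdf:type prv:DataAccess ;
                          prv:accessedService ex:portal ;
                          prv:accessedResource ex:r1 ;
                          prv:performedBy ex:client ] .
```

This variant deviates from the wording of the example, which names Alice as the one who downloads.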

Processing step 5

analyst (alice) generates a chart (c1) from the turtle (lcp1) using some software (tools1) with statistical assumptions (stats1)

ex:c1 rdf:type prv:DataItem ; # I understand the chart as a (visual)
                              # representation of data here.
      prv:createdBy [ rdf:type prv:DataCreation ;
                      prvFiles:usedDataFile ex:lcp1 ;
                      prv:usedData ex:stats1 ;
                      prv:performedBy ex:tools1 ] .
ex:tools1 rdf:type prv:NonHumanActor ;
          prv:operatedBy ex:alice .


Processing step 6

newspaper (news) publishes the chart (c1) within a document (art1) written by (joe) using license (li2)

ex:news rdf:type prv:HumanActor , foaf:Organization .
ex:joe rdf:type prv:HumanActor , foaf:Person .
ex:art1 rdf:type prv:Artifact .

not expressible:

  • Since the document (art1) is a prv:Artifact but not a prv:DataItem (in the Provenance Vocabulary, prv:DataItem is an rdfs:subClassOf prv:Artifact), we cannot describe this processing step; the Provenance Vocabulary focuses on data items.

Processing step 7

government (gov) publishes an update (d2) of data (d1) as a new Web resource (r2)

ex:d2 rdf:type prv:DataItem ;
      prv:precededBy ex:d1 ;
      prv:createdBy [ rdf:type prv:DataCreation ;
                      prv:usedData ex:d1 ] .
ex:r2 rdf:type irw:WebResource ;
      irw:isIdentifiedBy [ rdf:type irw:URI ;
                           irw:hasURIString "http://example.org/r2"^^xsd:anyURI ] .

QUESTION: is it correct that gov now publishes d2 directly; wouldn't it be more consistent if gov were publishing RDF data f2 which was obtained from d2?
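
For illustration, the alternative raised in this question could be sketched as follows, mirroring processing step 1. The resource ex:f2 takes its name from the question; the prv:precededBy link to ex:f1 and the ex: namespace are assumptions, not statements from the ProvenanceExample.

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prv: <http://purl.org/net/provenance/ns#> .
@prefix ex:  <http://example.org/> .   # assumed namespace

# Hypothetical RDF data f2, obtained by gov from the updated data d2.
ex:f2 rdf:type prv:DataItem ;
      prv:precededBy ex:f1 ;           # assumption: f2 is the successor of f1
      prv:createdBy [ rdf:type prv:DataCreation ;
                      prv:usedData ex:d2 ;
                      prv:performedBy ex:gov ] .
```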

Processing step 8

blogger (bob) downloads Turtle (lcp2) of the resource (r2) from the government portal, determines that it's a different version of the same data

ex:bob rdf:type prv:HumanActor , foaf:Person .
ex:lcp2 rdf:type prv:File ;
        irw:isEncodedIn <http://www.iana.org/assignments/media-types/text/turtle> ;
        prv:retrievedBy [ rdf:type prv:DataAccess ;
                          prv:accessedService ex:portal ;
                          prv:accessedResource ex:r2 ;
                          prv:performedBy ex:bob ] .

issues with the example itself:

  • same issue for bob as for alice (see processing step 4)

not expressible:

  • the fact that bob "determines that it's a different version of the same data"

Processing step 9

blogger (bob) generates new chart (c2) based on the data (lcp2) using some software (tools2) with statistical assumptions (stats2)

This step may be described in a manner similar to processing step 5.
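
A sketch of what that could look like, obtained by substituting the names from this step (c2, lcp2, stats2, tools2, bob) into the description of processing step 5. The ex: namespace is an assumption.

```turtle
@prefix rdf:      <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix prv:      <http://purl.org/net/provenance/ns#> .
@prefix prvFiles: <http://purl.org/net/provenance/files#> .
@prefix ex:       <http://example.org/> .   # assumed namespace

# The chart c2 is treated as a (visual) representation of data, as c1 is in step 5.
ex:c2 rdf:type prv:DataItem ;
      prv:createdBy [ rdf:type prv:DataCreation ;
                      prvFiles:usedDataFile ex:lcp2 ;
                      prv:usedData ex:stats2 ;
                      prv:performedBy ex:tools2 ] .
ex:tools2 rdf:type prv:NonHumanActor ;
          prv:operatedBy ex:bob .
```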

Processing step 10

blogger (bob) publishes the chart (c2) under an open license (li3).

not expressible:

  • Again, the simple fact of publishing something cannot be described by the Provenance Vocabulary (see processing step 3).