PROV examples - directory conventions

From Provenance WG Wiki
Revision as of 16:49, 29 March 2012 by Tlebo (Talk | contribs)


Author: Tim Lebo, with many good suggestions from prov-wg

This page is a proposal for how to organize concrete PROV examples. The examples are intended to guide discussions of mappings between PROV serializations, to support automated verification, and to provide test cases for PROV tools.


PROV examples were previously kept in the following places:

This design is a fresh start.

Example root

The collection is maintained in the prov-wg mercurial repository at

See MercurialRepository and Mercurial_repository for instructions on how to clone, add to, and commit to the prov-wg repository.

Facets of organization

The following materials may need to be organized around each example:

  • Example name ("painting flying to boston", "khalid at restaurant", etc.)
  • Application domain (e.g. Journalism, Life Sciences, Financial and Legal Audit) (TODO)
  • Narrative and discussion (e.g. 1-2 pages in wiki format)
  • Original encoding Format (ASN, XML, RDF, JSON)
  • Derived encoding Formats (ASN, XML, RDF, JSON)
  • Testing materials (??, XQUERY, SPARQL, ??)
  • Test targets (i.e., the expected outputs to compare)

Directory organization and naming conventions

Directory for each example

A directory is created for each example. The example's directory is named according to the pattern:


The title is optional and need not be unique. Use dashes to separate words; avoid underscores and spaces. Once the number is assigned, it may not be changed.

For example, the following two directories contain the materials for the first two examples. The first one is not titled, while the second one is.

<number> is determined by counting the existing example directories and adding one.
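The numbering rule can be sketched in Python as follows. This is a minimal sketch, not tooling from the prov-wg repository. It assumes that example directories are named `eg-<number>` with an optional `-<title>` suffix (inferred from the `eg-1` name used elsewhere on this page). The names `next_example_number` and `example_dir_name` and the directory titles in the demo are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming pattern: eg-<number> with an optional -<title> suffix.
EXAMPLE_DIR = re.compile(r"^eg-(\d+)(?:-[A-Za-z0-9-]+)?$")


def next_example_number(examples_root: Path) -> int:
    """Count the existing example directories and add one."""
    existing = [
        p for p in examples_root.iterdir()
        if p.is_dir() and EXAMPLE_DIR.match(p.name)
    ]
    return len(existing) + 1


def example_dir_name(number: int, title: str = "") -> str:
    """Build a directory name; title words are joined with dashes."""
    words = re.split(r"[\s_]+", title.strip()) if title.strip() else []
    return "-".join([f"eg-{number}", *words])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "eg-1").mkdir()             # untitled example
        (root / "eg-2-some-title").mkdir()  # titled example (hypothetical title)
        number = next_example_number(root)
        print(example_dir_name(number, "another example title"))
```

Because the number is derived from a count, this only works if example directories are never deleted or renumbered, which matches the rule that an assigned number may not be changed.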


Place the file containing the example in a directory that corresponds to its format. Name the file to match the name of the example directory. Use a file extension to indicate which format the example uses (e.g. ttl, rdf, nt for PROV-O; asn for PROV-DM; xml; etc.). For example, the following two examples encode RDF using Turtle:

The following format directory names may be used:

  • rdf for PROV-O. This is for the rdf/xml, turtle, ntriples, rdfa, trig, trix, etc. formats.
  • asn for PROV-DM.
  • xml for PROV-XML.
  • json
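The format directory rule above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the `examples/` root and the `eg-1` name are taken from paths mentioned elsewhere on this page, the extension table covers only the extensions named here, and `FORMAT_DIR` and `example_file_path` are hypothetical names.

```python
from pathlib import PurePosixPath

# Format directory for each file extension, following the list above.
# Only the extensions named on this page are included.
FORMAT_DIR = {
    "ttl": "rdf",
    "rdf": "rdf",
    "nt": "rdf",
    "asn": "asn",
    "xml": "xml",
    "json": "json",
}


def example_file_path(example: str, extension: str) -> PurePosixPath:
    """Return examples/<example>/<format>/<example>.<extension>."""
    try:
        format_dir = FORMAT_DIR[extension]
    except KeyError:
        raise ValueError(f"no format directory known for .{extension}") from None
    return PurePosixPath("examples", example, format_dir, f"{example}.{extension}")


if __name__ == "__main__":
    print(example_file_path("eg-1", "ttl"))  # examples/eg-1/rdf/eg-1.ttl
```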

Illustrating Example files


Automated creation

NOTE: prov-wg has not reviewed this section 

If the example can be created automatically, place the code that creates it at:

The results should be stored at:


If the original example was transcribed (either manually or automatically), place the result in a directory named after the resulting format. So, if these rdf examples were converted to ASN, the result would be stored at:

Note that these locations are different than the following (rdf/convert/ is missing). In the cases above, the ASN was converted from rdf. In the cases below, the ASN was the original format.
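The difference between an original encoding and a converted one can be sketched as a path-building rule. This is a minimal sketch that assumes the `examples/$eg/$format/convert/asn/` layout mentioned on this page and assumes that each further conversion nests another `convert/<format>` segment. The function name `conversion_dir` is hypothetical.

```python
from pathlib import PurePosixPath


def conversion_dir(example: str, original: str, *targets: str) -> PurePosixPath:
    """Return the directory for an example converted through `targets` in order.

    Each conversion step appends convert/<target format> to the path.
    """
    path = PurePosixPath("examples", example, original)
    for target in targets:
        path = path / "convert" / target
    return path


if __name__ == "__main__":
    # RDF original transcribed to ASN.
    print(conversion_dir("eg-1", "rdf", "asn"))
    # ASN as the original format: no convert/ segment.
    print(conversion_dir("eg-1", "asn"))
    # Chained conversion: RDF to ASN and back to RDF (assumed nesting).
    print(conversion_dir("eg-1", "rdf", "asn", "rdf"))
```

Under this reading, the path itself records the conversion history of a file, so an ASN file converted from RDF can never be confused with an ASN original.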

To take it a step further, we can see round tripping as:


The structure of the testing materials is still undecided and will likely differ for each test format. Whatever structure is used in one format directory (e.g. examples/$eg/asn/) should be used in all format directories (e.g. examples/$eg/$format/convert/asn/).


If queries can be applied to the example file (e.g. eg-1.ttl), then put the queries in a directory named query. Follow the same pattern for naming queries as we used for naming examples:


For example, the first query is not titled, but the second one is:

The query should be applied to the union of the example files in the current directory (in this example, examples/eg-1/rdf/eg-1.ttl is the only file).

Expected output for a query should match the name of the query, but be placed in the compare directory:
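One way a test harness could pair each query with its expected output is sketched below. This is a sketch under assumptions, not an agreed design: it assumes that `query/` and `compare/` are sibling directories inside a format directory and that names are matched on the part before the file extension. The function name and the file names in the demo (`query-1.rq`, `query-1.srx`, `query-2-some-title.rq`) are hypothetical.

```python
import tempfile
from pathlib import Path


def pair_queries_with_expected(format_dir: Path) -> dict:
    """Map each query file to the expected-output file that shares its name.

    Assumes query/ and compare/ are sibling directories inside format_dir.
    A query with no expected output maps to None.
    """
    expected = {
        p.stem: p for p in (format_dir / "compare").glob("*") if p.is_file()
    }
    return {
        q: expected.get(q.stem)
        for q in sorted((format_dir / "query").glob("*"))
        if q.is_file()
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rdf = Path(tmp) / "examples" / "eg-1" / "rdf"
        (rdf / "query").mkdir(parents=True)
        (rdf / "compare").mkdir()
        (rdf / "query" / "query-1.rq").write_text("# hypothetical query\n")
        (rdf / "query" / "query-2-some-title.rq").write_text("# hypothetical query\n")
        (rdf / "compare" / "query-1.srx").write_text("<!-- expected output -->\n")
        for query, target in pair_queries_with_expected(rdf).items():
            print(query.name, "->", target.name if target else "no expected output")
```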

Queries on Round-trips

Queries that apply to earlier encodings of the same format can also be applied to the round-tripped results, without duplicating the queries. Queries that should be applied ONLY to the round-tripped results may be placed in the appropriate location (by following the overlay of the two design patterns):

Note that the following two queries are distinct. The former applies to both RDF encodings, but the latter only applies to the older RDF encodings.


This section is a sketch and needs to be refined against a worked example.

If the example uses inference, the ontology or rules can be stored in the infer directory. The inferences should be applied to the union of the files in the example.

The expected output is stored in the compare directory.

If the inference should be queried, the query pattern in the previous section can be applied:


Use a directory named document at any level within the example directory structure to document the corresponding level. For example, the following text files describe the purpose of the overall example.

Within a document directory, a file named homepage may contain one or more URLs of pages that describe the example:

The PROV-O encoding can be documented here:

The ASN conversion activity can be documented here:

The ASN output can be documented here:

(and so on)

Reserved namespace for wiki documentation

The namespace is reserved for wiki page documentation for all examples in

The following mapping must be followed: <eg-name> <=> <eg-name>

See How to document a PROV example

Additional Links


  • We may want to add examples for the top 10 domains that the WG participants, at least, consider applicable. In other words, organizing the page by domain (via tagging, structure, or otherwise) would benefit us. To foster adoption of the standard, we need to show people who are looking for solutions to specific problems, more so than tool builders, how they can solve those problems.
  • DONE remove first level format layer, bringing example to top
  • DONE add place to link to homepage for example. (document/homepage described on this page)
  • DONE (The examples I have in mind 1) identify the problem, 2) describe the use of provenance, 3) give the PROV example, and 4) explain how PROV is accessed and queried. I'm thinking 1 to 1 1/2 pages per example.) See How to document a PROV example
  • DONE add template on wiki to document example.
  • DONE Where to put a stub template? For example, the rdf stub defines some common prefixes. (The first four examples are stubs.)