First comments extracted from the responses

From Provenance WG Wiki

Main needs of the stakeholders

  • 1.- Software performance description.
  • 2.- Heterogeneous information integration (as in the news example) to increase data quality. (we would include licensing here too)
  • 3.- Scientific workflow description and experiments (reproducibility and evaluation)
  • 4.- Digital objects/records preservation
  • 5.- Guidelines for vendors to provide provenance
  • 6.- Provenance management
  • 7.- Interoperability with other existing approaches/standards.
  • Are these needs covered by the current model? Do we need to address them in domain-specific application profiles?

There is a need for a lightweight core for users who want to describe the provenance of tools/resources without describing their performance. That is, they want to assert the who, when and where, but not the "how" or "why".
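One way to picture the split between a lightweight core and an optional extension is as two record types, where the extension only adds the "how" and "why" fields. The sketch below is purely illustrative: the class names, field names and example URIs are hypothetical and are not taken from PIL or any existing vocabulary.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class CoreProvenance:
    """Hypothetical lightweight core: only who, when and where."""
    resource: str   # identifier of the described tool/resource
    who: str        # agent responsible for the resource
    when: datetime  # time the resource was created or modified
    where: str      # location (site, host, institution)


@dataclass
class ExtendedProvenance(CoreProvenance):
    """Hypothetical extension adding the optional 'how' and 'why'."""
    how: Optional[str] = None  # description of the process/performance
    why: Optional[str] = None  # motivation or purpose


# A user who only wants the core never has to supply 'how' or 'why'.
record = CoreProvenance(
    resource="http://example.org/tool/1",
    who="http://example.org/people/alice",
    when=datetime(2011, 1, 15, 10, 30),
    where="http://example.org/lab",
)
```

Because the extension inherits from the core, any consumer that only understands who/when/where can still read an extended record.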

Data integration, scientific workflows and experiments, and digital object preservation seem to be the areas of greatest interest to the stakeholders, so we are heading in the right direction. We will also need an example in the scientific domain, though. (Daniel G)

For an example in the scientific domain, please see Provenance of microarray experiments. There we identify 4 layers of provenance:

  • institutional (who created the data)
  • experimental context (what were the parameters/objects used to perform an experiment that led to the data)
  • significance analysis (what were the algorithms/software used to transform raw data into processed data)
  • dataset description (where the dataset is made available/published).
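The four layers can be read as slots in a single provenance record, which makes it easy to see which layers a given dataset has not yet documented. This is a minimal sketch under our own assumptions: the class and field names are hypothetical and do not come from the microarray provenance page.

```python
from dataclasses import dataclass, fields
from typing import Dict, List, Optional


@dataclass
class MicroarrayProvenance:
    """Hypothetical record with one slot per provenance layer."""
    # institutional: who created the data
    creator: Optional[str] = None
    # experimental context: parameters/objects used to perform the experiment
    parameters: Optional[Dict[str, str]] = None
    # significance analysis: algorithms/software turning raw into processed data
    software: Optional[List[str]] = None
    # dataset description: where the dataset is made available/published
    published_at: Optional[str] = None

    def missing_layers(self) -> List[str]:
        """Return the names of the slots that have not been filled in."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


partial = MicroarrayProvenance(creator="Lab X", software=["normalisation tool"])
print(partial.missing_layers())  # ['parameters', 'published_at']
```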


Representation languages already used

  • 1.- OPM (OPMO/OPMV): 11 (+1 experimenting)
  • 2.- PML: 3 (+1 experimenting)
  • 3.- OAI-ORE: 1
  • 4.- Own vocabulary/grammar: 3
  • 5.- Nothing yet: 10
  • 6.- VOID: 1
  • Can these languages be mapped/adapted to PIL?

OPMO and PML were both used as references when developing the initial list of concepts discussed in the domain model, so they should be easily adaptable to PIL. VOID adds descriptions of datasets, but in general it is a lightweight vocabulary. I am not very familiar with the OAI-ORE vocabulary, but I remember that it is basically built around aggregations that gather descriptions of different resources. I am not sure how this would be adapted to PIL. (Daniel G)