From Provenance WG Wiki
Suggested Proposals ahead of teleconference
Ahead of the 2012-01-19 call, I would like to put forward some proposals regarding identifiers, following my view on this issue described at ProvenanceOfW3CReport. The hope is to help steer discussion towards consensus, so that we can update the prov-dm document.
There is a requirement that *all* objects of discourse are identifiable and have an identifier.
Hence, when we express an entity record, we use the identifier of that entity.
e.g. in ASN: entity(w3:WD-prov-dm-20111215, [ prov:type="WD" ])
Here, w3:WD-prov-dm-20111215 identifies the entity, not this record.
The constraint at http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html#identifiable-record-in-account still applies, and w3:WD-prov-dm-20111215 helps identify a record locally in an account.
Generation and Usage Events
Generation and Usage events also belong to the universe of discourse. So they should be given identifiers.
Currently, prov-dm indicates that these identifiers are optional and that they identify generation/usage records.
Instead, these identifiers will denote events (not records) and will be mandatory.
Question: what does this mean for the object property provo:wasGeneratedBy? Tentative answer: this property is already unable to distinguish multiple qualified generations between a given entity/activity pair.
In precise derivation-records, all identifiers refer to objects of the universe of discourse: generated and used entities, activity, generation and usage events.
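The tentative answer can be illustrated with a minimal Python sketch. The tuple layout and the identifiers ex:g1, ex:g2, ex:e and ex:a are assumptions made for illustration only; they do not come from prov-dm or prov-o.

```python
# Hypothetical generation events: (event id, entity id, activity id).
generations = [
    ("ex:g1", "ex:e", "ex:a"),
    ("ex:g2", "ex:e", "ex:a"),
]

# An unqualified property only relates an entity to an activity,
# so both events project onto the same (entity, activity) pair.
was_generated_by = {(entity, activity) for _, entity, activity in generations}

# Keeping the event identifier preserves the distinction.
generation_ids = {event_id for event_id, _, _ in generations}

print(len(was_generated_by))  # 1
print(len(generation_ids))    # 2
```

Under these assumptions, making event identifiers mandatory does not take anything away from the unqualified property; the distinction only ever lived in the qualified form.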
wasDerivedFrom(w3:WD-prov-dm-20111215,hg:Overview.html, ex:rcp, rec:g, rec:u)
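As a rough sketch of what mandatory event identifiers could look like in practice, the record above might be modelled as follows. The class name, the field names and the reading of the argument order (generated entity, used entity, activity, generation event, usage event) are assumptions for illustration, not definitions taken from prov-dm.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PreciseDerivation:
    """Hypothetical model of a precise derivation record."""
    generated_entity: str
    used_entity: str
    activity: str
    generation_event: str
    usage_event: str

    def __post_init__(self) -> None:
        # Under the proposal every identifier is mandatory,
        # including the generation and usage event identifiers.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"missing identifier: {f.name}")


derivation = PreciseDerivation(
    "w3:WD-prov-dm-20111215", "hg:Overview.html", "ex:rcp", "rec:g", "rec:u"
)
```

Every field here names an object of the universe of discourse; none of them names a record.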
I am still not clear about how to handle notes. We want to annotate records ... not objects of the universe of discourse.
The "equation" still holds:
entity id + account id = natural key for entity record
It works equally well for activities, generations and usages.
However, derivation, association, acted on behalf of are not directly identified. Or are they part of the universe of discourse and should be identified?
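The natural-key "equation" can be sketched as a record store keyed by the pair (account id, entity id). The names records and assert_entity, and the choice to merge attributes on repeated assertion, are assumptions for illustration only.

```python
from typing import Dict, Tuple

# Hypothetical record store: natural key = (account id, entity id).
RecordKey = Tuple[str, str]
records: Dict[RecordKey, dict] = {}


def assert_entity(account_id: str, entity_id: str, attributes: dict) -> None:
    """Store an entity record under its natural key.

    Merging attributes on repeated assertion is an assumption of this sketch.
    """
    records.setdefault((account_id, entity_id), {}).update(attributes)


# The same entity identifier yields two distinct records in two accounts.
assert_entity("acc1", "w3:WD-prov-dm-20111215", {"prov:type": "WD"})
assert_entity("acc2", "w3:WD-prov-dm-20111215", {"prov:type": "html4"})
```

The entity identifier stays global; only the record is local to an account. Relations such as derivation have no identifier of their own in this sketch, which is exactly the open question raised above.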
The E/R diagram is not a representation of the structure of records. Instead, it represents how the concepts introduced in our conceptualization relate to one another.
Then we have accounts.
Accounts are a construct of prov-dm. Hence, we may think they are not part of the universe of discourse. However, we want to use accounts to assert attribution of provenance.
In fact, if we push accounts to the universe of discourse, an account becomes a thing, according to our conceptualization. We then need to define entities about that thing, to describe its provenance.
Indeed, there may be different entities for a given account:
- the original/conceptual account, as its asserter created it, with a given name (The records in that account may or may not be known by the asserter: indeed, the asserter may still be generating provenance records for this account).
- the account fragment, as returned by a call to a prov service using prov-aq.
- the processed account, bundled with annotations by another party.
All these examples of accounts should be seen as entities for provenance purposes.
account(acc1, ...)
entity(e_acc1,[prov:type="AccountEntity",account="acc1"])
wasAttributedTo(e_acc1,agent)
Cross account relations
We still need to be able to "link" entities and activities across accounts.
Entity identifiers globally identify entities (likewise for other instances of our conceptualization). Account identifiers globally identify accounts.
So, we could write:
account(acc1, entity(w3:WD-prov-dm-20111215, [ prov:type="WD" ]))
account(acc2, entity(w3:WD-prov-dm-20111215, [ prov:type="html4" ]))
wasAlternateOf(w3:WD-prov-dm-20111215 in acc1, w3:WD-prov-dm-20111215 in acc2)
Hence proposal 5:
Allow any reference to an instance of the domain of discourse to be qualified by an account id.
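One way to picture account-qualified references is a small parser for the "id in account" form used in the wasAlternateOf example. The Ref type and parse_ref function are hypothetical names introduced for this sketch; the proposal itself does not fix a concrete syntax beyond that example.

```python
from typing import NamedTuple, Optional


class Ref(NamedTuple):
    """Hypothetical reference to an instance, optionally scoped to an account."""
    identifier: str
    account: Optional[str] = None


def parse_ref(text: str) -> Ref:
    """Parse 'id' or 'id in account' into a Ref."""
    identifier, sep, account = text.strip().partition(" in ")
    return Ref(identifier.strip(), account.strip() if sep else None)


left = parse_ref("w3:WD-prov-dm-20111215 in acc1")
right = parse_ref("w3:WD-prov-dm-20111215 in acc2")

# Same global entity identifier, different account scope.
print(left.identifier == right.identifier)  # True
print(left == right)                        # False
```

An unqualified reference parses to a Ref whose account is None, so existing unqualified records would remain expressible under this reading.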