ISSUE-186: Section 5.2.1 (PROV-DM as on Nov 28)

Raised by:
Satya Sahoo
Opened on:
The following are my comments on Section 5.2.1 of PROV-DM as of Nov 28:

Section 5.2.1:
1. "entity record is a representation of an entity."

Comment: So, do we make provenance assertions about the entity or about the entity record? How is a provenance assertion about the entity differentiated from one about the entity record?
For example, is there a difference between:
a) entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])
b) e0 has size 10KB on disk: this assertion clearly does not mean that the entity record "entity(e0, [ prov:type="File", ex:path="/shared/crime.txt", ex:creator="Alice" ])" has size 10KB! The entity record, at about 80 characters, may occupy 1KB on disk.
e0 is a representation of the entity (located at /shared/crime.txt and created by Alice). In any knowledge representation approach, and in information systems generally, we always work with representations of real-world things and refer to these representations by identifiers. Clearly, an entity and its records are two distinct information resources. How is fusing an entity and its record into a single identifier relevant to modeling the provenance of entities?
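The distinction can be illustrated with a minimal Python sketch. The Entity and EntityRecord classes, the record identifier "r0", and the serialize() helper are hypothetical and not part of PROV-DM; they only show that the record's text (about 80 characters) and the 10KB file it describes carry different properties, so an attribute of one cannot be read as an attribute of the other.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """The thing in the world: the file at /shared/crime.txt (hypothetical model)."""
    ident: str
    size_on_disk_bytes: int  # a property of the file itself


@dataclass
class EntityRecord:
    """A description of an entity in PROV-N-like syntax (hypothetical model)."""
    ident: str       # deliberately distinct from the entity's identifier
    describes: str   # identifier of the entity being described
    attributes: dict = field(default_factory=dict)

    def serialize(self) -> str:
        # Render the record as it appears in item 1a of this issue.
        attrs = ", ".join(f'{k}="{v}"' for k, v in self.attributes.items())
        return f"entity({self.describes}, [ {attrs} ])"


file_entity = Entity(ident="e0", size_on_disk_bytes=10 * 1024)
record = EntityRecord(
    ident="r0",  # hypothetical record identifier
    describes="e0",
    attributes={"prov:type": "File", "ex:path": "/shared/crime.txt", "ex:creator": "Alice"},
)

text = record.serialize()
print(text)
print(len(text), "characters in the record vs", file_entity.size_on_disk_bytes, "bytes in the file")
```

Giving the record its own identifier here is one possible design choice for keeping the two resources apart; the sketch does not claim this is how PROV-DM resolves the question.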

2. "id: an identifier id identifying an entity; the identifier of the entity record is defined to be the same as the identifier of the entity; "

Comment: If the ids of an entity and its entity record are the same, then how can two distinct sets of assertions about the same entity exist?
If we use the wasComplementOf approach, will we have to create a new identifier every time we want to make an assertion about the same entity?
For example, would the two assertions
"Harvard University was established in the 17th century." and
"Harvard University was established in the year 1636."
require two distinct identifiers for Harvard University?
Using wasComplementOf does not solve the problem: if there are 100,000 assertions about Harvard University, we will end up creating 100,000 identifiers and will have to link them together using 100,000 wasComplementOf properties. This is clearly an overly complicated modeling approach. More importantly, it goes against the Web architecture practice of reusing identifiers instead of minting new ones (which in this case is clearly avoidable):
From the Architecture of the World Wide Web (AWWW) [1]:
a. Good practice: Avoiding URI aliases: "A URI owner SHOULD NOT associate arbitrarily different URIs with the same resource."
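The identifier growth described above can be sketched in Python under stated assumptions: the Store class, the ex: identifiers, and the choice to chain each newly minted identifier to the previous one with a wasComplementOf link are all hypothetical illustration choices, not PROV-DM constructs. With one identifier per assertion, identifiers and links grow linearly with the number of assertions; reusing one identifier keeps the count constant.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy assertion store (hypothetical model, not a PROV serialization)."""
    identifiers: set = field(default_factory=set)
    assertions: list = field(default_factory=list)  # (subject, attribute, value)
    links: list = field(default_factory=list)       # (subject, "wasComplementOf", object)


def reuse_identifier(facts):
    """Attach every fact to one shared identifier."""
    store = Store()
    store.identifiers.add("ex:harvard")
    for attr, value in facts:
        store.assertions.append(("ex:harvard", attr, value))
    return store


def mint_per_assertion(facts):
    """Mint a fresh identifier per fact and chain it to the previous one."""
    store = Store()
    previous = None
    for i, (attr, value) in enumerate(facts):
        ident = f"ex:harvard-{i}"  # hypothetical naming scheme
        store.identifiers.add(ident)
        store.assertions.append((ident, attr, value))
        if previous is not None:
            store.links.append((ident, "wasComplementOf", previous))
        previous = ident
    return store


harvard = [
    ("ex:established", "17th century"),
    ("ex:established", "1636"),
]
print(len(reuse_identifier(harvard).identifiers))    # 1
print(len(mint_per_assertion(harvard).identifiers))  # 2

many = [("ex:fact", str(i)) for i in range(100_000)]
minted = mint_per_assertion(many)
print(len(minted.identifiers), len(minted.links))    # 100000 99999
```

Chaining yields one link fewer than the number of identifiers; linking every minted identifier to a shared hub would instead yield one link per identifier. Either arrangement scales with the number of assertions.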


3. "If an asserter wishes to characterize an entity with the same attribute-value pairs over several intervals, then they are required to assert multiple entity records, each with its own identifier (so as to allow potential dependencies between the various entity records to be expressed)."

Comment: If an entity has to be characterized with different attribute-value pairs over the same interval, does the asserter have to create distinct identifiers?


Related Action Items:
No related actions
Related emails:
  1. Re: PROV-ISSUE-186: Section 5.2.1 (PROV-DM as on Nov 28) [prov-dm] (from on 2012-02-13)
  2. Re: PROV-ISSUE-186: Section 5.2.1 (PROV-DM as on Nov 28) [prov-dm] (from on 2012-02-10)
  3. Re: PROV-ISSUE-186: Section 5.2.1 (PROV-DM as on Nov 28) [prov-dm] (from on 2011-12-07)
  4. PROV-ISSUE-186: Section 5.2.1 (PROV-DM as on Nov 28) [prov-dm] (from on 2011-12-07)

Related notes:

No additional notes.
