Re: PROV-ISSUE-71 (Conceptual Model draft): Section 3.2 of Conceptual Model draft (Content and Editing) [Conceptual Model] from Luc Moreau on 2011-08-24 (public-prov-wg@w3.org from August 2011)

From: Luc Moreau <L.Moreau@ecs.soton.ac.uk>
Date: Wed, 24 Aug 2011 21:08:39 +0100
To: Satya Sahoo <satya.sahoo@case.edu>
CC: public-prov-wg@w3.org
Message-ID: <EMEW3|f1bf276209b8b66b636dbc495e717957n7NL8k08L.Moreau|ecs.soton.ac.uk|4E555A47>
Hi Satya,

Comments interleaved.

On 24/08/11 17:56, Satya Sahoo wrote:
> Hi Luc,
> My comments are inline:
>
>     On 08/07/2011 12:48 AM, Provenance Working Group Issue Tracker wrote:
>
>         PROV-ISSUE-71 (Conceptual Model draft): Section 3.2 of
>         Conceptual Model draft (Content and Editing)  [Conceptual Model]
>
>         http://www.w3.org/2011/prov/track/issues/71
>
>         Raised by: Satya Sahoo
>         On product: Conceptual Model
>
>         Hi,
>         I am reviewing the current draft of the conceptual model as
>         part of our work on the formal model and will be posting
>         comments/suggested changes in a section-wise manner:
>
>         Section 3.2:
>         _____________
>
>         1. What is the difference between e0 and e1? Since we have
>         "Event evt1: Alice creates (pe0) an empty file in
>         /share/crime.txt. We denote this e1.", clearly the file
>         "share/crime.txt" did not exist prior to the time that event
>         evt1 started/happened?
>
>
>     e0 and e1 are two different characterizations of the file.
>     e0 characterizes the file by its location only.
>     e1 in addition characterizes its content.
>
>
>
> So, e0 and e1 are "labels" referring to the same Entity instance? If 
> yes, then we should make that connection explicit and maybe identify 
> one of the two labels as the "preferred label".

No, each of e0 and e1 identifies a characterized thing (as introduced 
in the model).
We further say that one is a complement of the other.

It's the whole point of the model that we don't identify a thing, but a 
thing+situation, characterized by
the asserter.
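
To make this concrete, here is a minimal Python sketch. It is not part of the draft: the class name `CharacterizedThing`, the attribute keys `location` and `content`, and the helper `extra_attributes` are all hypothetical. It only illustrates that e0 and e1 are separate identifiers, each carrying its own asserted attributes; it does not try to define the complement relation.

```python
from dataclasses import dataclass, field


@dataclass
class CharacterizedThing:
    """An identifier plus the attribute values an asserter gives it (hypothetical)."""
    identifier: str
    attributes: dict = field(default_factory=dict)


def extra_attributes(a, b):
    """Names of attributes asserted for `a` but not for `b`."""
    return set(a.attributes) - set(b.attributes)


# e0 characterizes the file by its location only.
e0 = CharacterizedThing("e0", {"location": "/share/crime.txt"})
# e1 additionally characterizes the content (empty at creation).
e1 = CharacterizedThing("e1", {"location": "/share/crime.txt", "content": ""})

print(extra_attributes(e1, e0))  # what e1 says that e0 does not
```

Neither identifier is a label for the other; they are two characterizations that happen to share the `location` attribute.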

However, you have a point: what do e0 and e1 convey to the reader? All 
of this is identified by the attributes.
I am not sure there is a preferred view. How can we decide?

>
>
>
>         2. "The entities, as characterized, hold during intervals
>         delimited by events." - What does "hold" mean? Existence?
>
>
>     Really, we mean that the attributes have those values during
>     intervals. We need a better word than "hold".
>
>
>         3. In "The following table lists all entities and their
>         corresponding validity intervals" does "validity interval"
>         mean their existence or something else?
>
>
>     It's the same interval as in the previous point. Again, a better
>     term is required.
>
>
>         4. The duration of existence ("validity interval") of
>         entities should be a time interval and not "event intervals".
>
>
>     No, we disagree here. It is on purpose that we refer to events. We
>     can order events. It's difficult to order time values, because
>     asserters and observers may use different clocks.
>
>
>
> What metric are we using to order the events? The default would be 
> time, second option may be control.

We essentially adopt Lamport's partial ordering of events. You may want 
to map events to a single global clock; Lamport introduced logical 
clocks for this, and vector clocks were later developed (to allow for 
multiple clocks).

However this mapping is outside the data model.
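
A minimal Python sketch of what a partial order over events looks like, assuming events are related only by explicitly asserted "precedes" edges. The event names (`a1`, `a2`, `b1`, `b2`) and the functions `happened_before` and `concurrent` are hypothetical and not part of the model; no clock values are involved.

```python
def happened_before(edges, x, y):
    """True if x precedes y in the transitive closure of `edges`."""
    stack, seen = [x], set()
    while stack:
        node = stack.pop()
        for nxt in edges.get(node, ()):
            if nxt == y:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def concurrent(edges, x, y):
    """True if neither event is ordered before the other."""
    return (x != y
            and not happened_before(edges, x, y)
            and not happened_before(edges, y, x))


# Hypothetical events seen by two observers A and B:
# a1 precedes a2 (same observer), a1 precedes b1 (e.g. a message), b1 precedes b2.
edges = {"a1": ["a2", "b1"], "b1": ["b2"]}

print(happened_before(edges, "a1", "b2"))  # ordered through b1
print(concurrent(edges, "a2", "b1"))       # no order asserted either way
```

The point of the sketch is that some pairs of events (here `a2` and `b1`) simply have no order, which a single timeline of clock values could not express.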

>
> Regarding the issue of asserter/observer for time, similar to the 
> assumptions made when real world is modeled in an information system, 
> we have to make an assumption about time values - either they are 
> asserted w.r.t. a single perspective (clock) or they are explicitly 
> stated to be asserted by different perspectives (clocks).
>
> If we cannot make the above assumption, I am not sure how we can 
> associate a time value with any information.

We can always associate time values; they may not be useful if we don't 
know how the clocks relate.

It seems to me that we should be able to capture useful provenance even 
if the observers have non-synchronized clocks, or even clocks running 
at different rates. My preference would be to make the weakest possible 
assumptions about clocks. At the moment, the model does not state any 
assumption. Can we live with that?

>
>
>
>
>         5. Why is the validity/existence of e4 limited to event evt5
>         (this should be a time value and not event as discussed in
>         point (4))- we do not have any information that it stopped
>         being e4 after evt5 ("Event evt5: Edith emails (pe4) the
>         contents of /share/crime.txt as an attachment, referred to as
>         e5.")
>
>
>
>     OK, simply (as we say in text) evt5 is considered as the most
>     recent event, i.e. the end of time, in the context of this example.
>     If time continued, ... yes we could refer to a more recent event.
>
>
>
> I think we should leave out the "end of validity" information, unless 
> we explicitly stated in the scenario description that e4 validity 
> ended at some point - this will be consistent with our "open world 
> assumption".
>

OK, we can revise accordingly.
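
As a sketch only of how such a revision could be represented: an interval delimited by events whose end may be left unasserted. The class `ValidityInterval`, its field names, and the placeholder start event `evt_start` are hypothetical; the draft does not define them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValidityInterval:
    """Interval delimited by events; the end may be unasserted (hypothetical)."""
    entity: str
    start_event: str
    end_event: Optional[str] = None  # None means "no end asserted", not "never ends"

    def end_is_asserted(self) -> bool:
        return self.end_event is not None


# Current table: the end of e4 is pinned to evt5, the most recent event.
closed = ValidityInterval("e4", "evt_start", "evt5")
# Revised: no end asserted, in line with an open world assumption.
open_ended = ValidityInterval("e4", "evt_start")

print(closed.end_is_asserted(), open_ended.end_is_asserted())
```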

>
>
>         6. What does "t" mean in "processExecution(pe0,create-file,t)"
>         - duration of process, start of process, or end of process?
>         Why are we associating time value with some PE and not with
>         others, "processExecution(pe5,spellcheck)" since time is not
>         mentioned in Section 3.1 "File Scenario"?
>
>
>     Correct, the abbreviation is not declared, it's intended to
>     represent start time.
>     The purpose is to show that time is optional, but we should have
>     written it in the text.
>
>
>         7. "isGeneratedBy(e0,pe0,outFile)" and
>         "isGeneratedBy(e1,pe0,outContent)" is not consistent with
>         "Alice creates (pe0) an empty file in /share/crime.txt. We
>         denote this e1." from Section 3.1. There is no connection
>         between pe0 and e0 asserted? In addition, since pe0 led to
>         creation of "empty file", what does
>         "isGeneratedBy(e1,pe0,outContent)" mean?
>
>
>     It's an interesting one for discussion.
>     e1 is complement of e0. Why can't they be both generated by the
>     same pe0?
>
>
>
> I am still trying to understand the notion of "complement", so I 
> cannot comment on its use.
>
> When you say "both" can be generated by same pe0 - are we referring to 
> the same Entity instance with two "labels" or different Entity instances?

The example indicates there are two entities e0 and e1 (remember, each 
represents an identifiable characterized thing).

(Note that the model does not really define a notion of instance (vs class).)
>
>
>
>         8. Does "isGeneratedBy(e4,pe2,attachment)" mean that we are
>         considering "emails" (pe2) to include the process of
>         "attaching a file to a mail", which in turn includes the
>         processes "copying file e2", "uploading to email server,
>         thereby creating the file e4 in the email server"?
>
>
>     We are unspecific about e4. It could be a file on the email server,
>     or it could be the bits on the wire when a message is sent.
>
>
>         9. "To distinguish the various entities generated by a given
>         process execution, a role (construct described in Section
>         Role) is introduced." - since we already have different
>         identifiers for the entities e1, e2, etc. we are not using role
>         to differentiate between entities. The different "roles" may be
>         more relevant to identify specific types of processes, for
>         example "fileCreation", "addingContent", "attachingFile" etc.?
>
>
>     OK, poor choice of word.
>
>
>         10. Similarly, for "Uses" property, we are not using "roles"
>         to distinguish the various entities. The given role examples
>         "in and fileIn" may help us differentiate between the PEs -
>         one may be "addingContent" (pe3) and "spellchecking" (pe5)
>         processes - but I think roles are redundant here since we are
>         already using different identifiers for these two PEs.
>
>
>     The same entity could be used by the same process execution
>     multiple times. That's when we need roles to distinguish the
>     various uses.
>
> Agree and they would be roles assumed by the Entity instances.

What do you mean by assumed?
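
To illustrate the point about multiple uses, here is a small Python sketch. The identifiers `pe_x` and `e_x` and the role names `first` and `second` are hypothetical; the tuples merely stand in for "uses" assertions and are not the draft's syntax.

```python
# Two uses of the same entity by the same process execution.
# Without a role the two assertions are indistinguishable and collapse into one.
uses_without_role = {("pe_x", "e_x"), ("pe_x", "e_x")}

# With a role, each use is a distinct assertion.
uses_with_role = {("pe_x", "e_x", "first"), ("pe_x", "e_x", "second")}


def roles_of(uses, pe, entity):
    """Roles under which `entity` is used by `pe`."""
    return {role for (p, e, role) in uses if p == pe and e == entity}


print(len(uses_without_role), len(uses_with_role))
print(sorted(roles_of(uses_with_role, "pe_x", "e_x")))
```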

>
>
>
>         11. In "Control", we say "the nature of this influence is
>         described by a role (construct described in Section Role).",
>         but the example roles are describing the entities "Alice -
>         creator", "Bob - author", "Charles - communicator". Further,
>         these roles can be used to characterize the types of processes
>         "fileCreation" etc. as described in points (9) and (10).
>         Examples of roles for "Control" (maybe represented as sub
>         properties) are "starts", "stops", "pauses" etc.
>
>
>     Roles in this case are not used to their full potential, since
>     other agents could also control these processes.
>
>     As indicated in the document in the role section, the meaning of
>     roles is process execution specific.
>
>     I am not sure about these starts/stops/pauses. They refer to
>     another email discussion. I don't think we can model that. Or if
>     we want, we need
>     to revisit the model seriously.
>
>
>
>
> They are specializations of "control", since control is a generic 
> term. I am not sure I understand the difficulty in modeling them as 
> specialization of control (as needed by an application).

OK, I was too quick. I can see a start role and an end role, for the 
agents starting or ending processes.

Referring to another thread, I don't think that pause/resume can work in 
the model.
>
>
>
>         12. Does an event have a time duration or does it happen in a
>         time instant? How is event related to PE or other concepts? Is
>         there a need to have a provenance concept called "event" -
>         alternatively we can describe the File Scenario in Section 3.1
>         using time values?
>
>
>     Events are meant to occur at a time instant. Not sure we say it.
>     I would like a provenance concept 'event' to be introduced. This
>     would facilitate life in explaining things.
>
>     Should we raise this as an explicit issue?
>
>
>
> Would "event" be a specific part of a PE or can "event" be defined 
> independent of a PE? I am trying to understand this notion and its scope.

I see events as instantaneous timepoints that can be ordered according 
to some partial order.
Events are typed (start/end/use/generate).
A process execution encompasses a set of events (all the events it is 
connected to).
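
A rough Python sketch of this view, under stated assumptions: the names `EventType`, `Event`, and `events_of`, and the event identifiers, are hypothetical. The four event types come from the sentence above; attaching a process execution identifier to each event is just one possible way to express "connected to".

```python
from dataclasses import dataclass
from enum import Enum


class EventType(Enum):
    START = "start"
    END = "end"
    USE = "use"
    GENERATE = "generate"


@dataclass(frozen=True)
class Event:
    """An instantaneous, typed event connected to one process execution."""
    identifier: str
    type: EventType
    process_execution: str


def events_of(pe, events):
    """The set of events a process execution encompasses."""
    return {e for e in events if e.process_execution == pe}


events = [
    Event("pe0-start", EventType.START, "pe0"),
    Event("pe0-generates-e1", EventType.GENERATE, "pe0"),
    Event("pe0-end", EventType.END, "pe0"),
    Event("pe3-uses-input", EventType.USE, "pe3"),
]

print(sorted(e.identifier for e in events_of("pe0", events)))
```

Ordering between these events is deliberately left out here; it would be a separate partial order over the event identifiers.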

>
> Thanks.

Cheers,
Luc
>
>
> Best,
> Satya
>
>
>
>
>
>     -- 
>     Professor Luc Moreau
>     Electronics and Computer Science   tel: +44 23 8059 4487
>     University of Southampton          fax: +44 23 8059 2865
>     Southampton SO17 1BJ               email: l.moreau@ecs.soton.ac.uk
>     United Kingdom http://www.ecs.soton.ac.uk/~lavm
>
>
>
Received on Wednesday, 24 August 2011 20:09:17 UTC