Review of DM WD4 from Timothy Lebo on 2012-02-22 (public-prov-wg@w3.org from February 2012)

From: Timothy Lebo <lebot@rpi.edu>
Date: Wed, 22 Feb 2012 13:46:49 -0500
To: "public-prov-wg@w3.org Group" <public-prov-wg@w3.org>
Message-Id: <995BD58C-DB94-4052-BE85-BE9A271695C0@rpi.edu>

I was asked to review DM WD3. This email constitutes my review.
I have included supplemental notes that I hope the DM editors will review and consider in future versions.
I have raised a few of the bigger issues in the tracker already.

Regards,
Tim

Goals of the review (per http://www.w3.org/2011/prov/wiki/Meetings:Telecon2012.02.16#PROV-DM_Simplification):

• decide whether the new documents are in line with the simplification objective

• recommend whether they become the new editor's draft

• if not, identify blocking issues
• if yes, identify potential issues to be raised against this future new editor's draft

• decide whether ISSUE-145, ISSUE-183, ISSUE-215, ISSUE-225 and ISSUE-234 (all relating to identifiers) can be closed

------
http://www.w3.org/2011/prov/track/issues/145
qualified identifiers may not work well with named graphs

This issue can be CLOSED. The treatment of AccountEntities (which I hope will be renamed to prov:Provenance) and the section on provenance of provenance do not impose a scoping of identifiers.
This will make it easy to implement using RDF mechanisms.

------
http://www.w3.org/2011/prov/track/issues/183
identifiers in prov-dm

The use of identifiers is no longer confusing. They identify Entities, Activities, etc.
"Records" (a dying term) are not identified; the identifier they mention identifies the Entity, Activity, Involvement, etc.

------
http://www.w3.org/2011/prov/track/issues/215
ProvenanceOfW3CReport

The example is good because it shows two perspectives, which makes it easy to use for AccountEntity (prov:Provenance).
The identifiers make it a bit dry and hard to follow, but the concrete aspect is MUCH more useful.

------
http://www.w3.org/2011/prov/track/issues/225
What are the objects in the universe of discourse?

This can be CLOSED. It is not confusing in the current writeup.

------
http://www.w3.org/2011/prov/track/issues/234
id identifies entity, not the record

Can be CLOSED.

------- supplemental notes --------

About notes in http://www.w3.org/2011/prov/wiki/ProvDMWorkingDraft4#Design_decisions

• If part 3 is now separate from part 1, there is no need to talk about 'Entity Record' (or whatever Record) in part 1. Instead, we can just mention Entity (or whatever other concept).

+1 This is much more natural

• Given that Part 3 is just about ASN, and therefore is a language, then we can without confusion, talk about 'Entity Expression' since now these would be Expressions of the language

• Does this mean that we would be dropping the term record entirely? What would we bundle up though?

I would say we bundle up "expressions". One could bundle ASN expressions, RDF expressions, XML expressions, etc.

• What about assertions? So should we still use the word?

I would suggest the more general term "expression" in place of "assertion".

------- supplemental notes --------

About http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html

Sections entitled "Activity-Entity Relation" seem a bit unnatural. Perhaps something like "Relations between Activities and Entities" would be clearer.

The phrase "when the data it is about changes" is unclear.

"To address this challenge, an upgrade path is proposed to enrich simple provenance..." This paragraph is nice. I'd suggest including "specific subject" in "qualify the subject of provenance".

Is it okay to use ASN before it is defined? "In section 3, PROV-DM is applied to a short scenario, encoded in PROV-ASN, and illustrated graphically."

"Section 4 provides the definition of PROV-DM." is a bit ambiguous. Please elaborate.

The following sentence is duplicated: "Activities that operate on digital entities may for example move, copy, or duplicate them. Activities that operate on digital entities may for example move, copy, or duplicate them."

I propose to change the Agent definition from->to:
"An agent is a type of entity that can be associated to an activity, to indicate that it bears some form of responsibility for the activity taking place."
"An agent is a type of entity that bears some form of responsibility for an activity taking place."

perhaps add the person invoking the grammar checker to the following example (to illustrate the levels of responsibility):
"Software for checking the use of grammar in a document may be defined as an agent of a document preparation activity, and at the same time one can describe its provenance, including for instance the vendor and the version history."

add "an" to "Generation is the completed production of a new entity by activity." -> "Generation is the completed production of a new entity by an activity."

reads oddly: "the activity had not begun to consume or use to this entity"

avoid parens in a definition: "(and could not have been affected by the entity)"

avoid "internal" in the collection definition: "A collection is an entity that has internal structure." -> "A collection is an entity that provides structure to some constituents." (or something)

I was shocked by the naming of "AccountEntity". Why not "PlanEntity" and "CollectionEntity" then? (No, I don't want that...) I propose to rename "AccountEntity" to "Provenance".

This sentence is long. Suggest stopping it at the first comma. "It is important to reflect that there is a degree in the responsibility of agents, and that is a major reason for distinguishing among all the agents that have some association with an activity and determine which ones are really the originators of the entity."
("and that is a major reason for distinguishing" -> "There is a major reason for distinguishing")

Suggest removing "active" in "indicating that the agent had an active role in the activity". Does RPI have an active role in the writing of this email (since I'm an RPI student...)? I'd say they have a role, but not an active one.

http://dvcs.w3.org/hg/prov/raw-file/default/model/working-copy/towards-wd4.html#section-UML shows Activity wasStartedBy Agent, but Luc just said in email recently that only Activity wasStartedBy Activity is the way forward. I prefer Activity wasStartedBy Agent and think that some other involvement should be named for the special informed involvement Activity ?triggered? Activity.

"ex:pub2" is a bad name - is it an activity or entity? I recommend "ex:act2"

why aren't the edges labeled in the example?

avoid term "minted" when talking about choosing a URI for a Resource. "minted" is colloquial.

"3.3 Attribution of Provenance" -- YES! :-)

The definition of Activity "An activity is anything that can operate on entities." seems to talk about the future

activity(id, st, et, [ attr1=val1, ...]) does not include brackets for optional constituents st and et

"(This type is equivalent to a "foaf:person" [FOAF])" --> we should not bind ourselves to FOAF.

Please add a note to section Note to encourage people to use Account / AccountEntity/ Provenance to annotate provenance assertions as a better practice. When using AccountEntity, the annotated thing can be described _directly_ as a single triple instead of using Notes. Notes are very much "scruffy provenance" and do not benefit from the directness afforded by AccountEntity / prov:Provenance.

:prov_1 {
    :simon a prov:Human ;
        prov:hasAnnotation [
            a prov:Note ; ex3:reputation "excellent" ;
            rdfs:comment "This is a kludge way to get indirection. Use prov:Provenance instead."
        ] .
}

:prov_2 {
    :simon ex3:reputation "excellent" .
}

:prov_1 a prov:Provenance; prov:wasAttributedTo :first_asserter .
:prov_2 a prov:Provenance; prov:wasAttributedTo :trust_evaluator_agent .
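
The difference in directness can be illustrated with a minimal Python sketch. It is not part of the DM draft: the (graph, subject, predicate, object) tuples, the blank-node label `_:note1`, and the helper function names are assumptions made purely for illustration, and the rdfs:comment and prov:Provenance typing triples are left out for brevity. In :prov_2 the reputation is a single statement about :simon, so it can be paired with the asserter of its graph in one step; in :prov_1 it hangs off the Note and is not a statement about :simon at all.

```python
# Quads are (graph, subject, predicate, object); None is the default graph.
# All names mirror the example above and are illustrative only.
QUADS = [
    # :prov_1 attaches the reputation indirectly, through a prov:Note.
    (":prov_1", ":simon", "rdf:type", "prov:Human"),
    (":prov_1", ":simon", "prov:hasAnnotation", "_:note1"),
    (":prov_1", "_:note1", "rdf:type", "prov:Note"),
    (":prov_1", "_:note1", "ex3:reputation", "excellent"),
    # :prov_2 states the reputation directly about :simon.
    (":prov_2", ":simon", "ex3:reputation", "excellent"),
    # Attribution of each graph.
    (None, ":prov_1", "prov:wasAttributedTo", ":first_asserter"),
    (None, ":prov_2", "prov:wasAttributedTo", ":trust_evaluator_agent"),
]


def direct_values(quads, subject, predicate):
    """Return (value, graph) pairs stated directly about `subject`."""
    return [(o, g) for g, s, p, o in quads if s == subject and p == predicate]


def attributed_to(quads, graph):
    """Return the agents that `graph` is attributed to."""
    return [o for g, s, p, o in quads
            if s == graph and p == "prov:wasAttributedTo"]


def reputation_with_asserter(quads, subject):
    """Pair each directly stated reputation with the asserter of its graph."""
    return [(value, agent)
            for value, graph in direct_values(quads, subject, "ex3:reputation")
            for agent in attributed_to(quads, graph)]
```

Only the :prov_2 statement is found by the direct lookup; reaching the value in :prov_1 would need an extra hop through prov:hasAnnotation.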

I'm starting to agree that wasGeneratedBy(id,e,a,t,attrs) should become Generation(id,e,a,t,attrs)

This starts to distract, I think: "While each of the components activity, time, and attributes is optional, at least one of them must be present."
Permitting degenerate cases should not be a priority. If not much (or nothing) is said with an assertion, let it be.
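
For reference, the quoted constraint amounts to a check like the following minimal Python sketch. The function name and signature are hypothetical (nothing like it appears in the draft), and treating an empty attribute list as "not present" is an assumption.

```python
from typing import Optional


def generation_is_well_formed(activity: Optional[str] = None,
                              time: Optional[str] = None,
                              attrs: Optional[dict] = None) -> bool:
    """Return True if at least one of activity, time, or attributes is present.

    Assumption: an empty attribute dict counts as absent.
    """
    return activity is not None or time is not None or bool(attrs)
```

Dropping the sentence from the draft would simply mean that no such check is required, and a generation that says nothing beyond the entity would be allowed.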

remove "order" from "wasGeneratedBy(e1,a1, 2001-10-26T21:32:52, [ex:port="p1", ex:order=1])" because it is distracting and encourages not using PROV for things that PROV should do.
I think Paolo agreed to this before.

Both agents are responsible in Responsibility. I suggest renaming "responsible" to "superior" in "responsible: an identifier for the agent, on behalf of which the subordinate agent acted;" in section 4.2.3.1.

There are two wasQuotedFrom relations in the UML diagram in section 5.

Received on Thursday, 23 February 2012 13:24:48 UTC