
Antoine Isaac

From Provenance WG Wiki
In fact I did not look very thoroughly into the mappings of section 3.1. Time permitting, I may send another email later. The mappings look appropriate at first sight, though. Most of my comments (listed below) are editorial, though some may touch on conceptual issues in the arguments presented in the text.
The only real problems now could be with roles, e.g., prov:Creator, prov:Publisher. In section 3.2 they are introduced as classes but in section 3.3 they are used as instances of classes. And in section 3.4 it is mixed: the Turtle example has an instance but Fig. 3 has prov:Creator as a class with an instance which is not mentioned in the Turtle example. What is your choice? How does PROV handle roles?

That was a typo on our side, sorry. Roles are introduced as Classes for which instances should be created. Therefore sections 3.2 and 3.4 are ok, and we will fix them in section 3.3 (as instances of classes).

A last general note/disclaimer: I have to say that I will not apply the mappings soon myself, especially not the complex ones (with a 1:15 multiplier ratio between the input triples and the output triples in section 3.3, the clean-up in section 3.4 is a much welcome suggestion!). To some extent I am reading the document now not because I plan to push implementation of all its recipes in Europeana or elsewhere, but because it is a good introduction to PROV for a more traditional metadata community. And this is no small achievement. Well done!
Best
Antoine

Thanks! It's a pity you won't use them, as they could enable interoperability with other PROV implementations.

- Abstract: please spell out the URI of the "here" hyperlinks. Or create a specific paragraph in the intro that does it, and point to this paragraph from the abstract.

done

- Status of document: remove "(to be published as X)" or "(Proposed recommendation)" from the listing of PROV documents. In fact I'd suggest just to make a reference to PROV-OVERVIEW and do a much welcome shortening of the section.

This is something common to all the family of PROV documents and we should not change it :(

- ToC - Structure of the document: the document misses an "Appendices" section to wrap A and B together apart from the other (numbered) sections.
- ToC - Structure of the document: I see no reason why there is a B1. The notion of "informative references" is maybe not very useful in a note such as yours!

These are general to all the W3C PROV specifications. I don't think they can be changed at this stage.

- Use of "Dublin Core", "DC", "dc": a couple of occurrences of "Dublin Core" appear after you've started using the abbreviation in a systematic way. Homogenization would be good! Also, there's (at least) one "dc" in Courier font in 2.2.

ok, fixed

========= Section 2.1

- The word "affected" in the first paragraph (e.g., it hints that a resource can be "affected in the past") does not mean much to me, as a non-native speaker.

What would help to understand it better? (I understand it, as a non-native speaker.) Maybe "influenced"?

- the paragraph on "Descriptive terms" mentions 30 terms for that category. Table 2 has 29.

Thanks, fixed.

- Perhaps a similar issue as above, for "derivation". The elements for rights, which are often related to access and consumption, seem to have a broader scope than what I understand to be "derivation". It's as if you are trying to shoehorn rights into this category. I'm not convinced, and I can't see much benefit in trying this anyway. There could just be an extra category. As a matter of fact I would find this in line with the fact that all rights-related properties have naturally found their place in Table 6 of the rejected properties. For some it is so obvious that you have (rightly) not written a reason for rejecting them!

I have changed the name of the category to "Derivation and licensing Terms". We want them to stick together because they answer the question on how the resource is influenced by changes in ownership (How?).

- Table 2. My printout did not print the expected "What" in the first line (it could be a bug on my side).

Fixed.


========= Section 2.2

- example 1: I'd recommend using more meaningful URIs for the document versions, e.g. ex:prov-dc-20130312 and ex:prov-dc-20121211.

ok, done

- "relates to the different states that the document had". My gut reading of this sentence was that it was about versions only, which is too restrictive (there's more at stake than logical versions of a doc) and unconvincing (if the aim was to capture versions only, dct:replaces would be quite enough). Perhaps replace by "relates to the different stages the document underwent" or something more grammatically correct than this.

Changed as suggested

- "involves two different states of the document: the document before it was issued and the issued document". To many readers in the DC community, there will be just one document before and after issuing; it does not really change. Perhaps removing "document" from the second part ("states of the document: /before/ and /after/ publication") will help avoid discouraging them. This can also make the sentence more coherent (the object is "states" in the first part and "the document" in the second).
It looks like nitpicking, but I fear there's a real risk of losing a part of your core audience here.

changed

- Figure 1: if the graph convention used is the one used throughout all PROV documents, it may be useful to mention. It looks very ad-hoc, otherwise.

Yes, it is the one used in all PROV documents. But I don't think there is a need to state that, right? You will see it if you browse any of those.

- Approach 2: I don't buy the argument that the pattern "implies that ex:doc1 was generated by _:activity and then used by _:activity afterwards". Is there some specific semantics to activities' properties, which I'm missing?

As I understand it, Approach 1 does not imply that "_:resulting_entity was generated by _:activity and then _:used_entity was used by _:activity afterwards", which is the exact transposition of your interpretation in Approach 2.

It is because of the way PROV is defined. An entity cannot be used before its creation. Therefore, if an activity both uses and generates the same entity, it must have generated it and then used it. The text says so in brackets (PROV entities must exist before being used). Do you think it needs additional clarification?

- Fig. 2: whether I'm right or wrong on the above issue, you can remove "(as it implies[...]activity)" from the caption. It doesn't really belong there.

ok!

- I thought (from the previous version of the PROV-DC document) that the most important argument against Approach 2 was that PROV discouraged the same resource from being used as both the input and the output of an activity at the same time. Has it changed? Personally I didn't like that PROV rule, but in the context of a DC-PROV mapping this was a very powerful argument...

We realized (due to some reviews) that Approach 2 is not invalid from the PROV perspective (an activity might generate and use the same resource). However, instead of representing what we want (an entity is used and then generated), it is interpreted as the entity being generated and then used, which is misleading. Why does this happen? Because entities must be generated before being used in PROV.
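
To make the ordering argument above more concrete, here is a minimal Python sketch. It is not taken from the PROV-DC document or from any PROV library: the `Statement` tuple and the `implied_orderings` helper are hypothetical names introduced only for illustration, and the sketch assumes that the only rule applied is the one discussed above (an entity must be generated before it can be used).

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One provenance statement (hypothetical structure, for illustration only)."""
    relation: str  # either "used" or "wasGeneratedBy"
    activity: str
    entity: str


def implied_orderings(statements):
    """Return (entity, generating_activity, using_activity) for every entity
    that is both generated and used; for each of these, the rule
    "generation precedes usage" forces a generated-then-used reading."""
    generated = [(s.entity, s.activity) for s in statements
                 if s.relation == "wasGeneratedBy"]
    used = [(s.entity, s.activity) for s in statements
            if s.relation == "used"]
    return sorted(
        (entity, generator, user)
        for entity, generator in generated
        for used_entity, user in used
        if entity == used_entity
    )


# Approach 1: the activity uses one entity and generates a different one.
approach_1 = [
    Statement("used", "_:activity", "_:used_entity"),
    Statement("wasGeneratedBy", "_:activity", "_:resulting_entity"),
]

# Approach 2: the same entity is both used and generated by the activity.
approach_2 = [
    Statement("used", "_:activity", "ex:doc1"),
    Statement("wasGeneratedBy", "_:activity", "ex:doc1"),
]

print(implied_orderings(approach_1))  # []
print(implied_orderings(approach_2))  # [('ex:doc1', '_:activity', '_:activity')]
```

Under these assumptions, Approach 1 yields no forced ordering between the two entities, while Approach 2 forces ex:doc1 to be read as generated by _:activity and only then used by it, which is the opposite of the intended "used, then generated" reading.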

========== Section 3.1

- first paragraph: move "(i.e. they will be able to understand DC statements)" just after "to interoperate with these DC statements". The bracketed sentence doesn't really explain "reasoning" per se; it rather tries to explain interoperability.

And is "by applying means of OWL 2" really a grammatical construction?

Fixed.

- Table 3: finding dct:Agent here comes a bit as a surprise, as the class has not been introduced before (e.g. in Table 2). Perhaps it could be presented aside.

Now all the classes have been included and separated in a different table.

- Table 3 is really big and has a lot of white space. Maybe removing the namespace prefixes (which do not bear much info anyway, given what the columns include) would allow trimming the first three columns.

I don't agree. The table is big but all the information is necessary (I think it is valuable to know which vocabulary is being mapped to what without having to scroll to the headers).

- Please keep in Table 3 the order defined in Table 3! The current mismatch makes comparison difficult, and for no real reason it seems.

Done

- "This is valid since from the PROV point of view" and the rest of the paragraph should be tightened. In the RDF graph that results from example 1, there is a prov:Entity with two prov:generatedAtTime statements. Is it valid or not? The paragraph currently hints at both (it is valid, but does not comply with PROV constraints), which is confusing.
- Table 5 has a confusing introduction: what is its rationale as a separate table? The fact that it's mapping to inverse relationships, or the fact that it's mapping to outside the core of PROV?

That the relationships are outside the core of PROV. I'll clarify it in the text


========== Section 3.2

My personal taste would be to remove the somewhat redundant prov:Activity and prov:Role from the rdfs:subClassOf statements of prov:Create and prov:Creator.

Why? The refinements have been added to clarify how the PROV terms have been extended to qualify the complex mappings. This has been requested by other reviewers, and I think it should stay.

You could replace "refinements of the properties have been omitted" by "refinements of the properties are not needed". The latter is stronger, and still true!

Due to some other reviews we have removed this sentence. It was confusing.


========== Section 3.3
I don't understand why replacement is presented as the result of a "search and replace". There's no "search" implied as default in a dct:replaces link, is there?

Yes, we removed this in a previous revision of the document.

========== Section 3.4

The notion of "complement" is unclear. Rather than "certain properties complement each other" couldn't we have "certain properties indicate the same activity"?

Changed to: certain properties refer to the same activity.

I am also not sure that dct:modified and dct:contributor are so connected. A contributor can be involved in the creation of the document, I believe.

True, it might depend on your use case. They normally complement each other, so I wouldn't want to drop it from the list (after all this is a suggestion section). Thoughts?


========== Section 3.5

Table 6: it is confusing to find here both the elements that Table 1 lists as relevant for provenance (who, when, how) and the descriptive metadata elements. The table would benefit from removing the descriptive ones, especially those for which it is absolutely no surprise that they shouldn't be mapped. Or at least the categories should be separated into different tables...

Splitting the tables (in all 4 categories, in fact) would also allow getting rid of the second column, which consumes a lot of space for pretty much nothing.

I don't agree here. Table 1 categorizes the DC terms according to their relation to provenance, true, but Table 6 (now 9) is for those terms that don't map to PROV terms as well. I think it is useful to have it in a single table for readability. Plus, a rationale for each term is given. However, I agree with you that the order in the first tables and here should be consistent. I'll order everything alphabetically.

It would also help comparisons. Table 2 has 29 "descriptive metadata elements", Table 6 has 28. With the order being different, and the size so big, I won't make the effort to find out which element has been left out.

Thanks for pointing this out. All the links in table 1 link to the specific term in the mapping, so in order to know whether one has been mapped or not you only have to click on it. I will preserve the order in order to make things easy though.

dct:isRequiredBy line has a typo ("reosource")

fixed.