Tom Baker

From Provenance WG Wiki

Response to the review from Tom Baker.

My comments are divided into two postings. This posting addresses:
1. Status of the Turtle representations and the subclasses they declare
2. Various points of substance
3. Minor editorial points
The next posting will continue with:
4. Issues in the Introduction re: Dublin Core and "DC Terms"
I reviewed the Mapping primarily from the standpoint of Dublin Core. Though I
am currently the CIO of DCMI, my review has not gone through DCMI process so
should be considered my opinion. I have also reviewed aspects of the Mapping
from the standpoint of one who has been involved in various contexts with W3C
process (e.g., Point 1 below).
What I am not qualified to comment on in much detail are aspects related to the
PROV model, which I have not studied in detail. There were one or two places,
flagged below, where I thought that deeper knowledge of the model was really
necessary for understanding particular points. However, it speaks well for the
authors that I felt I could follow it without extensive knowledge of PROV. I
like it when the authors suggest that the Mapping could facilitate PROV
adoption by allowing users to use Dublin Core statements as a starting point
for generating more complex PROV representations -- a very good idea and one
that could inform a very instructive tutorial or primer.
Tom
[1] http://www.w3.org/TR/2013/WD-prov-dc-20130312/
======================================================================
1. Status of the Turtle representations and the subclasses they declare 
The Turtle representations of the mappings are buried in anchors to
the hyperlink "here" in the Abstract but are not further mentioned.
Generally speaking, the use of "here" as a hyperlink is not ideal in
specifications such as this, which many people may read in the form
of a printout, or offline, perhaps in Instapaper on an iPad. 
I suggest:
-- Create entries for the Turtle representations in the References
section [3], then cite them in the specification. 

Done

-- Discuss the Turtle representations somewhere in the specification
besides just the Abstract, and add some explanation clarifying their
status. Do they fall under a W3C namespace policy? Are they linked to
WD-prov-dc such that any future revisions in the Turtle representations
could only be undertaken in the context of a revision of WD-prov-dc?
Are they provided merely as a convenience for readers, or do the editors
intend them to be used (and how)? I do not think a long text is
required, but it would be good to clarify for the reader what these are
and how they fit into W3C publication and maintenance processes, and to
make their URIs visible in References.

TO DO

-- In Section 3.2, I am puzzled about the status of "subclasses" such as
prov:Publish. I see that these subclass declarations in Turtle are
mirrored in [2], but I see no reference to prov:Publish in PROV-O.
It is unclear, in other words, whether:
  "To properly reflect the meaning of the Dublin Core terms, more specific
  subclasses are needed:"
means
  "more specific subclasses would be needed (but haven't been created)"
or
  "more specific subclasses have been created".
If the latter, then the text would need to point to PROV-O. If the
former, then it would be doubly important to clarify the status of the
Turtle representations. Does [2] intend to encourage people to use
prov:Publish in their data?
[1] http://www.w3.org/ns/prov-dc-directmappings.ttl
[2] http://www.w3.org/ns/prov-dc-refinements.ttl
[3] http://www.w3.org/TR/2013/WD-prov-dc-20130312/#informative-references

I have added some text to clarify this. The refinements are introduced to qualify DC statements with PROV, so we only need them if we want to produce PROV data or derive DC data from PROV data. The refinements are included as an extension to PROV-O; they are not part of the core ontology.

----------------------------------------------------------------------
2. Various points of substance
-- 1.1 Namespaces (and the term "namespace")
The term "namespace" is used a bit loosely here. It is worth noting that
the current draft RDF 1.1 Concepts and Abstract Syntax spec, while still
just a Working Draft, concludes that [1]:
The term "namespace" on its own does not have a well-defined meaning in
the context of RDF, but is sometimes informally used to mean "namespace
IRI" or "RDF vocabulary".
I suggest changing the name of the section and tweaking a few things:
1.1 Namespace URIs

Done

The namespace URIs used in this document can be seen in Table 2.
Table 2: Namespace URIs used in the document
prefix  Namespace IRI                             Used for
owl     <http://www.w3.org/2002/07/owl#>          The OWL vocabulary [OWL2-OVERVIEW].
rdfs    <http://www.w3.org/2000/01/rdf-schema#>   The RDFS vocabulary [RDFS].
prov    <http://www.w3.org/ns/prov#>              The PROV vocabulary [PROV-DM].
dct     <http://purl.org/dc/terms/>               The DCMI /terms/ vocabulary [DCTERMS].
ex      <http://example.org>                      Application-dependent URIs. Used in examples.
[1] https://dvcs.w3.org/hg/rdf/raw-file/default/rdf-concepts/index.html#vocabularies

Fixed!
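
As an illustration of how the namespace URIs in Table 2 are used, here is a minimal Python sketch that expands a prefixed name into a full IRI. Only the prov and dct rows of the table are copied; the function name expand and the dictionary name NAMESPACE_IRIS are made up for this example and do not come from the specification.

```python
# Prefix-to-namespace-IRI map copied from the prov and dct rows of Table 2.
NAMESPACE_IRIS = {
    "prov": "http://www.w3.org/ns/prov#",
    "dct": "http://purl.org/dc/terms/",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'dct:creator' into a full IRI.

    Hypothetical helper for illustration only.
    """
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in NAMESPACE_IRIS:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    # A prefixed name is the namespace IRI with the local name appended.
    return NAMESPACE_IRIS[prefix] + local_name


if __name__ == "__main__":
    print(expand("dct:creator"))
    print(expand("prov:atTime"))
```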

-- 3.3.2
The sentence:
It is important to note that since the range for dates in Dublin Core is a
rdfs:Literal and xsd:dateTime for the prov:atTime property, the mapping is
only valid for those literals that are xsd:dateTime.
is not very precise. Perhaps you mean something like:
It is important to note that since the range for DC date properties is
rdfs:Literal, and the range of the prov:atTime property is the class
of literals with the datatype xsd:dateTime, the mapping is only valid
for those literals that have (or could be assigned?) the datatype
xsd:dateTime.
...assuming that "range... is the class of literals with the datatype
xsd:dateTime" is a correct interpretation (I haven't checked the other
specs).

changed
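
The condition in the reworded sentence (only literals that are valid as xsd:dateTime can be mapped to prov:atTime) could be checked roughly as in the Python sketch below. The pattern is a simplified approximation of the xsd:dateTime lexical form, and the names is_xsd_datetime and mappable_dates are assumptions made for this example, not part of the mapping.

```python
import re

# Simplified lexical pattern for xsd:dateTime: date, 'T', time, optional
# fractional seconds, optional timezone. It does not check value ranges
# (for example, it would accept a month of 13).
XSD_DATETIME = re.compile(
    r"-?\d{4,}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2})?"
)


def is_xsd_datetime(lexical: str) -> bool:
    """Return True if the string looks like an xsd:dateTime lexical form."""
    return XSD_DATETIME.fullmatch(lexical) is not None


def mappable_dates(literals):
    """Keep only the date literals that could be used with prov:atTime."""
    return [value for value in literals if is_xsd_datetime(value)]


if __name__ == "__main__":
    dc_dates = ["2013-03-12T09:30:00Z", "2013-03-12", "March 2013"]
    print(mappable_dates(dc_dates))
```

Date-only values and free-text dates, both common in Dublin Core data, are filtered out by this check.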

-- 3.3.3
The sentence:
In Dublin Core, most of the properties relating entities to other entities
don't describe the involvement of a specific activity (e.g., dct:format,
dct:source or isVersionOf).
is awkwardly worded. Do you perhaps mean:
In Dublin Core, most of the properties relating entities to other entities
do not imply activities related to provenance (e.g., dct:format,
dct:source or isVersionOf).

That sounds way better. Thanks.

-- 3.3.3.1
I found the following sentence hard to understand:
The replacement is the result of a "search and replace" Activity, which
used a specialization of the replaced entity (_:old_entity) and produced a
specialization of the replacement (_:new_entity).
...but I do not know the PROV model well enough to propose a clearer
text.

In PROV, activities use entities to generate new entities. I have changed the wording so as to use the same terms: The replacement is the result of a "replace" Activity, which used a specialization of the replaced entity (_:old_entity) and generated a specialization of the replacement (_:new_entity).

Specializations can be seen as contextualized entities. That is, there may be no relation between the entities themselves when one replaces the other, but from the point of view of the activity replacing them, one is derived from the other (they switch positions). An example makes this clearer: if I have a catalog and I replace book 1 (item 2 in the catalog) with book 3 (item 27), you can say that item 27 was derived from item 2 (in the context of the catalog), even though book 3 is not derived from book 1.
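
The catalog example can be written down as a small set of triples. The Python sketch below is only an illustration of the pattern described in this reply: the blank node _:activity, the identifiers ex:book1 and ex:book3, and the function name replacement_pattern are assumptions, and the exact triples produced by the mapping in the specification may differ.

```python
def replacement_pattern(old_resource: str, new_resource: str):
    """Build illustrative triples for one resource replacing another.

    The derivation is stated between the two specializations (the
    catalog items), not between the resources themselves (the books).
    """
    old_entity = "_:old_entity"
    new_entity = "_:new_entity"
    activity = "_:activity"  # hypothetical blank node for the "replace" Activity
    return [
        (activity, "rdf:type", "prov:Activity"),
        (old_entity, "prov:specializationOf", old_resource),
        (new_entity, "prov:specializationOf", new_resource),
        (activity, "prov:used", old_entity),
        (new_entity, "prov:wasGeneratedBy", activity),
        (new_entity, "prov:wasDerivedFrom", old_entity),
    ]


if __name__ == "__main__":
    for triple in replacement_pattern("ex:book1", "ex:book3"):
        print(" ".join(triple), ".")
```

Note that no prov:wasDerivedFrom statement links ex:book3 to ex:book1 directly, which matches the point that book 3 is not derived from book 1.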

-- 3.4 Cleanup
I wonder if "cleanup" is the best heading for this section. After using
SPARQL, as described in the previous sections, one ends up with a PROV
graph that has blank nodes for entities, and the process of assigning
identifiers to those blank nodes could be thought of as "cleanup". So far,
so good.
What the "suggestions" then discuss, however, are not methods for cleaning
up an existing generated graph, but different templates for generating
_new_ and _different_ PROV graphs from the same DC statements. As I read
it, this section has more to do with different possible ways to generate
graphs, starting with somewhat different assumptions (related to different
possible ways to model things using PROV), and resulting in different
patterns. If my reading is correct, then I would suggest saying this more
clearly in the introduction to the section and giving the section a more
specific name, such as "Generating PROV graphs using different templates". 

I don't agree with you here. The suggestions are methods to reduce the number of blank nodes, thus cleaning up the previous graphs. The first method does so by applying a new template, that is true. I'll try to add some sentences at the beginning of the section to clarify.
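
To make the idea of reducing blank nodes concrete, here is a minimal Python sketch of the simplest kind of clean-up mentioned in the comment above: assigning identifiers to blank nodes. It is not one of the methods from the section itself. The base IRI http://example.org/id/ and the function name name_blank_nodes are assumptions made for this example.

```python
def name_blank_nodes(triples, base="http://example.org/id/"):
    """Replace each blank node label (_:x) with a minted IRI.

    The same blank node always receives the same IRI. The base IRI is a
    placeholder chosen for this example.
    """
    minted = {}

    def rename(term):
        if term.startswith("_:"):
            if term not in minted:
                minted[term] = f"<{base}{len(minted) + 1}>"
            return minted[term]
        return term

    # Predicates are never blank nodes, so only subjects and objects change.
    return [(rename(s), p, rename(o)) for s, p, o in triples]


if __name__ == "__main__":
    graph = [
        ("_:a", "prov:used", "_:b"),
        ("_:b", "prov:specializationOf", "ex:doc"),
    ]
    for triple in name_blank_nodes(graph):
        print(triple)
```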

-- Table 6 - dct:references
For most properties, the commentary says they have been "excluded"
or "left out" of the mapping. For dct:references, however, the text says
that dct:references "has been dropped from the mapping". This wording
makes it sound like there was an earlier, published mapping from which
this was dropped -- more like a change note for a specification than part
of the specification itself. I suggest using "excluded" or "left out".

This has been changed in the latest version of the document. Due to additional feedback and according to the PROV definitions, dct:references has been mapped as a subProperty of prov:wasDerivedFrom. The rationale is in the text, so this proposed change does not apply.
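
The practical effect of declaring dct:references a subProperty of prov:wasDerivedFrom can be sketched in Python as a single sub-property inference step. The triples are plain tuples, and the identifiers ex:paper and ex:dataset as well as the function name infer_superproperties are made up for this example.

```python
# Sub-property declaration described in the reply above.
SUBPROPERTY_OF = {"dct:references": "prov:wasDerivedFrom"}


def infer_superproperties(triples):
    """Add the triples entailed by the sub-property declarations.

    For every (s, p, o) where p is a declared sub-property, the triple
    (s, super-property, o) is added to the result.
    """
    inferred = set(triples)
    for subject, predicate, obj in triples:
        if predicate in SUBPROPERTY_OF:
            inferred.add((subject, SUBPROPERTY_OF[predicate], obj))
    return inferred


if __name__ == "__main__":
    data = [("ex:paper", "dct:references", "ex:dataset")]
    for triple in sorted(infer_superproperties(data)):
        print(triple)
```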

-- Reference in "Reference" section
Currently reads:
[DCTERMS]
Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/
Should read:
[DCTERMS]
DCMI Metadata Terms. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/

Fixed!

-- In the sentence:
For example, when mapping dates only unqualified properties can be extracted,
I was unsure what you mean by "unqualified".

I have removed that sentence and replaced it with this one: "For example, when mapping dates there is no information to guess whether an activity with an associated date is a creation, a modification or a publication"

======================================================================
3. Minor editorial points
-- s/don't/do not/ (3.3.3), also search/replace "couldn't", "doesn't", and other contractions

Done

-- "cleanup" and "clean-up" are used inconsistently

Done

-- s/refering/referring/

Done

-- 2.1 Provenance in Dublin Core: Section "Descriptive Terms": replace ", etc."
with a full stop because the sentence already starts with "Some examples".

Done

-- 3.3. Change "We divide the queries in different categories" => "into different categories".

changed.

Here is the second part of my comments on [1]. These comments
address issues in the Introduction re: Dublin Core and "DC Terms".
Tom
[1] http://www.w3.org/TR/2013/WD-prov-dc-20130312/
======================================================================
4. Issues in the Introduction re: Dublin Core and "DC Terms"
-- Characterizing the difference between Dublin Core and PROV-O.
A number of interesting points are made in the text about the difference
between DC and PROV, but they are scattered around the text. For example:
-- "A substantial number of terms in the Dublin Core vocabulary provide
information about the provenance of the resource. Translating these terms
to PROV makes the contained provenance information explicit within a
provenance chain." [from the Abstract]
-- "If an action is involved... then it is relevant for its provenance."
-- "While Dublin Core includes provenance information, its focus lies on the
broader description of resources. PROV models a provenance chain, but it
provides almost no information about the involved resources themselves."
[from the Conclusion]
It would be nice to pull these insights together, for the benefit of the
reader, in the Introduction. I suggest a way to do this below.
-- Use of "DC Terms"
The Introduction currently says:
The Dublin Core Metadata Initiative (DCMI) [DCMI] provides a core metadata
vocabulary (commonly referred to as Dublin Core) for simple and generic
resource descriptions. The original element set (DC elements) was created
in 1995 and contains 15 broadly-defined elements still in use. The core
elements have no range specification, and arbitrary values can be used as
objects. The core elements have been expanded beyond the original fifteen.
Existing elements have been refined and new elements have been added. This
expanded vocabulary is referred to as "DCMI Terms" (DC terms) and currently
consists of 55 properties [DCTERMS].
The use of DC terms is preferred and the DC elements have been
depecreated. Both sets have different namespaces. The original element
set is typically referred with the dc prefix, while dct (or dcterms) is
used as prefix for the DC Terms.
This document defines a mapping between the DC Terms and the PROV
Ontology (PROV-O) [PROV-O], which defines an OWL2 Ontology encoding the
PROV Data Model [PROV-DM].
The situation is admittedly confusing -- a product of historical choices
made more than a decade ago -- but basically: 
-- "DCMI Metadata Terms" is the name of a specification [1], periodically updated,
that includes terms identified using several namespace URIs, among which
the ones of interest to us are: 
-- http://purl.org/dc/elements/1.1/: the original namespace URI, with the
fifteen properties of "the Dublin Core," which were coined
before the RDFS notions of domain and range had even been
standardized, and which thus have no domains or ranges. While the
use of properties in the /terms/ vocabulary (below) is "gently
promoted" by DCMI in the belief that terms with domains and ranges
are more precise, in a helpful way, than terms which can take either
an entity or a literal as object, there are people in the Dublin
Core community who believe that "rangeless" (aka "free-range")
properties are underspecified in a _helpful_ way. These properties
are still very widely used, and DCMI has carefully avoided saying
they are "deprecated". Indeed, they are no longer even referred to as
"legacy" properties.
-- http://purl.org/dc/terms/: the namespace URI coined after we realized
it was a bad idea to put version numbering into a namespace URI. In
order to assign domains and ranges to the fifteen properties of "the Dublin
Core," we re-coined equivalents of the fifteen properties in the /terms/
vocabulary and assigned domains and ranges to those.
Without going into a long explanation in the prov-dc document, I suggest
saying, simply:
The Dublin Core Metadata Initiative (DCMI) [DCMI] provides a core
metadata vocabulary (commonly referred to as Dublin Core) for simple
and generic resource descriptions.[DCTERMS] The original Dublin Core Metadata
Element Set was created in 1995 and contains fifteen broadly defined
properties that are still in use. Properties identified using the
original namespace URI http://purl.org/dc/elements/1.1/ have no
specified ranges, meaning that arbitrary values can be used as
objects. In order to assign ranges, DCMI replicated the fifteen
properties using the namespace URI http://purl.org/dc/terms/. Additional
properties and classes beyond the original fifteen were coined using
this namespace URI. In this document, properties and classes using the
/terms/ namespace URI are referred to, simply, as DC Terms.
This document defines a mapping between the DC Terms and the PROV
Ontology (PROV-O) [PROV-O], which defines an OWL2 Ontology encoding the
PROV Data Model [PROV-DM]. [@@@ - see below] This mapping has been
designed for several purposes:
1. Bridge the gap between the DC and PROV communities, in order to
provide valuable insights into the different characteristics of both
data models.
2. Help developers to derive PROV data from the large amount of Dublin
Core data available on the web, improving interoperability between DC
and PROV applications.
3. Facilitate PROV adoption. Simple Dublin Core statements can be used
as a starting point for more complex PROV data generation.
-- To follow up on my point above re: the differences between DC and PROV,
I suggest adding a paragraph at the point marked "[@@@]" above that says,
roughly:
The PROV vocabulary and data model are focused on expressing actions
and resource states in a provenance chain rather than on describing
resources in a general sense. The Dublin Core vocabulary is focused on
describing resources in a general sense, but a substantial number of
terms in the vocabulary provide information related to the provenance
of the resource. Mapping statements using Dublin Core into statements
using PROV makes the contained provenance information explicit.
[1] http://dublincore.org/documents/dcmi-terms/

Given that you have way more expertise in this subject than I have, I have replaced the old text with the proposed one.