Stian Soiland-Reyes

From Provenance WG Wiki
My comments are mainly editorial.
Blocking issues:
21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.

Agreed, and changed to a superproperty of prov:wasRevisionOf.

23) dct:references should be subproperty of prov:wasInfluencedBy

dct:references is already a subproperty of prov:wasDerivedFrom. Thus the influence relationship can be directly inferred.

1) Outdated citations:
> [DCTERMS] Dublin Core Terms Vocabulary. 8 December 2010. URL: http://dublincore.org/documents/dcmi-terms/
Should be:
> Dublin Core Terms Vocabulary. 14 June 2012. URL: http://dublincore.org/documents/2012/06/14/dcmi-terms/

I'm not sure about this change, since Thomas Baker (CIO of DCMI) proposed to use the current one.

> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 27 October 2009. W3C Recommendation. URL: http://www.w3.org/TR/2009/REC-owl2-overview-20091027/
should be:
> [OWL2-OVERVIEW] W3C OWL Working Group. OWL 2 Web Ontology Language: Overview. 11 December 2012. W3C Recommendation. URL: http://www.w3.org/TR/2012/REC-owl2-overview-20121211/

changed.

2) Links to mappings
> The mapping is expressed partly by direct RDFS/OWL mappings between properties and classes, which can be found _here_.
> Therefore, refinements of classes defined in PROV are needed to represent specific Dublin Core activities and roles. This set of PROV  refinements can be accessed _here_.
The use of "here" hyperlinks is not good practice because the link text
carries no meaning, especially when scanning the page for links.
Try:
> The mapping is expressed partly by _direct RDFS/OWL mappings (Turtle format)_ between properties and classes.
> Therefore, _refinements of classes defined in PROV (Turtle format)_ are needed to represent specific Dublin Core activities and roles.

Converted everything to reference and cited the files appropriately

3)
> The use of DC terms is preferred and the DC elements have been depecreated.
--> deprecated

This paragraph was changed following a suggestion from Tom Baker, so this change no longer applies.

4)Table 1 is meant to categorize into What/Who/when/how - but for
"Descriptive metadata" the sub-category is "-" instead of "What".

Fixed.

5)>  but as ownership is considered the important provenance information for many resources
"the" -> "to be"

Fixed

6)> This leaves one very special term: provenance.(..) This term can be considered a link between the resource and any provenance statement about the resource, so it cannot be included in any of the aforementioned categories.
Why is "provenance" not a "what"? How is it any different from, say,
"abstract" or "tableOfContents"?
I suggest just changing "cannot be" to "is not" - and we can get away with it.

Done


7)> Example 1: a simple metadata record: Add "in Turtle format [Turtle]".

Fixed.

8)> ex:doc1 dct:title "A mapping from Dublin Core..." ;
> dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
> dct:created "2012-02-28" ;
> (..)
Could some indentation be used in the example for the continuation lines? ie:
> ex:doc1 dct:title "A mapping from Dublin Core..." ;
>     dct:creator ex:kai, ex:daniel, ex:simon, ex:michael ;
>     dct:created "2012-02-28" ;
> (..)
(check your tabs -> spaces)

Definitely. Added.


9)> are descriptions of the resource ex:doc1
italics on "descriptions"

Done

10)> As a dc metadata
dc -> "DC" and no < code >

Changed

11)
> a different prov:specialization of the document
--> prov:specializationOf

Fixed.

12) > which is a prov:sprecializationOf the resource
--> prov:specializationOf

Well spotted, thanks.


13)> Since we cannot ensure that the published resource has not suffered any further modifications, :_resultingEntity is also a  prov:specializationOf the resource ex:doc1.
I don't get this reasoning. I agree it is a specialization, as it is
the ex:doc1, but only in the published state - but I don't understand
the "cannot ensure" bit - it would be a specialization if there were
modifications or not. Perhaps the idea being that there could be two
publications that both led to ex:doc1 at different points in time?
Change to:
" :_resultingEntity is also a prov:specializationOf the resource
ex:doc1, as it describes the document after a particular publication"

By "we can't ensure" we meant that since entities are mutable, all the modifications, etc. are done to states of the entity (specializations), and not directly to the entity. I have adopted your change in order to clarify the text.


14) (not important)
Figure 1 and following are blurry when zooming in or printing out. Is
it possible to include the image in a higher resolution or as SVG (but
scale it down with CSS)? For example, see Figure 1 in
http://www.w3.org/TR/prov-o/#starting-points-figure

Done

15)Figure 1 and following use a notation like:
prov:Entity
ex:doc1
it is not clear - beyond the capital letter - what is the identifier
and what is the class. Could styling be used, such as italics on the
classname? (UML uses «guillemets» - but perhaps italics would work
better)

Unfortunately I can't put text in italics in the figures without redoing them. I don't think this suggestion is really necessary, since the capital letter reads fine. I have looked at other documents and they don't even introduce the class.


16)The figures use the style _:user_entity but the text uses _:usedEntity.
Suggestion is to unify them as _:usedEntity to match camelCase of
prov-o terms.

Ok Changed in html, figures and files.

17) prov:Entities must exist before being used
< code > style here is misleading -> "PROV entities" without < code >

Fixed.

18)> The mapping is divided in several subsections:
> (..)
> Section 3.4 : Strategies for cleaning up some of the blank nodes produced by the approach presented in Section 3.3.
" :" ->":"

Fixed.

19)Table 3 includes dct:Agent and dct:ProvenanceStatement - but none of
the DCT classes were introduced in Table 1.
Many of the other DCT classes (BibliographicResource,
LicenseDocument, PhysicalResource, etc) are generally mappable as
subclasses of prov:Entity. We should either provide those or say why
we have not provided them (for instance a particular license document
becomes also a prov:Entity as soon as you talk about its provenance
with say prov:wasAttributedTo).
dct:Location should be equivalentClass to prov:Location
prov:Collection subclassOf dcmitype:Collection
(note: dcmitype:Software is NOT a subclassOf prov:SoftwareAgent - as a
script file, C source code etc. are (generally) different from the
active agent of their execution)

Added table with the mappings. Added rationale for those not mapped.


20)I kind of doubt that dct:rightsHolder is about provenance (although
rights could have interesting provenance!), as you could easily be a
rights holder without having any part of creating the resource. For
instance Michael Jackson at some point bought the rights to Beatles
songs, but he later sold those to Sony in 1995 [1]. So does that mean
that a Beatles song from 1967 is attributed to Sony in 1995, because
they are the rights holder? Which activity did Sony participate in?
(Buying the rights?). This is difficult with DCTerms because the
entities are fully mutable.
If this was expanded in section 3.3.1 (prov:RightsAssignment ?) it could be OK.
[1] http://www.snopes.com/music/artists/jackson.asp

The reason was exactly that. More than buying the rights I'd say that the activity is signing the license with the rights, but I guess that buying the rights is also valid. Changes in ownership are what actually motivated the provenance records in art, so we thought it made sense keeping this kind of attribution. I'll expand the complex mapping.


21) dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
BLOCKING.
dct:isVersionOf is NOT an equivalent property with prov:wasRevisionOf.
In DC Terms, isVersionOf is a hierarchical attribute, more on the
lines of prov:specializationOf, and does not mandate any time
directionality (thus is not a subproperty of prov:wasDerivedFrom).
Example of hierarchical use:
https://metacpan.org/source/ASCOPE/Net-Flickr-API-1.7/Changes
<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.7.tar.gz>
       dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
       dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.69.tar.gz>;
<http://aaronland.info/perl/net/delicious/Net-Flickr-API-1.69.tar.gz>
       dcterms:isVersionOf <http://aaronland.info/perl/flickr/api/> ;
       dcterms:replaces <http://aaronland.info/perl/net/flickr/api/Net-Flickr-API-1.68.tar.gz>;
An example of its "inverse", dct:hasVersion, in use can be found in DCT itself:
From http://dublincore.org/2012/06/14/dcterms.ttl
dcterms:hasPart
   dcterms:hasVersion
<http://dublincore.org/usage/terms/history/#hasPart-003> ;
   dcterms:issued "2000-07-11"^^<http://www.w3.org/2001/XMLSchema#date> ;
   dcterms:modified "2008-01-14"^^<http://www.w3.org/2001/XMLSchema#date> ;
   a rdf:Property ;
And in http://dublincore.org/usage/terms/history/#hasPart-003 it says
(in HTML): that
   <http://dublincore.org/usage/terms/history/#hasPart-003>
dcterms:replaces
<http://dublincore.org/usage/terms/history/#hasPart-002> .
So here dcterms:hasPart hasVersion both #hasPart-003 and #hasPart-002
- but #hasPart-003 replaces #hasPart-002. This is the same as our
example of specializationOf in the primer -
http://www.w3.org/TR/prov-primer/#alternate-entities-and-specialization.
It would be strange to enforce prov:wasDerivedFrom for such
hierarchical relationships, the BBC frontpage is not (necessarily)
derived from the BBC frontpage today.
On http://dublincore.org/documents/usageguide/qualifiers.shtml we find:
> isVersionOf
>
> Label: Is Version Of
>
> Term description: The described resource is a version, edition, or adaptation of the referenced resource. Changes in version imply   substantive changes in content rather than differences in format.
>
> Guidelines for creation of content:
>
> Use only in cases where the relationship expressed is at the content level. Relationships need not be close for the relationship to be  relevant. "West Side Story" is a version of "Romeo and Juliet" and that may be important enough in the context of the resource description  to be expressed using isVersionOf. The Broadway Show and the movie of "West Side Story" also relate at a similar level, but the video and DVD of the movie are more usefully expressed at the level of format, the content being essentially the same.
>
> See also isFormatOf.
However, not all dcterms:hasVersion / dcterms:isVersionOf
relationships express hierarchical specialization, and so I don't
recommend using prov:specializationOf as superproperty of
dcterms:isVersionOf.
More current usage and guideline for isVersionOf is provenance-related:
http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsVersionOf
> This property describes the relationship between the described resource and another resource, that is a former version, edition or   adaptation of the described resource (e.g. the described resource is the revision of a book, or another recording of a song, etc.). Another  version implies changes in the content of a resource. For resources with different formats use isFormatOf. For the reciprocal statement use hasVersion.
As a compromise I therefore suggest instead to say that:
prov:wasRevisionOf rdfs:subPropertyOf dct:isVersionOf
And equivalent for Table 5:
 prov:hadRevision rdfs:subPropertyOf dct:hasVersion

Changed.

22) dct:hasFormat is also subproperty of prov:wasDerivedFrom
dct:hasFormat is defined as:
>  A related resource that is substantially the same as the pre-existing described resource, but in another format.
So the subject is pre-existing.
http://wiki.dublincore.org/index.php/User_Guide/Creating_Metadata#IsFormatOf
has more:
> This property describes the relationship between the described resource and another resource, that is a former version of the described   resource with the same intellectual content but presented in another format (e.g. the described resource is the microfilm version of a  printed book, or the pdf version of a doc document). For intellectual changes between resources use isVersonOf. For the reciprocal statement use hasFormat.
So this is implying that the object has somewhat been formed from the subject.
Therefore dcterms:isFormatOf should be a subproperty of
prov:wasDerivedFrom - in addition to being a subproperty of
prov:alternateOf.
Equivalent for Table 5:
dcterms:hasFormat rdfs:subPropertyOf prov:hadDerivation

I think you are right. Added to the mapping

23) dct:references should be subproperty of prov:wasInfluencedBy
dct:references is made a subproperty of prov:wasDerivedFrom, which
sounds very strong to me. I would use prov:wasInfluencedBy.
> Influence ◊ is the capacity of an entity, activity, or agent to have an effect on the character, development, or behavior of another by  means of usage, start, end, generation, invalidation, communication, derivation, attribution, association, or delegation.
(We don't know the details of how the reference was used).
Equivalent for Table 5:
dct:isReferencedBy rdfs:subPropertyOf prov:influenced

As we discussed in last week's telecon, dct:references is a subproperty of wasDerivedFrom. It might seem a bit strong, but after all the resource referencing the pre-existing resource wouldn't have been the same if the pre-existing resource didn't exist. No changes made.
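
The inference discussed in this item can be sketched in a few lines of Python. This is only an illustration, assuming plain RDFS subproperty semantics over (subject, predicate, object) tuples. The SUBPROPERTY_OF table, the function names and the ex: resources are hypothetical; the two axioms are the dct:references mapping from the document under review and the PROV-O definition of prov:wasDerivedFrom as a subproperty of prov:wasInfluencedBy.

```python
# Minimal sketch of RDFS subproperty entailment over (s, p, o) tuples.
# Only the axioms discussed in this comment are listed; the ex: names are made up.
SUBPROPERTY_OF = {
    "dct:references": {"prov:wasDerivedFrom"},        # mapping under review
    "prov:wasDerivedFrom": {"prov:wasInfluencedBy"},  # defined in PROV-O
}


def superproperties(prop):
    """Return every property reachable from prop via rdfs:subPropertyOf."""
    found = set()
    pending = [prop]
    while pending:
        current = pending.pop()
        for parent in SUBPROPERTY_OF.get(current, ()):
            if parent not in found:
                found.add(parent)
                pending.append(parent)
    return found


def infer(triples):
    """Return the input triples plus those entailed by the axioms."""
    inferred = set(triples)
    for s, p, o in triples:
        for parent in superproperties(p):
            inferred.add((s, parent, o))
    return inferred


asserted = {("ex:doc1", "dct:references", "ex:doc0")}
entailed = infer(asserted)
```

With the mapping as it stands, a single dct:references statement yields both the derivation and the influence statement, so a separate mapping to prov:wasInfluencedBy would add nothing new.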

24) justification for dct:source
> dct:source    rdfs:subPropertyOf      prov:wasDerivedFrom     dct:source is defined as a "related resource from which the described   resource is derived", which matches the notion of derivation in PROV-DM ("a transformation of an entity in another").
You need to justify why this is NOT an equivalent property. In
SKOS-terms I would call them a skos:closeMatch rather than a
skos:broadMatch; but in OWL/RDFS we don't have that luxury. I do agree
on the mapping you suggest - to make it consistent with the other
mappings. (with equivalent dct:isFormatOf would effectively become a
subproperty of dct:source, which might be odd in DCT). So the
justification should be something like:
> However, prov:wasDerivedFrom also covers broader derivations such as "an update of an entity resulting in a new one" which is not covered  by dct:source.

Added

25) PROV refinements do not include a mapping for dct:rightsHolder
See #? above if this should be in or not.

Now it is included


26)> Additional refinements of the PROV properties have been ommitted, since the direct mappings presented in Section 3.1 already define the relationship between both vocabularies.
What does this mean? Rephrase.

I have removed it. It doesn't add anything

27)> The mapping corresponds to the graph in Figure 1 (with small changes for creator and rightsHolder).
I don't understand this. Neither the mapping below nor Figure 1
describes rightsHolder. Figure 1 shows dct:publisher. Rephrase.

Changed to: "The mapping for each term encodes a similar graph to the one presented in Figure 1..."

28)> A creator is the agent in charge of the "Create" activity that generated a specialization of the entity ?document. The agent is assigned the role "creator".
Some use of < code > here would improve readability.
Note: I have not checked the syntax of the SPARQL CONSTRUCTs beyond
reading them.

Done.

29)> In case of publication, a second specialization representing the entity before the publication is necessary:
Why is this necessary? If I write a blog post using Wordpress.com, and
I immediately click "Publish", then there is no "unpublished" entity.
Your argument would otherwise also potentially apply for contribution
- if I contributed to the entity, it must have been created before! In
both cases we would make unfounded assumptions about the contribution
and publication activities.
Remove the need for _:used_entity - you might instead leave a note
that "If it is known that the ?document existed before publication,
for instance as a draft, you may also add:
       _:used_entity a prov:Entity;
                       prov:specializationOf ?document.
       _:activity      prov:used _:used_entity .
       _:resulting_entity prov:wasDerivedFrom _:used_entity .
This also applies to dct:issued.

I don't agree here. If you "publish" the entity you submit the text via a form, etc. to the WordPress platform to publish. That text would be the "usedEntity". The same thing applies for issued. You can contribute to create an entity that has not existed, but in order to make some content public (publish), you need some pre-existing content. Otherwise it wouldn't be a "publish" activity, but a "creation" activity...
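
For readers following this exchange, the pattern under discussion can be sketched as a small Python function that emits the triples for a published document. It is an illustration rather than the mapping itself: the function name, the fixed blank node labels, the draft_known flag and the prov:wasGeneratedBy link are assumptions, while the remaining triples are the ones quoted in comments 13 and 29.

```python
def publication_triples(document, draft_known=True):
    """Return (s, p, o) tuples describing the publication of document.

    Hypothetical helper: blank node labels are fixed here for readability.
    """
    triples = [
        ("_:activity", "rdf:type", "prov:Activity"),
        ("_:resultingEntity", "rdf:type", "prov:Entity"),
        ("_:resultingEntity", "prov:specializationOf", document),
        # Assumption: the published state is generated by the activity.
        ("_:resultingEntity", "prov:wasGeneratedBy", "_:activity"),
    ]
    if draft_known:
        # State of the document before publication, used by the activity.
        triples += [
            ("_:usedEntity", "rdf:type", "prov:Entity"),
            ("_:usedEntity", "prov:specializationOf", document),
            ("_:activity", "prov:used", "_:usedEntity"),
            ("_:resultingEntity", "prov:wasDerivedFrom", "_:usedEntity"),
        ]
    return triples


with_draft = publication_triples("ex:doc1")
without_draft = publication_triples("ex:doc1", draft_known=False)
```

Setting draft_known to False corresponds to the reviewer's proposal to make the used entity optional. The response above argues that the used entity is always present for a publication, which is the default case here.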

30) dct:dateCopyrighted should NOT have a used_entity
Copyright is usually something you have immediately, or are you
arguing there is always an uncopyrightable used-entity first? (Say an
empty document)?
(Note that I'm fine with the used-entity for the remaining cases)

I think that this is similar to my previous argument. You create a resource and then you copyright it. They are different activities. The input of the copyrighting activity could be the text you want to copyright.


31) dct:isReplacedBy/dct:replaces should be subproperty of prov:alternateOf
(and listed in Tables earlier)

I don't agree. If I have a catalog of books/products, etc. and I replace item 4 in the catalog (a travel guide of Madrid) with item 45 (a travel guide of Paris), then they are not alternates of each other. However the specialized entities in the catalog MadridGuideAsItem4 and ParisGuideAsItem4 would be alternates of each other (and one is derived from the other). I'll add it in the complex mapping.


32)> However, the derivation relationship cannot always be applied between the original entities, because they could have existed before the  replacement took place (for example, if a book replaces another in a catalog we cannot say that it was derived from it).

I agree - but then why does the query include:

_:new_entity prov:wasDerivedFrom _:old_entity .

Because the derivation exists between the specialized entities (the contextualized entities in the catalog).


33) reosource -> resource
> Property used to describe that the current resource is required for supporting the function of another resource. This is not related the   provenance of the reosource

Fixed


34) dct:date
I think this could be given a complex mapping.
DCT says:
> A point or period of time associated with an event in the lifecycle of the resource.
So perhaps just saying there was an event:
CONSTRUCT{
        _:event a prov:InstantaneousEvent ;
            prov:atTime ?date .
} WHERE {
 ?document dct:date ?date.
}
However, as we don't know the nature of the association between the
?document and the ?date, this is a bit useless, and so if you think we
should include this, it should have a note:
Note that the above inference would not generally be considered useful
due to the ambiguity of dct:date (we don't know how the entity is
related to the event), however the above rule is included here for
completeness

It can't do any harm to add it, so I have included it :)
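
As a rough illustration of what the dct:date rule produces, the following Python sketch emulates the CONSTRUCT query from this comment on a list of (subject, predicate, object) tuples. The function name, the blank node labels and the sample record are hypothetical, including its dct:date value. The sketch relies on the SPARQL behaviour that a blank node in a CONSTRUCT template is minted afresh for every solution.

```python
from itertools import count


def date_events(triples):
    """Emulate the dct:date CONSTRUCT rule on a list of (s, p, o) tuples."""
    fresh = count(1)
    out = []
    for s, p, o in triples:
        if p == "dct:date":
            # CONSTRUCT mints a new blank node for every solution.
            event = f"_:event{next(fresh)}"
            out.append((event, "rdf:type", "prov:InstantaneousEvent"))
            out.append((event, "prov:atTime", o))
    return out


# Hypothetical input record; the date value is made up for illustration.
records = [
    ("ex:doc1", "dct:title", "A mapping from Dublin Core..."),
    ("ex:doc1", "dct:date", "2012-02-28"),
]
events = date_events(records)
```

Note that nothing in the output links the event back to ex:doc1, which is exactly the ambiguity the proposed note warns about.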