From XG Provenance Wiki
Revision as of 18:08, 6 January 2010 by Ygil
Guidelines to post comments
- Please add your comments using this format --User:Ringo
- Someone in the call today brought up the issue of "fraud provenance". Paul pointed out that this is related to the issue of verifying attribution, and I agree. But perhaps there is more to this topic. I have been thinking about the distinction between "reported" vs "reconstructed" provenance. Reported provenance is created as the object in question is created, so it reflects the actual origins of the object. Reconstructed provenance is created later by guessing how the object came about. Reported provenance is how we typically think of provenance. Reconstructed provenance is less discussed, but a good example is running a plagiarism detection tool on a document that we recognize as not an original: the tool in effect takes a best guess at the actual provenance of the document. I imagine that reconstructed provenance may be added by others who were not the actual originators of the object, to address incomplete provenance. Perhaps reconstructed provenance could be added as a dimension under content? --User:YolandaGil
- I agree, this distinction is important. In the Linked Data use cases we mainly have reconstructed provenance because we use provenance-related metadata. BTW, the term "reported" seems a bit misleading; I understand reconstructing provenance based on provenance information provided by third parties (via metadata) as provenance reported by these third parties. For this reason, I suggest the term "recorded" provenance instead of "reported". --User:OlafHartig
- I agree as well. Does this also capture the difference between asserted and inferred provenance? I think it does. In one case, the person/agent says this is what the provenance of a data item is, and in the other case the person/agent infers the provenance of a data item from a series of facts. Yolanda, I actually disagree: I think your "reconstructed" or inferred provenance is actually the primary mechanism for determining provenance in the real world. For example, curators often build up the provenance of a piece of artwork by taking evidence from a variety of sources (books, signatures, shipping logs, etc.). They then use this evidence to ground their reconstructed provenance. --User:Pgroth
- In some circumstances, it may be important to distinguish between asserted and inferred provenance, particularly if provenance information is intended to be used in a legal context. Thus, it may be useful to characterise provenance information created after the event as:
- agent asserted provenance information;
- third party asserted provenance information; or
- inferred provenance information.--User:Crunnega
- Should we consider adding a "reliability" dimension to the provenance dimensions which would consider the relative reliability of different types of provenance information? For example: Independently verified vs. non-verified; Self-asserted vs. third party-asserted; Trusted source vs. un-trusted source --User:Crunnega
- ACTION: Changed the relevant Attribution item to include the distinctions brought up in this discussion. --User:YolandaGil
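The distinctions raised in this thread (agent-asserted, third-party-asserted, and inferred provenance, plus the verified vs. non-verified reliability question) can be sketched as a small data structure. This is only an illustration: the class names, field names, and sample values below are hypothetical and are not part of any agreed vocabulary.

```python
from dataclasses import dataclass
from enum import Enum


class ProvenanceKind(Enum):
    """How a provenance statement came to exist (illustrative names)."""
    AGENT_ASSERTED = "agent-asserted"
    THIRD_PARTY_ASSERTED = "third-party-asserted"
    INFERRED = "inferred"


@dataclass(frozen=True)
class ProvenanceStatement:
    subject: str            # the object the statement is about
    claim: str              # what is said about its origins
    stated_by: str          # who made or computed the statement
    kind: ProvenanceKind    # asserted by agent, by third party, or inferred
    verified: bool = False  # independently verified (reliability aspect)


# Hypothetical sample data: one statement made by the author,
# one best guess produced by a plagiarism detection tool.
statements = [
    ProvenanceStatement("report.pdf", "written by Alice", "Alice",
                        ProvenanceKind.AGENT_ASSERTED),
    ProvenanceStatement("report.pdf", "copied from essay.txt",
                        "plagiarism-tool", ProvenanceKind.INFERRED),
]

# Select only the statements that were inferred after the fact.
inferred = [s for s in statements if s.kind is ProvenanceKind.INFERRED]
```

Keeping the kind and the verification flag as separate fields reflects the suggestion that reliability may be a dimension of its own rather than a property implied by who asserted the provenance.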
- Given that the Content category includes dimensions concerned with what a provenance description is about, I miss a data access dimension here. For data from the Web, how the data was accessed is an important part of its provenance. This includes questions such as: who published this data on the Web (this doesn't have to be the party responsible for creation), which server/service was accessed to retrieve the data, who is responsible for this server, and was the retrieved data digitally signed (and by whom). Notice that such questions can also be asked about source data that has been used as input in data creation (given the source data was retrieved from the Web). The Attribution dimension does not seem to cover this; in particular, from the discussion at the end of our last call (11/20/09) I learned that the general understanding of attribution is responsibility for creation (in the sense of Dublin Core dct:creator). --User:OlafHartig
- ACTION: Added a "Data Access" category under the Content/Process dimension. --User:YolandaGil
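The questions listed for data access (who published, which server/service was accessed, who is responsible for it, whether the data was signed and by whom) could be captured in a record kept separately from attribution. The sketch below is an assumption-laden illustration; all class names, field names, and the example URL are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataAccess:
    publisher: str                    # who published the data on the Web
    service_url: str                  # server/service accessed to retrieve it
    service_operator: str             # who is responsible for that server
    signed_by: Optional[str] = None   # signer, if the data was digitally signed

    @property
    def is_signed(self) -> bool:
        return self.signed_by is not None


@dataclass(frozen=True)
class DataItem:
    creator: str         # responsibility for creation (attribution)
    access: DataAccess   # how this copy of the data was obtained


# Hypothetical example: the publisher differs from the creator.
item = DataItem(
    creator="Alice",
    access=DataAccess(
        publisher="Example Data Portal",
        service_url="http://data.example.org/sparql",
        service_operator="Example Org",
    ),
)
```

Separating `creator` from `access.publisher` mirrors the point that the publishing party need not be the party responsible for creation.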
- Suggestion: qualify the use cases/provenance dimensions. I guess this is a follow on from the comment I made during the last call Nov 20, regarding the different scope and context for these use cases. I then remarked that these groups differ from each other in that some describe "ends" while others describe "means to an end". I now see that they are no longer called use cases, they are dimensions. This is fine, however I think it would still help to add a little context to them, for example:
- Content: these categories seem to be of interest to various subclasses of information consumers. In fact, those consumers need not be aware that anything like provenance exists. This to me is a major discriminator.
- Management: management of what? To me these are means to an end (i.e., making provenance available per se, presumably to become part of some use case). But I would put dissemination control in the Content category, because that is an end in itself. To me it does not belong in Management.
- Use: to me these are definitely means to an end, i.e., technical issues in making provenance available for some user goal in an effective way.
Is this a sensible and acceptable qualification of these dimensions? --User:Pmissier
- Hi Paolo, I'm not sure I agree with the characterization of Use. To me, Use is about what users want to do with provenance. I don't think it is necessarily about exposing provenance or about technical issues. It's about what we want to do with provenance. For Management, it's about management of provenance information itself once you have it. --User:Pgroth
- I have the following suggestions for the provenance dimensions:
- Under "Content", we can add two more categories of provenance content
- Agent - Provenance in form of entities responsible for creation/modification of data. For example, instrument information in scientific experiments
- Spatio-Temporal-Thematic Information - Provenance in form of spatial information (for example, geographical coordinates), temporal information (for example, date, time), thematic or domain-specific information (for example, sample used in a proteomics experiment)
- Under "Management", we can add/modify two categories
- "access" to "Query Mechanism" - Retrieval of provenance information including query formulation/execution mechanism or access to an authoritative service
- Provenance Representation (that includes the "interoperability" category from the "Use" dimension. I believe provenance interoperability is not a use of provenance but a provenance management issue) - representation or modeling of provenance information that includes the ability to incorporate domain semantics
- Under "Use", maybe we can change "Commonality" to "Comparison" since provenance information can be used to compute both commonality as well as differences. -- User:Satya
- I believe Agent is already captured under the Attribution dimension.
- I believe Spatio-Temporal-Thematic Information is an aspect of the Process dimension, I am not sure we want to elaborate or highlight these three specific aspects in particular.
- The distinction between finding provenance and querying is useful and should be added.
- I think interoperability is a requirement that comes from users, and although addressing it through provenance management is one possible approach there are alternative solutions. So I would leave it under Use.
- ACTION: Expanded the "Access" dimension to include the distinction between finding provenance and query formulation/execution aspects. --User:YolandaGil
- ACTION: Renamed "Commonality" to "Comparison". --User:YolandaGil