Talk:Requirements Clean

From XG Provenance Wiki

Place comments here. Please identify yourself when making the comment. Thanks, Paul Groth

Paolo: firstly, thank you for doing this. It's a complex task and it's coming along well :-)

A general comment is that the use cases are quite lengthy and I struggled to extract the parts that are relevant, i.e. those that map to specific user requirements. It may be useful to add structure (just bullet lists?) and maybe interleave refs to requirements right in the text. When you write "Alice wants to ...", that would often be naturally linked to one or more reqs. Added structuring of this type would greatly help!

Another general comment: to facilitate navigation through the large number of reqs, prioritize them and identify the top-k that one should really not miss.

BlogAgg comments:

  • some reqs in the list appear to be very similar, for example:
    • C-Attr-UR 1 and C-Attr-UR 2,
    • C-Vers-UR 1, C-Vers-UR 1b

Since the use cases are being consolidated, maybe some of these reqs could be consolidated as well?

  • some are more general than others, i.e. U-Inter-UR3, U-Tru-UR 1 are quite general
  • U-Debug-UR 1 is a debug req., however it seems quite central to the core purpose of the BlogAgg system
  • this one is a bit pedantic:

some of the actions attributed to BlogAgg seem to be quite knowledge-intensive, yet the text has the BlogAgg system itself performing them ("BlogAgg wants..."). Shouldn't these be user actions enabled by BlogAgg?

Biomedical Scenario comments:

  • This is interesting: C-Proc-UR 3: A process should be reproducible from its provenance graph. Is anyone doing that? (other than in Taverna :-))
  • I'm not sure I understand this req.: U-Imper-UR 1: Allow users to access provenance information even if it cannot be directly observed.

Engineering Contract Scenario:

  • M-Pub-UR3: Choose a representation format to publish provenance information -- I thought the case was more related to filtering the type of data and associated provenance information to be released, rather than its format?
  • M-Acc-UR 1: Given an entity, the user must be able to query a single source or federation of sources for provenance information directly applicable to that entity. Why is this relevant?

on the requirements discussion

  • Management: I think the notion of views over provenance graphs lurks behind this note (which now features very prominently on this page! :-)), but I don't see provenance views explicitly mentioned anywhere?
The scale of provenance information is a major concern, as the size of the provenance records may by far exceed the scale of the artifacts themselves. Despite the presence of large amounts of provenance, efficient access to provenance records must be possible. Tradeoffs must be made regarding the granularity of the provenance records kept and the actual amount of detail needed by users of provenance. For instance, in the Biomedical Scenario, the complete provenance about the result of an analysis may include a huge portion of the biomedical literature.

In other respects, the "use cases" section reads rather like part of a paper in the making, which I think will require a few more revisions, but is a sensible thing to do.

Response from Paul

In terms of the length of the use cases, we now provide brief summaries. I think they are the required length to capture enough of the requirements. The exact requirements listed at the end of the use cases were there for our tracking. They will not be in the document we distribute.

"BlogAgg wants..." means that the website (or its developers) wants to accomplish particular functionality.

Your note on "views over provenance graphs" is addressed in the Use section of the discussion:

An important challenge that we face is to allow for multiple levels of abstraction 
in the provenance records of an artifact as well as multiple perspectives 
or views over such provenance

Comments by Luc

Some comments from Luc. Thanks for the good work. I like the flagship use cases.

A few comments on the use cases:

  • BlogAgg: At the end, we could add a paragraph along the lines of: BlogAgg would themselves make visible the process they followed in order to create the information associated with the visual seal.
  • Biomedical scenario: there are a couple of occurrences of the word provenance, which need to be removed.
  • Can the text of the scenario make explicit that reproducibility/auditability of scientific results is crucial (e.g. to avoid another climategate)? Note this appears in the list of requirements, but not in the text.
  • Engineering Contract Scenario: I was not sure that the last paragraph referred to a provenance problem, but a problem of data integrity. Couldn't we simply add cryptographic hashes to the data to solve the problem?

A few comments on requirements:

  • Some "glue" is needed here; I could not really understand what this section was about when reading the introduction.
  • In the paragraph on 'evolution and versioning', there is a discussion as to whether full provenance should be published with updates, etc. Isn't this a management question?
  • While I agree that provenance should be the subject of dissemination control, I thought the requirements on "dissemination control" were about using provenance to check/control/enforce dissemination of information. Likewise for policies, and privacy protection.

A few typos:

  • trending topic -> trendy topic
  • its sweating -> it is sweating
  • anonmyzation
  • documentatation

Response to Luc from Paul

For BlogAgg, we now have

By clicking on the seal, the user can inspect how the post was 
constructed and from what sources and a description of how
the trust rating was derived from those sources. 

Biomedical: there's only one mention of provenance and it seems consistent. The scenario states: The results need to meet a high standard of evidence so that the biologists can publish them in peer-reviewed journals. Is this enough on the auditability front?

Engineering Contract: I think it's nice to show that data integrity has an overlap with provenance, and it makes the scenario seem reasonable.

I think the new introduction provides better "glue" between the two sections. I think you're right that provenance can be used for dissemination control, but with respect to management I think we meant management of the provenance information itself.

Response from Simon: Regarding the Contract (3rd) Scenario, the last paragraph is intended to be not just about data integrity but also semantics of past time information, drawn from the Timeliness use case. I think this is important from a provenance perspective: while a cryptographic hash would tell you that the data hasn't been altered, meeting the main concern of Customers Inc., it does not tell you how to interpret the provenance data, which is the question being asked by BWF.
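The distinction can be illustrated with a minimal sketch in Python. The record contents and the function names below are made up for illustration and are not taken from the scenario. A stored SHA-256 digest shows whether the data has changed since the digest was recorded, but it carries no information about how the time-related fields inside the record should be interpreted.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unaltered(data: bytes, expected_digest: str) -> bool:
    """Check data against a previously recorded digest (integrity only)."""
    return hmac.compare_digest(sha256_hex(data), expected_digest)


# Hypothetical design record; the field names are assumptions.
original = b"design=bridge-v1; tested=2009-03-01"
recorded_digest = sha256_hex(original)

# Integrity: any change to the bytes is detected.
tampered = b"design=bridge-v1; tested=2009-04-01"
print(is_unaltered(original, recorded_digest))
print(is_unaltered(tampered, recorded_digest))

# What the digest does not say: whether "tested=2009-03-01" refers to when
# the test ran, when it was recorded, or which design version it applied to.
# That interpretation has to come from the provenance record itself.
```

The hash answers the data integrity question; the question of what the recorded past-time information means remains a provenance question.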

Comments by Yogesh

Use case 1.

The use case does not seem to cover the following requirements:

  • “C-Vers-UR 2b: Determine and record when content was changed” and “C-Entail-UR 8: Identify the date and time of the derivation”: This can be fixed by adding a clause on needing to know when the image was changed, or on using temporal ordering to find out whether an image is a source image or a modified one.
  • “M-Pub-UR4: Users need to identify who published the provenance information”: The “seal” that the user clicks on can also show where the provenance for the content comes from. Motivating visualization of provenance through graphs may also be appropriate here.
  • “U-Under-UR4: Enable users to approach the provenance graph at different levels of detail,”: We can have different levels of provenance detail presented to BlogAgg experts and to public users.

Use case 2.

  • Improve intro.
  • Though the scenario is apt, the tie-ins between the use case and the requirements are less clear than in the previous use case. Rather than giving the context and leaving the requirement to interpretation, they should be succinctly and explicitly stated in the use case as before. Many of the requirements only seem to vaguely emerge from the use case, and some are clearly absent. Is this use case trying to cover more requirements than it should?

Use case 3.

Design Contract seems a more appropriate title than (Software) Engineering Contract

The following requirements can be covered better

  • “C-JUST-UR3: The justification should be preserved so that the actual long-term behavior of a product, or effects of a policy can be compared with predictions” can be strengthened by mentioning a duration for which the provenance data should be stored (e.g. the statute of limitations within which the contract can be enforced).
  • “M-Acc-UR 3: query a single source or federation of sources” can be motivated as part of combining the designs from multiple sites when designing a new site. The provenance for multiple sites and their contracts needs to have federated access.
  • “M-Pub-UR3: Choose a representation format to publish provenance information” can be introduced by stating that the proof is shared with Customer’s Inc. in a format common to both parties.
  • “Identify agents (e.g., humans and software components) responsible for conclusion derivation”


  • Object: We may also want to identify a particular version of the artifact. E.g. contract number is insufficient. Are “services” considered as artifacts or processes?
  • Process: The machine learning example repeatedly referred to is not actually mentioned in the Biomedical use case.


Provenance representation language: should we also acknowledge that multiple languages must be able to coexist and interoperate?


Are we mentioning the need for provenance specific query constructs/functions?

Some Typos

“or the usage restrictions are on content” -> “or the usage restrictions on content”

“It thus this by aggregating” -> “It does this by aggregating”

“information is is not” -> “information is not”

“thousands to hundred of thousands a sites” -> “thousands to hundreds of thousands of sites”

“content can repurposed” -> “content can be repurposed”

“anonmyzation” -> “anonymization”

“questionaire” -> “questionnaire”

“is faulty due to proper quality control” -> “is faulty due to improper quality control”

“many kinds of entity” -> “many kinds of entities”

“documentatation” -> “documentation”

“argumentation” -> “arguments”

“whether the license the panda” -> “what license the panda”

Consistently use UK or US spellings (e.g. artifact vs. artefact, summarised)

Response to Yogesh from Paul

Use Case 1:

For the time of derivation, I think this is covered by the notion of an expiring license. As suggested, the seal shows where the information comes from. The level of detail is covered in the Biomedical use case.

Use Case 2: I think the use case has been improved from last time. I don't think the use case needs to be that explicit as requirements are discussed in the next part of the document.

Use Case 3: The title has been changed. Added a sentence about common formats.

There is a broader question of query federation; I think this may be a technical requirement and not a user requirement.