Difference between revisions of "ProvenanceAccessScenario"

From Provenance WG Wiki
m (Abstract Description: Re. http://www.w3.org/2011/prov/track/actions/29)
(ACTION-29: Rephrase into user scenario and questions about access)
Line 39: Line 39:
 
The text ''There was a lot of crime in London last month.'' is a minimal portion of the hypothetical article '''art1''' from the [[ProvenanceExample|Data Journalism Example]]. All other components of '''art1''' (i.e., the incidence map, chart, and  photograph) are excluded from this scenario so we can focus on ''access'', not ''modeling''.
 
  
'''Obtaining document D'''
+
*'''Obtaining the document D'''
 +
**''Variety of sources'': In this scenario, the document D can be obtained from the web at ''multiple locations'': [https://github.com/timrdf/vsr/raw/master/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.html one] '''(D1)''' or [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0031/crime.html  more] '''(D2)'''.
 +
**''Variety of document forms'': To emphasize the fact that the ''text'' of document D (i.e., ''<nowiki><html><body><p>There was a lot of crime in London last month.</p></body></html></nowiki>'') can take a variety of forms, an [https://github.com/timrdf/vsr/raw/11e4a2ab0315ba7808a393e2a0b5f69f952b7235/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.png ''image'' on github] '''(D3)''' and an [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0032.html email's] [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0032/crime.png image attachment] '''(D4)''' are available.
 +
**''Variety of source types'': The same document D available on the ''web'' (D1-4) and accessed through a web browser can also be obtained through ''email'' as an attachment (e.g., the two emails with [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0031.html html] '''(D5)''' and [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0032.html png] '''(D6)''' attachments sent to public-prov-wg@w3.org) accessed through an email client. These different forms of the document D can also be on your local hard drive or ''USB drive'', such as the files ''//acme/downloads/crime.html'' '''(D7)''' and ''e:\usbShare\crime.png'' '''(D8)''', accessible through a command shell.
  
In this scenario, the document D can be obtained from the web at
+
*'''Enacting the "Oh yeah?" feature'''
[https://github.com/timrdf/vsr/raw/master/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.html one] or [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0031/crime.html  more] locations. The same document D can also be obtained as an email attachment (e.g., in the [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0031.html email sent] to public-prov-wg@w3.org). Copying from a USB drive is a third way to obtain the document D.
+
** A user requests a web browser '''(W)''' used to obtain the web documents (D1-4) to enact the "Oh yeah?" feature on the whole documents
 
+
** A user requests an email client '''(E)''' used to obtain the email attachments (D5-6) to enact the "Oh yeah?" feature on the whole documents
'''Variety of forms'''
+
** A user requests a command shell '''(S)''' used to obtain the files (D7-8) to enact the "Oh yeah?" feature on the whole documents
 
+
To emphasize the fact that the text of document D (i.e., ''<nowiki><html><body><p>There was a lot of crime in London last month.</p></body></html></nowiki>'') can take a variety of forms, an [https://github.com/timrdf/vsr/raw/11e4a2ab0315ba7808a393e2a0b5f69f952b7235/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.png image on github] and an [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/0032.html email's] [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0032/crime.png image attachment] are available.
+
 
+
'''Starting with the basics, and adding provenance'''
+
 
+
Document D is as simple as we could make it. Then we set them up on the web and in email. They can also be on your hard drive. So what do we have to add to enable access to provenance for these four files ([https://github.com/timrdf/vsr/raw/master/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.html html], [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0031/crime.html html], [https://github.com/timrdf/vsr/raw/11e4a2ab0315ba7808a393e2a0b5f69f952b7235/data/source/tim-lebo/london-crime/version/2011-Jul-07/manual/crime.png png], [http://lists.w3.org/Archives/Public/public-prov-wg/2011Jul/att-0032/crime.png png])?
+
  
 +
*'''Accessing the provenance'''
 +
** What information '''(I)''' do the clients (W, E, S) need in order to access and retrieve the provenance for the documents D1-8? The provenance ''may'' have access control.
 +
** Where does the information (I) come from for the different forms and sources of the document?
  
 
[[Category:Scenario]]
 
  
 
[[Category:Discussed at F2F1]]
 

Revision as of 14:49, 14 July 2011

Created by the Access and Query Task Force at the F2F1 to allow the WG to compare the different access proposals prepared and discussed at the meeting.

Abstract Description

  • A user obtains a document (D). The initial scenario focuses on an HTML document without inclusions such as JavaScript or images.
  • The client software (browser, email client, etc.) offers an "Oh yeah?" feature, by which the provenance (P) of the document is accessed, and possibly retrieved, by the client software.
  • Provenance for the complete document is accessed from the document provider as well as from third parties.
  • What does the client do when the feature is enacted?
    • What information (I) does it need in order to access or retrieve the provenance?
    • Where does the information (I) come from?
  • We should consider the cases where the document (D) was downloaded from the web, obtained from an email attachment, or found on a USB stick.
  • We should consider that access control over provenance may be required.
  • Multiple formats for provenance may be available from the provider or third parties. The "Oh yeah?" feature may want to select which format to retrieve.
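
The last point, selecting among several available provenance formats, can be illustrated with a minimal sketch. It is only one possible reading of "select which format to retrieve": the function name `select_format` and the media types used below are assumptions for illustration, not something the scenario specifies.

```python
from typing import Optional, Sequence


def select_format(available: Sequence[str], preferred: Sequence[str]) -> Optional[str]:
    """Return the first client-preferred format that a source offers, else None."""
    offered = set(available)
    for fmt in preferred:
        if fmt in offered:
            return fmt
    return None


# Hypothetical format lists; the scenario does not name any provenance formats.
provider_formats = ["application/rdf+xml", "text/turtle"]
client_preferences = ["text/turtle", "application/json"]
chosen = select_format(provider_formats, client_preferences)
```

Here the client's preference order decides the outcome; a provider-driven choice would be an equally valid design.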


Issues to consider in the future:

  • Could we rephrase the scenario without reference to provenance, for example in terms of trust?
  • Should we consider getting the provenance of the whole document or part of it?
  • Reconciling potentially conflicting provenance is out of scope.
  • How can provenance be retrieved partially?

Scenario rationale

  • Documents obtained by email or from a USB drive don't have a URL.
  • HTTP and email carry in-band metadata.
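
The rationale can be restated as a small lookup table. This is a sketch only: the dictionary layout and the name `available_hooks` are assumptions, and the entries "web has a URL" and "USB has no in-band metadata" are inferred from the two bullets rather than stated in them.

```python
from typing import Dict, List

# Channel properties from the scenario rationale (partly inferred, see lead-in).
CHANNELS: Dict[str, Dict[str, bool]] = {
    "web":   {"has_url": True,  "in_band_metadata": True},
    "email": {"has_url": False, "in_band_metadata": True},
    "usb":   {"has_url": False, "in_band_metadata": False},
}


def available_hooks(channel: str) -> List[str]:
    """List the properties a client could use to locate provenance on a channel."""
    props = CHANNELS[channel]
    return [name for name, present in props.items() if present]
```

For the USB case the list is empty, which is why the scenario asks where the information (I) would come from there.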

Concrete Example

As a concrete example for this access scenario, we start with a minimal document D whose text is:

<html>
   <body>
      <p>There was a lot of crime in London last month.</p>
   </body> 
</html>

The text There was a lot of crime in London last month. is a minimal portion of the hypothetical article art1 from the Data Journalism Example. All other components of art1 (i.e., the incidence map, chart, and photograph) are excluded from this scenario so we can focus on access, not modeling.

  • Obtaining the document D
    • Variety of sources: In this scenario, the document D can be obtained from the web at multiple locations: one (D1) or more (D2).
    • Variety of document forms: To emphasize the fact that the text of document D (i.e., <html><body><p>There was a lot of crime in London last month.</p></body></html>) can take a variety of forms, an image on GitHub (D3) and an email's image attachment (D4) are available.
    • Variety of source types: The same document D that is available on the web (D1-4) and accessed through a web browser can also be obtained as an email attachment (e.g., the two emails with HTML (D5) and PNG (D6) attachments sent to public-prov-wg@w3.org) and accessed through an email client. These different forms of the document D can also be on your local hard drive or USB drive, such as the files //acme/downloads/crime.html (D7) and e:\usbShare\crime.png (D8), accessible through a command shell.
  • Enacting the "Oh yeah?" feature
    • A user asks the web browser (W) that was used to obtain the web documents (D1-4) to enact the "Oh yeah?" feature on the whole documents.
    • A user asks the email client (E) that was used to obtain the email attachments (D5-6) to enact the "Oh yeah?" feature on the whole documents.
    • A user asks the command shell (S) that was used to obtain the files (D7-8) to enact the "Oh yeah?" feature on the whole documents.
  • Accessing the provenance
    • What information (I) do the clients (W, E, S) need in order to access and retrieve the provenance for the documents D1-8? The provenance may have access control.
    • Where does the information (I) come from for the different forms and sources of the document?
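
The concrete example can be summarized as data: eight documents, two forms, three source types, and one client per source type. The sketch below only records what the list above states; the class name `Document`, the field names, and the helper functions are assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    label: str        # D1..D8, as labelled in the scenario
    form: str         # "html" or "png"
    source_type: str  # "web", "email", or "file"


DOCUMENTS = [
    Document("D1", "html", "web"),
    Document("D2", "html", "web"),
    Document("D3", "png", "web"),
    Document("D4", "png", "web"),
    Document("D5", "html", "email"),
    Document("D6", "png", "email"),
    Document("D7", "html", "file"),
    Document("D8", "png", "file"),
]

# Clients from the scenario: web browser (W), email client (E), command shell (S).
CLIENT_FOR_SOURCE = {"web": "W", "email": "E", "file": "S"}


def client_for(doc: Document) -> str:
    """Return the client through which the "Oh yeah?" feature would be enacted."""
    return CLIENT_FOR_SOURCE[doc.source_type]


def documents_by_client(docs: List[Document]) -> Dict[str, List[str]]:
    """Group document labels by the client that handles them."""
    grouped: Dict[str, List[str]] = {}
    for doc in docs:
        grouped.setdefault(client_for(doc), []).append(doc.label)
    return grouped
```

Calling `documents_by_client(DOCUMENTS)` groups D1-4 under W, D5-6 under E, and D7-8 under S, matching the three "Oh yeah?" bullets. What each client then needs as information (I) is exactly the open question of the scenario and is not modelled here.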