Provenance Working Group Teleconference

03 Jun 2011

See also: IRC log




<trackbot> Date: 03 June 2011

<sandro> scribe: tlebo

<dgarijo> http://www.w3.org/2005/Incubator/prov/wiki/File:Provenance-XG-Overview.pdf

<GK_> Having trouble with conference passcode again

yolanda: notes the final report.

<dgarijo> the link to the final report: http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/

slide 3

trust, what things are and what they mean, how it was collected. CLOSED SYSTEM - we know it all and trust it.

<GK_> Provenance: needed for operating in an open information system. Make implicit expectations of closed system explicit.

contrast with OPEN SYSTEM - harder to use because many contributors are unknown to you.

consumer: how can I trust what I see?

<GK_> (Slide 3)^

Yolanda listing examples of multiple sources from which we collect evidence. who created it, who is responsible, whom do I attribute?

how old, who is managing repository? how can we verify these aspects?

slide 4

in business - how do we ensure compliance with processes. e.g., outsourcing and getting results.

in science - how are results obtained? papers can get retracted.

in news -

<GK_> Wondering how much interaction there is between work on provenance and work on trust in open systems (e.g. trust conferences, etc.)

in law and IP - who owns or has released document with what permissions?

slide 5

TBL's "Oh yeah?" button quote (1997)

trust at the top of the layer cake.

slide 6

provenance need quotes.

slide 7

open government

John Sheridan (UK National Archives, data.gov.uk): "Provenance is the number one issue that we face when publishing government data in data.gov.uk"

being able to qualify what the data means.

slide 8

provenance in science. not being able to reproduce results.

research forensics - people that dissect publications failing to reproduce results. e.g. clinical trials being done are based on false results.

e.g. Nobel prize winner's paper was retracted because it couldn't be reproduced (not the prize paper)

some think "provenance is a no brainer; just do it :-)"

slide 9

work done in incubator group

slide 10

<GK_> IMO, if we can't make it (nearly) a no-brainer for developers, we'll struggle to make it happen

people don't know how to approach provenance.

linked data community is facing the problem - querying the linked data and getting triples that don't make sense. what text extraction tools produced them?

scattered terminology, confounded with "trust"

<GK_> Before "provenance", there was a fair amount of SemWeb interest in "Context"

increased interest in provenance: Luc claims 1/2 of provenance papers published in last two years.

slide 11

incubator group: state of art and develop road map

slide 12

slide 13

shared definition done at VERY END of group's work.

summarized 30 use cases by using 3 flagship scenarios

reviewed existing provenance vocabularies.

numbers (11/15) are dates

slide 14

<jorn> (month/day)

(slide assumes audience knows period of activity)

<GK_> I'd quite like to take this definition, and notes, into the WG work

provenance is the infrastructure that provides the BASIS to decide trust, verification, etc.

trust algorithms operate over provenance records.

provenance assertions of provenance assertions

inference to handle incompleteness and errors.

different accounts for same resource.

slide 16

Three major dimensions to use to think about provenance.

Dimension 1 - content = what are we representing?

(5 types of Dimension 1, Content: attribution, process, evolution and versioning, justification for decisions, and entailment)

Dimension 2 - Management

<GK_> @tlebo, still talking to (1) content, I think

(4 types of Dimension 2, Management: publication, access, dissemination control, scale)

(@GK_ sorry, I confounded Data Access and Access)

I know 2) Management - Access as "Discoverability and Accessibility"

slide 17

Dimension 3 - Use includes (Understanding, interoperability, comparison, accountability, trust, imperfections, debugging)

<paolo> just muted myself, sorry

3 Dimensions are a framework to think about provenance issues.

slide 19

30 use cases from the community

<GK_> I've wrestled with these 3 dimensions; still not completely sure, but seems to be (1) what does provenance consist of; (2) how to make provenance available; (3) what can I do with provenance once I get it?

spent a lot of time defining how to structure use cases.

slide 21

3 flagship scenarios

slide 22

blogging news company needs to produce truthful and quality reports.

tweets of panda, NYTimes journalist - all different sources that the blogging news company can use.

<jcheney> By the way Yolanda there are slides for the Disease Outbreak scenario at: http://www.w3.org/2005/Incubator/prov/wiki/Analysis_of_Disease_Outbreak_Scenario

did the tweeter modify the image of the panda?

<jorn> "without getting sued" :)

manage heterogeneous provenance records. how to present them, how to expose more details.

slide 25

disease outbreak

different communities analyzing the outbreak

slide 26

business scenario - how does a company show that they complied with a contract? letting the consumer run verification procedures.

keeping some processes proprietary, but not breaking the verification.

slide 30

state of the art report

slide 31

areas of research and application for provenance

slide 32

Luc's survey

(I organized the mappings at https://spreadsheets.google.com/spreadsheet/ccc?key=0ArTeDpS4-nUDdFBrQ3ZJMXROUHh4SmxRUVE5V0QwbVE&hl=en_US#gid=0)

yolanda enumerating the provenance vocabularies

<jorn> provenance surveys in literature: http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Survey

original mappings that Yolanda mentioned: http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Vocabulary_Mappings#Mappings

slide 34

short vs longer term recommendations for next steps.

reproducibility should be longer term

open to questions

GK_: relationships to other work? Trust in open systems. Has provenance work interacted with work on trust in open systems and the Trust Conferences?

Yolanda: published survey of Trust in CS and semweb 3/4 years ago. on prov-xg wiki state of the art report.

<jorn> http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Survey

trust: can you trust a certain entity? Can I authenticate it to give access? Developing algorithms for when I trust you and you trust another (transfer of trust). PLENTY of work on this.

LESS work on "can I trust this content" (as opposed to "can I trust this entity")

trust you on movie recommendation or using one road over another.

content-based trust research is quite narrow.

trusting agents vs. trusting content.

Yolanda: many say doing provenance is easy, just make a schema and do it.

but the content in the provenance record is only one part; how do you access, manage, and use those records?

it requires many considerations.

need for standards - many systems that track provenance by themselves, but how can other systems get, read and use those records?

need provenance in an open system where you don't have full control.

<Zakim> GK_, you wanted to test understanding of dimensions

<dgarijo> not only that, but provide guidelines for publishing provenance should be important too

GK_: how do the 3 dimensions apply to doing a user requirements analysis? is "what, how, and why" a fair reflection?

yolanda: yes

tlebo: scientific apps? observation and measurements?

yolanda: use case 2, but there are MANY sociological aspects within that scientific process.

tlebo: is there a nugget of observation and measurement within the disease outbreak flagship scenario?

yolanda: yes.

pgroth: notion of objects

<Luc> thanks Yolanda!

<jun> thank you very much Yolanda!

+1 for Yolanda being helpful!

<pgroth> +1 thanks

<jorn> yupp, thanks a lot :)

<GK_> Thank you Yolanda.

<paolo> thank you once again, Yolanda!

<dgarijo> thanks Yolanda!

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2011/06/03 16:00:30 $
