Provenance Incubator Group Teleconference -- 18 Dec 2009

<trackbot> Date: 18 December 2009

<trackbot> Meeting: Provenance Incubator Group Teleconference

Discussion of new batch of use cases (led by Simon Miles and Satya Sahoo)

- http://www.w3.org/2005/Incubator/prov/wiki/Domain_Specific_Provenance_1

- http://www.w3.org/2005/Incubator/prov/wiki/Domain_Specific_Provenance_2

- http://www.w3.org/2005/Incubator/prov/wiki/Use_Case_private_data_use

Coverage of provenance dimensions by current use cases (led by Simon Miles and Yolanda Gil)

Planning for next meeting, agenda and scribe (led by Yolanda Gil)

Review of action items (by scribe)

<YolandaG> Thanks for doing this Irini!!

<crunnega> Christine on the phone too

yolanda: Look at the provenance dimensions and the use cases, and at how to organize them

Satya will be covering 2 use cases

Biomedical one.

<ssahoo2> http://www.w3.org/2005/Incubator/prov/wiki/Domain_Specific_Provenance_2

<mccuskej> same here

Use case inspired by experiments: combining data from different sources and databases

Manual Extraction and NLP techniques

Basic issue is whether a particular instrument has been used.

Interpretation of query and experiment results.

Types of data: curated data of high quality, and data from prediction algorithms, which does not have the same quality as the human-curated data.

Examples/Sets of Goals in the Use case: when exchanging data between groups, it is essential to understand the process and the instruments used.

Get administrative data (instruments etc.)

Standard queries in provenance scenario to be answered.

Important to add the information that is needed to understand and interpret results

Storing and querying provenance information efficiently is a big issue

yolanda: a general problem is the presence of experimental data; with no provenance, such data has limited use.

Question: how do we capture and represent provenance information so that it can be used later on?

yolanda thinks that there is a more general problem that is important.

yolanda's question: in terms of provenance, does this mean that there is a provenance query engine that searches the web for all experimental data with provenance and returns these results?

satya's answer: information is linked to the experimental results and the results are tracked back (provenance within a lab)
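To illustrate the kind of trace-back Satya describes, here is a minimal sketch. It assumes provenance within a lab is available as simple "derived from" links; the item names and the `trace_back` helper are hypothetical and do not come from the use case.

```python
from collections import deque

# Hypothetical "derived from" links: each item maps to the items it was produced from.
DERIVED_FROM = {
    "result_table": ["normalized_data"],
    "normalized_data": ["raw_spectra", "instrument_settings"],
    "raw_spectra": ["sample_42"],
}

def trace_back(item, derived_from):
    """Return every upstream item reachable from `item`, in breadth-first order."""
    seen = []
    queue = deque([item])
    while queue:
        current = queue.popleft()
        for parent in derived_from.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen

print(trace_back("result_table", DERIVED_FROM))
```

Starting from an experimental result, the walk reaches the instrument settings and the original sample, which is the information the use case says is needed to interpret results.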

<JimM> a way to register to get updates to prov would address this - a trackback service

yolanda: what happens in a data exchange or data integration scenario? what is the scale?

Satya: scale of provenance information increases

<JimM> (there was an IEEE Escience 2009 presentation doing this for citations)

James: results in social sciences used in policy decisions.
... are there regulations that must be satisfied in the biomedical domain?

<JimM> Pharma and analytical chemistry labs would be under FDA and other regulations

Satya: no legal requirements except the fact that journals want to have the dataset used in the papers published

Yolanda: good practices exist but not in the form of regulations

<JimM> legally acceptable records were an interest expressed via censa.org in the context of e-notebooks

Satya: argument from the community is that they want to maximize the publications before releasing the dataset

Yolanda: another argument is that it is too much work to capture all the information
... as a group, can we facilitate the production of provenance information?

Satya: 2nd Use case

Use Case from Paolo.

They want to enhance the provenance information from a workflow environment

highlight from the use case: domain-specific metadata for provenance

provenance trail from workflow must be extended with provenance annotations

specific challenge: how best to associate unstructured provenance with domain-specific provenance.

<JimM> the key issue with annotations is that they need to be part of the account structure, i.e. they are things being asserted

Satya: workflow based infrastructure associated with the domain specific vocabularies

can domain specific ontologies be used to annotate the trail of workflow process?

JimM: we need to be able to have an assertion structure for provenance metadata
... in a provenance discussion we need to deal with named graphs, reasoning, in order to be able to answer questions related to implicit information

JimM: we need to be able to make assertions across sources
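One way to picture an assertion structure across sources is a named-graph style layout, where each statement carries the account (graph) it belongs to and a separate graph records who asserted each account. The sketch below uses plain tuples rather than an RDF library; every identifier in it (`ex:run1`, `ex:assertedBy`, `who_asserts`, and so on) is a hypothetical example, not vocabulary agreed by the group.

```python
from collections import namedtuple

# A statement plus the named graph (account) that contains it.
Quad = namedtuple("Quad", ["subject", "predicate", "obj", "graph"])

# Hypothetical data: two labs make different claims about the same run.
QUADS = [
    Quad("ex:run1", "ex:usedInstrument", "ex:massSpec7", "ex:labA_account"),
    Quad("ex:run1", "ex:usedInstrument", "ex:massSpec9", "ex:labB_account"),
    Quad("ex:labA_account", "ex:assertedBy", "ex:labA", "ex:meta"),
    Quad("ex:labB_account", "ex:assertedBy", "ex:labB", "ex:meta"),
]

def who_asserts(subject, predicate, obj, quads):
    """Return the sources whose account graph contains the given statement."""
    graphs = {q.graph for q in quads
              if (q.subject, q.predicate, q.obj) == (subject, predicate, obj)}
    return sorted(q.obj for q in quads
                  if q.predicate == "ex:assertedBy" and q.subject in graphs)

print(who_asserts("ex:run1", "ex:usedInstrument", "ex:massSpec7", QUADS))
```

Keeping the account as a fourth element lets conflicting statements from different sources coexist while remaining attributable.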

Luc: not sure he would describe those as a provenance use case. To Luc, a provenance use case should solve a query of the user.
... the use cases state that the users want to just query the provenance but not why.
... 2nd Use case: not a functional requirement for provenance
... Use case should not be defined in terms of provenance

<mccuskej> +q

<YolandaG> -q

<YolandaG> -q JimM

<YolandaG> -q Luc

<JimM> i'd be curious to hear more about why named graphs are insufficient...

-Irini

<ivan> I guess this is the paper Irini referred to: Fundulaki, Irini, Vassilis Christophides, Giorgos Flouris, and Panagiotis Pediaditis. "On Explicit Provenance Management in RDF/S Graphs." In First Workshop on the Theory and Practice of Provenance (TaPP'09), James Cheney (ed.). San Francisco, CA, 2009. http://www.usenix.org/events/tapp09/tech/full_papers/pediaditis/pediaditis_html/.

Yes, thanks Ivan.

Satya: 3rd Use case

<Luc> http://www.w3.org/2005/Incubator/prov/wiki/Use_Case_private_data_use

Luc: 3rd Use Case [Use of private data]

Regulations for the use of private data, data protection acts

the use case refers to the provenance dimensions for accountability

processes use information compatible with rules/regulations

able to audit systems that process private information.

check whether the use of data was legal

whether the collection of data was lawful
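The audit described here can be sketched as a check of recorded data uses against what each data subject permitted. This is only an illustration: the record fields, the consent table, and the `audit` helper are assumptions and are not taken from the use case page.

```python
# Hypothetical consent table: purposes each data subject has agreed to.
ALLOWED_PURPOSES = {"alice": {"billing", "delivery"}}

# Hypothetical provenance records documenting how private data was used.
USES = [
    {"subject": "alice", "process": "invoice_run", "purpose": "billing"},
    {"subject": "alice", "process": "ad_targeting", "purpose": "marketing"},
]

def audit(uses, allowed):
    """Return the recorded uses whose purpose the data subject did not permit."""
    return [u for u in uses
            if u["purpose"] not in allowed.get(u["subject"], set())]

for violation in audit(USES, ALLOWED_PURPOSES):
    print(violation["process"], "used data for", violation["purpose"])
```

The check only reports what the provenance records document, so it supports accountability after the fact rather than enforcing compliance.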

the problems with the scenario: metadata representation (all possible notions: tasks, obligations, etc.)

Semantic Web (SW) technologies can be used for this

another problem: provenance management: processing has to be documented so there is the need for a common documentation and provenance models (interoperability issue)

auditing the provenance in order to perform the auditing task

the results of the audit can be trusted if the provenance can be trusted

-Irini

-q

cryptographic hashes as part of provenance

checking the provenance against rules; this is a provenance use issue
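As one possible reading of "hashes as part of provenance", the sketch below chains provenance records with SHA-256 so that later tampering can be detected. The record contents and the helper names `chain_records` and `verify_chain` are hypothetical; the discussion did not settle on a scheme.

```python
import hashlib
import json

def chain_records(records):
    """Attach to each record a SHA-256 digest covering its content and the previous digest."""
    chained, prev = [], ""
    for record in records:
        payload = json.dumps(record, sort_keys=True) + prev
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        chained.append({"record": record, "prev": prev, "digest": digest})
        prev = digest
    return chained

def verify_chain(chained):
    """Recompute every digest; return False if any record or link was altered."""
    prev = ""
    for entry in chained:
        payload = json.dumps(entry["record"], sort_keys=True) + prev
        expected = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if entry["prev"] != prev or entry["digest"] != expected:
            return False
        prev = entry["digest"]
    return True

# Hypothetical provenance records for two processing steps.
chain = chain_records([
    {"process": "collect", "purpose": "billing"},
    {"process": "invoice_run", "purpose": "billing"},
])
print(verify_chain(chain))
```

A chain like this shows only that the records were not changed after they were written; it does not show that they were truthful when written.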

<crunnega> +q

JimM: the audit can be done only if provenance is reconstructed

trail is going to be broken by the different players

<crunnega> There may be a business advantage in being able to reassure customers that their privacy policies and practices can be verified

provenance could give some hints about the problem but not an explanation of what has happened.

partial provenance could nail down where the leak has happened

(from JimM)

Yolanda thinks it is very controversial to create a use case to highlight compliance

Luc: the primary dimension is accountability, which is not necessarily compliance.

Do not want to enforce compliance, just be able to have accountability

A different use case: compliance with processes

crunnega: number of use case scenarios for privacy that could use provenance

Personal Data/Private Data equivalent terms.

<jcheney> = confidential data?

Yolanda takes the floor:

Yolanda plans to talk to Simon to go through provenance dimensions and use cases

Invitation to members to join and see the coverage of use cases

Missing half of the expected set

F2F meeting: most popular venue WWW, 2nd Meeting in NYC

<YolandaG> http://www.w3.org/2002/09/wbs/43897/FindingTimeforF2F/results

Considering both venues WWW, IPAW

<mccuskej> I can't log into that page.

<JimM> +1 for two mtgs

Possibility to join on phone.

<ivan> i do

end of April will be reasonable. IPAW could be a good idea.

Next Meeting, January 8th
