Provenance Incubator Group Teleconference

10 Sep 2010

See also: IRC log




<trackbot> Date: 10 September 2010

<YolandaGil> http://www.w3.org/2005/Incubator/prov/wiki/Telecons#Scribing_for_the_Provenance_XG_Group


Pgroth: lots of places where finding commonalities would help --

the starting place is to find a common model

<Luc> +q

that's something we can produce very quickly

<Paulo> +q

Yolanda: "we" is just the members of this group

so should a group be formed focused on a model

<ssahoo2> http://www.w3.org/2005/Incubator/prov/wiki/Provenance_Vocabulary_Mappings

<pgroth> but there's more we could recommend than a model?

<pgroth> no

<ssahoo2> maybe with this work?

Paolo: what's the starting point for a common {model,...}

YG: W3C more likely to be keen if we make a case where the goal can be accomplished realistically

<jun> i think Paul's is still quite broad and general. we probably need to be more specific

<ssahoo2> and the use cases as motivation

<pgroth> I'm happy to be more specific

<pgroth> :-(

<pgroth> =:-)

Luc: in support of Paul's idea: there are many types of provenance


LM: would like to see a common data model to describe provenance for complex information flows (web)

so need to qualify the type / scope of provenance we are addressing

LM: we have identified many models in this space. Starting from scratch would not be wise

let's instead start from an existing model that can be mapped to others

OPM could be a suitable starting point for (i) data model (ii) ways to access provenance -- on a realistic timeline

<jcheney> I have to go in a minute, but just wanted to suggest something:

YG: efforts to define common models at W3C take time to reach consensus --

would additional industry input give us more focus?

what would the goal be? is integration/interchange our core goal?

<jcheney> There's "data model" for provenance (where there are several possibilities already), and there are "access" and interchange issues which could be relatively independent of the data model used. Would a working group focus on one or both?

<jcheney> Sorry, got to go.

Paulo: good outcome of our work so far is convergence towards a common terminology -- used to define some of the reqs and elements of the various models we looked at

<pgroth> +q

focusing on terminology may be a more efficient use of our time

Paulo: outcome would be dictionary/ thesaurus etc. -- e.g. terms we have used for our reqs. need formalising
... look at the DC example

<JimM> provenance is critical missing infrastructure

<JimM> in science, business, SOA interactions, social trust, ...

<JimM> provenance is first and foremost about causal connections, but it is critical that it interoperate with descriptive metadata - provenance definitions often differ by how much descriptive metadata is in scope

JimM: (scribing his own verbal contribution :-) )

<JimM> provenance has roots in several domains - workflow, electronic records, arts, library science

<JimM> ProvenanceCC (causal core) - things processed through events under the control of agents within a descriptive context - the nature of processes and things differ by domain...

<ssahoo2> I would beg to differ with Jim here :) - provenance involves relationships beyond causal properties only

<JimM> provenance has network effects - provenance across systems has more value than provenance within a system

<ssahoo2> we have several such examples in biomedicine, sensors etc. - this was the driving reason for including more than causal relationships in the Provenir ontology

<JimM> there are aspects of provenance that are ready for standardization, as well as implications and extensions that will require significant research - many research aspects overlap strongly with other domains (e.g. semantic web, trust, social networking, ...)

ssahoo2: like the idea of starting from a common terminology, a model could be too ambitious

SS: Luc suggests Web focus, but Web is so pervasive, it would not help us focus at all
... agree that OPM can be a starting point for a terminology rather than a data model
... causal relations are not all that there is to provenance -- we can take terms from multiple models, organised around an OPM core

YG: why hasn't this common terminology emerged so far?

<JimM> what is part of a data model that is not defined by terminology (what are we trying to leave out by arguing for terminology?)

SS: mapping activity is a good starting point (data, process, agent)
... one's (provenance) metadata is another's data
... so the def is necessarily app-specific

<Paulo> +q

SS: maybe it's just a matter of time -- with more time a good terminology would have been created

Paul (PG): we can recommend ways to access provenance information

irrespective of how provenance is represented

<jun> as well as the publication of provenance information

we need to act quickly on this or somebody else will come in with concrete proposals, which may not be as well thought out

<jun> it's quite urgent to have a guide about how to publish and access provenance information on the Web, NOW. but are we ready to jump over the task for creating the common data model?

<pgroth> I think minimum publish + access

YG: multiple recommendations are ok

<pgroth> ack

<DGarijo> what about pointing to models until we find a common terminology?

<pgroth> but we have opm.... so why not use it?

YG: look at the RDF example: simple and quick to specify

<DGarijo> yeah, I agree

<pgroth> i mean it's not perfect but it's there

<Luc> Has W3C defined terminologies before, independently of protocols or data models

<Luc> ?

<pgroth> i don't think so...

Paulo: example of terminology: OPM has its def. of causality relations, others may have a different understanding of causal relationships
... the def of causal relationship is still controversial

<pgroth> in general the w3c recommendations are:

<pgroth> 1) guidance

<pgroth> 2) languages

<pgroth> 3) apis

<JimM> re:causality - I think the issues are more about the types of processes being modeled versus the concept of causality

<JimM> OPM examples are computation/physical process heavy

Paulo: we have now arrived at terms through which we better understand each other

<DGarijo> I think that the intention could be represented by the role of the agent baking the cake..

and we need to make accommodations -- for the sake of being able to move on

<Luc> I think that Paulo is in fact making the case that a vocabulary should NOT be defined independently of a data model.

YG: focusing on big research topics such as the notion of causality may be too ambitious

<Luc> I agree with Yolanda. If we decide to standardize a data model, we do not have to talk about causality if we feel this is not appropriate.

YG: research interests should be kept separate from practical issues of Web users

<JimM> there are financial causes, mental causes, physical causes ...

<pgroth> he just said that he didn't care

<ssahoo2> Properties in OPM are not adequate for expressing relationships in many application domains - hence I had suggested to Luc that we can take the OPM classes and add the named relationships from Provenir ontology

Paulo: avoid using controversial terms/concepts, but choose ones that we agree on, and move on

<pgroth> bio people

<pgroth> see youtube: http://www.youtube.com/watch?v=LVEPdV_warU&feature=player_embedded

<pgroth> :-)

<pgroth> what did we do with all those use cases?

<ssahoo2> exactly - we should have the use cases as motivation

<ssahoo2> a concrete use case: http://wiki.knoesis.org/index.php/Biomedical_Sciences

Paolo: think in terms of priorities

<pgroth> not one of us: http://www.buzzmachine.com/2010/06/27/the-importance-of-provenance/

of the use cases, reqs., etc.

<Paulo> we should indeed avoid controversial terms but we also need to know that certain terms are controversial (and this is something that people may have been silent about)

<jun> we are coming up with something quick and dirty from data.gov.uk

<pgroth> jun: but if that's good, then the more people use it, the better

<YolandaGil> Jun: do you have a description of that? perhaps that could be considered a candidate starting point?

JM: on causality: there is a form of cause that is common to many of our processes -- mental, financial, computational....
... we just need to find a commonality across these

Paolo: practical criteria for prioritising: what's the most likely aspect of provenance that will be addressed by others if we don't?

PG: the use cases are our valid starting point. We need to have something that gets used, even if it is not perfect. A very simple access model, for example

LM: a point on causality: it's diverting the discussion in an unhelpful direction -- the open process through which OPM went never criticised the term "causality", this is only recent
... one can give a technical answer, but to avoid controversy we can revise OPM

<JimM> speaking of wording: I think I've been convinced that event would be better than process...

<jun> sounds like a good plan

YG: what aspects of our scenarios would we want to have a common model for?

<pgroth> would circulating drafts help?

YG: leaving it for next call

<ssahoo2> would the scenarios reflect the use cases we have?

YG: what aspects of each scenario would be covered by a common model?

<DGarijo> thanks all, bye

<YolandaGil> Scribe: Paolo

<pgroth> luc, you got a sec?

<michaelp> bye

<YolandaGil> Next week we should analyze our 3 flagship scenarios and see what aspects we should focus on for our recommendations

<YolandaGil> See what aspects would benefit from a common model, what terminology is diverse and needs to be defined, etc.

<YolandaGil> trackbot, end telcon

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/09/10 16:15:23 $
