Provenance Working Group Teleconference

26 Jan 2012


See also: IRC log


Present: Curt_Tilmes, Luc, [IPcaller], +1.443.708.aaaa, +1.646.389.aabb, tlebo, +1.518.633.aacc, MacTed, jcheney, davidschaengold, Satya_Sahoo, kai
Regrets: Graham_Klyne, Paolo_Missier, Khalid_Belhajjame, Daniel_Garijo
Chair: Paul Groth


<trackbot> Date: 26 January 2012

<scribe> scribe: Curt

<pgroth> http://www.w3.org/2011/prov/meeting/2012-01-19

<pgroth> PROPOSED to accept the minutes of the Jan. 19 telecon

<satya> +1

<davidschaengold> +1

0 (not present)

<Christine> 0 (not present)

<kai> 0 (not present)

<smiles> +1

<jcheney> +1

<pgroth> Accepted: minutes Jan 19 telecon

pgroth: next week, F2F, lots of scribes :)
... actions: satya reviewing issues

satya: will try to respond to each on list, but time is short, progress on many of them
... many already addressed, satya just needs to review and make proper recommendations

F2F prep document updates

pgroth: going through documents to determine status and if changes are needed before F2F
... prov-primer

working out updates needed, not changed since last editor's version

satya: rdfs already provides way to do annotations, not currently modeled like that
... trying to bring everything in the primer into sync with prov-o and prov-dm

pgroth: prov-aq
... Graham has made changes responding to most of the issues, a few issues need discussion at F2F and after
... in good shape for F2F

pgroth: prov-dm

luc: third working draft to release today for F2F

pgroth: prov-o

many issues addressed at prov-o working group level, some still need whole WG to discuss

current version has edits

luc: no update for precise/imprecise derivations

satya: still under discussion, consensus not yet determined

luc: some decisions made

satya: progress has been made, but some things still unclear, need more discussion

pgroth: prov-sem

jcheney: not much changed recently, watching prov-o domain of discourse discussion, which may have an impact
... waiting for final determination to incorporate
... a few more things to flesh out that will happen prior to F2F

pgroth: most documents in reasonable sync, given work that has been done

Prov-dm for the 3rd working draft

<Luc> http://dvcs.w3.org/hg/prov/raw-file/default/model/ProvenanceModel.html#changes-since-second-public-working-draft

luc: work on complement, specialization, examples, derivation, collections, restructuring, new section 7 with constraints on data model
... agent and hadPlan

<pgroth> Proposed: Release Prov-dm as a third working draft

<smiles> +1

<jcheney> +1

<MacTed> +1


<kai> +1

satya: does the 3rd WD reflect the universe of discourse discussion on identifiers?

luc: no, those aren't incorporated yet, those will go into the 4th WD, identifiers and accounts
... too many changes to incorporate, still determining final agreement on identifiers/accounts, may take a while

<satya> +1

satya: yes, those may have broad impact

<pgroth> Accepted: Release Prov-dm as a third working draft

satya: good to freeze changes at a defined point and release a good draft
... we should follow that model for prov-o

pgroth: W3C requires a release every 3 months

luc: good to have well-defined goals for each release

Identifiers in Prov-dm

<Luc> http://www.w3.org/2011/prov/wiki/UniverseOfDiscourse

<Luc> I hope I included all the votes (I just added James')

<pgroth> *All* objects of discourse ("entities") MUST be identifiable by all

<pgroth> participants in discourse. Object descriptions ("entity records" and

<pgroth> otherwise) SHOULD use an unambiguous identifier (either reusing an

<pgroth> existing identifier, or introducing a new identifier) for the objects

<pgroth> described." (intent)

pgroth: a series of items were considered to determine what should be part of the universe of discourse

<pgroth> Proposal 1: Entities and Activities belong to the universe of discourse.

<Luc> all votes were positive

<MacTed> I have failed to keep up with the list this week, and see argument with several of these proposals...

(many who voted are not present)

luc/pgroth: record previous vote for minutes rather than re-voting here

<Luc> ACCEPTED: Proposal 1. Entities and Activities belong to the universe of discourse.

<pgroth> Proposal 2: Events (Entity Usage event, Entity Generation Event,

<pgroth> Activity Start Event, Activity End event) belong to the universe of

<pgroth> discourse


<MacTed> I accept Proposals 1-4, and have concerns or issues with 5-9

<Luc> ACCEPTED: Proposal 2: Events (Entity Usage event, Entity Generation Event, Activity Start Event, Activity End event) belong to the universe of discourse

satya: with respect to prov-o, those were included

<Luc> Proposal 3: Derivation, Association, Responsibility chains, Traceability, Activity Ordering, Revision, Attribution, Quotation, Summary, Original Source, CollectionAfterInsertion/CollectionAfterRemoval belong to the universe of discourse.

luc: Stian voted -1 (for all but associations)
... not sure of his rationale

tim: laundry list is long, a concern to determine how each should be modeled in prov-o

luc: satya supported derivation, association and activity ordering, do you support those?

tim: yes

luc: why doesn't stian think association should not be part of universe of discourse?

pgroth: possibly rephrase proposal 3 and re-vote?

luc: association belongs, since stian and tim do support those

<Luc> Proposal: 3a: Association belongs to the universe of discourse

luc: we'll discuss with stian further and rephrase rest of proposal 3

tim: accepts association

<Luc> ACCEPTED: Proposal: 3a: Association belongs to the universe of discourse

<pgroth> Proposal 4: AlternateOf and SpecializationOf belong to the universe of

<pgroth> discourse

pgroth: may need more discussion of proposal 4, postpone for now

<Luc> Proposal 5: Records do not belong to the Universe of discourse. This includes Account Record.

pgroth: satya and macted disagree

satya: we need a construct to aggregate prov. assertions, if we remove records/accounts, we won't have a good way to do that

macted: is this to differentiate data/metadata in a given context?
... in a database world, the fields are filled with data, the table has the metadata

luc: we're trying to establish that

macted: we need to make that distinction

luc: we are talking about different levels, the world where things happen; level 2 descriptions of what happened in the world
... account records are at that second level
... we can go even higher to talk about provenance of provenance

macted: that isn't clear in these proposals

luc: we're trying to represent that intent

macted: things/entities are interchangeable, the proposals aren't clear

luc: we're trying to determine how to represent our intent into the documents

macted: difficult with text alone

<jcheney> See also ISSUE-212

luc: yes, more graphics would help explain the concepts

zednik: yes, confusing, perhaps graphics or ASN could help explain this better, esp. things like prov. of prov.

<jcheney> Is prov of prov on the critical path? I agree it's important but perhaps we should table it until one-layer prov is stable

pgroth: there is some demand for prov. of prov. from the group

macted: this is a perpetual problem in graphs, the recursion. These levels can be better described graphically

luc: we haven't determined how to express prov. of prov. yet

<zednik> @jcheney from http://www.w3.org/2005/Incubator/prov/XGR-prov-20101214/#Broad_Recommendations "Recommendation # 4: A provenance framework should include a standard way to express the provenance of provenance assertions, as there can be several accounts of provenance and with different granularity and that may possibly conflict"

luc: for some, account records aren't part of the discourse, but if you do want to talk about them, then you will have to identify them
... do we want to have prov. of prov.? is that part of the scope we should cover?

zednik: we don't want to preclude describing prov. of prov.

luc: the term 'thing' -- if we use an account record, we need to make the 'thing' an entity so we can describe it
... looking for guidelines/recommendations of where we are going with this

pgroth: if we remove notion of account record from proposal 5, would that be in line with our thinking?

<tlebo> +1 luc: the way to talk about things is by introducing entities. (we get provenance of provenance by making entities about the records - we effectively have shifted the two levels.)

<stephenc> We have a use case for provenance-of-provenance on legislation

<pgroth> Proposal 5: Records do not belong to the Universe of discourse

macted: this is the recursion problem. prov. of a thing is itself a thing (an entity) when asserting provenance about it
... difficult to express without a picture

luc: we need more guidance to even draw the picture

<tlebo> +1 (if i want to talk about Records, I make an entity about it)

<pgroth> i agree with you tlebo

luc: if all records have an identity, that is a different direction than if records are not part of the universe of discourse

macted: example - i have a table, built 1727, joe smith, sold on jan 19, 1728, sold again, again, again
... we track that journey through the world -- the provenance
... the records of that provenance are a distinct entity
... the provenance of the provenance is that I said it was built in 1727
... that shifts the perspective up a level

<kai> +1 for provenance on provenance.

macted: one level talks about the table, one about the provenance, one about the provenance of the records of the provenance.
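
MacTed's table example can be sketched in code. This is a minimal illustration only: it was not shown on the call and is not PROV-DM or PROV-ASN syntax. The class names (Entity, Record, Account), the as_entity helper and the ex: identifiers are all hypothetical. It shows the three levels: the table, the records about the table, and a record about those records.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    """Something in the universe of discourse, named by an identifier."""
    identifier: str


@dataclass
class Record:
    """A single statement made about an entity."""
    subject: Entity
    statement: str


@dataclass
class Account:
    """An identified bundle of records (hypothetical stand-in for an account)."""
    identifier: str
    records: List[Record]


def as_entity(account: Account) -> Entity:
    """Shift up one level: treat a bundle of records as an entity."""
    return Entity(account.identifier)


# Level 0: the table itself.
table = Entity("ex:table")

# Level 1: the provenance of the table.
table_provenance = Account("ex:table-provenance", [
    Record(table, "built in 1727"),
    Record(table, "sold on Jan 19, 1728"),
])

# Level 2: the provenance of that provenance.
meta_provenance = Account("ex:meta-provenance", [
    Record(as_entity(table_provenance), "MacTed said the table was built in 1727"),
])
```

The same Record type is used at every level; only the subject changes, which is why no extra construct is needed for each additional level in this sketch.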

<kai> That's metadata provenance

<tlebo> (so Records are outside of DM's "current" macted:Shift)

macted: this can be difficult to follow

<tlebo> @macted, good example

pgroth: that use case is clear, but how do we best communicate that? what construct should prov-dm have?

macted: use a concrete example to figure that out, rather than trying to solve in the abstract
... have to look at both sides to make sure it all works

macted: doing the abstract first makes this harder

<zednik> +1 to use concrete example before deciding on abstract model restrictions

satya: the way to talk about things is to introduce entities
... when we want to talk about prov-of-prov, we need to have a universal construct for that
... we have been discussing this notion already. records should be part of the universe of discourse

<tlebo> @satya, did you say that you need Account Records AND Accounts in UOD?

jcheney: I said I agree there is a difference between saying all records are part of the UofD, or if some could be
... some ambiguity. Some entities might contain information about provenance records contained elsewhere
... in order to express prov-of-prov
... this isn't something we have to decide now to make progress, could we say "by default records aren't necessarily identified entities in the UofD, but they might be"

<tlebo> +1 james: by default records are not in domain of discourse, but can be if entities are used to discuss them (this shifts the perspective)

kai: we have a similar problem in dublin core, we can describe everything, but then we have to describe the description

<tlebo> +1 "it's nothing special"!

kai: we need to be able to describe prov-of-prov, need to consider the prov itself as an entity.
... if we do that, then we don't have a problem
... keep it simple, just say that prov. itself can be an entity, then you can describe it just like you describe the prov. of any entity

<tlebo> +1 keep it simple (knowing that it can be shifted)

kai: simply handles the recursion

<pgroth> by default records are not in domain of discourse, but can be if entities are used to discuss them

<smiles> +1

<tlebo> records are only a means of transmission. We only care about the content of the transmission.

pgroth: trying to capture this -- james' proposal allows us to shift perspective, is that ok? is that sufficient guidance for luc?

<MacTed> see SKOS - containers of entities, which are containers of entities, which are containers...

luc: yes, that and the emails

<tlebo> I'm at the top of the hour

<jcheney> OK with me (that's actually tlebo's wording, but I like it)

<MacTed> er, sorry, SIOC not SKOS

<kai> Don't make the mistake of ending up able to describe the provenance of everything, with the provenance (records) as the only exception.

pgroth: next few proposals need even more discussion

<pgroth> Proposal: by default records are not in domain of discourse, but can be if entities are used to discuss them

<tlebo> +1

<jcheney> +1

satya: what does "by default" mean?

<tlebo> "the current layers of the shift"

pgroth: when you describe provenance, you use things like entities, derivations, etc. not records

<jcheney> I think it means that you can't infer that a record is in the domain of discourse. You have to assert it.

pgroth: but if you want to describe prov-of-prov, you would (in some fashion) make the records into entities and use those

<satya> 0

<tlebo> If we argue for a third layer, we are not being compact and eloquent. And we could argue for the fourth, and fifth. It won't end.

satya: decision not critical to move on

pgroth: this is important for modeling

<jcheney> @satya: There is a difference between saying records "MAY" be in the domain of discourse and records MUST be in the domain of discourse.

<kai> -1

<Luc> @tlebo: i don't think we would introduce more layers, but a "shift operator"

kai: I can describe the provenance of data, not just things
... provenance of data is itself data, so we can describe it the same way

<tlebo> @ speaker, because we already have what we need to discuss provenance (Entities)

<zednik> -1 (show concrete example before making modeling decision, not other way around)

pgroth: we have "provenance records". last week we said things in the UofD are identified
... if we say records are part of the UofD, then we have to give them identifiers -- that affects the modeling

kai: what is the problem giving them an identifier?

pgroth: sometimes, we might not want to assign them identifiers

<pgroth> entity(w3c.org)

<tlebo> (apologies)

pgroth: is that in our UofD?

<satya> Sorry, I have to leave.

kai: I can only describe identifiable things, so if we want to describe them, we have to identify them
... just a collection of statements might not have an identifier, so we'll have to identify them if we want to describe them

<jcheney> alternative wording: "records MAY be in the domain of discourse, but we don't assume that all records are in the domain of discourse" ???

pgroth: some agreement, but try different wording

<pgroth> records MAY be in the domain of discourse, but we don't assume that all records are in the domain of discourse

<jcheney> is that at least clearer than "by default"?

kai: I think records are in the UofD, but only if they have an identity
... "every record that has its own identity is in the UofD"
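
The wording under discussion ("records MAY be in the domain of discourse", "every record that has its own identity is in the UofD") can be illustrated with a small sketch. It is an assumption-laden illustration, not something agreed on the call: ProvRecord, in_universe_of_discourse, describe and the ex: identifier are hypothetical names, and a plain string stands in for a real provenance statement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ProvRecord:
    """A provenance statement; the identifier is optional (hypothetical model)."""
    statement: str
    identifier: Optional[str] = None  # minted only when the record must be described


def in_universe_of_discourse(record: ProvRecord) -> bool:
    """A record counts as part of the UofD only if it has its own identity."""
    return record.identifier is not None


def describe(record: ProvRecord, statement: str) -> ProvRecord:
    """Make a statement about a record; this requires the record to be identified."""
    if not in_universe_of_discourse(record):
        raise ValueError("record has no identifier; mint one before describing it")
    return ProvRecord(f"{record.identifier}: {statement}")


# An anonymous record is still usable, it just cannot be described.
anonymous = ProvRecord("table built in 1727")

# An identified record can have provenance of its own.
named = ProvRecord("table built in 1727", identifier="ex:record-1")
meta = describe(named, "asserted by MacTed")
```

In this sketch nothing forces an identifier onto every record, which matches the "MAY" reading; the "MUST" reading would correspond to making the identifier field mandatory.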

luc: we were using accounts to handle this, not every single record
... we weren't going to have provenance of other records
... if we revisit this, we need to change more of the data model. we were previously only using accounts as a way to describe prov-of-prov
... are we questioning those decisions made 6 months ago?

<jcheney> It may not have been clear to everyone whether "records" included or excluded accounts in this discussion (it wasn't to me)

luc: the latest draft still says the only way to describe provenance itself is through accounts

kai: something that has a URI, an identity, is something that exists. why restrict how you can describe that thing?

luc: we aren't considering resources in general, just the way we model those things in prov-dm

<MacTed> SIOC Ontology -- http://rdfs.org/sioc/spec/ -- may save us reinventing many wheels....

luc: are we making provenance records part of the UofD. Can we represent prov. of accounts?

<MacTed> of particular use -- http://rdfs.org/sioc/spec/#sec-overview

luc: are account records part of the UofD?

kai: Is there a problem if they are not in the UofD?

luc: we are breaking early design decisions. saying they are part of UofD, we say that all records have to have identifiers
... the implication is that every prov. record would have to have a named graph to give the set an identifier
... this is a radical departure from current work


luc: we need guidance on this

kai: we can discuss at F2F
... we don't want to destroy current work
... we should be able to figure out something that works next week

pgroth: kai isn't saying we have to have identifiers for everything, we don't have to mint identifiers for every prov. record
... we can use that as preliminary guidance

kai: yes, that is what I think, they CAN have an identifier, with that you can describe the records' provenance

<jcheney> That sounds like what I was trying to say.

kai: we should indicate that it is possible to describe prov-of-prov

<jcheney> Might be good to give a small meta-prov example like MacTed's in PROV-DM?

kai: we are mostly in agreement -- just need to detail

<pgroth> curt

<pgroth> I'll take care of it



<pgroth> trackbot, end telcon

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2012/01/26 17:19:54 $
