Talk:Proposal for a Working Group on Provenance

From XG Provenance Wiki

Feedback from James

1. The deliverables are numbered D1-D9, but there is no D5.

2. There are a lot of deliverables for 2 years: 5 recommendations and 3 notes. My understanding is recommendations require a longer lead time and public comment period, so producing 5 recommendations for a 2-year process seems like a lot.

By comparison, have a look at the RDB2RDF charter/WG: it has only 5 deliverables, with 1-2 of them being recommendations, and was also meant to run for 2 years, and I understand that even that has been a slog.

3. What is the difference between having an XML "serialization" (D6) vs. an OWL/RDF/etc. "formal model" (D2)? Why do both (or either) need to be standardized?

4. Why do we have both a "formal model" and "formal semantics" deliverable? What is the difference, and what are the expected benefits of formalization?

5. Likewise, why do D4 (accessing and querying) and D7 (mappings) need to be recommendations/standards, rather than notes? I can see that the access issue might require some future architectural/protocol standardization. But is that something that can be done by a WG unilaterally? For querying and for the mappings I am not sure I understand the rationale for standardization. These could perhaps be sub-deliverables of the "primer" or "cookbook".

Overall, the current list gives me the impression of a last-minute rush to include everything that might be useful. This inclusiveness is good, but I worry that it might wind up overcommitting the WG or making the plan look too ambitious for the time available. My feeling is that the fewer discrete "tasks", the better for focus and flexibility, since there is a start-up cost to each deliverable.

I also wonder if we can estimate how much work the different parts will take, and which are considered "must be recommendation" vs. "decide later" and "required" vs. "optional". I understand that some thought about this was already done in the various WG charter drafts so maybe it is just a matter of transferring these to the wiki.

James Cheney

1. Done.

2. Well, I/we have thought a lot about this list of deliverables, and I don't see how it can be reduced without leaving something incomplete. One of the rationales for this list was that we were not starting from scratch but from well-understood vocabularies. I agree that we should consider more notes. I think it was my mistake to make D6 a Recommendation; it should be a note.

3. The purpose of the XML serialization is to make the standard accessible beyond the semantic web community. I see this as low-hanging fruit, to promote take-up. There are also lots of tools around XML, which should facilitate the development of provenance tools.

4. An OWL ontology/RDFS Schema for a provenance vocabulary does not necessarily capture the meaning of the constructs. Capturing that meaning is, hopefully, the purpose of the mathematical semantics. However, we are aware that it could potentially be a huge job, which cannot fully be solved in the timescale. Should we try to scope it, or should we leave this to the WG?

5. As said above, it's a mistake to make the mapping (D6) a recommendation. Indeed, other provenance vocabularies may vary over time, and won't be as rigid as a standard, so this shouldn't be a recommendation. As far as D4 is concerned, I think that the way provenance is exchanged over HTTP, or the way it is embedded in RDFa, should be a recommendation, because it's crucial to the edifice. That does not mean it should be a long document, though.

I can assure you that there was no last-minute rush in designing this list of deliverables. The XG was clear that some of the optional deliverables of earlier drafts (like semantics) had to become mandatory. We need to map all this onto a timetable now. Luc

Should we consider dropping D6?

With a goal of reducing the amount of work, should we consider dropping D6? Shouldn't it be the responsibility of designers of non-standardized provenance languages to explain how *their* model maps to a standard FOO?


Feedback from Paolo

I assume at this point the "email dust" has settled and all relevant info is in the wiki page.

main comments:

m1) I am not convinced by Luc's response to James' point (4). OWL is a language with a formal semantics, so when I describe FOO using OWL, I am giving FOO a formal semantics. No? But there may be aspects of FOO that cannot be captured using OWL -- and this is why I think that FOO should be informally defined at the same time. For these fragments we may well need a separate "formal model", which would be expressed using some other (logic?) language.

m2) I don't fully understand D4. "Access" and "query" seem to be both understood in the context of provenance embedding in some host language (RDFa, for example), which however restrictive may still be fine. But, I would argue that they are two different things and IMHO should be kept separate. "Access" can be anything from an API to a protocol (esp if we decide that the qualification "remote" is important), while "query" suggests a language that predicates over the representation of a data model, and whose interpretation results in data fragments (fragments of provenance graphs, in this case). I thought we had agreed earlier that the latter is out of scope. Either way, adding "how to query provenance through a SPARQL endpoint." is confusing, because (1) it assumes that queries are defined on the RDF mapping of FOO (what about XML, then??!), and (2) once the RDF schema for representing FOO is defined, I am not sure what else is left to say regarding queries, other than, well, they are SPARQL (and yes, they are submitted through a SPARQL endpoint). So I am not sure what this document would contain.

m3) Dropping D6: I tend to agree with Luc that it is not the group's responsibility to map any third party vocabulary to FOO. One argument for this is that either the set of vocabs to consider is unbounded (what do we mean by "extant provenance models"?), or the choice of vocabs to map from seems arbitrary. That said, I think that D7 could include example mappings, to the extent that there are volunteers with a vested interest in providing them.

pedantic points:

p1) In the list of OPM terms, regarding "opm:WasGeneratedBy": the example "A thumbnail image was generated by Blog Agg using the panda image." is slightly misleading, because "using the panda image" makes it a ternary relationship, while WasGeneratedBy is binary.

p2) In the list of Provenir concepts: I agree with Paul that the spatial proximity relation is a bit odd, and not just because it doesn't seem to belong to "provenance", but also because "adjacent_to" applies to points in a metric space, which (unlike with basic temporal relations, for example) requires the definition of a distance function to be supplied in order for the property to make sense. In other words, I can say that sensor A is adjacent to sensor B, but this does not mean much in spatial terms; in particular, I cannot infer whether A affected B without knowing how far apart they are (and possibly knowing a lot more!). So I believe that properties of this kind take us out of the scope of a provenance model.
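
To make p2 concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration (the function name adjacent_to, the Euclidean distance, the coordinates and the thresholds); none of it comes from Provenir. It only shows that whether two sensors count as adjacent depends entirely on a distance function and a cut-off supplied from outside the model.

```python
import math

def adjacent_to(a, b, threshold):
    """Hypothetical adjacency test: True if points a and b are within
    `threshold` of each other under Euclidean distance."""
    return math.dist(a, b) <= threshold

# Illustrative coordinates only; the two sensors are 5 units apart.
sensor_a = (0.0, 0.0)
sensor_b = (3.0, 4.0)

# The same pair is "adjacent" or not depending on the chosen threshold.
near = adjacent_to(sensor_a, sensor_b, threshold=10.0)
far = adjacent_to(sensor_a, sensor_b, threshold=1.0)
```

Neither answer says anything about whether A affected B, which is the point made above.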

Some answers from Luc.

m1) It depends what we mean by semantics. If we mean, say, a denotational semantics, I don't think that you get it through an OWL specification. How do you characterize the set of inferences that you have over your provenance data model? I don't think you can do it through OWL. So, I believe that there are lots of things that could be done outside OWL, and they would help give a precise *meaning* to the data model. Now, this is an ambitious goal, which may need to solve difficult technical challenges, and may not easily be done in two years, especially if we don't have the necessary human resources involved in this activity. That's why, in the original charter draft, this was optional.

So, we need to have an agreed understanding of what we mean by semantics, and whether we make it optional or mandatory.

m2) Some typos fixed in the text. I agree that there is very little to say about how to query with SPARQL once the ontology is defined. Mostly an example. The question is how you find the SPARQL endpoint, which is related to the presentation to the W3C architecture group, a while back. So I hope this can be a very short document. If the XML community had defined an XQuery/XPath endpoint protocol, it would be relevant to have a section on how to leverage such a protocol too. However, they have not done it.

Brief response on (m1) -- Paolo

I do agree that we won't get away with OWL. My point was simply that, /if OWL was all we needed/, then the semantics of FOO would come for free from the semantics of OWL. So for example, if you decided that in FOO, say, there is a property BAR that is transitive, then you can just say it in OWL without having to write a further inference rule that would say the same thing in a different language. But I suspect that some of the interesting semantic constraints on FOO may not be expressible in OWL (which may very well lead to our very own FOWL formalisation :-) )
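
As a rough illustration of the transitivity example, the sketch below spells out in Python the inference that a transitive property declaration would otherwise give us for free. The property BAR, the resources x, y, z and the function name are all hypothetical, and this is a naive fixed-point loop, not a description of how any OWL reasoner works.

```python
def transitive_closure(edges):
    """Return the transitive closure of a set of (subject, object) pairs.

    This is the hand-written "further inference rule": whenever (a, b)
    and (b, d) hold, add (a, d), and repeat until nothing new appears.
    """
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure

# Hypothetical BAR assertions: x BAR y, y BAR z.
bar = {("x", "y"), ("y", "z")}
inferred = transitive_closure(bar)
# `inferred` now also contains ("x", "z").
```

If BAR were declared transitive in the ontology, a reasoner would be expected to derive the x-to-z statement without any such rule being written separately.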