W3C

Dagstuhl workshop break-out session discussion on provenance

02 Jul 2009

See also: IRC log

Attendees

Present
Regrets
Chair: Harry
Scribe: danbri, Ivan

deb: proof markup lang
... a representation strategy
... another model is trust
... i distinguish between trust representation and trust calculation
... we have a small ontology for saying whether i trust some party
... and a separate strategy for figuring out who to trust
... eg. wikipedia authors constantly being re-edited
... vs constantly being cited
... 3rd module is who/what/when/where
... representation strategy
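
The split deb describes between trust representation and trust calculation could be sketched roughly as below. This is only an illustration under assumptions: the `TrustStatement` record, the `citation_ratio` heuristic and the sample counts are hypothetical and are not taken from PML or from any ontology mentioned in the session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustStatement:
    """Representation only: records that `truster` trusts `trustee` to some degree."""
    truster: str
    trustee: str
    degree: float  # 0.0 (no trust) to 1.0 (full trust)


def citation_ratio(times_cited: int, times_re_edited: int) -> float:
    """One hypothetical calculation strategy: authors who are cited often
    and re-edited rarely score higher. Returns 0.5 when there is no evidence."""
    total = times_cited + times_re_edited
    return 0.5 if total == 0 else times_cited / total


def derive_statement(truster: str, author: str,
                     times_cited: int, times_re_edited: int) -> TrustStatement:
    """Run the calculation and store its result in the representation."""
    return TrustStatement(truster, author,
                          citation_ratio(times_cited, times_re_edited))


# Invented evidence for an invented author.
stmt = derive_statement("reader", "author_a", times_cited=9, times_re_edited=1)
print(stmt)
```

Keeping the record separate from the scoring function means a different calculation strategy can be swapped in without changing how trust statements are stored or exchanged.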

<MarkusK_> Proof Markup Language primer: http://inference-web.org/2007/primer/

deb: integrated recently with the OPM
... open prov model
... the opm challenge

<MarkusK_> Open Provenance Model: http://openprovenance.org/

<ocorcho> are two triples identical?

danbri - asked deb whether they have tools for essentially claim graph analytics

scribe: figuring out bigger pic from different accounts

deb - short answer - yes

scribe: builds on owl, proof work

yolanda: ... workflow systems ...
... probs when algorithms aren't designed to deal with prov at all, or in a granular way
... escience paper... by paul groth

<dlm> inference-web.org is a pointer to our general infrastructure that uses pml

presumably in http://www.informatik.uni-trier.de/~ley/db/indices/a-tree/g/Groth:Paul_T=.html somewhere

yolanda: trust has to do with ... "is this coming from harry..."
... " is it really from harry"

<dlm> http://inference-web.org/wiki/Publications has a reasonably complete set of publications on pml and

yolanda: trust can mean a lot more things
... i published about how trust can be associated with content

<dlm> the owl ontologies are up and available from http://inference-web.org/wiki/Documentation

yolanda: ... eg contracts with you for books (if i'm springer) ...

er, if you're springer

scribe: how can ppl reason about specific pieces of content in a document
... we've used open prov model for that

ivan: q same as for deb
... what part of the (whiteboard here) list does this cover?

yolanda: i divide things a bit differently
... we modularise for efficient query

paulo: dns as a source of prov info and trust
... when you open a web page, it's due to dns / domain names
... nobody can publish things in stanford.edu domain without being auth'd to do it

i need to think :)

<JeffP> danbri: domain name might be only a first step

<JeffP> yolanda: Paul T. Groth. A distributed algorithm for determining the provenance of data. In Proceedings of the fourth IEEE International Conference on e-Science (e-Science'08), 2008. http://www.isi.edu/~pgroth/papers/pgroth-dpquery.pdf

<JeffP> dlm: there are different ways to provide answers for prov queries

<JeffP> ivan: no way to trace from a URI to individuals

<JeffP> yolanda: there can be many questions, but one might not always reduce it to a trust question

frank: i don't know re prov ... but sounds like you're all solving too many probs at once

me: please look at OpenID and at the GRAPH construct in the W3C SPARQL standard - both key tech for prov

re sparql, see http://www.w3.org/TR/rdf-sparql-query/#namedGraphs

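example:

# PREFIX lines added; the minutes omitted them
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>

SELECT ?who ?g ?mbox
FROM <http://example.org/dft.ttl>
FROM NAMED <http://example.org/alice>
FROM NAMED <http://example.org/bob>
WHERE
{
   ?g dc:publisher ?who .
   GRAPH ?g { ?x foaf:mbox ?mbox }
}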
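
To make the GRAPH pattern more concrete, here is a minimal Python sketch that mimics the query by hand over a toy in-memory dataset. The names `default_graph`, `named_graphs` and `who_says_which_mbox`, and all of the sample triples, are assumptions for illustration; a real deployment would use a SPARQL engine rather than hand-written loops.

```python
# Toy dataset: every triple below is invented for illustration.
DC_PUBLISHER = "dc:publisher"
FOAF_MBOX = "foaf:mbox"

# Default graph: says who published each named graph.
default_graph = [
    ("http://example.org/alice", DC_PUBLISHER, "Alice Hacker"),
    ("http://example.org/bob", DC_PUBLISHER, "Bob Hacker"),
]

# Named graphs: graph IRI -> list of triples held in that graph.
named_graphs = {
    "http://example.org/alice": [("_:a", FOAF_MBOX, "mailto:alice@example.org")],
    "http://example.org/bob": [("_:b", FOAF_MBOX, "mailto:bob@example.org")],
}


def who_says_which_mbox(default_graph, named_graphs):
    """Join publisher statements in the default graph with the mbox triples
    found inside the named graph they describe (the GRAPH ?g pattern)."""
    rows = []
    for g, p, who in default_graph:
        if p != DC_PUBLISHER:
            continue
        for _x, p2, mbox in named_graphs.get(g, []):
            if p2 == FOAF_MBOX:
                rows.append((who, g, mbox))
    return rows


results = who_says_which_mbox(default_graph, named_graphs)
for who, g, mbox in results:
    print(who, g, mbox)
```

Each result row carries the graph name alongside the mailbox, so every answer stays attached to the source that asserted it, which is the property that makes named graphs useful for provenance.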

reminder - twitter notes can be found c/o http://twitter.com/#search?q=swdag2009

harry: also stream of work from database community

eg from Peter Buneman "where & why" paper

(/me wonders who is doing work with temporal logic here ... eg. what the state of the world is, given some set of datestamped claims)

harry: ... red, green etc data flavours

<ivan> (in the paper of James Cheney)

<ivan> harry: Talis has some implementation in this area

<ivan> ... it makes diffs on rdf and reports there

<ivan> ... and Irini Fundulaki did some work formalizing it

<ivan> ... these, in general, try to be simpler than the complex open prov. models

<JeffP> yolanda: A paper on trust that introduces "content trust" as separate from "entity trust": Towards Content Trust of Web Resources, Yolanda Gil and Donovan Artz. Journal of Web Semantics, 5(4), 2007. http://www.isi.edu/~gil/papers/gil-artz-jws07.pdf Also a survey on trust in the semantic web: A Survey of Trust in Computer Science and the Semantic Web, Donovan Artz and Yolanda Gil. Journal of Web Semantics, 5(2), 2007. http://www.isi.edu/~gil/papers/jw

yolanda: resp to q of why ppl aren't tracking prov
... at least from kr world ... if you kept track of everything, there's an efficiency overhead
... eg in our workflow system, everything is completely reproducible
... see above survey paper on trust
... reputation, trust metrics across entities, ... trust on information, ... info retrieval

<ivan> danbri: I like to separate problems

<ivan> ... temporal logic is a hard problem

<ivan> ... LOD is full of time related problems

<ivan> ... how much work can we do without getting into temporal logic?

<ivan> carlos: there is some temporal algebra work

<ivan> ... i am not an expert in temporal reasoning

<ivan> ... an example

<ivan> ... what I had to do, which is a bit similar, was for business process analysis

<ivan> ... you want to be able to reason what happened, when, implications, etc

<ivan> ... 'how many objects did we sell then and then'

<ivan> ... the approach was to define the notion of time point and intervals, and then one can use Allen's algebra to handle that

<ivan> danbri: does this deal with open world issues?

<ivan> carlos: in that domain it is closed

<ivan> ... it is based on keeping the time stamp

<ivan> ... check is based on time comparison
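
The approach of time points and intervals with plain time comparison can be sketched roughly as follows. This is an illustrative Python sketch, not the system carlos described: the `Interval` class, the `relation` and `sold_within` functions and the sales data are hypothetical, and only a few of the thirteen relations of Allen's interval algebra are distinguished.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Interval:
    start: date  # time point where the interval begins
    end: date    # time point where the interval ends (start < end)


def relation(a: Interval, b: Interval) -> str:
    """Name the relation of a to b using plain time-point comparisons.
    Only a subset of Allen's thirteen relations is distinguished."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if a.start == b.start and a.end == b.end:
        return "equals"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start < a.end < b.end:
        return "overlaps"
    return "other"  # the remaining relations are not spelled out here


# Hypothetical closed-world record of sales: (time stamp, objects sold).
sales = [(date(2009, 1, 10), 3), (date(2009, 2, 5), 7), (date(2009, 3, 1), 2)]


def sold_within(sales, period: Interval) -> int:
    """Answer 'how many objects did we sell then and then' by comparing
    each time stamp against the interval bounds."""
    return sum(n for when, n in sales if period.start <= when <= period.end)


q1 = Interval(date(2009, 1, 1), date(2009, 3, 31))
feb = Interval(date(2009, 2, 1), date(2009, 2, 28))
print(relation(feb, q1))
print(sold_within(sales, feb))
```

The count is only meaningful because the domain is closed: every sale is assumed to be in the record, so a missing time stamp means no sale took place.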

<cpedrinaci> http://www.w3.org/TR/owl-time/

<ivan> MarkusK_: that can be very heavyweight stuff

<ivan> ... what i was wondering is that we have a bunch of implementation issues that depend on the local system

<ivan> ... i can do it on my local system

<ivan> ... but there is no general system

<ivan> ... there is no best practice

<ivan> ... eg in OWL2 you have the possibility to annotate individual statements

<ivan> ... i do not see many people using this

<ivan> ... there is only a small incentive for this

(my worry: simple temporal situations ("dan painted the car red", "what colour is the car now?") hit horrible time, closed world and commonsense reasoning issues)

<ivan> harry: there is a paper from Pat Hayes on time issues; he says it is easy to do if you pick one, but in general it is very complex

<ivan> deb: on the time issue, time has been axiomatized and there are reasoners

<ivan> deb: markus' point: there will be some standards soon

<ivan> ... we are publishing pml with provenance

see http://www.springerlink.com/content/32m70342u4536453/

<ivan> ... it is time to push something like this to a standards body

<ivan> frank: i am not sure where this temporal issue comes from

<ivan> ... all you want to do is to put a time stamp for a document, you do not need temporal logic for that

<ivan> danbri: (see notes above from danbri)

<dlm> we are now publishing provenance in pml for the tptp solution set (thousands of problems for theorem provers)

<dlm> as well as provenance for data.gov in pml

<ivan> carlos: because you have the time stamp, you can reduce it to comparison, you do not need more

<ivan> frank: you do not really need temporal logic to look at provenance statements
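
The time stamp comparison that carlos and frank describe might look like the sketch below. It is an illustration under assumptions: the `Claim` record, its fields, the `latest_claim` function and the sample claims are hypothetical, and picking the most recent claim silently assumes a closed world in which nothing unrecorded has happened.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Claim:
    subject: str
    prop: str
    value: str
    source: str        # who made the claim (its provenance)
    stamped: datetime  # when the claim was made


def latest_claim(claims, subject, prop, as_of) -> Optional[Claim]:
    """Return the most recent claim about (subject, prop) made at or before
    `as_of`, using nothing beyond time stamp comparison."""
    candidates = [c for c in claims
                  if c.subject == subject and c.prop == prop
                  and c.stamped <= as_of]
    return max(candidates, key=lambda c: c.stamped, default=None)


# Invented claims echoing the car example from the minutes.
claims = [
    Claim("car", "colour", "blue", "dealer", datetime(2008, 5, 1)),
    Claim("car", "colour", "red", "dan", datetime(2009, 6, 30)),
]

now = datetime(2009, 7, 2)
current = latest_claim(claims, "car", "colour", now)
print(current.value, "according to", current.source)
```

The comparison only says which recorded claim is newest; whether the car is still that colour today is the open-world question that time stamps alone do not answer.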

<JeffP> steffen: Papers on RDF Querying Provenance - the conference version - B. Schüler, S. Sizov, S. Staab, Duc Thanh Tran. Querying for Meta Knowledge. In: Proc. of WWW-2008. http://www.uni-koblenz.de/~staab/Research/Publications/2008/WWW2008-MetaKnowledge.pdf - the long version - R. Dividino, S. Sizov, S. Staab, B. Schüler. Managing RDF with Meta Knowledge Awareness. In: Journal of Web Semantics. Special issue on "The Web of Data". To app

<ivan> ian: we had a research project to integrate temporal logic with description logic, but we gave up because we did not really have a use case

<ivan> ... we really needed a timeline and comparison; it is pretty difficult to find a use case for a full blown temporal logic

another example --- as currently defined, foaf:schoolHomepage x,y triples never go stale, but foaf:workplaceHomepage x,y triples go stale after x stops working for the Org whose homepage is y ... how to annotate that in the foaf schema so that aggregators can exploit this

yolanda: argumentation markup language
... in face of conflicting evidence

note btw there was an Uncertainty Reasoning W3C incubator - http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/

here are the use cases from the uncertainty guys: http://www.w3.org/2005/Incubator/urw3/XGR-urw3-20080331/#usecases

deb: condition for success is modularity
... we found we had to do a modularisation
... we needed some very lightweight pieces
... and complex explanation/justification pieces
... also stuff like "I trust x" ... or 2 steps thru social graph, etc
... modularity key to successful charter, group, effort

carlos: what do we want from the group

<JeffP> jeff: there is a lot of work on uncertainty ontology and reasoning, we also have results on extensions of OWL 2 profiles and have some implementations, such as ONTOSEARCH2: http://www.ontosearch.org/

carlos: coming up with use cases, who produced this, what's the quality of this data, ...
... which are the pieces of info that we need to capture
... can expect such a group to do this

ivan: approach we took yesterday ... to (a) come up with a few items that we still identify as major research items in this area

3 or 4 research items for jim's list

scribe: at same time, ... listed very practical things that we can do now, simply because there is a need
... if the practical outcome is that this group push for a w3c incubator group, that's a good outcome, though other outcomes are fine
... msg to the community on what things can practically be done
... eg yesterday, that guus will push thru a whitepaper on vocab hosting issues
... if we can achieve these two things, then we did something

aside- see also http://www.gridprovenance.org/ GRID work

ivan: if we do an incubator group, ... who does it, who does the charter, etc
... could launch eg in sept

<ivan> ---- break----

<ivan> collecting research issues:

<ivan> 1- what is the basic level of trust? What are the minimal units on the Web

<ivan> 2- what aspects of trust are directly of concern for the (semantic) web architecture

<ivan> deb's refinement: are any of them separable

<ivan> 3. how to present all this to the user (eg for large scale problems)

a bit out of scope maybe, but i have some bbc+openid+foaf use cases sketched in this presentation: http://www.slideshare.net/danbri/bbc-semweb-panel-where-does-openid-fit-in

<dlm> presentation vs. representation vs. storage of the provenance information

<ivan> 4. control of the information (and the tradeoff between 3 and 4)

<dlm> also, does presentation differ based on different contexts

<ivan> 5. explanation, what is happening in a larger context

<ivan> carlos: between 1 and 2, how would all these pieces apply to the semantic web

<ivan> 6. crypto and semantic web

<ivan> practical things:

<ivan> Deb will set up an (initial) wiki to collect information on provenanceweb.org

<ivan> Yolanda will start up the charter work with Ivan's and Dan's help

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2009/07/02 16:07:48 $