Prov XG Live Day 2

26 Apr 2010

See also: IRC log


Present: Yolanda, Christine, Sam, Satya, Paul, Yogesh


<scribe> ScribeNick: Yogesh

- Architecture diagram for use cases

- Slides contain:

-- Block diagram, the process of figuring out provenance, information flow, a screenshot of the blog site with parts of the aggregation.

-- Content, management, and use

- Sources/Entities

-- People: blog creator, newspaper organization

-- Processes: aggregator; people (quality checker, license checker, legal checker (e.g. children)); image reformatting; Twitter: tweeting, retweeting, editing a tweet, TinyURL; image editing: crop, copy+paste; commentary; adding content; adding a seal; flagging content; inability to check content.

-- Blogs, microblogs, tweets, retweets, websites, images, video, news aggregates

- Content

-- Temporal aspect: is the content ephemeral? To be addressed in the second scenario.

-- No entailment

-- Attribution

-- Process

-- Evolution

-- Recorded vs. reconstructed provenance

-- E.g. a post saying "according to #johnmarkov, blah blah" causes the NY Times, which #johnmarkov writes for, to be verified and the quote to be checked: an assertion, and the validation of that assertion.

- Management

-- Authenticate entities: check certificates, identify users (e.g. through handles, #tags), check websites.

-- Can websites expose provenance? E.g. John Markov's article can have provenance, and the NY Times runs its own provenance service. How do we access and query it?

-- Inferred vs. asserted provenance: e.g. provenance inferred by following retweets.

-- Access control: e.g. the NY Times requires you to have an account with them to access provenance.

-- License: a blog may not list copyright metadata, so it has to be inferred; image metadata may have copyright encoded; tweets may carry no copyright information.

-- Scale: a scalable query store for provenance; global and multilingual.
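A minimal sketch of the "inferred vs. asserted provenance" item above: nobody asserts who a retweeted message came from, but the chain can be inferred by peeling off the conventional "RT @handle:" prefixes. The `Tweet` record and `infer_attribution_chain` function are hypothetical illustrations, not an existing API, and the sketch assumes manual retweets that keep the prefix in the text.

```python
import re
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal record; real microblog data has a richer schema.
    author: str
    text: str


# Matches the conventional "RT @handle: ..." prefix of a manual retweet.
RT_PREFIX = re.compile(r"RT @(\w+):\s*(.*)", re.DOTALL)


def infer_attribution_chain(tweet: Tweet) -> tuple[list[str], str]:
    """Infer (not assert) provenance by peeling nested RT prefixes.

    Returns the handles the content passed through, newest first, and the
    innermost text, which is presumed to be the original message.
    """
    chain = [tweet.author]
    text = tweet.text
    while (m := RT_PREFIX.match(text)) is not None:
        chain.append(m.group(1))
        text = m.group(2)
    return chain, text
```

For example, `infer_attribution_chain(Tweet("carol", "RT @bob: RT @johnmarkov: blah blah"))` returns `(["carol", "bob", "johnmarkov"], "blah blah")`. Because the result is inferred, it should be recorded as such and not confused with asserted provenance.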

Harry from Edinburgh

-- Report due in September; the draft is online; its provenance requirements are incomplete.

-- General case and particular case

-- General case: distributed social networks; who posted, when, why, and how, across various systems.

-- E.g. moving between social network sites, which are usually silos. A number of software tools deal with this; the problem is that they lose provenance. Most use Atom feeds.

-- Atom activity streams are used by Facebook and Microsoft.

-- Take your activity stream, photos, etc. to a different network: how do you track that you moved the data? Digital death: how do you track the provenance of data removal?

-- For privacy reasons, it is good to be able to make data disappear, unlike in eScience.

-- Removing data affects the provenance of dependent content, e.g. comments.

Keep track of the capabilities of the site used to create the data, e.g. tagging photos or parts of photos; the granularity of provenance may differ.

Deletion of data is important!

Facebook freezes your data when you delete your account. People may want to *delete* their data.

If you don't have a trackback mechanism, this is a problem because of the dependency chain.

Identity axis: retracting data and messages, and deleting profiles. E.g. retracting a retweet that was found to be inaccurate.
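The dependency-chain problem above can be sketched as a graph walk: if each item records what it was derived from, inverting those edges (the information a trackback mechanism would supply) shows everything whose provenance is affected when a source is deleted or retracted. The function name and the `depends_on` mapping are hypothetical, chosen for illustration.

```python
from collections import deque


def affected_by_deletion(depends_on: dict[str, set[str]], deleted: str) -> set[str]:
    """Return every item whose provenance transitively depends on `deleted`.

    `depends_on` maps each item to the items it was derived from, e.g. a
    comment depends on the photo it annotates, a retweet on its tweet.
    """
    # Invert the edges so we can walk from a source to everything built on it
    # (the information a trackback mechanism would provide).
    dependents: dict[str, set[str]] = {}
    for item, sources in depends_on.items():
        for src in sources:
            dependents.setdefault(src, set()).add(item)

    # Breadth-first walk down the dependency chain.
    affected: set[str] = set()
    queue = deque([deleted])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, set()):
            if child not in affected:
                affected.add(child)
                queue.append(child)
    affected.discard(deleted)  # guard against cycles leading back to the start
    return affected
```

With `{"comment1": {"photo"}, "reply1": {"comment1"}, "retweet1": {"tweet1"}}`, deleting `"photo"` flags `{"comment1", "reply1"}`, while the retweet is untouched. Without the inverted edges, a site could not find these dependents at all.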

This highlights differences from database and eScience provenance.

Strong desire for anonymous identity

Users may have different anonymous identities with different levels of trust, e.g. not giving away their name or geolocation.

Users may want to post verified information but keep their identity anonymous.

E.g. voting online: you want to be authenticated, but not identified.

The inverse of this is shills: people paid to send false information.

Provenance around identity

Identity may be separable from provenance,

similar to the anonymization of medical records.

Provenance and policy use case: social networks have legal implications.

E.g. an embarrassing photo posted online under a free license is picked up for an advertisement.

Can a provenance framework track upload, download, and reuse?

Can a policy framework enforce this?

Hooks in the provenance framework to enforce policy

Social networks use XMPP, PubSubHubbub, ...: there is no fine-grained level of provenance.

Track changes of state and distribute them.

Standing queries on provenance; accountability to check the status of provenance.

Part of our charter is to see how provenance fits into the Web architecture.

Atom feeds may offer sufficient granularity for microblogging, but an Atom feed for a blog post is too coarse-grained for identifying provenance.

Granularity: can we give every resource in a blog post (images, video, hashtags, text) its own ID in RDF?

Add vocabulary to Atom to carry arbitrary payloads,

so that provenance travels with the Atom feed.
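Atom permits extension elements from foreign namespaces, so one way to let provenance travel with the feed is a small extension vocabulary that gives each part of a post its own identifier. The sketch below assumes a hypothetical namespace `http://example.org/ns/provenance#` with made-up `part` and `derivedFrom` elements; it is not an agreed vocabulary, and required Atom elements such as `author` and `updated` are omitted for brevity.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
PROV = "http://example.org/ns/provenance#"  # hypothetical extension namespace

ET.register_namespace("atom", ATOM)
ET.register_namespace("prov", PROV)


def atom_entry_with_provenance(entry_id: str, title: str,
                               parts: list[tuple[str, str, str]]) -> str:
    """Serialize an Atom entry that carries a provenance payload.

    Each (part_id, kind, derived_from) tuple gives one component of the post
    (image, video, hashtag, text) its own identifier plus the source it came
    from, so provenance is finer-grained than the entry as a whole.
    """
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    for part_id, kind, derived_from in parts:
        part = ET.SubElement(entry, f"{{{PROV}}}part", {"id": part_id, "kind": kind})
        ET.SubElement(part, f"{{{PROV}}}derivedFrom").text = derived_from
    return ET.tostring(entry, encoding="unicode")
```

Feed readers that do not understand the extension namespace can ignore it, which is the main attraction of carrying provenance this way rather than in a separate channel.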

Harry is happy to exchange telecons.

An ideal provenance framework has two parts: (1) an extensible vocabulary like OPM, expressed in a Semantic Web language, covering annotations, authors, etc.; (2) a versioning framework in which deletions, edits, changes, and baseline operations are tracked over time.

Propagation techniques for changes
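Part (2) of the framework above, a versioning layer that tracks deletions and edits over time, can be sketched as an append-only operation log per resource. Class and field names are hypothetical, and the sketch assumes integer logical timestamps that strictly increase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Operation:
    time: int                      # logical timestamp, assumed strictly increasing
    kind: str                      # "create", "edit" or "delete"
    agent: str                     # who performed the operation
    content: Optional[str] = None  # new content; None for "delete"


@dataclass
class VersionedResource:
    """Append-only history: every create, edit and delete is recorded, so the
    provenance of each change survives even after the content is gone."""

    history: list[Operation] = field(default_factory=list)

    def apply(self, op: Operation) -> None:
        if op.kind not in ("create", "edit", "delete"):
            raise ValueError(f"unknown operation kind: {op.kind}")
        if self.history and op.time <= self.history[-1].time:
            raise ValueError("operations must be appended in time order")
        self.history.append(op)

    def state_at(self, t: int) -> Optional[str]:
        """Content visible at time t: None before creation or after deletion."""
        state: Optional[str] = None
        for op in self.history:
            if op.time > t:
                break
            state = None if op.kind == "delete" else op.content
        return state

    def who_changed(self) -> list[tuple[int, str, str]]:
        """(time, kind, agent) for every recorded operation."""
        return [(op.time, op.kind, op.agent) for op in self.history]
```

Here a deletion is a tombstone: the log still says who deleted what and when. Whether the earlier content must also be purged from the log, as users in the privacy discussion may want, is a separate policy decision this sketch does not make.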

Yolanda: 3 different provenance concepts: Content, Management and Use

Management: named graphs, SPARQL, ...

META and link headers in profile pages as part of HTML: special RDF statements in Link headers that point to provenance

Add /META to a web URL to find the metadata for that page.
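A client-side sketch of the two discovery mechanisms above: look for an HTTP Link header entry pointing to provenance, and otherwise fall back to the "/META" URL convention. The relation name `provenance` is a hypothetical placeholder (the minutes do not fix one), the parser is deliberately simplified (no commas or semicolons inside quoted values), and HTML `<link>` elements are not covered.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Hypothetical relation name: the minutes do not fix one.
PROVENANCE_REL = "provenance"

# One "<target>; param; param" entry of a Link header (simplified grammar:
# assumes no commas or semicolons inside quoted parameter values).
LINK_ENTRY = re.compile(r"<([^>]*)>\s*((?:;\s*[^;,]+)*)")
REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";]+)"?')


def provenance_links(link_header: str, base_url: str) -> list[str]:
    """Absolute URLs of Link header entries whose rel includes PROVENANCE_REL."""
    found = []
    for target, params in LINK_ENTRY.findall(link_header):
        rel = REL_PARAM.search(params)
        if rel and PROVENANCE_REL in rel.group(1).split():
            found.append(urljoin(base_url, target))
    return found


def meta_url(page_url: str) -> str:
    """The '/META' convention: append /META to the page path,
    ignoring any query string or fragment."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/") + "/META"
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

Given the header `<http://example.org/prov/1>; rel="provenance", </next>; rel="next"`, `provenance_links` returns only the first target. If it returns nothing, a client could try `meta_url(page_url)` instead.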

Eran Hammer-Lahav, Yahoo; Jonathan Rees, Science Commons

are good people to talk to about the technology.

Dan Connolly, W3C

Next steps: coordinated calls. Invitation to send someone from xprov to the Social Web telecon to highlight provenance.

About 12 use cases are available in the Social Web group.

If both XGs speak with a consistent voice in their reports, it may be a strategic win within W3C.

It would be useful if xprov's use case 1 correctly captures the social web use case.

Calls are on Wednesdays, 11 AM ET.

Tentatively May 12, 11 AM ET, for the telecon interaction; 1-hour calls.


Architecture Diagram for Use Case 1 / Provenance Use

- Use

-- Understanding: the seal picture is derived from the provenance.

-- Use a rating system to create the seal from the provenance.

-- Diagram showing the reason for the rating, with the ability to navigate to more detail.

-- Collapsing the granularity of user operations into a single process.

-- Interop: navigating the provenance graph, possibly on a different website.

-- "W3C" icons to describe provenance; the vocabulary can drive the icons.

-- Accountability:

-- Verification integrated with the trust model of the blog aggregator. Ability to check license and copyright; check whether edits were made to the picture and, if so, whether they were legal. E.g. Getty Images: download, modify, re-upload with a Creative Commons license; XMP metadata embedded in images; verify the chain to establish confidence in the image copyright.

-- Recording and showing the decision-making process and the aggregation.
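The "use" items above (derive a seal from provenance, show the reasons for the rating, verify the chain for image copyright) can be sketched together: walk the derivation chain of an image, flag edits and relicensing that the incoming license does not allow, and summarize the result as a seal colour plus reasons. The license rule tables, seal colours, and `Step` fields are hypothetical and heavily simplified; real license compatibility is far more nuanced.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical, simplified rules: which outgoing licenses each incoming
# license permits, and which licenses permit modification at all.
ALLOWED_RELICENSING = {
    "all-rights-reserved": {"all-rights-reserved"},
    "CC-BY": {"CC-BY", "CC-BY-SA"},
    "CC-BY-SA": {"CC-BY-SA"},
}
ALLOWS_MODIFICATION = {"CC-BY", "CC-BY-SA"}


@dataclass
class Step:
    agent: str
    action: str               # e.g. "publish", "crop", "upload"
    license: Optional[str]    # license asserted at this step; None if unknown
    modifies: bool = False    # did this step edit the image?


def rate_chain(chain: list[Step]) -> tuple[str, list[str]]:
    """Return a seal colour ("green", "amber", "red") and the reasons for it.

    Amber: some step cannot be verified. Red: a step violates the rules.
    """
    if not chain:
        return "amber", ["no provenance recorded"]
    seal, reasons = "green", []
    current = chain[0].license
    if current is None:
        seal = "amber"
        reasons.append(f"{chain[0].agent}: original license unknown")
    for step in chain[1:]:
        where = f"{step.agent} ({step.action})"
        if current is None or step.license is None:
            reasons.append(f"{where}: cannot verify license")
            if seal == "green":
                seal = "amber"
        else:
            if step.modifies and current not in ALLOWS_MODIFICATION:
                reasons.append(f"{where}: edit not permitted under {current}")
                seal = "red"
            if step.license not in ALLOWED_RELICENSING.get(current, set()):
                reasons.append(f"{where}: {current} -> {step.license} not permitted")
                seal = "red"
        current = step.license
    return seal, reasons
```

For the Getty Images example (published all-rights-reserved, cropped, re-uploaded as CC-BY) this yields a red seal with two reasons, one per offending step. The reasons list is what a rating diagram could link to for more detail.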

Satya and Olaf are working on the vocabulary.


Ask for vocabulary-mapping volunteers on the Friday call.

Yogesh to summarize the BlogAgg use case and technical requirements on Friday.

Paul excused from call

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/04/26 16:35:03 $
