Use Case Retweets

From XG Provenance Wiki


Proposed by Paul Groth

(Curator: Satya Sahoo)

Provenance Dimensions

  • Primary: Republishing
  • Secondary: Interoperability, Responsibility, Understanding

Background and Current Practice (optional)

Many services now let users microblog, that is, publish short status messages about events, links, what they are doing, and so on. A canonical example is the service [Twitter]. Users post short public messages of up to 140 characters, called tweets. A phenomenon within Twitter is retweeting: reposting someone else's tweet as your own message. Initially, users denoted a retweet with the prefix RT followed by @user to attribute the message to the original user. For example:

RT @ivan_herman A good link to have for #linkeddata: Data Incubator
7:21 AM Nov 26th from TwitBird iPhone in reply to ivan_herman

However, the provenance of a tweet is eventually lost as people retweet retweets. Because of the 140-character limit, provenance metadata soon crowds out the actual content, and the original content of a tweet is sometimes lost. This spurred Twitter to introduce new functionality that allows users to retweet from within the site without including provenance metadata in the tweet itself.
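To make the problem concrete, here is a minimal Python sketch of how the attribution chain could be recovered from the manual RT convention, and how much of the message the chain consumes. It assumes each retweeter prepends exactly one "RT @user" marker; the helper name parse_retweet_chain and the users carol and bob are hypothetical, and real tweets follow the convention far less consistently.

```python
import re

# Matches one leading "RT @user" marker, optionally followed by a colon.
RT_PREFIX = re.compile(r"^RT\s+@(\w+):?\s*")

def parse_retweet_chain(text):
    """Split a tweet into (users named in leading RT markers, remaining content).

    Assumption: every retweeter prepends one "RT @user" marker, so the
    last user in the chain is the original author.
    """
    chain = []
    while True:
        match = RT_PREFIX.match(text)
        if match is None:
            break
        chain.append(match.group(1))
        text = text[match.end():]
    return chain, text

# Hypothetical retweet of a retweet of the example tweet above.
tweet = ("RT @carol RT @bob RT @ivan_herman "
         "A good link to have for #linkeddata: Data Incubator")
chain, content = parse_retweet_chain(tweet)

# Characters spent on provenance markers instead of content.
overhead = len(tweet) - len(content)
print(chain, overhead)
```

With only three levels of retweeting, 34 of the 140 available characters already go to attribution markers, which illustrates why the chain tends to be truncated by hand.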

Current practice tracks microblog messages only within a single website. See Twitter's retweet functionality.


Goal

Broadly, determine how content changes (i.e. its versions) across the web and who is responsible for those changes. In this specific instance, determine the original author and content of a microblog message (e.g. a tweet). Determine any changes, and the attribution of those changes, as the microblog message is reposted (e.g. retweeted). Do this across many different websites.

Use Case Scenario

The general use case is as follows:

  1. Alice retrieves (copies) content from Bob's website.
  2. Alice modifies this content and posts it to another website.
  3. Carol accesses the new website and wants to determine who is responsible for what part of its content.

Twitter Specific Use Case:

  1. Bob posts a message on Twitter, which is automatically cross-posted to Facebook.
  2. Alice, who is a friend of Bob's, reposts the message on Facebook, adding a comment.
  3. Carol sees Alice's message on Facebook and wants to know what the original message was, which part is comment, and who is responsible for which part of the message.
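The Twitter-specific scenario can be sketched as a small data model in Python. This is only an illustration under stated assumptions: the Post class, its fields, the trace helper, and the message texts are all hypothetical, and no existing service is known to expose provenance in this form.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Post:
    """One published message plus the minimal provenance needed to trace it."""
    author: str
    site: str
    text: str                         # text contributed by this author only
    source: Optional["Post"] = None   # the post this one was derived from
    action: str = "original"          # "original", "cross-post", or "repost"

def trace(post):
    """Walk back to the original post; return steps ordered oldest first."""
    steps = []
    current = post
    while current is not None:
        steps.append((current.author, current.site, current.action, current.text))
        current = current.source
    return list(reversed(steps))

# Step 1: Bob posts on Twitter; the message is cross-posted to Facebook.
bob_tweet = Post("Bob", "twitter.com", "A good link for #linkeddata")
bob_fb = Post("Bob", "facebook.com", "", source=bob_tweet, action="cross-post")

# Step 2: Alice reposts on Facebook, adding her own comment.
alice_fb = Post("Alice", "facebook.com", "Worth a read!",
                source=bob_fb, action="repost")

# Step 3: Carol asks what the original was and who contributed what.
history = trace(alice_fb)
original_author, _, _, original_text = history[0]
for author, site, action, text in history:
    print(f"{author} on {site} ({action}): {text!r}")
```

Keeping each author's contribution in its own record, rather than merging it into one string, is what lets Carol separate the comment from the original message. It also distinguishes a cross-post (no new text) from a repost that adds a comment.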

Problems and Limitations

Current solutions only work inside one website. It is difficult to keep track of which part of a message is comment and which is original. There is also a distinction between editing a post and just passing it on.

Technical Challenges:

  • How to track alterations to Web content outside of a web browser.
  • Identifying users that create and modify content. Do you need signatures, OpenIDs, or just a URL?
  • Aggregating changes from multiple websites requires agreed-upon representations.
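As an illustration of the last challenge, the following Python sketch shows what aggregation could look like if sites agreed on a shared record format. Everything here is an assumption: the JSON field names (id, author, derived_from), the example URLs, and the choice of plain URLs to identify users and posts are hypothetical, not an existing standard.

```python
import json

# Hypothetical exports from two sites using one assumed record format.
twitter_export = json.dumps([
    {"id": "https://twitter.example/bob/1",
     "author": "https://bob.example/", "derived_from": None},
])
facebook_export = json.dumps([
    {"id": "https://facebook.example/bob/7",
     "author": "https://bob.example/",
     "derived_from": "https://twitter.example/bob/1"},
    {"id": "https://facebook.example/alice/9",
     "author": "https://alice.example/",
     "derived_from": "https://facebook.example/bob/7"},
])

def merge_exports(*exports):
    """Index the records from every site by their URL identifier."""
    records = {}
    for export in exports:
        for record in json.loads(export):
            records[record["id"]] = record
    return records

def lineage(records, record_id):
    """Follow derived_from links, newest first, across site boundaries."""
    chain = []
    seen = set()
    while record_id is not None and record_id not in seen:
        seen.add(record_id)  # guard against cyclic links
        record = records.get(record_id)
        if record is None:
            break  # the chain leaves the set of sites that were aggregated
        chain.append((record["id"], record["author"]))
        record_id = record["derived_from"]
    return chain

records = merge_exports(twitter_export, facebook_export)
chain = lineage(records, "https://facebook.example/alice/9")
```

The lookup stops when a derived_from link points to a site that did not export its records, which is exactly where a chain breaks today once content leaves a single website.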

Unanticipated Uses (optional)

I think the same approach could be used for both this use case and Use_Case_Provenance_in_Blogosphere.

Existing Work (optional)

An excellent discussion of the issues with retweeting and the introduction of retweet functionality can be read in a blog post by Evan Williams: Why Retweet works the way it does