Use Case Identifying Attribution And Associations

From XG Provenance Wiki


Yolanda Gil


Paolo Missier

Provenance Dimensions

  • Primary: Attribution (Content)
  • Secondary: Access (Management), Trust (Use)

Background and Current Practice

This use case is inspired by the Trellis project, where we studied how to capture the analysis process when contradictory and complementary information is available. The analysis process is captured as argumentation structures that refer to source attribution and trust metrics.

Millions of people consult web resources daily and painstakingly analyze information that is complementary and often contradictory. As they perform this analysis, they look carefully at the attribution of the information, that is, the entities involved in providing it: the writer (e.g., Ty Burr), the publication (e.g., The Boston Globe), the owner (e.g., the New York Times Company), the origin of the information (e.g., the person being quoted), etc. Some information providers are generally considered more trustworthy or reliable than others (e.g., a newspaper of record). Some sources are considered authoritative on specific topics. Some sources are preferred to others depending on the specific context of use of the information (e.g., student travelers may prefer cheaper travel sources, while business people may prefer more reliable ones). Some sources are preferred simply because they are known to keep their information up to date.

However, all this work that millions of people do every day is lost, i.e., not captured on the web. Therefore, we all must start from scratch and rely on the rankings of search engines and our own limited expertise to do any task on the web. Users cannot easily access information about who is providing the information they are seeing. Worse yet, machines cannot access that information either, so they cannot assist users by reasoning about attribution and assessing sources.


For any resource on the web, users could look up its attribution, i.e., the sources or entities that were involved in creating the resource.

Similarly, tools could be developed to do the same and assist users by reasoning about attribution and providing assessments about the entities involved in producing that information.

Use Case Scenario

A user finds a document on the web that quotes a New York Times article, sourced from the Reuters agency, that contains the statement "At a press conference last Monday, a US Federal Reserve spokesperson reiterated that its chairman was not planning to raise the current interest rates". The user may decide whether to believe (use) this information because of one or more of the entities that created it: the NYT, Reuters, the Fed spokesperson, or its chairman. The user would first have to find this set of attributions, and then use some criteria to discern whether to believe it. Some users may have stable criteria for this kind of assessment, e.g., always believe what the NYT publishes, never believe spokespersons for the US Federal Reserve.
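The scenario above can be made concrete with a small sketch. It assumes a hypothetical `Attribution` record and an `assess` function, neither of which comes from Trellis or any existing tool; the role labels are also illustrative. It shows how a user's stable criteria might be applied to the set of attributions once that set has been found.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribution:
    """One entity involved in producing a statement (hypothetical model)."""
    entity: str
    role: str  # illustrative label, not a standard vocabulary


# Attribution set for the quoted statement in the scenario.
statement_attributions = [
    Attribution("New York Times", "publisher"),
    Attribution("Reuters", "agency"),
    Attribution("US Federal Reserve spokesperson", "origin"),
    Attribution("US Federal Reserve chairman", "subject"),
]

# One user's stable criteria, taken from the examples in the text.
ALWAYS_BELIEVE = {"New York Times"}
NEVER_BELIEVE = {"US Federal Reserve spokesperson"}


def assess(attributions, always, never):
    """Apply stable criteria to a set of attributions."""
    entities = {a.entity for a in attributions}
    trusted = entities & always
    distrusted = entities & never
    if trusted and distrusted:
        return "conflict"   # criteria disagree; the user must decide
    if trusted:
        return "believe"
    if distrusted:
        return "dismiss"
    return "undecided"      # no criterion applies


verdict = assess(statement_attributions, ALWAYS_BELIEVE, NEVER_BELIEVE)
```

With both example criteria active, the two rules pull in opposite directions for this statement, which is one reason a user may need the further support described next.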

In some cases, the user may not be very knowledgeable on the topic. When this is the case, they would want to know for example what other people consider to be the reliability of the entities that the user is currently trying to assess. This could potentially result in queries to some repository regarding what criteria others used before to dismiss or use information from the sources currently being assessed.
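As a rough illustration of such a query, the sketch below assumes a hypothetical in-memory repository of past assessments; the `PastAssessment` record, the `criteria_for` function, and the sample records are all made up for illustration and do not describe any existing service.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PastAssessment:
    """A recorded decision by some earlier user (hypothetical model)."""
    entity: str     # the source that was assessed
    decision: str   # "use" or "dismiss"
    criterion: str  # reason recorded by the assessor


def criteria_for(repository, entities):
    """Tally which criteria others used to use or dismiss the given entities."""
    tallies = defaultdict(Counter)
    for record in repository:
        if record.entity in entities:
            tallies[record.entity][(record.decision, record.criterion)] += 1
    # Most frequently used criteria first, per entity.
    return {entity: counts.most_common() for entity, counts in tallies.items()}


# Illustrative sample records only.
repository = [
    PastAssessment("Reuters", "use", "keeps information up to date"),
    PastAssessment("Reuters", "use", "keeps information up to date"),
    PastAssessment("Reuters", "dismiss", "not authoritative on this topic"),
    PastAssessment("New York Times", "use", "newspaper of record"),
]

summary = criteria_for(repository, {"Reuters"})
```

Here `summary` lists, for each requested entity, the (decision, criterion) pairs recorded by others together with how often each was used.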

Other times, users may be knowledgeable enough to have their own criteria for assessing sources but may simply not find the information credible. In these cases, the user would want to know how many other independent sources can confirm this information. This could result in queries to retrieve similar assertions but exclude any resource that has similar attribution (same writer but maybe different newspaper would be considered similar attribution, while same newspaper but different spokesperson would be considered different).
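The exclusion rule in this paragraph can be sketched as a filter. The `Source` record and both functions are hypothetical, and the similarity test is only one possible reading of the examples: it treats two resources as having similar attribution when they share a writer or a spokesperson, and ignores the newspaper. The names in the sample data are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """Attribution of a resource making an assertion (hypothetical model)."""
    writer: str
    newspaper: str
    spokesperson: str


def similar_attribution(a, b):
    """Assumed rule: a shared writer or spokesperson means similar attribution.

    Sharing only the newspaper does not count as similar.
    """
    return a.writer == b.writer or a.spokesperson == b.spokesperson


def independent_confirmations(target, candidates):
    """Keep candidates whose attribution is not similar to the target's."""
    return [c for c in candidates if not similar_attribution(target, c)]


target = Source("writer-1", "paper-A", "spokesperson-X")
candidates = [
    Source("writer-1", "paper-B", "spokesperson-Y"),  # same writer: excluded
    Source("writer-2", "paper-A", "spokesperson-Y"),  # same paper only: kept
    Source("writer-3", "paper-C", "spokesperson-X"),  # same spokesperson: excluded
]

confirmations = independent_confirmations(target, candidates)
```

In this sample only the second candidate remains, so the user would see one independent confirmation. A fuller version would also need to decide which assertions count as "similar", which this sketch leaves out.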

Finally, attribution may not be the sole basis for making these kinds of assessments. Other kinds of associations may be used as well. Examples include: cited-by, endorsed-by, criticized-by, opposed-to, financed-by, etc.

Problems and Limitations

It is unclear whether all the associations used to assess sources are based on provenance. For example, if a document was financed by an entity, it is unclear whether that entity was really involved in producing the information or has any responsibility for it.

Other kinds of associations have to be carefully represented. Consider, for example, a Web resource that recommends a set of readings in the history of astronomy and is maintained by an astronomy department on a university Web site. If the Web page is authored by a faculty member in the astronomy department, then a user would attribute the information to the university, the department, and the authoring professor. If the Web page is authored by a student on a temporary internship, who happens to like astronomy as a hobby, the user would not put as much weight on the association of the resource with the astronomy department or the university. This example illustrates that automatic association with entities can be tricky.

Existing Work

[Gil and Ratnakar ISWC02] describe an approach to enable users to express their assessment of complementary and contradictory sources of information. As the user considers information from different sources relevant to their purpose, they can view the ratings that other users assigned to the entities involved, and use those ratings to assess the information at hand. Sources were assigned a reliability rating, and individual sources could be selected to express the criteria used to accept or dismiss information. The user could also assign credibility ratings based on other information available.