Use Case Simple Trustworthiness Assessment

From XG Provenance Wiki


Olaf Hartig


Paolo Missier

Provenance Dimensions

  • Primary:
    • Use: Trust (Information Quality)
  • Secondary:
    • Content: Attribution (Responsibility), Process (Data Access), Evolution and versioning (Republishing)
    • Management: Publication, Access
    • Use: Understanding

Background and Current Practice

Trustworthiness of an information object such as a data item is the subjective belief or disbelief in the truth of the information represented by the object. The trustworthiness of information objects is often used to filter them or to make decisions while processing them.

While trust and trustworthiness have been studied extensively in the context of active entities such as persons, agents, or peers, little work exists that studies trustworthiness as an information quality criterion. Hence, computer systems that use the trustworthiness of information objects for filtering or decision making usually apply a very simple assessment approach: the object is related to some kind of source for which a trust score can be determined using one of the methods that exist for active entities, and this score is then adopted as the trustworthiness of the information object.
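The simple assessment approach described above can be sketched in a few lines. This is a minimal illustration, assuming that the relation between objects and sources and the trust scores of those sources are already available as plain dictionaries; the function name `object_trustworthiness` and the example data are hypothetical and not part of any system discussed here.

```python
from typing import Dict, Optional


def object_trustworthiness(
    obj: str,
    object_source: Dict[str, str],
    source_trust: Dict[str, float],
) -> Optional[float]:
    """Adopt the trust score of an object's source as the object's trustworthiness.

    Returns None if the object cannot be related to a source, or if no
    trust score is known for that source.
    """
    source = object_source.get(obj)
    if source is None:
        return None
    return source_trust.get(source)


# Hypothetical example data: the source each object is related to, and trust
# scores for those sources (assumed to be computed elsewhere with a trust
# metric for active entities).
object_source = {"item1": "carol", "item2": "bob", "item3": "dave"}
source_trust = {"carol": 0.9, "bob": 0.2}

for item in ("item1", "item2", "item3", "item4"):
    print(item, object_trustworthiness(item, object_source, source_trust))
```

Returning `None` instead of a default score keeps "unknown" distinct from "untrusted"; an application would still have to decide how to treat such objects.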


The goal is to enable users of information objects such as Web data to assess the trustworthiness of these objects in order to make informed decisions on their fitness for use.

This use case discusses provenance information as a means to enable such trustworthiness assessments.

Use Case Scenario

Alice operates facilities that provide information objects to users, and various information providers use these facilities to publish their information. A user, Eve, consumes the provided information. However, Eve does not want to use information objects that are not trustworthy enough, even if they are relevant to her task. Since Eve does not want to check and verify all information herself, she decides to consider trustworthy those information objects that originate from trusted providers. Based on this simple method, Eve assesses the trustworthiness of relevant information objects and ignores unsuitable objects accordingly.

This general scenario is motivated by the need to apply trustworthiness-based filtering methods in applications that consume Linked Data from the Web. In a Linked Data-specific instance of the general scenario, Alice may operate a Linked Data server on which she publishes a dataset that includes RDF links to other datasets. Some of these links are provided by other parties, because Alice allows others to upload relevant linksets to her server. Bob and Carol take this opportunity and use Alice's server to publish linksets to their own datasets as part of Alice's dataset. Eve uses a Linked Data-based application that accesses and processes data from Alice's dataset. While processing this data, the application discovers several RDF links that seem to point to further relevant data and are therefore worth following. However, the application knows that Eve does not trust Bob and therefore decides to ignore the RDF links provided by Bob.
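One way the application could decide which links to follow is sketched below. It assumes, hypothetically, that Alice publishes each uploaded linkset in its own named graph and records which provider contributed each graph; the graph URIs, the `graph_provider` mapping, and the `links_to_follow` function are illustrative assumptions, not part of Alice's actual server or any Linked Data library.

```python
from typing import Dict, Iterable, List, NamedTuple, Set


class Quad(NamedTuple):
    """An RDF triple plus the named graph (here: the linkset) it was published in."""
    s: str
    p: str
    o: str
    graph: str


OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def links_to_follow(
    quads: Iterable[Quad],
    graph_provider: Dict[str, str],
    distrusted: Set[str],
    link_predicates: Set[str],
) -> List[str]:
    """Return targets of RDF links whose linkset was not provided by a distrusted party.

    Links from graphs with no recorded provider are ignored as well.
    """
    targets = []
    for q in quads:
        if q.p not in link_predicates:
            continue  # not an RDF link we consider following
        provider = graph_provider.get(q.graph)
        if provider is None or provider in distrusted:
            continue  # unattributed, or attributed to a distrusted provider
        targets.append(q.o)
    return targets


# Hypothetical contents of Alice's dataset, one named graph per linkset.
quads = [
    Quad("http://alice.example/r1", OWL_SAME_AS, "http://bob.example/r1",
         "http://alice.example/linksets/bob"),
    Quad("http://alice.example/r1", OWL_SAME_AS, "http://carol.example/r1",
         "http://alice.example/linksets/carol"),
    Quad("http://alice.example/r1", RDFS_LABEL, "R1", "http://alice.example/data"),
]
# Hypothetical provenance metadata: which provider contributed each graph.
graph_provider = {
    "http://alice.example/linksets/bob": "bob",
    "http://alice.example/linksets/carol": "carol",
    "http://alice.example/data": "alice",
}

print(links_to_follow(quads, graph_provider, distrusted={"bob"},
                      link_predicates={OWL_SAME_AS}))
```

Treating unattributed links as untrusted is a deliberately conservative choice; an application could equally fall back to its trust in Alice as the publisher.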

In addition to trustworthiness assessment of Linked Data, many other specializations of the general scenario can be considered (e.g. trustworthiness of blog posts in a feed aggregator, trustworthiness of photos uploaded to a news portal, trustworthiness of soccer match results reported via a sports platform).

Problems and Limitations

The main technical challenges of this use case are:

  • Alice must associate the provided information with provenance-related metadata. This must include information which allows Eve to attribute the different information objects retrieved from Alice to the original providers. This is a provenance content and management issue.
  • The different kinds of sources that participate in the scenario must be taken into account when representing provenance to support trustworthiness assessment as outlined. While Alice is the source from which the information has been retrieved, she is not the original provider of all the information. Nonetheless, the fact that Alice controls the providing service may also be relevant for trustworthiness assessments, because it gives her the opportunity to manipulate the information. This is a provenance content and use issue.
  • If an application filters information objects as the result of provenance-based trustworthiness assessments, the user should be able to understand the decisions made. This is a provenance use issue.
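The second and third challenges above can be illustrated together with a small sketch: a provenance record that distinguishes the original provider from the service the object was retrieved from, and an assessment that returns an explanation alongside its decision. The `Provenance` record, the `assess` function, and especially the rule of taking the minimum of both trust scores are assumptions made for illustration only; the use case does not prescribe how the two kinds of sources should be combined.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Provenance:
    provided_by: str     # original provider of the object (e.g. Bob or Carol)
    retrieved_from: str  # service the object was retrieved from (e.g. Alice)


def assess(prov: Provenance, trust: Dict[str, float],
           threshold: float) -> Tuple[bool, str]:
    """Decide whether to accept an object and explain the decision.

    Assumption: an object is only as trustworthy as the least trusted party
    that could have produced or manipulated it, so the minimum is used.
    Parties without a known trust score are treated as fully untrusted (0.0).
    """
    t_provider = trust.get(prov.provided_by, 0.0)
    t_publisher = trust.get(prov.retrieved_from, 0.0)
    score = min(t_provider, t_publisher)
    accepted = score >= threshold
    # Name the party that limited the score, so the user can see why.
    weakest = prov.provided_by if t_provider <= t_publisher else prov.retrieved_from
    verdict = "accepted" if accepted else "rejected"
    explanation = (f"{verdict}: trust {score:.2f} vs threshold {threshold:.2f}; "
                   f"limited by {weakest}")
    return accepted, explanation


# Hypothetical trust scores held by Eve's application.
trust = {"alice": 0.8, "bob": 0.2, "carol": 0.9}
print(assess(Provenance("carol", "alice"), trust, 0.5))
print(assess(Provenance("bob", "alice"), trust, 0.5))
```

Other combination rules (e.g. weighting the original provider more heavily) are equally plausible; the point of the sketch is only that both sources appear in the provenance record and in the explanation.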

Existing Work

Hartig ESWC09 describes tSPARQL, a trust-aware extension of the query language SPARQL. tSPARQL allows trust requirements to be described in SPARQL queries. Using tSPARQL, an application can filter (intermediate) solutions for graph patterns in SPARQL queries based on the trustworthiness of the data from which the solutions originate. The tRDF4Jena library provides a query engine for tSPARQL.
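The idea of filtering intermediate solutions by trustworthiness can be illustrated with a toy evaluator for basic graph patterns over trust-annotated triples. This is not tSPARQL syntax and not the tRDF4Jena API; the functions `match` and `evaluate` are hypothetical, and using the minimum of the matched triples' trust values as a solution's trust is an assumption chosen for simplicity. Because the minimum can only decrease as more patterns are joined, discarding a low-trust intermediate solution early cannot remove a final solution that would have met the threshold.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`; variables start with '?'."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant term does not match
    return new


def evaluate(patterns: List[Triple], data: Dict[Triple, float],
             min_trust: float) -> List[Tuple[Binding, float]]:
    """Evaluate a basic graph pattern, keeping only sufficiently trusted solutions."""
    solutions: List[Tuple[Binding, float]] = [({}, 1.0)]
    for pattern in patterns:
        extended = []
        for binding, trust in solutions:
            for triple, t in data.items():
                b = match(pattern, triple, binding)
                if b is not None:
                    combined = min(trust, t)  # assumed trust aggregation
                    if combined >= min_trust:  # prune intermediate solutions
                        extended.append((b, combined))
        solutions = extended
    return solutions


# Hypothetical trust-annotated data.
data = {
    ("ex:a", "ex:knows", "ex:b"): 0.9,
    ("ex:b", "ex:name", "Bob"): 0.3,
    ("ex:a", "ex:knows", "ex:c"): 0.8,
    ("ex:c", "ex:name", "Carol"): 0.7,
}
patterns = [("ex:a", "ex:knows", "?x"), ("?x", "ex:name", "?n")]
print(evaluate(patterns, data, 0.5))
```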