Use Case private data use

From XG Provenance Wiki


Rocio Aldeco-Perez and Luc Moreau (contact)

(Curator: Simon Miles)

Provenance Dimensions

  • Primary: Accountability (use)
  • Secondary: Dissemination control - Law Enforcement (use), Process (content)

Background and Current Practice Scenario

Many on-line facilities offer personalised services by requesting private information from their users. Such private information must be used under a set of rules that describe which processing can and cannot be performed over such data. If these usage rules are not followed, personal data could be exposed and used against the interests of its owner.

Evidence of the importance of this issue can be seen in legislative frameworks related to the use of private information, such as the Data Protection Act in the UK, the European Directive on Private Data, and HIPAA and Safe Harbor in the US.


Here, we adopt Weitzner's notion of accountability [1]: "accountability must become a primary means through which society addresses appropriate use..." Information accountability means the use of information should be transparent so it is possible to determine whether a particular use is appropriate under a given set of rules, and that the system enables individuals and institutions to be held accountable for misuse.

The goal of this use case is to audit previous usage of private data and check that such usage complies with the rules regulating the use of private information. Inspired by the UK Data Protection Act, we identified the following specific tasks:

  • Legal Purpose: To verify that a set of data was processed for a valid purpose.
  • Declared Purpose Compliant: To verify that a set of data was used in processing that is compatible with the purpose for which it was collected.
  • Authentication: To verify that a set of data, which was collected from a user, was used by processes that initiated such collection.
  • Minimal Set: To verify that all the data that was collected from a user was used at some point.
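The four tasks above can be read as predicates over recorded usages of a data set. The following is a minimal sketch under stated assumptions, not the authors' method: the `Usage` record, its field names, and the `compatible` mapping between purposes are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Usage:
    """One recorded use of collected data (hypothetical record shape)."""
    process: str            # process that used the data
    purpose: str            # purpose the process served
    data_items: frozenset   # data items the process read


def legal_purpose(usages, valid_purposes):
    """Legal Purpose: every usage served a purpose from the valid set."""
    return all(u.purpose in valid_purposes for u in usages)


def declared_purpose_compliant(usages, declared_purpose, compatible):
    """Declared Purpose Compliant: every usage purpose is compatible
    with the purpose declared at collection time."""
    allowed = compatible.get(declared_purpose, set())
    return all(u.purpose in allowed for u in usages)


def authenticated(usages, collecting_processes):
    """Authentication: only processes that initiated the collection used the data."""
    return all(u.process in collecting_processes for u in usages)


def minimal_set(usages, collected_items):
    """Minimal Set: every collected item was used at least once."""
    used = set().union(*(u.data_items for u in usages))
    return set(collected_items) <= used
```

Each predicate answers one audit question independently, so an auditor could report which specific rule a data controller failed rather than a single pass or fail result.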

Use Case Scenario

The general scenario structure is as follows.

1. Alice wants to interact with an online service. In order to do so, she needs to provide personal information.

2. The online service uses that personal information for a particular, pre-stated purpose.

3. Later, Alice suspects that the personal information was used in a way other than the pre-stated purpose.

4. Upon request, an independent authority determines that Alice's doubts are founded and performs an equivalent check across many individuals who have used the service.

This structure can be applied to the particular domain below, which inspired this use case.

1. Alice wants to buy some medicine from an on-line pharmacy. In order to get her medicine, she needs to provide her name, address, date of birth, gender, social security number, the number of her clinic and her doctor’s name.

2. The pharmacy collects that set of data for the purpose of "on-line sales". Her name, address, date of birth, social security number, the number of her clinic and her doctor’s name are used to register the sale of that medicine with the Health Service. The name and address are used to send the medicine to Alice.

3. Later, the pharmacy creates a record of the monthly sales, which includes the medicine’s name and the quantity sold.

What if the pharmacy decides to include Alice's name next to the medicine she bought in the record of monthly sales? Alice did not provide her name to be used in a record that could identify specific individuals suffering from illnesses related to the medicines they bought. How can Alice be sure that her information was used in a way compatible with the purpose for which she initially sent it?

In practice, independent institutions, such as the Information Commissioner in the UK, carry out audits to verify that individuals or institutions that manage personal information are following the data protection rules, so that they can be held accountable for information misuse.

If the pharmacy creates a register containing the information that it plans to collect from Alice, the processes to be performed over it and the purpose of such information collection, then we can use that register as a set of rules that the pharmacy should follow when using Alice’s information.

If, at the same time, Alice and the pharmacy assert provenance information about their actions, that provenance information can later be compared against the registered set of rules to verify whether the pharmacy used Alice’s information in the right way.

Thus, if the pharmacy registers the creation of a record of monthly sales that includes the medicines' names and the quantities sold under the on-line sales purpose, then it can create the record but it cannot include Alice's name in it. If the pharmacy does so regardless, this can be discovered by checking the provenance information related to that activity, and the pharmacy can then be held accountable for misusing Alice’s information.
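The comparison between the pharmacy's register and its asserted provenance might be sketched as follows. This is an illustration only: the register layout (purpose, then process, then permitted data items), the process names, and the shape of a provenance assertion are assumptions made for this example and are not defined by the use case.

```python
# Hypothetical register: purpose -> process -> data items the process may use.
REGISTER = {
    "on-line sales": {
        "register_sale": {
            "name", "address", "date_of_birth",
            "social_security_number", "clinic_number", "doctor_name",
        },
        "ship_medicine": {"name", "address"},
        "monthly_sales_record": {"medicine_name", "quantity"},
    }
}


def find_violations(register, provenance):
    """Return (process, reason) pairs for assertions that break the register."""
    violations = []
    for assertion in provenance:
        allowed = register.get(assertion["purpose"], {}).get(assertion["process"])
        if allowed is None:
            violations.append(
                (assertion["process"], "process not registered for this purpose")
            )
            continue
        extra = set(assertion["used"]) - allowed
        if extra:
            violations.append(
                (assertion["process"], "unregistered items: " + ", ".join(sorted(extra)))
            )
    return violations


# Hypothetical provenance assertions: the monthly record wrongly uses Alice's name.
PROVENANCE = [
    {"process": "ship_medicine", "purpose": "on-line sales",
     "used": {"name", "address"}},
    {"process": "monthly_sales_record", "purpose": "on-line sales",
     "used": {"medicine_name", "quantity", "name"}},
]

for process, reason in find_violations(REGISTER, PROVENANCE):
    print(process, "->", reason)  # monthly_sales_record -> unregistered items: name
```

Shipping the medicine passes because it stays within its registered items, while the monthly sales record is flagged because it used an item (`name`) that was not registered for that process.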

Many alternative on-line scenarios can be considered in this use case, such as universities, Facebook, Google, governmental services, etc.

Problems and Limitations

The main technical challenges in this use case are described here.

  • Institutions or individuals that manage personal information (in this case the pharmacy) should register, in a well-defined fashion, the purposes for which and the way in which they plan to collect and use users’ information. This process is similar to the Notification Process established by the Information Commissioner's Office (see [2]). This registered information will be treated as the rules that such institutions should follow while processing personal information. An example of the document produced during the notification by a pharmacy can be found in [3]. This problem can be addressed by using semantic web technologies to represent purposes of collection, tasks performed over users' information and the set of information that will be collected from users. This is a metadata representation issue.
  • All the entities involved (in this case the on-line pharmacy and Alice) need to capture in a standard way the provenance information related to their actions. In that way, the analysis of the actions of the entities can be automated. This is a provenance content and management issue.
  • To effectively make entities accountable for misuse of information, we need to guarantee that the provenance information created by the involved entities implements some form of entity identification and provenance integrity. Then, if a problem is found in the processing of personal information, the right entity can be made accountable by checking its identity. At the same time, if provenance integrity is guaranteed, entities can be sure that the actions they asserted are represented in the provenance information and that no other entity was able to change it. This problem can be addressed by the use of cryptographic techniques, such as signatures to verify the entities’ identity and cryptographic hashes to check the integrity of provenance chains. This is a provenance content and management issue.
  • Provenance information created by the entities involved in processing can be compared against the registered rules to verify whether they used personal information in the right way. This is a provenance use issue.
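The entity identification and integrity requirements above can be illustrated with a hash-chained log. This is a minimal sketch under stated assumptions: the record layout and function names are hypothetical, and an HMAC with a per-entity secret key stands in for the signatures mentioned above, since the Python standard library has no asymmetric signature support. A real deployment would more likely use public-key signatures so that an auditor does not need the entities' secrets.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder hash that precedes the first record


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_record(chain, actor, action, key):
    """Append an assertion linked to the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    body = {"actor": actor, "action": action, "prev": prev_hash}
    record_hash = _digest(body)
    # The tag binds the record to the actor's key (stand-in for a signature).
    tag = hmac.new(key, record_hash.encode("utf-8"), hashlib.sha256).hexdigest()
    chain.append({**body, "hash": record_hash, "tag": tag})


def verify_chain(chain, keys):
    """Check the hash links, the record hashes, and each actor's tag."""
    prev_hash = GENESIS
    for rec in chain:
        body = {"actor": rec["actor"], "action": rec["action"], "prev": rec["prev"]}
        if rec["prev"] != prev_hash or _digest(body) != rec["hash"]:
            return False
        key = keys.get(rec["actor"])
        if key is None:
            return False
        expected = hmac.new(key, rec["hash"].encode("utf-8"), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["tag"]):
            return False
        prev_hash = rec["hash"]
    return True
```

Because each record's hash covers the previous record's hash, altering or removing an earlier assertion invalidates every later link, and the per-actor tag ties each assertion to the entity that made it.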

Existing Work (optional)

Aldeco-Pérez, R. & Moreau, L. Provenance-based Auditing of Private Data Use. International Academic Research Conference, Visions of Computer Science, 2008 [4]