RE: ACTION-406: Propose a new set of names around yellow state

Following up on the call today, here are the notes that I put forward
for the minutes (too much to paste into IRC).

PII:
This standard refers to the ISO 29100 (privacy framework) definition
of personally identifiable information (PII):
any information that (a) can be used to identify the PII principal to
whom such information relates, or (b) is or might be directly or
indirectly linked to a PII principal.
NOTE To determine whether a PII principal is identifiable, account
should be taken of all the means which can reasonably be used by the
privacy stakeholder holding the data, or by any other party, to
identify that natural person.

Linkability:
Linkability is the ability to add new data to previously collected
data.

Identifiability:
Linkable is not the same as identifiable. To determine whether data is
identifiable, account should be taken of all the means which can
reasonably be used by the privacy stakeholder holding the data, or by
any other party, to identify that natural person.

De-identification:
De-identification is a process towards anonymization.

De-identified data:
De-identified data is data that is not linked or
reasonably linkable to an individual or to a particular
computer or device.

To accomplish de-identification, I propose a 3-step model.
The mental model contains 3 types of data: red, yellow and green data.

The RED state data may contain both kinds of information from the PII
definition above, (a) and (b). In order to go from the red state to the
yellow state, directly identifiable information MUST be removed, e.g.
an email address or a phone number.
The YELLOW state data is partly de-identified, and MAY contain
information indirectly linked to an individual, computer or device,
e.g. a partly de-identified but still linkable unique identifier,
such as a hashed pseudonym.
The GREEN state data is fully de-identified data and SHOULD NOT contain
personally identifiable information (PII). Any risk of
re-identification of fully de-identified data MUST be regularly
assessed and mitigated through Privacy Risk Management.
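
As a rough illustration of the three states, here is a minimal Python
sketch that labels a record by the kinds of fields it carries. The field
names (email, phone, hashed_pseudonym) and the function data_state are
hypothetical, chosen only for this example. A "green" label here only
means that none of the listed fields are present; it does not replace
the regular re-identification risk assessment.

```python
# Hypothetical field names, used only for illustration.
DIRECT_IDENTIFIERS = {"email", "phone"}
INDIRECT_IDENTIFIERS = {"hashed_pseudonym"}


def data_state(record):
    """Return "red", "yellow" or "green" for a record given as a dict."""
    fields = set(record)
    if fields & DIRECT_IDENTIFIERS:
        # Directly identifiable information is still present.
        return "red"
    if fields & INDIRECT_IDENTIFIERS:
        # Only indirectly linked information remains.
        return "yellow"
    return "green"
```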

In order to move from red to yellow, or from yellow to green, one
needs to process the data. There are multiple ways to
do that:

1. One example is based on concatenating a random number to the
unique ID. This results in a lookup table of unique ID <-> random
number.
Getting from yellow to green is breaking the link (un-linkability) by
throwing away the unique ID. No new data can be linked to the
un-linkable data in the green.

2. Another example is based on rotating hashes. Getting from red to
yellow is applying the hash. Getting from yellow to green is breaking
the link (un-linkability) by throwing away the salt. No new red data
can be linked to the un-linkable data in the green.
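
To make the two examples above concrete, here is a minimal Python
sketch of both. It is an illustration under assumptions, not a
reference implementation: the class and field names (LookupTable,
RotatingHash, unique_id, email, phone, pseudonym) are hypothetical,
and HMAC-SHA-256 is my choice of salted hash for example 2.

```python
import hashlib
import hmac
import secrets

DIRECT_IDENTIFIERS = {"email", "phone"}  # hypothetical field names


def strip_direct(record):
    """Drop direct identifiers and the raw unique ID from a red record."""
    return {k: v for k, v in record.items()
            if k != "unique_id" and k not in DIRECT_IDENTIFIERS}


class LookupTable:
    """Example 1: unique ID <-> random number lookup table."""

    def __init__(self):
        self._table = {}

    def red_to_yellow(self, record):
        uid = record["unique_id"]
        if uid not in self._table:
            self._table[uid] = secrets.token_hex(16)
        return {**strip_direct(record), "pseudonym": self._table[uid]}

    def yellow_to_green(self):
        # Throwing away the table breaks the link to the unique IDs.
        self._table.clear()


class RotatingHash:
    """Example 2: salted hash whose salt is thrown away on rotation."""

    def __init__(self):
        self._salt = secrets.token_bytes(32)

    def red_to_yellow(self, record):
        digest = hmac.new(self._salt, record["unique_id"].encode("utf-8"),
                          hashlib.sha256).hexdigest()
        return {**strip_direct(record), "pseudonym": digest}

    def yellow_to_green(self):
        # With a fresh salt, old pseudonyms can no longer be recomputed
        # from a unique ID.
        self._salt = secrets.token_bytes(32)
```

In both cases, records processed before yellow_to_green() share one
pseudonym per unique ID and so remain linkable. Records processed
afterwards get a different pseudonym, so new red data can no longer be
attached to the data that is already in the green.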

In terms of unlinkability versus de-identification it remains important
to separate the two concepts:
- de-identification helps in the event of a data breach, when a dataset
is out on the street. It is a way to address the reasonable
requirements of an adequate level of protection.
- an adequate level of protection is completely different from
unlinkability. Unlinkability is connected to the notion of personally
identifiable information.

Received on Wednesday, 29 May 2013 16:28:56 UTC