RE: ACTION-406: Propose a new set of names around yellow state

Rob,

I'm not very supportive of the ISO definition in this regard but let's leave that definition alone for the moment.

I follow your description fully until you introduce a net new term in the description: "anonymization".  Why do this?  If you avoid this and use your definition of "identifiability", our definitions are far closer (which I believe is a logical conclusion based on your original definition of identifiability).

Thoughts?

-----
<unchanged>
Identifiability:
Linkable is not the same as identifiable. To determine whether data is identifiable, account should be taken of all the means which can reasonably be used by the privacy stakeholder holding the data, or by any other party, to identify that natural person.
</unchanged>

<updated>
De-identification:
De-identification is a process towards removing identifiability.

De-identified data:
De-identified data is data that is not reasonably identifiable to a natural person.
</updated>  
-----

If these changes are accepted, then we'd slightly modify the body of the proposed text:

-----
<Moved to Normative Text - Updated>
The RED state data may contain data unaltered from initial collection. In order to go from the RED state to the YELLOW state, directly identifiable information MUST be removed so that the data moves to a de-identified state. YELLOW state data MAY contain information indirectly linked to an individual, computer or device, but in and of itself is not identifiable. GREEN state data is de-identified and unlinked data and MUST NOT contain identifiable information. Any risk of re-identification of fully de-identified data MUST be regularly assessed and mitigated through Privacy Risk Management.
</Normative>
-----

- Shane

-----Original Message-----
From: Rob van Eijk [mailto:rob@blaeu.com] 
Sent: Wednesday, May 29, 2013 9:28 AM
To: public-tracking@w3.org
Subject: RE: ACTION-406: Propose a new set of names around yellow state


Following up on the call today, here are the notes that I put forward for the minutes (too much to paste into IRC).

PII:
This standard refers to the ISO 29100 (privacy framework) definition of personally identifiable information (PII):
any information that (a) can be used to identify the PII principal to whom such information relates, or (b) is or might be directly or indirectly linked to a PII principal.
NOTE To determine whether a PII principal is identifiable, account should be taken of all the means which can reasonably be used by the privacy stakeholder holding the data, or by any other party, to identify that natural person.

Linkability:
Linkability is about the ability to add new data to previously collected data

Identifiability:
Linkable is not the same as identifiable. To determine whether data is identifiable, account should be taken of all the means which can reasonably be used by the privacy stakeholder holding the data, or by any other party, to identify that natural person.

De-identification:
De-identification is a process towards anonymization.

De-identified data:
De-identified data is data that is not linked or reasonably linkable to an individual or to a particular computer or device.

To accomplish de-identification, I propose a 3-step model.
The mental model contains 3 types of data: red, yellow and green data.

The RED state data may contain (a) and (b). In order to go from the red state to the yellow state, directly identifiable information MUST be removed, e.g. an email address or a phone number.
The YELLOW state data is partly de-identified, and MAY contain information indirectly linked to an individual, computer or device, e.g. a partly de-identified but still linkable unique identifier, such as a hashed pseudonym.
The GREEN state data is fully de-identified data and SHOULD NOT contain personally identifiable information (PII). Any risk of re-identification of fully de-identified data MUST be regularly assessed and mitigated through Privacy Risk Management.
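
To make the three states a bit more concrete, here is a rough sketch in Python. It is only an illustration: the field names (`email`, `phone`, `hashed_id`) and the `classify` helper are assumptions made up for this example, not part of the proposed text.

```python
# Hypothetical field names, chosen only for illustration.
DIRECT_IDENTIFIERS = {"email", "phone"}   # directly identifying, as in (a)
LINKABLE_IDENTIFIERS = {"hashed_id"}      # e.g. a hashed pseudonym


def classify(record):
    """Return the state a record would fall into under the 3-step model."""
    fields = set(record)
    if fields & DIRECT_IDENTIFIERS:
        return "RED"
    if fields & LINKABLE_IDENTIFIERS:
        return "YELLOW"
    return "GREEN"


print(classify({"email": "a@example.com", "page": "/news"}))  # RED
print(classify({"hashed_id": "ab12", "page": "/news"}))       # YELLOW
print(classify({"page": "/news"}))                            # GREEN
```

A field check like this is of course not sufficient on its own: a record without any listed identifier may still be re-identifiable through a combination of the remaining fields, which is why the risk assessment stays a separate requirement.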

In order to move from red to yellow, or from yellow to green, one needs to process the data. There are multiple ways to do that:

1. One example is based on concatenating a random number to the unique ID. This results in a lookup table of unique ID <-> random number.
Getting from yellow to green is breaking the link (un-linkability) by throwing away the unique ID. No new data can be linked to the un-linkable data in the green.

2. Another example is based on rotating hashes. Getting from red to yellow is applying the hash. Getting from yellow to green is breaking the link (un-linkability) by throwing away the salt. No new red data can be linked to the un-linkable data in the green.
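
The two examples above could be sketched roughly as follows. This is one possible reading, not a definitive implementation: the class and method names are hypothetical, example 1 is interpreted as discarding the whole lookup table, and HMAC-SHA256 is my own choice for the salted hash in example 2.

```python
import hashlib
import hmac
import secrets


class LookupTablePseudonymizer:
    """Example 1: lookup table of unique ID <-> random number."""

    def __init__(self):
        self._table = {}  # unique ID -> random number

    def to_yellow(self, unique_id):
        # The same ID always maps to the same random number,
        # so new data remains linkable to earlier data.
        if unique_id not in self._table:
            self._table[unique_id] = secrets.token_hex(16)
        return self._table[unique_id]

    def to_green(self):
        # Throwing away the unique IDs breaks the link: the random numbers
        # already stored can no longer be matched with new red data.
        self._table.clear()


class RotatingHashPseudonymizer:
    """Example 2: salted hash whose salt is thrown away on rotation."""

    def __init__(self):
        self._salt = secrets.token_bytes(32)

    def to_yellow(self, unique_id):
        # Keyed hash of the unique ID under the current salt.
        return hmac.new(self._salt, unique_id.encode("utf-8"),
                        hashlib.sha256).hexdigest()

    def to_green(self):
        # Discard the old salt by replacing it; hashes produced under the
        # old salt cannot be recomputed from new red data.
        self._salt = secrets.token_bytes(32)
```

In both sketches the data already stored keeps its pseudonym after `to_green()`; what changes is that nobody, including the holder, can map a fresh unique ID onto that pseudonym any more. Whether the remaining attributes still allow re-identification is a separate question.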

In terms of unlinkability versus de-identification, it remains important to separate the two concepts:
- de-identification helps in the event of a data breach, i.e. when a dataset is out on the street. It is a way to address the reasonable requirements of an adequate level of protection.
- an adequate level of protection is completely different from unlinkability. Unlinkability is connected to the notion of personally identifiable information.

Received on Wednesday, 29 May 2013 16:49:59 UTC