Re: Deidentification (ISSUE-188)

I am still in favor of a short definition that makes it very clear what
we want to achieve in terms of limiting the data.  If folks want to place
additional requirements on a party, separate from the definition of the
state we want the data to be in, then I think that should be discussed
and agreed on separately.

To that end, I have replaced my proposal with the following:

   Data is permanently de-identified when there exists a high level
   of confidence that no human subject of the data can be identified,
   directly or indirectly, by that data alone or in combination with
   other retained or available information.

If adopted, we would replace all occurrences of "de-identif(y|ied|ying)"
in TCS and TPE with "permanently de-identified".

Rationale:

I adopted David's "permanently de-identified" to avoid the association
with re-identifiable data and added "combination with other retained ...
information" to exclude holding onto a key for re-identification.

I replaced "user" with "human subject of the data", since we also want
to remove data provided by the user that is (inadvertently) about
others (something most statistics-based data trimming does automatically).
However, we don't want to remove data which might be about a human
who is not the subject (e.g., recording the number of distinct visitors
to my blog is data about the visitors, not about me).

I use "directly or indirectly" to indicate that this includes anything
that might end up identifying a human subject, no matter how.
If someone thinks we should have specific text about identifiers on
user agents or devices, that can be a non-normative example without
weakening this definition.


Cheers,

Roy T. Fielding                     <http://roy.gbiv.com/>
Senior Principal Scientist, Adobe   <http://www.adobe.com/>

Received on Tuesday, 26 August 2014 18:57:41 UTC