TPWG F2F Breakout Group B

12 Feb 2013

See also: IRC log


Present: vinay, Joanne, EricM, kulick


<npdoty> scribenick: Peter-4As

Frank: we have four questions for this discussion:


<fwagner> What term should be used to describe what is out-of-scope for DNT?  “De-identified”, “unlinkable”, some other?


<fwagner> The FTC definition of de-identified is reproduced below.  Are there any changes from it that should become the normative text for DNT on this topic?


<fwagner> What are some examples of technical measures that clearly ARE or ARE NOT strong enough to meet the de-identification standard?


<fwagner> When, if ever, should pseudonyms be permitted for information held in de-identified form?  Is that the same as asking when a unique or persistent identifier should be permitted?

Let's step through each and put the FTC language (missing on question 2) back in. Does this work for everyone?

Frank: need to define the playing field for DNT.

Nick: the first question is what we should call it.

Joanne: let's tackle the definition then the term.

David: we haven't defined tracking or what's in scope. So there is confusion around what is in scope.

Vinay: let's think about what we should consider "out of scope" which will help determine what's in.

Frank: if there is no personal data it would be out of scope.

<npdoty> data is not “reasonably linkable” to the extent that a company:

<npdoty> (1) takes reasonable measures to ensure that the data is de-identified;

<npdoty> (2) publicly commits not to try to reidentify the data; and

<npdoty> (3) contractually prohibits downstream recipients from trying to re-identify the data.

Nick: this is part of the reasoning for why de-identified or unlinkable data would be out of scope: it wouldn't be personal data or subject to regulation. Consider the FTC approach.
... The FTC defines the scope of its privacy framework first, as a scoping mechanism. Data that is not linkable doesn't require simplified consumer choice.

David: if all you have is a unique randomly generated identifier and the only associated attribute is a "0-1" is that in-scope or out of scope of DNT?

Nick: let's not consider every hypothetical. Better to ID what is out of scope.

Frank: if there is an individual ID in it, then it is linkable to a user.
... if there is static information, where every user has the same static info, then it's not linkable to a user.

David: you might not know any associated data. Just the ID is not enough; it's the data associated with the ID. Isn't it a problem of what data is associated with the ID?

Vijay: I see what you're getting at.

Paul: This particular device might be... You need to know a little about where you're going with the definition. It's fine to say that if every cookie has the same number, then we're done. But that's not the totality of the problem. If you say unlinkability implies a standard that we're not addressing, maybe you go back to de-identification so that you manage data in a socially useful way.

Nick: the FTC's definition was their attempt at this: reasonable measures to de-identify.

Paul: these are useful buckets, and these are small tools, but they are reasonable measures: prohibitions against re-ID, all of that.

Vinay: when FTC references the data what are they referencing by this? Data used by profiling? Technographics?

Nick: it's consumer data in all commercial contexts. Does that answer it?

<vinay> yep, Nick, thanks!

David: this particular definition is in the context of the risks of de-identifying data, between PII and non-PII. Presumably this is data without a risk? Or data that could become PII?

Mark: the objective of the report is to say what is in the framework and what isn't: identifiable to an individual, to a device, or neither. The only purpose of that discussion is a "new framework for privacy". What they were saying is that to the extent data is identifiable neither to a person nor to a device, it would be outside.

Nick: this section is just that. Not transparency or choice.

Frank: what is the best term for scoping then? De-ID data? Unlinkable? Anonymous?

Nick: which do you prefer?

Frank: From an EU perspective anonymous is preferred.

David: we all agree that "elephants" are out of scope. Anonymous data would be too. The problem is trying to draw the line, or the middle ground.

Paul: conceptually you could draw buckets around out of scope. The data manager would know he/she must de-ID, etc. That would work. Doesn't free them from obligation. But this spec deals with this and over there you're dealing with mitigation.

Frank: at one end you have de-ID data that you can do whatever you want with. No risk. The FTC rules that define the conditions for handling are in the middle: the managed process.

Nick: in the group yesterday, David, you drew a dimension: a unique ID, a de-ID layer, and aggregate. If we had agreement from the WG that de-ID meant it was entirely out of scope, that would simplify compliance.
... It wouldn't matter what other data you were keeping, if you were satisfying the de-ID...

David: yes, but it would depend on what you're applying it to.

Nick: if we think this would apply then I say great.

Frank: what is the difference between de-ID and anonymous?

David: de-ID data maintains consistency of individual records but can separate from individual and device

Vijay: the problem with anonymous is: knowing if someone is male or female, for example. You can't identify the person. Let's describe de-ID as the process. Then further define what anonymous is. Otherwise you'll have confusion.

Brad: I second that. When I think of anonymous, you don't even know who it was. With de-ID, you make known data anonymous so that you don't know who it is.

Nick: among researchers, there has been a strong reaction against "anonymous". We tend to avoid using the term altogether.

<vinay> +1 to what Nick said

Paul: the example you just gave assumes that data minus name/address is released to the wild. That's not necessarily how data is managed. You can strip these out, put a unique ID on it, layer access control, etc. That's real-world data management. The end result is that the chance it would be re-ID'd to a known individual is very small.

You need to look at the rest of the conditions associated with the data. There might be a meeting point between this and the people that manage the data for a living. Doesn't assume that all data is released.

Frank: I won't fight for the term anonymous. But are we in line with EU requirements and other requirements?

Paul: this is one of the appeals of broader definition. It gets to an acceptable EU result.

Frank: on the other hand, if de-ID means anonymous then it's no problem.

Mark: from an EU perspective what do you mean?

Frank: no possibility to link back to a person, device or machine.

Mark: no possibility. But we're talking about reasonableness.

Paul: the anonymization test is whether attributing such data would require time and effort. With all of these protections around the unique ID, it's anonymous.

Nick: is "disproportionate time and effort" too much?

The reasonableness standard applies to "disproportionate".

Frank: okay, but these are also in EU law, in the whole privacy protection framework. The consequence is that if you go into the data, you must enter into a contract and write down strictly what has to be done with the data. If it's not defined in the contract, it's not allowed.

<npdoty> sounds to me like the "cannot reasonably be linked" standard is very aligned with the European "disproportionate time and effort" standard

David: does this mean that in either regime you could argue that the cookie ID tied to log data and behavioral data is already sufficiently de-ID'd if there are already controls?

<Joanne_> +1 to Nick. that is my understanding

Nick: cookie ID is linked to device

David: the risk of identifiability is low.

Nick: but the whole point of cookies is to ID back.

Paul: at least in US law, when agencies move from identifying a natural person to a device, it's with a particular goal. The COPPA order: of course the child is carrying the cell phone, so device and individual coincide. There are a lot of other circumstances where they're not concerned about reaching the child: market research, product improvement, etc. You can work with re-ID data. An idea to explore is whether the concept of de-ID can assign permitted uses. Does that make sense?
... trying to get hands around FTC person and device others just person. That's the privacy issue/interest. It's not my computer it's me I'm worried about.

David: unlike a desktop, a mobile device is very much associated with an individual. A scenario with just a cookie ID to maintain browser state is distinct from device identification.

Nick: device might have multiple browsers, cookie IDs. User, user agent or device is what's been written into compliance drafts.

David: isn't that the problem with devices being associated with human individuals?

Paul: yes, an individual carrying them around so very much identifiable.

David: so very much a risk of identifying versus a cookie ID (lower risk)

Nick: because a UDID is permanent.

David: permanent, yes, and tied to an individual through carrier records, versus an ad network trying to re-ID.

Frank: it depends on use cases. Typically a mobile device is single-user: it maps to one person, and everybody has their own device. This is personally relatable.

The cookie scenario is not exactly the same. Desktops are shared by a household, so it's not as clear. A cookie could be assigned to a person, but as an ad network you don't know that. Is it me or my wife?

David: yes, less user granularity. But also, not necessarily, an association of this cookie to PII. Unless you matched it and created the record.

Personally identifiable, authenticated. Higher risk of tying back.

Paul: we could, in the same way we measure a reasonable person, device could be a variable in risk assessment?

David: on a scale of risk, actual PII is high, cookie ID is lower

Paul: one of the risk factors to consider is whether unique ID = person. That's intriguing. It actually draws distinction between getting a grip on this area on a shared device not associated with an individual (low risk profile) than mobile. Analyst must consider in evaluating risk.

Nick: what David and Paul are talking about is whether it can be linked to my name/address. That seems not to be the FTC approach, nor that of the working group or of the NAI for opt-out cookies. It's not a PII opt-out; it's targeted to this device or cookie. That's a consistent worry among all three.

Paul: yes, because they have specific use cases. COPPA = child, who might carry a cell phone. The FTC paper was born of behaviorally targeted advertising and the level of choice. Device is lumped in there, as if it's sort of PII. That's not true of EVERY use case, however. There isn't really device/person linkability, or a risk of privacy invasion.

David: the tracking mechanism is often the cookie. The available mechanism for ad data collection and state maintenance.

Nick: what are we going to rule as out of scope?

Frank: I have a clear view on this, I would assume. We are fine with the term "de-ID" but don't have a clear picture of what it means TO de-ID. If it's directly related to a person, like a UDID recorded by a mobile provider, it's linkable to me as a person. In the OBA scenario, based on a cookie, David says there's no chance to link to a person. You can't aggregate and link to a customer because you can't resolve the relationship between IP address, cookie, and the chain of re-identification.
... is this data within scope because it is more or less handled by DNT mechanisms? I'm not sure.

David: how about the very low risk of being identified? Start there.

Nick: like user agent or device.

Paul: but the device might be the point of division.

David: whether it should apply?

Nick: data collected about cookie ID on my user agent and people see ads that are eerily correct or feel creeped out. We're back to the same user agent even if no guarantee it would reach the same person or reveal your name.

Frank: do you think that this data, based on a cookie and behavioral info, has to be managed to become de-ID'd? Or is it directly de-ID'd?

David: depends on what is in the data. The elements. How high the risk of re-ID is. Cookie ID, male, northeast US, etc. very low risk. But add more data and the risk gets higher.
... define behavioral profile. When we talk about it we mean a set of interests associated with a profile. Segment codes that are translated to "skiing enthusiast". Extremely low risk of ID to an individual. Whereas a history of Websites, searches, etc. has a HIGH risk.

Nick: yes but cookie ID could be re-associated. Same way FTC talks about user agent.

Paul: very specific use cases: people who want to control behavioral targeting, having state preserved for next use. We have to make the definition of "identified device" driven by the underlying goal. That's what the FTC thinks. Not being able to re-ID a device for every possible commercial use.

Nick: it was just a scoping question for them.

Paul: I'm reintroducing concept of different definitions of "out of scope" categories for sets of use cases in same way we created a partial white list of business uses. Similarly, you could scope out uses here if purpose of spec is not to address every single use case. These goals of what we're trying to accomplish. Easy mechanism for "don't send me targeted ad" we spell out. But not for every piece of commerce.

Peter: but there are other critical activities, such as ad performance, financial reporting, analytics, etc.

Paul: yes, you can do your reports, audits, etc., and avoid the inadvertent mistake of sweeping aside all commerce as being out of scope.

Paul: what we should address are things that create a "privacy harm". Folks haven't identified what that harm is in all the use cases we've discussed, if this is more than just serving an ad. Minimize the risk of identifying an individual; then you have solved the use cases by addressing this fundamental privacy interest. It's a definitional tool that helps address all the "other" use cases and is grounded in a lot of what US and EU authorities have said. So it might [CUT]

David: do we even have consensus, though? One of the big concerns is that data with a lot of PII could be re-ID'd. So we've talked about dropping URL parameters, for example.

Nick: some of the concerns expressed relate to ability to personally identify

David: yes, within the group I agree. But I don't know that there is consensus in outside policy circles about what concerns need to be addressed. There are a lot of different concerns.

<kulick> agreed

Nick: third party retention, use of data that IS linkable to user agent or device.

Marc: but if it leads to DNT = 1 we don't have a conversation.

And in between that are permitted uses.

Nick: suggestion is not that we'll prohibit anything outside of de-ID data. Suggestion is that third party collection, retention and use.

Marc: even if you look at the FTC three-pronged standard (not re-linking, contractual obligations for downstream recipients, etc.), is there a reasonable expectation that it's not linkable back to a person, device or user agent?

David: are we agreeing that at least device (UDID) is sufficiently tied to an individual, and thus NOT de-ID'd?

Thanks, Nick!

David: we could treat it as PII. We could agree that data associated with a UDID has a high risk of identification to an individual. Obviously, if "user" is equivalent to "individual", then the same applies.

Nick: are we good with the 'three-pronged' rule per FTC?

Paul: I'd say that UDID is too close. Not sure this would persist for future. Hard to say device is anything other than risk assessment. Very fluid. Keeps evolving.

Nick: we're trying to rule out large things from scope. Would you agree that if it can't be linked back then it's out of scope?

David: because we haven't defined tracking (despite yesterday's productive conversation), we haven't really discussed concerns about particular data uses: this accumulation of browsing history, or general concerns in the privacy world as a whole, such as re-ID'ing to a particular individual. You develop a risk profile where you consider the risk threshold based on sensitivity. I get stuck without discussing specific examples.

Marc: framing it another way, what is perplexing about Ed Felten's presentation, or a binary lens of "in/out" of scope... Take extremely sensitive data (a US ID theft database, for example, of people already victimized). If you did a risk analysis (administrative, physical, technical), you must consider the other elements of the analysis.

Nick: a responsible data manager will do all of these things. A lot of calculations are involved, yes.

Marc: you are then dis-incentivizing responsible use of data. If you don't take into account these other aspects.

Frank: we are talking about having a user say "exclude me please" from something.

Marc: but what is the "something"?
... the user can express a preference but from what?

Frank: yes, then we must see a tracking definition.

Marc: we're allowing a consumer to express a preference to be excluded from something. What we are discussing now is what is to be excluded from the "something".

Frank: then the question is DNT means there is a switch. And what happens when ON or OFF. What kind of information goes over the switch? What information goes around the switch that is in scope?

Marc: consider IP address. This has to go to the receiving entity. Is IP outside of DNT scope? You couldn't operate the Web without IP in the HTTP request. Would this then not be part of the "something"?

Frank: it depends on context [all agree]

<npdoty> Nick is lost.

<vinay> so is Vinay.

Marc: depending on context, IP address may or may not be included.

Brad: this would assume that DNT happens immediately. Is there a difference about getting the data but then not storing or processing? Timeline in which things occur: from de-ID to apply some treatment of DNT.

David: a time dimension to tracking?

Brad: yes, that's more succinct.

Frank: if you get DNT=1, can raw data become de-identified data because there is a low risk of this?

David: when you get a DNT=1, something happens, probably curtailing the use of the data. This may include putting the data through a de-ID process.

<Joanne_> time check?

<npdoty> I believe we are to wrap up by 1pm for lunch downstairs.

Frank: three levels of risk: a UDID assignable to a person, then the user device, then the cookie ID, which is lowest. Will all three cases be part of the management process for DNT=1?

<npdoty> fwagner: would you keep the cookie id in the de-identified records?

<npdoty> dwainberg: yes, for the concern of accumulating the browsing history, if you reduced the data, you would keep the cookie id

<npdoty> ... regarding suppressing lifetime browsing history, wouldn't be sufficient to keep the cookie id with the URLs

<npdoty> paul: wouldn't want to limit de-id to "user, user agent or device", though I would be okay with "to an identifiable individual"

<npdoty> paul: should use a standard of risk assessment

<npdoty> joanne: and the risk assessment should take into account the data

<npdoty> fwagner: agreement on deidentification (FTC-style) to individuals, but not agreement on user agent or device

<dwainberg> De-identification is a process to lower the risk of a dataset being re-identified to a natural person.

<npdoty> ... preference for "deidentification" and against "anonymous"

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2013-02-12 18:27:38 $
