TPWG F2F Breakout Group D

12 Feb 2013

See also: IRC log


Present: yianni, Jeff, johnsimpson, Dan_Auerbach, BerinSzoka, Aleecia, vincent


<johnsimpson> what phone is in room? heard dan well…not so clear from room

<Yianni> http://www.w3.org/wiki/Privacy/DNT-Breakouts

<BerinSzoka> I just joined. could you send that link again?

yianni: First topic — what term should we use to describe "unlinkable" / "deidentified" / etc.?

… Concern is that there's always some chance of reidentification.

<johnsimpson> someone typing near telephone should mute

<BerinSzoka> unlinkable = unfair & deceptive trade practice! ;)

<BerinSzoka> much like "Do Not Track"...

<BerinSzoka> link again, please?

robsherman: Deidentified.

<BerinSzoka> how about "Anonymish?"

<BerinSzoka> or "Pseudononymish?"

<BerinSzoka> Or... Deidentifish?

What's the difference between "deidentified" and "anonymous"?

<aleecia_> perfect for all your phishing needs

… "Deidentified" covers both unlinkable and what could be relinkable

yianni: Anyone on call have a view on this?

<dan_auerbach> I don't feel strongly either way

<dan_auerbach> but the substance matters more than the word we use

<aleecia_> If we've started, note on the phone I cannot hear well at all

<aleecia_> And we appear not to have a scribe

<dan_auerbach> i'm in your situation, aleecia_

Thomas: If deidentified is closer to anonymous data, then deidentified is less able to explain "unlinked." If we use "unidentified," then we are closer to both ideas - unlinkable but also acknowledge the possibility of reidentification.

Rachel_Thomas: Are we getting too far into semantics?

<BerinSzoka> again, could you please send the link again to the questions? some of us joined the IRC after it was shared

Thomas: We need to decide how far we are able to go and come up with a way to describe it.

<aleecia_> If someone in the room could please scribe, those of us on the phone might be able to keep up

<BerinSzoka> AMEN, Rachel

<aleecia_> thank you, Rob

Rachel_Thomas: Deidentified implies that you have taken reasonable steps to unlink; unlinkable is an impossibility.

<aleecia_> note that unlinkable means something specific and different in the EU

yianni: robsherman made this point - we've done this in the HIPAA context and elsewhere. What we say in the text is reasonableness.

<BerinSzoka> could whoever's typing move away from the mic? it's loud

Thomas: Explain that there's a small gray area that's not completely anonymous.

Yianni: "Reasonable deidentification" conveys that there's always the chance of reidentification, but has to be a low chance. Are you okay with that?

Thomas: Yes.

<aleecia_> "deidentification" to me suggests after a process.

Dan_Auerbach: Can we all agree that whatever word we use shouldn't inform the details of the process or standard we agree upon?

Rachel_Thomas: What do you mean?

Dan_Auerbach: Don't feel strongly about terminology, but I do care about substance.

Yianni: Agree. Using the word deidentified or unlinkable shouldn't by itself determine the standard.

Rachel_Thomas: But they're different concepts - we're trying to decide whether we're working toward deid or unlinkability.

robsherman: I think consensus is that we're not going for theoretical unlinkability but for reasonable delinking.

<dan_auerbach> i do not suggest we go for theoretical impossibility

yianni: Seems reasonable. Should we look at FTC language?

<Yianni> http://www.w3.org/wiki/Privacy/DNT-Breakouts

<BerinSzoka> #DoNotCough

<Yianni> http://www.w3.org/wiki/Privacy/DNT-Breakouts

<dan_auerbach> haha

<johnsimpson> apologies was called away from phone briefly

… What's wrong with this language? Why shouldn't we use it?

<aleecia_> FTC:

<aleecia_> data is not “reasonably linkable” to the extent that a company: (1) takes reasonable measures to ensure that the data is de-identified; (2) publicly commits not to try to reidentify the data; and (3) contractually prohibits downstream recipients from trying to re-identify the data. Commission's definition of "de-identified": "First, the company must take reasonable measures to ensure that the data is de-identified. This means that the company must [CUT]

dan_auerbach: FTC language is a good starting point. Makes sense and not too prescriptive. Also favor adding to it the idea that there should be privacy penetration testing.

<aleecia_> cut off, once more: "First, the company must take reasonable measures to ensure that the data is de-identified. This means that the company must achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device."

… Having people try to reidentify data. Has to be some sort of normative way to distinguish companies that just try to hash IDs from companies that make a more serious effort at it.

rachel_thomas: Sounds like you're onboard with the idea as long as we come up with some clarity around what deidentification is?

Dan_Auerbach: Yes.

?: Should we be requiring people to publicly commit not to reidentify?

yianni: I think that's what we are trying to do here - get people to publicly commit to follow a standard.

bryan: Privacy policy.

robsherman: Much cleaner to not have lots of individual "public commitments," as FTC would have under Section 5. Let's just agree on what's required — for example, not reidentifying data — and treat server response as the public commitment.

yianni: Is it better to define reasonableness here? Have examples?

Rachel_Thomas: There's already a legal standard for "reasonableness."

Dan_Auerbach: Helpful to have examples.

<rachel_thomas> For reference (since we're also looking at the FTC language) here is the DAA definition of De-Identification Process: Data has been De-Identified when an entity has taken reasonable steps to ensure that the data cannot reasonably be re-associated or connected to an individual or connected to or be associated with a particular computer or device. An entity should take reasonable steps to protect the non-identifiable nature of data if it is dist[CUT]

<rachel_thomas> (cont) Affiliates and obtain satisfactory written assurance that such entities will not attempt to reconstruct the data in a way such that an individual may be re-identified and will use or disclose the de-identified data only for uses as specified by the entity. An entity should also take reasonable steps to ensure that any non-Affiliate that receives deidentified data will itself ensure that any further non-Affiliate entities to which such dat[CUT]

bryan: Regarding examples, that's something that advocacy groups or public sites will do as far as creating best practices. Maybe something that could be documented through W3C community group process, webplatform.org, etc. Agree with Rachel that we should limit non-normative language within spec itself.

<aleecia_> I think examples are helpful

<aleecia_> Asking people to implement this without context is hard

johnsimpson: Probably don't want to define "reasonable" in normative language.

<rachel_thomas> i don't feel strongly about not having examples (just depends on what they are)

<aleecia_> And we likely do NOT want to hard code what is required

… We do have specific examples, and it would be tremendously helpful to have that language included in a non-normative way.

<aleecia_> +1

<dan_auerbach> +1

robsherman: Agree w/ aleecia that hard-coding technology is really hard to implement and will break the standard 5 years from now.

<aleecia_> Could put them into an appendix if they clutter up the text

yianni: What examples do people have as "clearly good enough" or "clearly not good enough"?

<aleecia_> Not good enough: removing names, removing unique ids

Sam: Looking at Ed's examples, he talked about admin/procedural controls on data that leaves the database or is reported outside of the controlling entity. Focusing on that would be interesting. Looking at 3rd parties, how they anonymize and report out, that's really what we're talking about in terms of protecting privacy.

… Lots of technologies to do that. Will be different for every entity. So I'd like to focus on what are the controls that keep identifiable info from leaking out.

<Yianni> ack: dan-auerbach

… No specific examples.

<dan_auerbach> whoops seems i've dropped off the call

<dan_auerbach> will try back in a moment

rachel_thomas: In IRC, put in DAA language on deidentification. We don't have specific examples, but text is helpful in terms of explaining what's meant by de-identification.

<aleecia_> (The DAA text isn't bad)

unmute aleecia_

<dan_auerbach> agree that DAA text is OK, though FTC seems a little better to me

<aleecia_> I'm following via scribes - cannot hear well

<aleecia_> I'm guessing I was unmuted for a reason I missed...

rachel_thomas: [reads through DAA language]

<rachel_thomas> i'm quoting from https://www.aboutads.info/resource/download/Multi-Site-Data-Principles.pdf

yianni: Any specific technical measures?

<rachel_thomas> page 8 :)

<aleecia_> Ah, sorry. Rachel will be better taking that than I would anyway, but I note it's similar to FTC's in large measure. Same direction.

… Any comments on Ed's presentation about hashing or k-anonymity not being a good method?

Sam: I'm a big proponent of hashing. It's great but has its limits. If you're using it to anonymize or otherwise deidentify, some day it will be broken.

… Have to come up with better ways of doing this long-term.

rachel_thomas: We don't need to identify specific standards.

<aleecia_> My hope is that DAA's members won't have to change much, though from Shane's questions, presumably Yahoo! would

<dan_auerbach> (i'm unable to rejoin the call -- it seems to be on W3C's side? -- so will follow via scribe)

<aleecia_> (Dan, me too)

<bryan> robsherman: talking about specific tech is a mistake, also demonizing hashing, there are good uses in the de-id context

<vincent> it strongly depends on how often you change the seed you use to hash

<aleecia_> Vincent - and the richness of data collected
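
A minimal sketch of the seed-rotation point vincent raises here, assuming the "seed" is a secret key used with HMAC-SHA256 and rotated once per day. The function names, the cookie value, and the dates are hypothetical and are not taken from any participant's system.

```python
import hashlib
import hmac
import secrets


def daily_key(keys, day):
    """Return the secret key for `day`, creating a random one on first use."""
    if day not in keys:
        keys[day] = secrets.token_bytes(32)
    return keys[day]


def pseudonym(keys, cookie_id, day):
    """Keyed hash (HMAC-SHA256) of a cookie ID under that day's key."""
    key = daily_key(keys, day)
    return hmac.new(key, cookie_id.encode(), hashlib.sha256).hexdigest()


keys = {}  # hypothetical in-memory key store, one key per day
monday_a = pseudonym(keys, "cookie-123", "2013-02-11")
monday_b = pseudonym(keys, "cookie-123", "2013-02-11")
tuesday = pseudonym(keys, "cookie-123", "2013-02-12")

# Same day, same key: the two Monday records still link to each other.
print(monday_a == monday_b)
# New day, new key: the Tuesday pseudonym no longer matches Monday's.
print(monday_a == tuesday)
```

Rotation only limits linking through the pseudonym itself, and only if old keys are discarded. As aleecia_ notes, the other fields stored next to the pseudonym may still link records across days.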

Sam: Good examples of how hashing works, but it's in the context of a specific data set. When you become able to correlate, things break down, so hard to say whether a particular technique will work.

<dan_auerbach> here's an example of something which is NOT good enough: a wide table keyed with pseudonyms (say, hashes of cookies), but which also has timestamps, urls, etc
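
The "wide table" dan_auerbach describes can be sketched as follows. This is an illustration under assumptions, not anyone's actual log format: the rows, the column names, and the example.* URLs are invented, and the pseudonym is a plain truncated SHA-256 of a made-up cookie ID.

```python
import hashlib


def pseudonymize(cookie_id):
    """Replace a cookie ID with a short hash-based pseudonym."""
    return hashlib.sha256(cookie_id.encode()).hexdigest()[:12]


# Hypothetical log: pseudonymous key plus timestamps and URLs.
log = [
    {"user": pseudonymize("cookie-A"), "time": "09:00", "url": "news.example/politics"},
    {"user": pseudonymize("cookie-A"), "time": "09:05", "url": "webmail.example/inbox/alice"},
    {"user": pseudonymize("cookie-B"), "time": "09:07", "url": "news.example/sports"},
]


def history_of_visitor(rows, identifying_url):
    """Return every URL logged under the pseudonym(s) that visited identifying_url."""
    users = {row["user"] for row in rows if row["url"] == identifying_url}
    return [row["url"] for row in rows if row["user"] in users]


# One URL that names its visitor exposes the whole row set behind that pseudonym.
alice_history = history_of_visitor(log, "webmail.example/inbox/alice")
print(alice_history)  # ['news.example/politics', 'webmail.example/inbox/alice']
```

The hash is never reversed here. The pseudonym is only used as a join key, which is why hashing the cookie column does nothing for a table like this.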

bryan: This is a point in time where anything we write on technology today will be superseded tomorrow. Let's establish a reasonable expectation, let technologists figure out what is reasonable today, and not put it in the spec.

johnsimpson: If we know what works, it should be cited in non-normative language as an example.

… Also, in normative language, we should require that there be transparency about method you use to hash.

<vincent> for the record, Google used to change (still does) the key to hash search logs every day (see: http://searchengineland.com/anonymizing-googles-server-log-data-hows-it-going-15036)

… People need to understand what you're doing.

rachel_thomas: Agree strongly for the sake of consumer privacy and data security. The more specific you are about specific methods of protecting data, the less secure they are.

<vincent> and we're talking about a first party anonymizing its search logs, not a third party (imho third party should provide stronger guarantees)

<aleecia_> security through obscurity has failed time and again

… That's why the law has been comfortable with the idea of not describing specific methods.

<dan_auerbach> I agree with bryan there shouldn't be normative language specifying technology, but non-normative examples are helpful

<BerinSzoka> hey, John, how about #DoNotSnark?

vincent: We say data cannot be used by the actor to reidentify. But don't say it can't be used by ANYONE.

… If I share data with another actor, it might use the data to reidentify.

<aleecia_> Berin, I take it you've decided not to honor that? (...says the pot to the kettle)

<BerinSzoka> as John said: "whatever" ;)

yianni: Do you mean just contractually - do you have contractual promises not to reidentify?

<aleecia_> Vincent - I could imagine daily not being enough

vincent: Just talking about language of the draft.

yianni: I'd hope contractual language already covers.

rachel_thomas: FTC language already says this.

<vincent> ok, the barebone does not reflect that: "contractually prohibits downstream recipients from trying to re-identify the data."

<aleecia_> currently not hearing any objections to the FTC defn (and not hearing, so perhaps missing things vital)

<aleecia_> could we just adopt FTC lang and add examples?

robsherman: Wouldn't need to have specific language that says this; already covered by "reasonableness" precedent.

<dan_auerbach> aleecia_: that'd be my favored approach

rachel_thomas: DAA makes the same point. We're in violent agreement here.

<vincent> aleecia_, actually I agree that every day is not enough

<aleecia_> any objections?

yianni: Next question - on pseudonyms.

<dan_auerbach> however, the examples should be real

<aleecia_> can we actually just nail this down (for the group) right now?

Thomas: From my presentation today, there's a split between anonymous and pseudonymous. Anonymous is nearly absolutely impossible, and with pseudonymous someone has the key to reidentify.

yianni: [summarizes Dan's comment from IRC].

<dan_auerbach> regarding pseudonyms, it's dangerous to think of data that way because it leads to the false impression that only certain fields are the "identifiers", whether they be real names or pseudonyms

SamS: Comes down to a contract that says, if I get data, I won't try to reidentify it. Even if it's pseudonymous or a weak version of anonymity, as long as I protect data, don't leak it, and don't reidentify, it's effectively the same thing. But the burden is on me to be sure I have the right admin controls to be sure all of this happens. Ultimately that's where we are going.

<dan_auerbach> in fact, each field will have some bits of identifying information
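
One common way to put a number on "bits of identifying information" is surprisal: a value shared by a fraction p of records carries -log2(p) bits. The sketch below assumes that measure and uses an invented list of browser names. It is not a method proposed in the session.

```python
import math
from collections import Counter


def identifying_bits(values):
    """Map each distinct value to its surprisal in bits: -log2(share of records)."""
    counts = Counter(values)
    total = len(values)
    return {value: -math.log2(count / total) for value, count in counts.items()}


# Hypothetical user-agent column for eight records.
browsers = ["Firefox"] * 4 + ["Chrome"] * 2 + ["Opera", "Lynx"]
bits = identifying_bits(browsers)

# A value seen in half the records carries 1 bit; a value seen once in eight carries 3.
for value, b in bits.items():
    print(value, b)
```

When fields are statistically independent their bits add, so several individually harmless columns can together single out one record. About 33 bits are enough to distinguish one person among several billion.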

<vincent> where are we standing on "hashing" do we agree that it's not enough or it's the opposite?

yianni: You think contractual language does a lot?

SamS: Yes.

bryan: Want to be sure we're still talking about a third party.

<vincent> I think there is a misunderstanding about pseudonym definition

<aleecia_> right now i cannot imagine a straight up pseudonym helping

<dan_auerbach> i want strong guarantees on the data itself, not just contracts

<vincent> agree with robsherman

<aleecia_> replacing one GUID with another does not help

<vincent> for instance cookieID is a pseudonym

<rachel_thomas> From Thomas' presentation this morning: “Pseudonymising” shall mean replacing the data subject’s name and other identifying features with another identifier in order to make it impossible or extremely difficult to identify the data subject.

robsherman: (Clarifies use of pseudonyms in third party context)

<dan_auerbach> aleecia_: especially when that guid is linked with tons of other identifying information

<vincent> IP address+ User Agent could be a pseudonym

<aleecia_> yes

<rachel_thomas> Pseudonymous info is...Unique identifier does not identify a specific person, but could be associated with an individual. Includes: Unique identifiers, biometric information, usage profiles not tied to a known individual. Until associated with an individual, data cannot be treated as anonymous.

<aleecia_> and that's my problem with hashing (and agree, rotation helps there, potentially)

Rachel_Thomas: Capturing info from Thomas's presentation this morning.

<vincent> imho, hashing a cookie ID does not bring a lot of guarantee (if any)...

<aleecia_> +1 vincent
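
A short sketch of why an unkeyed hash of a cookie ID offers little, assuming the re-linking party already holds a list of raw cookie IDs (for example, because it set the cookies). All identifiers and URLs below are hypothetical.

```python
import hashlib


def hash_id(cookie_id):
    """Unkeyed SHA-256 of a cookie ID, as a hex string."""
    return hashlib.sha256(cookie_id.encode()).hexdigest()


# "De-identified" records as a recipient might see them: hashed ID -> URLs.
shared_records = {hash_id("cookie-123"): ["news.example/politics"]}

# A party that knows raw cookie IDs rebuilds the mapping by hashing each one.
known_cookie_ids = ["cookie-122", "cookie-123", "cookie-124"]
lookup = {hash_id(cookie): cookie for cookie in known_cookie_ids}

relinked = {lookup[h]: urls for h, urls in shared_records.items() if h in lookup}
print(relinked)  # {'cookie-123': ['news.example/politics']}
```

No hash was broken in this sketch. The hashed ID is simply another stable identifier, which matches aleecia_'s remark that replacing one GUID with another does not help.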

<dan_auerbach> getting into the mentality that certain fields are the "identifying" ones (e.g. a cookie, hash of ip+ua) is a mistake

David_Stark: Thought the conversation this morning was excellent. At my market research co, when people participate in a survey, we assign them a pseudonymous identifier that allows for quality control. Without this, you'd have no control. Just anyone could come in and respond many times - fundamentally undermining data quality.

<dan_auerbach> if i visit the domain, webmail.danauerbach.org, that field becomes identifying

<BerinSzoka> whoever was typing but just stopped--was REALLY loud

… Only have identifiers of panel members and their numbers. Researchers have access to survey responses. But nobody in our company has access to individuals and their responses.

yianni: You're bringing up the point of administrative controls, which Sam also mentioned.

<aleecia_> as a suggestion: you could have a marker in a cookie of the number of times people have taken a survey, rather than the unique id on the person

… Currently, FTC language doesn't mention this. Should we address this in non-normative language?

<aleecia_> that isn't so bad when you're only dealing with one company, rather than trying to push out a change across multiple parties working together in an ecosystem

David_Stark: Great idea, if we can provide examples.

<bryan> with proper access controls on pseudonym mapping, I agree that this meaning of pseudonym supports the goal of de-identifying data

… Makes a lot of sense. We see that language in data protection laws - admin, technical, and physical controls. Let's say that.

yianni: Any disagreement?

<aleecia_> with?

… with the idea that organizational measures should be taken into account?

<aleecia_> for first parties, not an issue

<aleecia_> for third parties, not sure why we'd take them into account?

… One danger I can see here with organizational measures is the government problem. Even if you have great organizational measures, the govt could come in and reidentify.

<aleecia_> i think the conversation was about a survey people agreed to take, which makes for first party issues

robsherman: Shouldn't have one standard in normative and a different one in non-normative

<aleecia_> the first party might release data later and want to de-id, but no need to worry about org measures then

<aleecia_> not seeing how that's relevant to us here

vincent: One scenario people are considering is the rogue engineer.

… Someone has to have access to the data, so the org measures don't really matter.

<aleecia_> and again, my apologies for not being able to hear.

rachel_thomas: Are you suggesting different standards for protection of data for consumer privacy and for government access?

yianni: No - I don't think you can do that separately.

rachel_thomas: We can't get outside of reasonableness. You can take action against a rogue employee and that's already covered.

yianni: Would you say that admin controls go into the concept of "reasonableness"?

rachel_thomas: Yes.

<vincent> aleecia, bridge is not reachable?

Sam: We already do all of this.

yianni: There's going to be some pushback for including organizational measures within the concept of reasonable technical measures. Govt access.

… Does that mean that admin controls shouldn't count as a part of technical measures?

<vincent> rachel_thomas, you can make it very very hard (if not impossible) for someone to technically reidentify the data

<aleecia_> on bridge. call quality is less than ideal.

<vincent> not just "reasonably" hard

<dan_auerbach> vincent: i am unable to reach bridge at all

SamS: It's pretty basic - administrative measures prevent people accessing data to reidentify.

<aleecia_> it's like hearing the words without any spaces

… If I can get at data in a way that bypasses proper controls, I can maybe reidentify.

bryan: It has strong value. It's something we do for a variety of regulatory requirements already. With tons of data, admin controls are the only way to meet those requirements.

<dan_auerbach> to clearly state my opposition to this, administrative controls are NOT enough

<dan_auerbach> we need real de-identification in the data itself

MikeZ: Some people have this in place. Small companies. Which is why you have a sliding scale in FTC standard. We're solving for the web, not just a handful of big companies.

yianni: That's the problem with having specific examples in text.

bryan: Same goes for sophistication of tech approaches that we mandate.

<vincent> aleecia_, dan_auerbach have you retried? call quality is fine for me

<johnsimpson> quality ok for me, too

<dan_auerbach> vincent, i've retried several times but am just unable to connect, but will do so one more time...

<aleecia_> did get back in. is somewhat better

<aleecia_> still scribe-dependent but that's ok

robsherman: Concept of reasonableness encompasses admin, tech, and physical practices, and the specific steps might vary based on circumstances.

<dan_auerbach> vincent, now connected but call quality is quite bad

yianni: Would having specific examples of what's appropriate be helpful in bridging the gap between John Simpson's demand for transparency and the need for security?

<johnsimpson> +q

rachel_thomas: Security information is kept secret because disclosing it gives hackers a way in.

… Let's protect the information in the most effective way.

<aleecia_> getting enough of this - depending on "we don't tell people about our security measures" does not offer real protection

bryan: Also opens the door to social engineering around company practices.

johnsimpson: Not calling for "putting the secret sauce of everything you do out there."

<dan_auerbach> i do think we need transparency here, +1 to johnsimpson

… What I think needs to be explained publicly is the category of measures that are taken. Not enough to say "reasonable".

rachel_thomas: What would be reasonable in your opinion?

<aleecia_> if we had a high, specific standard transparency wouldn't be necessary, but that would break future-proofing

johnsimpson: Difference between security and fraud, where getting specific could tip your hand. But you could say, if you believe that hashing is reasonable, you can say you rely on hashing.

… You can describe techniques that would provide meaningful insight without giving away the store.

rachel_thomas: But you've just narrowed the world in terms of what a hacker needs to think about.

<dan_auerbach> +1 to aleecia

… You're narrowing a bunch of other things a hacker will need to think about to get into your system.

<vincent> rachel_thomas, I'd argue with that

<SamS> +q

<aleecia_> anyone who can break your hash will be able to identify that's what you did without disclosure

vincent: Transparency doesn't provide a solution for hackers to break in. I don't see how it would allow someone to break into your system.

<aleecia_> how useful is this detail to anyone making privacy choices? likely not very. so while i disagree with Rachel on this one, i'm also not strongly pounding the table to support John

SamS: Let's say I published a privacy policy on my website that showed all of the encryption techniques that I use to anonymize data. How does that actually protect the data that I have?

… Will consumers read it and say, "They use a lot of encryption? That's really good."

yianni: So you're saying there's no information that is worth releasing.

SamS: If you say that you use reasonable technical measures to protect the data - which is what the FTC requires - then that's reasonable. And if I have a breach, people will hold me accountable.

… But using Encryption v.1 vs v.2 isn't going to make a difference.

<rachel_thomas> agree with aleecia!

aleecia: I think you're not giving away the store to announce which of the few available measures you might have taken. In terms of whether end users will understand the difference, no. If it were in a privacy policy, no.

<rachel_thomas> consumers won't know what the techniques mean anyway.

… But what this could be helpful for is to require companies themselves to write down what they do and review it periodically to make sure it's what they do.

… There's no state secret or competitive advantage.

<rachel_thomas> internal policies are different from public policies - no one is saying that companies can't review their internal policies to ensure that they're keeping their practices up to date.

… Most value of privacy policies these days is for internal analysis.

<aleecia_> smaller cos will not

rachel_thomas: Agree with aleecia that consumers won't understand what they are reading in privacy policies. I'd also note that there are endless internal policies for companies that describe in much greater detail what you do. So I don't think that leaving it out of privacy policy omits the process of internal analysis.

<aleecia_> major cos will

<dan_auerbach> rachel, "privacy and data security" is too broad to cover what we're after, which is ensuring data is de-identified

vincent: Maybe a few users might like to know this, and this data might be useful. For example, you might want to know if data you deleted can still be linked back to you.

<aleecia_> having "if you share de-id'ed data, state what you do to ensure it's actually de-id'ed" is a small ask

<vincent> disclosing an algorithm without data won't help hackers to break in, just help the community to assess the security of your algorithms

robsherman: We need to be clear about what problem we're trying to solve. User transparency? Challengeability? Forcing internal analysis? I tend to think over time it's bad to disclose security practices because it creates a vulnerability.

<SamS> +q

<aleecia_> so, i'd say we don't have agreement here, but could have agreement on adopting the FTC text and adding examples

… Beyond that, the more transparent you are — sufficient to allow people to test vulnerabilities — you actually become more vulnerable.

<dan_auerbach> i'd like to voice my strong objection to conflating "security" with the specific area of de-identification

<aleecia_> can we do that and leave transparency for the full group? if we're split here, odds are we're split with more people too

<aleecia_> not sure agreeing here has much value

<dan_auerbach> they are very different things

vincent: Google talks about how they anonymize their server logs. Doesn't create vulnerabilities but does help to evaluate.

SamS: We're not going to come to agreement to get to minimum standards here.

<Yianni> http://www.w3.org/wiki/Privacy/DNT-Breakouts

yianni: Aleecia made a good point. FTC language is a reasonable starting point. Examples could be helpful if they don't limit what companies can do or limit tech advancement.

… When I first read existing "unlinkable" language, I was confused about why it was here.

… Specifically, why do we say "commercially reasonable steps"?

<dan_auerbach> apologies all, i have to take off

… Should it be in the document? From a legal standpoint, "commercially" doesn't do much.

<aleecia_> i'd drop "commercially"

<johnsimpson> should just be reasonable

bryan: "Reasonable" is enough.

<aleecia_> and agree reasonable is sufficient

yianni: Anyone disagree?

<rachel_thomas> should be reasonable. no need for commercial.

[General agreement that we should cut "commercially."]

yianni: Also, "high probability"?

rachel_thomas: What is the context in which we're looking at this language? I think these breakouts are designed to take us past what's already in the text. We've been focused on de-id, so it's odd to be looking at a definition of "unlinkable."

yianni: We had a statement earlier from Dan that he doesn't care what it's called as long as substantively that doesn't affect it.

<aleecia_> Let's please please please use a different term other than unlink

… Trying to figure out why we're saying what we're saying in the text.

… If it's substantively accurate to call it "de-id," we should call it that.

<aleecia_> We've agreed to change that, but not what to -- if deid'ed works for everyone, let's run with that

rachel_thomas: We had two definitions that we looked at earlier — FTC and DAA — that we agreed were good. We shouldn't go back to draft. Peter made clear that editing text from the draft should be done in the full group.

yianni: Depends on what we decide to call it.

… So, with the text we have in the draft, we decided commercially should be cut. Should we change "high probability"?

<vincent> we could remove "high probability" as well

<aleecia_> None of these things are absolutes. High probability is about as good as I think we can reasonably get (no pun intended. This time.)

MikeZaneis: "Unlinkable" vs "de-id." I think "unlinkable" presents a unique challenge as we move forward when we talk about permitted uses, etc.

… We may as a group decide after x time, you should de-identify data.

<johnsimpson> seems to me "high probability" is necessary

… We identified reasons why companies should need to go back to re-identify.

… I know this is really about what the process for de-id might be, but if we're talking about unlinkable.

… Connotation of "unlinkable" is a more permanent break that would limit this group's long-term success.

<aleecia_> If companies *can* re-identify globally, they have not actually de-id'ed in the first place

yianni: Agree, especially if that's what we think of as unlinkable. It wouldn't make sense to use that as the word.

<aleecia_> To beat a dead horse: unlinkable means something specific and different in Europe. We should use a different term.

<aleecia_> As another example, if you can append data to a record based on new data you've collected, you don't have deidentified data

robsherman: I think we shouldn't work off of old "unlinkable" definition if we've decided to move away from it. We've identified DAA / FTC language as reasonable baseline, so let's start there.

yianni: Do we think it's reasonable to start with FTC language? DAA? They're very similar.

<bryan> +1 to avoid word-smithing the text here, and use DAA / FTC as base

johnsimpson: Earlier discussion was about using de-id data, but the way you get to de-id is to make it unlinkable.

<aleecia_> We appear to all be agreeing

<aleecia_> So close

… It seems like the definition of unlinkable is an excellent articulation of how you get to de-identification.

yianni: Personally, looking at text as proposed, it's similar to FTC with exception of "high probability." Seems to be similar.

… For Option 1. Option 2 seems very different.

<aleecia_> I'd be happy to add "high prob" to FTC text, toss in a few examples, and go home.

<BerinSzoka> +1 to Rob

<aleecia_> Ok: that persuades me.

<aleecia_> (cannot understand current speaker, did mostly get Rob)

<aleecia_> Agree that there is utility to adopting FTC text unchanged.

robsherman: Worry about tweaking FTC definition. FTC will develop body of caselaw around de-id, and that will give guidance to industry. If we add "high prob" to make people here happy, then it introduces uncertainty about how things diverge.

SamS: Agrees.

<aleecia_> Adding examples sounds reasonable, since I'm not sure implementers will get it

<johnsimpson> seems strange to adopt FTC text when we're developing a global standard

<BerinSzoka> does anyone NOT agree with Rob?

MikeZaneis: Hesitate to add adjectives where we don't know exactly what they do. It seems like we're having a discussion in this group about whether there's any reason or possibility for re-identification. Is the goal to completely break even the possibility for re-identification?

yianni: Everyone here understands that when you de-identify data, there's a chance of reidentification.

… If you got to 100% certainty, the data would become almost useless

… So we're trying to come up with language we could move forward with. If we just adopted FTC language without modification, would people be okay with it?

<aleecia_> Q was do we adopt FTC text? Yes, with examples.

MikeZaneis: I don't think anyone's ready to sign off, but it's a good starting point for the discussion.

<aleecia_> And agree with Rob that not changing it has value

Thomas: Agrees with johnsimpson that there is legal opinion, and we need to think about whether it is globally adoptable.

johnsimpson: I thought that the proposed bare-bones language was the result of many long discussions that took a lot of things into account, and we seem to be throwing those out the window.

… I'm also puzzled about the implications of going to a specific US agency's language.

… This needs to be thought through in terms of a global standard.

yianni: Thomas, do you think FTC language won't coincide with European standards?

Thomas: We need to think about it, and also consider ongoing discussions about EU Directive.

… We have the chance to create a level playing field, and compare legal assessment of various jurisdictions.

<BerinSzoka> given the historical lack of meaningful enforcement in Europe, and the very aggressive enforcement actions taken by this FTC, does anyone really think it won't be the FTC that takes the lead on the hard work of defining what level of deidentification is reasonable?

<aleecia_> if it turns out there are problems between EU and US, that's new information and we reopen

… Also need to consider other jurisdictions.

yianni: So not sure if language works, but need to think about it?

Thomas: Yes.

<BerinSzoka> +q

<aleecia_> Berin, could you kindly knock off the EU baiting?

yianni: With FTC language, what would be wrong with it, assuming EU perspective works out?

<vincent> BerinSzoka, I'd wait for a couple of months and then answer ;)

<aleecia_> In general I'd hesitate about adopting FTC or EU or other "local" bits, but I'll go with math being global. Ideally this works the same everywhere.

berinszoka: There's no caselaw on this yet. I'm trying to point out that, whatever anyone thinks about EU vs US, it's pretty likely that the definition of that term is going to happen in US.

… I don't think it's unreasonable to start with that as a baseline.

<aleecia_> John's point is right to consider and in other places I'd agree violently.

rachel_thomas: FTC is a good starting point, but it requires looking at the whole document and the impact of changing the definition.

<aleecia_> I think we need to make decisions and get to drafts, which we will revise many, many times.

<aleecia_> But deadlocking on not making any decisions is getting boring

<MikeZaneis> Agreeing to begin the discussion with the FTC language is good progress that will allow us to iterate further

robsherman: (1) This group won't get to a place of being able to sign off on FTC definition and go home. But sounds like the consensus of the group is that we like FTC and DAA definitions as starting points, with the considerations we've discussed, and we need to think through it more. (2) Don't think we should look at this as an effort to codify all of the privacy laws in the world. It may be that people have to comply with this standard AND local law[CUT]

<aleecia_> Ok. Do we have specific issues with the FTC text, or a general "we aren't willing to say yes to anything without going back for review" as sort of a general approach to not getting anything done?

<aleecia_> I hear need for review to see how it works with the EU

<aleecia_> Anything else specific? What's the to do list to move forward?

<johnsimpson> what are you saying about transparency?

yianni: Generally like FTC and DAA language, want to think more. Okay with some examples as long as not prescriptive. Admin, tech, physical measures. Do we agree on these points?

… On transparency, significant resistance on providing specific details about security / privacy measures. Aleecia mentioned we're not going to get agreement here. So nothing decided there.

ack johnsimpson.

johnsimpson: Agree.

yianni: On examples, Rachel thought examples weren't the best way to go, but you thought maybe some language could be okay.

rachel_thomas: I'm not the tech expert on what examples would be most effective, but I agree that defining what's not acceptable - as opposed to what is acceptable - is a better way to go.

<aleecia_> sounds like an action item in the large group

MikeZaneis: In this F2F, suggest not focusing on non-normative and trying to get to high-level normative agreement.

<aleecia_> for people to write examples

yianni: Pseudonyms. Many people said that use of pseudonyms can be compatible with de-identification, if appropriate organizational and physical measures are used.

<johnsimpson> say that again please

<aleecia_> uh, not really

<aleecia_> disagree

<vincent> well I'd disagree with that

… (Repeats per John's request.)

<aleecia_> replacing one GUID with another is not de-id'ed

<johnsimpson> not comfortable with that

… Aleecia, why?

<vincent> again, I'm not sure that organizational measure should be taken into account

aleecia: If you have something that starts out with a unique identifier and replace it with another unique identifier, you haven't moved toward privacy at all.

… You rotate hashes, so you're not promoting privacy at all.

… No obvious bright line there.

… Would have to do some work to figure out if you've actually de-identified or not.

… This could be part of a useful solution, but I would not make the broad statement you made.

… Using a pseudonym is by itself not enough.

<aleecia_> thanks!

<johnsimpson> when are we back?

<aleecia_> when are we back?

<vincent> thanks! when do we resume?

<aleecia_> any change from published agenda?

<Yianni> we start back up at 2

<aleecia_> thanks!

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2013-02-12 19:27:37 $
