02:22:46 npdoty has joined #dntd
16:28:28 RRSAgent has joined #dntd
16:28:28 logging to http://www.w3.org/2013/02/12-dntd-irc
16:28:53 +johnsimpson
16:29:52 +Dan_Auerbach
16:30:20 +BerinSzoka
16:30:42 what phone is in room? heard dan well…not so clear from room
16:31:43 http://www.w3.org/wiki/Privacy/DNT-Breakouts
16:31:53 BerinSzoka has joined #DNTD
16:32:01 I just joined. could you send that link again?
16:32:12 yianni: First topic — what term should we use to describe "unlinkable" / "deidentified" / etc.?
16:32:31 … Concern is that there's always a chance of reidentification.
16:32:40 someone typing near telephone should mute
16:32:52 unlinkable = unfair & deceptive trade practice! ;)
16:33:02 zakim, code?
16:33:02 the conference code is 26634 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), aleecia_
16:33:04 much like "Do Not Track"...
16:33:13 link again, please?
16:33:29 robsherman: Deidentified.
16:33:36 +Aleecia
16:34:02 how about "Anonymish?"
16:34:17 or "Pseudononymish?"
16:34:32 Or... Deidentifish?
16:34:50 What's the difference between "deidentified" and "anonymous"?
16:34:54 wseltzer has joined #dntd
16:34:55 perfect for all your phishing needs
16:35:16 … "Deidentified" covers both unlinkable and what could be relinkable
16:35:25 yianni: Anyone on call have a view on this?
16:35:27 I don't feel strongly either way
16:35:38 but the substance matters more than the word we use
16:35:42 If we've started, note on the phone I cannot hear well at all
16:35:56 And we appear not to have a scribe
16:36:05 i'm in your situation, aleecia_
16:36:11 Thomas: If deidentified is closer to anonymous data, then deidentified is less able to explain "unlinked." If we use "unidentified," then we are closer to both ideas - unlinkable, but also acknowledging the possibility of reidentification.
16:36:21 Rachel_Thomas: Are we getting too far into semantics?
16:36:24 again, could you please send the link again to the questions?
… some of us joined the IRC after it was shared
16:36:39 David_Stark has joined #dntd
16:36:47 Thomas: We need to decide how far we are able to go and come up with a way to describe it.
16:36:56 If someone in the room could please scribe, those of us on the phone might be able to keep up
16:36:57 AMEN, Rachel
16:37:03 thank you, Rob
16:37:04 Rachel_Thomas: Deidentified implies that you have taken reasonable steps to unlink; unlinkable implies an impossibility.
16:37:30 note that unlinkable means something specific and different in the EU
16:37:30 yianni: robsherman made this point - we've done this in the HIPAA context and elsewhere. What we say in the text is reasonableness.
16:37:31 could whoever's typing move away from the mic? it's loud
16:37:51 Thomas: Explain that there's a small gray area that's not completely anonymous.
16:38:08 Yianni: "Reasonable deidentification" conveys that there's always a chance of reidentification, but it has to be a low chance. Are you okay with that?
16:38:09 Thomas: Yes.
16:38:22 "deidentification" to me suggests after a process.
16:38:29 Dan_Auerbach: Can we all agree that whatever word we use shouldn't inform the details of the process or standard we agree upon?
16:38:34 Rachel_Thomas: What do you mean?
16:38:51 Dan_Auerbach: I don't feel strongly about terminology, but I do care about substance.
16:39:00 Yianni: Agree. Just because we use the word deidentified or unlinkable doesn't change the substance.
16:39:15 Rachel_Thomas: But they're different concepts - we're trying to decide whether we're working toward deidentification or unlinkability.
16:39:55 robsherman: I think consensus is that we're not going for theoretical unlinkability but for reasonable delinking.
16:39:59 i do not suggest we go for theoretical impossibility
16:40:06 yianni: Seems reasonable. Should we look at FTC language?
16:40:16 http://www.w3.org/wiki/Privacy/DNT-Breakouts
16:40:31 +vincent
16:40:49 bryan has joined #dntd
16:41:11 #DoNotCough
16:41:13 http://www.w3.org/wiki/Privacy/DNT-Breakouts
16:41:16 haha
16:41:18 apologies, was called away from phone briefly
16:41:36 … What's wrong with this language? Why shouldn't we use it?
16:41:46 FTC:
16:41:47 data is not “reasonably linkable” to the extent that a company: (1) takes reasonable measures to ensure that the data is de-identified; (2) publicly commits not to try to reidentify the data; and (3) contractually prohibits downstream recipients from trying to re-identify the data. Commission's definition of "de-identified": "First, the company must take reasonable measures to ensure that the data is de-identified. This means that the company must [CUT]
16:42:03 dan_auerbach: FTC language is a good starting point. Makes sense and is not too prescriptive. I'd also favor adding to it the idea that there should be privacy penetration testing.
16:42:31 cut off, once more: "First, the company must take reasonable measures to ensure that the data is de-identified. This means that the company must achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device."
16:42:33 … Having people try to reidentify data. There has to be some sort of normative way to distinguish companies that just try to hash IDs from companies that make a more serious effort at it.
16:42:55 vincent has joined #dntd
16:43:00 rachel_thomas: Sounds like you're on board with the idea as long as we come up with some clarity around what deidentification is?
16:43:04 Dan_Auerbach: Yes.
16:43:26 ?: Should we be requiring people to publicly commit not to reidentify?
16:43:46 yianni: I think that's what we are trying to do here - get people to publicly commit to follow a standard.
16:43:51 bryan: Privacy policy.
16:45:34 robsherman: Much cleaner to not have lots of individual "public commitments," as FTC would have under Section 5. Let's just agree on what's required — for example, not reidentifying data — and treat server response as the public commitment.
16:45:36 rrsagent, make record public
16:45:47 yianni: Is it better to define reasonableness here? Have examples?
16:45:59 Rachel_Thomas: There's already a legal standard for "reasonableness."
16:46:02 q?
16:46:03 q?
16:46:07 q+
16:46:09 Dan_Auerbach: Helpful to have examples.
16:46:17 ack bryan
16:46:20 q+
16:46:39 For reference (since we're also looking at the FTC language) here is the DAA definition of De-Identification Process: Data has been De-Identified when an entity has taken reasonable steps to ensure that the data cannot reasonably be re-associated or connected to an individual or connected to or be associated with a particular computer or device. An entity should take reasonable steps to protect the non-identifiable nature of data if it is dist[CUT]
16:46:53 (cont) Affiliates and obtain satisfactory written assurance that such entities will not attempt to reconstruct the data in a way such that an individual may be re-identified and will use or disclose the de-identified data only for uses as specified by the entity. An entity should also take reasonable steps to ensure that any non-Affiliate that receives deidentified data will itself ensure that any further non-Affiliate entities to which such dat[CUT]
16:46:58 bryan: Regarding examples, that's something that advocacy groups or public sites will do as far as creating best practices. Maybe something that could be documented through the W3C community group process, webplatform.org, etc. Agree with Rachel that we should limit non-normative language within the spec itself.
16:47:04 I think examples are helpful
16:47:06 ack johnsimpson
16:47:10 q-
16:47:14 q+
16:47:14 Asking people to implement this without context is hard
16:47:20 johnsimpson: Probably don't want to define "reasonable" in normative language.
16:47:23 i don't feel strongly about not having examples (just depends on what they are)
16:47:30 And we likely do NOT want to hard-code what is required
16:47:38 … We do have specific examples, and it would be tremendously helpful to have that language included in a non-normative way.
16:48:05 +1
16:48:10 +1
16:48:19 robsherman: Agree w/ aleecia that hard-coding technology is really hard to implement and will break the standard 5 years from now.
16:48:26 Could put them into an appendix if they clutter up the text
16:48:34 yianni: What examples do people have as "clearly good enough" or "clearly not good enough"?
16:48:40 q+
16:48:47 q?
16:49:01 Not good enough: removing names, removing unique ids
16:49:18 Sam: Looking at Ed's examples, he talked about admin/procedural controls on data that leaves the database or is reported outside the controlling entity. Focusing on that would be interesting. Looking at third parties - how they anonymize and report out - that's really what we're talking about in terms of protecting privacy.
16:49:40 … Lots of technologies to do that. Will be different for every entity. So I'd like to focus on what the controls are that keep identifiable info from leaking out.
16:49:44 ack dan_auerbach
16:49:45 … No specific examples.
16:49:47 -Dan_Auerbach
16:49:57 ack rachel_thomas
16:50:01 whoops, seems i've dropped off the call
16:50:06 will try back in a moment
16:50:19 rachel_thomas: In IRC, I put in the DAA language on deidentification. We don't have specific examples, but the text is helpful in terms of explaining what's meant by de-identification.
16:50:21 (The DAA text isn't bad)
16:50:53 unmute aleecia_
16:50:57 zakim, unmute aleecia_
16:50:57 sorry, robsherman, I do not know which phone connection belongs to aleecia_
16:51:05 agree that DAA text is OK, though FTC seems a little better to me
16:51:06 I'm following via scribes - cannot hear well
16:51:15 I'm guessing I was unmuted for a reason I missed...
16:51:18 rachel_thomas: [reads through DAA language]
16:51:56 i'm quoting from https://www.aboutads.info/resource/download/Multi-Site-Data-Principles.pdf
16:52:03 yianni: Any specific technical measures?
16:52:04 page 8 :)
16:52:14 Ah, sorry. Rachel will be better taking that than I would anyway, but I note it's similar to FTC's in large measure. Same direction.
16:52:21 … Any comments on Ed's presentation about hashing or k-anonymity not being a good method?
16:52:43 Sam: I'm a big opponent of hashing. It's great but has its limits. If you're using it to anonymize or otherwise deidentify, some day it will be broken.
16:52:49 q+
16:52:54 q-
16:52:56 q+
16:52:59 … Have to come up with better ways of doing this long-term.
16:53:12 rachel_thomas: We don't need to identify specific standards.
16:53:12 q+
16:53:13 q?
16:53:16 ack robsherman
16:53:18 My hope is that DAA's members won't have to change much, though from Shane's questions, presumably Yahoo! would
16:53:19 Q+
16:53:30 (i'm unable to rejoin the call -- it seems to be on W3C's side? -- so will follow via scribe)
16:53:54 (Dan, me too)
16:53:54 robsherman: Talking about specific tech is a mistake, as is demonizing hashing; there are good uses for it in the de-id context.
16:53:57 it strongly depends on how often you change the seed you use to hash
16:54:24 q?
16:54:28 q+
16:54:42 Vincent - and the richness of data collected
16:54:44 Sam: Good examples of how hashing works, but it's in the context of a specific data set. When you become able to correlate, things break down, so it's hard to say whether a particular technique will work.
16:54:45 here's an example of something which is NOT good enough: a wide table keyed with pseudonyms (say, hashes of cookies), but which also has timestamps, urls, etc.
16:54:46 ack bryan
16:55:16 bryan: This is a point in time where anything we write on technology today will be superseded tomorrow. Let's establish a reasonable expectation, let technologists figure out what is reasonable today, and not put it in the spec.
16:55:20 ack johnsimpson
16:55:32 johnsimpson: If we know what works, it should be cited in non-normative language as an example.
16:55:42 … Also, in normative language, we should require that there be transparency about the method you use to hash.
16:55:49 q+ to respond to johnsimpson
16:55:51 for the record, Google used to change (and still does) the key used to hash search logs every day (see: http://searchengineland.com/anonymizing-googles-server-log-data-hows-it-going-15036)
16:56:04 … People need to understand what you're doing.
16:56:06 ack rachel_thomas
16:56:25 rachel_thomas: Agree strongly, for the sake of consumer privacy and data security. The more specific you are about specific methods of protecting data, the less secure they are.
16:56:37 and we're talking about a first party anonymizing its search logs, not a third party (imho a third party should provide stronger guarantees)
16:56:38 security through obscurity has failed time and again
16:56:42 … That's why the law has been comfortable with the idea of not describing specific methods.
16:56:47 I agree with bryan that there shouldn't be normative language specifying technology, but non-normative examples are helpful
16:56:57 ack vincent
16:57:02 q-
16:57:07 hey, John, how about #DoNotSnark?
16:57:21 vincent: We say data cannot be used by the actor to reidentify. But we don't say it can't be used by ANYONE.
16:57:28 … If I share data with another actor, it might use the data to reidentify.
16:57:30 Berin, I take it you've decided not to honor that? (...says the pot to the kettle)
16:57:31 q?
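[Editor's note: the key-rotation point raised above (hashing with a seed that changes, as in Google's daily re-keying of search-log hashes) can be sketched in a few lines. This is an illustrative sketch only, not a technique the group endorsed; the cookie values and keys are hypothetical.]

```python
import hashlib
import hmac

def pseudonymize(raw_id: bytes, key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of an identifier, truncated for storage."""
    return hmac.new(key, raw_id, hashlib.sha256).hexdigest()[:16]

# With a FIXED key, the pseudonym is stable forever: every log line for the
# same cookie still links together, so the data is merely pseudonymous.
fixed = pseudonymize(b"cookie-abc123", b"static-key")
assert fixed == pseudonymize(b"cookie-abc123", b"static-key")

# Rotating the key (e.g. daily) breaks linkability ACROSS rotation periods:
# the same cookie maps to unrelated pseudonyms on different days.
day1 = pseudonymize(b"cookie-abc123", b"key-2013-02-11")
day2 = pseudonymize(b"cookie-abc123", b"key-2013-02-12")
assert day1 != day2
```

Note the sketch only addresses the pseudonym column itself; as the 16:54:45 comment observes, a wide table with timestamps and URLs can remain re-identifiable regardless of how the key column is hashed.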
16:57:31 q+
16:57:51 as John said: "whatever" ;)
16:57:53 yianni: Do you mean just contractually - that you have contractual promises not to reidentify?
16:58:00 Vincent - I could imagine daily not being enough
16:58:03 vincent: Just talking about the language of the draft.
16:58:17 yianni: I'd hope contractual language already covers that.
16:58:25 rachel_thomas: FTC language already says this.
16:58:31 q?
16:58:40 ack robsherman
16:58:53 q+
16:58:56 ok, the bare-bones text does not reflect that: "contractually prohibits downstream recipients from trying to re-identify the data."
16:59:11 currently not hearing any objections to the FTC defn (and not hearing well, so perhaps missing things vital)
16:59:20 q?
16:59:23 could we just adopt FTC lang and add examples?
16:59:31 robsherman: Wouldn't need to have specific language that says this; it's already covered by "reasonableness" precedent.
16:59:36 ack rachel_thomas
16:59:37 aleecia_: that'd be my favored approach
16:59:44 rachel_thomas: DAA makes the same point. We're in violent agreement here.
16:59:46 aleecia_, actually I agree that every day is not enough
16:59:49 any objections?
16:59:55 yianni: Next question - on pseudonyms.
16:59:59 however, the examples should be real
17:00:12 can we actually just nail this down (for the group) right now?
17:00:34 Thomas: From my presentation today, there's a split between anonymous and pseudonymous. Anonymous is nearly absolutely impossible, and with pseudonymous someone has the key to reidentify.
17:00:44 Q+
17:01:00 q+
17:01:01 q?
17:01:01 yianni: [summarizes Dan's comment from IRC].
17:01:38 Thomas has joined #dntd
17:01:57 regarding pseudonyms, it's dangerous to think of data that way because it leads to the false impression that only certain fields are the "identifiers", whether they be real names or pseudonyms
17:01:57 SamS: It comes down to a contract that says, if I get data, I won't try to reidentify it.
… Even if it's pseudonymous or a weak version of anonymity, as long as I protect the data, don't leak it, and don't reidentify, it's effectively the same thing. But the burden is on me to be sure I have the right admin controls so that all of this happens. Ultimately that's where we are going.
17:02:07 in fact, each field will have some bits of identifying information
17:02:09 where do we stand on hashing - do we agree that it's not enough, or the opposite?
17:02:16 yianni: You think contractual language does a lot?
17:02:19 SamS: Yes.
17:02:24 ack bryan
17:02:36 q+
17:02:39 bryan: Want to be sure we're still talking about a third party.
17:02:51 I think there is a misunderstanding about the pseudonym definition
17:02:52 right now i cannot imagine a straight-up pseudonym helping
17:02:55 q-
17:02:55 q+
17:02:56 i want strong guarantees on the data itself, not just contracts
17:03:04 agree with robsherman
17:03:05 replacing one GUID with another does not help
17:03:11 Thomas_Schauf has joined #dntd
17:03:15 q?
17:03:17 for instance, a cookie ID is a pseudonym
17:03:18 From Thomas' presentation this morning: “Pseudonymising” shall mean replacing the data subject's name and other identifying features with another identifier in order to make it impossible or extremely difficult to identify the data subject.
17:03:19 robsherman: (Clarifies use of pseudonyms in the third-party context)
17:03:30 aleecia_: especially when that GUID is linked with tons of other identifying information
17:03:32 IP address + User-Agent could be a pseudonym
17:03:35 yes
17:03:49 q+
17:03:53 q?
17:03:59 Pseudonymous info is... a unique identifier that does not identify a specific person but could be associated with an individual. Includes: unique identifiers, biometric information, usage profiles not tied to a known individual. Until associated with an individual, data cannot be treated as anonymous.
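[Editor's note: Dan's point at 17:02:07 that "each field will have some bits of identifying information" (and the observation that IP address + User-Agent can act as a pseudonym) can be made concrete with a back-of-the-envelope self-information calculation. The field frequencies below are invented for illustration; real values would have to be measured, in the style of EFF's browser-fingerprinting study.]

```python
import math

# Hypothetical population frequencies for each observed field value;
# real figures would come from measurement, not from this sketch.
field_freq = {
    "user_agent": 1 / 1500,   # this exact UA string among all users
    "timezone":   1 / 4,
    "screen_res": 1 / 50,
}

def identifying_bits(freq: float) -> float:
    """Self-information (in bits) of observing a value with this frequency."""
    return -math.log2(freq)

total_bits = sum(identifying_bits(f) for f in field_freq.values())
# Roughly 33 bits suffice to single out one person among ~8.6 billion, so
# even a handful of "non-identifier" fields can add up toward a unique
# fingerprint: no single field is THE identifier, which is Dan's point.
print(f"combined: {total_bits:.1f} bits")  # prints "combined: 18.2 bits"
```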
17:04:01 q- SamS
17:04:01 and that's my problem with hashing (and agree, rotation helps there, potentially)
17:04:06 ack Rachel_Thomas
17:04:20 Rachel_Thomas: Capturing info from Thomas's presentation this morning.
17:04:24 ack David_Stark
17:04:37 imho, hashing a cookie ID does not bring a lot of guarantee (if any)...
17:04:52 +1 vincent
17:05:02 getting into the mentality that certain fields are the "identifying" ones (e.g. a cookie, hash of ip+ua) is a mistake
17:05:12 David_Stark: Thought the conversation this morning was excellent. At my market research co., when people participate in a survey, we assign them a pseudonymous identifier that allows for quality control. Without this, you'd have no control. Just anyone could come in and respond many times - fundamentally undermining data quality.
17:05:36 if i visit the domain webmail.danauerbach.org, that field becomes identifying
17:05:37 whoever was typing but just stopped - that was REALLY loud
17:05:50 q?
17:06:27 … We only have identifiers of panel members and their numbers. Researchers have access to survey responses. But nobody in our company has access to both individuals and their responses.
17:06:56 yianni: You're bringing up the point of administrative controls, which Sam also mentioned.
17:07:07 as a suggestion: you could have a marker in a cookie of the number of times people have taken a survey, rather than a unique id on the person
17:07:08 q?
17:07:18 … Currently, FTC language doesn't mention this. Should we address this in non-normative language?
17:07:28 q+
17:07:33 ack David_Stark
17:07:36 that isn't so bad when you're only dealing with one company, rather than trying to push out a change across multiple parties working together in an ecosystem
17:07:40 David_Stark: Great idea, if we can provide examples.
17:07:54 with proper access controls on the pseudonym mapping, I agree that this meaning of pseudonym supports the goal of de-identifying data
17:08:06 … Makes a lot of sense.
… We see that language in data protection laws - admin, technical, and physical controls. Let's say that.
17:08:12 yianni: Any disagreement?
17:08:17 with?
17:08:20 q?
17:08:33 … with the idea that organizational measures should be taken into account?
17:08:47 for first parties, not an issue
17:08:55 for third parties, not sure why we'd take them into account?
17:08:59 … One danger I can see here with organizational measures is the government problem. Even if you have great organizational measures, the govt could come in and reidentify.
17:09:01 q+
17:09:17 i think the conversation was about a survey people agreed to take, which makes for first-party issues
17:09:23 q+
17:09:26 q?
17:09:34 robsherman: Shouldn't have one standard in normative text and a different one in non-normative
17:09:36 ack vincent
17:09:39 the first party might release data later and want to de-id, but no need to worry about org measures then
17:09:45 not seeing how that's relevant to us here
17:09:47 vincent: One scenario people are considering is the rogue engineer.
17:10:01 q?
17:10:01 … Someone has to have access to the data, so the org measures don't really matter.
17:10:02 and again, my apologies for not being able to hear.
17:10:02 Q+ 1
17:10:04 ack rachel_thomas
17:10:26 rachel_thomas: Are you suggesting different standards for protection of data for consumer privacy and for government access?
17:10:27 Q+
17:10:36 yianni: No - I don't think you can do that separately.
17:10:38 q- 1
17:10:51 q?
17:10:59 rachel_thomas: We can't get outside of reasonableness. You can take action against a rogue employee, and that's already covered.
17:11:09 yianni: Would you say that admin controls go into the concept of "deidentified"?
17:11:12 rachel_thomas: Yes.
17:11:16 aleecia, bridge is not reachable?
17:11:26 s/concept of "deidentified"/concept of "reasonableness"
17:11:49 Sam: We already do all of this.
17:12:07 yianni: There's going to be some pushback on including organizational measures within the concept of reasonable technical measures. Govt access.
17:12:13 q?
17:12:18 … Does that mean that admin controls shouldn't count as part of technical measures?
17:12:22 rachel_thomas, you can make it very very hard (if not impossible) for someone to technically reidentify the data
17:12:26 ack SamS
17:12:38 on bridge. call quality is less than ideal.
17:12:49 not just "reasonably" hard
17:12:50 vincent: i am unable to reach the bridge at all
17:12:52 SamS: It's pretty basic - administrative measures prevent people from accessing data to reidentify.
17:12:54 it's like hearing the words without any spaces
17:13:06 … If I can get at data in a way that bypasses proper controls, I can maybe reidentify.
17:13:08 q?
17:13:26 bryan: It has strong value. It's something we do for a variety of regulatory requirements already. Tons of data, and admin controls are the only way to meet those requirements.
17:13:27 q+
17:13:29 to clearly state my opposition to this: administrative controls are NOT enough
17:13:38 q?
17:13:39 we need real de-identification in the data itself
17:14:00 MikeZ: Some people have this in place; small companies don't. Which is why you have a sliding scale in the FTC standard. We're solving for the web, not just a handful of big companies.
17:14:08 q?
17:14:08 yianni: That's the problem with having specific examples in the text.
17:14:20 bryan: Same goes for the sophistication of tech approaches that we mandate.
17:14:33 aleecia_, dan_auerbach have you retried? call quality is fine for me
17:14:54 quality ok for me, too
17:15:03 zakim, code?
17:15:03 the conference code is 26634 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), aleecia_
17:15:08 -Aleecia
17:15:20 vincent, i've retried several times but am just unable to connect, but will do so one more time...
17:15:30 +Aleecia
17:15:39 q?
17:15:41 ack robsherman
17:15:47 did get back in.
… is somewhat better
17:16:06 +Dan_Auerbach
17:16:08 still scribe-dependent but that's ok
17:16:12 robsherman: The concept of reasonableness encompasses admin, tech, and physical practices, and the specific steps might vary based on circumstances.
17:16:21 q?
17:16:51 vincent, now connected but call quality is quite bad
17:16:52 yianni: Would having specific examples of what's appropriate be helpful in bridging the gap between John Simpson's demand for transparency and the need to keep it secure?
17:17:08 +q
17:17:16 rachel_thomas: Security information is kept secret because disclosing it gives hackers a way in.
17:17:21 q?
17:17:22 … Let's protect the information in the most effective way.
17:17:26 getting enough of this - depending on "we don't tell people about our security measures" does not offer real protection
17:17:29 bryan: Also opens the door to social engineering around company practices.
17:17:35 q+
17:17:36 ack johnsimpson
17:17:47 johnsimpson: Not calling for "putting the secret sauce of everything you do out there."
17:18:05 i do think we need transparency here, +1 to johnsimpson
17:18:07 … What I think needs to be explained publicly is the category of measures that are taken. Not enough to say "reasonable".
17:18:15 rachel_thomas: What would be reasonable in your opinion?
17:18:43 if we had a high, specific standard, transparency wouldn't be necessary, but that would break future-proofing
17:18:45 johnsimpson: There's a difference between security and fraud, where getting specific could tip your hand. But you could say, if you believe that hashing is reasonable, that you rely on hashing.
17:18:51 q+
17:18:57 … You can describe techniques that would provide meaningful insight without giving away the store.
17:19:12 rachel_thomas: But you've just narrowed the world in terms of what a hacker needs to think about.
17:19:24 +1 to aleecia
17:19:26 … You're narrowing a bunch of other things a hacker would need to think about to get into your system.
17:19:27 rachel_thomas, I'd argue with that
17:19:31 +q
17:19:31 anyone who can break your hash will be able to identify that's what you did without disclosure
17:19:32 ack vincent
17:20:06 vincent: Transparency doesn't provide a way for hackers to break in. I don't see how it would allow someone to break into your system.
17:20:08 ack sams
17:20:18 q-
17:20:30 how useful is this detail to anyone making privacy choices? likely not very. so while i disagree with Rachel on this one, i'm also not strongly pounding the table to support John
17:20:38 SamS: Let's say I published a privacy policy on my website that showed all of the encryption techniques that I use to anonymize data. How does that actually protect the data that I have?
17:20:55 … Will consumers read it and say, "They use a lot of encryption? That's really good."?
17:20:58 q+
17:21:12 yianni: So you're saying there's no information that is worth releasing.
17:21:16 q+
17:21:34 SamS: Saying that you use reasonable technical measures to protect the data - which is what the FTC requires - then that's reasonable. And if I have a breach, people will hold me accountable.
17:21:39 q?
17:21:40 … But using Encryption v.1 vs v.2 isn't going to make a difference.
17:21:42 ack aleecia_
17:22:11 agree with aleecia!
17:22:14 aleecia: I think you're not giving away the store by announcing which of the few available measures you might be taking. In terms of whether end users will understand the difference, no. If it were in a privacy policy, no.
17:22:26 consumers won't know what the techniques mean anyway.
17:22:32 … But what this could be helpful for is requiring companies themselves to write down what they do and review it periodically to make sure it's what they actually do.
17:22:43 … There's no state secret or competitive advantage.
17:22:55 internal policies are different from public policies - no one is saying that companies can't review their internal policies to ensure that they're keeping their practices up to date.
17:22:56 … Most of the value of privacy policies these days is for internal analysis.
17:22:58 q?
17:23:02 ack rachel_thomas
17:23:21 q+
17:23:40 smaller cos will not
17:23:44 rachel_thomas: Agree with aleecia that consumers won't understand what they are reading in privacy policies. I'd also note that there are endless internal policies for companies that describe in much greater detail what you do. So I don't think that leaving it out of the privacy policy omits the process of internal analysis.
17:23:45 major cos will
17:23:46 ack vincent
17:24:00 rachel, "privacy and data security" is too broad to cover what we're after, which is ensuring data is de-identified
17:24:16 vincent: Maybe a few users might like to know this, and this data might be useful. For example, you might want to know if data you deleted can still be linked back to you.
17:24:24 q+
17:24:25 q?
17:24:26 having "if you share de-id'ed data, state what you do to ensure it's actually de-id'ed" is a small ask
17:25:49 disclosing an algorithm without data won't help hackers break in; it just helps the community assess the security of your algorithms
17:26:58 q+
17:27:00 robsherman: We need to be clear about what problem we're trying to solve. User transparency? Challengeability? Forcing internal analysis? I tend to think over time it's bad to disclose security practices because it creates a vulnerability.
17:27:00 +q
17:27:03 q?
17:27:05 so, i'd say we don't have agreement here, but could have agreement on adopting the FTC text and adding examples
17:27:18 … Beyond that, the more transparent you are — sufficiently to allow people to test vulnerabilities — the more vulnerable you actually become.
17:27:19 ack vincent
17:27:37 i'd like to voice my strong objection to conflating "security" with the specific area of de-identification
17:27:40 can we do that and leave transparency for the full group?
… if we're split here, odds are we're split with more people too
17:27:46 not sure agreeing here has much value
17:27:47 they are very different things
17:27:51 vincent: Google talks about how they anonymize their server logs. That doesn't create vulnerabilities, but it does help to evaluate.
17:28:11 ack SamS
17:28:18 q?
17:28:21 SamS: We're not going to come to agreement on minimum standards here.
17:28:25 q-
17:29:17 http://www.w3.org/wiki/Privacy/DNT-Breakouts
17:29:17 yianni: Aleecia made a good point. FTC language is a reasonable starting point. Examples could be helpful if they don't limit what companies can do or limit tech advancement.
17:29:26 MikeZaneis has joined #dntd
17:29:41 … When I first read the existing "unlinkable" language, I was confused about why it was here.
17:29:54 … Specifically, why do we say "commercially reasonable steps"?
17:30:01 apologies all, i have to take off
17:30:04 -Dan_Auerbach
17:30:06 … Should it be in the document? From a legal standpoint, "commercially" doesn't do much.
17:30:07 q?
17:30:07 i'd drop "commercially"
17:30:07 should just be reasonable
17:30:13 bryan: "Reasonable" is enough.
17:30:19 and agree reasonable is sufficient
17:30:20 yianni: Anyone disagree?
17:30:20 should be reasonable. no need for commercial.
17:30:36 [General agreement that we should cut "commercially."]
17:30:48 yianni: Also, "high probability"?
17:30:56 q?
17:31:01 q+
17:31:05 ack rachel_thomas
17:31:35 rachel_thomas: What is the context in which we're looking at this language? I think these breakouts are designed to take us past what's already in the text. We've been focused on de-id, so it's odd to be looking at a definition of "unlinkable."
17:31:52 yianni: We had a statement earlier from Dan that he doesn't care what it's called as long as that doesn't affect the substance.
17:31:57 Let's please please please use a different term other than unlink
17:32:08 … Trying to figure out why we're saying what we're saying in the text.
17:32:20 … If it's substantively accurate to call it "de-id," we should call it that.
17:32:21 q?
17:32:25 We've agreed to change that, but not what to -- if deid'ed works for everyone, let's run with that
17:32:54 rachel_thomas: We had two definitions that we looked at earlier — FTC and DAA — that we agreed were good. We shouldn't go back to the draft. Peter made clear that editing text from the draft should be done in the full group.
17:33:03 q+
17:33:09 yianni: Depends on what we decide to call it.
17:33:20 Q+
17:33:34 … So, with the text we have in the draft, we decided "commercially" should be cut. Should we change "high probability"?
17:33:37 q+ later
17:33:42 ack robsherman
17:33:52 q?
17:33:56 we could remove "high probability" as well
17:33:58 ack MikeZaneis
17:34:18 None of these things are absolutes. High probability is about as good as I think we can reasonably get (no pun intended. This time.)
17:34:28 MikeZaneis: "Unlinkable" vs "de-id." I think "unlinkable" presents a unique challenge as we move forward when we talk about permitted uses, etc.
17:34:38 … We may as a group decide that after x time, you should de-identify data.
17:34:39 seems to me "high probability" is necessary
17:34:54 … We identified reasons why companies would need to go back and re-identify.
17:35:06 … I know this is really about what the process for de-id might be, but if we're talking about unlinkable...
17:35:19 … The connotation of "unlinkable" is a more permanent break that would limit this group's long-term success.
17:35:22 q?
17:35:24 If companies *can* re-identify globally, they have not actually de-id'ed in the first place
17:35:30 yianni: Agree - given what we think of as unlinkable, it wouldn't make sense to use that word.
17:35:57 To beat a dead horse: unlinkable means something specific and different in Europe. We should use a different term.
17:36:30 ack robsherman 17:36:37 As another example, if you can append data to a record based on new data you've collected, you don't have deidentified data 17:36:50 Q+ 17:37:01 robsherman: I think we shouldn't work off of old "unlinkable" definition if we've decided to move away from it. We've identified DAA / FTC language as reasonable baseline, so let's start there. 17:37:15 yianni: Do we think it's reasonable to start with FTC language? DAA? They're very similar. 17:37:22 q? 17:37:27 +1 to avoid word-smithing the text here, and use DAA / FTC as base 17:37:27 ack johnsimpson 17:37:45 johnsimpson: Earlier discussion was about using de-id data, but the way you get to de-id is to make it unlinkable. 17:37:53 We appear to all be agreeing 17:38:07 So close 17:38:11 … It seems like the definition of unlinkable is an excellent articulation of how you get to de-identification. 17:38:14 q? 17:38:38 yianni: Personally, looking at text as proposed, it's similar to FTC with exception of "high probability." Seems to be similar. 17:38:48 … For Option 1. Option 2 seems very different. 17:38:53 q? 17:38:54 I'd be happy to add "high prob" to FTC text, toss in a few examples, and go home. 17:39:30 q+ 17:39:49 Q+ 17:40:37 Q+ 17:40:52 +1 to Rob 17:41:07 Ok: that persuades me. 17:41:29 (cannot understand current speaker, did mostly get Rob) 17:41:41 Agree that there is utility to adopting FTC text unchanged. 17:41:45 robsherman: Worry about tweaking FTC definition. FTC will develop body of caselaw around de-id, and that will give guidance to industry. If we add "high prob" to make people here happy, then it introduces uncertainty about how things diverge. 17:41:48 SamS: Agrees. 17:42:02 Adding examples sounds reasonable, since I'm not sure implementers will get it 17:42:23 seems strange to adopt FTC text when we're developing a global standard 17:42:29 does anyone NOT agree with Rob? 17:42:32 q? 
17:42:33 MikeZaneis: Hesitate to add adjectives where we don't know exactly what they do. It seems like we're having a discussion in this group about whether there's any reason or possibility for re-identification. Is the goal to completely break even the possibility for re-identification? 17:42:51 yianni: Everyone here understands that when you de-identify data, there's a chance of reidentification. 17:42:58 q+ 17:43:04 … If you got to 100% certainty, the data would become almost useless 17:43:29 … So we're trying to come up with language we could move forward with. If we just adopted FTC language without modification, would people be okay with it? 17:43:29 q? 17:43:37 Q was do we adopt FTC text? Yes, with examples. 17:43:41 MikeZaneis: I don't think anyone's ready to sign off, but it's a good starting point for the discussion. 17:43:44 And agree with Rob that not changing it has value 17:43:52 q? 17:43:58 q- 17:44:04 Thomas: Agrees with johnsimpson that there is legal opinion, and we need to think about whether it is globally adoptable. 17:44:23 q- 17:44:30 q- MikeZaneis 17:44:33 ack johnsimpson 17:45:02 Thomas_Schauf has joined #dntd 17:45:04 johnsimpson: I thought that the proposed bare-bones language was the result of many long discussions that took a lot of things into account, and we seem to be throwing those out the window. 17:45:16 … I'm also puzzled about the implications of going to a specific US agency's language. 17:45:23 … This needs to be thought through in terms of a global standard. 17:45:23 q? 17:45:49 yianni: Thomas, do you think FTC language won't coincide with European standards? 17:46:08 Thomas: We need to think about it, and also consider ongoing discussions about EU Directive. 17:46:26 … We have the chance to create a level playing field, and compare legal assessment of various jurisdictions. 
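[Scribe note: Yianni's point above — pushing re-identification risk all the way to zero makes the data nearly useless — can be illustrated with a minimal, hypothetical sketch of generalizing quasi-identifiers. The records, field names, and bucket sizes below are invented for illustration and are not from the discussion.]

```python
from collections import Counter

# Hypothetical toy dataset: ZIP code and age are quasi-identifiers.
records = [
    {"zip": "02139", "age": 31},
    {"zip": "02139", "age": 35},
    {"zip": "02142", "age": 52},
    {"zip": "02142", "age": 55},
]

def generalize(rec, zip_digits, age_bucket):
    """Coarsen ZIP and age. Fewer ZIP digits and wider age buckets
    lower re-identification risk, but also discard detail (utility)."""
    return (rec["zip"][:zip_digits] + "*" * (5 - zip_digits),
            rec["age"] // age_bucket * age_bucket)

def min_group_size(zip_digits, age_bucket):
    """Size of the smallest group sharing the same generalized values:
    a group of 1 means that record is trivially linkable."""
    groups = Counter(generalize(r, zip_digits, age_bucket) for r in records)
    return min(groups.values())

print(min_group_size(5, 1))   # raw data: smallest group is 1 (linkable)
print(min_group_size(3, 10))  # coarsened: every record shares its group
```

Raw records each form a group of one, so any auxiliary data pins them down; coarsening shrinks that risk, but only by erasing the detail that made the data useful in the first place.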
given the historical lack of meaningful enforcement in Europe, and the very aggressive enforcement actions taken by this FTC, does anyone really think it won't be the FTC that takes the lead on the hard work of defining what level of deidentification is reasonable? 17:46:45 if it turns out there are problems between EU and US, that's new information and we reopen 17:46:49 … Also need to consider other jurisdictions. 17:46:57 yianni: So not sure if language works, but need to think about it? 17:46:59 Thomas: Yes. 17:47:00 +q 17:47:00 Berin, could you kindly knock off the EU baiting? 17:47:17 q+ 17:47:18 yianni: With FTC language, what would be wrong with it, assuming EU perspective works out? 17:47:19 BerinSzoka, I'd wait for a couple of months and then answer ;) 17:47:20 q? 17:47:23 ack berinszoka 17:47:49 In general I'd hesitate about adopting FTC or EU or other "local" bits, but I'll go with math being global. Ideally this works the same everywhere. 17:48:00 berinszoka: There's no caselaw on this yet. I'm trying to point out that, whatever anyone thinks about EU vs US, it's pretty likely that the definition of that term is going to happen in the US. 17:48:07 … I don't think it's unreasonable to start with that as a baseline. 17:48:08 q+ 17:48:08 John's point is right to consider and in other places I'd agree violently. 17:48:09 q? 17:48:13 ack rachel_thomas 17:48:33 rachel_thomas: FTC is a good starting point, but it requires looking at the whole document and the impact of changing the definition. 17:48:45 q? 17:48:47 ack robsherman 17:49:01 I think we need to make decisions and get to drafts, which we will revise many, many times. 17:49:14 But deadlocking on not making any decisions is getting boring 17:50:46 Agreeing to begin the discussion with the FTC language is good progress that will allow us to iterate further 17:50:47 q? 17:51:58 robsherman: (1) This group won't get to a place of being able to sign off on FTC definition and go home.
But sounds like the consensus of the group is that we like FTC and DAA definitions as starting points, with the considerations we've discussed, and we need to think through it more. (2) Don't think we should look at this as an effort to codify all of the privacy laws in the world. It may be that people have to comply with this standard AND local law[CUT] 17:52:12 Ok. Do we have specific issues with the FTC text, or a general "we aren't willing to say yes to anything without going back for review" as sort of a general approach to not getting anything done? 17:52:23 I hear need for review to see how it works with the EU 17:52:37 Anything else specific? What's the to do list to move forward? 17:52:50 what are you saying about transparency? 17:52:51 yianni: Generally like FTC and DAA language, want to think more. Okay with some examples as long as not prescriptive. Admin, tech, physical measures. Do we agree on these points? 17:52:56 q? 17:53:35 q+ 17:53:38 q? 17:53:40 … On transparency, significant resistance on providing specific details about security / privacy measures. Aleecia mentioned we're not going to get agreement here. So nothing decided there. 17:53:43 ack johnsimpson. 17:53:49 ack johnsimpson 17:53:52 johnsimpson: Agree. 17:54:14 yianni: On examples, Rachel thought examples weren't the best way to go, but you thought maybe some language could be okay. 17:54:42 rachel_thomas: I'm not the tech expert on what examples would be most effective, but I agree that defining what's not acceptable - as opposed to what is acceptable - is a better way to go. 17:54:46 q? 17:54:54 sounds like an action item in the large group 17:54:57 MikeZaneis: In this F2F, suggest not focusing on non-normative and trying to get to high-level normative agreement. 17:54:58 for people to write examples 17:55:31 yianni: Pseudonyms. Many people said that use of pseudonyms can be compatible with de-identification, if appropriate organizational and physical measures are used. 
17:55:37 say that again please 17:55:37 uh, not really 17:55:37 q? 17:55:44 disagree 17:55:46 well I'd disagree with that 17:55:52 … (Repeats per John's request.) 17:56:00 replacing one GUID with another is not de-id'ed 17:56:12 q? 17:56:13 not comfortable with that 17:56:16 … Aleecia, why? 17:56:35 again, I'm not sure that organizational measure should be taken into account 17:56:41 aleecia: If you have something that starts out with a unique identifier and replace it with another unique identifier, you haven't moved toward privacy at all. 17:56:43 q+ 17:56:55 … You rotate hashes, so you're not promoting privacy at all. 17:57:02 … No obvious bright line there. 17:57:03 q- 17:57:15 … Would have to do some work to figure out if you've actually de-identified or not. 17:57:30 … This could be part of a useful solution, but I would not make the broad statement you made. 17:57:36 q? 17:57:51 … Using a pseudonym is by itself not enough. 17:57:54 q? 17:58:01 thanks! 17:58:06 when are we back? 17:58:07 when are we back? 17:58:11 thanks! when do we resume? 17:58:12 Thomas_Schauf has left #dntd 17:58:17 -Jeff 17:58:18 -Aleecia 17:58:20 any change from published agenda? 17:58:20 we start back up at 2 17:58:21 -yianni 17:58:22 thanks! 
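[Scribe note: Aleecia's objection above — swapping one unique identifier for another, even with rotation, is pseudonymization rather than de-identification — can be sketched as follows. The function and identifier names are hypothetical, invented for illustration.]

```python
import hashlib

def pseudonymize(user_id: str, salt: str) -> str:
    """Replace a raw identifier with a salted hash (a pseudonym)."""
    return hashlib.sha256((salt + user_id).encode()).hexdigest()

# Two events from the same user within one salt period map to the
# same pseudonym, so the records remain fully linkable:
p1 = pseudonymize("user-12345", salt="2013-02")
p2 = pseudonymize("user-12345", salt="2013-02")
assert p1 == p2

# Rotating the salt breaks links across periods, but within each
# period every record for the user still shares one stable unique
# identifier: the data is pseudonymous, not de-identified.
p3 = pseudonymize("user-12345", salt="2013-03")
assert p1 != p3
```

This is the "no obvious bright line" point: rotation is one ingredient of a de-identification process, but by itself it only changes which records are linkable, not whether they are.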
17:58:35 -johnsimpson 17:58:37 johnsimpson has left #dntd 17:58:38 -vincent 17:59:18 -BerinSzoka 17:59:19 Team_(dntd)16:00Z has ended 17:59:19 Attendees were yianni, Jeff, johnsimpson, Dan_Auerbach, BerinSzoka, Aleecia, vincent 17:59:35 vincent has left #dntd 18:01:22 BerinSzoka has left #dntd 18:18:59 fielding has joined #dntd 18:21:28 rrsagent, draft minutes 18:21:28 I have made the request to generate http://www.w3.org/2013/02/12-dntd-minutes.html fielding 18:21:47 rrsagent, make logs world-visible 18:26:00 Meeting: TPWG F2F Breakout Group D 18:27:09 rrsagent, draft minutes 18:27:09 I have made the request to generate http://www.w3.org/2013/02/12-dntd-minutes.html fielding 18:47:33 bryan has left #dntd 19:06:10 npdoty has joined #dntd 19:27:27 rrsagent, make logs public 19:27:32 rrsagent, please draft minutes 19:27:32 I have made the request to generate http://www.w3.org/2013/02/12-dntd-minutes.html npdoty 20:00:31 Zakim has left #dntd 20:16:18 npdoty has joined #dntd 20:32:19 npdoty has left #dntd