13:57:43 RRSAgent has joined #DNT
13:57:43 logging to http://www.w3.org/2013/01/17-DNT-irc
13:57:45 bryan has joined #dnt
13:58:09 Zakim, this will be 87225
13:58:09 ok, yianni; I see Team_(dnt)14:00Z scheduled to start in 2 minutes
13:58:14 dtauerbach has joined #dnt
13:58:32 peterswire has joined #dnt
13:58:38 aleecia has joined #dnt
13:58:51 JoeHallCDT has joined #DNT
13:59:18 zakim, code?
13:59:18 the conference code is 87225 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), aleecia
13:59:54 jeffwilson has joined #dnt
14:00:06 When I dial in, I do not see myself in the IRC as dialed in.
14:00:15 Rob, neither do I
14:00:17 justin has joined #dnt
14:00:17 Paul has joined #DNT
14:00:24 Possibly just slow?
14:00:48 But I'm guessing something is broken in the Zakim world
14:00:58 Wileys has joined #dnt
14:01:03 vincent has joined #dnt
14:01:10 W3C: fixing IRC bots and taking attendance since...
14:01:15 zakim appears to be a little sleepy
14:01:18 johnsimpson has joined #dnt
14:01:21 dwainberg has joined #dnt
14:01:32 BAU
14:01:38 hwest has joined #dnt
14:01:47 Getting ready to dial in.
14:02:04 Good morning
14:02:11 I planned to before I got sick
14:02:29 efelten_ has joined #dnt
14:02:30 Marc_ has joined #DNT
14:02:55 (someone is typing & needs to mute)
14:03:00 john
14:03:12 hi
14:03:12 testing IRC
14:03:27 Zakim, this is dnt
14:03:27 ok, yianni; that matches Team_(dnt)14:00Z
14:03:54 joe is scribe… someone remind me how to tell Zakim that and to start notes
14:03:54 + +1.215.796.aadd
14:03:55 scribe: JoeHallCDT
14:03:58 Zakim, who is on the call?
14:03:58 On the phone I see [GVoice], Jonathan_Mayer, +1.425.214.aaaa, Aleecia, +1.202.587.aabb, WileyS, ??P9, +1.631.803.aacc, rvaneijk, [CDT], +1.215.796.aadd
14:04:05 present+ Bryan_Sullivan
14:04:33 zakim, aaaa is bryan
14:04:33 +bryan; got it
14:04:34 Peter Swire: goal is to discuss to what extent de-identification (De-ID) can take data out of the scope of the standard
14:04:50 + +1.215.286.aaee
14:04:54 - +1.215.796.aadd
14:04:56 -??P9
14:04:59 … related: what sorts of uses are consistent with compliance with the spec
14:05:20 … if data are used for market research in ways that are entirely de-identified, that should be safe or out of scope
14:05:34 … on the other hand, if data are explicitly identified, the standard should apply
14:05:40 +??P9
14:05:42 … clearly defining uses is crucial
14:05:57 … getting clear on terms, words and such is an important part of this
14:06:02 zakim, ??P9 is vincent
14:06:02 +vincent; got it
14:06:46 … instead of having people talking past each other, we want a strong foundation of shared vocabulary
14:07:07 … delighted to have great people in the room and on the phone
14:07:12 q?
14:07:22 … agenda has been sent around
14:07:35 … ground rules for discussion
14:07:43 … this is not an official in-person meeting with 8 weeks' notice
14:07:49 Zakim, who is on the call?
14:07:49 On the phone I see [GVoice], Jonathan_Mayer, bryan, Aleecia, +1.202.587.aabb, WileyS, +1.631.803.aacc, rvaneijk, [CDT], +1.215.286.aaee, vincent
14:07:58 … have been told by W3C staff that this meeting can't make decisions toward normative language
14:08:31 … it would be good to agree on terms and definitions
14:08:50 … this should make people more comfortable with claims made in the world
14:08:50 +Peder_Magee
14:08:56 If you share that information externally...
14:08:57 … e.g., unsalted hashes
14:09:30 Could introductions include technical background? It would be helpful to understand who'll be participating from the technical side and who'll be observing from the law/policy perspective.
14:09:42 might want to q that jmayer
14:09:50 … first thing is incentives to de-identify
14:09:58 Do we need to re-introduce ourselves?
14:10:31 … Khaled El Emam will start us off with slides (jlh: not sure how phone participants will see them)
14:10:48 … then on to hashing, persistent IDs, putting people in "buckets"
14:10:52 please send slides to the list and/or post them on the wiki!
14:11:08 … Yianni will gather questions
14:11:23 + +1.202.257.aaff
14:11:53 … will go around the room; please let us know any technical experience
14:11:57 cannot hear
14:11:58 … Peter, law prof.
14:12:12 … Khaled works at U Toronto, CS background, working on health
14:12:23 + +1.646.722.aagg
14:12:31 Dan Auerbach from EFF, worked at Google before, doing data mining
14:12:33 Aturkel has joined #DNT
14:12:51 John Simpson, Consumer Watchdog
14:12:58 Ed Felten, Princeton U.
14:13:05 research and teaching for 18 years
14:13:17 Felix Wu, prof. at Cardozo, PhD in CS from Berkeley
14:13:21 mecallahan has joined #DNT
14:13:27 Peter invited Felix based on technical work
14:13:36 Paul Gliss, lawyer from Comcast, worked in the De-ID space
14:14:01 Chris Mejia, IAB, dir. of ad technology, tech dir. for DAA
14:14:10 Jeff Wilson, with AOL for 16 years
14:14:14 Marc Groman, NAI
14:14:26 David Wainberg, NAI, undergrad in CS, web developer for years
14:14:29 Heather West, Google
14:14:33 Justin Brookman, CDT
14:14:50 Bill Scanell (probably a lawyer in a suit?), here to assist with communications
14:15:14 Peder Magee from FTC
14:15:31 Shane Wiley, Yahoo!
14:15:42 Mary Ellen Callahan, Jenner and Block
14:15:54 Aleecia McDonald, PhD engineering
14:16:04 Bryan Sullivan, AT&T Director of Service Standards, WAP/Web browsing service architecture and mobile/web standards for AT&T since pre-2000
14:16:05 Adam Turkel, lawyer with AppNexus
14:16:16 Bryan (?), AT&T director of standards
14:16:36 Ho Chun Ho, Comcast, data arch.
14:16:59 AHanff has joined #dnt
14:17:04 Jonathan Mayer, PhD student in CS at Stanford, at Stanford Security Lab
14:17:43 is there a call on now?
14:18:09 Rob van Eijk, PhD student at Leiden University (very lengthy affiliation and background)
14:18:10 Yes, we're on a call now
14:18:24 Vincent Toubiana, Alcatel-Lucent, PhD CS
14:18:28 thanks, I didn't see it on the iCalendar
14:18:42 affiliation: Art. 29 Data Protection Working Party / Dutch DPA
14:18:44 Jules P., from the Future of Privacy Forum
14:19:26 scribe: yianni
14:19:31 Brooks has joined #dnt
14:19:32 +[IPcaller]
14:19:53 Peter: getting logistics worked out; let's brainstorm reasons, in the advertising and online space,
14:20:05 ...why people have incentives to de-identify
14:20:16 ...self-interest, business, or other reasons
14:20:21 +Brooks
14:20:31 pedermagee has joined #DNT
14:20:36 ...if we understand the reasons, we might be able to understand what will be done in practice
14:20:54 ...a privacy policy that says you do things in de-identified or anonymized ways
14:21:09 ...we do not use PII for certain operations, for example
14:21:22 ...risk of not following those promises
14:22:10 Marc: people do not de-identify to avoid liability; they do it to mitigate privacy and security risk, then make the promise
14:22:24 Paul: providing comfort to customers is a reason to de-identify
14:22:45 Peter: second, organizations face costs from data breaches, under state laws and in Europe
14:23:05 ...the expense of sending out notices and going through the data-breach steps; if the data are de-identified you do not have to disclose
14:23:06 Encrypted is different from de-identified
14:23:31 Jules: a big driver was the beginning of the NAI, the big ad networks and the crisis around them
14:23:40 In my experience, companies that say they only work with anonymous data mean it in the Latin sense -- literally without name. They do not mean that users are unidentifiable. I think we need to be very careful to keep these ideas separate.
14:24:03 +q
14:24:06 ...NAI treated PII and non-PII very differently; with a representation in your privacy policy about what you tracked, you could give notice via an opt-out notice
14:24:21 ...for PII, more notice was needed on the web page, perhaps an opt-in
14:24:50 ...seven large networks adopted it and forced other partners to follow
14:25:20 ...a huge driver for ad networks is that they make a specific representation about PII and non-PII
14:25:32 Peter: are there other legal regimes for de-identification?
14:25:37 Rob, could you briefly address EU law?
14:25:58 Paul: there is different regulatory treatment for cable, i.e., services provided by cable providers
14:26:10 ...it makes a distinction between personally identified and non-identified information
14:26:21 Peter - are you suggesting if data is not linked to PII then it is "de-identified"?
14:26:26 ...much like NAI, different rules for consent and approval
14:26:57 robsherman has joined #dnt
14:27:15 Marc: on data security incidents, beyond the financial costs, reputational risk is a very large piece of it as well
14:27:53 ...after a privacy incident, the costs are much higher than outside counsel and regulatory burdens; for many years people talk about "the X company incident"
14:27:57 Shane, I think the question is whether "is" includes "can be", i.e., data not linked vs non-linkable is by definition non-PII
14:28:16 Peter: besides NAI and the Cable Act, we also have HIPAA and GLBA
14:28:30 ...if you are outside the regime, you do not have the regulatory burden
14:28:49 Shane - I think it's abundantly clear that no PII is not the same as non-identifiable (see Paul Ohm's summary paper) but I understand you're asking for Peter's view, which I do not know.
14:28:57 Marc: under the Privacy Act, the privacy impact assessment depends on whether you have individually identifiable information
14:29:24 Peter: inside an organization there are incentives around access controls; more people can touch the data if it is not PII
14:29:29 Bryan, that's my question - is it an absolute position? I've always felt de-identified was "more" than simply not PII.
14:29:35 Aleecia - see above :-)
14:29:54 ...a database with financial information has many reasons for access-control limits
14:30:12 ...for other employees there is a risk of breach if you do not de-identify
14:30:40 Khaled: with opt-in or opt-out consent, there is evidence in the health care sector of consent bias
14:30:55 ...de-identification allows you to avoid consent bias
14:31:13 PII/Personal Data -> Pseudo/Anonymous -> De-Identified/Unlinkable -> No Value
14:31:30 any kind of analytics is very far stretched...
14:31:35 Khaled: beyond researchers, this applies to analytics (the data are biased because you are missing a certain percentage of the population)
14:31:57 Peter: having the full population is better for researchers; De-ID is a tool to get accurate analytics
14:32:09 ...Any other comments on reasons why people do de-identification?
14:32:32 Shane - I can imagine a dataset that removes PII and is also then not re-identifiable. But that's not a general rule. It's probably easier to talk about the type of data we're using. Removing PII is not going to render a server log file "safe," and indeed there might never be PII in the first place, yet still have identifiable data.
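The following sketch shows why a hashed identifier in a server log is not automatically de-identified. It ties together the opening "unsalted hashes" example and the server-log point above. The sketch makes some hypothetical assumptions: the log stores SHA-256 digests of IPv4 addresses, and an attacker can guess the /16 network. The function names and addresses are illustrative, not anything a participant described. An attacker can reverse an unsalted digest by hashing every candidate. A keyed HMAC resists that enumeration, but it still yields a persistent pseudonym that links all of a user's records. On the spectrum above, that is "Pseudo" rather than "Unlinkable".

```python
import hashlib
import hmac
import ipaddress
import secrets
from typing import Optional


def unsalted_hash(ip: str) -> str:
    """Hypothetical log pseudonymization: plain SHA-256 of the IP string, no salt or key."""
    return hashlib.sha256(ip.encode("ascii")).hexdigest()


def keyed_hash(ip: str, key: bytes) -> str:
    """Alternative: HMAC-SHA-256 under a secret key held only by the data holder."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def brute_force(digest: str, network: str) -> Optional[str]:
    """Enumeration attack: hash every host address in `network` (no key needed)
    and return the address whose unsalted digest matches, or None."""
    for candidate in ipaddress.ip_network(network).hosts():
        if unsalted_hash(str(candidate)) == digest:
            return str(candidate)
    return None


if __name__ == "__main__":
    ip = "10.0.42.17"  # illustrative address
    # At most 65,534 guesses recover the address behind an unsalted digest.
    print(brute_force(unsalted_hash(ip), "10.0.0.0/16"))  # 10.0.42.17

    # Without the key the attacker cannot compute matching HMAC digests,
    # so the same enumeration finds nothing...
    key = secrets.token_bytes(32)
    print(brute_force(keyed_hash(ip, key), "10.0.0.0/16"))  # None

    # ...but the keyed digest is stable, so one user's records stay linkable.
    print(keyed_hash(ip, key) == keyed_hash(ip, key))  # True
```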
14:32:43 ...reasons for people to do this, trying to understand the terminology
14:32:46 RichLaBarca has joined #DNT
14:33:00 ...Khaled has a book on de-identification coming out at the beginning of April
14:33:12 Are slides available now?
14:33:12 ...Khaled starting with part 2 and his slides
14:33:20 Shane, to be clear I was not stating a position, but a question. IMO identity includes a range of attributes only some of which are personal - remove/obscure the personal ones and you're home - science will always find new ways to relink and attribute data to persons, and we should not be trying to chase that rabbit
14:33:24 Slides have not come through on email yet!!!
14:33:40 yes,
14:33:41 I sent ten minutes ago, will resend.
14:33:42 difficult
14:33:48 thank you Shane
14:33:52 Also, lots of paper shuffling etc.
14:33:55 Khaled: walking through the process of de-identification
14:34:34 um.
14:34:39 sounds off now
14:34:58 Khaled: will walk through the de-identification process we have been using; the context will be health care
14:35:23 ...agree on terminology and a general approach to terminology
14:35:35 ...the basic process they use has five steps
14:35:40 Bryan, I'm mostly with you there. The key element is what is defined as "personal"...
14:35:48 ...assume we have a health data set and want to release it for a secondary purpose
14:35:55 ...first step: understand plausible attacks
14:36:04 Where are these five steps sourced from?
14:36:07 vinay has joined #dnt
14:36:07 ...second, understand the variables that can be used
14:36:08 + +1.917.934.aahh
14:36:13 zakim, aahh is vinay
14:36:13 +vinay; got it
14:36:19 ...then measure the risks and apply de-identification
14:36:31 ...assume either a public release or a release to a known data recipient
14:36:39 Put your email in chat if you want the slides.
14:36:43 In the absence of the slides, can someone copy/paste the slide content into IRC?
14:36:50 wileys@yahoo-inc.com
14:36:51 aleecia@aleecia.com
14:36:53 ...very different analyses: with a public release there are no controls; with a known recipient you can have controls and contracts
14:37:04 vigoel@adobe.com
14:37:07 a.hanff@think-privacy.com
14:37:17 ...for a known data recipient, there are three attacks
14:37:19 vincent.toubiana@alcatel-lucent.com
14:37:25 Chris: what type of attack?
14:37:28 are we allowed to comment?
14:37:29 ed@felten.com
14:37:34 rich@addthis.com please
14:37:43 Khaled: re-identification attack
14:37:48 Slides answered, thanks.
14:37:55 got the slides, thanks
14:38:05 so can we ask questions?
14:38:07 q+
14:38:08 q?
14:38:10 q?
14:38:17 If you have questions, please queue yourself; I'll monitor the queue
14:38:21 ack marc_
14:38:24 ack robsherman
14:38:25 Thank you Heather!
14:38:27 q+
14:38:49 (Reminder: to put yourself in the queue, just type q+)
14:38:57 Rob: what about information that is not being disclosed, i.e., storing information in de-identified form without planning to disclose it?
14:39:16 ack AHanff
14:39:22 +q
14:39:23 typing
14:39:30 I am typing lol
14:39:31 Khaled: you go through the same steps whether you release to a data recipient or use the data internally
14:39:35 AHanff, are you just on IRC?
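Khaled's steps include "understand the variables" and "measure risks, apply de-identification". One way to make these concrete is to treat age and ZIP code as quasi-identifiers. You then measure risk from the sizes of equivalence classes, which are groups of records that share the same quasi-identifier values. Finally, you apply generalization, i.e. "putting people in buckets". This is a k-anonymity-style sketch under stated assumptions: the records, the bucket widths and the helper names are hypothetical. The actual metrics on Khaled's slides are not reproduced in this log.

```python
from collections import Counter
from typing import Hashable, List, Tuple

# Hypothetical toy health records: (age, ZIP code, diagnosis).
# Age and ZIP are the quasi-identifiers an attacker might know; diagnosis is sensitive.
RECORDS = [
    (34, "02139", "flu"), (36, "02139", "asthma"), (35, "02141", "flu"),
    (52, "02138", "diabetes"), (57, "02138", "flu"), (55, "02139", "asthma"),
]


def generalize(age: int, zip_code: str) -> Tuple[str, str]:
    """Bucket age into 10-year bands and keep only the first three ZIP digits."""
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", zip_code[:3] + "**")


def reid_risk(quasi_ids: List[Hashable]) -> Tuple[float, float]:
    """Return (maximum, average) re-identification risk.

    A record in an equivalence class of size f has risk 1/f; averaging 1/f
    over all records equals (number of classes) / (number of records)."""
    class_sizes = Counter(quasi_ids)
    max_risk = 1 / min(class_sizes.values())
    avg_risk = len(class_sizes) / len(quasi_ids)
    return max_risk, avg_risk


if __name__ == "__main__":
    raw = [(age, z) for age, z, _ in RECORDS]
    bucketed = [generalize(age, z) for age, z, _ in RECORDS]
    print(reid_risk(raw))       # (1.0, 1.0): every record is unique on (age, ZIP)
    print(reid_risk(bucketed))  # both about 0.33: each record hides among 3
```

Coarser buckets lower the measured risk but also reduce the analytic value of the data. That is the trade-off behind the "No Value" end of the spectrum mentioned earlier.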
14:39:44 Go ahead and type your question and I'll convey
14:39:45 q+
14:39:46 no, I am on the phone too but not on a headset
14:40:06 q+
14:40:09 ack wileys
14:40:13 Shane: if you are not mandated to de-identify from a HIPAA perspective and do it just from a risk-management perspective, you would go through the same process
14:40:17 Slides went to list finally, available here: http://lists.w3.org/Archives/Public/public-tracking/2013Jan/0062.html
14:40:28 Thank you Justin
14:40:29 q?
14:40:36 Khaled: e.g., a contract that allows a vendor to continue using the data requires keeping it in de-identified form
14:40:58 AHanff, go ahead and type question
14:41:05 Peter: HIPAA puts limits on data uses even internally
14:41:05 I would just like Khaled to acknowledge that a known recipient doesn't guarantee confidentiality even with contractual obligations. For example, I read recently that something like 90% of US medical authorities had data leaks in 2012; presumably contracts were in place...
14:41:24 Dan: clarifying: is de-identification a property of data?
14:41:30 ...and not a process?
14:41:49 Khaled: in practice you manage the risk of re-identification; de-identification is one tool in the toolbox
14:41:50 AHanff, feel free to share running comments as the presentation proceeds - they go in the record as well
14:41:56 thanks
14:42:20 q+
14:42:25 ack hwest
14:42:28 Khaled: deliberate re-identification by the data recipient: if a company signs a contract, as a corporation that company will not try to re-identify
14:42:28 ack David_MacMillan
14:42:36 ack dtauerbach
14:42:44 q+
14:42:50 ...there may be rogue employees, but the probability of the company re-identifying would be acceptably low
14:43:02 the evidence would suggest otherwise with so many data leaks surely?
14:43:05 ...contracts are a good risk-mitigation measure for the first attack
14:43:09 I am aware of the queue; will be calling on people shortly
14:43:23 @AHanff, if you have a citation on the 90% figure, would you be so kind as to add that to the wiki?
14:43:27 ...a rogue employee re-identifying an ex-spouse, for example, depends on internal company controls
14:43:37 I will try and find it, yes
14:43:48 ...first attack: as a company, would you do it; do you have controls for rogue employees?
14:43:52 Thanks, that's higher than I'd heard
14:44:05 Peter: this is a risk management approach
14:44:39 Khaled: the most recent HHS guidance takes a risk-management approach; the UK Commissioner also talks about risk management and context
14:44:51 q?
14:44:54 ...regulators are approaching it as a risk-management exercise
14:44:57 ack dwainberg
14:45:20 David: De-ID is not a binary state; it is rather a description of lower risk (in Khaled's terms, probability)
14:45:48 Khaled: de-identification has been practiced for the last 20 years; CDC and CMS set thresholds along a continuum
14:45:55 ...that is context-dependent
14:46:13 aleecia, it was a Ponemon study, there is an article here on it (will add to wiki) http://www2.idexpertscorp.com/press/report-94-of-us-hospitals-suffered-data-breaches-and-45-had-quintuplets/
14:46:13 David: is it helpful to talk about de-identification as a process and something else as an end goal?
14:46:30 Dan: still fair to say de-identification is a property of data
14:46:37 + +1.646.654.aaii
14:46:47 David: the functional definition of de-identification is a function of the context; there could be 20 different forms
14:47:01 schunter has joined #dnt
14:47:08 Khaled: there can be multiple de-identified versions of the same database, public versus trusted party
14:47:39 Peter: binary, de-identified or not? Under HHS, data count as de-identified if the overall risk is low.
14:48:15 Khaled: once you have a spectrum and set a cut-off in the middle, you turn it into a binary decision
14:48:29 Peter: de-identified is a conclusion term under some regime, for some set of facts
14:48:30 but the thresholds are not static, they move constantly depending on the amount of data aggregated about an individual
14:48:47 ...yes it is de-identified or no it is not; along the way there is a risk-management regime
14:49:05 q?
14:49:05 ...de-identified is a conclusion term for a regime; we do not have that standard right now in DNT
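The "conclusion term" point can be sketched as arithmetic: a continuous overall risk, cut at a threshold, becomes a yes/no answer. The sketch assumes a common risk-management framing in which overall risk = Pr(attempt) × Pr(re-identification | attempt). Under that framing, contracts and internal controls against the deliberate and rogue-employee attacks Khaled described lower Pr(attempt) for a known recipient. A public release is treated as certain to be attacked. Folding all attacks into one Pr(attempt) is a simplification. The numbers and the 0.1 threshold are invented for illustration; they are not HHS, CDC or CMS values.

```python
from dataclasses import dataclass


@dataclass
class Release:
    """One hypothetical release of a data set to one kind of recipient."""
    context: str
    p_attempt: float   # assumed Pr(a re-identification attempt), lowered by contracts/controls
    data_risk: float   # Pr(re-identification | attempt), e.g. a measured equivalence-class risk
    threshold: float   # illustrative cut-off chosen for this context

    def overall_risk(self) -> float:
        return self.p_attempt * self.data_risk

    def is_deidentified(self) -> bool:
        """The binary conclusion: risk at or below the cut-off counts as de-identified."""
        return self.overall_risk() <= self.threshold


if __name__ == "__main__":
    releases = [
        # Public release: no controls, so an attempt is assumed to be certain.
        Release("public", p_attempt=1.0, data_risk=0.2, threshold=0.1),
        # Known recipient under contract with internal controls (invented estimate).
        Release("known recipient under contract", p_attempt=0.3, data_risk=0.2, threshold=0.1),
    ]
    for r in releases:
        print(f"{r.context}: overall risk {r.overall_risk():.2f}, "
              f"de-identified: {r.is_deidentified()}")
    # public: 0.20, False; known recipient under contract: 0.06, True
```

Both releases use the same data risk, yet the public version fails and the contracted version passes. This mirrors Khaled's remark that one database can have different de-identified versions for a public release and a trusted party.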