IRC log of DNT on 2013-01-17
Timestamps are in UTC.
- 13:57:43 [RRSAgent]
- RRSAgent has joined #DNT
- 13:57:43 [RRSAgent]
- logging to http://www.w3.org/2013/01/17-DNT-irc
- 13:57:45 [bryan]
- bryan has joined #dnt
- 13:58:09 [yianni]
- Zakim, this will be 87225
- 13:58:09 [Zakim]
- ok, yianni; I see Team_(dnt)14:00Z scheduled to start in 2 minutes
- 13:58:14 [dtauerbach]
- dtauerbach has joined #dnt
- 13:58:32 [peterswire]
- peterswire has joined #dnt
- 13:58:38 [aleecia]
- aleecia has joined #dnt
- 13:58:51 [JoeHallCDT]
- JoeHallCDT has joined #DNT
- 13:59:18 [aleecia]
- zakim, code?
- 13:59:18 [Zakim]
- the conference code is 87225 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), aleecia
- 13:59:54 [jeffwilson]
- jeffwilson has joined #dnt
- 14:00:06 [rvaneijk]
- When I dial in, I do not see myself in the IRC as dialed in..
- 14:00:15 [aleecia]
- Rob, neither do I
- 14:00:17 [justin]
- justin has joined #dnt
- 14:00:17 [Paul]
- Paul has joined #DNT
- 14:00:24 [aleecia]
- Possibly just slow?
- 14:00:48 [aleecia]
- But I'm guessing something is broken in the Zakim world
- 14:00:58 [Wileys]
- Wileys has joined #dnt
- 14:01:03 [vincent]
- vincent has joined #dnt
- 14:01:10 [jmayer]
- W3C: fixing IRC bots and taking attendance since...
- 14:01:15 [bryan]
- zakim appears to be a little sleepy
- 14:01:18 [johnsimpson]
- johnsimpson has joined #dnt
- 14:01:21 [dwainberg]
- dwainberg has joined #dnt
- 14:01:24 [aleecia]
- <groan>
- 14:01:32 [bryan]
- BAU
- 14:01:38 [hwest]
- hwest has joined #dnt
- 14:01:47 [justin]
- Getting ready to dial in.
- 14:02:00 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:02:04 [johnsimpson_]
- Good morning
- 14:02:11 [aleecia]
- I planned to before I got sick
- 14:02:28 [peterswire]
- peterswire has joined #dnt
- 14:02:29 [efelten_]
- efelten_ has joined #dnt
- 14:02:30 [Marc_]
- Marc_ has joined #DNT
- 14:02:54 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:02:55 [aleecia]
- (someone is typing & needs to mute)
- 14:02:57 [peterswire]
- peterswire has joined #dnt
- 14:03:00 [johnsimpson]
- john
- 14:03:12 [aleecia]
- hi
- 14:03:12 [johnsimpson]
- testing IRC
- 14:03:27 [yianni]
- Zakim, this is dnt
- 14:03:27 [Zakim]
- ok, yianni; that matches Team_(dnt)14:00Z
- 14:03:34 [efelten_]
- efelten_ has joined #dnt
- 14:03:54 [JoeHallCDT]
- joe is scribe… someone remind me how to tell Zakim that and to start notes
- 14:03:54 [Zakim]
- + +1.215.796.aadd
- 14:03:55 [yianni]
- scribe: JoeHallCDT
- 14:03:58 [Wileys]
- Zakim, who is on the call?
- 14:03:58 [Zakim]
- On the phone I see [GVoice], Jonathan_Mayer, +1.425.214.aaaa, Aleecia, +1.202.587.aabb, WileyS, ??P9, +1.631.803.aacc, rvaneijk, [CDT], +1.215.796.aadd
- 14:04:04 [efelten_]
- efelten_ has joined #dnt
- 14:04:05 [johnsimpson]
- johnsimpson has joined #dnt
- 14:04:05 [bryan]
- present+ Bryan_Sullivan
- 14:04:33 [bryan]
- zakim, aaaa is bryan
- 14:04:33 [Zakim]
- +bryan; got it
- 14:04:34 [JoeHallCDT]
- Peter Swire: goal is to discuss to what extent De-ID can remove data from scope of the standard
- 14:04:41 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:04:50 [Zakim]
- + +1.215.286.aaee
- 14:04:54 [Zakim]
- - +1.215.796.aadd
- 14:04:56 [Zakim]
- -??P9
- 14:04:59 [JoeHallCDT]
- … related: what sort of uses are consistent with compliance with the spec
- 14:05:05 [efelten]
- efelten has joined #dnt
- 14:05:20 [JoeHallCDT]
- … if things are used for market research in ways that are entirely de-ID, that should be safe or out of scope
- 14:05:34 [JoeHallCDT]
- … on the other hand, if explicitly ID'd, standard should apply
- 14:05:40 [Zakim]
- +??P9
- 14:05:42 [JoeHallCDT]
- … clearly defining uses is crucial
- 14:05:44 [peterswire_]
- peterswire_ has joined #dnt
- 14:05:57 [JoeHallCDT]
- … getting clear on terms, words and such is an important part of this
- 14:06:02 [vincent]
- zakim, ??P9 is vincent
- 14:06:02 [Zakim]
- +vincent; got it
- 14:06:07 [peterswire]
- peterswire has joined #dnt
- 14:06:07 [johnsimpson]
- johnsimpson has joined #dnt
- 14:06:32 [efelten_]
- efelten_ has joined #dnt
- 14:06:38 [johnsimpson]
- johnsimpson has joined #dnt
- 14:06:46 [JoeHallCDT]
- … instead of having people talking past each other, we want a strong foundation of shared vocabulary
- 14:07:07 [JoeHallCDT]
- … delighted to have great people in the room and on the phone
- 14:07:12 [justin]
- q?
- 14:07:19 [johnsimpson]
- johnsimpson has joined #dnt
- 14:07:22 [JoeHallCDT]
- … agenda has been sent around
- 14:07:35 [JoeHallCDT]
- … ground rules for discussion
- 14:07:43 [JoeHallCDT]
- … this is not an official in-person meeting with 8 weeks notice
- 14:07:49 [yianni]
- Zakim, who is on the call?
- 14:07:49 [Zakim]
- On the phone I see [GVoice], Jonathan_Mayer, bryan, Aleecia, +1.202.587.aabb, WileyS, +1.631.803.aacc, rvaneijk, [CDT], +1.215.286.aaee, vincent
- 14:07:58 [JoeHallCDT]
- … have been told by w3c staff that this can't make decisions towards normative language
- 14:08:30 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:08:31 [JoeHallCDT]
- … it would be good to agree on terms and definitions
- 14:08:50 [JoeHallCDT]
- … this should make people more comfortable with claims made in the world
- 14:08:50 [Zakim]
- +Peder_Magee
- 14:08:56 [Wileys]
- If you share that information externally...
- 14:08:57 [JoeHallCDT]
- … e.g., unsalted hashes
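Peter's example of unsalted hashes can be made concrete with a short Python sketch. It assumes email addresses as the identifier and uses made-up example addresses. An unsalted hash can be reversed by anyone holding a list of candidate identifiers, simply by hashing each candidate; a keyed hash (HMAC) with a secret key cannot be recomputed by an outsider. Note that the keyed hash is still a persistent pseudonym, so records with the same identifier remain linkable.

```python
import hashlib
import hmac
import secrets


def unsalted_hash(identifier: str) -> str:
    # Plain SHA-256 of the identifier: deterministic and keyless,
    # so anyone can recompute it for a guessed identifier.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_hash(identifier: str, key: bytes) -> str:
    # HMAC-SHA-256: without the secret key an outsider cannot recompute it.
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target_hash, candidates, hash_fn):
    # Hash every candidate identifier and return the one that matches, if any.
    for candidate in candidates:
        if hash_fn(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    candidates = ["alice@example.com", "bob@example.com", "carol@example.com"]

    released = unsalted_hash("bob@example.com")
    print(dictionary_attack(released, candidates, unsalted_hash))  # bob@example.com

    key = secrets.token_bytes(32)  # kept secret by the data holder
    released_keyed = keyed_hash("bob@example.com", key)
    # Without `key`, the attacker can only try the unkeyed hash, which fails.
    print(dictionary_attack(released_keyed, candidates, unsalted_hash))  # None
```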
- 14:09:18 [peterswire_]
- peterswire_ has joined #dnt
- 14:09:26 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:09:30 [jmayer]
- Could introductions include technical background? It would be helpful to understand who'll be participating from the technical side and who'll be observing from the law/policy perspective.
- 14:09:42 [JoeHallCDT]
- might want to q that jmayer
- 14:09:50 [JoeHallCDT]
- … first thing is incentives to de-ID
- 14:09:58 [aleecia]
- Do we need to re-introduce ourselves?
- 14:10:06 [johnsimpson]
- johnsimpson has joined #dnt
- 14:10:31 [JoeHallCDT]
- … Khaled El Emam will start us off with slides (jlh: not sure how phone peeps will see them)
- 14:10:34 [johnsimpson]
- johnsimpson has joined #dnt
- 14:10:48 [JoeHallCDT]
- … then to hashing, persistent ids, putting people in "buckets"
- 14:10:52 [rvaneijk]
- please send slides to the list and/or post them on the wiki !
- 14:11:08 [JoeHallCDT]
- … Yianni will gather qs
- 14:11:23 [Zakim]
- + +1.202.257.aaff
- 14:11:30 [johnsimpson]
- johnsimpson has joined #dnt
- 14:11:31 [dwainber_]
- dwainber_ has joined #dnt
- 14:11:49 [efelten_]
- efelten_ has joined #dnt
- 14:11:53 [JoeHallCDT]
- … will go around the room, please let us know any technical experience
- 14:11:57 [aleecia]
- cannot hear
- 14:11:58 [JoeHallCDT]
- … Peter, law prof.
- 14:12:12 [JoeHallCDT]
- … Khaled works at U Toronto, CS background, working on health
- 14:12:22 [efelten_]
- efelten_ has joined #dnt
- 14:12:23 [Zakim]
- + +1.646.722.aagg
- 14:12:28 [johnsimpson]
- johnsimpson has joined #dnt
- 14:12:31 [JoeHallCDT]
- Dan Auerbach from EFF, previously worked at Google doing data mining
- 14:12:33 [Aturkel]
- Aturkel has joined #DNT
- 14:12:51 [JoeHallCDT]
- John Simpson, Consumer watchdog
- 14:12:55 [peterswire]
- peterswire has joined #dnt
- 14:12:58 [JoeHallCDT]
- Ed Felten, Princeton U.
- 14:13:00 [johnsimpson]
- johnsimpson has joined #dnt
- 14:13:05 [JoeHallCDT]
- research and teaching for 18 years
- 14:13:17 [JoeHallCDT]
- Felix Wu, prof. at Cardozo, PhD in CS from Berkeley
- 14:13:21 [mecallahan]
- mecallahan has joined #DNT
- 14:13:27 [JoeHallCDT]
- Peter invited Felix based on technical work
- 14:13:36 [JoeHallCDT]
- Paul Gliss, lawyer from Comcast, worked in De-ID space
- 14:13:46 [efelten_]
- efelten_ has joined #dnt
- 14:14:01 [JoeHallCDT]
- Chris Mejia, IAB, dir. of ad technology, tech dir. for DAA
- 14:14:04 [johnsimpson]
- johnsimpson has joined #dnt
- 14:14:10 [JoeHallCDT]
- Jeff Wilson, with AOL for 16 years
- 14:14:14 [JoeHallCDT]
- Marc Groman, NAI
- 14:14:26 [JoeHallCDT]
- David Wainberg, NAI, undergrad. in CS, web dev. for years
- 14:14:29 [JoeHallCDT]
- Heather West, Google
- 14:14:33 [JoeHallCDT]
- Justin Brookman, CDT
- 14:14:50 [JoeHallCDT]
- Bill Scanell, (probably a lawyer in a suit?) here to assist with communications
- 14:15:04 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:15:14 [JoeHallCDT]
- Peder Magee from FTC
- 14:15:31 [JoeHallCDT]
- Shane Wiley, Yahoo!!
- 14:15:32 [johnsimpson]
- johnsimpson has joined #dnt
- 14:15:42 [JoeHallCDT]
- Mary Ellen Callahan, Jenner and Block
- 14:15:54 [JoeHallCDT]
- Aleecia McDonald, PhD engineering
- 14:16:04 [bryan]
- Bryan Sullivan, AT&T Director of Service Standards, WAP/Web browsing service architecture and mobile/web standards for AT&T since pre-2000
- 14:16:05 [JoeHallCDT]
- Adam Turkel, lawyer with AppNexus
- 14:16:16 [dwainberg]
- dwainberg has joined #dnt
- 14:16:16 [JoeHallCDT]
- Bryan Sullivan, AT&T director of standards
- 14:16:27 [johnsimpson]
- johnsimpson has joined #dnt
- 14:16:27 [peterswire]
- peterswire has joined #dnt
- 14:16:30 [dtauerbach]
- dtauerbach has joined #dnt
- 14:16:36 [JoeHallCDT]
- Ho Chun Ho, Comcast, data arch.
- 14:16:56 [peterswire_]
- peterswire_ has joined #dnt
- 14:16:59 [AHanff]
- AHanff has joined #dnt
- 14:17:04 [JoeHallCDT]
- Jonathan Mayer, PhD student in CS at Stanford, at Stanford Security Lab
- 14:17:07 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:17:40 [efelten__]
- efelten__ has joined #dnt
- 14:17:43 [AHanff]
- is there a call on now?
- 14:18:09 [JoeHallCDT]
- Rob van Eijk, PhD student at x, (very lengthy affiliation and background)
- 14:18:10 [aleecia]
- Yes, we're on a call now
- 14:18:24 [JoeHallCDT]
- Vincent Toubiana, Alcatel Lucent, PhD CS
- 14:18:25 [rvaneijk]
- s/x/Leiden University/
- 14:18:28 [AHanff]
- thanks, I didn't see it on the iCalendar
- 14:18:41 [efelten_]
- efelten_ has joined #dnt
- 14:18:42 [rvaneijk]
- aff: Art. 29 Data Protection Working Party / Dutch DPA
- 14:18:44 [JoeHallCDT]
- Jules P, from Future of Privacy Forum
- 14:19:26 [yianni]
- scribe: yianni
- 14:19:31 [Brooks]
- Brooks has joined #dnt
- 14:19:32 [Zakim]
- +[IPcaller]
- 14:19:38 [peterswire]
- peterswire has joined #dnt
- 14:19:53 [johnsimpson]
- johnsimpson has joined #dnt
- 14:19:53 [yianni]
- Peter: Getting logistics worked out, brainstorm reasons in advertising and online space
- 14:20:01 [peterswire_]
- peterswire_ has joined #dnt
- 14:20:05 [yianni]
- ...why people have incentives to de-identify
- 14:20:16 [yianni]
- ...self interest, business, or other reasons
- 14:20:21 [Zakim]
- +Brooks
- 14:20:31 [pedermagee]
- pedermagee has joined #DNT
- 14:20:36 [yianni]
- ...if we understand reasons, we might be able to understand what things will be done in practice
- 14:20:51 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:20:54 [yianni]
- ...privacy policy that says you do things in de-identified or anonymized ways
- 14:21:09 [yianni]
- ...we do not use PII for certain operations, for example
- 14:21:13 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:21:22 [yianni]
- ...risk for not following promises
- 14:22:10 [yianni]
- Marc: people do not de-identify to avoid liability, they do it to mitigate privacy and security risk, then make the promise
- 14:22:12 [johnsimpson]
- johnsimpson has joined #dnt
- 14:22:12 [efelten__]
- efelten__ has joined #dnt
- 14:22:24 [yianni]
- Paul: providing comfort to customers is a reason to de-identify
- 14:22:34 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:22:45 [yianni]
- Peter: second, organizations face costs from data breaches under state and European law
- 14:22:47 [efelten_]
- efelten_ has joined #dnt
- 14:23:05 [yianni]
- ...expense of sending out notices and going through the data breach steps; if the data is de-identified you do not have to disclose
- 14:23:06 [Wileys]
- Encrypted is different than de-identified
- 14:23:09 [peterswire]
- peterswire has joined #dnt
- 14:23:16 [johnsimpson]
- johnsimpson has joined #dnt
- 14:23:31 [yianni]
- Jules: a big driver was the beginning of the NAI, the big ad networks and the crisis around them
- 14:23:38 [peterswire]
- peterswire has joined #dnt
- 14:23:40 [aleecia]
- In my experience, companies that say they only work with anonymous data mean it in the Latin sense -- literally without name. They do not mean that users are unidentifiable. I think we need to be very careful to keep these ideas separate.
- 14:24:03 [Marc_]
- +q
- 14:24:06 [yianni]
- ...NAI treated PII and non-PII very differently; if your privacy policy represented what you tracked, you could give notice in an opt-out notice
- 14:24:14 [efelten__]
- efelten__ has joined #dnt
- 14:24:21 [yianni]
- ...for PII, you needed more notice on the web page, perhaps an opt-in
- 14:24:50 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:24:50 [yianni]
- ... 7 large networks adopted, and forced other partners to follow
- 14:25:20 [yianni]
- ...huge driver for ad networks that they make a specific representation about PII and non-PII
- 14:25:32 [yianni]
- Peter: are there other legal regimes for de-id?
- 14:25:33 [efelten_]
- efelten_ has joined #dnt
- 14:25:37 [jmayer]
- Rob, could you briefly address EU law?
- 14:25:55 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:25:58 [yianni]
- Paul: there is different regulatory treatment for cable, i.e. services provided by cable providers
- 14:26:10 [yianni]
- ...makes distinction between personally identified and not identified
- 14:26:21 [Wileys]
- Peter - are you suggesting if data is not linked to PII then it is "de-identified"?
- 14:26:23 [peterswire]
- peterswire has joined #dnt
- 14:26:26 [yianni]
- ...much like NAI, different rules for consent and approval
- 14:26:47 [peterswire_]
- peterswire_ has joined #dnt
- 14:26:52 [efelten_]
- efelten_ has joined #dnt
- 14:26:56 [johnsimpson]
- johnsimpson has joined #dnt
- 14:26:57 [robsherman]
- robsherman has joined #dnt
- 14:27:15 [yianni]
- Marc: data security issues, beyond financial issues, reputational risk is a very large piece of it as well
- 14:27:53 [yianni]
- ...in a privacy incident, the costs are much higher than outside counsel and regulatory burdens; for many years people talk about the x company incident
- 14:27:57 [bryan]
- Shane, I think the question is whether "is" includes "can be", i.e. data not linked vs non-linkable is by definition non-PII
- 14:28:16 [yianni]
- Peter: NAI, Cable Act, also have HIPAA, GLBA
- 14:28:30 [yianni]
- ...if you are outside regime, you do not have regulatory burden
- 14:28:49 [robsherman1]
- robsherman1 has joined #dnt
- 14:28:49 [aleecia]
- Shane - I think it's abundantly clear that no PII is not the same as non-identifiable (see Paul Ohm's summary paper) but I understand you're asking for Peter's view, which I do not know.
- 14:28:57 [yianni]
- Marc: under the Privacy Act, whether a privacy impact assessment is needed depends on whether you have individually identifiable information
- 14:29:24 [yianni]
- Peter: inside an organization, there are incentives around access controls; more people can touch the data if it is not PII
- 14:29:29 [Wileys]
- Bryan, that's my question - is it an absolute position? I've always felt de-identified was "more" than simply not PII.
- 14:29:35 [efelten__]
- efelten__ has joined #dnt
- 14:29:35 [Wileys]
- Aleecia - see above :-)
- 14:29:54 [yianni]
- ...data base with financial information, many reasons for access control limits
- 14:30:00 [peterswire]
- peterswire has joined #dnt
- 14:30:12 [yianni]
- ...for other employees there is a risk of breach if you do not De-identify
- 14:30:14 [efelten_]
- efelten_ has joined #dnt
- 14:30:32 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:30:39 [efelten_]
- efelten_ has joined #dnt
- 14:30:40 [yianni]
- Khaled: opt-in consent or opt-out, evidence in health care sector for consent bias
- 14:30:55 [yianni]
- ...de-identification allows you to avoid consent bias
- 14:31:03 [johnsimpson]
- johnsimpson has joined #dnt
- 14:31:06 [efelten_]
- efelten_ has joined #dnt
- 14:31:13 [Wileys]
- PII/Personal Data -> Pseudo/Anonymous -> De-Identified/Unlinkable -> No Value
- 14:31:30 [rvaneijk]
- any kind of analytics is very far stretched...
- 14:31:32 [johnsimpson]
- johnsimpson has joined #dnt
- 14:31:35 [yianni]
- Khaled: beyond researchers, this goes to analytics too (the data is biased because you are missing a certain percentage of the population)
- 14:31:57 [yianni]
- Peter: having full population better for the researchers, De-ID is a tool to get accurate analytics
- 14:31:58 [johnsimpson]
- johnsimpson has joined #dnt
- 14:32:09 [yianni]
- ...Any other comments on reasons why people do de-identification?
- 14:32:32 [aleecia]
- Shane - I can imagine a dataset that removes PII and is also then not re-identifiable. But that's not a general rule. It's probably easier to talk about the type of data we're using. Removing PII is not going to render a server log file "safe," and indeed there might never be PII in the first place, yet still have identifiable data.
- 14:32:43 [yianni]
- ...reasons for people to do this, trying to understand the terminology
- 14:32:46 [RichLaBarca]
- RichLaBarca has joined #DNT
- 14:32:53 [johnsimpson]
- johnsimpson has joined #dnt
- 14:33:00 [yianni]
- ...Khaled has a book on de-id coming out the beginning of April
- 14:33:12 [aleecia]
- Are slides available now?
- 14:33:12 [efelten_]
- efelten_ has joined #dnt
- 14:33:12 [yianni]
- ...Khaled starting with part 2 and his slides
- 14:33:20 [bryan]
- Shane, to be clear I was not stating a position, but a question. IMO identity includes a range of attributes only some of which are personal - remove/obscure the personal ones and you're home - science will always find new ways to relink and attribute data to persons, and we should not be trying to chase that rabbit
- 14:33:21 [peterswire_]
- peterswire_ has joined #dnt
- 14:33:24 [Wileys]
- Slides have not come through on email yet!!!
- 14:33:30 [johnsimpson]
- johnsimpson has joined #dnt
- 14:33:40 [rvaneijk]
- yes,
- 14:33:41 [justin]
- I sent ten minutes ago, will resend.
- 14:33:42 [AHanff]
- difficult
- 14:33:48 [aleecia]
- thank you Shane
- 14:33:52 [peterswire]
- peterswire has joined #dnt
- 14:33:52 [jmayer]
- Also, lots of paper shuffling etc.
- 14:33:55 [yianni]
- Khaled: walking through process of de-identification
- 14:34:14 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:34:34 [aleecia]
- um.
- 14:34:39 [johnsimpson]
- johnsimpson has joined #dnt
- 14:34:39 [rvaneijk]
- sounds off now
- 14:34:42 [efelten_]
- efelten_ has joined #dnt
- 14:34:58 [yianni]
- Khaled: walk through de-identification we have been using, context will be healthcare
- 14:35:10 [johnsimpson]
- johnsimpson has joined #dnt
- 14:35:23 [yianni]
- ...agree on terminology and general approach to terminology
- 14:35:35 [yianni]
- ...the basic process they use has five steps
- 14:35:40 [Wileys]
- Bryan, I'm mostly with you there. The key element is what is defined as "personal"...
- 14:35:48 [yianni]
- ...assume we have health data set and want to release for secondary purpose
- 14:35:52 [robsherman]
- robsherman has joined #dnt
- 14:35:55 [yianni]
- ...first step understand plausible attacks
- 14:36:00 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:36:03 [efelten_]
- efelten_ has joined #dnt
- 14:36:04 [jmayer]
- Where are these five steps sourced from?
- 14:36:07 [vinay]
- vinay has joined #dnt
- 14:36:07 [yianni]
- ...second, understand the variables that can be used
- 14:36:08 [Zakim]
- + +1.917.934.aahh
- 14:36:13 [vinay]
- zakim, aahh is vinay
- 14:36:13 [Zakim]
- +vinay; got it
- 14:36:19 [yianni]
- ...measure risks, apply de-identification
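As a rough illustration of the "measure risks" and "apply de-identification" steps, here is a Python sketch using one common metric: records are grouped into equivalence classes by their quasi-identifier values, and a record's risk is taken as 1 divided by the size of its class. The metric, the quasi-identifiers (age, ZIP, sex), the generalization rules and the records are all assumptions for illustration, not the specific method on Khaled's slides.

```python
from collections import Counter

# Hypothetical quasi-identifiers: variables an attacker could plausibly know.
QUASI_IDENTIFIERS = ("age", "zip", "sex")


def class_key(record, qis=QUASI_IDENTIFIERS):
    """The record's combination of quasi-identifier values."""
    return tuple(record[q] for q in qis)


def reidentification_risk(records, qis=QUASI_IDENTIFIERS):
    """Return (maximum, average) per-record risk, where a record's risk is
    1 / (number of records sharing its quasi-identifier values)."""
    sizes = Counter(class_key(r, qis) for r in records)
    per_record = [1 / sizes[class_key(r, qis)] for r in records]
    return max(per_record), sum(per_record) / len(per_record)


def generalize(record):
    """One possible de-identification step: coarsen age to a 10-year band
    and keep only the first three digits of the ZIP code."""
    out = dict(record)
    low = (record["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    out["zip"] = record["zip"][:3] + "**"
    return out


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        {"age": 34, "zip": "20001", "sex": "F"},
        {"age": 37, "zip": "20005", "sex": "F"},
        {"age": 52, "zip": "20010", "sex": "M"},
        {"age": 58, "zip": "20017", "sex": "M"},
    ]
    print(reidentification_risk(records))  # (1.0, 1.0): every record is unique
    generalized = [generalize(r) for r in records]
    print(reidentification_risk(generalized))  # (0.5, 0.5)
```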
- 14:36:31 [yianni]
- ...assume a public release or a release to a known data recipient
- 14:36:34 [efelten_]
- efelten_ has joined #dnt
- 14:36:37 [johnsimpson]
- johnsimpson has joined #dnt
- 14:36:39 [justin]
- Put your email in chat if you want the slides.
- 14:36:43 [bryan]
- In absence of the slides, can someone copy/paste the slide content into IRC?
- 14:36:50 [Wileys]
- wileys@yahoo-inc.com
- 14:36:51 [aleecia]
- aleecia@aleecia.com
- 14:36:53 [yianni]
- ...very different analysis, public have no controls, known recipient you can have controls and contracts
- 14:37:04 [vinay]
- vigoel@adobe.com
- 14:37:07 [AHanff]
- a.hanff@think-privacy.com
- 14:37:10 [johnsimpson]
- johnsimpson has joined #dnt
- 14:37:17 [yianni]
- ...For known data recipient, you have three attacks
- 14:37:19 [vincent]
- vincent.toubiana@alcatel-lucent.com
- 14:37:25 [yianni]
- Chris: what type of attack?
- 14:37:28 [AHanff]
- are we allowed to comment?
- 14:37:29 [aleecia]
- ed@felten.com
- 14:37:34 [RichLaBarca]
- rich@addthis.com please
- 14:37:43 [yianni]
- Khaled: re-identification attack
- 14:37:48 [jmayer]
- Slides answered, thanks.
- 14:37:55 [bryan]
- got the slides, thanks
- 14:38:05 [AHanff]
- so can we ask questions?
- 14:38:07 [robsherman]
- q+
- 14:38:08 [justin]
- q?
- 14:38:10 [dtauerbach]
- q?
- 14:38:17 [hwest]
- If you have questions, please queue yourself; I'll monitor the queue
- 14:38:21 [justin]
- ack marc_
- 14:38:24 [justin]
- ack robsherman
- 14:38:25 [Wileys]
- Thank you Heather!
- 14:38:27 [AHanff]
- q+
- 14:38:49 [hwest]
- (Reminder: to put yourself in the queue, just type q+)
- 14:38:54 [johnsimpson]
- johnsimpson has joined #dnt
- 14:38:57 [yianni]
- Rob: what about information that is not being disclosed, i.e. stored in de-identified form with no plan to disclose it?
- 14:39:16 [hwest]
- ack AHanff
- 14:39:22 [Wileys]
- +q
- 14:39:23 [AHanff]
- typing
- 14:39:30 [AHanff]
- I am typing lol
- 14:39:31 [yianni]
- Khaled: go through same steps if you release to data recipient or internally
- 14:39:35 [hwest]
- AHanff, are you just on irc?
- 14:39:44 [hwest]
- Go ahead and type your question and I'll convey
- 14:39:45 [hwest]
- q+
- 14:39:46 [AHanff]
- no I am on phone too but not on headset
- 14:40:06 [dtauerbach]
- q+
- 14:40:09 [Wileys]
- ack wileys
- 14:40:12 [peterswire_]
- peterswire_ has joined #dnt
- 14:40:13 [yianni]
- Shane: not mandating from a HIPAA perspective to de-identify, just for a risk management perspective, you would go through same process
- 14:40:17 [justin]
- Slides went to list finally, available here: http://lists.w3.org/Archives/Public/public-tracking/2013Jan/0062.html
- 14:40:17 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:40:18 [robsherman1]
- robsherman1 has joined #dnt
- 14:40:28 [aleecia]
- Thank you Justin
- 14:40:29 [hwest]
- q?
- 14:40:36 [yianni]
- Khaled: e.g., a contract may allow a vendor to continue using the data, but the data needs to be kept in de-identified form
- 14:40:47 [peterswire]
- peterswire has joined #dnt
- 14:40:58 [hwest]
- AHanff, go ahead and type question
- 14:41:05 [yianni]
- Peter: HIPAA puts limits on data uses even internally
- 14:41:05 [AHanff]
- I would just like Khaled to acknowledge that known recipient doesn't guarantee confidentiality even with contractual observations. For example, i read recently that something like 90% of US medical authorities had data leaks in 2012, presumably contracts were in place...
- 14:41:24 [yianni]
- Dan: clarifying, de-identification is a property of data?
- 14:41:30 [yianni]
- ...It is not a process
- 14:41:37 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:41:49 [yianni]
- Khaled: in practice you manage the risk of re-identification; de-identification is one tool in the toolbox
- 14:41:49 [efelten__]
- efelten__ has joined #dnt
- 14:41:50 [hwest]
- AHanff, feel free to share running comments as the presentation proceeds - they go in the record as well
- 14:41:56 [AHanff]
- thanks
- 14:42:14 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:42:20 [dwainberg]
- q+
- 14:42:24 [efelten_]
- efelten_ has joined #dnt
- 14:42:25 [hwest]
- ack hwest
- 14:42:28 [yianni]
- Khaled: deliberate re-identification by the data recipient: if a company signs a contract, as a corporation that company will not try to re-identify
- 14:42:28 [hwest]
- ack David_MacMillan
- 14:42:36 [hwest]
- ack dtauerbach
- 14:42:44 [jmayer]
- q+
- 14:42:49 [robsherman]
- robsherman has joined #dnt
- 14:42:50 [yianni]
- ...there may be rogue employees, but probability of company re-identifying would be acceptably low
- 14:42:54 [efelten__]
- efelten__ has joined #dnt
- 14:43:02 [AHanff]
- the evidence would suggest otherwise with so many data leaks surely?
- 14:43:05 [yianni]
- ...contracts are a good risk mitigating activity for first attack
- 14:43:09 [peterswire]
- I am aware of the q; will be calling on them at a soon moment
- 14:43:23 [aleecia]
- @AHanff, if you have a citation on the 90% figure, would you be so kind as to add that to the wiki?
- 14:43:27 [yianni]
- ...rogue employee re-identifying an ex spouse for example is dependent on internal company controls
- 14:43:37 [AHanff]
- I will try and find it yes
- 14:43:48 [peterswire]
- peterswire has joined #dnt
- 14:43:48 [yianni]
- ...first attack, as a company would you do it, do you have controls for rogue employees
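One way to combine this per-attack reasoning is to multiply the probability of re-identification given an attempt by the probability of an attempt, and then take the worst case over attack types. That model, the attack labels and every probability in this Python sketch are assumptions. The log only names deliberate re-identification by the recipient and a rogue employee; the breach entry is a hypothetical third. Treating an attempt as certain for a public release follows the point that a public release has no controls.

```python
# Hypothetical attack probabilities for a known data recipient.
# These numbers are placeholders, not estimates from the discussion.
KNOWN_RECIPIENT_ATTACKS = {
    "deliberate_by_recipient": 0.05,  # assumed to be lowered by a contract
    "rogue_employee": 0.10,           # assumed to depend on internal controls
    "data_breach": 0.15,              # hypothetical third attack (e.g. hacking)
}

# For a public release there are no controls, so an attempt is assumed certain.
PUBLIC_RELEASE_ATTACKS = {"public_release": 1.0}


def overall_risk(data_risk, attack_probs):
    """Worst case over attacks of Pr(re-id | attempt) * Pr(attempt).

    data_risk is the probability of re-identification given an attempt,
    e.g. the maximum per-record risk measured on the data set itself.
    """
    return max(data_risk * p_attempt for p_attempt in attack_probs.values())


if __name__ == "__main__":
    print(overall_risk(0.5, KNOWN_RECIPIENT_ATTACKS))  # 0.075
    print(overall_risk(0.5, PUBLIC_RELEASE_ATTACKS))   # 0.5
```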
- 14:43:51 [robsherman1]
- robsherman1 has joined #dnt
- 14:43:52 [aleecia]
- Thanks, that's higher than I'd heard
- 14:43:54 [efelten_]
- efelten_ has joined #dnt
- 14:44:05 [yianni]
- Peter: this is a risk management approach
- 14:44:14 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:44:16 [peterswire]
- peterswire has joined #dnt
- 14:44:39 [yianni]
- Khaled: most recent guidance of HHS is a risk management approach, UK Commissions also talk about risk management and context based
- 14:44:51 [hwest]
- q?
- 14:44:52 [peterswire_]
- peterswire_ has joined #dnt
- 14:44:54 [yianni]
- ...regulators approaching as a risk management exercise
- 14:44:57 [hwest]
- ack dwainberg
- 14:45:02 [johnsimpson]
- johnsimpson has joined #dnt
- 14:45:20 [yianni]
- David: De-ID is not a binary state, it is rather a description of lower risk (Khaled probability)
- 14:45:30 [efelten__]
- efelten__ has joined #dnt
- 14:45:30 [peterswire_]
- peterswire_ has joined #dnt
- 14:45:48 [yianni]
- Khaled: de-identification has been practiced for the last 20 years; CDC and CMS set thresholds along a continuum
- 14:45:55 [yianni]
- ...that is context dependent
- 14:46:12 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:46:13 [AHanff]
- aleecia, it was a Ponemon study, there is an article here on it (will add to wiki) http://www2.idexpertscorp.com/press/report-94-of-us-hospitals-suffered-data-breaches-and-45-had-quintuplets/
- 14:46:13 [yianni]
- David: helpful to talk about de-identification as a process and something else as a end goal?
- 14:46:30 [yianni]
- Dan: still fair to say de-identification is a property of data
- 14:46:37 [Zakim]
- + +1.646.654.aaii
- 14:46:47 [yianni]
- David: the functional definition of de-identification is a function of the context; there could be 20 different forms
- 14:46:57 [efelten_]
- efelten_ has joined #dnt
- 14:47:01 [schunter]
- schunter has joined #dnt
- 14:47:03 [robsherman]
- robsherman has joined #dnt
- 14:47:08 [yianni]
- Khaled: there can be multiple de-id versions of the same database, public versus trusted party
- 14:47:39 [yianni]
- Peter: binary, de-identified or not? Under HHS, it counts as de-identified if the overall risk is low.
- 14:47:57 [johnsimpson]
- johnsimpson has joined #dnt
- 14:48:05 [peterswire]
- peterswire has joined #dnt
- 14:48:15 [yianni]
- Khaled: once you have a spectrum, and cut off in the middle, you turn it into a binary decision
- 14:48:29 [yianni]
- Peter: de-identified is a conclusion term under some regime under some set of facts
- 14:48:30 [AHanff]
- but the thresholds are not static, they move constantly depending on the amount of data aggregated about an individual
- 14:48:36 [peterswire]
- peterswire has joined #dnt
- 14:48:38 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:48:47 [yianni]
- ...yes it is de-identified or no it is not, along the way there is a risk management regime
- 14:49:05 [dtauerbach]
- q?
- 14:49:05 [yianni]
- ...de-identified right now is a conclusion term for a regime, we do not have that standard right now in dnt
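Khaled's point that cutting a risk spectrum turns it into a binary decision, together with his note that the same database can have different de-identified versions for a public release and a trusted party, can be sketched as a context-specific threshold in Python. The threshold values are placeholders, not figures from CDC, CMS or HHS.

```python
# Placeholder thresholds; real values depend on the regime and context.
THRESHOLDS = {
    "public_release": 0.05,   # no controls on who receives the data
    "known_recipient": 0.20,  # contracts and other controls in place
}


def is_deidentified(risk, context, thresholds=THRESHOLDS):
    """Turn a point on the risk spectrum into a yes/no conclusion."""
    return risk <= thresholds[context]


if __name__ == "__main__":
    measured_risk = 0.10
    for context in THRESHOLDS:
        print(context, is_deidentified(measured_risk, context))
    # public_release False
    # known_recipient True
```

The same measured risk yields a different conclusion depending on the regime, which is the sense in which "de-identified" is a conclusion term.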
- 14:49:13 [johnsimpson]
- johnsimpson has joined #dnt
- 14:49:15 [yianni]
- ...does anyone else see it differently?
- 14:49:21 [RichLaBarca]
- Zakim, q?
- 14:49:21 [Zakim]
- I see jmayer on the speaker queue
- 14:49:33 [yianni]
- Jeff: more accurate to say a de-identified data set has been de-identified to a degree
- 14:49:44 [yianni]
- Peter: more or less risk for re-identification
- 14:49:55 [johnsimpson]
- johnsimpson has joined #dnt
- 14:50:05 [johnsimpson]
- q?
- 14:50:16 [dwainber_]
- dwainber_ has joined #dnt
- 14:50:17 [aleecia]
- Thank you kindly, Alan. Report (rather than press coverage) available from: http://www2.idexpertscorp.com/ponemon2012/
- 14:50:18 [yianni]
- David: if we disagree on what is identified in the first place, we will disagree on what's de-identified and when
- 14:50:36 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:50:47 [yianni]
- Ed: in a given setting, you can ideally establish some scientific basis that the risk is some amount; you have a spectrum of risk
- 14:50:56 [yianni]
- ...then you are required to be somewhere on the spectrum
- 14:50:57 [AHanff]
- I think it is important to note that there are no specific types of data which can guarantee non-re-identification, in fact it is never possible to guarantee non re-identification. Data minimisation can make it less likely, but the way these systems work is the data is always increasing not decreasing, which means the risk is continually increasing as the data resolution increases...
- 14:51:14 [yianni]
- ...starting point, scientific basis that data can be exploited with a certain probability
- 14:51:17 [johnsimpson]
- johnsimpson has joined #dnt
- 14:51:28 [efelten__]
- efelten__ has joined #dnt
- 14:51:34 [yianni]
- Ed: risk analysis based on sound scientific analysis, not based on what you have done in the past
- 14:51:46 [yianni]
- Chris: process of de-identification, and de-identified data
- 14:51:54 [johnsimpson]
- johnsimpson has joined #dnt
- 14:52:21 [peterswire_]
- peterswire_ has joined #dnt
- 14:52:21 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:52:25 [yianni]
- Peter: defining what counts as de-identified sounds like normative stuff we are not agreeing on today, we are trying to develop language and ways to talk about things to have that conversation
- 14:52:42 [yianni]
- Chris: we do not know the degree, we just know de-id is a thing, so let's talk about good practice
- 14:52:54 [hwest]
- q?
- 14:53:08 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:53:26 [yianni]
- Paul: once you accept risk, you need to put the tools on the table: what are the general uses
- 14:53:35 [yianni]
- ...then have conversation of what is an acceptable level of risk
- 14:53:37 [rvaneijk]
- I agree with Ed. The goal is relevant. If you want to use the data for aggregation is different than trying to accomplish unlinkability
- 14:53:37 [aleecia]
- q?
- 14:53:38 [johnsimpson]
- johnsimpson has joined #dnt
- 14:53:48 [Chris_IAB]
- Chris_IAB has joined #dnt
- 14:53:53 [aleecia]
- ack jmayer
- 14:53:54 [Wileys]
- AHanff -> I disagree, there are levels of de-identification/minimization that guarantee non-re-identification. For example, highly aggregated data sets or highly sparse raw data can both guarantee non-re-identification.
- 14:54:14 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:54:16 [efelten_]
- efelten_ has joined #dnt
- 14:54:16 [yianni]
- Jonathan: stick to substance: the universe-of-attacks slide, third bullet point
- 14:54:27 [AHanff]
- Wiley, show me the evidence to support that and I will show you a very famous event which shoots it down :)
- 14:54:46 [efelten__]
- efelten__ has joined #dnt
- 14:54:48 [yianni]
- ...one can reasonably say the risk of some sort of data breach is a lot greater if you leave the data on the street than if only the CEO can see it, under contract
- 14:54:53 [peterswire]
- peterswire has joined #dnt
- 14:55:01 [yianni]
- ...risk is much greater in former, shades of grey are the hard part
- 14:55:07 [Wileys]
- 3 people in the world viewed Yahoo.com at a specific moment in time yesterday - please tell me who those people are?
- 14:55:25 [Wileys]
- Have fun AHanff (that's an example of a highly aggregated result)
- 14:55:31 [peterswire]
- peterswire has joined #dnt
- 14:55:34 [yianni]
- ...very fact specific things, where real world challenges lie, can we reasonably estimate these sorts of attacks: being hacked, laptop out, rogue employee
- 14:55:37 [johnsimpson]
- johnsimpson has joined #dnt
- 14:55:48 [yianni]
- ...if you could predict crime, we would all have a much better use of our time
- 14:55:51 [justin]
- I don't think we need to argue about whether really-really-really-really hard to reidentify is the same as technically impossible to reidentify. For purposes of this group, whatever you call that, it will suffice to constitute de-identified data.
- 14:55:59 [yianni]
- Khaled: not predicting crime, but good approaches to manage risk
- 14:56:08 [AHanff]
- Wiley, I am glad you chose a search engine, I refer you to the AOL search data which was used to identify anonymous users within 24 hours of being released for "research purposes"
- 14:56:15 [yianni]
- ...develop a series of checklists to evaluate the point of disclosure
- 14:56:19 [robsherman1]
- robsherman1 has joined #dnt
- 14:56:22 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:56:24 [yianni]
- ...at the end of day, probabilities can be assigned
- 14:56:28 [AHanff]
- far more anonymised than the data Yahoo has in their logs I should add :)
- 14:56:28 [Wileys]
- Thank you Justin - I agree that arguing absolutes in this case is not helpful - that was my point. :-)
- 14:56:32 [aleecia]
- Justin - I think that's part of the question at hand
- 14:56:49 [Wileys]
- AHanff - completed apple / orange comparison
- 14:56:52 [yianni]
- ...based in part on subjective estimates, but a mixture of different things
- 14:56:53 [Wileys]
- completely
- 14:56:58 [AHanff]
- no it isn't
- 14:56:59 [aleecia]
- The AOL mess was *not* data aggregation
- 14:57:02 [johnsimpson]
- johnsimpson has joined #dnt
- 14:57:13 [yianni]
- ...the overall answer is that you can do it in a defensible way
- 14:57:16 [justin]
- The question at hand is how many "reallys" you need in front of "hard to reidentify"
- 14:57:17 [aleecia]
- Shane is right on this one. The AOL mess was replacing one unique id with another.
- 14:57:18 [Zakim]
- - +1.646.654.aaii
- 14:57:21 [felixwu]
- felixwu has joined #DNT
- 14:57:38 [Wileys]
- AHanff - AOL was row level specific data with consistent unique identifiers - my example was a highly aggregated result. Not the same
- 14:57:43 [efelten_]
- efelten_ has joined #dnt
- 14:57:49 [AHanff]
- 3 people visiting Yahoo yesterday at a specific time is not data aggregation either; server logs (probably replicated multiple times for backups across their distributed network) provide very exact data
- 14:57:51 [Zakim]
- - +1.202.587.aabb
- 14:57:55 [yianni]
- Khaled: deliberate re-identification vs. inadvertent re-identification, e.g., recognizing someone they know (a relative)
- 14:57:58 [robsherman]
- robsherman has joined #dnt
- 14:58:09 [yianni]
- ...in health care setting, can measure probability that someone knows someone in the database
- 14:58:22 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:58:27 [hwest]
- q?
- 14:58:29 [Mike_Nolet]
- Mike_Nolet has joined #dnt
- 14:58:29 [peterswire]
- peterswire has joined #dnt
- 14:58:46 [yianni]
- ...Ex. breast cancer: we know the prevalence of breast cancer and the average number of friends, so we can estimate the chance of inadvertent re-identification
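A minimal sketch of the kind of estimate described here, assuming (our assumption, not from the slides) that each of a data user's acquaintances is an independent draw from the population, so the chance of knowing at least one affected person is the complement of knowing none. Using raw prevalence assumes the data set covers the whole affected population; with partial coverage, the per-person probability would have to be scaled down. The inputs in the usage line are hypothetical.

```python
def prob_knows_someone(prevalence: float, num_acquaintances: int) -> float:
    """P(at least one of a data user's acquaintances has the condition),
    treating each acquaintance as an independent draw from the population."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be in [0, 1]")
    if num_acquaintances < 0:
        raise ValueError("num_acquaintances must be non-negative")
    # Complement of "none of the acquaintances has the condition".
    return 1.0 - (1.0 - prevalence) ** num_acquaintances


# Hypothetical inputs: 1% prevalence, 150 acquaintances.
print(round(prob_knows_someone(0.01, 150), 2))  # 0.78
```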
- 14:58:55 [peterswire]
- peterswire has joined #dnt
- 14:58:55 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 14:58:58 [robsherman1]
- robsherman1 has joined #dnt
- 14:59:13 [yianni]
- ...Data breach (an organization that loses data): we know that 27% of health care providers have one breach per year
- 14:59:23 [aleecia]
- So wait: 27%, or 94%?
- 14:59:29 [yianni]
- ...there are bigger and smaller numbers, but 27% is the most defensible number
- 14:59:39 [efelten__]
- efelten__ has joined #dnt
- 14:59:41 [johnsimpson]
- johnsimpson has joined #dnt
- 14:59:48 [aleecia]
- That's a rather large change of inputs here
- 14:59:49 [jmayer]
- q+
- 14:59:56 [yianni]
- ...we can use the 27% number to assign probability
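One way the 27% figure could be plugged in, sketched under the assumption (ours, not stated in the log) that the risk for a scenario is the probability the data gets out times the probability of re-identifying a record once it is out. The 20% conditional probability below is purely hypothetical.

```python
def overall_risk(p_attempt: float, p_reid_given_attempt: float) -> float:
    """Re-identification risk for one attack scenario, modelled as
    P(attempt or breach) * P(re-identification | data in hand)."""
    for p in (p_attempt, p_reid_given_attempt):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must be in [0, 1]")
    return p_attempt * p_reid_given_attempt


# 27% annual breach rate (from the talk) times a hypothetical 20%
# chance of re-identifying a record once the data is out.
print(round(overall_risk(0.27, 0.20), 3))  # 0.054
```

The multiplicative form only holds if the two events are treated as independent; a real assessment would estimate each factor per scenario (breach, rogue employee, demonstration attack).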
- 14:59:58 [Wileys]
- What does breach have to do with de-identification? Those breaches are to purposely non-de-identified data.
- 15:00:00 [aleecia]
- But not our problem, actually
- 15:00:15 [yianni]
- ...demonstration attack: the adversary wants to make a point, targeting a high-risk person
- 15:00:19 [efelten_]
- efelten_ has joined #dnt
- 15:00:21 [johnsimpson]
- johnsimpson has joined #dnt
- 15:00:22 [yianni]
- ...all you have to do is identify one person
- 15:00:26 [Wileys]
- +1 to Aleecia
- 15:00:44 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:00:49 [peterswire]
- I see jonathan; will call on soon
- 15:01:11 [yianni]
- Khaled: directly identifying variables are the fields listed in HIPAA
- 15:01:16 [aleecia]
- What I've learned: HIPAA's a mess. :-) But we may be able to find useful parts of HIPAA anyway as we sift through this, and it's useful to see what came before.
- 15:01:22 [efelten_]
- efelten_ has joined #dnt
- 15:01:27 [AHanff]
- q+
- 15:01:39 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:01:45 [johnsimpson]
- q?
- 15:01:48 [yianni]
- Peter: people may disagree on what is a direct identifier and what is a quasi-identifier
- 15:01:55 [yianni]
- Khaled: can be different based on context
- 15:02:10 [peterswire_]
- peterswire_ has joined #dnt
- 15:02:10 [yianni]
- ...with names: remove the names, randomize, or generate pseudonyms
- 15:02:22 [hwest]
- q?
- 15:02:24 [dtauerbach]
- q?
- 15:02:40 [aleecia]
- Shane -- I realize I don't know what problem you're trying to solve in your dataset. When you talk about not destroying the value, what value is it you're trying to preserve?
- 15:02:41 [Wileys]
- +1 to generating pseudonyms as acceptable de-identification practice :-)
- 15:02:43 [johnsimpson]
- johnsimpson has joined #dnt
- 15:03:04 [yianni]
- Chris: quasi-identifiers: how about ranges, e.g., someone fits within a date range, or a geolocation? Address in HIPAA?
- 15:03:13 [efelten__]
- efelten__ has joined #dnt
- 15:03:16 [Wileys]
- Aleecia - typically longitudinal analytical/research value
- 15:03:21 [yianni]
- Khaled: HIPAA safe harbor, dates converted to years
- 15:03:42 [justin]
- e.g., it's useful to know that a particular user went to Y!, then FB, then ESPN, etc.
- 15:03:48 [efelten_]
- efelten_ has joined #dnt
- 15:03:48 [yianni]
- ...when you convert to ranges under expert determination, you could potentially go down to a quarter of a year or up to 10 years
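The date generalizations mentioned here (year under safe harbor; quarter or 10-year bands under expert determination) can be sketched as a simple coarsening function. Aligning the 10-year bands to calendar decades is our assumption; an expert could choose other band boundaries.

```python
from datetime import date


def generalize_date(d: date, granularity: str) -> str:
    """Coarsen a date into a less identifying range."""
    if granularity == "year":
        return str(d.year)
    if granularity == "quarter":
        quarter = (d.month - 1) // 3 + 1
        return f"{d.year}-Q{quarter}"
    if granularity == "decade":
        # Hypothetical choice: bands aligned to calendar decades.
        start = d.year - d.year % 10
        return f"{start}-{start + 9}"
    raise ValueError(f"unknown granularity: {granularity!r}")


print(generalize_date(date(1975, 8, 14), "quarter"))  # 1975-Q3
```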
- 15:03:52 [jmayer]
- q-
- 15:03:56 [dtauerbach]
- q?
- 15:04:01 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:04:02 [Wileys]
- Aleecia - You've already heard this conversation play out between Ed and I (and a few others) on the public email list. :-)
- 15:04:23 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:04:34 [aleecia]
- Yes, I've heard and read more than I care to :-) But I couldn't remember what value you were looking for, just the disagreements
- 15:04:36 [yianni]
- Khaled: if you are doing analytics, treat them as quasi-identifiers; e.g., in software testing you cannot get rid of fields, you just randomize
- 15:04:38 [AHanff]
- my question isn't on direct identifiers
- 15:04:54 [rvaneijk]
- q+
- 15:04:55 [AHanff]
- my question is on the 27% figure
- 15:05:01 [peterswire]
- peterswire has joined #dnt
- 15:05:02 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:05:10 [jmayer]
- Aleecia - industry participants have never explained the value they hope to achieve in detail. It's one of the reasons we haven't made progress.
- 15:05:29 [yianni]
- Khaled: in Ontario there are 220 John Smiths; people have common names.
- 15:05:31 [peterswire]
- peterswire has joined #dnt
- 15:05:35 [Wileys]
- Aleecia, outside of permitted uses, the core value sought is analytical (be able to learn and make changes).
- 15:05:37 [johnsimpson]
- johnsimpson has joined #dnt
- 15:05:42 [yianni]
- Ed: In practice every variable is a quasi identifier?
- 15:05:49 [Wileys]
- Jonathan, I thought we had - not sure what more you're looking for.
- 15:05:52 [efelten__]
- efelten__ has joined #dnt
- 15:06:01 [yianni]
- Khaled: no not really
- 15:06:10 [yianni]
- ...example, blood pressure
- 15:06:12 [aleecia]
- And you're likely to have a question now that can be answered from data 5 years ago? 2 years ago?
- 15:06:15 [rvaneijk]
- would like to bridge from quasi-identifiers to the EU perspective... (queue)
- 15:06:26 [efelten_]
- efelten_ has joined #dnt
- 15:06:26 [yianni]
- Ed: blood pressure is better than gender
- 15:06:29 [aleecia]
- My concern is that your answer there is you don't know
- 15:06:38 [yianni]
- Khaled: what is the chance of an adversary knowing your blood pressure?
- 15:06:38 [aleecia]
- Because, you likely cannot
- 15:06:51 [johnsimpson]
- johnsimpson has joined #dnt
- 15:06:57 [efelten_]
- efelten_ has joined #dnt
- 15:06:58 [yianni]
- Ed: the odds that my provider will know my blood pressure are high
- 15:07:00 [robsherman]
- robsherman has joined #dnt
- 15:07:17 [Wileys]
- Aleecia - some researchers at Yahoo! find tremendous value in long-term data as an indicator for near-term data - interesting learnings and value there.
- 15:07:23 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:07:30 [yianni]
- Khaled: the hospital can look at it, and there are different controls to stop re-identification
- 15:07:57 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:07:58 [yianni]
- Peter: how likely is it that someone on the outside has access to that information, and how likely is it to be a match?
- 15:08:05 [robsherman]
- robsherman has joined #dnt
- 15:08:14 [Wileys]
- Aleecia - a simple example is spelling correction - due to the long tail of possible searches it can take many years to build enough data to predict outcomes for rare terms.
- 15:08:18 [rvaneijk]
- is anyone monitoring the queue?
- 15:08:29 [yianni]
- Ed: re-identification is connecting individual to information
- 15:08:41 [aleecia]
- I'm sure there is. But if you pull back to a very simple view, you're suggesting that users ask for more privacy, Y! says they will provide more privacy, and then you will retain and study that user. That's a hard thing to explain to a user who just wants to be left alone.
- 15:08:41 [Wileys]
- Rob, Peter said in IRC that he'd be coming to the queue soon but that was quite awhile ago
- 15:08:47 [johnsimpson]
- johnsimpson has joined #dnt
- 15:08:49 [yianni]
- Khaled: all laws protect against identity disclosure; no laws protect against attribute disclosure
- 15:09:10 [peterswire_]
- peterswire_ has joined #dnt
- 15:09:17 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:09:24 [efelten__]
- efelten__ has joined #dnt
- 15:09:44 [yianni]
- ...If I release a data set and you get attribute disclosure, the laws do not prohibit it; it's just statistics
- 15:09:45 [peterswire_]
- peterswire_ has joined #dnt
- 15:09:48 [vincent]
- Wileys, with the spelling correction example, high level aggregation and short term retention are not enough?
- 15:09:50 [dtauerbach]
- q+
- 15:09:54 [Wileys]
- Aleecia, I'd argue that once the data is deidentified that user is being left alone - we're now just using an unlinkable data point to improve our services. What are our rights in providing the free service? The most paranoid users need not use our services if we fairly call out that we use data in this way. Fair?
- 15:09:56 [efelten_]
- efelten_ has joined #dnt
- 15:09:59 [johnsimpson]
- johnsimpson has joined #dnt
- 15:10:01 [aleecia]
- The spelling example is a nice one, thanks. I'm sure there are many, many others. I just don't know how to get you what you want while still actually honoring DNT
- 15:10:05 [yianni]
- ...Different governance mechanisms to manage attribute disclosure, but not what we are talking about today
- 15:10:23 [justin]
- WileyS, not sure that's the best example. That's first party data that can be stripped of identifiers immediately without significantly diminishing value (like Google Flu Trends).
- 15:10:24 [johnsimpson]
- johnsimpson has joined #dnt
- 15:10:24 [yianni]
- Ed: arguably the most important aspect of privacy disclosure is not even covered?
- 15:10:43 [Wileys]
- Vincent, not short-term retention (not enough volume on rare terms) - but data minimization and de-identification do accomplish the risk minimization goal
- 15:10:49 [schunter]
- schunter has joined #dnt
- 15:10:52 [yianni]
- Khaled: you cannot predict the inferences from data sets, and the more you control attribute disclosure, the more you destroy data utility; it is best managed with governance
- 15:10:55 [johnsimpson]
- johnsimpson has joined #dnt
- 15:10:56 [AHanff]
- Wileys - no absolutely not fair - first of all what right do you have to label privacy aware users as paranoid - secondly, are you therefore saying people who value privacy should be excluded from digital society?
- 15:11:10 [Wileys]
- Justin, agreed - for that use case, that's a great de-identification approach.
- 15:11:17 [yianni]
- Peter: direct identifiers (phone numbers), quasi identifiers (people on outside can make guesses)
- 15:11:30 [hwest]
- q?
- 15:11:38 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:11:38 [robsherman1]
- robsherman1 has joined #dnt
- 15:11:42 [aleecia]
- I'm pretty sure that saying "we're honoring your request for privacy, but we're still logging everything you did and using it" isn't what users will consider fair. Which, to be clear, matters a lot more than what I think is fair.
- 15:11:44 [johnsimpson]
- q?
- 15:11:47 [Wileys]
- Justin, you do need to keep a few data elements around to help provide context (language, country of search, etc.)
- 15:12:00 [AHanff]
- q-
- 15:12:03 [yianni]
- ...Third thing, attribute disclosure
- 15:12:09 [peterswire_]
- I see the q
- 15:12:20 [Wileys]
- Aleecia, I believe the de-identification removes the "you" in 'everything you did' in your statement
- 15:12:30 [peterswire]
- peterswire has joined #dnt
- 15:12:32 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:12:46 [AHanff]
- what you believe is not what regulators and the general public believe, which I think is aleecia's point
- 15:12:47 [aleecia]
- Which is where you and Ed have gone many rounds, and I do disagree with your conclusions there.
- 15:12:48 [yianni]
- Ed: if there is a list of a hundred records and I know one is yours, and all have that diagnosis, I know the attribute without actually identifying you
- 15:12:57 [peterswire]
- peterswire has joined #dnt
- 15:12:57 [justin]
- WileyS, Right, that seems fair, but the re-ID risk seems almost impossibly low.
- 15:12:58 [yianni]
- Joe: that's 100% , others are fuzzier
- 15:13:01 [peterswire]
- attribute disclosure as an important distinction says ed felten
- 15:13:11 [yianni]
- Ed: are we trying to protect against attribute disclosure?
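Ed's example can be sketched as a check over equivalence classes: if every record sharing a combination of quasi-identifier values has the same sensitive value, anyone known to be in that class has the attribute disclosed without being identified. This is a simple distinct-value check in the spirit of the l-diversity idea raised later in the discussion; the field names below are hypothetical.

```python
from collections import defaultdict


def attribute_disclosures(records, quasi_identifiers, sensitive):
    """Return {quasi-identifier values: sensitive value} for every
    equivalence class whose records all share one sensitive value."""
    groups = defaultdict(set)
    for r in records:
        key = tuple(r[q] for q in quasi_identifiers)
        groups[key].add(r[sensitive])
    return {key: next(iter(vals)) for key, vals in groups.items() if len(vals) == 1}


# Hypothetical records: the 30-39 class is homogeneous in diagnosis.
records = [
    {"age": "30-39", "zip": "021**", "dx": "flu"},
    {"age": "30-39", "zip": "021**", "dx": "flu"},
    {"age": "40-49", "zip": "021**", "dx": "flu"},
    {"age": "40-49", "zip": "021**", "dx": "asthma"},
]
print(attribute_disclosures(records, ["age", "zip"], "dx"))
```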
- 15:13:19 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:13:35 [Wileys]
- Justin - agreed, for that use case - many other use cases aren't as clear-cut - that's why it's a good point to start there and go deeper.
- 15:13:36 [yianni]
- Khaled: there is precedent in the research world for attribute disclosure: the IRB
- 15:13:40 [aleecia]
- I do agree that there are ways to do aggregation to a level as to remove the "you." I do not think that replacing one unique identifier with another unique identifier (hashing) is going to remove the "you"
- 15:13:50 [yianni]
- ...it restricts how you do studies; a committee oversees them
- 15:13:59 [johnsimpson]
- johnsimpson has joined #dnt
- 15:14:03 [rvaneijk]
- q-
- 15:14:05 [Wileys]
- AHanff, could you please source your position? Regulator and general public studies?
- 15:14:09 [yianni]
- ...it has a mechanism to agree on the types of inferences you will permit; certain things would be off limits
- 15:14:16 [vincent]
- Wileys, I thought Yahoo removes rare terms anyway? Are there examples where Yahoo is actually a third party?
- 15:14:28 [yianni]
- Joe: risks to population of inference versus benefits?
- 15:14:50 [AHanff]
- wileys, regulators, a29wp, eu commission, eu parliamentarians, members of public all people I have worked with and discussed these issues with over the past 6 years
- 15:14:51 [Wileys]
- Aleecia, as long as there is no way back to the original user, then I believe the desired outcome has been met (no more 'you')
- 15:14:59 [robsherman]
- robsherman has joined #dnt
- 15:15:03 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:15:03 [yianni]
- Khaled: no legislative requirement to worry about attribute disclosure
- 15:15:05 [AHanff]
- except you of course :)
- 15:15:21 [Wileys]
- AHanff, very much an area of active disagreement - I agree that one extreme side of that debate equates to your position
- 15:15:30 [yianni]
- Felix: we are concerned about inferences about large numbers of people, but that is different from inferences about one particular person
- 15:15:40 [robsherman1]
- robsherman1 has joined #dnt
- 15:15:40 [efelten__]
- efelten__ has joined #dnt
- 15:15:40 [johnsimpson]
- johnsimpson has joined #dnt
- 15:15:42 [peterswire]
- person is in the group, and can draw inference about them -- attribute disclosure
- 15:15:46 [yianni]
- Khaled: can draw inferences about group memberships, and you belong to that group
- 15:15:53 [Wileys]
- Vincent, Yahoo! runs one of the largest 3rd party ad networks on the internet :-)
- 15:16:07 [AHanff]
- well absolutely every person I have ever discussed these issues with apart from advertisers, is in that "extreme" - which would suggest that the extreme is actually your segment not mine ;)
- 15:16:13 [efelten_]
- efelten_ has joined #dnt
- 15:16:14 [peterswire]
- peterswire has joined #dnt
- 15:16:28 [yianni]
- Felix: the IRB mitigates discrimination against a large group; it is not concerned with attribute disclosure about a specific individual, even if the group is not sensitive
- 15:16:41 [peterswire]
- q?
- 15:16:50 [yianni]
- Khaled: depends on type of study and what harm that can happen to those individuals or at the group level
- 15:16:58 [Wileys]
- AHanff - disagree - if everyone agreed with you then no one would be using online services supported by 3rd party advertising
- 15:17:00 [johnsimpson]
- johnsimpson has joined #dnt
- 15:17:00 [robsherman]
- robsherman has joined #dnt
- 15:17:07 [yianni]
- Dan: quasi-identifiers: why isn't everything a quasi-identifier?
- 15:17:19 [efelten__]
- efelten__ has joined #dnt
- 15:17:26 [johnsimpson]
- johnsimpson has joined #dnt
- 15:17:27 [yianni]
- Khaled: you have to take into account the probability that the adversary will have the information; for some fields there is no probable path to get that information
- 15:17:29 [aleecia]
- Shane - one of the evolutions we're watching is going from "we need to identify a user by name" as what counts for a "you" to "we need to be able to distinguish a single person" such that a GUID counts for a "you"
- 15:17:37 [AHanff]
- Wileys, that is a completely invalid response - the VAST majority of digital citizens have no idea that any of this is going on and when they find out, they are outraged
- 15:17:42 [yianni]
- ...has to be information that is generally available
- 15:17:45 [AHanff]
- there are countless examples to support that
- 15:17:46 [aleecia]
- swapping one GUID for another doesn't actually advance privacy
- 15:17:53 [aleecia]
- that's not fair -
- 15:17:56 [vincent]
- Wileys, glad to hear :) but how is that related to my question? I was asking for examples of analytical/research uses that need pseudonymous data and where Yahoo is involved as a third party, not a search engine
- 15:18:01 [aleecia]
- doesn't advance it by much.
- 15:18:19 [Wileys]
- Aleecia - GUID goes one step further than I'm suggesting as that implies it is still "linkable" in a production system.
- 15:18:27 [yianni]
- Mike: What about the practical, how difficult is that inference? (large number of records)
- 15:18:38 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:18:42 [efelten]
- efelten has joined #dnt
- 15:18:47 [Wileys]
- Vincent, anything and everything to do with being a better ad network.
- 15:18:52 [aleecia]
- That's what I was just correcting. I agree, there is a minor improvement there, but not enough as to practically matter much.
- 15:18:54 [yianni]
- Khaled: it depends on the fields you have in the database and how accurate the inference would be; never count against statistics
- 15:19:03 [dwainber_]
- q?
- 15:19:16 [yianni]
- ...attribute disclosure has to be managed, cannot do so technically without destroying data
- 15:19:25 [Wileys]
- AHanff, please reference studies of consumer "outrage"
- 15:19:27 [dtauerbach]
- ack dtauerbach
- 15:19:31 [yianni]
- ...need to have different oversight, evidence so far that is what works
- 15:19:42 [hwest]
- hwest has joined #dnt
- 15:19:43 [dwainber_]
- q?
- 15:19:51 [peterswire]
- peterswire has joined #dnt
- 15:19:56 [hwest]
- hwest has joined #dnt
- 15:20:09 [yianni]
- ...In practice, you do not get all of the fields in databases (focus on 6-10 fields); for longitudinal data, they are repeated over multiple visits
- 15:20:28 [yianni]
- ...surveys are more complicated; you can be dealing with a database with 100 quasi-identifiers
- 15:20:28 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:20:34 [aleecia]
- Shane - let me do a thought experiment. I think we agree that if I got my hands on the raw server logs at Y! that would contain a set of "you"s, and not be non-identified.
- 15:20:36 [yianni]
- Dan: you only need to know one thing
- 15:20:44 [AHanff]
- Wileys, I don't need to, they are there in the public eye - instagram, path, phorm, nebuad, facebook etc etc etc
- 15:20:49 [AHanff]
- there is a new one just about every week
- 15:20:58 [yianni]
- Khaled: chance of adversary knowing 5 things or 10 things, chance they know all 100 is very low
- 15:21:07 [johnsimpson]
- johnsimpson has joined #dnt
- 15:21:42 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:21:45 [yianni]
- ...choose a number that is defensible (unlikely to know 30 fields)
- 15:22:16 [Wileys]
- Aleecia, depends - if you're suggesting a de-identified data set, you'd find a one-way secret hashed identifier that has been truncated by 50% to purposely create noise (salt). So there is "an" identifier there - but it links to nothing in production systems.
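The identifier scheme Shane describes can be sketched as below, assuming HMAC-SHA256 as the "one-way secret hash" and a hex digest cut to half its length; the log does not say which function or digest size Yahoo actually uses. Because the same GUID always maps to the same pseudonym, all of a user's records stay linked to one another under the new identifier, which is the concern Aleecia raises.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Keyed one-way hash (HMAC-SHA256) of a user ID, truncated to half
    of its hex digest. Deterministic: the same ID and key always give
    the same output."""
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[: len(digest) // 2]


key = b"hypothetical-secret-key"
print(pseudonymize("guid-123", key) == pseudonymize("guid-123", key))  # True
```

How much noise truncation adds depends on how many bits remain: half of a 256-bit digest still leaves 128 bits, so collisions between distinct IDs would be vanishingly rare unless the output is cut much shorter.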
- 15:22:44 [peterswire_]
- peterswire_ has joined #dnt
- 15:22:44 [Wileys]
- AHanff - thank you for the conversation, I have a good sense of your perspective and ability to defend your statements now.
- 15:22:46 [yianni]
- Khaled: three types of risk
- 15:22:51 [johnsimpson]
- johnsimpson has joined #dnt
- 15:23:02 [efelten_]
- efelten_ has joined #dnt
- 15:23:06 [yianni]
- ...are you going to re-identify an individual in the data set, or are you going to match two databases?
- 15:23:11 [AHanff]
- You should talk to your colleague Justin before discounting my arguments, we know each other very well
- 15:23:16 [peterswire]
- peterswire has joined #dnt
- 15:23:17 [yianni]
- ...are you considering maximum risk or average risk (very different)
- 15:23:25 [aleecia]
- If you took that raw data over a year (nothing magic, just picking a specific example) and gave me one half of the data raw, and one half you had transformed by replacing GUIDs with your hashed id, I would be able to map between the raw and the hashed data sets.
- 15:23:29 [yianni]
- ...when talking about a demonstration attack, worry about maximum risk
- 15:23:44 [yianni]
- ...with inadvertent re-identification, you can use average risk
- 15:23:53 [johnsimpson]
- johnsimpson has joined #dnt
- 15:23:53 [yianni]
- ...what are the appropriate thresholds?
- 15:23:57 [aleecia]
- So when you say there is no link to the production system, I disagree.
- 15:24:00 [Wileys]
- Aleecia - we keep the datasets completely separate with strict access controls, policy, training, etc. - you wouldn't get both.
- 15:24:26 [AHanff]
- oh my, how many times have I head that one and then seen humble pie served lol
- 15:24:26 [yianni]
- ...In practice, the risk thresholds used range from as high as .33 to as low as .05
- 15:24:28 [aleecia]
- A different and possibly useful approach, but they *are* linked.
- 15:24:29 [Wileys]
- But that is our risk to manage since we make the statement the data is deidentified.
- 15:24:30 [AHanff]
- heard*
- 15:24:33 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:24:44 [efelten_]
- efelten_ has joined #dnt
- 15:24:48 [yianni]
- ...No one releases data with a risk higher than .33; there is increasing precedent for the other values
- 15:25:05 [johnsimpson]
- johnsimpson has joined #dnt
- 15:25:19 [yianni]
- ...practical range (court cases, regulatory authorities), choose one of four: .33, .2, .09, .05
- 15:25:28 [johnsimpson]
- johnsimpson has joined #dnt
- 15:25:32 [yianni]
- ...no scientific way to choose value, based on past use and changed over time
- 15:25:50 [hwest]
- q?
- 15:25:58 [yianni]
- ...the .09 and .05 thresholds are used for public disclosure
- 15:25:59 [peterswire]
- peterswire has joined #dnt
- 15:26:11 [aleecia]
- There might exist something in there I could reluctantly live with while really not liking. :-) (And there might not.) What I'll put my body on the tracks for is the idea that you could then publicly release that data.
- 15:26:13 [yianni]
- ...the .33 and .2 thresholds are for releases to trusted business partners
- 15:26:21 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:26:28 [yianni]
- ...these thresholds are to protect against demonstration attack
- 15:26:30 [Chris_IAB]
- Has this deck (being presented currently) been placed into the W3C record?
- 15:26:42 [justin]
- Chris_IAB, it's in the mail archives.
- 15:26:44 [yianni]
- ...all known attacks have been conducted by academics and the media
- 15:26:46 [Wileys]
- Aleecia - we have yet another de-identification process for data we release to researchers - so I absolutely agree with you!
- 15:26:49 [dwainber_]
- q+
- 15:27:06 [yianni]
- ...this is maximum risk, no one has a higher risk of re-identification than the level
- 15:27:07 [johnsimpson]
- johnsimpson has joined #dnt
- 15:27:11 [Wileys]
- Chris, it went out to the public mailing list so its now recorded.
- 15:27:38 [yianni]
- ...In practice, these numbers are conservative: data changes, imperfect data cause errors
- 15:27:55 [yianni]
- ...the numbers used are ceilings on risk; the real risks are lower
- 15:27:57 [aleecia]
- Shane - could you describe the de-identification for researchers?
- 15:28:19 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:28:35 [yianni]
- ...Cell sizes: 3, 5, 11, 20
- 15:28:56 [yianni]
- ...the smallest cell sizes (population cell sizes), may be smaller in a sample
- 15:29:14 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:29:20 [Zakim]
- + +1.215.286.aajj
- 15:29:22 [yianni]
- ...If you create a population with a cell size of 5, you can take a sample and have a lower cell size
- 15:29:29 [peterswire]
- peterswire has joined #dnt
- 15:29:37 [yianni]
- ...number of individuals with same cell of quasi identifiers
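The four cell sizes line up with the four thresholds as reciprocals (1/3 ≈ .33, 1/5 = .2, 1/11 ≈ .09, 1/20 = .05), which is how David reads them. A minimal sketch, assuming each record's risk is 1 / (size of its equivalence class on the chosen quasi-identifiers): the maximum is the measure used for demonstration attacks and the average the one for inadvertent re-identification. The field name `g` in the usage line is hypothetical.

```python
from collections import Counter


def class_sizes(records, quasi_identifiers):
    """Count records sharing each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def reid_risk(records, quasi_identifiers):
    """Return (maximum risk, average risk), where a record's risk is
    1 / (size of its equivalence class)."""
    sizes = class_sizes(records, quasi_identifiers)
    if not sizes:
        raise ValueError("no records")
    max_risk = 1 / min(sizes.values())
    # Summing 1/size over every record gives the number of classes.
    avg_risk = len(sizes) / sum(sizes.values())
    return max_risk, avg_risk


def meets_cell_size(records, quasi_identifiers, k):
    """True if every equivalence class has at least k records,
    i.e. maximum risk <= 1/k."""
    return min(class_sizes(records, quasi_identifiers).values()) >= k


for k in (3, 5, 11, 20):
    print(k, round(1 / k, 2))  # prints 3 0.33, 5 0.2, 11 0.09, 20 0.05

recs = [{"g": "a"}] * 3 + [{"g": "b"}] * 5
print(reid_risk(recs, ["g"]))  # max 1/3, average 2/8
```

Comparing minimum class size against k, rather than maximum risk against the rounded value .33, avoids rejecting a class of exactly 3 because 1/3 is slightly above .33.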
- 15:29:44 [yianni]
- Ed: have to assume quasi identifiers
- 15:29:48 [johnsimpson]
- johnsimpson has joined #dnt
- 15:29:53 [justin]
- q?
- 15:30:01 [peterswire]
- peterswire has joined #dnt
- 15:30:01 [yianni]
- Khaled: only a small subset of variables in data set are quasi identifiers
- 15:30:19 [Wileys]
- Aleecia - it varies based on the nature of the dataset but general attributes are: older data, no identifiers, data sets highly numerized (example, instead of showing actual category of music, we show only a number representing a category but give no information to provide context for that category).
- 15:30:49 [yianni]
- David: with a cell size of 11, there is a 9% probability of a record being re-identified?
- 15:30:51 [johnsimpson]
- johnsimpson has joined #dnt
- 15:31:10 [yianni]
- ...any single record or one record out of the whole?
- 15:31:11 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:31:36 [moneill2]
- moneill2 has joined #dnt
- 15:31:48 [yianni]
- Jeff: are 9% of the records identifiable? Public databases have 9% chance of re-identification.
- 15:31:57 [johnsimpson]
- johnsimpson has joined #dnt
- 15:32:27 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:32:36 [yianni]
- Peter: there has never been a re-identification of properly de-identified database, but 9% risk?
- 15:32:40 [Zakim]
- +[IPcaller.a]
- 15:32:58 [Wileys]
- +q
- 15:33:02 [peterswire]
- peterswire has joined #dnt
- 15:33:04 [johnsimpson]
- johnsimpson has joined #dnt
- 15:33:07 [yianni]
- Joe: demonstration attack on HHS database de-identified?
- 15:33:33 [peterswire_]
- peterswire_ has joined #dnt
- 15:33:36 [johnsimpson]
- johnsimpson has joined #dnt
- 15:33:36 [moneill2]
- zakim, [ipcaller] is me
- 15:33:36 [Zakim]
- +moneill2; got it
- 15:33:45 [yianni]
- Khaled: the hit rates of re-identification are much lower than those values; no one has ever been able to re-identify at a rate higher than the threshold.
- 15:34:08 [peterswire]
- peterswire has joined #dnt
- 15:34:10 [johnsimpson]
- johnsimpson has joined #dnt
- 15:34:24 [yianni]
- Felix: if you start guessing, you will be right 9% of time, do I care if I know?
- 15:34:37 [peterswire]
- peterswire has joined #dnt
- 15:34:52 [yianni]
- Rob: if I were to guess randomly, I would get some right randomly
- 15:34:54 [johnsimpson]
- johnsimpson has joined #dnt
- 15:35:10 [jmayer]
- q+
- 15:35:20 [johnsimpson]
- johnsimpson has joined #dnt
- 15:35:21 [peterswire_]
- peterswire_ has joined #dnt
- 15:35:27 [yianni]
- Felix: you would not know you are right, but you could guess 9%.
- 15:35:28 [jmayer]
- This is assuming complete l-diversity among the group?
- 15:35:44 [aleecia]
- Shane - that sounds a lot closer to what would be reasonable to provide to users who turn on DNT
- 15:35:49 [johnsimpson]
- johnsimpson has joined #dnt
- 15:35:50 [peterswire]
- peterswire has joined #dnt
- 15:36:10 [hwest]
- ack dwainber_
- 15:36:14 [yianni]
- Khaled: with unlimited resources, they could verify, but expensive
- 15:36:17 [johnsimpson]
- johnsimpson has joined #dnt
- 15:36:53 [yianni]
- Khaled: how do you choose one of four values?
- 15:36:59 [johnsimpson]
- johnsimpson has joined #dnt
- 15:37:15 [yianni]
- ...public you use .05 or .09. If not public, you look at a number of other factors
- 15:37:16 [mnolet]
- mnolet has joined #dnt
- 15:37:26 [yianni]
- ...if a company has good controls, you are not as worried about a rogue employee
- 15:37:34 [johnsimpson]
- johnsimpson has joined #dnt
- 15:37:38 [dtauerbach]
- i think the wifi in the room isn't great, i suspect that's the reason
- 15:37:41 [yianni]
- David: do you look at sensitivity of data?
- 15:37:41 [justin]
- We'll see what we can do during the break.
- 15:37:46 [johnsimpson]
- I am not doing anything.. Don't know why it is happening
- 15:38:12 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:38:17 [yianni]
- Khaled: three things to look at: sensitivity, potential harm, and consent
- 15:38:20 [peterswire]
- peterswire has joined #dnt
- 15:38:32 [yianni]
- ...motives managed with contract
- 15:38:44 [yianni]
- ...academics and journalists have a motive to re-identify
- 15:38:44 [peterswire]
- peterswire has joined #dnt
- 15:38:53 [johnsimpson]
- johnsimpson has joined #dnt
- 15:39:02 [yianni]
- ...there are checklists for doing this process.
- 15:39:16 [yianni]
- ...need a repeatable process to evaluate all of the factors
- 15:39:36 [yianni]
- Chris: is there ever a scenario that there is zero risk if you release data?
- 15:39:37 [johnsimpson]
- johnsimpson has joined #dnt
- 15:39:46 [yianni]
- Khaled: no
- 15:40:07 [jmayer]
- ...but there are systems that can give rigorous bounds on risk if you release data.
- 15:40:12 [yianni]
- Peter: threat models: why would someone attack here, and how capable are they (money, showing they're smart)
- 15:40:31 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:40:31 [yianni]
- ...might be commercial reasons, upset employees, think of all the reasons why people might attack
- 15:40:51 [yianni]
- ...why do we care here, what are the harms, are they very sensitive
- 15:41:11 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:41:23 [Wileys]
- Aleecia - I understand that is your perspective of what DNT should mean - as you know I disagree with that position and would interpret a DNT to mean something different (no profiling, not 'no analytics')
- 15:41:24 [yianni]
- ...different values of invasion of privacy: complete browsing history available to FBI may upset some advocates
- 15:41:29 [aleecia]
- I don't think the FBI is the worst thing possible - we operate in an international climate
- 15:41:36 [peterswire]
- peterswire has joined #dnt
- 15:41:44 [yianni]
- ...other end of the spectrum: not a big deal, no one would care about browsing, little harm or risk around it
- 15:41:54 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:42:04 [yianni]
- ...assume different views on invasion of privacy.
- 15:42:14 [yianni]
- ...Left side of slide: mitigating controls
- 15:42:24 [robsherman1]
- robsherman1 has joined #dnt
- 15:42:30 [yianni]
- ...a lot of discussion on de-identification has been about publicly disclosed databases
- 15:42:32 [johnsimpson]
- johnsimpson has joined #dnt
- 15:42:52 [yianni]
- ...if you post on internet, smart people will attack, that is purely technical protection
- 15:43:11 [yianni]
- ...most of the stuff we are talking about is different: secret databases, set of administrative controls
- 15:43:14 [johnsimpson]
- johnsimpson has joined #dnt
- 15:43:30 [yianni]
- ...Privacy Act talks about technical, administrative and physical safeguards
- 15:43:36 [johnsimpson]
- johnsimpson has joined #dnt
- 15:43:39 [aleecia]
- Shane - we started this with the idea that DNT would limit collection of data. If we actually did that, I'd relax in other areas. But right now we're talking about no reduction in collection at all. My fear is that we build a system that is deceptive :-)
- 15:43:44 [efelten]
- efelten has joined #dnt
- 15:43:56 [yianni]
- ...that is how a lot of the data protections take place today
- 15:44:14 [hwest]
- q?
- 15:44:17 [aleecia]
- When I talk to users, their main concern is not profiling, it's the data collection itself
- 15:44:17 [johnsimpson]
- johnsimpson has joined #dnt
- 15:44:24 [Wileys]
- Aleecia - as long as we're clear with users and the world on exactly what DNT means and how data will be handled then we won't be deceptive
- 15:44:25 [aleecia]
- And we're not going to help them with that
- 15:44:27 [yianni]
- ...all the different variables would feed into how we think about de-identification
- 15:44:30 [Wileys]
- ack wileys
- 15:44:46 [efelten_]
- efelten_ has joined #dnt
- 15:44:50 [johnsimpson]
- johnsimpson has joined #dnt
- 15:44:59 [peterswire]
- peterswire has joined #dnt
- 15:45:00 [robsherman1]
- q?
- 15:45:02 [aleecia]
- ack jmayer
- 15:45:19 [johnsimpson]
- q?
- 15:45:29 [peterswire]
- peterswire has joined #dnt
- 15:45:30 [yianni]
- Jonathan: factors that could contribute to or mitigate risk, but no way to eliminate risk
- 15:45:32 [aleecia]
- Shane - I agree that being clear is necessary. I disagree that it is sufficient
- 15:45:43 [yianni]
- ...we do have ways to put rigorous bounds on risk, developed by computer scientists
- 15:45:57 [AHanff]
- With respect, privacy and data protection are not the same thing. Privacy rights don't exist merely to manage risk; there are rights based around people's desire to lead a private life. So it is irrelevant to say that if data is de-identified it is OK because there is no risk; people have a right (under law in Europe and elsewhere) to refuse to have that data collected in the first place.
- 15:46:00 [yianni]
- ...we can determine just how much the best adversary can accomplish
- 15:46:03 [aleecia]
- If we carefully document that DNT does nothing at all, that's not sufficient :-)
- 15:46:09 [johnsimpson]
- johnsimpson has joined #dnt
- 15:46:32 [Wileys]
- AHanff, you're overstating EU law
- 15:46:36 [johnsimpson]
- johnsimpson has joined #dnt
- 15:47:00 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:47:01 [AHanff]
- actually no I am not, would you like me to quote it verbatim, I worked on it so I know it pretty well...
- 15:47:04 [yianni]
- ...techniques for rigorous bounds: differential privacy, body of writing on developing advertising analytics without following users around
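Differential privacy, named above as one technique for rigorous bounds, can be illustrated with the Laplace mechanism on a count query. A minimal sketch, assuming a query with sensitivity 1 (adding or removing one user changes the count by at most 1); the function names are hypothetical, and a real deployment would also have to track the privacy budget spent across repeated queries.

```python
import random
from typing import Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_count(true_count: int, epsilon: float, sensitivity: float = 1.0,
                rng: Optional[random.Random] = None) -> float:
    """Release a count with epsilon-differential privacy via the Laplace mechanism."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    # Noise scale = sensitivity / epsilon: smaller epsilon means more noise, stronger bound.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    # Illustrative only: a report count of users who saw an ad.
    print(noisy_count(1234, epsilon=0.5, rng=random.Random(0)))
```

The bound holds whatever the adversary already knows, which is the sense in which it limits "how much the best adversary can accomplish".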
- 15:47:12 [Wileys]
- Aleecia, so we agree on being clear, we disagree on the level of data "scrubbing" that comes with a DNT signal. Progress... :-)
- 15:47:20 [Zakim]
- +SusanIsrael
- 15:47:26 [yianni]
- ...lets make marginal gains, some are more rigorously oriented
- 15:47:28 [susanisrael]
- susanisrael has joined #dnt
- 15:47:32 [efelten_]
- efelten_ has joined #dnt
- 15:47:35 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:47:35 [justin]
- There was disagreement that we should be clear before?
- 15:47:37 [aleecia]
- I think you're even agreeing that being clear is not all that's needed
- 15:47:48 [jmayer]
- s/lets make/some propose/
- 15:47:57 [Wileys]
- AHanff, please share EU case law that supports your position - not your subjective interpretation of the written law.
- 15:47:59 [jmayer]
- q+
- 15:48:03 [jmayer]
- q-
- 15:48:09 [Wileys]
- Aleecia - agreed :-)
- 15:48:09 [yianni]
- Khaled: the managing risk slide is operational
- 15:48:27 [Zakim]
- -[IPcaller.a]
- 15:48:31 [johnsimpson]
- johnsimpson has joined #dnt
- 15:48:34 [peterswire_]
- peterswire_ has joined #dnt
- 15:48:41 [aleecia]
- breakfast time, yay
- 15:48:42 [Zakim]
- - +1.215.286.aajj
- 15:48:52 [Zakim]
- -Aleecia
- 15:48:56 [Zakim]
- -vincent
- 15:49:00 [johnsimpson]
- johnsimpson has joined #dnt
- 15:49:03 [peterswire]
- peterswire has joined #dnt
- 15:49:23 [schunter]
- schunter has joined #dnt
- 15:49:46 [dwainberg]
- dwainberg has joined #dnt
- 15:50:00 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:50:23 [robsherman1]
- robsherman1 has joined #dnt
- 15:50:34 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:51:02 [Zakim]
- -rvaneijk
- 15:51:04 [robsherman]
- robsherman has joined #dnt
- 15:51:04 [johnsimpson]
- johnsimpson has joined #dnt
- 15:51:45 [johnsimpson]
- johnsimpson has joined #dnt
- 15:51:50 [peterswire]
- peterswire has joined #dnt
- 15:52:16 [johnsimpson]
- johnsimpson has joined #dnt
- 15:52:16 [susanisrael]
- zakim, aajj is susanisrael
- 15:52:16 [Zakim]
- sorry, susanisrael, I do not recognize a party named 'aajj'
- 15:52:17 [peterswire]
- peterswire has joined #dnt
- 15:52:40 [susanisrael]
- zakim, 215 286 aajj is susanisrael
- 15:52:40 [Zakim]
- I don't understand '215 286 aajj is susanisrael', susanisrael
- 15:53:02 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:53:23 [susanisrael]
- npdoty can you help me advise zakim that my phone number is 215 286 aajj
- 15:53:42 [Zakim]
- - +1.646.722.aagg
- 15:54:02 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:54:25 [johnsimpson__]
- johnsimpson__ has joined #dnt
- 15:54:55 [johnsimpson]
- johnsimpson has joined #dnt
- 15:55:02 [peterswire]
- peterswire has joined #dnt
- 15:55:12 [Zakim]
- +[IPcaller]
- 15:55:23 [johnsimpson]
- johnsimpson has joined #dnt
- 15:55:32 [moneill2]
- zakim, [IPCaller] is me
- 15:55:32 [Zakim]
- +moneill2; got it
- 15:55:52 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:55:56 [susanisrael]
- zakim, [215 286 aajj] is me
- 15:55:56 [Zakim]
- I don't understand '[215 286 aajj] is me', susanisrael
- 15:55:58 [robsherman]
- robsherman has joined #dnt
- 15:56:00 [johnsimpson_]
- test
- 15:56:09 [efelten]
- efelten has joined #dnt
- 15:56:39 [Zakim]
- -SusanIsrael
- 15:56:40 [johnsimpson_]
- Shane, problem was the network we were on. Changed network.
- 15:56:53 [efelten_]
- efelten_ has joined #dnt
- 15:56:54 [Paul]
- Paul has joined #DNT
- 15:56:57 [robsherman1]
- robsherman1 has joined #dnt
- 15:56:58 [johnsimpson_]
- hope this is steadier
- 15:56:58 [susanisrael]
- npdoty: can you help me communicate with zakim about my phone number? i don't seem to have the syntax right.
- 15:57:18 [johnsimpson__]
- johnsimpson__ has joined #dnt
- 15:57:30 [Wileys]
- John - that didn't seem to do the trick
- 15:57:42 [vincent]
- vincent has joined #dnt
- 15:57:58 [Wileys]
- Hard to follow anything on IRC today with so many connect/disconnect events being thrown up.
- 15:58:09 [johnsimpson_]
- johnsimpson_ has joined #dnt
- 15:58:36 [peterswire_]
- peterswire_ has joined #dnt
- 15:58:43 [robsherman]
- robsherman has joined #dnt
- 15:58:49 [Zakim]
- +??P24
- 15:59:09 [peterswire_]
- peterswire_ has joined #dnt
- 15:59:10 [Zakim]
- +rvaneijk
- 15:59:16 [dwainber_]
- dwainber_ has joined #dnt
- 15:59:33 [vincent]
- zakim, ??P24 is vincent
- 15:59:33 [Zakim]
- +vincent; got it
- 15:59:35 [yianni]
- Peter: Mike had comment on last slide
- 15:59:40 [Zakim]
- +SusanIsrael
- 15:59:44 [JoeHallCDT]
- ok, how do I scribe nick me?
- 15:59:56 [yianni]
- Scribe: JoeHallCDT
- 15:59:58 [justin]
- scribenick: joehallcdt
- 16:00:06 [robsherman]
- robsherman has joined #dnt
- 16:00:14 [moneill2]
- cookies are not anonymous, they pinpoint an individual/device
- 16:00:18 [Chris_IAB]
- Chris_IAB has joined #dnt
- 16:00:19 [hwest]
- hwest has joined #dnt
- 16:00:36 [JoeHallCDT]
- scribe: JoeHallCDT
- 16:00:49 [jeffwilson]
- jeffwilson has joined #dnt
- 16:00:52 [robsherman1]
- robsherman1 has joined #dnt
- 16:01:55 [robsherman]
- robsherman has joined #dnt
- 16:02:00 [peterswire]
- peterswire has joined #dnt
- 16:02:38 [JoeHallCDT]
- q?
- 16:02:52 [JoeHallCDT]
- Peter: we're not going to debate how strict a standard is
- 16:02:59 [JoeHallCDT]
- … let's imagine a three-step model
- 16:03:20 [JoeHallCDT]
- … super strict standard for De-ID, a middle ground and no de-ID
- 16:03:27 [justin]
- Speaker was Mike Nolet from AppNexus
- 16:03:34 [mnolet]
- mnolet has joined #dnt
- 16:03:39 [JoeHallCDT]
- thx
- 16:04:07 [dwainber_]
- q+
- 16:04:26 [felixwu]
- felixwu has joined #DNT
- 16:04:32 [JoeHallCDT]
- … there are choices for businesses to give up a de-ID'd approach if the cost is too high
- 16:04:45 [JoeHallCDT]
- Mike Nolet: it's not as much cost as competition
- 16:04:55 [JoeHallCDT]
- … some companies are getting into third party advertising
- 16:05:50 [moneill2]
- identifiers in cookies are PII in Europe
- 16:06:06 [jmayer]
- q+
- 16:06:09 [JoeHallCDT]
- Marc Groman: truly believes that the standard we're discussing will have unintended consequences
- 16:06:20 [JoeHallCDT]
- … some of the things we propose may have a net-negative impact on privacy
- 16:06:24 [susanisrael]
- *Joehallcdt if you want me to scribe let me know
- 16:06:25 [jmayer]
- So, about that de-identification topic...
- 16:06:42 [JoeHallCDT]
- … the notion that opt-in consent is all that's needed to over-collect
- 16:07:07 [JoeHallCDT]
- Peter: we did start with a discussion of incentives for de-ID
- 16:07:18 [JoeHallCDT]
- … one was compliance with NAI, etc, codes
- 16:07:20 [moneill2]
- You have to say what data you gather and what you intend to do with it to get consent
- 16:07:27 [justin]
- The FTC sees cookies and IP addresses as "personal information" as well. All information is personal, but some is more personal than others.
- 16:07:29 [robsherman1]
- robsherman1 has joined #dnt
- 16:07:30 [dwainberg]
- dwainberg has joined #dnt
- 16:08:06 [justin]
- There is a value in incentivizing companies to keep data at pseudonymous instead of real-name identifiers.
- 16:08:08 [JoeHallCDT]
- Paul Glist: if we follow de-ID as a privacy-protective tool, we can't say that a cookie is PII
- 16:08:27 [efelten]
- There is no notion of PII in this standard.
- 16:08:31 [justin]
- But this is somewhat off topic.
- 16:08:40 [JoeHallCDT]
- … you've created an incentive to create PII databases
- 16:09:10 [JoeHallCDT]
- … PII should matter, if you value de-ID as a way to break the link to the individual
- 16:09:19 [JoeHallCDT]
- Chris Mejia: agrees with Jonathan!
- 16:09:36 [jmayer]
- q-
- 16:09:44 [dtauerbach]
- dtauerbach has joined #dnt
- 16:09:47 [JoeHallCDT]
- … we are supposed to do good practices for de-ID and I want to do that.
- 16:09:51 [JoeHallCDT]
- q?
- 16:10:02 [susanisrael]
- *joehallcdt you had marc groman and paul glist speaking before chris iab
- 16:10:17 [JoeHallCDT]
- Peter: has not had that focus, wants to have common language
- 16:11:12 [susanisrael]
- sribenick: susanisrael
- 16:11:32 [susanisrael]
- peter swire: let's start talking about hashing
- 16:11:50 [justin]
- DNT was proposed as a solution to address pseudonymous third party tracking. I don't think we're going to walk away from that idea at this point.
- 16:11:58 [susanisrael]
- khaled: understand that hashing was discussed as a way to protect cookies or other unique identifiers
- 16:12:27 [susanisrael]
- ...if you are hashing without salting, can easily be broken and recover say ss#, so plain hashing not recommended
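Khaled's point about unsalted hashes can be sketched with SHA-256 over an SSN-formatted string. To keep the demo fast it assumes the attacker already knows the first five digits and enumerates only the last four; enumerating every possible SSN is the same loop, just longer. The names `unsalted_hash` and `dictionary_attack` are hypothetical.

```python
import hashlib
from typing import Optional


def unsalted_hash(ssn: str) -> str:
    # Plain SHA-256 with no salt or key: identical inputs always give identical outputs.
    return hashlib.sha256(ssn.encode("utf-8")).hexdigest()


def dictionary_attack(target_hash: str, known_prefix: str) -> Optional[str]:
    """Recover an SSN of the form 'AAA-GG-SSSS' by hashing every candidate serial."""
    for serial in range(10_000):
        candidate = f"{known_prefix}-{serial:04d}"
        if unsalted_hash(candidate) == target_hash:
            return candidate
    return None


if __name__ == "__main__":
    leaked = unsalted_hash("123-45-6789")
    print(dictionary_attack(leaked, "123-45"))  # prints 123-45-6789
```

Because the input space is small and the hash is public, the attacker needs no secret at all.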
- 16:13:01 [Wileys]
- This makes sense for sharing data externally but not for internal storage of data
- 16:13:01 [susanisrael]
- ...if you have [something] that can be added to your value....but challenge for distributed system with salt, you don't want to distribute salt to everyone
- 16:13:16 [susanisrael]
- ....have to come up with protocol where salting happens at central location.
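One way to keep the secret in a single place is a keyed hash (HMAC) computed by a central service, so that distributed clients only ever see pseudonyms, never the key. A minimal sketch using Python's standard `hmac` module; the `CentralPseudonymizer` class is hypothetical. The output is deterministic: the same identifier always maps to the same pseudonym, which keeps records linkable for analysis but also leaves the scheme open to the frequency and oracle attacks discussed in this session.

```python
import hashlib
import hmac
import secrets
from typing import Optional


class CentralPseudonymizer:
    """Hypothetical central service: the only component that holds the key."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # 256-bit random key; in practice it would live in an HSM or key vault.
        self._key = key if key is not None else secrets.token_bytes(32)

    def pseudonymize(self, identifier: str) -> str:
        # HMAC-SHA256: without the key, an outsider cannot compute pseudonyms on their own.
        return hmac.new(self._key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    service = CentralPseudonymizer()
    print(service.pseudonymize("cookie-3f9a") == service.pseudonymize("cookie-3f9a"))  # True
```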
- 16:13:29 [susanisrael]
- [someone] need to know who can hash
- 16:13:36 [susanisrael]
- [who was speaking?]
- 16:13:53 [dtauerbach]
- efelten
- 16:13:54 [efelten]
- s/[someone]/efelten/
- 16:13:58 [susanisrael]
- khaled: one alternative is to use public keys that you can distribute and have encrypted value done say within browser
- 16:14:05 [susanisrael]
- ...instead of hashing you encrypt
- 16:14:19 [Zakim]
- +Aleecia
- 16:14:32 [susanisrael]
- ...other consideration even with salted values is that you can have frequency attacks...certain names more common...can guess.
- 16:15:05 [susanisrael]
- ....so can recover names by looking at frequency. even ss#s. so salting not adequate where there is frequency distribution
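The frequency attack works even when the salt or key stays secret, because a deterministic hash preserves how often each value occurs. A minimal sketch, assuming the attacker holds a column of keyed hashes plus a public ranking of common surnames; the counts below are made up for illustration and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import Counter
from typing import Dict, List


def pseudonymize(key: bytes, value: str) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def frequency_attack(pseudonyms: List[str], public_ranking: List[str]) -> Dict[str, str]:
    """Guess each pseudonym's plaintext by matching frequency ranks."""
    ranked = [p for p, _ in Counter(pseudonyms).most_common()]
    return dict(zip(ranked, public_ranking))


if __name__ == "__main__":
    key = secrets.token_bytes(32)  # the attacker never sees this
    surnames = ["Smith"] * 50 + ["Johnson"] * 30 + ["Williams"] * 20
    column = [pseudonymize(key, s) for s in surnames]
    guesses = frequency_attack(column, ["Smith", "Johnson", "Williams"])
    print(guesses[pseudonymize(key, "Smith")])  # prints Smith
```

Randomized (probabilistic) encryption defeats this because equal inputs no longer produce equal outputs.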
- 16:15:32 [susanisrael]
- .....with encryption [?] would do it differently each time, frequency not an issue
- 16:15:59 [peterswire]
- peterswire has joined #dnt
- 16:16:05 [susanisrael]
- .....to the extent it's a problem certain fields may be too long to process or transmit [with encryption?].....
- 16:16:37 [susanisrael]
- ...so for example you can get encrypted ss# with same character set as actual ss# so you avoid long strings. sometimes practical advantage
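Format-preserving encryption keeps a nine-digit SSN a nine-digit string. The sketch below is a toy Feistel construction over decimal digits with HMAC-SHA256 as the round function. It is modelled loosely on the structure of NIST's FF1 but is *not* FF1 and is not vetted for production; `fpe_encrypt` and `fpe_decrypt` are hypothetical names. Like any deterministic scheme, it remains exposed to the frequency attacks Khaled described unless randomization (for example a per-record tweak) is added.

```python
import hashlib
import hmac


def _round_value(key: bytes, rnd: int, half: int, width: int, modulus: int) -> int:
    # Pseudorandom round function: HMAC over the round number and the zero-padded half.
    msg = f"{rnd}|{half:0{width}d}".encode("ascii")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % modulus


def _check(digits: str, rounds: int) -> None:
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        raise ValueError("need a string of at least two ASCII decimal digits")
    if rounds % 2:
        raise ValueError("use an even number of rounds")


def fpe_encrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    _check(digits, rounds)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = int(digits[:u]), int(digits[u:])
    for rnd in range(rounds):
        # Halves alternate in length (u, v), so widths and moduli alternate too.
        m, width_b = (u, v) if rnd % 2 == 0 else (v, u)
        c = (a + _round_value(key, rnd, b, width_b, 10 ** m)) % 10 ** m
        a, b = b, c
    return f"{a:0{u}d}{b:0{v}d}"


def fpe_decrypt(key: bytes, digits: str, rounds: int = 10) -> str:
    _check(digits, rounds)
    u = len(digits) // 2
    v = len(digits) - u
    a, b = int(digits[:u]), int(digits[u:])
    for rnd in reversed(range(rounds)):
        m, width_b = (u, v) if rnd % 2 == 0 else (v, u)
        prev_b = a
        prev_a = (b - _round_value(key, rnd, prev_b, width_b, 10 ** m)) % 10 ** m
        a, b = prev_a, prev_b
    return f"{a:0{u}d}{b:0{v}d}"
```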
- 16:17:17 [susanisrael]
- peter swire: have some observations: lots of hashing in commercial ecosystem. heard yesterday at HHS that unsalted ss# not ok bc easy to do dictionary attack
- 16:17:25 [Wileys]
- Good resource on the technical and security details in this area: http://crackstation.net/hashing-security.htm
- 16:17:33 [susanisrael]
- .....turning to ed, you have expressed cautions re: hashing.
- 16:17:50 [susanisrael]
- ed felten: different scenarios in which hashing fails. doesn't do much without salt.
- 16:18:19 [susanisrael]
- ...even with salted hash someone who knows the salt can generally break it or someone who can cause salted function to be evaluated on their behalf.
- 16:18:41 [susanisrael]
- ....gives example where you ask one server to compute hash on another. [simplified]
- 16:18:47 [rvaneijk]
- A hash turns user data into a pseudonymous identifier
- 16:19:10 [susanisrael]
- ...if multiple records contain same salted hash value they can be linked. need to use probabilistic encryption or something like that
- 16:19:18 [susanisrael]
- chris iab: there is hashing then access to salt
- 16:19:48 [Wileys]
- We should discuss keyed hashes as being superior to salted hashes (although in the same universe)
- 16:19:52 [susanisrael]
- ed felten: not just access to salt. if you can have a value hashed for you, then you can do the same dictionary attacks as if you knew the salt, so it's not enough to ask if you know the salt
- 16:20:17 [susanisrael]
- ed felten: can make sophisticated argument .....rare case where hashing is secure
- 16:20:36 [susanisrael]
- peter swire: assume people will use hashing and will be long enough not to be broken
- 16:20:43 [susanisrael]
- chris iab: how reliable?
- 16:20:44 [Wileys]
- One-way hashes don't allow direct reverse identification by themselves - access to the salt/key allows someone to perform a dictionary attack
- 16:21:03 [susanisrael]
- ed felten: if you can have hash computed for you just the same as if you can break it
- 16:21:06 [Wileys]
- Requires access to the original raw data (if it still exists) and the salt/key
- 16:21:08 [susanisrael]
- what are we hashing?
- 16:21:30 [rvaneijk]
- In the EU organizational measures are not enough to make hashed values of user data anonymous.
- 16:21:38 [susanisrael]
- someone [who is speaking?]: will use admin controls with hashing
- 16:21:59 [susanisrael]
- ed: if you can make up inputs and ask people to hash them that is just as good as if you had the salt
- 16:22:11 [susanisrael]
- someone: but that is from knowing input and output
- 16:22:34 [Wileys]
- Rob, if paired with administrative, technical, and policy/educational, then keyed hashing is considered enough to reach the point of "likely reasonable" to no longer be personal data (de-identified), correct?
- 16:22:47 [susanisrael]
- ed felten: what if you take value with identifier and cookie, ask someone to make salted hash, don't tell you the salt, but put it back in your data base
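Ed's oracle point, sketched: if an attacker can get the salted or keyed hash evaluated on inputs of their choosing, they can rebuild the mapping without ever seeing the key. A minimal sketch; `make_hashing_service` and `build_lookup_table` are hypothetical, and the candidate list stands in for a dictionary of known email addresses or cookie values.

```python
import hashlib
import hmac
import secrets
from typing import Callable, Dict, Iterable


def make_hashing_service() -> Callable[[str], str]:
    """Hypothetical service that hashes on request; the key stays inside the closure."""
    key = secrets.token_bytes(32)
    return lambda value: hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def build_lookup_table(oracle: Callable[[str], str], guesses: Iterable[str]) -> Dict[str, str]:
    # The attacker never learns the key; they only submit guesses and record outputs.
    return {oracle(g): g for g in guesses}


if __name__ == "__main__":
    service = make_hashing_service()
    stored = service("alice@example.com")  # value sitting in the "de-identified" database
    table = build_lookup_table(service, ["bob@example.com", "alice@example.com"])
    print(table.get(stored))  # prints alice@example.com
```

Restricting who may submit inputs to the hashing function matters as much as protecting the key itself.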
- 16:22:51 [Wileys]
- Rob, add "safeguards" after "policy/educational"
- 16:23:12 [susanisrael]
- someone: but that assumes you know input and output
- 16:23:25 [rvaneijk]
- shane: if you throw away the key, then yes. TomTom was a nice example.
- 16:23:38 [susanisrael]
- peter swire: i have observed lots of hashing in ad world. for most sophisticated attackers they may be able to break them
- 16:24:03 [susanisrael]
- ...we will eventually have to come to view of how we will discuss all this. so common hashes might be of email address? cookie value?
- 16:24:41 [Wileys]
- Rob, if you keep the key in a safeguarded location, limited access, technical controls, etc. - I believe you still reach the bar per the A29WP Option from April 2011.
- 16:24:45 [susanisrael]
- peter swire: let's take email addresses. if my email is hashed using proper salt, and someone gets output, they can eventually figure out hash and salt
- 16:24:58 [Wileys]
- Rob, or was that 2010 - I'll look it up.
- 16:25:12 [susanisrael]
- ed felten: can ask that hash be done on known value, and record hashed value in database then can correlate
- 16:25:25 [rvaneijk]
- Well, that safeguard is a very high bar, ie a notary, who has a legal obligation to not disclose
- 16:25:29 [susanisrael]
- [someone]: question is from whom you are trying to secure the data
- 16:25:38 [Wileys]
- Rob, I agree throwing away the key is an absolute end-point, but I'm aiming for the 'likely reasonable' standard
- 16:25:44 [susanisrael]
- is it protection at all wrt a particular party that has particular data
- 16:26:35 [susanisrael]
- david w: not hashing for hashing's sake. need to figure out from whom you are trying to protect the data, and tailor approach to that
- 16:26:38 [rvaneijk]
- Shane, the point is, that if I should not be able to calculate a hash after let's say a year, and expect the same output, such that users can be re-identified.
- 16:26:53 [rvaneijk]
- s/if/_/
- 16:27:09 [susanisrael]
- khaled: even if we go back to previous model using hash or salted hash, probability of recovering original value is 1, certain
- 16:27:30 [Wileys]
- Rob, why? As long as the original key is secure, then there is very low risk of user re-identification
- 16:27:31 [robsherman]
- robsherman has joined #dnt
- 16:27:34 [aleecia]
- Rob, is that an art 29 position, or your own? (Both are valuable, I'm just trying to get which is what)
- 16:27:36 [susanisrael]
- chris iab: assuming you have access to data in first place, right?
- 16:27:55 [susanisrael]
- khaled: so final result at end of all risk assessment is still high, still has to be further mitigated
- 16:28:05 [Wileys]
- Aleecia, the A29WP position in the opinion paper is not as strict as Rob is stating (in my opinion)
- 16:28:06 [vincent]
- Wileys, in the DNT case, are we just considering hashing cookie IDs? if so, I'm not sure it brings any real protection: cookie IDs are opaque anyway
- 16:28:09 [susanisrael]
- peter swire: let's see why people might feel strongly
- 16:28:41 [susanisrael]
- ...if db is publicly accessible and people can get access then probability of breaking is higher, but david and chris are saying you can limit access
- 16:29:09 [Wileys]
- Vincent, keyed hashing coupled with other measures, as well as the cessation of certain business activities (profiling), does meet the goals of DNT in my opinion.
- 16:29:15 [susanisrael]
- .[someone]..but ed is saying if you have access to hash and salt -if disconnected doesn't work
- 16:29:28 [yianni]
- Jeff Wilson
- 16:29:46 [peterswire_]
- peterswire_ has joined #dnt
- 16:29:56 [susanisrael]
- david w: i think what we are talking about is that using some form of oneway hash was a useful method of de-identifying
- 16:30:21 [susanisrael]
- khaled: depends. must be done in such a way that you can protect against attacks ed is describing which are quite trivial
- 16:30:26 [vincent]
- Wileys, well that's not my question :). What type of protection does it bring with regard to the risk of re-identification?
- 16:30:37 [susanisrael]
- david and khaled back and forth a bit
- 16:30:55 [yianni]
- q?
- 16:30:56 [rvaneijk]
- Shane, let's have this discussion in Boston
- 16:31:06 [susanisrael]
- khaled: probability that someone attempts to attack, then that they can break hash
- 16:31:13 [robsherman1]
- robsherman1 has joined #dnt
- 16:31:25 [Wileys]
- Vincent, as long as the original data is not accessible and neither is the key to the hash, then there is very low risk of re-identification (depending on the details housed within the de-identified dataset)
- 16:31:29 [rvaneijk]
- Aleecia: formal position within this DNT debate
- 16:31:32 [susanisrael]
- ...if low probability of attempt ....hard to make that case
- 16:31:32 [dwainber_]
- dwainber_ has joined #dnt
- 16:31:50 [susanisrael]
- [someone] isn't probability of reidentification only 1 if you have access to the computer?
- 16:31:51 [Wileys]
- Rob - agreed - looking forward to it (the conversation that is, not the horrible weather we're likely to encounter in Boston :-) )
- 16:32:05 [rvaneijk]
- :)
- 16:32:08 [susanisrael]
- khaled: depends on workflow. may be hashed then go to central db
- 16:32:14 [yianni]
- s/someone/Mike Nolet
- 16:32:32 [aleecia]
- We need to recruit a new WG member with a big office in the Florida Keys
- 16:32:44 [Wileys]
- +1 to Aleecia!
- 16:32:47 [peterswire_]
- peterswire_ has joined #dnt
- 16:32:53 [aleecia]
- Rob - thanks, that's exactly what I was asking, thank you
- 16:33:14 [susanisrael]
- mike nolet : i have unique cookie id on ed. need to get totally random integer, if someone is snooping on all net traffic or has access to pc or net connection
- 16:33:17 [peterswire]
- peterswire has joined #dnt
- 16:33:34 [vincent]
- Wileys, how is the re-identification risk lower with the hashed cookie ID rather than with the unhashed cookie ID? (that's actually what's discussed right now)
- 16:33:43 [susanisrael]
- peter swire: is there a scenario where hashing matters? mike was saying you have to have access to cookie
- 16:33:47 [Chris_IAB]
- Chris_IAB has joined #dnt
- 16:34:12 [susanisrael]
- chris iab: does it matter if transferring to another party or internally?
- 16:34:18 [susanisrael]
- peter swire: we are learning something
- 16:34:20 [Chris_IAB]
- this was the equation put on the board: pr (re-identification) = pr (re-id/attempt) x pr (attempt)
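The equation on the board, written out: Pr(re-id) = Pr(re-id | attempt) × Pr(attempt). A minimal sketch of that arithmetic; the function name is hypothetical and the probabilities in the usage line are illustrative, not figures from the session. It shows Khaled's point: even when a successful attack is certain once attempted (Pr(re-id | attempt) = 1), the overall risk depends on how likely an attempt is, which is what administrative controls try to lower.

```python
def reidentification_risk(p_success_given_attempt: float, p_attempt: float) -> float:
    """Pr(re-id) = Pr(re-id | attempt) * Pr(attempt)."""
    for p in (p_success_given_attempt, p_attempt):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return p_success_given_attempt * p_attempt


if __name__ == "__main__":
    # Attack always succeeds if tried, but controls make an attempt unlikely.
    print(reidentification_risk(1.0, 0.05))  # prints 0.05
```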
- 16:35:00 [susanisrael]
- jeff? there is industry practice where you hash, independent party enriches by matching, and there is permission to share 7 matches
- 16:35:08 [rvaneijk]
- Cookie exchanges are interesting in this context..
- 16:35:08 [Wileys]
- Vincent, it's lower only if coupled with other factors (multi-factor test) such as seclusion of the key/salt and removal of access/existence from the original dataset.
- 16:35:12 [susanisrael]
- ....common identifier can be hashed
- 16:35:21 [Wileys]
- +q
- 16:35:25 [susanisrael]
- peter: so that is one scenario, do you see usefulness ed?
- 16:35:42 [dwainber_]
- q?
- 16:35:47 [aleecia]
- ack Wileys
- 16:35:51 [dwainber_]
- ack dwainber_
- 16:35:53 [robsherman]
- robsherman has joined #dnt
- 16:36:03 [susanisrael]
- shane: the core purpose at yahoo for hashing/keys, is to disconnect that data from use in actual production systems
- 16:36:11 [justin]
- "destroy"?
- 16:36:36 [peterswire_]
- peterswire_ has joined #dnt
- 16:36:37 [susanisrael]
- ...destroys possibility for profiling, targeting. can not be used to modify users experience. but still useful for analysis..
- 16:36:47 [susanisrael]
- peter swire: ed or dan does that make sense to you?
- 16:36:52 [rvaneijk]
- WileyS, right. the goal is to break the re-identification
- 16:36:57 [susanisrael]
- dan: i am confused by that
- 16:37:06 [aleecia]
- sigh
- 16:37:32 [susanisrael]
- shane: these are always multifactor tests. your purpose in hashing is to not do this. once you add multifactors, it serves purpose
- 16:37:46 [susanisrael]
- [someone] if you can get hash function or key it doesn't matter
- 16:37:50 [robsherman1]
- robsherman1 has joined #dnt
- 16:37:56 [susanisrael]
- shane: good luck. we make key very inaccessible
- 16:38:07 [yianni]
- s/someone/Joe Hall
- 16:38:07 [susanisrael]
- ed felten: who knows keys?
- 16:38:29 [vincent]
- vincent has joined #dnt
- 16:38:34 [susanisrael]
- shane: keys are very large. systems that are set up to de-identify know key, but human connection to key is not allowed
- 16:38:58 [susanisrael]
- felix: so if i understand correctly usefulness is to separate one part of company to another?
- 16:39:00 [Chris_IAB]
- dwainberg, in case you missed it, "the key is on a post-it on Shane's desk" (that's a JOKE, btw.. lol)
- 16:39:14 [susanisrael]
- shane: really to separate info from another context
- 16:39:25 [aleecia]
- Chris - love it!
- 16:39:29 [susanisrael]
- felix: 2 people (one w key) are separate
- 16:39:44 [susanisrael]
- shane: isolation of key is not only factor.
- 16:39:59 [peterswire]
- peterswire has joined #dnt
- 16:40:05 [Wileys]
- Chris, LOL
- 16:40:18 [johnsimpson]
- q?
- 16:40:20 [susanisrael]
- peter swire: i think it's relevant bc hashing and its uses have been talked about in a lot of contexts. people in ad industry at one end of table, others at other
- 16:40:31 [dwainberg]
- dwainberg has joined #dnt
- 16:40:49 [susanisrael]
- khaled: if that separation is strong and defensible, then at least under HIPAA that would be ok. if you have good procedures for controlling access to key that's ok
- 16:40:51 [Wileys]
- Yay for Yahoo!, we're good by HIPAA standards (too bad we don't handle PHI :-) )
- 16:41:04 [susanisrael]
- ....scenarios where regulators have accepted that
- 16:41:16 [susanisrael]
- dan auerbach: rotating salt helps a lot
- 16:41:21 [Chris_IAB]
- rotating salt is a good practice
- 16:41:40 [aleecia]
- rotating salts kills everything shane wants out of the data
- 16:41:59 [Wileys]
- Aleecia - we do rotate, but not daily.
- 16:42:03 [susanisrael]
- david wainberg: we are saying it's not binary, hashing is not perfect, question is how hard does it make it? how hard do we want to make it? what is the context/data involved?
- 16:42:09 [justin]
- Rotating salts kills longitudinal view, which is a feature or bug depending on how you look at it.
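Rotating the salt or key bounds how long records about one cookie can be linked: within a period, the same cookie ID maps to the same pseudonym; after rotation it maps to a new one, so records from different periods can no longer be joined on that field. A minimal sketch with a hypothetical `RotatingPseudonymizer`. The rotation interval is a policy choice left to the caller, and rotation only helps if old keys are actually discarded rather than archived.

```python
import hashlib
import hmac
import secrets


class RotatingPseudonymizer:
    """Hypothetical keyed-hash pseudonymizer whose key is replaced on a schedule."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def rotate(self) -> None:
        # The old key is discarded, not archived; keeping it would defeat rotation.
        self._key = secrets.token_bytes(32)

    def pseudonymize(self, cookie_id: str) -> str:
        return hmac.new(self._key, cookie_id.encode("utf-8"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    p = RotatingPseudonymizer()
    before = p.pseudonymize("cookie-3f9a")
    p.rotate()
    print(before == p.pseudonymize("cookie-3f9a"))  # prints False
```

This is the trade-off in the exchange above: the loss of a longitudinal view is exactly what rotation buys.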
- 16:42:14 [Chris_IAB]
- aleecia, it means Yahoo buys LOTS of post-its (again, marked as a JOKE folks :)
- 16:42:16 [susanisrael]
- someone: sounds like it's trivial to break it
- 16:42:22 [aleecia]
- I go with feature, Shane goes with bug :-)
- 16:42:26 [susanisrael]
- david wainberg: what do you mean by trivial
- 16:42:30 [yianni]
- s/someone/Joe
- 16:42:39 [rvaneijk]
- what really hard means also depends on the purpose, not only on the context
- 16:42:42 [Wileys]
- Aleecia, :-)
- 16:42:45 [susanisrael]
- david w: depends on combination of technical and administrative
- 16:42:51 [aleecia]
- buy stock in 3M, folks! you heard it here first.
- 16:43:14 [peterswire_]
- peterswire_ has joined #dnt
- 16:43:16 [susanisrael]
- someone: shane is describing intentional inadvertent viewing of data
- 16:43:38 [yianni]
- s/someone/mike nolet
- 16:43:46 [susanisrael]
- shane: purpose is more than just personal protection--disconnect data from operational systems so utility limited and therefore privacy is increased
- 16:44:28 [susanisrael]
- jeff: everyone agrees with ed or should. if you have access to salt, it doesn't work. but if we say salting/hashing does not work, then we are saying passwords on internet don't work
- 16:44:46 [susanisrael]
- ....if you have access to hash and salt you could access hashed stored passwords
- 16:44:55 [aleecia]
- daily rotated salts is at least a step forward. but having it change only when the janitor tosses out the post its by mistake once a year isn't going to make me happy :-)
- 16:44:57 [jmayer]
- q+
- 16:45:10 [susanisrael]
- chris iab: what would the alternative be? put all raw data out on internet? or not collect any data?
- 16:45:16 [vincent]
- WIleys, would not a request like "SELECT User from DB where user visited site1,site2,...,siteN" recreate the link that the hash just deleted?
- 16:45:24 [Wileys]
- Aleecia - its a bit more formal/regular than that. Note - I don't use post-its :-)
- 16:45:26 [susanisrael]
- ed felten: i have not heard an example here where hashing really helps
- 16:46:00 [susanisrael]
- peter swire: i spent 2 years working on crypto policy. if system broken it doesn't work, but in practice it works 99 percent of the time
- 16:46:14 [Wileys]
- Vincent, the hash was not meant to hide activity but rather to disconnect identity from operational systems.
- 16:46:22 [susanisrael]
- ...i have heard that there are attacks that could be made, but i have heard about administrative controls
- 16:46:23 [peterswire]
- peterswire has joined #dnt
- 16:46:46 [rvaneijk]
- Passwords are used to verify an identity, based on a shared secret, which is a totally different mechanism
- 16:46:48 [peterswire]
- peterswire has joined #dnt
- 16:46:51 [susanisrael]
- ....all those seem like things in real world where protection is more than zero though might still be subject to some kinds of attacks
- 16:47:04 [susanisrael]
- ed felten: no because these attacks are trivial
- 16:47:17 [jmayer]
- q+
- 16:47:30 [vincent]
- Wileys, yes but the history of websites visited by a user would help to reconnect the different operational system (the list of website is used as a unique identifier)
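Vincent's concern, sketched: if a dataset keyed by hashed IDs keeps the set of sites each user visited, that set can act as a quasi-identifier and be joined back to a dataset keyed by raw cookie IDs. A minimal sketch assuming exact set matches (real attacks typically use fuzzier similarity); `relink` and the example domains are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional


def relink(hashed_rows: Dict[str, FrozenSet[str]],
           raw_rows: Dict[str, FrozenSet[str]]) -> Dict[str, Optional[str]]:
    """Map each hashed ID to the raw cookie ID with exactly the same set of visited sites."""
    by_history: Dict[FrozenSet[str], List[str]] = {}
    for cookie_id, sites in raw_rows.items():
        by_history.setdefault(sites, []).append(cookie_id)
    result: Dict[str, Optional[str]] = {}
    for hashed_id, sites in hashed_rows.items():
        matches = by_history.get(sites, [])
        # A unique history acts as a fingerprint; ambiguous or missing matches stay unlinked.
        result[hashed_id] = matches[0] if len(matches) == 1 else None
    return result


if __name__ == "__main__":
    raw = {
        "cookie-A": frozenset({"news.example", "shoes.example", "clinic.example"}),
        "cookie-B": frozenset({"news.example"}),
        "cookie-C": frozenset({"news.example"}),
    }
    hashed = {
        "h1": frozenset({"news.example", "shoes.example", "clinic.example"}),
        "h2": frozenset({"news.example"}),
    }
    print(relink(hashed, raw))  # prints {'h1': 'cookie-A', 'h2': None}
```

Histories shared by several users do not re-identify anyone, while distinctive ones do. This is why the URL cleansing Shane mentions next matters alongside the hash.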
- 16:47:34 [susanisrael]
- si question: do these attacks in fact happen in companies all the time in the real world?
- 16:47:39 [peterswire]
- jonathan -- I see you;
- 16:47:42 [aleecia]
- Shane - 3M weeps
- 16:48:24 [Wileys]
- Vincent, agreed - so some URL cleansing helps remove this issue - or in the case of searches, attempts to cleanse personal data in queries helps.
- 16:48:24 [susanisrael]
- ed felten: if we say we will separate our data base into 2 pieces and only one is hashed, whatever analysis someone wants to do they just need to do one more step
- 16:48:28 [yianni]
- ack jmayer
- 16:48:32 [susanisrael]
- chris iab: but they would have to have access right?
- 16:48:37 [dtauerbach]
- q?
- 16:49:02 [Wileys]
- Vincent, my approach can't guarantee 100% certainty but does meet the "very low risk" bar - or in the EU context, the "likely reasonable" bar.
- 16:49:13 [susanisrael]
- jmayer: concrete example: ad company i studied tried to use hashing to do follow on analysis. user had id cookie. then had another cookie. "anonymous"
- 16:49:41 [peterswire_]
- peterswire_ has joined #dnt
- 16:50:01 [justin]
- If the spec allows for a 30-day short-term retention period, presumably the group would be OK if the salts were rotated at least every 30 days.
- 16:50:01 [susanisrael]
- ...idea was that anonymous one was hash with secret salt and would be used for long term things and more private but susceptible to same attacks because you could always correlate with original cookie
- 16:50:06 [peterswire]
- peterswire has joined #dnt
- 16:50:49 [susanisrael]
- peter swire: jmayer you were giving example, and jeff and crhis had questions or comments
- 16:51:02 [susanisrael]
- chris iab: you described a bad practice
- 16:51:30 [David]
- David has joined #dnt
- 16:51:42 [susanisrael]
- ...you don't throw out baby with bath water. Just bc there is one bad practice doesn't mean all hashing worthless
- 16:51:51 [vincent]
- Wileys, I don't [know] the "very low risk" bar well enough :) just trying to see what type of threat cookie hashing addresses
- 16:52:00 [efelten]
- We have yet to hear an example where hashing makes any attack appreciably more difficult.
- 16:52:27 [David_MacMillan_]
- David_MacMillan_ has joined #dnt
- 16:52:33 [Wileys]
- Justin, the spec should not be prescriptive on timeframes and rather, much like HIPAA, should focus on acceptable risk thresholds.
- 16:52:36 [susanisrael]
- jmayer: agree there are better engineering practices, but these are pretty predictable failures; have heard things like figuring out the salt or doing dictionary attacks,
- 16:53:00 [susanisrael]
- ...but these are not the only attacks. there are enormous re-identifiability problems.
- 16:53:03 [Wileys]
- Vincent, you don't "?" the "very low risk" bar well enough?
- 16:53:04 [peterswire]
- peterswire has joined #dnt
- 16:53:28 [rvaneijk]
- Ed, hashing makes sense if you take out information such that enough collisions appear to meet a k-anonymity bar.
- 16:53:30 [peterswire]
- peterswire has joined #dnt
- 16:53:35 [aleecia]
- Justin, I think you're saying: if we're going to have 30 (or more) days for people to take first-logged data to figure out what they have and if they're first or third party while collecting, then we should also be ok with a company holding all data indefinitely, so long as they rotate every 30 days.
- 16:53:39 [rvaneijk]
- s/meat/meet/
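Rob's suggestion of removing information until hashes collide can be sketched by truncating a keyed hash to a small number of bits, so that many users share each value, and then checking the smallest bucket against a k-anonymity threshold. Truncation, HMAC-SHA256, the key, and the helper names are assumptions for illustration, not a method proposed in the discussion.

```python
import hashlib
import hmac
from collections import Counter

def truncated_bucket(user_id: str, key: bytes, bits: int) -> int:
    """Keyed hash of an ID, keeping only the top `bits` bits so many IDs collide."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") >> (256 - bits)

def smallest_bucket(user_ids, key: bytes, bits: int) -> int:
    """Size of the smallest non-empty bucket, i.e. the k achieved at this truncation."""
    counts = Counter(truncated_bucket(u, key, bits) for u in user_ids)
    return min(counts.values())

def max_bits_for_k(user_ids, key: bytes, k: int, max_bits: int = 32) -> int:
    """Longest truncation for which every non-empty bucket still holds at least k IDs."""
    best = 0  # 0 bits: every ID lands in the same single bucket
    for bits in range(1, max_bits + 1):
        if smallest_bucket(user_ids, key, bits) < k:
            break
        best = bits
    return best

users = [f"user-{i}" for i in range(10_000)]  # hypothetical IDs
key = b"example-secret-key"                   # assumption: secret held by the company
bits = max_bits_for_k(users, key, k=50)
```

Because each longer prefix splits the buckets of a shorter one, the smallest bucket can only shrink as bits are added, which is why the search stops at the first failing length. This only covers the identifier itself; other fields stored next to the truncated value can still single a user out.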
- 16:53:56 [dtauerbach]
- I think the point is that in all the examples so far, hashing is purely a method of operational control, and it is not a great one given engineering challenges
- 16:53:57 [vincent]
- Wileys, I don't "know" it well enough, sorry
- 16:53:58 [susanisrael]
- ....i think we have an error in the way some people are approaching this. you have a fact pattern, try to apply an approach. start with a specific problem and a way to solve it, and ask if hashing gets you there...
- 16:54:07 [dtauerbach]
- e.g. you can't have an oracle and that is hard to control in practice
- 16:54:11 [Zakim]
- - +1.631.803.aacc
- 16:54:33 [susanisrael]
- ....ed is not asking for a straight up/down vote on the metaphysics of hashing...and i have not heard a concrete problem and proposed hashing solution that solves the problem
- 16:54:43 [justin]
- aleecia, well, we've had different interpretations of the point of the short-term period over time, but basically yes.
- 16:54:50 [Wileys]
- Ed, if a dataset were breached in isolation (a single data table), wouldn't you agree that hashing of identifiers in that table (depending on what additional feeds were available) would help deter re-identification?
- 16:54:55 [susanisrael]
- peter swire: can industry explain use case where hashing helps?
- 16:55:33 [susanisrael]
- david wainberg: can we identify the risk that ed and jonathan are concerned about and see if that can be addressed
- 16:55:37 [aleecia]
- Justin - ok. So I'm ok with a single short period, but may not be ok with infinite retention even with rotation
- 16:56:07 [Zakim]
- +DAvid
- 16:56:10 [jmayer]
- q+
- 16:56:11 [susanisrael]
- felix? : sounds like we are concerned about internal controls. valuable if you have company where not everyone or no one is careless or malicious
- 16:56:26 [efelten]
- What I'm looking for is a specific example--a specific use of hashing, and a specific attack that is made more difficult because of the use of hashing.
- 16:56:32 [susanisrael]
- jeff: 3 scenarios where hashing helps. 1: passwords
- 16:56:36 [peterswire]
- peterswire has joined #dnt
- 16:56:48 [susanisrael]
- 2. if you want to do research internally in large company.....
- 16:56:50 [dtauerbach]
- Shane, it depends on the details of the hashing. For example, an unsalted hash of social security numbers in that isolated table does not help at all
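Dan's point about unsalted hashes can be made concrete with a scaled-down dictionary attack, contrasted with a keyed hash. The sketch assumes a hypothetical 4-digit identifier space in place of Social Security numbers (which have at most about one billion values, still small enough to enumerate offline) and uses HMAC-SHA256 as the keyed hash; the keys shown are illustrative.

```python
import hashlib
import hmac

# Scaled-down stand-in for an identifier space: 4-digit numbers (hypothetical).
# Real SSNs have at most 10**9 values, which an attacker can still enumerate offline.
ID_SPACE = [f"{i:04d}" for i in range(10_000)]

def unsalted_hash(identifier: str) -> str:
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()

def keyed_hash(identifier: str, secret_key: bytes) -> str:
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

def dictionary_attack(target_hash: str, hash_fn):
    """Hash every possible identifier; return the one that matches, or None."""
    for candidate in ID_SPACE:
        if hash_fn(candidate) == target_hash:
            return candidate
    return None

# A breached table holding only the unsalted hash of "1234" is trivially reversed.
leaked_unsalted = unsalted_hash("1234")
recovered = dictionary_attack(leaked_unsalted, unsalted_hash)

# With a keyed hash, an attacker without the key cannot compute matching candidates.
leaked_keyed = keyed_hash("1234", b"company-secret")
wrong_key_result = dictionary_attack(leaked_keyed, lambda c: keyed_hash(c, b"attacker-guess"))

# Someone who does hold the key (e.g. an insider) can run the same enumeration.
insider_result = dictionary_attack(leaked_keyed, lambda c: keyed_hash(c, b"company-secret"))
```

The keyed version only helps while the key stays secret; anyone who can compute keyed_hash with the real key can reverse it just as easily, which is the internal-access concern raised earlier in the discussion.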
- 16:57:02 [Chris_IAB]
- new (related) subject: are toilet seat covers effective? (again, humor is my defense mechanism :)
- 16:57:09 [justin]
- aleecia, Fair enough, to the extent there is an inherent risk that a delinked 30-day set of urls is inherently identifiable and/or tiable to other 30-day sets.
- 16:57:11 [Zakim]
- -[GVoice]
- 16:57:21 [Wileys]
- dtauerbach, agreed - I'm speaking only of salted or keyed hashes.
- 16:57:43 [Zakim]
- -Jonathan_Mayer
- 16:57:45 [susanisrael]
- peter swire: so if there is some risk of internal misuse, but you hash passwords or separate the research database from where it came from, you reduce risk even...
- 16:58:04 [susanisrael]
- ...if it doesn't protect against sophisticated attacks, it reduces risk from normal people.
- 16:58:04 [Zakim]
- +Jonathan_Mayer
- 16:58:12 [aleecia]
- Justin - exactly
- 16:58:20 [vincent]
- vincent has joined #dnt
- 16:58:28 [susanisrael]
- felix: i think we are seeing risk reduction in normal ways. seeing question from ed re: scenarios
- 16:58:58 [aleecia]
- I would guess that at 24 hours I'd be ok. But I'd need to know more. And I think the right way to get at this is not a timeframe, but rather the ability to chain across datasets
- 16:59:07 [susanisrael]
- in some sense, from a tech perspective, it does not help much, but if the data just requires an extra step, that may be enough to deter or detect an attack from the point of view of internal controls
- 16:59:26 [susanisrael]
- mike nolet: re: david's question. what is the risk you are talking about reducing?
- 16:59:39 [susanisrael]
- someone: risk that info on research side is then used to target
- 16:59:52 [susanisrael]
- felix? if dnt is 1?
- 16:59:55 [Zakim]
- - +1.202.257.aaff
- 16:59:56 [jmayer]
- -q
- 16:59:59 [susanisrael]
- yes:
- 17:00:08 [aleecia]
- q?
- 17:00:18 [peterswire_]
- peterswire_ has joined #dnt
- 17:00:43 [susanisrael]
- ed felten: cs views attacks at 3 levels. started discussion bc broad claims were made that hashed data should be treated as per se de-identified.
- 17:00:48 [Wileys]
- Ed, It was never stated in isolation but as one factor of multiple steps to achieve unlinkability.
- 17:00:54 [Wileys]
- Ed, at least not by me
- 17:01:28 [susanisrael]
- ...we don't have to talk about hashing or micromanage how people protect, but i don't think we should talk about hashing as total protection
- 17:02:09 [susanisrael]
- paul glist: broad claims on both sides. have looked at this as dial. can reduce risk to socially acceptable levels. hashing is not nothing...
- 17:02:20 [Chris_IAB]
- +1 to current speaker's point
- 17:02:29 [susanisrael]
- ...and not everything. it's a tool. add other tools. it's useful.
- 17:02:53 [jmayer]
- There are protections that are effective even if an attacker controls the terminal. That's part of the point.
- 17:03:08 [susanisrael]
- johnsimpson: still having trouble figuring out how this relates to DNT. have been talking about protecting data sets with pii.
- 17:03:14 [dwainber_]
- dwainber_ has joined #dnt
- 17:03:16 [peterswire]
- peterswire has joined #dnt
- 17:03:29 [dtauerbach]
- jmayer, for example: hard disk encryption
- 17:03:34 [susanisrael]
- chris iab: you may want to have access to URIs, for example, but don't need them connected to unique users
- 17:03:34 [justin]
- Right, the deidentification method has to take into account the internal misuse angle.
- 17:03:50 [peterswire]
- peterswire has joined #dnt
- 17:03:55 [susanisrael]
- john simpson: but that's the disconnect bc most people saying that dnt is do not collect
- 17:03:58 [jmayer]
- q+
- 17:04:00 [susanisrael]
- someone: is that right?
- 17:04:22 [susanisrael]
- someon: if there is any identifier you still have a problem
- 17:04:39 [justin]
- Someone is justin, someon is jmayer :)
- 17:04:45 [susanisrael]
- peter swire: we heard different perspectives:
- 17:04:53 [susanisrael]
- * thanks justin
- 17:05:31 [susanisrael]
- peter swire...unique identifiers. can you enlighten me? how is going into buckets relevant?
- 17:06:20 [susanisrael]
- someone asks if adding attributes and using those is unique identifiers
- 17:06:32 [yianni]
- s/someone/joe hall
- 17:06:48 [peterswire]
- peterswire has joined #dnt
- 17:06:51 [susanisrael]
- dan auerbach: better privacy friendly way to add advertising that is targeted. need minimum number of people in a bucket
- 17:07:05 [rvaneijk]
- Dan, the minimum buckets make nice micro-segments.
- 17:07:19 [peterswire]
- peterswire has joined #dnt
- 17:07:26 [susanisrael]
- ...we suggested 1024 is a minimum bar. with that don't need unique identifier, just low entropy cookies
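Dan's 1024-member bucket threshold can be sketched as a filter applied before any bucket label is used as a cookie value. The function names, the dictionary input format, and the idea that the cookie holds only a bucket label are assumptions for illustration; MIN_BUCKET_SIZE is the 1024 figure from the discussion.

```python
from collections import Counter
from typing import Dict, Optional

MIN_BUCKET_SIZE = 1024  # minimum bar suggested in the discussion

def releasable_buckets(user_interests: Dict[str, str],
                       min_size: int = MIN_BUCKET_SIZE) -> Dict[str, int]:
    """Count users per interest bucket; keep only buckets with at least min_size members."""
    counts = Counter(user_interests.values())
    return {bucket: n for bucket, n in counts.items() if n >= min_size}

def low_entropy_cookie(interest: str, allowed: Dict[str, int]) -> Optional[str]:
    """Hypothetical cookie value: just the bucket label, with no per-user identifier.
    Returns None when the user's bucket is too small to be released."""
    return interest if interest in allowed else None

# Hypothetical population: 2000 users in a common bucket, 10 in a rare one.
users = {f"u{i}": "soccer" for i in range(2000)}
users.update({f"v{i}": "rare-topic" for i in range(10)})
allowed = releasable_buckets(users)
```

If a cookie carried several bucket labels at once, the threshold would have to hold for the combination of labels rather than for each label separately, since a combination can be far rarer than any single bucket.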
- 17:07:48 [susanisrael]
- heather: might be useful to look at transcript of previous discussion
- 17:07:49 [jmayer]
- If you're interested in advertising, analytics, etc. without unique IDs... https://air.mozilla.org/tracking-not-required/
- 17:07:59 [susanisrael]
- peterswire: room is not catching fire on this
- 17:08:41 [susanisrael]
- chris mejia: i do agree with dan's core premise, that much harder to identify person from a few attributes distilled from all the uris that people visited
- 17:08:47 [aleecia]
- q+
- 17:08:56 [jmayer]
- q- later
- 17:08:56 [susanisrael]
- dan auerbach: can keep those collections without unique identifiers
- 17:09:21 [peterswire]
- ok, I see aleecia and jonathan
- 17:09:37 [susanisrael]
- chris: we agree on that part (harder to identify that way-with quasi identifiers), not necessarily the second part
- 17:09:43 [dwainber_]
- q?
- 17:09:45 [susanisrael]
- .....that is sort of an industry practice
- 17:09:47 [dwainber_]
- q+
- 17:09:51 [yianni]
- ack aleecia
- 17:10:10 [susanisrael]
- aleecia: i think we are all getting there. want to separate 2 different parts of dan's description. one is how to do ads without tracking....
- 17:10:32 [susanisrael]
- ...but pertinent is here's how you can do de-identification, suggest we focus on the de-id half
- 17:10:33 [peterswire_]
- peterswire_ has joined #dnt
- 17:10:50 [susanisrael]
- aleecia: ....interesting re: reduced identification risk
- 17:11:05 [Zakim]
- + +1.631.803.aakk
- 17:11:06 [peterswire]
- peterswire has joined #dnt
- 17:11:16 [dwainberg]
- dwainberg has joined #dnt
- 17:11:20 [susanisrael]
- david wainberg: outline of discussion, 3 general models: 1. random unique identifier, interest buckets
- 17:11:47 [susanisrael]
- 2. unique identifier associated with buckets, dan proposing buckets only, no identifiers
- 17:11:58 [susanisrael]
- dan: maybe what aleecia proposed make sense
- 17:12:43 [susanisrael]
- david w: as discussed earlier, what we mean by de-identified requires setting a threshold, and we're just jumping to "let's break the connection" instead of
- 17:12:58 [dwainbe__]
- dwainbe__ has joined #dnt
- 17:13:16 [susanisrael]
- ...discussing what is a level of acceptable risk. there are significant consequences to forcing ad industry to do this
- 17:13:22 [aleecia]
- what does "not linked at all" mean here?
- 17:13:32 [peterswire_]
- peterswire_ has joined #dnt
- 17:13:32 [susanisrael]
- peter swire: if not linked at all then outside dnt
- 17:13:32 [aleecia]
- q?
- 17:13:40 [susanisrael]
- david w: but still some risk
- 17:14:00 [susanisrael]
- ed: but gets to idea of attribute disclosure vs. record re-identification
- 17:14:01 [peterswire_]
- peterswire_ has joined #dnt
- 17:14:03 [yianni]
- ack dwainber
- 17:14:17 [aleecia]
- q+
- 17:14:22 [susanisrael]
- ed: matters a lot what the bucket is: soccer dad vs. aids patient
- 17:14:23 [jmayer]
- q- later
- 17:14:40 [aleecia]
- would like to respond to Ed
- 17:14:48 [susanisrael]
- ed: need more than knowing that there is a bucket, some sensitive info has to not be used
- 17:15:01 [susanisrael]
- ed: but combos of attributes could identify
- 17:15:09 [jmayer]
- Just to be clear, the DAA principles do not prohibit inferences about medical conditions.
- 17:15:34 [jmayer]
- q+
- 17:15:37 [jmayer]
- q+ earlier
- 17:15:42 [susanisrael]
- mike: want to come back to theme: understanding what we're trying to accomplish. what is bad stuff we are trying to prevent. seeing a relevant ad?
- 17:15:43 [jmayer]
- q- earlier
- 17:15:49 [aleecia]
- could we please stay on topic?
- 17:15:56 [peterswire_]
- jonathan -- I'm unclear -- are you in the q?
- 17:16:02 [aleecia]
- this is an interesting discussion, but not today's agenda
- 17:16:02 [susanisrael]
- ...what other bad stuff, scary outcomes, than seeing an ad for something i bought on amazon?
- 17:16:08 [jmayer]
- Yep, just testing the limits of Zakim.
- 17:16:12 [rvaneijk]
- The HARM is not a relevant factor when it comes to unlinkability
- 17:16:26 [dtauerbach]
- q?
- 17:16:28 [johnsimpson]
- q?
- 17:16:33 [susanisrael]
- peter swire: what the harm is in tracking comes up in a lot of settings but not main topic today
- 17:16:33 [yianni]
- ack aleecia
- 17:17:07 [susanisrael]
- aleecia: want to respond to ed re: which buckets you might care more about, but the group decided we would not distinguish, say re: children's data
- 17:17:14 [peterswire]
- peterswire has joined #dnt
- 17:17:23 [susanisrael]
- ....treating all data same here, which is different than iab daa position
- 17:17:42 [aleecia]
- ack jmayer
- 17:17:43 [susanisrael]
- peter swire: thank you for history but some people do not acknowledge they agreed to that
- 17:17:44 [yianni]
- ack jmayer
- 17:17:47 [susanisrael]
- jmayer passes
- 17:17:54 [peterswire_]
- peterswire_ has joined #dnt
- 17:18:29 [susanisrael]
- peter swire: had initial discussions on buckets and learned a bit on dimensions there. talked with mike at break re: example of something you think it would be useful to look at
- 17:18:39 [aleecia]
- of note: this is not me *objecting* to treating some data as of more concern. just what the group decided many months ago.
- 17:19:00 [susanisrael]
- david wainberg: i thought next step would be taking approach of your favorite slide and start thinking through risks and how to apply techniques to mitigate
- 17:19:06 [aleecia]
- if there is new information before the group, Peter & Matthias have the option to reopen
- 17:19:12 [Zakim]
- -moneill2.a
- 17:19:18 [Wileys]
- Aleecia - my memory matches yours - we decided to not get bogged down in the "sensitivity" debate and allow self-regulation and laws deal with that item
- 17:19:24 [susanisrael]
- peter swire: that is one possible work flow. use khaled's checklist
- 17:19:51 [susanisrael]
- ...maybe there are subsets of people willing to do work on that and come back with a draft. let peter know after meeting if you want to work on
- 17:20:04 [justin]
- Yes, there has never been anything about "sensitive" data in the compliance spec.
- 17:20:05 [aleecia]
- thanks Shane. it was a while ago and pre-dates many folks joining the group. if needed the minutes are out there, but my eagerness to volunteer to find it is not particularly high this week
- 17:20:12 [jmayer]
- q+
- 17:20:15 [susanisrael]
- chris: i have not gotten an answer to what works and protects data if hashing does not work, assuming we will have data
- 17:20:18 [justin]
- Well, apart from that one geolocation section . . .
- 17:20:45 [susanisrael]
- khaled: in health context use probabilistic encryption that permits mathematical operations on data
- 17:20:45 [peterswire_]
- peterswire_ has joined #dnt
- 17:20:54 [Wileys]
- Aleecia, I likewise have no desire to volunteer on that point :-) But would be happy to argue to the same outcome as I believe it was a good decision by the group
- 17:20:58 [susanisrael]
- ...encrypt at source in browser....
- 17:21:10 [Wileys]
- Justin, agreed - not sure how that snuck through...
- 17:21:28 [susanisrael]
- if you want to use those values to do lookup in db not possible for db owner to determine lookup result
- 17:21:35 [efelten]
- efelten has joined #dnt
- 17:21:43 [susanisrael]
- ....efficient process. not much slower than hashing.
- 17:21:52 [susanisrael]
- ...using for lookup in large database
- 17:22:12 [susanisrael]
- peter swire: on a wednesday call could learn about homomorphic encryption. seeing nods on this
- 17:22:28 [susanisrael]
- dan auerbach: talking about fully homomorphic encryption? we are not close?
- 17:22:33 [susanisrael]
- khaled: partial
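As a concrete example of a partially homomorphic, probabilistic scheme, here is a toy Paillier cryptosystem: ciphertexts can be added without decrypting them, and encrypting the same value twice normally yields different ciphertexts. The log does not say which scheme Khaled uses, and Paillier supports addition, not the encrypted database lookups he described; the tiny primes make this sketch insecure and suitable for illustration only.

```python
import math
import secrets

# Toy parameters: tiny primes for illustration only; real keys use primes of 1024+ bits.
P, Q = 293, 433
N = P * Q
N_SQ = N * N
G = N + 1                                           # standard simple choice of generator
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # lcm(p - 1, q - 1)
MU = pow(LAM, -1, N)                                # modular inverse (Python 3.8+), valid since G = N + 1

def encrypt(m: int) -> int:
    """Probabilistic encryption: a fresh random r makes repeated ciphertexts differ."""
    assert 0 <= m < N
    while True:
        r = secrets.randbelow(N - 1) + 1            # r in [1, N - 1]
        if math.gcd(r, N) == 1:
            break
    return (pow(G, m, N_SQ) * pow(r, N, N_SQ)) % N_SQ

def decrypt(c: int) -> int:
    x = pow(c, LAM, N_SQ)
    return ((x - 1) // N) * MU % N

def add_encrypted(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N_SQ

total = add_encrypted(encrypt(5), encrypt(7))       # whoever holds only ciphertexts can do this
```

Fully homomorphic schemes, which Dan asked about, support arbitrary computation on ciphertexts at a much higher cost; partially homomorphic schemes such as this one support a single operation.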
- 17:23:14 [dwainberg]
- dwainberg has joined #dnt
- 17:23:21 [susanisrael]
- felix: also techniques like differential privacy, adding noise to data. questions whether data still useful, but also protects against some attribute disclosure:
- 17:23:21 [aleecia]
- My recollection is Jeff was alone at the time, perhaps one or two people with him at most, and the rest of the group either had the view you have, Shane, or came up with "we don't care, let's talk about something more interesting"
- 17:23:30 [efelten_]
- efelten_ has joined #dnt
- 17:23:38 [peterswire]
- Q?
- 17:23:54 [susanisrael]
- jeff: with encryption or data modification, the criticism of hashing is that if you have the key or access you can get around it, and the same is true for other methods, for example keys
- 17:24:00 [peterswire]
- peterswire has joined #dnt
- 17:24:14 [susanisrael]
- felix: not wrt noise, which you can't figure out even if you know how noise was added
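Felix's point about noise can be illustrated with the Laplace mechanism, a standard building block of differential privacy, applied to a counting query. The sketch assumes a query with sensitivity 1 (adding or removing one person changes the count by at most 1) and an illustrative epsilon; knowing the mechanism and epsilon tells an observer the noise distribution, not the particular random draw that was added.

```python
import random

def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent Exponential(rate = 1/scale) draws
    # is Laplace-distributed with mean 0 and the given scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Laplace mechanism for a count with sensitivity 1: noise scale = 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(2013)  # fixed seed only so the example is reproducible
answer = noisy_count(true_count=1500, epsilon=0.5, rng=rng)
```

Repeated answers about the same data add up: averaging many independent noisy answers converges on the true count, so in practice the total number of queries (the privacy budget) has to be limited.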
- 17:24:22 [peterswire]
- peterswire has joined #dnt
- 17:24:34 [susanisrael]
- ed: let's put off discussion on how it works
- 17:24:49 [susanisrael]
- david : interesting but jumping to solution without identifying problems
- 17:25:18 [susanisrael]
- felix: noticing that there is symmetry to this. many techniques improve privacy but limit value of data.
- 17:25:22 [justin]
- WileyS, at some point we'll have to go back and revisit that piece.
- 17:25:37 [susanisrael]
- ....homomorphic encryption does not preserve ability to do many things with data
- 17:25:51 [Wileys]
- Justin, we'll never finish this standard if we attempt to define what is "sensitive" in a global marketplace - good luck with that.
- 17:26:04 [susanisrael]
- felix: what use are we trying to preserve once data is de-identified. some uses will be preserved, others not
- 17:26:09 [yianni]
- ack jmayer
- 17:26:11 [aleecia]
- The geoIP part was well locked down, and then Ian rejoined and *did* have new information.
- 17:26:23 [susanisrael]
- jmayer: will postpone since postponing methodology discussion
- 17:26:44 [justin]
- WileyS, I am not arguing that we should.
- 17:26:53 [susanisrael]
- peter swire: thanks to khaled for coming and providing expertise. there was clear explanation of risk based approach used in other settings
- 17:27:15 [aleecia]
- We cannot bar geoIP since knowing where people are affects what to do if DNT is unset
- 17:27:28 [peterswire]
- peterswire has joined #dnt
- 17:27:31 [susanisrael]
- ...we also, i think, had some terminology gain in a lot of places. de-identified or de-linked are conclusion terms that apply once you have a standard, for example in hipaa....
- 17:27:42 [aleecia]
- So we were trying to find a way to say "fine, fine, just pick a large enough geography," and then were hung up in the details on what that means
- 17:27:50 [susanisrael]
- .....we also had variety of other terms about direct identifiers and quasi identifiers that will be helpful....
- 17:28:05 [susanisrael]
- ....heard interest in presentation for homomorphic encryption...
- 17:28:30 [susanisrael]
- ...also heard suggestion re: doing pieces of that one slide--what are harms, risks, people are concerned about, and
- 17:28:36 [jmayer]
- If we're going to discuss methodologies, differential privacy and privacy-preserving implementations should make the cut.
- 17:28:50 [susanisrael]
- ...in particular for online setting develop use cases we should care about if we are to get to homomorphic encryption.
- 17:29:07 [susanisrael]
- ....any other action items?
- 17:29:08 [mnolet_]
- mnolet_ has joined #dnt
- 17:29:37 [susanisrael]
- ...if you have them after the meeting i welcome those. we are heading to f2f mtg, and want to make progress on this in advance...
- 17:29:45 [Zakim]
- -bryan
- 17:29:47 [aleecia]
- thanks, Peter!
- 17:29:49 [susanisrael]
- ....thanks to cdt, khaled, all who came
- 17:29:50 [Zakim]
- - +1.631.803.aakk
- 17:29:51 [johnsimpson]
- johnsimpson has left #dnt
- 17:29:52 [Zakim]
- -Brooks
- 17:29:54 [Zakim]
- -Peder_Magee
- 17:29:55 [aleecia]
- and thanks Susan for scribing so much!
- 17:30:04 [Zakim]
- - +1.215.286.aaee
- 17:30:05 [Zakim]
- -rvaneijk
- 17:30:07 [Zakim]
- -Aleecia
- 17:30:12 [yianni]
- rrsagent, make logs public
- 17:30:18 [Zakim]
- -vincent
- 17:30:25 [yianni]
- rrsagent, set logs would visible
- 17:30:36 [aleecia]
- (you want public)
- 17:30:40 [yianni]
- rrsagent, draft minutes
- 17:30:40 [RRSAgent]
- I have made the request to generate http://www.w3.org/2013/01/17-DNT-minutes.html yianni
- 17:31:14 [Zakim]
- -Jonathan_Mayer
- 17:31:22 [Zakim]
- -vinay
- 17:31:39 [Ho-Chun_Ho_]
- Ho-Chun_Ho_ has left #dnt
- 17:31:58 [Zakim]
- -SusanIsrael
- 17:34:21 [Zakim]
- -WileyS
- 17:45:39 [Zakim]
- -DAvid
- 17:50:11 [peterswire]
- peterswire has joined #dnt
- 17:56:13 [Zakim]
- -moneill2
- 18:05:00 [Zakim]
- disconnecting the lone participant, [CDT], in Team_(dnt)14:00Z
- 18:05:02 [Zakim]
- Team_(dnt)14:00Z has ended
- 18:05:02 [Zakim]
- Attendees were Jonathan_Mayer, [GVoice], rvaneijk, +1.425.214.aaaa, Aleecia, +1.202.587.aabb, WileyS, +1.631.803.aacc, [CDT], +1.215.796.aadd, bryan, +1.215.286.aaee, vincent,
- 18:05:02 [Zakim]
- ... Peder_Magee, +1.202.257.aaff, +1.646.722.aagg, Brooks, +1.917.934.aahh, vinay, +1.646.654.aaii, +1.215.286.aajj, moneill2, SusanIsrael, DAvid, +1.631.803.aakk
- 18:05:08 [efelten]
- efelten has joined #dnt
- 18:05:55 [JoeHallCDT]
- JoeHallCDT has joined #DNT
- 18:16:54 [efelten]
- efelten has joined #dnt
- 18:35:59 [JoeHallCDT]
- JoeHallCDT has left #dnt
- 18:40:12 [dwainberg]
- dwainberg has joined #dnt
- 18:43:50 [dwainber_]
- dwainber_ has joined #dnt
- 18:51:42 [efelten]
- efelten has joined #dnt
- 18:55:10 [mnolet]
- mnolet has joined #dnt
- 19:12:16 [robsherman]
- robsherman has joined #dnt
- 19:31:46 [Zakim]
- Zakim has left #dnt
- 19:43:27 [npdoty]
- npdoty has joined #dnt
- 20:00:43 [dsinger]
- dsinger has joined #dnt
- 21:50:51 [hwest]
- hwest has joined #dnt