IRC log of DNT on 2013-01-17

Timestamps are in UTC.

13:57:43 [RRSAgent]
RRSAgent has joined #DNT
13:57:43 [RRSAgent]
logging to http://www.w3.org/2013/01/17-DNT-irc
13:57:45 [bryan]
bryan has joined #dnt
13:58:09 [yianni]
Zakim, this will be 87225
13:58:09 [Zakim]
ok, yianni; I see Team_(dnt)14:00Z scheduled to start in 2 minutes
13:58:14 [dtauerbach]
dtauerbach has joined #dnt
13:58:32 [peterswire]
peterswire has joined #dnt
13:58:38 [aleecia]
aleecia has joined #dnt
13:58:51 [JoeHallCDT]
JoeHallCDT has joined #DNT
13:59:18 [aleecia]
zakim, code?
13:59:18 [Zakim]
the conference code is 87225 (tel:+1.617.761.6200 sip:zakim@voip.w3.org), aleecia
13:59:54 [jeffwilson]
jeffwilson has joined #dnt
14:00:06 [rvaneijk]
When I dial in, I do not see myself in the IRC as dialed in..
14:00:15 [aleecia]
Rob, neither do I
14:00:17 [justin]
justin has joined #dnt
14:00:17 [Paul]
Paul has joined #DNT
14:00:24 [aleecia]
Possibly just slow?
14:00:48 [aleecia]
But I'm guessing something is broken in the Zakim world
14:00:58 [Wileys]
Wileys has joined #dnt
14:01:03 [vincent]
vincent has joined #dnt
14:01:10 [jmayer]
W3C: fixing IRC bots and taking attendance since...
14:01:15 [bryan]
zakim appears to be a little sleepy
14:01:18 [johnsimpson]
johnsimpson has joined #dnt
14:01:21 [dwainberg]
dwainberg has joined #dnt
14:01:24 [aleecia]
<groan>
14:01:32 [bryan]
BAU
14:01:38 [hwest]
hwest has joined #dnt
14:01:47 [justin]
Getting ready to dial in.
14:02:00 [johnsimpson_]
johnsimpson_ has joined #dnt
14:02:04 [johnsimpson_]
Good morning
14:02:11 [aleecia]
I planned to before I got sick
14:02:28 [peterswire]
peterswire has joined #dnt
14:02:29 [efelten_]
efelten_ has joined #dnt
14:02:30 [Marc_]
Marc_ has joined #DNT
14:02:54 [johnsimpson_]
johnsimpson_ has joined #dnt
14:02:55 [aleecia]
(someone is typing & needs to mute)
14:02:57 [peterswire]
peterswire has joined #dnt
14:03:00 [johnsimpson]
john
14:03:12 [aleecia]
hi
14:03:12 [johnsimpson]
testing IRC
14:03:27 [yianni]
Zakim, this is dnt
14:03:27 [Zakim]
ok, yianni; that matches Team_(dnt)14:00Z
14:03:34 [efelten_]
efelten_ has joined #dnt
14:03:54 [JoeHallCDT]
joe is scribe… someone remind me how to tell Zakim that and to start notes
14:03:54 [Zakim]
+ +1.215.796.aadd
14:03:55 [yianni]
scribe: JoeHallCDT
14:03:58 [Wileys]
Zakim, who is on the call?
14:03:58 [Zakim]
On the phone I see [GVoice], Jonathan_Mayer, +1.425.214.aaaa, Aleecia, +1.202.587.aabb, WileyS, ??P9, +1.631.803.aacc, rvaneijk, [CDT], +1.215.796.aadd
14:04:04 [efelten_]
efelten_ has joined #dnt
14:04:05 [johnsimpson]
johnsimpson has joined #dnt
14:04:05 [bryan]
present+ Bryan_Sullivan
14:04:33 [bryan]
zakim, aaaa is bryan
14:04:33 [Zakim]
+bryan; got it
14:04:34 [JoeHallCDT]
Peter Swire: goal is to discuss to what extent De-ID can remove data from scope of the standard
14:04:41 [johnsimpson_]
johnsimpson_ has joined #dnt
14:04:50 [Zakim]
+ +1.215.286.aaee
14:04:54 [Zakim]
- +1.215.796.aadd
14:04:56 [Zakim]
-??P9
14:04:59 [JoeHallCDT]
… related: what sort of uses are consistent with compliance with the spec
14:05:05 [efelten]
efelten has joined #dnt
14:05:20 [JoeHallCDT]
… if things are used for market research in ways that are entirely de-ID, that should be safe or out of scope
14:05:34 [JoeHallCDT]
… on the other hand, if explicitly ID'd, standard should apply
14:05:40 [Zakim]
+??P9
14:05:42 [JoeHallCDT]
… clearly defining uses is crucial
14:05:44 [peterswire_]
peterswire_ has joined #dnt
14:05:57 [JoeHallCDT]
… getting clear on terms, words and such is an important part of this
14:06:02 [vincent]
zakim, ??P9 is vincent
14:06:02 [Zakim]
+vincent; got it
14:06:07 [peterswire]
peterswire has joined #dnt
14:06:07 [johnsimpson]
johnsimpson has joined #dnt
14:06:32 [efelten_]
efelten_ has joined #dnt
14:06:38 [johnsimpson]
johnsimpson has joined #dnt
14:06:46 [JoeHallCDT]
… instead of having people talking past each other, we want a strong foundation of shared vocabulary
14:07:07 [JoeHallCDT]
… delighted to have great people in the room and on the phone
14:07:12 [justin]
q?
14:07:19 [johnsimpson]
johnsimpson has joined #dnt
14:07:22 [JoeHallCDT]
… agenda has been sent around
14:07:35 [JoeHallCDT]
… ground rules for discussion
14:07:43 [JoeHallCDT]
… this is not an official in-person meeting with 8 weeks notice
14:07:49 [yianni]
Zakim, who is on the call?
14:07:49 [Zakim]
On the phone I see [GVoice], Jonathan_Mayer, bryan, Aleecia, +1.202.587.aabb, WileyS, +1.631.803.aacc, rvaneijk, [CDT], +1.215.286.aaee, vincent
14:07:58 [JoeHallCDT]
… have been told by w3c staff that this can't make decisions towards normative language
14:08:30 [johnsimpson_]
johnsimpson_ has joined #dnt
14:08:31 [JoeHallCDT]
… it would be good to agree on terms and definitions
14:08:50 [JoeHallCDT]
… this should make people more comfortable with claims made in the world
14:08:50 [Zakim]
+Peder_Magee
14:08:56 [Wileys]
If you share that information externally...
14:08:57 [JoeHallCDT]
… e.g., unsalted hashes
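[Editor's illustration, not part of the call.] The mention of unsalted hashes can be made concrete with a minimal sketch. It assumes SHA-256 as the hash function and uses made-up e-mail addresses, a made-up candidate list, and a made-up secret key; none of these details come from the discussion. The point it tries to show is that hashing an identifier without a secret produces a stable pseudonym that anyone with a list of candidate identifiers can reverse by lookup.

```python
import hashlib
import hmac


def unsalted_hash(identifier: str) -> str:
    """Plain SHA-256 of the identifier: the same input always gives the same output."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_hash(identifier: str, secret_key: bytes) -> str:
    """HMAC-SHA-256 with a secret key (one common alternative, assumed here)."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical log record whose user field was "de-identified" by hashing.
record = {"user": unsalted_hash("alice@example.com"), "page": "/news"}

# Anyone holding candidate identifiers can hash them the same way and look
# the record up, so the unsalted hash behaves like a reversible pseudonym.
candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]
lookup = {unsalted_hash(c): c for c in candidates}
recovered = lookup.get(record["user"])

# With a keyed hash, the same lookup fails unless the attacker has the key.
keyed_record = {
    "user": keyed_hash("alice@example.com", b"hypothetical-secret"),
    "page": "/news",
}
recovered_keyed = lookup.get(keyed_record["user"])
```

Even the keyed variant remains a persistent identifier: whoever holds the key can still link records back to the original value, so it limits outside lookup rather than removing linkability.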
14:09:18 [peterswire_]
peterswire_ has joined #dnt
14:09:26 [johnsimpson_]
johnsimpson_ has joined #dnt
14:09:30 [jmayer]
Could introductions include technical background? It would be helpful to understand who'll be participating from the technical side and who'll be observing from the law/policy perspective.
14:09:42 [JoeHallCDT]
might want to q that jmayer
14:09:50 [JoeHallCDT]
… first thing is incentives to de-ID
14:09:58 [aleecia]
Do we need to re-introduce ourselves?
14:10:06 [johnsimpson]
johnsimpson has joined #dnt
14:10:31 [JoeHallCDT]
… Khaled El Emam will start us off with slides (jlh: not sure how phone peeps will see them)
14:10:34 [johnsimpson]
johnsimpson has joined #dnt
14:10:48 [JoeHallCDT]
… then to hashing, persistent ids, putting people in "buckets"
14:10:52 [rvaneijk]
please send slides to the list and/or post them on the wiki !
14:11:08 [JoeHallCDT]
… Yianni will gather qs
14:11:23 [Zakim]
+ +1.202.257.aaff
14:11:30 [johnsimpson]
johnsimpson has joined #dnt
14:11:31 [dwainber_]
dwainber_ has joined #dnt
14:11:49 [efelten_]
efelten_ has joined #dnt
14:11:53 [JoeHallCDT]
… will go around the room, please let us know any technical experience
14:11:57 [aleecia]
cannot hear
14:11:58 [JoeHallCDT]
… Peter, law prof.
14:12:12 [JoeHallCDT]
… Khaled works at U Toronto, CS background, working on health
14:12:22 [efelten_]
efelten_ has joined #dnt
14:12:23 [Zakim]
+ +1.646.722.aagg
14:12:28 [johnsimpson]
johnsimpson has joined #dnt
14:12:31 [JoeHallCDT]
Dan Auerbach from EFF, previously worked at Google doing data mining
14:12:33 [Aturkel]
Aturkel has joined #DNT
14:12:51 [JoeHallCDT]
John Simpson, Consumer Watchdog
14:12:55 [peterswire]
peterswire has joined #dnt
14:12:58 [JoeHallCDT]
Ed Felten, Princeton U.
14:13:00 [johnsimpson]
johnsimpson has joined #dnt
14:13:05 [JoeHallCDT]
research and teaching for 18 years
14:13:17 [JoeHallCDT]
Felix Wu, prof. at Cardozo, PhD in CS from Berkeley
14:13:21 [mecallahan]
mecallahan has joined #DNT
14:13:27 [JoeHallCDT]
Peter invited Felix based on technical work
14:13:36 [JoeHallCDT]
Paul Gliss, lawyer from Comcast, worked in De-ID space
14:13:46 [efelten_]
efelten_ has joined #dnt
14:14:01 [JoeHallCDT]
Chris Mejia, IAB, dir. of ad technology, tech dir. for DAA
14:14:04 [johnsimpson]
johnsimpson has joined #dnt
14:14:10 [JoeHallCDT]
Jeff Wilson, with AOL for 16 years
14:14:14 [JoeHallCDT]
Marc Groman, NAI
14:14:26 [JoeHallCDT]
David Wainberg, NAI, undergrad. at CS, web dev. for years
14:14:29 [JoeHallCDT]
Heather West, Google
14:14:33 [JoeHallCDT]
Justin Brookman, CDT
14:14:50 [JoeHallCDT]
Bill Scanell, (probably a lawyer in a suit?) here to assist with communications
14:15:04 [johnsimpson_]
johnsimpson_ has joined #dnt
14:15:14 [JoeHallCDT]
Peder Magee from FTC
14:15:31 [JoeHallCDT]
Shane Wiley, Yahoo!!
14:15:32 [johnsimpson]
johnsimpson has joined #dnt
14:15:42 [JoeHallCDT]
Mary Ellen Callahan, Jenner and Block
14:15:54 [JoeHallCDT]
Aleecia McDonald, PhD engineering
14:16:04 [bryan]
Bryan Sullivan, AT&T Director of Service Standards, WAP/Web browsing service architecture and mobile/web standards for AT&T since pre-2000
14:16:05 [JoeHallCDT]
Adam Turkel, lawyer with AppNexus
14:16:16 [dwainberg]
dwainberg has joined #dnt
14:16:16 [JoeHallCDT]
Bryan (?), AT&T director of standards
14:16:27 [johnsimpson]
johnsimpson has joined #dnt
14:16:27 [peterswire]
peterswire has joined #dnt
14:16:30 [dtauerbach]
dtauerbach has joined #dnt
14:16:36 [JoeHallCDT]
Ho Chun Ho, Comcast, data arch.
14:16:56 [peterswire_]
peterswire_ has joined #dnt
14:16:59 [AHanff]
AHanff has joined #dnt
14:17:04 [JoeHallCDT]
Jonathan Mayer, PhD student in CS at Stanford, at Stanford Security Lab
14:17:07 [johnsimpson_]
johnsimpson_ has joined #dnt
14:17:40 [efelten__]
efelten__ has joined #dnt
14:17:43 [AHanff]
is there a call on now?
14:18:09 [JoeHallCDT]
Rob van Eijk, PhD student at x, (very lengthy afi. and background)
14:18:10 [aleecia]
Yes, we're on a call now
14:18:24 [JoeHallCDT]
Vincent Toubiana, Alcatel Lucent, PhD CS
14:18:25 [rvaneijk]
s/x/Leiden University/
14:18:28 [AHanff]
thanks I didnt see it on the icalendar
14:18:41 [efelten_]
efelten_ has joined #dnt
14:18:42 [rvaneijk]
aff: Art. 29 Data Protection Working Party / Dutch DPA
14:18:44 [JoeHallCDT]
Jules P, from Future of Privacy Forum
14:19:26 [yianni]
scribe: yianni
14:19:31 [Brooks]
Brooks has joined #dnt
14:19:32 [Zakim]
+[IPcaller]
14:19:38 [peterswire]
peterswire has joined #dnt
14:19:53 [johnsimpson]
johnsimpson has joined #dnt
14:19:53 [yianni]
Peter: Getting logistics worked out, brainstorm reasons in advertising and online space
14:20:01 [peterswire_]
peterswire_ has joined #dnt
14:20:05 [yianni]
...why people have incentives to de-identify
14:20:16 [yianni]
...self interest, business, or other reasons
14:20:21 [Zakim]
+Brooks
14:20:31 [pedermagee]
pedermagee has joined #DNT
14:20:36 [yianni]
...if we understand reasons, we might be able to understand what things will be done in practice
14:20:51 [johnsimpson_]
johnsimpson_ has joined #dnt
14:20:54 [yianni]
...privacy policy that says you do things in de-identified or anonymized ways
14:21:09 [yianni]
...we do not use PII for certain operations, for example
14:21:13 [johnsimpson_]
johnsimpson_ has joined #dnt
14:21:22 [yianni]
...risk for not following promises
14:22:10 [yianni]
Marc: people do not de-identify to avoid liability, they do it to mitigate privacy and security risk, then make the promise
14:22:12 [johnsimpson]
johnsimpson has joined #dnt
14:22:12 [efelten__]
efelten__ has joined #dnt
14:22:24 [yianni]
Paul: providing comfort to customers is a reason to de-identify
14:22:34 [johnsimpson_]
johnsimpson_ has joined #dnt
14:22:45 [yianni]
Peter: 2nd, organizations have costs from data breaches (states and Europe)
14:22:47 [efelten_]
efelten_ has joined #dnt
14:23:05 [yianni]
...expense of sending out notice and going through steps of data breach, if de-id you do not have to disclose
14:23:06 [Wileys]
Encrypted is different than de-identified
14:23:09 [peterswire]
peterswire has joined #dnt
14:23:16 [johnsimpson]
johnsimpson has joined #dnt
14:23:31 [yianni]
Jules: big driver, beginning of NAI, big ad networks and crisis around it
14:23:38 [peterswire]
peterswire has joined #dnt
14:23:40 [aleecia]
In my experience, companies that say they only work with anonymous data mean it in the Latin sense -- literally without name. They do not mean that users are unidentifiable. I think we need to be very careful to keep these ideas separate.
14:24:03 [Marc_]
+q
14:24:06 [yianni]
...NAI treated PII and non-PII very differently: if you represented in your privacy policy that you tracked only non-PII, you could give notice through an opt-out notice
14:24:14 [efelten__]
efelten__ has joined #dnt
14:24:21 [yianni]
...for PII, need more notice on web page, perhaps an opt-in
14:24:50 [johnsimpson_]
johnsimpson_ has joined #dnt
14:24:50 [yianni]
... 7 large networks adopted, and forced other partners to follow
14:25:20 [yianni]
...huge driver for ad networks that they make a specific representation of PII and non-PII
14:25:32 [yianni]
Peter: are there other legal regimes for de-id?
14:25:33 [efelten_]
efelten_ has joined #dnt
14:25:37 [jmayer]
Rob, could you briefly address EU law?
14:25:55 [johnsimpson_]
johnsimpson_ has joined #dnt
14:25:58 [yianni]
Paul: regulatory treatment that is different for cable, services provided by cable providers
14:26:10 [yianni]
...makes distinction between personally identified and not identified
14:26:21 [Wileys]
Peter - are you suggesting if data is not linked to PII then it is "de-identified"?
14:26:23 [peterswire]
peterswire has joined #dnt
14:26:26 [yianni]
...much like NAI, different rules for consent and approval
14:26:47 [peterswire_]
peterswire_ has joined #dnt
14:26:52 [efelten_]
efelten_ has joined #dnt
14:26:56 [johnsimpson]
johnsimpson has joined #dnt
14:26:57 [robsherman]
robsherman has joined #dnt
14:27:15 [yianni]
Marc: data security issues, beyond financial issues, reputational risk is a very large piece of it as well
14:27:53 [yianni]
...privacy incident, costs are much higher than outside counsel and regulatory burdens; for many years people talk about the x company incident
14:27:57 [bryan]
Shane, I think the question is whether "is" includes "can be", i.e. data not linked vs non-linkable is by definition non-PII
14:28:16 [yianni]
Peter: NAI, Cable Act, also have HIPAA, GLBA
14:28:30 [yianni]
...if you are outside regime, you do not have regulatory burden
14:28:49 [robsherman1]
robsherman1 has joined #dnt
14:28:49 [aleecia]
Shane - I think it's abundantly clear that no PII is not the same as non-identifiable (see Paul Ohm's summary paper) but I understand you're asking for Peter's view, which I do not know.
14:28:57 [yianni]
Marc: Privacy Act, privacy impact assessment depends on whether you have individually identifiable information
14:29:24 [yianni]
Peter: inside an organization, you have incentives of access controls, more people can touch the data if not PII
14:29:29 [Wileys]
Bryan, that's my question - is it an absolute position? I've always felt de-identified was "more" than simply not PII.
14:29:35 [efelten__]
efelten__ has joined #dnt
14:29:35 [Wileys]
Aleecia - see above :-)
14:29:54 [yianni]
...data base with financial information, many reasons for access control limits
14:30:00 [peterswire]
peterswire has joined #dnt
14:30:12 [yianni]
...for other employees there is a risk of breach if you do not De-identify
14:30:14 [efelten_]
efelten_ has joined #dnt
14:30:32 [johnsimpson_]
johnsimpson_ has joined #dnt
14:30:39 [efelten_]
efelten_ has joined #dnt
14:30:40 [yianni]
Khaled: opt-in consent or opt-out, evidence in health care sector for consent bias
14:30:55 [yianni]
...de-identification allows you to avoid consent bias
14:31:03 [johnsimpson]
johnsimpson has joined #dnt
14:31:06 [efelten_]
efelten_ has joined #dnt
14:31:13 [Wileys]
PII/Personal Data -> Pseudo/Anonymous -> De-Identified/Unlinkable -> No Value
14:31:30 [rvaneijk]
any kind of analytics is very far stretched...
14:31:32 [johnsimpson]
johnsimpson has joined #dnt
14:31:35 [yianni]
Khaled: Beyond researchers, goes to analytics (biased data because you are missing a certain percent of population)
14:31:57 [yianni]
Peter: having full population better for the researchers, De-ID is a tool to get accurate analytics
14:31:58 [johnsimpson]
johnsimpson has joined #dnt
14:32:09 [yianni]
...Any other comments on reasons why people do de-identification?
14:32:32 [aleecia]
Shane - I can imagine a dataset that removes PII and is also then not re-identifiable. But that's not a general rule. It's probably easier to talk about the type of data we're using. Removing PII is not going to render a server log file "safe," and indeed there might never be PII in the first place, yet still have identifiable data.
14:32:43 [yianni]
...reasons for people to do this, trying to understand the terminology
14:32:46 [RichLaBarca]
RichLaBarca has joined #DNT
14:32:53 [johnsimpson]
johnsimpson has joined #dnt
14:33:00 [yianni]
...Khaled has a book on de-id coming out the beginning of April
14:33:12 [aleecia]
Are slides available now?
14:33:12 [efelten_]
efelten_ has joined #dnt
14:33:12 [yianni]
...Khaled starting with part 2 and his slides
14:33:20 [bryan]
Shane, to be clear I was not stating a position, but a question. IMO identity includes a range of attributes only some of which are personal - remove/obscure the personal ones and you're home - science will always find new ways to relink and attribute data to persons, and we should not be trying to chase that rabbit
14:33:21 [peterswire_]
peterswire_ has joined #dnt
14:33:24 [Wileys]
Slides have not come through on email yet!!!
14:33:30 [johnsimpson]
johnsimpson has joined #dnt
14:33:40 [rvaneijk]
yes,
14:33:41 [justin]
I sent ten minutes ago, will resend.
14:33:42 [AHanff]
difficult
14:33:48 [aleecia]
thank you Shane
14:33:52 [peterswire]
peterswire has joined #dnt
14:33:52 [jmayer]
Also, lots of paper shuffling etc.
14:33:55 [yianni]
Khaled: walking through process of de-identification
14:34:14 [johnsimpson_]
johnsimpson_ has joined #dnt
14:34:34 [aleecia]
um.
14:34:39 [johnsimpson]
johnsimpson has joined #dnt
14:34:39 [rvaneijk]
sounds off now
14:34:42 [efelten_]
efelten_ has joined #dnt
14:34:58 [yianni]
Khaled: walk through de-identification we have been using, context will be healthcare
14:35:10 [johnsimpson]
johnsimpson has joined #dnt
14:35:23 [yianni]
...agree on terminology and general approach to terminology
14:35:35 [yianni]
...basic process they have used is five steps
14:35:40 [Wileys]
Bryan, I'm mostly with you there. The key element is what is defined as "personal"...
14:35:48 [yianni]
...assume we have health data set and want to release for secondary purpose
14:35:52 [robsherman]
robsherman has joined #dnt
14:35:55 [yianni]
...first step understand plausible attacks
14:36:00 [johnsimpson_]
johnsimpson_ has joined #dnt
14:36:03 [efelten_]
efelten_ has joined #dnt
14:36:04 [jmayer]
Where are these five steps sourced from?
14:36:07 [vinay]
vinay has joined #dnt
14:36:07 [yianni]
...second, understand variables that can be used
14:36:08 [Zakim]
+ +1.917.934.aahh
14:36:13 [vinay]
zakim, aahh is vinay
14:36:13 [Zakim]
+vinay; got it
14:36:19 [yianni]
...measure risks, apply de-identification
14:36:31 [yianni]
...Assume a public release or releasing to a known data recipient
14:36:34 [efelten_]
efelten_ has joined #dnt
14:36:37 [johnsimpson]
johnsimpson has joined #dnt
14:36:39 [justin]
Put your email in chat if you want the slides.
14:36:43 [bryan]
In absence of the slides, can someone copy/paste the slide content into IRC?
14:36:50 [Wileys]
wileys@yahoo-inc.com
14:36:51 [aleecia]
aleecia@aleecia.com
14:36:53 [yianni]
...very different analysis, public have no controls, known recipient you can have controls and contracts
14:37:04 [vinay]
vigoel@adobe.com
14:37:07 [AHanff]
a.hanff@think-privacy.com
14:37:10 [johnsimpson]
johnsimpson has joined #dnt
14:37:17 [yianni]
...For known data recipient, you have three attacks
14:37:19 [vincent]
vincent.toubiana@alcatel-lucent.com
14:37:25 [yianni]
Chris: what type of attack?
14:37:28 [AHanff]
are we allowed to comment?
14:37:29 [aleecia]
ed@felten.com
14:37:34 [RichLaBarca]
rich@addthis.com please
14:37:43 [yianni]
Khaled: re-identification attack
14:37:48 [jmayer]
Slides answered, thanks.
14:37:55 [bryan]
got the slides, thanks
14:38:05 [AHanff]
so can we ask questions?
14:38:07 [robsherman]
q+
14:38:08 [justin]
q?
14:38:10 [dtauerbach]
q?
14:38:17 [hwest]
If you have questions, please queue yourself; I'll monitor the queue
14:38:21 [justin]
ack marc_
14:38:24 [justin]
ack robsherman
14:38:25 [Wileys]
Thank you Heather!
14:38:27 [AHanff]
q+
14:38:49 [hwest]
(Reminder: to put yourself in the queue, just type q+)
14:38:54 [johnsimpson]
johnsimpson has joined #dnt
14:38:57 [yianni]
Rob: what about information that is not being disclosed: storing information in de-identified form, not planning to disclose?
14:39:16 [hwest]
ack AHanff
14:39:22 [Wileys]
+q
14:39:23 [AHanff]
typing
14:39:30 [AHanff]
I am typing lol
14:39:31 [yianni]
Khaled: go through same steps if you release to data recipient or internally
14:39:35 [hwest]
AHanff, are you just on irc?
14:39:44 [hwest]
Go ahead and type your question and I'll convey
14:39:45 [hwest]
q+
14:39:46 [AHanff]
no I am on phone too but not on headset
14:40:06 [dtauerbach]
q+
14:40:09 [Wileys]
ack wileys
14:40:12 [peterswire_]
peterswire_ has joined #dnt
14:40:13 [yianni]
Shane: not mandating from a HIPAA perspective to de-identify, just for a risk management perspective, you would go through same process
14:40:17 [justin]
Slides went to list finally, available here: http://lists.w3.org/Archives/Public/public-tracking/2013Jan/0062.html
14:40:17 [johnsimpson_]
johnsimpson_ has joined #dnt
14:40:18 [robsherman1]
robsherman1 has joined #dnt
14:40:28 [aleecia]
Thank you Justin
14:40:29 [hwest]
q?
14:40:36 [yianni]
Khaled: contract, allow vendor to continue using the data, need to keep it in de-identified manner
14:40:47 [peterswire]
peterswire has joined #dnt
14:40:58 [hwest]
AHanff, go ahead and type question
14:41:05 [yianni]
Peter: HIPAA puts limits on data uses even internally
14:41:05 [AHanff]
I would just like Khaled to acknowledge that known recipient doesn't guarantee confidentiality even with contractual obligations. For example, I read recently that something like 90% of US medical authorities had data leaks in 2012, presumably contracts were in place...
14:41:24 [yianni]
Dan: clarifying, de-identification is a property of data?
14:41:30 [yianni]
...It is not a process
14:41:37 [johnsimpson_]
johnsimpson_ has joined #dnt
14:41:49 [yianni]
Khaled: in practice you manage the risk of re-identification, de-identification is one tool in the tool box
14:41:49 [efelten__]
efelten__ has joined #dnt
14:41:50 [hwest]
AHanff, feel free to share running comments as the presentation proceeds - they go in the record as well
14:41:56 [AHanff]
thanks
14:42:14 [johnsimpson_]
johnsimpson_ has joined #dnt
14:42:20 [dwainberg]
q+
14:42:24 [efelten_]
efelten_ has joined #dnt
14:42:25 [hwest]
ack hwest
14:42:28 [yianni]
Khaled: deliberate re-identification by data recipient, if company signs a contract, as a corporation that company will not try to re-identify
14:42:28 [hwest]
ack David_MacMillan
14:42:36 [hwest]
ack dtauerbach
14:42:44 [jmayer]
q+
14:42:49 [robsherman]
robsherman has joined #dnt
14:42:50 [yianni]
...there may be rogue employees, but probability of company re-identifying would be acceptably low
14:42:54 [efelten__]
efelten__ has joined #dnt
14:43:02 [AHanff]
the evidence would suggest otherwise with so many data leaks surely?
14:43:05 [yianni]
...contracts are a good risk mitigating activity for first attack
14:43:09 [peterswire]
I am aware of the q; will be calling on them shortly
14:43:23 [aleecia]
@AHanff, if you have a citation on the 90% figure, would you be so kind as to add that to the wiki?
14:43:27 [yianni]
...rogue employee re-identifying an ex spouse for example is dependent on internal company controls
14:43:37 [AHanff]
I will try and find it yes
14:43:48 [peterswire]
peterswire has joined #dnt
14:43:48 [yianni]
...first attack, as a company would you do it, do you have controls for rogue employees
14:43:51 [robsherman1]
robsherman1 has joined #dnt
14:43:52 [aleecia]
Thanks, that's higher than I'd heard
14:43:54 [efelten_]
efelten_ has joined #dnt
14:44:05 [yianni]
Peter: this is a risk management approach
14:44:14 [johnsimpson_]
johnsimpson_ has joined #dnt
14:44:16 [peterswire]
peterswire has joined #dnt
14:44:39 [yianni]
Khaled: most recent guidance of HHS is a risk management approach, UK Commissions also talk about risk management and context based
14:44:51 [hwest]
q?
14:44:52 [peterswire_]
peterswire_ has joined #dnt
14:44:54 [yianni]
...regulators approaching as a risk management exercise
14:44:57 [hwest]
ack dwainberg
14:45:02 [johnsimpson]
johnsimpson has joined #dnt
14:45:20 [yianni]
David: De-ID is not a binary state, it is rather a description of lower risk (Khaled probability)
14:45:30 [efelten__]
efelten__ has joined #dnt
14:45:30 [peterswire_]
peterswire_ has joined #dnt
14:45:48 [yianni]
Khaled: de-identification has been practiced for last 20 years, CDC, CMS, set thresholds along a continuum
14:45:55 [yianni]
...that is context dependent
14:46:12 [johnsimpson_]
johnsimpson_ has joined #dnt
14:46:13 [AHanff]
aleecia, it was a Ponemon study, there is an article here on it (will add to wiki) http://www2.idexpertscorp.com/press/report-94-of-us-hospitals-suffered-data-breaches-and-45-had-quintuplets/
14:46:13 [yianni]
David: helpful to talk about de-identification as a process and something else as an end goal?
14:46:30 [yianni]
Dan: still fair to say de-identification is a property of data
14:46:37 [Zakim]
+ +1.646.654.aaii
14:46:47 [yianni]
David: functional definition of de-identification is a function of the context, could be 20 different forms
14:46:57 [efelten_]
efelten_ has joined #dnt
14:47:01 [schunter]
schunter has joined #dnt
14:47:03 [robsherman]
robsherman has joined #dnt
14:47:08 [yianni]
Khaled: can be multiple de-id versions for the same data base, public versus trusted party
14:47:39 [yianni]
Peter: binary de-identified or not? Under HHS, counts as de-identified if overall risk is low.
14:47:57 [johnsimpson]
johnsimpson has joined #dnt
14:48:05 [peterswire]
peterswire has joined #dnt
14:48:15 [yianni]
Khaled: once you have a spectrum, and cut off in the middle, you turn it into a binary decision
14:48:29 [yianni]
Peter: de-identified is a conclusion term under some regime under some set of facts
14:48:30 [AHanff]
but the thresholds are not static, they move constantly depending on the amount of data aggregated about an individual
14:48:36 [peterswire]
peterswire has joined #dnt
14:48:38 [johnsimpson_]
johnsimpson_ has joined #dnt
14:48:47 [yianni]
...yes it is de-identified or no it is not, along the way there is a risk management regime
14:49:05 [dtauerbach]
q?
14:49:05 [yianni]
...de-identified right now is a conclusion term for a regime, we do not have that standard right now in dnt
14:49:13 [johnsimpson]
johnsimpson has joined #dnt
14:49:15 [yianni]
...does anyone else see it differently?
14:49:21 [RichLaBarca]
Zakim, q?
14:49:21 [Zakim]
I see jmayer on the speaker queue
14:49:33 [yianni]
Jeff: more accurate to say a de-identified data set has been de-identified to a degree
14:49:44 [yianni]
Peter: more or less risk for re-identification
14:49:55 [johnsimpson]
johnsimpson has joined #dnt
14:50:05 [johnsimpson]
q?
14:50:16 [dwainber_]
dwainber_ has joined #dnt
14:50:17 [aleecia]
Thank you kindly, Alan. Report (rather than press coverage) available from: http://www2.idexpertscorp.com/ponemon2012/
14:50:18 [yianni]
David: disagree what is identified in the first place, what's de-identified and when, we will have disagreement
14:50:36 [johnsimpson_]
johnsimpson_ has joined #dnt
14:50:47 [yianni]
Ed: In a given setting, you can ideally establish some scientific basis that risk is some amount, you have a spectrum of risk
14:50:56 [yianni]
...then you are required to be somewhere on the spectrum
14:50:57 [AHanff]
I think it is important to note that there are no specific types of data which can guarantee non-re-identification, in fact it is never possible to guarantee non re-identification. Data minimisation can make it less likely, but the way these systems work is the data is always increasing not decreasing, which means the risk is continually increasing as the data resolution increases...
14:51:14 [yianni]
...starting point, scientific basis that data can be exploited with a certain probability
14:51:17 [johnsimpson]
johnsimpson has joined #dnt
14:51:28 [efelten__]
efelten__ has joined #dnt
14:51:34 [yianni]
Ed: risk analysis based on sound scientific analysis, not based on what you have done in the past
14:51:46 [yianni]
Chris: process of de-identification, and de-identified data
14:51:54 [johnsimpson]
johnsimpson has joined #dnt
14:52:21 [peterswire_]
peterswire_ has joined #dnt
14:52:21 [johnsimpson_]
johnsimpson_ has joined #dnt
14:52:25 [yianni]
Peter: defining what counts as de-identified sounds like normative stuff we are not agreeing on today, we are trying to develop language and ways to talk about things to have that conversation
14:52:42 [yianni]
Chris: we do not know the degree, we just know de-id is a thing, so let's talk about good practice
14:52:54 [hwest]
q?
14:53:08 [johnsimpson_]
johnsimpson_ has joined #dnt
14:53:26 [yianni]
Paul: once you accept risk, then need to put tools on tables, what are the general uses
14:53:35 [yianni]
...then have conversation of what is an acceptable level of risk
14:53:37 [rvaneijk]
I agree with Ed. The goal is relevant. Wanting to use the data for aggregation is different from trying to accomplish unlinkability
14:53:37 [aleecia]
q?
14:53:38 [johnsimpson]
johnsimpson has joined #dnt
14:53:48 [Chris_IAB]
Chris_IAB has joined #dnt
14:53:53 [aleecia]
ack jmayer
14:53:54 [Wileys]
AHanff -> I disagree, there are levels of de-identification/minimization that guarantee non-re-identification. For example, highly aggregated data sets or highly sparse raw data can both guarantee non-re-identification.
14:54:14 [johnsimpson_]
johnsimpson_ has joined #dnt
14:54:16 [efelten_]
efelten_ has joined #dnt
14:54:16 [yianni]
Jonathan: stick to substance, universe of attack slide, third bullet point
14:54:27 [AHanff]
Wiley, show me the evidence to support that and I will show you a very famous event which shoots it down :)
14:54:46 [efelten__]
efelten__ has joined #dnt
14:54:48 [yianni]
...can reasonably say that risk of some sort of data breach is a lot greater if you leave data on the street than if only the CEO can see it under contract
14:54:53 [peterswire]
peterswire has joined #dnt
14:55:01 [yianni]
...risk is much greater in former, shades of grey are the hard part
14:55:07 [Wileys]
3 people in the world viewed Yahoo.com at a specific moment in time yesterday - please tell me who those people are?
14:55:25 [Wileys]
Have fun AHanff (that's an example of a highly aggregated result)
14:55:31 [peterswire]
peterswire has joined #dnt
14:55:34 [yianni]
...very fact specific things, where real world challenges lie, can we reasonably estimate these sorts of attacks: being hacked, laptop out, rogue employee
14:55:37 [johnsimpson]
johnsimpson has joined #dnt
14:55:48 [yianni]
...if you can predict crime, we all have a much better use of time
14:55:51 [justin]
I don't think we need to argue about whether really-really-really-really hard to reidentify is technically impossible to reidentify. For purposes of this group, whatever you call that, it will suffice to constitute de-identified data.
14:55:59 [yianni]
Khaled: not predicting crime, but good approaches to manage risk
14:56:08 [AHanff]
Wiley, I am glad you chose a search engine, I refer you to the AOL search data which was used to identify anonymous users within 24 hours of being released for "research purposes"
14:56:15 [yianni]
...develop a series of checklists to evaluate point of disclosure
14:56:19 [robsherman1]
robsherman1 has joined #dnt
14:56:22 [johnsimpson_]
johnsimpson_ has joined #dnt
14:56:24 [yianni]
...at the end of day, probabilities can be assigned
14:56:28 [AHanff]
far more anonymised than the data Yahoo has in their logs I should add :)
14:56:28 [Wileys]
Thank you Justin - I agree that arguing absolutes in this case is not helpful - that was my point. :-)
14:56:32 [aleecia]
Justin - I think that's part of the question at hand
14:56:49 [Wileys]
AHanff - completed apple / orange comparison
14:56:52 [yianni]
...based in part on subjective estimates, but mixtures of different things
14:56:53 [Wileys]
completely
14:56:58 [AHanff]
no it isn't
14:56:59 [aleecia]
The AOL mess was *not* data aggregation
14:57:02 [johnsimpson]
johnsimpson has joined #dnt
14:57:13 [yianni]
...the overall answer is that you can do it in a defensible way
14:57:16 [justin]
The question at hand is how many "reallys" you need in front of "hard to reidentify"
14:57:17 [aleecia]
Shane is right on this one. The AOL mess was replacing one unique id with another.
14:57:18 [Zakim]
- +1.646.654.aaii
14:57:21 [felixwu]
felixwu has joined #DNT
14:57:38 [Wileys]
AHanff - AOL was row level specific data with consistent unique identifiers - my example was a highly aggregated result. Not the same
14:57:43 [efelten_]
efelten_ has joined #dnt
14:57:49 [AHanff]
3 people visiting Yahoo yesterday at specific time is not data aggregation either, server logs (probably replicated multiple times for backups across their distributed network) provide very exact data
14:57:51 [Zakim]
- +1.202.587.aabb
14:57:55 [yianni]
Khaled: deliberate re-id, inadvertent - recognize someone they know (a relative)
14:57:58 [robsherman]
robsherman has joined #dnt
14:58:09 [yianni]
...in health care setting, can measure probability that someone knows someone in the database
14:58:22 [johnsimpson_]
johnsimpson_ has joined #dnt
14:58:27 [hwest]
q?
14:58:29 [Mike_Nolet]
Mike_Nolet has joined #dnt
14:58:29 [peterswire]
peterswire has joined #dnt
14:58:46 [yianni]
...Ex. breast cancer, we know the prevalence of breast cancer and average number of friends, we can estimate the chance of inadvertent re-identification
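A rough sketch of how such an estimate could be computed, assuming acquaintances are independent draws from the general population. The formula, the function name, and the example numbers (1% prevalence, 150 acquaintances) are illustrative assumptions, not figures given in the session.

```python
def inadvertent_reid_probability(prevalence: float, acquaintances: int) -> float:
    """Chance that an observer knows at least one person with the condition.

    Assumes each acquaintance independently has the condition with
    probability `prevalence` (a simplifying assumption for illustration).
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    if acquaintances < 0:
        raise ValueError("acquaintances must be non-negative")
    # P(at least one) = 1 - P(none of the acquaintances has the condition)
    return 1.0 - (1.0 - prevalence) ** acquaintances


# Hypothetical inputs: 1% prevalence, 150 acquaintances.
p = inadvertent_reid_probability(prevalence=0.01, acquaintances=150)
print(f"estimated chance of knowing someone in the data: {p:.2f}")
```

The point of the sketch is only that both inputs are measurable, so the inadvertent-recognition risk can be given a number rather than argued in the abstract.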
14:58:55 [peterswire]
peterswire has joined #dnt
14:58:55 [johnsimpson_]
johnsimpson_ has joined #dnt
14:58:58 [robsherman1]
robsherman1 has joined #dnt
14:59:13 [yianni]
...Data breach, organization that loses data, we know that 27% of health care providers have one breach per year
14:59:23 [aleecia]
So wait: 27%, or 94%?
14:59:29 [yianni]
...there are bigger and smaller numbers, but 27% is the most defensible number
14:59:39 [efelten__]
efelten__ has joined #dnt
14:59:41 [johnsimpson]
johnsimpson has joined #dnt
14:59:48 [aleecia]
That's a rather large change of inputs here
14:59:49 [jmayer]
q+
14:59:56 [yianni]
...we can use the 27% number to assign probability
14:59:58 [Wileys]
What does breach have to do with de-identification? Those breaches are to purposely non-de-identified data.
15:00:00 [aleecia]
But not our problem, actually
15:00:15 [yianni]
...demonstration attack - adversary wants to make a point, targeting high risk person
15:00:19 [efelten_]
efelten_ has joined #dnt
15:00:21 [johnsimpson]
johnsimpson has joined #dnt
15:00:22 [yianni]
...all you have to do is identify one person
15:00:26 [Wileys]
+1 to Aleecia
15:00:44 [johnsimpson_]
johnsimpson_ has joined #dnt
15:00:49 [peterswire]
I see jonathan; will call on soon
15:01:11 [yianni]
Khaled: Directly identifying variables, are the fields in HIPAA
15:01:16 [aleecia]
What I've learned: HIPAA's a mess. :-) But we may be able to find useful parts of HIPAA anyway as we sift through this, and it's useful to see what came before.
15:01:22 [efelten_]
efelten_ has joined #dnt
15:01:27 [AHanff]
q+
15:01:39 [johnsimpson_]
johnsimpson_ has joined #dnt
15:01:45 [johnsimpson]
q?
15:01:48 [yianni]
Peter: people may disagree what is directly identified and a quasi-identifier
15:01:55 [yianni]
Khaled: can be different based on context
15:02:10 [peterswire_]
peterswire_ has joined #dnt
15:02:10 [yianni]
...with names remove the names, randomize, generate pseudonyms
15:02:22 [hwest]
q?
15:02:24 [dtauerbach]
q?
15:02:40 [aleecia]
Shane -- I realize I don't know what problem you're trying to solve in your dataset. When you talk about not destroying the value, what value is it you're trying to preserve?
15:02:41 [Wileys]
+1 to generating pseudonyms as acceptable de-identification practice :-)
15:02:43 [johnsimpson]
johnsimpson has joined #dnt
15:03:04 [yianni]
Chris: quasi-identifiers, how about ranges, someone fits within a date range, or geo location? Address in HIPAA
15:03:13 [efelten__]
efelten__ has joined #dnt
15:03:16 [Wileys]
Aleecia - typically longitudinal analytical/research value
15:03:21 [yianni]
Khaled: HIPAA safe harbor, dates converted to years
15:03:42 [justin]
e.g., it's useful to know that a particular user went to Y!, then FB, then ESPN, etc.
15:03:48 [efelten_]
efelten_ has joined #dnt
15:03:48 [yianni]
...when you convert to ranges, you go to expert, you could potentially go to quarter of year or increase to 10 years
15:03:52 [jmayer]
q-
15:03:56 [dtauerbach]
q?
15:04:01 [johnsimpson_]
johnsimpson_ has joined #dnt
15:04:02 [Wileys]
Aleecia - You've already heard this conversation play out between Ed and I (and a few others) on the public email list. :-)
15:04:23 [johnsimpson_]
johnsimpson_ has joined #dnt
15:04:34 [aleecia]
Yes, I've heard and read more than I care to :-) But I couldn't remember what value you were looking for, just the disagreements
15:04:36 [yianni]
Khaled: if you are doing analytics, treat as quasi-identifiers; e.g. for software testing, you cannot get rid of fields, you just randomize
15:04:38 [AHanff]
my question isn't on direct identifiers
15:04:54 [rvaneijk]
q+
15:04:55 [AHanff]
my question is on the 27% figure
15:05:01 [peterswire]
peterswire has joined #dnt
15:05:02 [johnsimpson_]
johnsimpson_ has joined #dnt
15:05:10 [jmayer]
Aleecia - industry participants have never explained the value they hope to achieve in detail. It's one of the reasons we haven't made progress.
15:05:29 [yianni]
Khaled: in Ontario 220 John Smiths, people have common names.
15:05:31 [peterswire]
peterswire has joined #dnt
15:05:35 [Wileys]
Aleecia, outside of permitted uses, the core value sought is analytical (be able to learn and make changes).
15:05:37 [johnsimpson]
johnsimpson has joined #dnt
15:05:42 [yianni]
Ed: In practice every variable is a quasi identifier?
15:05:49 [Wileys]
Jonathan, I thought we had - not sure what more you're looking for.
15:05:52 [efelten__]
efelten__ has joined #dnt
15:06:01 [yianni]
Khaled: no not really
15:06:10 [yianni]
...example, blood pressure
15:06:12 [aleecia]
And you're likely to have a question now that can be answered from data 5 years ago? 2 years ago?
15:06:15 [rvaneijk]
would like to bridge the quasi-identifier to the EU perspective... (queue)
15:06:26 [efelten_]
efelten_ has joined #dnt
15:06:26 [yianni]
Ed: blood pressure is better than gender
15:06:29 [aleecia]
My concern is that your answer there is you don't know
15:06:38 [yianni]
Khaled: what is the chance of adversary knowing your blood pressure
15:06:38 [aleecia]
Because, you likely cannot
15:06:51 [johnsimpson]
johnsimpson has joined #dnt
15:06:57 [efelten_]
efelten_ has joined #dnt
15:06:58 [yianni]
Ed: the odds my provider will know my blood pressure is high
15:07:00 [robsherman]
robsherman has joined #dnt
15:07:17 [Wileys]
Aleecia - some researchers at Yahoo! find tremendous value in long-term data as an indicator for near-term data - interesting learnings and value there.
15:07:23 [johnsimpson_]
johnsimpson_ has joined #dnt
15:07:30 [yianni]
Khaled: hospital can look at, and different controls to stop re-identification
15:07:57 [johnsimpson_]
johnsimpson_ has joined #dnt
15:07:58 [yianni]
Peter: how likely someone on outside has access to that information and how likely it is to be a match?
15:08:05 [robsherman]
robsherman has joined #dnt
15:08:14 [Wileys]
Aleecia - a simple example is spelling correction - due to the long tail of possible searches it can take many years to build enough data to predict outcomes for rare terms.
15:08:18 [rvaneijk]
is anyone monitoring the queue?
15:08:29 [yianni]
Ed: re-identification is connecting individual to information
15:08:41 [aleecia]
I'm sure there is. But if you pull back to a very simple view, you're suggesting that users ask for more privacy, Y! says they will provide more privacy, and then you will retain and study that user. That's a hard thing to explain to a user who just wants to be left alone.
15:08:41 [Wileys]
Rob, Peter said in IRC that he'd be coming to the queue soon but that was quite awhile ago
15:08:47 [johnsimpson]
johnsimpson has joined #dnt
15:08:49 [yianni]
Khaled: all laws protect against identity disclosure, no laws protect against attribute disclosure
15:09:10 [peterswire_]
peterswire_ has joined #dnt
15:09:17 [johnsimpson_]
johnsimpson_ has joined #dnt
15:09:24 [efelten__]
efelten__ has joined #dnt
15:09:44 [yianni]
...If I release a data set and you get attribute disclosure, laws do not prohibit it, it's just statistics
15:09:45 [peterswire_]
peterswire_ has joined #dnt
15:09:48 [vincent]
Wileys, with the spelling correction example, high level aggregation and short term retention are not enough?
15:09:50 [dtauerbach]
q+
15:09:54 [Wileys]
Aleecia, I'd argue that once the data is deidentified that user is being left alone - we're now just using an unlinkable data point to improve our services. What are our rights in providing the free service? The most paranoid users need not use our services if we fairly call out that we use data in this way. Fair?
15:09:56 [efelten_]
efelten_ has joined #dnt
15:09:59 [johnsimpson]
johnsimpson has joined #dnt
15:10:01 [aleecia]
The spelling example is a nice one, thanks. I'm sure there are many, many others. I just don't know how to get you what you want while still actually honoring DNT
15:10:05 [yianni]
...Different governance mechanisms to manage attribute disclosure, but not what we are talking about today
15:10:23 [justin]
WileyS, not sure that's the best example. That's first party data that can be stripped of identifiers immediately without significantly diminishing value (like Google Flu Trends).
15:10:24 [johnsimpson]
johnsimpson has joined #dnt
15:10:24 [yianni]
Ed: arguably the most important aspect of privacy disclosure is not even covered?
15:10:43 [Wileys]
Vincent, not short-term retention (not enough volume on rare terms) - but data minimization and de-identification do accomplish the risk minimization goal
15:10:49 [schunter]
schunter has joined #dnt
15:10:52 [yianni]
Khaled: cannot predict inferences from data sets; the more you control attribute disclosure, the more you destroy data utility; best to manage with governance
15:10:55 [johnsimpson]
johnsimpson has joined #dnt
15:10:56 [AHanff]
Wileys - no absolutely not fair - first of all what right do you have to label privacy aware users as paranoid - secondly, are you therefore saying people who value privacy should be excluded from digital society?
15:11:10 [Wileys]
Justin, agreed - for that use case, that's a great de-identification approach.
15:11:17 [yianni]
Peter: direct identifiers (phone numbers), quasi identifiers (people on outside can make guesses)
15:11:30 [hwest]
q?
15:11:38 [johnsimpson_]
johnsimpson_ has joined #dnt
15:11:38 [robsherman1]
robsherman1 has joined #dnt
15:11:42 [aleecia]
I'm pretty sure that saying "we're honoring your request for privacy, but we're still logging everything you did and using it" isn't what users will consider fair. Which, to be clear, matters a lot more than what I think is fair.
15:11:44 [johnsimpson]
q?
15:11:47 [Wileys]
Justin, you do need to keep a few data elements around to help provide context (language, country of search, etc.)
15:12:00 [AHanff]
q-
15:12:03 [yianni]
...Third thing, attribute disclosure
15:12:09 [peterswire_]
I see the q
15:12:20 [Wileys]
Aleecia, I believe the de-identification removes the "you" in 'everything you did' in your statement
15:12:30 [peterswire]
peterswire has joined #dnt
15:12:32 [johnsimpson_]
johnsimpson_ has joined #dnt
15:12:46 [AHanff]
what you believe is not what regulators and the general public believe, which I think is Aleecia's point
15:12:47 [aleecia]
Which is where you and Ed have gone many rounds, and I do disagree with your conclusions there.
15:12:48 [yianni]
Ed: list of a hundred records and I know one is yours, and all have that diagnosis, I know the attribute without actually identifying
15:12:57 [peterswire]
peterswire has joined #dnt
15:12:57 [justin]
WileyS, Right, that seems fair, but the re-ID risk seems almost impossibly low.
15:12:58 [yianni]
Joe: that's 100% , others are fuzzier
15:13:01 [peterswire]
attribute disclosure as an important distinction says ed felten
15:13:11 [yianni]
Ed: are we trying to protect against attribute disclosure?
15:13:19 [johnsimpson_]
johnsimpson_ has joined #dnt
15:13:35 [Wileys]
Justin - agreed, for that use case - many other use cases aren't as clean cut - that's why it's a good point to start there and go deeper.
15:13:36 [yianni]
Khaled: precedent in research world for attribute disclosure: IRB
15:13:40 [aleecia]
I do agree that there are ways to do aggregation to a level as to remove the "you." I do not think that replacing one unique identifier with another unique identifier (hashing) is going to remove the "you"
15:13:50 [yianni]
...restricts how you do studies, committee oversees
15:13:59 [johnsimpson]
johnsimpson has joined #dnt
15:14:03 [rvaneijk]
q-
15:14:05 [Wileys]
AHanff, could you please source your position? Regulator and general public studies?
15:14:09 [yianni]
...have a mechanism to agree on the types of inferences you will permit, certain things would be off limits
15:14:16 [vincent]
Wileys, I thought Yahoo removes rare terms anyway? are there examples where yahoo is actually a third party?
15:14:28 [yianni]
Joe: risks to population of inference versus benefits?
15:14:50 [AHanff]
wileys, regulators, a29wp, eu commission, eu parliamentarians, members of public all people I have worked with and discussed these issues with over the past 6 years
15:14:51 [Wileys]
Aleecia, as long as there is no way back to the original user, then I believe the desired outcome has been met (no more 'you')
15:14:59 [robsherman]
robsherman has joined #dnt
15:15:03 [johnsimpson_]
johnsimpson_ has joined #dnt
15:15:03 [yianni]
Khaled: no legislative requirement to worry about attribute disclosure
15:15:05 [AHanff]
except you of course :)
15:15:21 [Wileys]
AHanff, very much an area of active disagreement - I agree that one extreme side of that debate equates to your position
15:15:30 [yianni]
Felix: We are concerned about inferences about a large number of people, but that is different from inferences about one particular person
15:15:40 [robsherman1]
robsherman1 has joined #dnt
15:15:40 [efelten__]
efelten__ has joined #dnt
15:15:40 [johnsimpson]
johnsimpson has joined #dnt
15:15:42 [peterswire]
person is in the group, and can draw inference about them -- attribute disclosure
15:15:46 [yianni]
Khaled: can draw inferences about group memberships, and you belong to that group
15:15:53 [Wileys]
Vincent, Yahoo! runs one of the largest 3rd party ad networks on the internet :-)
15:16:07 [AHanff]
well absolutely every person I have ever discussed these issues with apart from advertisers, is in that "extreme" - which would suggest that the extreme is actually your segment not mine ;)
15:16:13 [efelten_]
efelten_ has joined #dnt
15:16:14 [peterswire]
peterswire has joined #dnt
15:16:28 [yianni]
Felix: IRB - mitigates discriminating against large group, not concerned about attribute disclosure to specific individual, even if group is not sensitive
15:16:41 [peterswire]
q?
15:16:50 [yianni]
Khaled: depends on type of study and what harm that can happen to those individuals or at the group level
15:16:58 [Wileys]
AHanff - disagree - if everyone agreed with you then no one would be using online service supported by 3rd party advertising
15:17:00 [johnsimpson]
johnsimpson has joined #dnt
15:17:00 [robsherman]
robsherman has joined #dnt
15:17:07 [yianni]
Dan: Quasi-identifiers: why is not everything a quasi-identifier?
15:17:19 [efelten__]
efelten__ has joined #dnt
15:17:26 [johnsimpson]
johnsimpson has joined #dnt
15:17:27 [yianni]
Khaled: have to take into account probability that adversary will have information; for some fields there is no probable path to get that information
15:17:29 [aleecia]
Shane - one of the evolutions we're watching is going from "we need to identify a user by name" as what counts for a "you" to "we need to be able to distinguish a single person" such that a GUID counts for a "you"
15:17:37 [AHanff]
Wileys, that is a completely invalid response - the VAST majority of digital citizens have no idea that any of this is going on and when they find out, they are outraged
15:17:42 [yianni]
...has to be information that is generally available
15:17:45 [AHanff]
there are countless examples to support that
15:17:46 [aleecia]
swapping one GUID for another doesn't actually advance privacy
15:17:53 [aleecia]
that's not fair -
15:17:56 [vincent]
Wileys, glad to hear :) but how is that related to my question? I was asking for examples of analytical/research that need pseudonymous data and where yahoo is involved as a third party, not a search engine
15:18:01 [aleecia]
doesn't advance it by much.
15:18:19 [Wileys]
Aleecia - GUID goes one step further than I'm suggesting as that implies it is still "linkable" in a production system.
15:18:27 [yianni]
Mike: What about the practical, how difficult is that inference? (large number of records)
15:18:38 [johnsimpson_]
johnsimpson_ has joined #dnt
15:18:42 [efelten]
efelten has joined #dnt
15:18:47 [Wileys]
Vincent, anything and everything to do with being a better ad network.
15:18:52 [aleecia]
That's what I was just correcting. I agree, there is a minor improvement there, but not enough as to practically matter much.
15:18:54 [yianni]
Khaled: depends on fields you have in data base, and how accurate would the inference be, never count against statistics
15:19:03 [dwainber_]
q?
15:19:16 [yianni]
...attribute disclosure has to be managed, cannot do so technically without destroying data
15:19:25 [Wileys]
AHanff, please reference studies of consumer "outrage"
15:19:27 [dtauerbach]
ack dtauerbach
15:19:31 [yianni]
...need to have different oversight, evidence so far that is what works
15:19:42 [hwest]
hwest has joined #dnt
15:19:43 [dwainber_]
q?
15:19:51 [peterswire]
peterswire has joined #dnt
15:19:56 [hwest]
hwest has joined #dnt
15:20:09 [yianni]
...In practice, you do not get all of the fields in databases (focus on 6-10 fields), for longitudinal data, repeated over multiple visits
15:20:28 [yianni]
...surveys are more complicated, can deal with database with 100 quasi-identifiers
15:20:28 [johnsimpson_]
johnsimpson_ has joined #dnt
15:20:34 [aleecia]
Shane - let me do a thought experiment. I think we agree that if I got my hands on the raw server logs at Y! that would contain a set of "you"s, and not be non-identified.
15:20:36 [yianni]
Dan: only need to know one thing
15:20:44 [AHanff]
Wileys I don't need to, they are there in the public eye - instagram, path, phorm, nebuad, facebook etc etc etc
15:20:49 [AHanff]
there is a new one just about every week
15:20:58 [yianni]
Khaled: chance of adversary knowing 5 things or 10 things, chance they know all 100 is very low
15:21:07 [johnsimpson]
johnsimpson has joined #dnt
15:21:42 [johnsimpson_]
johnsimpson_ has joined #dnt
15:21:45 [yianni]
...choose a number that is defensible (unlikely to know 30 fields)
15:22:16 [Wileys]
Aleecia, depends - if you're suggesting a de-identified data set, you'd find a one-way secret hashed identifier that has been truncated by 50% to purposely create noise (salt). So there is "an" identifier there - but it links to nothing in production systems.
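A minimal sketch of the kind of identifier described in the message above: a keyed one-way hash of the original identifier, truncated to half its length. The choice of HMAC-SHA256, the key, and the function name are assumptions made for illustration and are not a description of any actual Yahoo! system.

```python
import hashlib
import hmac


def pseudonymize(user_id: str, secret_key: bytes) -> str:
    """Return a truncated keyed hash of `user_id` (illustrative only)."""
    # Keyed one-way hash: without the secret key the mapping cannot be recomputed.
    digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256).hexdigest()
    # Drop half of the digest, so distinct identifiers may collide.
    return digest[: len(digest) // 2]


key = b"hypothetical-secret-key"  # placeholder; a real key would be kept secret
first = pseudonymize("guid-1234", key)
second = pseudonymize("guid-1234", key)
print(first == second)
```

The output is deterministic for a given key, so every record belonging to one original identifier carries the same pseudonym. The records remain linkable to each other even though the pseudonym itself does not appear in the source data.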
15:22:44 [peterswire_]
peterswire_ has joined #dnt
15:22:44 [Wileys]
AHanff - thank you for the conversation, I have a good sense of your perspective and ability to defend your statements now.
15:22:46 [yianni]
Khaled: three types of risk
15:22:51 [johnsimpson]
johnsimpson has joined #dnt
15:23:02 [efelten_]
efelten_ has joined #dnt
15:23:06 [yianni]
...are you going to re-identify individual in data set, or are you going to match two databases
15:23:11 [AHanff]
You should talk to your colleague Justin before discounting my arguments, we know each other very well
15:23:16 [peterswire]
peterswire has joined #dnt
15:23:17 [yianni]
...are you considering maximum risk or average risk (very different)
15:23:25 [aleecia]
If you took that raw data over a year (nothing magic, just picking a specific example) and gave me one half of the data raw, and one half you had transformed by replacing GUIDs with your hashed id, I would be able to map between the raw and the hashed data sets.
15:23:29 [yianni]
...when talking about demonstration attack worry about maximum risk
15:23:44 [yianni]
...with inadvertent, you can use average risk
15:23:53 [johnsimpson]
johnsimpson has joined #dnt
15:23:53 [yianni]
...what are the appropriate thresholds?
15:23:57 [aleecia]
So when you say there is no link to the production system, I disagree.
15:24:00 [Wileys]
Aleecia - we keep the datasets completely separate with strict access controls, policy, training, etc. - you wouldn't get both.
15:24:26 [AHanff]
oh my, how many times have I head that one and then seen humble pie served lol
15:24:26 [yianni]
...In practice, the highest risk used is .33 to as low as .05
15:24:28 [aleecia]
A different and possibly useful approach, but they *are* linked.
15:24:29 [Wileys]
But that is our risk to manage since we make the statement the data is deidentified.
15:24:30 [AHanff]
heard*
15:24:33 [johnsimpson_]
johnsimpson_ has joined #dnt
15:24:44 [efelten_]
efelten_ has joined #dnt
15:24:48 [yianni]
...No one releases data with a risk higher than .33, increased precedence for other values
15:25:05 [johnsimpson]
johnsimpson has joined #dnt
15:25:19 [yianni]
...practical range (court cases, regulatory authorities), choose one of four: .33, .2, .09, .05
15:25:28 [johnsimpson]
johnsimpson has joined #dnt
15:25:32 [yianni]
...no scientific way to choose value, based on past use and changed over time
15:25:50 [hwest]
q?
15:25:58 [yianni]
....09 and .05 are used in public disclosure
15:25:59 [peterswire]
peterswire has joined #dnt
15:26:11 [aleecia]
There might exist something in there I could reluctantly live with while really not liking. :-) (And there might not.) What I'll put my body on the tracks for is the idea that you could then publicly release that data.
15:26:13 [yianni]
....33 and .2 are for releases to trusted business partners
15:26:21 [johnsimpson_]
johnsimpson_ has joined #dnt
15:26:28 [yianni]
...these thresholds are to protect against demonstration attack
15:26:30 [Chris_IAB]
Has this deck (being presented currently) been placed into the W3C record?
15:26:42 [justin]
Chris_IAB, it's in the mail archives.
15:26:44 [yianni]
...all known attacks have been conducted by academics and media
15:26:46 [Wileys]
Aleecia - we have yet another de-identification process for data we release to researchers - so I absolutely agree with you!
15:26:49 [dwainber_]
q+
15:27:06 [yianni]
...this is maximum risk, no one has a higher risk of re-identification than the level
15:27:07 [johnsimpson]
johnsimpson has joined #dnt
15:27:11 [Wileys]
Chris, it went out to the public mailing list so it's now recorded.
15:27:38 [yianni]
...In practice, these numbers are conservative: data changes, imperfect data cause errors
15:27:55 [yianni]
...the numbers used are ceilings on risk, real risks are lower
15:27:57 [aleecia]
Shane - could you describe the de-identification for researchers?
15:28:19 [johnsimpson_]
johnsimpson_ has joined #dnt
15:28:35 [yianni]
...Cell sizes: 3, 5, 11, 20
15:28:56 [yianni]
...the smallest cell sizes (population cell sizes), may be smaller in a sample
15:29:14 [johnsimpson_]
johnsimpson_ has joined #dnt
15:29:20 [Zakim]
+ +1.215.286.aajj
15:29:22 [yianni]
...If you create a population with cell size of 5, you can take a sample and have a lower cell size
15:29:29 [peterswire]
peterswire has joined #dnt
15:29:37 [yianni]
...number of individuals with same cell of quasi identifiers
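The cell sizes 3, 5, 11 and 20 line up with the thresholds .33, .2, .09 and .05 if a record's risk is taken as one over the number of records sharing its quasi-identifier values. A small sketch under that reading follows; the field names and records are made up for illustration.

```python
from collections import Counter


def reidentification_risk(records, quasi_identifiers):
    """Return (maximum_risk, average_risk), with per-record risk = 1 / cell size."""
    # A "cell" is the group of records sharing the same quasi-identifier values.
    cells = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    if not cells:
        raise ValueError("no records")
    # Maximum risk is driven by the smallest cell.
    max_risk = 1 / min(cells.values())
    # Averaging 1/size over all records reduces to (number of cells) / (number of records).
    avg_risk = len(cells) / sum(cells.values())
    return max_risk, avg_risk


# Hypothetical data: three cells of sizes 11, 20 and 5.
records = (
    [{"birth_year": 1970, "sex": "F"}] * 11
    + [{"birth_year": 1970, "sex": "M"}] * 20
    + [{"birth_year": 1985, "sex": "F"}] * 5
)
max_risk, avg_risk = reidentification_risk(records, ["birth_year", "sex"])
print(max_risk, round(avg_risk, 3))
```

Here the smallest cell has 5 records, so this data set would meet a .2 threshold on maximum risk but not .09 or .05, while its average risk is lower than its maximum.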
15:29:44 [yianni]
Ed: have to assume quasi identifiers
15:29:48 [johnsimpson]
johnsimpson has joined #dnt
15:29:53 [justin]
q?
15:30:01 [peterswire]
peterswire has joined #dnt
15:30:01 [yianni]
Khaled: only a small subset of variables in data set are quasi identifiers
15:30:19 [Wileys]
Aleecia - it varies based on the nature of the dataset but general attributes are: older data, no identifiers, data sets highly numerized (example, instead of showing actual category of music, we show only a number representing a category but give no information to provide context for that category).
15:30:49 [yianni]
David: with a cell size of 11, there is a 9% probability of a record being re-identified?
15:30:51 [johnsimpson]
johnsimpson has joined #dnt
15:31:10 [yianni]
...any single record or one record out of the whole?
15:31:11 [johnsimpson_]
johnsimpson_ has joined #dnt
15:31:36 [moneill2]
moneill2 has joined #dnt
15:31:48 [yianni]
Jeff: are 9% of the records identifiable? Public databases have 9% chance of re-identification.
15:31:57 [johnsimpson]
johnsimpson has joined #dnt
15:32:27 [johnsimpson_]
johnsimpson_ has joined #dnt
15:32:36 [yianni]
Peter: there has never been a re-identification of properly de-identified database, but 9% risk?
15:32:40 [Zakim]
+[IPcaller.a]
15:32:58 [Wileys]
+q
15:33:02 [peterswire]
peterswire has joined #dnt
15:33:04 [johnsimpson]
johnsimpson has joined #dnt
15:33:07 [yianni]
Joe: demonstration attack on HHS database de-identified?
15:33:33 [peterswire_]
peterswire_ has joined #dnt
15:33:36 [johnsimpson]
johnsimpson has joined #dnt
15:33:36 [moneill2]
zakim, [ipcaller] is me
15:33:36 [Zakim]
+moneill2; got it
15:33:45 [yianni]
Khaled: the hit rates of re-identification are much lower than those values, never have been able to re-identify at a rate higher than the threshold.
15:34:08 [peterswire]
peterswire has joined #dnt
15:34:10 [johnsimpson]
johnsimpson has joined #dnt
15:34:24 [yianni]
Felix: if you start guessing, you will be right 9% of time, do I care if I know?
15:34:37 [peterswire]
peterswire has joined #dnt
15:34:52 [yianni]
Rob: if I were to guess randomly, I would get some right randomly
15:34:54 [johnsimpson]
johnsimpson has joined #dnt
15:35:10 [jmayer]
q+
15:35:20 [johnsimpson]
johnsimpson has joined #dnt
15:35:21 [peterswire_]
peterswire_ has joined #dnt
15:35:27 [yianni]
Felix: you would not know you are right, but you could guess 9%.
15:35:28 [jmayer]
This is assuming complete l-diversity among the group?
15:35:44 [aleecia]
Shane - that sounds a lot closer to what would be reasonable to provide to users who turn on DNT
15:35:49 [johnsimpson]
johnsimpson has joined #dnt
15:35:50 [peterswire]
peterswire has joined #dnt
15:36:10 [hwest]
ack dwainber_
15:36:14 [yianni]
Khaled: with unlimited resources, they could verify, but expensive
15:36:17 [johnsimpson]
johnsimpson has joined #dnt
15:36:53 [yianni]
Khaled: how do you choose one of four values?
15:36:59 [johnsimpson]
johnsimpson has joined #dnt
15:37:15 [yianni]
...public you use .05 or .09. If not public, you look at a number of other factors
15:37:16 [mnolet]
mnolet has joined #dnt
15:37:26 [yianni]
...if a company has good controls, not as worried about a rogue employee
15:37:34 [johnsimpson]
johnsimpson has joined #dnt
15:37:38 [dtauerbach]
i think the wifi in the room isn't great, i suspect that's the reason
15:37:41 [yianni]
David: do you look at sensitivity of data?
15:37:41 [justin]
We'll see what we can do during the break.
15:37:46 [johnsimpson]
I am not doing anything.. Don't know why it is happening
15:38:12 [johnsimpson_]
johnsimpson_ has joined #dnt
15:38:17 [yianni]
Khaled: three things to look at: sensitivity, potential harm, and consent
15:38:20 [peterswire]
peterswire has joined #dnt
15:38:32 [yianni]
...motives managed with contract
15:38:44 [yianni]
...with academics and journalists, there is a motive to re-identify
15:38:44 [peterswire]
peterswire has joined #dnt
15:38:53 [johnsimpson]
johnsimpson has joined #dnt
15:39:02 [yianni]
...there are checklists for doing this process.
15:39:16 [yianni]
...need a repeatable process to evaluate all of the factors
15:39:36 [yianni]
Chris: is there ever a scenario that there is zero risk if you release data?
15:39:37 [johnsimpson]
johnsimpson has joined #dnt
15:39:46 [yianni]
Khaled: no
15:40:07 [jmayer]
...but there are systems that can give rigorous bounds on risk if you release data.
15:40:12 [yianni]
Peter: threat models, why would someone attack here, how capable (money, show you're smart)
15:40:31 [johnsimpson_]
johnsimpson_ has joined #dnt
15:40:31 [yianni]
...might be commercial reasons, upset employees, think of all the reasons why people might attack
15:40:51 [yianni]
...why do we care here, what are the harms, are they very sensitive
15:41:11 [johnsimpson_]
johnsimpson_ has joined #dnt
15:41:23 [Wileys]
Aleecia - I understand that is your perspective of what DNT should mean - as you know I disagree with that position and would interpret a DNT to mean something different (no profiling, not 'no analytics')
15:41:24 [yianni]
...different values of invasion of privacy: complete browsing history available to FBI may upset some advocates
15:41:29 [aleecia]
I don't think the FBI is the worst thing possible - we operate in an international climate
15:41:36 [peterswire]
peterswire has joined #dnt
15:41:44 [yianni]
...other end of spectrum: not a big deal, no one would care about browsing, little harm or risk around it
15:41:54 [johnsimpson_]
johnsimpson_ has joined #dnt
15:42:04 [yianni]
...assume different views on invasion of privacy.
15:42:14 [yianni]
...Left side of slide: mitigating controls
15:42:24 [robsherman1]
robsherman1 has joined #dnt
15:42:30 [yianni]
...a lot of discussion on de-identification has been on publicly disclosed databases
15:42:32 [johnsimpson]
johnsimpson has joined #dnt
15:42:52 [yianni]
...if you post on internet, smart people will attack, that is purely technical protection
15:43:11 [yianni]
...most of the stuff we are talking about is different: secret databases, set of administrative controls
15:43:14 [johnsimpson]
johnsimpson has joined #dnt
15:43:30 [yianni]
...privacy act talks about technical, administrative and physical safeguards
15:43:36 [johnsimpson]
johnsimpson has joined #dnt
15:43:39 [aleecia]
Shane - we started this with the idea that DNT would limit collection of data. If we actually did that, I'd relax in other areas. But right now we're talking about no reduction in collection at all. My fear is that we build a system that is deceptive :-)
15:43:44 [efelten]
efelten has joined #dnt
15:43:56 [yianni]
...that is how a lot of the data protections take place today
15:44:14 [hwest]
q?
15:44:17 [aleecia]
When I talk to users, their main concern is not profiling, it's the data collection itself
15:44:17 [johnsimpson]
johnsimpson has joined #dnt
15:44:24 [Wileys]
Aleecia - as long as we're clear with users and the world on exactly what DNT means and how data will be handled then we won't be deceptive
15:44:25 [aleecia]
And we're not going to help them with that
15:44:27 [yianni]
...all the different variables would feed into how we think about de-identification
15:44:30 [Wileys]
ack wileys
15:44:46 [efelten_]
efelten_ has joined #dnt
15:44:50 [johnsimpson]
johnsimpson has joined #dnt
15:44:59 [peterswire]
peterswire has joined #dnt
15:45:00 [robsherman1]
q?
15:45:02 [aleecia]
ack jmayer
15:45:19 [johnsimpson]
q?
15:45:29 [peterswire]
peterswire has joined #dnt
15:45:30 [yianni]
Jonathan: factors that could contribute to or mitigate risk, but no way to eliminate risk
15:45:32 [aleecia]
Shane - I agree that being clear is necessary. I disagree that it is sufficient
15:45:43 [yianni]
...we do have ways to put rigorous bounds on risk developed by computer scientists
15:45:57 [AHanff]
With respect, privacy and data protection are not the same thing. Privacy rights don't exist merely to manage risk; there are rights based around people's desire to lead a private life. So it is irrelevant to say that if data is de-identified it is OK because there is no risk; people have a right (under law in Europe and elsewhere) to refuse to have that data collected in the first place.
15:46:00 [yianni]
...we can determine just how much the best adversary can accomplish
15:46:03 [aleecia]
If we carefully document that DNT does nothing at all, that's not sufficient :-)
15:46:09 [johnsimpson]
johnsimpson has joined #dnt
15:46:32 [Wileys]
AHanff, you're overstating EU law
15:46:36 [johnsimpson]
johnsimpson has joined #dnt
15:47:00 [johnsimpson_]
johnsimpson_ has joined #dnt
15:47:01 [AHanff]
actually no I am not, would you like me to quote it verbatim, I worked on it so I know it pretty well...
15:47:04 [yianni]
...techniques for rigorous bounds: differential privacy, body of writing on developing advertising analytics without following users around
15:47:12 [Wileys]
Aleecia, so we agree on being clear, we disagree on the level of data "scrubbing" that comes with a DNT signal. Progress... :-)
15:47:20 [Zakim]
+SusanIsrael
15:47:26 [yianni]
...lets make marginal gains, some are more rigorously oriented
15:47:28 [susanisrael]
susanisrael has joined #dnt
15:47:32 [efelten_]
efelten_ has joined #dnt
15:47:35 [johnsimpson_]
johnsimpson_ has joined #dnt
15:47:35 [justin]
There was disagreement that we should be clear before?
15:47:37 [aleecia]
I think you're even agreeing that being clear is not all that's needed
15:47:48 [jmayer]
s/lets make/some propose/
15:47:57 [Wileys]
AHanff, please share EU case law that supports your position - not your subjective interpretation of the written law.
15:47:59 [jmayer]
q+
15:48:03 [jmayer]
q-
15:48:09 [Wileys]
Aleecia - agreed :-)
15:48:09 [yianni]
Khaled: the managing risk slide is operational
15:48:27 [Zakim]
-[IPcaller.a]
15:48:31 [johnsimpson]
johnsimpson has joined #dnt
15:48:34 [peterswire_]
peterswire_ has joined #dnt
15:48:41 [aleecia]
breakfast time, yay
15:48:42 [Zakim]
- +1.215.286.aajj
15:48:52 [Zakim]
-Aleecia
15:48:56 [Zakim]
-vincent
15:49:00 [johnsimpson]
johnsimpson has joined #dnt
15:49:03 [peterswire]
peterswire has joined #dnt
15:49:23 [schunter]
schunter has joined #dnt
15:49:46 [dwainberg]
dwainberg has joined #dnt
15:50:00 [johnsimpson_]
johnsimpson_ has joined #dnt
15:50:23 [robsherman1]
robsherman1 has joined #dnt
15:50:34 [johnsimpson_]
johnsimpson_ has joined #dnt
15:51:02 [Zakim]
-rvaneijk
15:51:04 [robsherman]
robsherman has joined #dnt
15:51:04 [johnsimpson]
johnsimpson has joined #dnt
15:51:45 [johnsimpson]
johnsimpson has joined #dnt
15:51:50 [peterswire]
peterswire has joined #dnt
15:52:16 [johnsimpson]
johnsimpson has joined #dnt
15:52:16 [susanisrael]
zakim, aajj is susanisrael
15:52:16 [Zakim]
sorry, susanisrael, I do not recognize a party named 'aajj'
15:52:17 [peterswire]
peterswire has joined #dnt
15:52:40 [susanisrael]
zakim, 215 286 aajj is susanisrael
15:52:40 [Zakim]
I don't understand '215 286 aajj is susanisrael', susanisrael
15:53:02 [johnsimpson_]
johnsimpson_ has joined #dnt
15:53:23 [susanisrael]
npdoty can you help me advise zakim that my phone number is 215 286 aajj
15:53:42 [Zakim]
- +1.646.722.aagg
15:54:02 [johnsimpson_]
johnsimpson_ has joined #dnt
15:54:25 [johnsimpson__]
johnsimpson__ has joined #dnt
15:54:55 [johnsimpson]
johnsimpson has joined #dnt
15:55:02 [peterswire]
peterswire has joined #dnt
15:55:12 [Zakim]
+[IPcaller]
15:55:23 [johnsimpson]
johnsimpson has joined #dnt
15:55:32 [moneill2]
zakim, [IPCaller] is me
15:55:32 [Zakim]
+moneill2; got it
15:55:52 [johnsimpson_]
johnsimpson_ has joined #dnt
15:55:56 [susanisrael]
zakim, [215 286 aajj] is me
15:55:56 [Zakim]
I don't understand '[215 286 aajj] is me', susanisrael
15:55:58 [robsherman]
robsherman has joined #dnt
15:56:00 [johnsimpson_]
test
15:56:09 [efelten]
efelten has joined #dnt
15:56:39 [Zakim]
-SusanIsrael
15:56:40 [johnsimpson_]
Shane, problem was the network we were on. Changed network.
15:56:53 [efelten_]
efelten_ has joined #dnt
15:56:54 [Paul]
Paul has joined #DNT
15:56:57 [robsherman1]
robsherman1 has joined #dnt
15:56:58 [johnsimpson_]
hope this is steadier
15:56:58 [susanisrael]
npdoty: can you help me communicate with zakim about my phone number? i don't seem to have the syntax right.
15:57:18 [johnsimpson__]
johnsimpson__ has joined #dnt
15:57:30 [Wileys]
John - that didn't seem to do the trick
15:57:42 [vincent]
vincent has joined #dnt
15:57:58 [Wileys]
Hard to follow anything on IRC today with so many connect/disconnect events being thrown up.
15:58:09 [johnsimpson_]
johnsimpson_ has joined #dnt
15:58:36 [peterswire_]
peterswire_ has joined #dnt
15:58:43 [robsherman]
robsherman has joined #dnt
15:58:49 [Zakim]
+??P24
15:59:09 [peterswire_]
peterswire_ has joined #dnt
15:59:10 [Zakim]
+rvaneijk
15:59:16 [dwainber_]
dwainber_ has joined #dnt
15:59:33 [vincent]
zakim, ??P24 is vincent
15:59:33 [Zakim]
+vincent; got it
15:59:35 [yianni]
Peter: Mike had comment on last slide
15:59:40 [Zakim]
+SusanIsrael
15:59:44 [JoeHallCDT]
ok, how do I scribe nick me?
15:59:56 [yianni]
Scribe: JoeHallCDT
15:59:58 [justin]
scribenick: joehallcdt
16:00:06 [robsherman]
robsherman has joined #dnt
16:00:14 [moneill2]
cookies are not anonymous, they pinpoint an individual/device
16:00:18 [Chris_IAB]
Chris_IAB has joined #dnt
16:00:19 [hwest]
hwest has joined #dnt
16:00:36 [JoeHallCDT]
scribe: JoeHallCDT
16:00:49 [jeffwilson]
jeffwilson has joined #dnt
16:00:52 [robsherman1]
robsherman1 has joined #dnt
16:01:55 [robsherman]
robsherman has joined #dnt
16:02:00 [peterswire]
peterswire has joined #dnt
16:02:38 [JoeHallCDT]
q?
16:02:52 [JoeHallCDT]
Peter: we're not going to debate how strict a standard is
16:02:59 [JoeHallCDT]
… let's imagine a three-step model
16:03:20 [JoeHallCDT]
… super strict standard for De-ID, a middle ground and no de-ID
16:03:27 [justin]
Speaker was Mike Nolet from AppNexus
16:03:34 [mnolet]
mnolet has joined #dnt
16:03:39 [JoeHallCDT]
thx
16:04:07 [dwainber_]
q+
16:04:26 [felixwu]
felixwu has joined #DNT
16:04:32 [JoeHallCDT]
… there are choices for businesses to give up a de-ID'd approach if the cost is too high
16:04:45 [JoeHallCDT]
Mike Nolet: it's not as much cost as competition
16:04:55 [JoeHallCDT]
… some companies are getting into third-party advertising
16:05:50 [moneill2]
identifiers in cookies are PII in Europe
16:06:06 [jmayer]
q+
16:06:09 [JoeHallCDT]
Marc Groman: truly believe that the standard we're discussing will have unintended consequences
16:06:20 [JoeHallCDT]
… some of the things we propose may have a net-negative impact on privacy
16:06:24 [susanisrael]
*Joehallcdt if you want me to scribe let me know
16:06:25 [jmayer]
So, about that de-identification topic...
16:06:42 [JoeHallCDT]
… the notion that opt-in consent is all that's needed to over-collect
16:07:07 [JoeHallCDT]
Peter: we did start with a discussion of incentives for de-ID
16:07:18 [JoeHallCDT]
… one was compliance with NAI, etc, codes
16:07:20 [moneill2]
You have to say what data you gather and what you intend to do with it to get consent
16:07:27 [justin]
The FTC sees cookies and IP addresses as "personal information" as well. All information is personal, but some is more personal than others.
16:07:29 [robsherman1]
robsherman1 has joined #dnt
16:07:30 [dwainberg]
dwainberg has joined #dnt
16:08:06 [justin]
There is a value in incentivizing companies to keep data at pseudonymous instead of real-name identifiers.
16:08:08 [JoeHallCDT]
Paul Glist: if we follow de-ID as a privacy protective tool, we can't say that a cookie is PII
16:08:27 [efelten]
There is no notion of PII in this standard.
16:08:31 [justin]
But this is somewhat off topic.
16:08:40 [JoeHallCDT]
… you've created an incentive to create PII databases
16:09:10 [JoeHallCDT]
… PII should matter, if you value de-ID as a way to break the link to the individual
16:09:19 [JoeHallCDT]
Chris Mejia: agrees with Jonathan!
16:09:36 [jmayer]
q-
16:09:44 [dtauerbach]
dtauerbach has joined #dnt
16:09:47 [JoeHallCDT]
… we are supposed to do good practices for de-ID and I want to do that.
16:09:51 [JoeHallCDT]
q?
16:10:02 [susanisrael]
*joehallcdt you had marc groman and paul glist speaking before chris iab
16:10:17 [JoeHallCDT]
Peter: has not had that focus, wants to have common language
16:11:12 [susanisrael]
scribenick: susanisrael
16:11:32 [susanisrael]
peter swire: let's start talking about hashing
16:11:50 [justin]
DNT was proposed as a solution to address pseudonymous third party tracking. I don't think we're going to walk away from that idea at this point.
16:11:58 [susanisrael]
khaled: understand that hashing was discussed as a way to protect against cookies or other unique identifiers
16:12:27 [susanisrael]
...if you are hashing without salting, can easily be broken and recover say ss#, so plain hashing not recommended
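A minimal sketch of the dictionary attack Khaled describes, assuming SHA-256 as the hash and a made-up identifier in ss# format. The helper names (unsalted_hash, ssn_candidates, dictionary_attack) are hypothetical, and the search is restricted to one area/group prefix only to keep the demonstration fast; the full space of about a billion values is also enumerable.

```python
import hashlib


def unsalted_hash(value: str) -> str:
    """Plain SHA-256 of the value, with no salt or key."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def ssn_candidates(area: int, group: int):
    """Enumerate every serial number for one (area, group) prefix."""
    for serial in range(1, 10000):
        yield f"{area:03d}-{group:02d}-{serial:04d}"


def dictionary_attack(target_digest: str, candidates):
    """Hash each candidate and return the one that matches, if any."""
    for candidate in candidates:
        if unsalted_hash(candidate) == target_digest:
            return candidate
    return None


# Made-up value standing in for a hashed identifier found in a dataset.
leaked_digest = unsalted_hash("123-45-6789")
recovered = dictionary_attack(leaked_digest, ssn_candidates(123, 45))
print(recovered)
```

Because the input space is small and structured, the digest gives no real protection once an attacker can enumerate candidates.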
16:13:01 [Wileys]
This makes sense for sharing data externally but not for internal storage of data
16:13:01 [susanisrael]
...if you have [something] that can be added to your value....but challenge for distributed system with salt, you don't want to distribute salt to everyone
16:13:16 [susanisrael]
....have to come up with protocol where salting happens at central location.
16:13:29 [susanisrael]
[someone] need to know who can hash
16:13:36 [susanisrael]
[who was speaking?]
16:13:53 [dtauerbach]
efelten
16:13:54 [efelten]
s/[someone]/efelten/
16:13:58 [susanisrael]
khaled: one alternative is to use public keys that you can distribute and have encrypted value done say within browser
16:14:05 [susanisrael]
...instead of hashing you encrypt
16:14:19 [Zakim]
+Aleecia
16:14:32 [susanisrael]
...other consideration even with salted values is that you can have frequency attacks...certain names more common...can guess.
16:15:05 [susanisrael]
....so can recover names by looking at frequency. even ss#s. so salting not adequate where there is frequency distribution
16:15:32 [susanisrael]
.....with encryption [?] would do it differently each time, frequency not an issue
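A rough illustration of the frequency attack described above, assuming a single secret salt applied deterministically with SHA-256. The salt, the name list, and the public ranking are all invented for the example. The attacker never learns the salt and only compares how often each digest appears against a public ranking of common names.

```python
import hashlib
from collections import Counter

SALT = b"hypothetical-secret-salt"  # unknown to the attacker


def salted_hash(value: str) -> str:
    """Deterministic salted hash: same input always gives the same digest."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()


# Invented column of surnames with a skewed distribution.
names = ["smith"] * 5 + ["johnson"] * 3 + ["garcia"] * 2 + ["zelenko"]
hashed_column = [salted_hash(name) for name in names]

# The attacker's only knowledge: which names are most common in public data.
public_ranking = ["smith", "johnson", "garcia", "zelenko"]

# Rank digests by how often they occur and align them with the public ranking.
ranked_digests = [digest for digest, _ in Counter(hashed_column).most_common()]
guess = dict(zip(ranked_digests, public_ranking))

for digest, name in guess.items():
    print(digest[:12], "->", name)
```

An encryption scheme that produces a different ciphertext each time for the same input would not leak these counts, which is the contrast drawn in the discussion.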
16:15:59 [peterswire]
peterswire has joined #dnt
16:16:05 [susanisrael]
.....to the extent it's a problem certain fields may be too long to process or transmit [with encryption?].....
16:16:37 [susanisrael]
...so for example you can get encrypted ss# with same character set as actual ss# so you avoid long strings. sometimes practical advantage
16:17:17 [susanisrael]
peter swire: have some observations: lots of hashing in commercial ecosystem. heard yesterday at hhs that unsalted ss# not ok bc easy to do dictionary attack
16:17:25 [Wileys]
Good resource on the technical and security details in this area: http://crackstation.net/hashing-security.htm
16:17:33 [susanisrael]
.....turning to ed, you have expressed cautions re: hashing.
16:17:50 [susanisrael]
ed felten: different scenarios in which hashing fails. doesn't do much without salt.
16:18:19 [susanisrael]
...even with salted hash someone who knows the salt can generally break it or someone who can cause salted function to be evaluated on their behalf.
16:18:41 [susanisrael]
....gives example where you ask one server to compute hash on another. [simplified]
16:18:47 [rvaneijk]
A hash turns user data into a pseudonymous identifier
16:19:10 [susanisrael]
...if multiple records contain same salted hash value they can be linked. need to use probabilistic encryption or something like that
16:19:18 [susanisrael]
chris iab: there is hashing then access to salt
16:19:48 [Wileys]
We should discuss keyed hashes as being superior to salted hashes (although in the same universe)
16:19:52 [susanisrael]
ed felten: not just access to salt. if you have value hash then you can do same dictionary attacks as if you knew salt so not enough to ask if you know salt
16:20:17 [susanisrael]
ed felten: can make sophisticated argument .....rare case where hashing is secure
16:20:36 [susanisrael]
peter swire: assume people will use hashing and will be long enough not to be broken
16:20:43 [susanisrael]
chris iab: how reliable?
16:20:44 [Wileys]
One-way hashes don't allow direct reverse identification by themselves - access to the salt/key allows someone to perform a dictionary attack
16:21:03 [susanisrael]
ed felten: if you can have hash computed for you just the same as if you can break it
16:21:06 [Wileys]
Requires access to the original raw data (if it still exists) and the salt/key
16:21:08 [susanisrael]
what are we hashing?
16:21:30 [rvaneijk]
In the EU organizational measures are not enough to make hashed values of user data anonymous.
16:21:38 [susanisrael]
someone [who is speaking?]: will use admin controls with hashing
16:21:59 [susanisrael]
ed: if you can make up inputs and ask people to hash them that is just as good as if you had the salt
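A sketch of the point Ed is making, assuming a keyed hash (HMAC-SHA-256) held by a service that never reveals its key. The HashingService class, the email addresses, and the stored record are hypothetical. An attacker who can submit inputs of their choosing re-identifies records without ever seeing the key.

```python
import hashlib
import hmac
import secrets


class HashingService:
    """Holds a secret key; callers may request digests but never see the key."""

    def __init__(self) -> None:
        self._key = secrets.token_bytes(32)

    def digest(self, value: str) -> str:
        return hmac.new(self._key, value.encode("utf-8"), hashlib.sha256).hexdigest()


service = HashingService()

# "De-identified" table: keyed digest of an email address mapped to activity.
stored = {service.digest("alice@example.com"): "visited site1, site2"}

# The attacker makes up candidate inputs and asks the service to hash them.
candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]
reidentified = {
    candidate: stored[service.digest(candidate)]
    for candidate in candidates
    if service.digest(candidate) in stored
}
print(reidentified)
```

Keeping the key secret therefore only helps if access to the hashing function itself is also controlled.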
16:22:11 [susanisrael]
someone: but that is from knowing input and output
16:22:34 [Wileys]
Rob, if paired with administrative, technical, and policy/educational, then keyed hashing is considered enough to reach the point of "likely reasonable" to no longer be personal data (de-identified), correct?
16:22:47 [susanisrael]
ed felten: what if you take value with identifier and cookie, ask someone to make salted hash, don't tell you the salt, but put it back in your data base
16:22:51 [Wileys]
Rob, add "safeguards" after "policy/educational"
16:23:12 [susanisrael]
someone: but that assumes you know input and output
16:23:25 [rvaneijk]
shane: if you throw away the key, then yes. TomTom was a nice example.
16:23:38 [susanisrael]
peter swire: i have observed lots of hashing in ad world. for most sophisticated attackers they may be able to break them
16:24:03 [susanisrael]
...we will eventually have to come to view of how we will discuss all this. so common hashes might be of email address? cookie value?
16:24:41 [Wileys]
Rob, if you keep the key in a safeguarded location, limited access, technical controls, etc. - I believe you still reach the bar per the A29WP Opinion from April 2011.
16:24:45 [susanisrael]
peter swire: let's take email addresses. if my email is hashed using proper salt, and someone gets output, they can eventually figure out hash and salt
16:24:58 [Wileys]
Rob, or was that 2010 - I'll look it up.
16:25:12 [susanisrael]
ed felten: can ask that hash be done on known value, and record hashed value in database then can correlate
16:25:25 [rvaneijk]
Well, that safeguard is a very high bar, ie a notary, who has a legal obligation to not disclose
16:25:29 [susanisrael]
[someone] qu is from whom you are trying to secure the data
16:25:38 [Wileys]
Rob, I agree throwing away the key is an absolute end-point, but I'm aiming for the 'likely reasonable' standard
16:25:44 [susanisrael]
is it protection at all wrt a particular party that has particular data
16:26:35 [susanisrael]
david w. not hashing for hashing's sake. need to figure out from whom you are trying to protect the data, and tailor approach to that
16:26:38 [rvaneijk]
Shane, the point is, that if I should not be able to calculate a hash after let's say a year, and expect the same output, such that users can be re-identified.
16:26:53 [rvaneijk]
s/if/_/
16:27:09 [susanisrael]
khaled: even if we go back to previous model using hash or salted hash, probability of recovering original value is 1, certain
16:27:30 [Wileys]
Rob, why? As long as the original key is secure, then there is very low risk of user re-identification
16:27:31 [robsherman]
robsherman has joined #dnt
16:27:34 [aleecia]
Rob, is that an art 29 position, or your own? (Both are valuable, I'm just trying to get which is what)
16:27:36 [susanisrael]
chris iab: assuming you have access to data in first place, right?
16:27:55 [susanisrael]
khaled: so final result at end of all risk assessment is still high, still has to be further mitigated
16:28:05 [Wileys]
Aleecia, the A29WP position in the opinion paper is not as strict as Rob is stating (in my opinion)
16:28:06 [vincent]
Wileys, in the DNT case, are we just considering hashing cookie IDs? if so, I'm not sure it brings any real protection: cookie IDs are opaque anyway
16:28:09 [susanisrael]
peter swire: let's see why people might feel strongly
16:28:41 [susanisrael]
...if db is publicly accessible and people can get access then probability of breaking is higher, but david and chris are saying you can limit access
16:29:09 [Wileys]
Vincent, keyed hashing coupled with other measures, as well as the cessation of certain business activities (profiling), does meet the goals of DNT in my opinion.
16:29:15 [susanisrael]
.[someone]..but ed is saying if you have access to hash and salt -if disconnected doesn't work
16:29:28 [yianni]
Jeff Wilson
16:29:46 [peterswire_]
peterswire_ has joined #dnt
16:29:56 [susanisrael]
david w: i think what we are talking about is that using some form of oneway hash was a useful method of de-identifying
16:30:21 [susanisrael]
khaled: depends. must be done in such a way that you can protect against attacks ed is describing which are quite trivial
16:30:26 [vincent]
Wileys, well that's not my question :). What type of protection does it bring with regard to the risk of re-identifiication?
16:30:37 [susanisrael]
david and khaled back and forth a bit
16:30:55 [yianni]
q?
16:30:56 [rvaneijk]
Shane, let's have this discussion in Boston
16:31:06 [susanisrael]
khaled: probability that someone attempts to attack, then that they can break hash
16:31:13 [robsherman1]
robsherman1 has joined #dnt
16:31:25 [Wileys]
Vincent, as long as the original data is not accessible and neither is the key to the hash, then there is very low risk of re-identification (depending on the details housed within the de-identified dataset)
16:31:29 [rvaneijk]
Aleecia: formal position within this DNT debate
16:31:32 [susanisrael]
...if low probability of attempt ....hard to make that case
16:31:32 [dwainber_]
dwainber_ has joined #dnt
16:31:50 [susanisrael]
[someone] isn't probability of reidentification only 1 if you have access to the computer?
16:31:51 [Wileys]
Rob - agreed - looking forward to it (the conversation that is, not the horrible weather we're likely to encounter in Boston :-) )
16:32:05 [rvaneijk]
:)
16:32:08 [susanisrael]
khaled: depends on workflow. may be hashed then go to central db
16:32:14 [yianni]
s/someone/Mike Nolet
16:32:32 [aleecia]
We need to recruit a new WG member with a big office in the Florida Keys
16:32:44 [Wileys]
+1 to Aleecia!
16:32:47 [peterswire_]
peterswire_ has joined #dnt
16:32:53 [aleecia]
Rob - thanks, that's exactly what I was asking, thank you
16:33:14 [susanisrael]
mike nolet : i have unique cookie id on ed. need to get totally random integer, if someone is snooping on all net traffic or has access to pc or net connection
16:33:17 [peterswire]
peterswire has joined #dnt
16:33:34 [vincent]
Wileys, how is the re-identification risk lower with the hashed cookie ID rather than with the unhashed cookie ID? (that's actually what's discussed right now)
16:33:43 [susanisrael]
peter swire: is there a scenario where hashing matters? mike was saying you have to have access to cookie
16:33:47 [Chris_IAB]
Chris_IAB has joined #dnt
16:34:12 [susanisrael]
chris iab: does it matter if transferring to another party or internally?
16:34:18 [susanisrael]
peter swire: we are learning something
16:34:20 [Chris_IAB]
this was the equation put on the board: pr (re-identification) = pr (re-id/attempt) x pr (attempt)
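The equation from the board can be worked through numerically. This is only a sketch: it reads "re-id/attempt" as the conditional probability of re-identification given an attempt, and the probabilities below are invented for illustration, not figures from the meeting.

```python
def pr_reidentification(pr_reid_given_attempt: float, pr_attempt: float) -> float:
    """pr(re-identification) = pr(re-id | attempt) x pr(attempt)."""
    return pr_reid_given_attempt * pr_attempt


# Khaled's claim: with a plain or salted hash, recovery is certain once an
# attempt is made, so the overall risk equals the probability of an attempt.
hash_only = pr_reidentification(1.0, 0.05)

# Hypothetical case where other controls also make an attempt less likely
# to succeed.
with_controls = pr_reidentification(0.25, 0.05)

print(hash_only, with_controls)
```

Under this reading, hashing alone leaves the first factor at 1, so any risk reduction has to come from lowering the probability of an attempt or from additional measures.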
16:35:00 [susanisrael]
jeff? there is industry practice where you hash, independent party enriches by matching, and there is permission to share 7 matches
16:35:08 [rvaneijk]
Cookie exchanges are interesting in this context..
16:35:08 [Wileys]
Vincent, it's lower only if coupled with other factors (multi-factor test) such as seclusion of the key/salt and removal of access/existence from the original dataset.
16:35:12 [susanisrael]
....common identifier can be hashed
16:35:21 [Wileys]
+q
16:35:25 [susanisrael]
peter: so that is one scenario, do you see usefulness ed?
16:35:42 [dwainber_]
q?
16:35:47 [aleecia]
ack Wileys
16:35:51 [dwainber_]
ack dwainber_
16:35:53 [robsherman]
robsherman has joined #dnt
16:36:03 [susanisrael]
shane: the core purpose at yahoo for hashing/keys, is to disconnect that data from use in actual production systems
16:36:11 [justin]
"destroy"?
16:36:36 [peterswire_]
peterswire_ has joined #dnt
16:36:37 [susanisrael]
...destroys possibility for profiling, targeting. can not be used to modify users experience. but still useful for analysis..
16:36:47 [susanisrael]
peter swire: ed or dan does that make sense to you?
16:36:52 [rvaneijk]
WileyS, right. the goal is to break the re-identification
16:36:57 [susanisrael]
dan: i am confused by that
16:37:06 [aleecia]
sigh
16:37:32 [susanisrael]
shane: these are always multifactor tests. your purpose in hashing is to not do this. once you add multifactors, it serves purpose
16:37:46 [susanisrael]
[someone] if you can get hash function or key it doesn't matter
16:37:50 [robsherman1]
robsherman1 has joined #dnt
16:37:56 [susanisrael]
shane: good luck. we make key very inaccessible
16:38:07 [yianni]
s/someone/Joe Hall
16:38:07 [susanisrael]
ed felten: who knows keys?
16:38:29 [vincent]
vincent has joined #dnt
16:38:34 [susanisrael]
shane: keys are very large. systems that are set up to de-identify know key, but human connection to key is not allowed
16:38:58 [susanisrael]
felix: so if i understand correctly usefulness is to separate one part of company to another?
16:39:00 [Chris_IAB]
dwainberg, in case you missed it, "the key is on a post-it on Shane's desk" (that's a JOKE, btw.. lol)
16:39:14 [susanisrael]
shane: really to separate info from another context
16:39:25 [aleecia]
Chris - love it!
16:39:29 [susanisrael]
felix: 2 people (one w key) are separate
16:39:44 [susanisrael]
shane: isolation of key is not only factor.
16:39:59 [peterswire]
peterswire has joined #dnt
16:40:05 [Wileys]
Chris, LOL
16:40:18 [johnsimpson]
q?
16:40:20 [susanisrael]
peter swire: i think it's relevant bc hashing and its uses have been talked about in a lot of contexts. people in ad industry at one end of table, others at other
16:40:31 [dwainberg]
dwainberg has joined #dnt
16:40:49 [susanisrael]
khaled: if that separation is strong and defensible, then at least under hipaa that would be ok. if you have good procedures for controlling access to key that's ok
16:40:51 [Wileys]
Yay for Yahoo!, we're good by HIPAA standards (too bad we don't handle PHI :-) )
16:41:04 [susanisrael]
....scenarios where regulators have accepted that
16:41:16 [susanisrael]
dan auerbach: rotating salt helps a lot
16:41:21 [Chris_IAB]
rotating salt is a good practice
16:41:40 [aleecia]
rotating salts kills everything shane wants out of the data
16:41:59 [Wileys]
Aleecia - we do rotate, but not daily.
16:42:03 [susanisrael]
david wainberg: we are saying it's not binary, hashing is not perfect, question is how hard does it make it? how hard do we want to make it? what is the context/data involved?
16:42:09 [justin]
Rotating salts kills longitudinal view, which is a feature or bug depending on how you look at it.
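A small sketch of why rotation breaks the longitudinal view, assuming a keyed hash (HMAC-SHA-256) with one random key per rotation period. The period labels, the cookie value, and the in-memory key store are hypothetical; a real system would also destroy old keys.

```python
import hashlib
import hmac
import secrets

# Hypothetical key store: one random key per rotation period.
period_keys: dict[str, bytes] = {}


def pseudonym(cookie_id: str, period: str) -> str:
    """Keyed hash of the cookie ID under the key for the given period."""
    key = period_keys.setdefault(period, secrets.token_bytes(32))
    return hmac.new(key, cookie_id.encode("utf-8"), hashlib.sha256).hexdigest()


# Within one period the same cookie maps to the same pseudonym...
same_period = pseudonym("cookie-123", "2013-01") == pseudonym("cookie-123", "2013-01")

# ...but after rotation the pseudonym changes, so records no longer join.
across_periods = pseudonym("cookie-123", "2013-01") == pseudonym("cookie-123", "2013-02")

print(same_period, across_periods)
```

This only removes the identifier as a join key. Other fields in the records, such as the list of sites visited, may still allow datasets from different periods to be chained together.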
16:42:14 [Chris_IAB]
aleecia, it means Yahoo buys LOTS of post-its (again, marked as a JOKE folks :)
16:42:16 [susanisrael]
someone: sounds like it's trivial to break it
16:42:22 [aleecia]
I go with feature, Shane goes with bug :-)
16:42:26 [susanisrael]
david wainberg: what do you mean by trivial
16:42:30 [yianni]
s/someone/Joe
16:42:39 [rvaneijk]
what really hard means also depends on the purpose, not only on the context
16:42:42 [Wileys]
Aleecia, :-)
16:42:45 [susanisrael]
david w: depends on combination of technical and administrative
16:42:51 [aleecia]
buy stock in 3M, folks! you heard it here first.
16:43:14 [peterswire_]
peterswire_ has joined #dnt
16:43:16 [susanisrael]
someone: shane is describing intentional inadvertent viewing of data
16:43:38 [yianni]
s/someone/mike nolet
16:43:46 [susanisrael]
shane: purpose is more than just personal protection--disconnect data from operational systems so utility limited and therefore privacy is increased
16:44:28 [susanisrael]
jeff: everyone agrees with ed or should. if you have access to salt, it doesn't work. but if we say salting/hashing does not work, then we are saying passwords on internet don't work
16:44:46 [susanisrael]
....if you have access to hash and salt you could access hashed stored passwords
16:44:55 [aleecia]
daily rotated salts is at least a step forward. but having it change only when the janitor tosses out the post its by mistake once a year isn't going to make me happy :-)
16:44:57 [jmayer]
q+
16:45:10 [susanisrael]
chris iab: what would the alternative be? put all raw data out on internet? or not collect any data?
16:45:16 [vincent]
WIleys, would not a request like "SELECT User from DB where user visited site1,site2,...,siteN" recreate the link that the hash just deleted?
16:45:24 [Wileys]
Aleecia - it's a bit more formal/regular than that. Note - I don't use post-its :-)
16:45:26 [susanisrael]
ed felten: i have not heard an example here where hashing really helps
16:46:00 [susanisrael]
peter swire: i spent 2 years working on crypto policy. if system broken it doesn't work, but in practice it works 99 percent of the time
16:46:14 [Wileys]
Vincent, the hash was not meant to hide activity but rather to disconnect identity from operational systems.
16:46:22 [susanisrael]
...i have heard that there are attacks that could be made, but i have heard about administrative controls
16:46:23 [peterswire]
peterswire has joined #dnt
16:46:46 [rvaneijk]
Passwords are used to verify an identity, based on a shared secret, which is a totally different mechanism
16:46:48 [peterswire]
peterswire has joined #dnt
16:46:51 [susanisrael]
....all those seem like things in real world where protection is more than zero though might still be subject to some kinds of attacks
16:47:04 [susanisrael]
ed felten: no because these attacks are trivial
16:47:17 [jmayer]
q+
16:47:30 [vincent]
Wileys, yes but the history of websites visited by a user would help to reconnect the different operational system (the list of website is used as a unique identifier)
16:47:34 [susanisrael]
si question: do these attacks in fact happen in companies all the time in the real world?
16:47:39 [peterswire]
jonathan -- I see you;
16:47:42 [aleecia]
Shane - 3M weeps
16:48:24 [Wileys]
Vincent, agreed - so some URL cleansing helps remove this issue - or in the case of searches, attempts to cleanse personal data in queries helps.
16:48:24 [susanisrael]
ed felten: if we say we will separate our data base into 2 pieces and only one is hashed, whatever analysis someone wants to do they just need to do one more step
16:48:28 [yianni]
ack jmayer
16:48:32 [susanisrael]
chris iab: but they would have to have access right?
16:48:37 [dtauerbach]
q?
16:49:02 [Wileys]
Vincent, my approach can't guarantee 100% certainty but does meet the "very low risk" bar - or in the EU context, the "likely reasonable" bar.
16:49:13 [susanisrael]
jmayer: concrete example: ad company i studied tried to use hashing to do follow on analysis. user had id cookie. then had another cookie. "anonymous"
16:49:41 [peterswire_]
peterswire_ has joined #dnt
16:50:01 [justin]
If the spec allows for a 30 day short-term retention period, presumably the group would be OK if the salts were rotated at least every 30 days.
16:50:01 [susanisrael]
...idea was that anonymous one was hash with secret salt and would be used for long term things and more private but susceptible to same attacks because you could always correlate with original cookie
16:50:06 [peterswire]
peterswire has joined #dnt
16:50:49 [susanisrael]
peter swire: jmayer you were giving example, and jeff and chris had questions or comments
16:51:02 [susanisrael]
chris iab: you described a bad practice
16:51:30 [David]
David has joined #dnt
16:51:42 [susanisrael]
...you don't throw out baby with bath water. Just bc there is one bad practice doesn't mean all hashing worthless
16:51:51 [vincent]
Wileys, I don't the "very low risk" bar well enough :) just trying to see what is the type of threat that cookie hashing address
16:52:00 [efelten]
We have yet to hear an example where hashing makes any attack appreciably more difficult.
16:52:27 [David_MacMillan_]
David_MacMillan_ has joined #dnt
16:52:33 [Wileys]
Justin, the spec should not be prescriptive on timeframes and rather, much like HIPAA, should focus on acceptable risk thresholds.
16:52:36 [susanisrael]
jmayer: agree there are better engineering practices; but pretty predictable failures; have heard things like figuring out salt or doing dictionary attacks,
16:53:00 [susanisrael]
...but these are not only attacks. there are enormous re-identifiability problems.
16:53:03 [Wileys]
Vincent, you don't "?" the "very low risk" bar well enough?
16:53:04 [peterswire]
peterswire has joined #dnt
16:53:28 [rvaneijk]
Ed, hashing makes sense, if you take out information such that enough collisions appear, that meat a k-anonymity bar.
16:53:30 [peterswire]
peterswire has joined #dnt
16:53:35 [aleecia]
Justin, I think you're saying: if we're going to have 30 (or more) days for people to take first-logged data to figure out what they have and if they're first or third party while collecting, then we should also be ok with a company holding all data indefinitely, so long as they rotate every 30 days.
16:53:39 [rvaneijk]
s/meat/meet/
16:53:56 [dtauerbach]
I think the point is that in all the examples so far, hashing is purely a method of operational control, and it is not a great one given engineering challenges
16:53:57 [vincent]
Wileys, I don't "know" it well enough, sorry
16:53:58 [susanisrael]
....i think we have an error in the way some people are approaching this. you have fact pattern, try to apply approach. start with specific problem and way to solve and ask if hashing gets you there...
16:54:07 [dtauerbach]
e.g. you can't have an oracle and that is hard to control in practice
16:54:11 [Zakim]
- +1.631.803.aacc
16:54:33 [susanisrael]
....ed is not asking straight up/down vote on metaphysics of hashing...and i have not heard concrete problem and proposed hashing solution that solves the problem
16:54:43 [justin]
aleecia, well, we've had different interpretations of the point of the short-term period over time, but basically yes.
16:54:50 [Wileys]
Ed, if a dataset were breached in isolation (a single data table), wouldn't you agree that hashing of identifiers in that table (depending on what additional feeds were available) would help deter re-identification?
16:54:55 [susanisrael]
peter swire: can industry explain use case where hashing helps?
16:55:33 [susanisrael]
david wainberg: can we identify risk that ed and jonathan are concerned about and see if that can be addressed
16:55:37 [aleecia]
Justin - ok. So I'm ok with a single short period, but may not be ok with infinite retention even with rotation
16:56:07 [Zakim]
+DAvid
16:56:10 [jmayer]
q+
16:56:11 [susanisrael]
felix? : sounds like we are concerned about internal controls. valuable if you have company where not everyone or no one is careless or malicious
16:56:26 [efelten]
What I'm looking for is a specific example--a specific use of hashing, and a specific attack that is made more difficult because of the use of hashing.
16:56:32 [susanisrael]
jeff: 3 scenarios where hashing helps. 1: passwords
16:56:36 [peterswire]
peterswire has joined #dnt
16:56:48 [susanisrael]
2. if you want to do research internally in large company.....
16:56:50 [dtauerbach]
Shane, it depends on the details of the hashing. For example, an unsalted hash of social security numbers in that isolated table does not help at all
16:57:02 [Chris_IAB]
new (related) subject: are toilet seat covers effective? (again, humor is my defense mechanism :)
16:57:09 [justin]
aleecia, Fair enough, to the extent there is an inherent risk that a delinked 30-day set of urls is inherently identifiable and/or tiable to other 30-day sets.
16:57:11 [Zakim]
-[GVoice]
16:57:21 [Wileys]
dtauerbach, agreed - I'm speaking only of salted or keyed hashes.
16:57:43 [Zakim]
-Jonathan_Mayer
16:57:45 [susanisrael]
peter swire: so if some risk of internal misuse, but hash passwords or separate research database from where it came from, you reduce risk even.,..
16:58:04 [susanisrael]
if doesn't protect against sophisticated attacks, reduces risk from normal people.
16:58:04 [Zakim]
+Jonathan_Mayer
16:58:12 [aleecia]
Justin - exactly
16:58:20 [vincent]
vincent has joined #dnt
16:58:28 [susanisrael]
felix: i think we are seeing risk reduction in normal ways. seeing qu from ed re: scenarios
16:58:58 [aleecia]
I would guess that at 24 hours I'd be ok. But I'd need to know more. And I think the right way to get at this is not a timeframe, but rather the ability to chain across datasets
16:59:07 [susanisrael]
in some sense from tech perspective does not help much but if the data just requires an extra step that may be enough to deter or detect attack from pt of view of internal controls
16:59:26 [susanisrael]
mike nolet: re: david's question. what is risk you are talking of reducing
16:59:39 [susanisrael]
someone: risk that info on research side is then used to target
16:59:52 [susanisrael]
felix? if dnt is 1?
16:59:55 [Zakim]
- +1.202.257.aaff
16:59:56 [jmayer]
-q
16:59:59 [susanisrael]
yes:
17:00:08 [aleecia]
q?
17:00:18 [peterswire_]
peterswire_ has joined #dnt
17:00:43 [susanisrael]
ed felten: cs views attacks at 3 levels. started discussion bc broad claims were made that hashed data should be treated as per se de-identified.
17:00:48 [Wileys]
Ed, It was never stated in isolation but as one factor of multiple steps to achieve unlinkability.
17:00:54 [Wileys]
Ed, at least not by me
17:01:28 [susanisrael]
...we don't have to talk about hashing or micromanage how people protect, but i don't think we should talk about hashing as total protection
17:02:09 [susanisrael]
paul glist: broad claims on both sides. have looked at this as dial. can reduce risk to socially acceptable levels. hashing is not nothing...
17:02:20 [Chris_IAB]
+1 to current speaker's point
17:02:29 [susanisrael]
...and not everything. it's a tool. add other tools. it's useful.
17:02:53 [jmayer]
There are protections that are effective even if an attacker controls the terminal. That's part of the point.
17:03:08 [susanisrael]
johnsimpson: still having trouble figuring out how this relates to DNT. have been talking about protecting data sets with pii.
17:03:14 [dwainber_]
dwainber_ has joined #dnt
17:03:16 [peterswire]
peterswire has joined #dnt
17:03:29 [dtauerbach]
jmayer, for example: hard disk encryption
17:03:34 [susanisrael]
chris iab: you may want to have access to uri's for example. but don't need it connected to unique users
17:03:34 [justin]
Right, the deidentification method has to take into account the internal misuse angle.
17:03:50 [peterswire]
peterswire has joined #dnt
17:03:55 [susanisrael]
john simpson: but that's the disconnect bc most people saying that dnt is do not collect
17:03:58 [jmayer]
q+
17:04:00 [susanisrael]
someone: is that right?
17:04:22 [susanisrael]
someon: if there is any identifier you still have a problem
17:04:39 [justin]
Someone is justin, someon is jmayer :)
17:04:45 [susanisrael]
peter swire: we heard different perspectives:
17:04:53 [susanisrael]
* thanks justin
17:05:31 [susanisrael]
peter swire...unique identifiers. can you enlighten me? how is going into buckets relevant?
17:06:20 [susanisrael]
someone asks if adding attributes and using those is unique identifiers
17:06:32 [yianni]
s/someone/joe hall
17:06:48 [peterswire]
peterswire has joined #dnt
17:06:51 [susanisrael]
dan auerbach: better privacy friendly way to add advertising that is targeted. need minimum number of people in a bucket
17:07:05 [rvaneijk]
Dan, the minimum buckets make nice micro-segments.
17:07:19 [peterswire]
peterswire has joined #dnt
17:07:26 [susanisrael]
...we suggested 1024 is a minimum bar. with that don't need unique identifier, just low entropy cookies
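A minimum bucket size like the one suggested here could be enforced roughly as below. This is a sketch under assumptions: the function name, the bucket labels, and the input shape (one bucket label per user, with no user identifier stored) are hypothetical, and only the threshold of 1024 comes from the discussion.

```python
from collections import Counter

MIN_BUCKET_SIZE = 1024  # threshold mentioned in the discussion


def eligible_buckets(assignments, min_size=MIN_BUCKET_SIZE):
    """Return the set of bucket labels held by at least min_size users.

    assignments is an iterable of bucket labels, one per user; no user
    identifiers are needed to apply the threshold.
    """
    counts = Counter(assignments)
    return {bucket for bucket, n in counts.items() if n >= min_size}


# Hypothetical data: one large bucket and one that is too small to use.
assignments = ["soccer"] * 1500 + ["rare-interest"] * 10
usable = eligible_buckets(assignments)
print(usable)
```

Buckets below the threshold are simply not used for targeting in this sketch; how to handle them in practice was not settled in the discussion.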
17:07:48 [susanisrael]
heather: might be useful to look at transcript of previous discussion
17:07:49 [jmayer]
If you're interested in advertising, analytics, etc. without unique IDs... https://air.mozilla.org/tracking-not-required/
17:07:59 [susanisrael]
peterswire: room is not catching fire on this
17:08:41 [susanisrael]
chris mejia: i do agree with dan's core premise, that much harder to identify person from a few attributes distilled from all the uris that people visited
17:08:47 [aleecia]
q+
17:08:56 [jmayer]
q- later
17:08:56 [susanisrael]
dan auerbach: can keep those collections without unique identifiers
17:09:21 [peterswire]
ok, I see aleecia and jonathan
17:09:37 [susanisrael]
chris: we agree on that part (harder to identify that way-with quasi identifiers), not necessarily the second part
17:09:43 [dwainber_]
q?
17:09:45 [susanisrael]
.....that is sort of an industry practice
17:09:47 [dwainber_]
q+
17:09:51 [yianni]
ack aleecia
17:10:10 [susanisrael]
aleecia: i think we are all getting there. want to separate 2 different parts of dan's description. one is how to do ads without tracking....
17:10:32 [susanisrael]
...but pertinent is here's how you can do de-identification, suggest we focus on the de-id half
17:10:33 [peterswire_]
peterswire_ has joined #dnt
17:10:50 [susanisrael]
aleecia: ....interesting re: reduced identification risk
17:11:05 [Zakim]
+ +1.631.803.aakk
17:11:06 [peterswire]
peterswire has joined #dnt
17:11:16 [dwainberg]
dwainberg has joined #dnt
17:11:20 [susanisrael]
david wainberg: outline of discussion, 3 general models: 1. random unique identifier, interest buckets
17:11:47 [susanisrael]
2. unique identifier associated with buckets, dan proposing buckets only, no identifiers
17:11:58 [susanisrael]
dan: maybe what aleecia proposed makes sense
17:12:43 [susanisrael]
david w: as discussed earlier, what we mean by de-identified requires setting threshold, and we're just jumping to let's break the connection instead of
17:12:58 [dwainbe__]
dwainbe__ has joined #dnt
17:13:16 [susanisrael]
...discussing what is a level of acceptable risk. there are significant consequences to forcing ad industry to do this
17:13:22 [aleecia]
what does "not linked at all" mean here?
17:13:32 [peterswire_]
peterswire_ has joined #dnt
17:13:32 [susanisrael]
peter swire: if not linked at all then outside dnt
17:13:32 [aleecia]
q?
17:13:40 [susanisrael]
david w: but still some risk
17:14:00 [susanisrael]
ed: but gets to idea of attribute disclosure vs record re-identification
17:14:01 [peterswire_]
peterswire_ has joined #dnt
17:14:03 [yianni]
ack dwainber
17:14:17 [aleecia]
q+
17:14:22 [susanisrael]
ed: matters a lot what the bucket is: soccer dad vs. aids patient
17:14:23 [jmayer]
q- later
17:14:40 [aleecia]
would like to respond to Ed
17:14:48 [susanisrael]
ed: need more than knowing that there is a bucket, some sensitive info has to not be used
17:15:01 [susanisrael]
ed: but combos of attributes could identify
17:15:09 [jmayer]
Just to be clear, the DAA principles do not prohibit inferences about medical conditions.
17:15:34 [jmayer]
q+
17:15:37 [jmayer]
q+ earlier
17:15:42 [susanisrael]
mike: want to come back to theme: understanding what we're trying to accomplish. what is bad stuff we are trying to prevent. seeing a relevant ad?
17:15:43 [jmayer]
q- earlier
17:15:49 [aleecia]
could we please stay on topic?
17:15:56 [peterswire_]
jonathan -- I'm unclear -- are you in the q?
17:16:02 [aleecia]
this is an interesting discussion, but not today's agenda
17:16:02 [susanisrael]
...what other bad stuff, scary outcomes, than seeing an ad for something i bought on amazon?
17:16:08 [jmayer]
Yep, just testing the limits of Zakim.
17:16:12 [rvaneijk]
The HARM is not a relevant factor when it comes to unlinkability
17:16:26 [dtauerbach]
q?
17:16:28 [johnsimpson]
q?
17:16:33 [susanisrael]
peter swire: what the harm is in tracking comes up in a lot of settings but not main topic today
17:16:33 [yianni]
ack aleecia
17:17:07 [susanisrael]
aleecia: want to respond to ed re: which buckets you might care more about, but group decided we would not distinguish, say re: children's data
17:17:14 [peterswire]
peterswire has joined #dnt
17:17:23 [susanisrael]
....treating all data same here, which is different than iab daa position
17:17:42 [aleecia]
ack jmayer
17:17:43 [susanisrael]
peter swire: thank you for history but some people do not acknowledge they agreed to that
17:17:44 [yianni]
ack jmayer
17:17:47 [susanisrael]
jmayer passes
17:17:54 [peterswire_]
peterswire_ has joined #dnt
17:18:29 [susanisrael]
peter swire: had initial discussions on buckets and learned a bit on dimensions there. talked with mike at break re: example of something you think it would be useful to look at
17:18:39 [aleecia]
of note: this is not me *objecting* to treating some data as of more concern. just what the group decided many months ago.
17:19:00 [susanisrael]
david wainberg: i thought next step would be taking approach of your favorite slide and start thinking through risks and how to apply techniques to mitigate
17:19:06 [aleecia]
if there is new information before the group, Peter & Matthias have the option to reopen
17:19:12 [Zakim]
-moneill2.a
17:19:18 [Wileys]
Aleecia - my memory matches yours - we decided to not get bogged down in the "sensitivity" debate and allow self-regulation and laws deal with that item
17:19:24 [susanisrael]
peter swire: that is one possible work flow. use khaled's checklist
17:19:51 [susanisrael]
...maybe there are subsets of people willing to do work on that and come back with a draft. let peter know after meeting if you want to work on
17:20:04 [justin]
Yes, there has never been anything about "sensitive" data in the compliance spec.
17:20:05 [aleecia]
thanks Shane. it was a while ago and pre-dates many folks joining the group. if needed the minutes are out there, but my eagerness to volunteer to find it is not particularly high this week
17:20:12 [jmayer]
q+
17:20:15 [susanisrael]
chris: i have not gotten an answer to what works and protects data if hashing does not work, assuming we will have data
17:20:18 [justin]
Well, apart from that one geolocation section . . .
17:20:45 [susanisrael]
khaled: in health context use probabilistic encryption that permits mathematical operations on data
17:20:45 [peterswire_]
peterswire_ has joined #dnt
17:20:54 [Wileys]
Aleecia, I likewise have no desire to volunteer on that point :-) But would be happy to argue to the same outcome as I believe it was a good decision by the group
17:20:58 [susanisrael]
...encrypt at source in browser....
17:21:10 [Wileys]
Justin, agreed - not sure how that snuck through...
17:21:28 [susanisrael]
if you want to use those values to do lookup in db not possible for db owner to determine lookup result
17:21:35 [efelten]
efelten has joined #dnt
17:21:43 [susanisrael]
....efficient process. not much slower than hashing.
17:21:52 [susanisrael]
...using for lookup in large database
17:22:12 [susanisrael]
peter swire: on a wednesday call could learn about homomorphic encryption. seeing nods on this
17:22:28 [susanisrael]
dan auerbach: talking about fully homomorphic encryption? we are not close?
17:22:33 [susanisrael]
khaled: partial
17:23:14 [dwainberg]
dwainberg has joined #dnt
17:23:21 [susanisrael]
felix: also techniques like differential privacy, adding noise to data. questions whether data still useful, but also protects against some attribute disclosure:
17:23:21 [aleecia]
My recollection is Jeff was alone at the time, perhaps one or two people with him at most, and the rest of the group either had the view you have, Shane, or came up with "we don't care, let's talk about something more interesting"
17:23:30 [efelten_]
efelten_ has joined #dnt
17:23:38 [peterswire]
Q?
17:23:54 [susanisrael]
jeff: with encryption or data modification the criticism of hashing is that if you have key or access you can get around, and same is true for other methods, for example keys
17:24:00 [peterswire]
peterswire has joined #dnt
17:24:14 [susanisrael]
felix: not wrt noise, which you can't figure out even if you know how noise was added
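The noise-adding idea mentioned here can be illustrated with the Laplace mechanism from differential privacy, applied to a single count. This is a hedged sketch, not a description of any system discussed in the meeting: the function name, the epsilon value, and the example count are assumptions, and the sensitivity of the count query is assumed to be 1. The mechanism itself is public; the protection comes from the randomness of each individual draw, which is the sense in which knowing how the noise was added does not reveal the true value.

```python
import random


def noisy_count(true_count: float, epsilon: float, rng=random) -> float:
    """Return true_count plus Laplace noise with scale 1/epsilon.

    Assumes the count has sensitivity 1 (one person changes it by at most 1).
    A Laplace(0, b) sample is drawn as the difference of two independent
    exponential samples, each with mean b.
    """
    scale = 1.0 / epsilon
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


# Hypothetical example: a bucket that really contains 100 users.
rng = random.Random(0)
released = [noisy_count(100, epsilon=1.0, rng=rng) for _ in range(20_000)]
average = sum(released) / len(released)
print(round(average, 1))
```

Each released value is perturbed, but the average over many releases stays close to the true count, which reflects the trade-off between privacy and usefulness of the data raised in the discussion. Smaller epsilon values add more noise.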
17:24:22 [peterswire]
peterswire has joined #dnt
17:24:34 [susanisrael]
ed: let's put off discussion on how it works
17:24:49 [susanisrael]
david : interesting but jumping to solution without identifying problems
17:25:18 [susanisrael]
felix: noticing that there is symmetry to this. many techniques improve privacy but limit value of data.
17:25:22 [justin]
WileyS, at some point we'll have to go back and revisit that piece.
17:25:37 [susanisrael]
....homomorphic encryption does not preserve ability to do many things with data
17:25:51 [Wileys]
Justin, we'll never finish this standard if we attempt to define what is "sensitive" in a global marketplace - good luck with that.
17:26:04 [susanisrael]
felix: what use are we trying to preserve once data is de-identified. some uses will be preserved, others not
17:26:09 [yianni]
ack jmayer
17:26:11 [aleecia]
The geoIP part was well locked down, and then Ian rejoined and *did* have new information.
17:26:23 [susanisrael]
jmayer: will postpone since postponing methodology discussion
17:26:44 [justin]
WileyS, I am not arguing that we should.
17:26:53 [susanisrael]
peter swire: thanks to khaled for coming and providing expertise. there was clear explanation of risk based approach used in other settings
17:27:15 [aleecia]
We cannot bar geoIP since knowing where people are affects what to do if DNT is unset
17:27:28 [peterswire]
peterswire has joined #dnt
17:27:31 [susanisrael]
...we also, i think, have some terminology gain in a lot of places. de-identified or de-linked are conclusion terms that apply once you have a standard, for example in hipaa....
17:27:42 [aleecia]
So we were trying to find a way to say "fine, fine, just pick a large enough geography," and then were hung up in the details on what that means
17:27:50 [susanisrael]
.....we also had variety of other terms about direct identifiers and quasi identifiers that will be helpful....
17:28:05 [susanisrael]
....heard interest in presentation for homomorphic encryption...
17:28:30 [susanisrael]
...also heard suggestion re: doing pieces of that one slide--what are harms, risks, people are concerned about, and
17:28:36 [jmayer]
If we're going to discuss methodologies, differential privacy and privacy-preserving implementations should make the cut.
17:28:50 [susanisrael]
...in particular for online setting develop use cases we should care about if we are to get to homomorphic encryption.
17:29:07 [susanisrael]
....any other action items?
17:29:08 [mnolet_]
mnolet_ has joined #dnt
17:29:37 [susanisrael]
...if you have them after the meeting i welcome those. we are heading to f2f mtg, and want to make progress on this in advance...
17:29:45 [Zakim]
-bryan
17:29:47 [aleecia]
thanks, Peter!
17:29:49 [susanisrael]
....thanks to cdt, khaled, all who came
17:29:50 [Zakim]
- +1.631.803.aakk
17:29:51 [johnsimpson]
johnsimpson has left #dnt
17:29:52 [Zakim]
-Brooks
17:29:54 [Zakim]
-Peder_Magee
17:29:55 [aleecia]
and thanks Susan for scribing so much!
17:30:04 [Zakim]
- +1.215.286.aaee
17:30:05 [Zakim]
-rvaneijk
17:30:07 [Zakim]
-Aleecia
17:30:12 [yianni]
rrsagent, make logs public
17:30:18 [Zakim]
-vincent
17:30:25 [yianni]
rrsagent, set logs would visible
17:30:36 [aleecia]
(you want public)
17:30:40 [yianni]
rrsagent, draft minutes
17:30:40 [RRSAgent]
I have made the request to generate http://www.w3.org/2013/01/17-DNT-minutes.html yianni
17:31:14 [Zakim]
-Jonathan_Mayer
17:31:22 [Zakim]
-vinay
17:31:39 [Ho-Chun_Ho_]
Ho-Chun_Ho_ has left #dnt
17:31:58 [Zakim]
-SusanIsrael
17:34:21 [Zakim]
-WileyS
17:45:39 [Zakim]
-DAvid
17:50:11 [peterswire]
peterswire has joined #dnt
17:56:13 [Zakim]
-moneill2
18:05:00 [Zakim]
disconnecting the lone participant, [CDT], in Team_(dnt)14:00Z
18:05:02 [Zakim]
Team_(dnt)14:00Z has ended
18:05:02 [Zakim]
Attendees were Jonathan_Mayer, [GVoice], rvaneijk, +1.425.214.aaaa, Aleecia, +1.202.587.aabb, WileyS, +1.631.803.aacc, [CDT], +1.215.796.aadd, bryan, +1.215.286.aaee, vincent,
18:05:02 [Zakim]
... Peder_Magee, +1.202.257.aaff, +1.646.722.aagg, Brooks, +1.917.934.aahh, vinay, +1.646.654.aaii, +1.215.286.aajj, moneill2, SusanIsrael, DAvid, +1.631.803.aakk
18:05:08 [efelten]
efelten has joined #dnt
18:05:55 [JoeHallCDT]
JoeHallCDT has joined #DNT
18:16:54 [efelten]
efelten has joined #dnt
18:35:59 [JoeHallCDT]
JoeHallCDT has left #dnt
18:40:12 [dwainberg]
dwainberg has joined #dnt
18:43:50 [dwainber_]
dwainber_ has joined #dnt
18:51:42 [efelten]
efelten has joined #dnt
18:55:10 [mnolet]
mnolet has joined #dnt
19:12:16 [robsherman]
robsherman has joined #dnt
19:31:46 [Zakim]
Zakim has left #dnt
19:43:27 [npdoty]
npdoty has joined #dnt
20:00:43 [dsinger]
dsinger has joined #dnt
21:50:51 [hwest]
hwest has joined #dnt