IRC log of dnt on 2013-01-16

Timestamps are in UTC.

16:57:18 [RRSAgent]
RRSAgent has joined #dnt
16:57:18 [RRSAgent]
logging to
16:57:22 [npdoty]
rrsagent, make logs public
16:57:23 [Zakim]
+ +1.408.836.aaaa
16:57:27 [Zakim]
16:57:32 [npdoty]
Meeting: Tracking Protection Working Group teleconference
16:57:35 [Chris_IAB]
Chris_IAB has joined #dnt
16:57:37 [npdoty]
Chair: peterswire
16:57:43 [kulick]
1.408.836.aaaa -> brad kulick
16:57:46 [Zakim]
16:57:48 [vincent]
vincent has joined #dnt
16:57:50 [Walter]
I hear some underage participant
16:57:55 [BrendanIAB]
P39 is Mattias I think
16:58:06 [npdoty]
Zakim, aaaa is kulick
16:58:06 [Zakim]
+kulick; got it
16:58:08 [Zakim]
+ +31.65.141.aabb
16:58:12 [Zakim]
+ +1.202.587.aacc
16:58:16 [rvaneijk]
Zakim, aabb is me
16:58:16 [Zakim]
+rvaneijk; got it
16:58:18 [moneill2]
zakim, [IPcaller] is me
16:58:18 [Zakim]
+moneill2; got it
16:58:20 [jeffwilson]
jeffwilson has joined #dnt
16:58:23 [npdoty]
Zakim, ??P39 is maybe schunter
16:58:23 [Zakim]
I don't understand '??P39 is maybe schunter', npdoty
16:58:45 [JC]
JC has joined #DNT
16:58:53 [Chris_IAB]
Chris Mejia just joined the call from 212
16:58:57 [samsilberman]
samsilberman has joined #dnt
16:59:08 [BrendanIAB]
npdoty - I think the keyword you're looking for is "probably" rather than "maybe"
16:59:21 [Brooks]
Brooks has joined #dnt
16:59:40 [npdoty]
Zakim, ??P39 is probably schunter
16:59:40 [Zakim]
+schunter?; got it
16:59:46 [Zakim]
16:59:47 [susanisrael]
susanisrael has joined #dnt
16:59:51 [Zakim]
16:59:54 [Lia]
Lia has joined #dnt
17:00:02 [Yianni]
Zakim, aacc is peterswire
17:00:02 [Zakim]
+peterswire; got it
17:00:16 [npdoty]
Zakim, who is on the phone?
17:00:16 [Zakim]
On the phone I see BrendanIAB?, schunter?, dwainberg, walter, kulick, Fielding, moneill2, rvaneijk, peterswire, JeffWilson, Chris_IAB, vincent, npdoty, Brooks, [Microsoft],
17:00:19 [Zakim]
... Susan_Israel, samsilberman
17:00:20 [Zakim]
17:00:23 [ChrisPedigo_OPA]
ChrisPedigo_OPA has joined #dnt
17:00:23 [aleecia]
aleecia has joined #dnt
17:00:28 [hefferjr]
hefferjr has joined #dnt
17:00:29 [Zakim]
17:00:35 [Zakim]
+ +1.202.331.aadd
17:00:43 [Zakim]
+ +1.646.654.aaee
17:00:46 [Zakim]
17:00:52 [kj]
kj has joined #dnt
17:01:12 [Lia]
Zakim, aaee is me
17:01:12 [Zakim]
+Lia; got it
17:01:14 [Zakim]
17:01:18 [npdoty]
scribe: JC
17:01:31 [justin]
justin has joined #dnt
17:01:32 [Zakim]
+ +1.917.974.aaff
17:01:34 [JC]
Peterswire: Put in IRC any new phone numbers
17:01:40 [jmayer]
jmayer has joined #dnt
17:01:40 [hwest]
hwest has joined #dnt
17:01:41 [npdoty]
Zakim, aaff is justin
17:01:41 [Zakim]
+justin; got it
17:01:47 [npdoty]
Zakim, who is making noise?
17:01:48 [David_MacMillan]
David_MacMillan has joined #dnt
17:01:48 [aleecia]
please mute :-)
17:01:55 [Zakim]
+ +1.609.258.aagg
17:01:56 [vinay]
vinay has joined #dnt
17:01:57 [Zakim]
npdoty, listening for 10 seconds I heard sound from the following: peterswire (50%), Susan_Israel (31%)
17:01:59 [Zakim]
17:02:01 [efelten_]
Zakim, aagg is me
17:02:01 [Zakim]
+efelten_; got it
17:02:02 [JC]
... please be on mute if you are talking locally
17:02:13 [peter-4As]
peter-4As has joined #dnt
17:02:17 [JC]
... scribes will be selected before calls
17:02:17 [Zakim]
17:02:21 [JC]
... hello to everyone
17:02:24 [Zakim]
+ +1.202.344.aahh
17:02:45 [Mike_Zaneis]
Mike_Zaneis has joined #dnt
17:02:48 [JC]
... we will be looking at de-identification issues which will be important to future calls
17:02:49 [npdoty]
17:02:49 [trackbot]
ISSUE-191 -- Non-normative Discussion of De-Identification -- raised
17:02:49 [trackbot]
17:02:52 [Keith]
Keith has joined #dnt
17:03:02 [JC]
... a new issue 191 was created for this
17:03:08 [Zakim]
17:03:13 [dsinger]
dsinger has joined #dnt
17:03:14 [JC]
... for linkability and de-identification
17:03:32 [JC]
... it is important to get clarity around definitions and problems that have come up
17:03:38 [rvaneijk]
Is there a URL with info to participate remotely tomorrow for the de-ID workshop?
17:03:40 [Zakim]
17:03:48 [adrianba]
zakim, [Microsoft.a] is me
17:03:48 [Zakim]
+adrianba; got it
17:03:51 [JC]
... two major reports have been sent out on this
17:03:56 [adrianba]
zakim, mute me
17:03:56 [Zakim]
adrianba should now be muted
17:04:02 [JC]
... the US document will be discussed today
17:04:12 [aleecia]
David I had to try a few times too
17:04:15 [Zakim]
17:04:15 [dsinger]
zakim, who is on the phone?
17:04:15 [Zakim]
On the phone I see BrendanIAB?, schunter?, dwainberg, walter, kulick, Fielding, moneill2, rvaneijk, peterswire, JeffWilson, Chris_IAB, vincent, npdoty, [Microsoft], Susan_Israel,
17:04:18 [Zakim]
... samsilberman, [CDT], Keith_Scarborough, +1.202.331.aadd, Lia, DAvid, hefferjr, Aleecia, RichardWeaver, justin, efelten_, Jonathan_Mayer, hwest, +1.202.344.aahh, adrianba
17:04:18 [Zakim]
... (muted)
17:04:18 [JC]
... Deven McGraw from CDT was involved in it
17:04:31 [JC]
... the second one was from UK ICO
17:04:42 [JC]
... links are in today's agenda
17:04:42 [npdoty]
17:04:55 [rvaneijk]
Just to make sure, the UK ICO report is for the UK only...
17:05:00 [WileyS]
WileyS has joined #dnt
17:05:05 [npdoty]
Zakim, who is making noise?
17:05:10 [rvaneijk]
You can not extrapolate it to the EU..
17:05:14 [JC]
... hopefully we can work on advancement of common knowledge
17:05:15 [Zakim]
npdoty, listening for 10 seconds I heard sound from the following: Susan_Israel (19%)
17:05:27 [JC]
... remember gathering at CDT
17:05:31 [Mike_Zaneis]
Zakim, 2023444652 is Mike Zaneis
17:05:31 [Zakim]
I don't understand '2023444652 is Mike Zaneis', Mike_Zaneis
17:05:40 [npdoty]
Zakim, aahh is Mike_Zaneis
17:05:40 [Zakim]
+Mike_Zaneis; got it
17:05:46 [rvaneijk]
Is there a URL with info to participate remotely tomorrow for the de-ID workshop?
17:05:47 [JC]
... should be emailed if you are attending in person
17:06:07 [Zakim]
+ +1.646.722.aaii
17:06:09 [JC]
... one of the rules for discussion is no normative conversations
17:06:20 [JC]
... same call in rules for weekly calls
17:06:32 [ATurkel]
ATurkel has joined #dnt
17:06:38 [Marc_G]
Marc_G has joined #DNT
17:06:46 [Walter]
The UK document doesn't even bind the UK
17:06:56 [Zakim]
17:06:57 [Walter]
And definitely does not bind anyone in Europe
17:07:03 [JC]
... the documents do not bind countries, nor are they necessarily the right way to go
17:07:03 [dsinger]
zakim, [apple] has dsinger
17:07:03 [Zakim]
+dsinger; got it
17:07:08 [Zakim]
+ +1.425.214.aajj
17:07:15 [rvaneijk]
the UK document is centered around its definition of personal data.
17:07:22 [Zakim]
+ +1.425.455.aakk
17:07:28 [Zakim]
+ +1.202.265.aall
17:07:38 [Marc_G]
Marc 202 265 2736
17:07:42 [JC]
... I gave sample reasons why one might be less strict to use, not to say that these are the correct answers on how we should go on DNT
17:07:45 [npdoty]
Zakim, aall is Marc_G
17:07:45 [Zakim]
+Marc_G; got it
17:07:57 [pedermagee2023263538]
pedermagee2023263538 has joined #dnt
17:07:58 [CraigSpiezle]
CraigSpiezle has joined #dnt
17:07:59 [JC]
... post any new documents to the Wiki Nick has setup
17:08:12 [npdoty]
Zakim, who is making noise?
17:08:17 [dsinger]
zakim, who is mnaking noise?
17:08:17 [Zakim]
I don't understand your question, dsinger.
17:08:18 [JC]
... issues we plan to discuss tomorrow at 9:00
17:08:18 [aleecia]
cannot understand
17:08:22 [Zakim]
npdoty, listening for 10 seconds I heard sound from the following: Fielding (9%), peterswire (15%)
17:08:29 [Chris_IAB]
can't hear due to background noise
17:08:50 [JC]
... what are incentives to do de-identification
17:09:13 [npdoty]
17:09:17 [JC]
... if we understand reasons, risks, benefits, that can lead to use cases
17:09:21 [Zakim]
+ +1.206.658.aamm
17:09:45 [JC]
... second topic, what are some measurements of de-identification. what are risks of reidentification
17:09:47 [npdoty]
Zakim, aamm is probably amyc
17:09:47 [Zakim]
+amyc?; got it
17:09:56 [JC]
... what are goals as we define these regimes
17:10:12 [fielding]
17:10:19 [JC]
... what are goals: technical safeguards versus administrative safeguards
17:10:32 [JC]
... number 4 hashing
17:10:43 [JC]
... what kind of safeguards can it provide
17:10:57 [JC]
... next issue, use of persistent identifiers
17:11:15 [JC]
... how is it that various buckets can be updated when deidentification is used
17:11:43 [JC]
... if there are other descriptive issues that should be identified send them to Peter Swire
17:12:00 [JC]
... We circulated Deven's slides earlier
17:12:09 [JC]
... any questions or comments?
17:12:13 [npdoty]
ack fielding
17:12:19 [efelten_]
Could somebody post a link to Deven's slides?
17:12:30 [JC]
fielding: can you describe why deidentification is applicable to DNT
17:12:33 [Walter]
17:12:38 [Wileys]
Wileys has joined #dnt
17:12:42 [Walter]
as in, I'd like to get a link to the slides too
17:12:45 [JC]
peterswire: I see it relevant in a couple of ways
17:13:01 [JC]
... at one end, data collected online is so aggregated it is not considered tracking
17:13:24 [JC]
... at the other end data is associated with a specific individual such as Peter
17:13:28 [rvaneijk]
echo: Could somebody post a link to Deven's slides?
17:13:41 [JC]
... knowing where data falls is important to the process
17:13:56 [aleecia]
Hi Roy, we're basically working through
17:14:01 [Zakim]
17:14:03 [JC]
... second thing, in the compliance spec it is related to the various uses
17:14:14 [justin]
The standard says it doesn't apply to data that has been deidentified/delinked. And it's been one of the most debated topics within the group. How is that not relevant?
17:14:25 [JC]
... there can be time when data goes into a DB and it should not come out in a way that can be linked to an individual
17:14:26 [Chris_IAB]
to the individual or to the unique ID?
17:15:02 [JC]
... I'm not saying that the DAA rules are perfect, but they have definitions about how data goes into the system but does not come out
17:15:03 [tedleung]
tedleung has joined #dnt
17:15:08 [JC]
... in an identifiable way
17:15:09 [fielding]
justin, because I have no interest in keeping data when DNT:1 is set other than for security purposes
17:15:25 [aleecia]
Justin's right that we've discussed anything being permitted for de-id'ed data, but that's not nailed down as we were still working through which defn we would go through
17:15:33 [JC]
... because the compliance spec covers the meaning of tracking and others I feel it is relevant
17:15:36 [Wileys]
Nick, could you share the Twiki link you created - does it host the presentation being referenced?
17:15:51 [JC]
chris_iab: when you say individual do you mean unique ID?
17:16:03 [JC]
peterswire: I'm trying not to make decisions about what I mean
17:16:15 [npdoty]
Wileys, wiki page is here some people have already added more to it
17:16:21 [JC]
... sometimes it is associated with a machine or cookie or individual
17:16:38 [JC]
... that is what I mean by creating a working definition
17:16:51 [JC]
... so we know what we are referencing in conversations
17:17:02 [Wileys]
Thank you Nick
17:17:29 [JC]
... I will introduce Deven McGraw
17:17:37 [justin]
fielding, I am glad to hear it! But other working group members want to do more with that data. So the previous discussions we had about product improvement/market research have now migrated to the discussion of deidentification.
17:17:46 [JC]
... she was very involved in public hearings on deidentification
17:17:49 [rvaneijk]
17:17:50 [npdoty]
17:18:04 [dsinger]
17:18:06 [npdoty]
ack rvaneijk
17:18:14 [JC]
rvaneijk: Can we get a link to the slides?
17:18:15 [Wileys]
Nick, checked the Twiki and can't find the slides
17:18:17 [Brooks]
+1 on slides
17:18:23 [Wileys]
17:18:33 [JC]
peterswire: I may have created an error in my email
17:18:42 [JC]
Shane: can you post the slides?
17:18:49 [Wileys]
17:19:24 [Walter]
just paste the link in here?
17:19:31 [Wileys]
Thank you Peter
17:19:39 [JC]
Peterswire: I will send these to Nick and he can post them
17:19:47 [JC]
... Deven you can go
17:19:47 [Zakim]
17:19:58 [JC]
Deven: The slides are mostly text without math
17:21:09 [JC]
... The guidance that was given on deidentification came from HIPAA and that is where we will start
17:21:15 [Zakim]
17:21:32 [JC]
... HIPAA protects health information in the US, but it is not a data protection law
17:21:40 [Zakim]
17:21:52 [JC]
... most of the data holders in the US are covered by HIPAA
17:22:07 [JC]
... HealthVault and similar apps are not covered
17:22:08 [aleecia]
Roy it's possible that some of the discussion around how to keep data protected may be interesting in the context of data held to prevent fraud / for security. Unclear to me, but I could imagine some cross-over there
17:22:24 [JC]
... the bad news is HIPAA does not cover all health data
17:22:36 [JC]
... is my email
17:22:38 [dsinger]
zakim, who is making noise?
17:22:49 [Zakim]
dsinger, listening for 10 seconds I heard sound from the following: Susan_Israel (48%)
17:23:05 [aleecia]
zakim, mute susan_israel
17:23:05 [Zakim]
Susan_Israel should now be muted
17:23:06 [susanisrael]
i have muted sorry
17:23:19 [JC]
... when you have data that meets the standard for deidentification it is not covered by the law
17:23:34 [JC]
... you can do almost anything with deidentified data
17:23:47 [JC]
... the deidentification standard is a legal one
17:24:16 [JC]
... there is no specific percentage risk which is established as a baseline
17:24:25 [Zakim]
17:24:28 [JC]
... risk is contextual
17:24:38 [Zakim]
17:24:41 [JC]
... there are two methods that can be used
17:24:59 [JC]
... the expert method requires someone with expertise to document that the risk is small
17:25:20 [JC]
... it must be determined who the data is going to and what other data they have
17:25:44 [JC]
... safeharbor method requires removing ?? amounts of data
17:25:50 [JC]
... I'm on slide 5
17:26:19 [JC]
... a code can be assigned to deidentified data to allow data to be reidentified as long as the code is not derived from individual
17:26:37 [JC]
... and you cannot disclose the code to the entity you are giving the data to
17:26:49 [Zakim]
17:27:08 [JC]
... this provision permits healthcare entities to be able to reidentify the data when notification is required
17:27:24 [JC]
... for example in case of an infectious disease
17:27:25 [npdoty]
17:27:30 [Chris_IAB]
sorry, but have the slides that we are reviewing been distributed? I can't seem to find them?
17:27:47 [Chris_IAB]
see it now npdoty, thanks
17:27:51 [JC]
... the assignment of codes is covered in guidance
17:28:09 [JC]
... on slide 6 let's discuss safeharbor
17:28:18 [Chris_IAB]
slides are not numbered
17:28:37 [Zakim]
17:28:45 [peterswire]
if you view in other mode, you can see the slide numbers
17:29:21 [Chris_IAB]
it's a pdf peterswire; which mode are you referring to?
17:29:33 [JC]
... names, addresses, zip codes, all elements of dates, ages are okay except for the elderly, telephone number, account number, VIN, IP address, URL
17:29:50 [peterswire]
ah, I'm viewing in powerpoint
17:29:51 [JC]
... and any other unique identifying number or code cannot be used
17:30:10 [aleecia]
I do not think I understand this "code"
17:30:13 [JC]
... the trick with safeharbor is you have to remove all of these types of data to be covered
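
The safe harbor approach described above (drop every listed identifier type, keep the rest) can be sketched as a simple field filter. This is a minimal illustration, not the regulatory text: the field names are hypothetical, only the categories mentioned in this call are listed rather than the full set, and the age handling assumes the rule that ages over 89 are collapsed into a single "90 or older" bucket.

```python
# Hypothetical field names standing in for some of the identifier types
# mentioned on the call; the real safe harbor list is longer.
IDENTIFIER_FIELDS = {
    "name", "address", "zip_code", "birth_date", "telephone",
    "account_number", "vin", "ip_address", "url",
}

def strip_identifiers(record: dict) -> dict:
    """Return a copy of the record with every listed identifier field removed."""
    cleaned = {k: v for k, v in record.items() if k not in IDENTIFIER_FIELDS}
    # Assumption: ages are kept, except that ages over 89 go into one bucket.
    if isinstance(cleaned.get("age"), int) and cleaned["age"] > 89:
        cleaned["age"] = "90+"
    return cleaned

record = {
    "name": "Jane Doe",
    "zip_code": "02139",
    "ip_address": "192.0.2.7",
    "age": 34,
    "diagnosis": "flu",
}
cleaned = strip_identifiers(record)
```

All listed fields have to go for the safe harbor to apply; leaving even one of them in the output would take the dataset outside this method.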
17:30:20 [Zakim]
17:30:41 [JC]
... if this does not work you can use the statistician method, but someone must validate the method
17:30:49 [Zakim]
+ +44.772.301.aann
17:31:04 [ChrisPedigo_OPA]
ChrisPedigo_OPA has joined #dnt
17:31:24 [JC]
... safeharbor method deems that the data is deidentified and thus unregulated
17:31:25 [schunter]
schunter has joined #dnt
17:31:29 [phildpearce]
phildpearce has joined #dnt
17:31:43 [JC]
... it is also a cookbook that tells you how to deidentify
17:32:12 [Zakim]
+ +1.213.239.aaoo
17:32:22 [JC]
... under the statistical method there are no rules for the statistician
17:32:27 [Mike_Zaneis_]
Mike_Zaneis_ has joined #dnt
17:32:28 [mecallahan]
mecallahan has joined #DNT
17:32:51 [JC]
... I have never heard of anyone being held up by a regulator because they did not properly deidentify data
17:33:33 [schunter]
Zakim, ??P24 is schunter
17:33:33 [Zakim]
+schunter; got it
17:33:38 [JC]
... the standard is to reach low risk of reidentification, not zero risk
17:33:50 [JC]
... requiring zero risk would remove all utility
17:34:02 [aleecia]
Ahh. 1999. Before a lot of the re-identification work had happened.
17:34:05 [amyc]
amyc has joined #dnt
17:34:27 [JC]
... provides rules for contractors
17:34:56 [JC]
... data use agreements are not required, but a data holder may require an agreement for deidentification
17:35:10 [JC]
... slide 12 guidance covers who is an expert
17:35:38 [JC]
... no specific degree or level or education is required, but they will look at that in a review
17:36:04 [JC]
... no numeric target is given for risk
17:36:17 [npdoty]
aleecia, isn't this much more recent guidance? I'm hearing explicit acknowledgement of re-identification -- low risk, not no risk
17:36:43 [JC]
... multiple algorithms can be used in a single dataset
17:36:58 [JC]
... as long as datasets cannot be combined for reidentification
17:37:09 [JC]
... slide 13 shows dataflow
17:37:21 [efelten_]
Nick, one example of outdated thinking is the discussion of k-anonymity.
17:37:23 [JC]
... deidentification can be iterative
17:38:08 [JC]
... an agreement cannot be a tool of deidentification
17:38:21 [aleecia]
the guidance is more recent, I agree. The original text was from '99. That explains why there would be an identifier added back after doing all the de-identifying work -- the risk of that was likely not really appreciated at the same level in '99
17:38:25 [JC]
... slide 14 and 2.9 of guidance
17:38:34 [bryan]
bryan has joined #dnt
17:38:36 [JC]
... you cannot assign a code that is given away with the data
17:38:37 [aleecia]
And here we are :-)
17:38:53 [aleecia]
So it sounds like they're trying to fix it
17:38:55 [npdoty]
efelten, forgive my ignorance, why is discussing k-anonymity outdated?
17:39:00 [peterswire]
this is 2012 guidance; original rule drafted in 1999/2000
17:39:12 [JC]
... however you can disclose a code that has been derived from the data as long as the code and data meet low risk standard
17:39:44 [JC]
... you can take protected health information and transform it into values for cryptographic hash functions
17:39:49 [efelten_]
k-anonymity does not imply any limitation on the analyst's ability to infer sensitive data about individuals, for one thing.
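
The limitation raised here can be shown with a toy dataset. The sketch below is illustrative only; the column names and rows are made up. It measures k as the size of the smallest group sharing the same quasi-identifier values, then lists groups in which every row has the same sensitive value, so an analyst learns that value for anyone known to be in the group even though k is satisfied.

```python
from collections import defaultdict

def k_anonymity(rows, quasi_identifiers):
    """Group rows by quasi-identifier values; k is the smallest group size."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return min(len(g) for g in groups.values()), dict(groups)

# Made-up rows; "zip3" and "age_band" are hypothetical quasi-identifiers.
rows = [
    {"zip3": "021", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip3": "021", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip3": "021", "age_band": "30-39", "diagnosis": "asthma"},
    {"zip3": "946", "age_band": "40-49", "diagnosis": "flu"},
    {"zip3": "946", "age_band": "40-49", "diagnosis": "diabetes"},
    {"zip3": "946", "age_band": "40-49", "diagnosis": "flu"},
]

k, groups = k_anonymity(rows, ["zip3", "age_band"])

# Groups where the sensitive attribute takes a single value: membership
# in the group is enough to infer the diagnosis.
homogeneous = [
    key for key, members in groups.items()
    if len({m["diagnosis"] for m in members}) == 1
]
```

Here the table is 3-anonymous, yet the first group discloses its members' diagnosis outright.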
17:40:01 [JC]
... but do not give away the formula or hash
17:41:02 [JC]
... slide 16 remember when you are using safeharbor to remove 18 types of data you have to know if the data can be reidentified
17:41:42 [JC]
... structured data and free text fields are covered by deidentification rules
17:42:04 [JC]
... deidentification is aimed at protecting patients and families not staff
17:42:24 [Zakim]
+ +1.917.318.aapp
17:42:34 [JC]
... HIPAA rules do not cover healthcare providers
17:43:12 [JC]
... I will let you know when the guidance does not cover something
17:43:16 [npdoty]
Zakim, aapp is Alan
17:43:16 [Zakim]
+Alan; got it
17:43:30 [JC]
... the agency did what congress asked them to do and nothing more
17:44:06 [aleecia]
Some of this is really good. But it starts from a point of trying to create incentives for de-id'ing data, presumably because aggregate health information has so much public benefit. Bit different here, but very very interesting to hear what they did
17:44:22 [JC]
Peterswire: Under safeharbor IP address is PHI. What about cookies or browsing habits?
17:44:29 [JC]
Deven: there is no guidance on that
17:44:39 [npdoty]
I didn't understand IP address as personal health information, but just as information that would have to be removed to de-identify
17:44:42 [JC]
... you would need to look at what is being examined
17:44:55 [Zakim]
17:44:56 [JC]
... the hospital's website would not necessarily be covered
17:45:20 [JC]
Peterswire: Is knowing where the patient is logging in from covered
17:45:33 [efelten_]
URLs are covered as PHI, right?
17:45:40 [JC]
Deven: Since web data is covered this could be covered
17:46:02 [efelten_]
Or at least URLs are one of the things that have to be removed under the safe harbor.
17:46:03 [JC]
... that is why there is the catch-all category to catch these types of things, such as cookies
17:46:16 [npdoty]
efelten, the latter, yes
17:46:20 [JC]
Peterswire: have people used one method over the other?
17:46:21 [Wileys]
The HIPAA standard for de-identification is focused on 'External Sharing' - whereas our discussions have centered around de-identification for data that is not to be shared externally. I believe it makes sense to have two standards here: internal vs. external
17:46:22 [moneill2]
guid in cookie obv. can be used to re-identify
17:46:44 [JC]
Deven: The analytical folks tend to use statistician method because they need dates
17:47:14 [aleecia]
Shane, I could imagine that working
17:47:15 [JC]
... similarly understanding health trends is difficult with safeharbor method
17:47:37 [JC]
... best analytics is done with statistically deidentified data
17:47:53 [justin]
justin has joined #dnt
17:47:58 [Marc_G]
What about Shane's question or point above?
17:48:00 [JC]
Peterswire: Can you explain if salts are required with hashing
17:48:32 [JC]
Deven: I believe the guidance states if you are using a hash, after you hand the data to a recipient they should not be able to reidentify the data
17:49:03 [JC]
... the risk should be very low and examples are provided for when codes can be provided
17:49:07 [efelten_]
In healthcare, providers are given different treatment because they have informed consent from the patient.
17:49:34 [JC]
... for hashes you cannot provide the key or salt
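
The point about withholding the key or salt can be sketched with a keyed hash. This is one possible illustration, not what the guidance prescribes: the function name and the example identifier are hypothetical, and HMAC-SHA256 is chosen here only as a common way to derive a code that a recipient cannot recompute without the secret.

```python
import hashlib
import hmac
import secrets

def make_code(identifier: str, secret_key: bytes) -> str:
    """Derive a stable code from an identifier using a key the holder keeps."""
    return hmac.new(secret_key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()

# The data holder generates and retains the key; it is never sent
# along with the dataset.
holder_key = secrets.token_bytes(32)

code = make_code("patient-0001", holder_key)        # hypothetical identifier
same_code = make_code("patient-0001", holder_key)   # stable for the holder

# A recipient guessing identifiers without the key gets different values.
guess = make_code("patient-0001", secrets.token_bytes(32))
```

With a plain unsalted hash, a recipient could hash a list of candidate identifiers and match them against the codes, which is why the key or salt has to stay with the data holder.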
17:52:11 [npdoty]
scribenick: npdoty
17:52:15 [Wileys]
Ed, if the URL is non-specific to a user, then this would not have to be removed (meets 'very low risk' standard)
17:52:42 [npdoty]
peterswire: regarding data-use agreements under HIPAA, when does de-identification happen vs. data-use agreements?
17:53:08 [JC_]
JC_ has joined #DNT
17:53:08 [npdoty]
deven: data-use agreement is not required when you've reached de-identification (statistically to low risk, or under safe harbor)
17:53:15 [rvaneijk]
Shane, a dataset with full URL's contains behavioral information, which is specific to a user
17:53:29 [npdoty]
... you don't need to execute an agreement with the recipient of your data, they don't need to commit not to re-identify
17:53:46 [npdoty]
... if you want to use a data-use agreement as an extra measure of caution, you can do that
17:53:48 [JC_]
17:53:52 [npdoty]
... enforced as a matter of contract
17:54:10 [npdoty]
... can't use the data-use agreement to get to the low risk of de-identification
17:54:26 [JC]
JC has joined #DNT
17:54:42 [npdoty]
... gray area regarding "anticipated recipient"
17:54:42 [Zakim]
- +1.425.455.aakk
17:54:59 [npdoty]
... because there might be other people who can reidentify this data but you can't
17:55:31 [Zakim]
17:55:43 [npdoty]
... still raises questions about whether the agreement can limit recipients in a way that changes your statistical needs
17:56:03 [Wileys]
Rob - as long as the recipient is not able to leverage the URL history to re-identify the user then it does not need to be stripped.
17:56:12 [npdoty]
peterswire: how much the expert's methodology should be public. what level of transparency is required?
17:56:21 [Zakim]
17:56:35 [npdoty]
deven: not required to document the methodology, but required to maintain evidence for use in response to regulators [did scribe get that right?]
17:56:56 [npdoty]
... certainly been to many conferences where computer scientists will share those methodologies for feedback
17:57:14 [aleecia]
Shane - it turns out URL history is an effective fingerprint. If "able to" is the threshold, then URLs are certainly going to need to be stripped
17:57:20 [npdoty]
... if you're willing to attest, put your name as a statistician, you don't have to document the method
17:57:33 [ATurkel]
ATurkel has joined #dnt
17:57:45 [npdoty]
... not specified what level of attestation is needed
17:58:03 [npdoty]
... I would want enough documentation as the data holder to respond to regulators who knock on my door
17:58:14 [npdoty]
... a handful of people who do this on a regular basis, and everybody uses them
17:58:34 [Wileys]
Aleecia - if I give you a handful of URLs and ask you to re-identify the individual they belong to, I doubt you'd be able to. This is the recipient test.
17:58:54 [npdoty]
... gives legal comfort to pick someone who has been regularly used
17:59:05 [npdoty]
Zakim, who is making noise?
17:59:14 [efelten_]
Actually the test is: if you give her all of your data, can she re-identify.
17:59:18 [Zakim]
npdoty, listening for 12 seconds I could not identify any sounds
17:59:22 [dsinger]
zakim, who is making noise?
17:59:23 [npdoty]
Zakim, who is making noise?
17:59:34 [Zakim]
dsinger, listening for 11 seconds I heard sound from the following: justin (56%), peterswire (39%)
17:59:38 [Wileys]
Ed, agreed - the assembly of the specific data elements is a key factor
17:59:39 [npdoty]
Zakim, drop justin
17:59:39 [Zakim]
justin is being disconnected
17:59:39 [vincent]
Wileys, if you include the timestamps I bet you could re-identify someone even with a few urls
17:59:40 [Zakim]
17:59:42 [Chris_IAB]
does anyone else hear that?!
17:59:48 [Zakim]
npdoty, listening for 13 seconds I heard sound from the following: peterswire (85%)
17:59:55 [Chris_IAB]
missed everything you said during noise
17:59:56 [Walter]
we certainly did
18:00:06 [Chapell]
Chapell has joined #DNT
18:00:16 [npdoty]
peterswire: q regarding categories of information
18:00:31 [Wileys]
Vincent, I'm not sure I agree but this does align with my conversation with Ed that the assembly of data elements is key to the determination of "very low risk"
18:00:36 [rvaneijk]
Wiley, that is the whole point of pixel tagging
18:00:38 [npdoty]
deven: not all holders, aimed at hospitals and doctors, and the records they use to treat patients and pay healthcare claims
18:01:00 [Zakim]
18:01:03 [npdoty]
... of the data that's in those types of records, what elements are most likely to be re-identifiable
18:01:07 [aleecia]
Shane - it turns out people visit the same few sites in the long tail. So for me, that's going to be a specific set of four web comics. :-) The set of sites people visit is persistent and often unique
18:01:08 [justin]
zakim, mute me
18:01:08 [Zakim]
justin should now be muted
18:01:19 [Walter]
justin: your cat sat on the phone?
18:01:28 [Wileys]
Rob - pixel tagging through a unique cookie ID is meaningful to me - but since you as a recipient don't have access to my cookie ID platform, it would not allow you to re-identify an individual
18:01:36 [npdoty]
... safe harbor categories came around after Latanya Sweeney's reidentification of the governor's record
18:02:04 [npdoty]
... data elements that she used are now listed in the safe harbor
18:02:28 [npdoty]
... but as we increase the amount of data in the external world, we shouldn't assume every year that the safe harbor makes it a very low risk
18:02:38 [npdoty]
... but a lot of public databases are not covered by HIPAA
18:03:23 [npdoty]
peterswire: some discussion regarding date of birth, different from other data fields in that it splits the population into 25,000 cells
18:04:09 [npdoty]
... what kind of data can be easily searched on the outside? when you're coming up with your definition of very low risk, demographic data or data that lasts with you for a long time is a higher risk
18:04:25 [npdoty]
... persists longer and is more easily obtainable from other sources
18:04:57 [npdoty]
deven: that's the level of detail in discussion of the statistical methodology
18:04:59 [npdoty]
18:05:16 [kj]
kj has joined #dnt
18:05:30 [npdoty]
peterswire: thanks very much to Deven
18:06:04 [npdoty]
peterswire: in person availability in Brussels; next Thursday or Friday, will provide more information
18:06:16 [fielding]
I have no doubt that understanding deidentification is useful in general for the privacy of all users [not just those sending DNT]. I don't believe discussing it here is useful because I don't see us redefining what it means in our specs. That's in stark contrast to defining tracking, which hasn't been defined by anyone else, we are specifically chartered to define, and we aren't going to make any real progress until we do. And, no, I don't think that
18:06:17 [fielding]
unlinkability is relevant just because someone made it an issue for TCS.
18:06:20 [npdoty]
... I'm not available next Wednesday, Matthias will have a technical call at the usual time
18:06:24 [npdoty]
... questions or comments?
18:06:27 [fielding]
What about the MIT meeting?
18:06:33 [tedleung]
any more details on the f2f?
18:06:33 [Zakim]
- +1.646.722.aaii
18:06:33 [adrianba]
is there logistics information for the f2f?
18:06:35 [Zakim]
- +1.425.214.aajj
18:06:36 [npdoty]
... thanks everybody
18:06:37 [Zakim]
18:06:37 [Zakim]
- +1.202.331.aadd
18:06:43 [Zakim]
- +1.213.239.aaoo
18:06:51 [bryan]
Hi Nick did you see my message?
18:06:55 [efelten_]
efelten_ has left #dnt
18:07:07 [npdoty]
I'm hearing questions about MIT logistics, and will follow up on the mailing list
18:07:12 [npdoty]
Zakim, list attendees
18:07:12 [Zakim]
As of this point the attendees have been BrendanIAB?, dwainberg, walter, +1.408.836.aaaa, Fielding, kulick, +31.65.141.aabb, +1.202.587.aacc, rvaneijk, moneill2, JeffWilson,
18:07:14 [aleecia]
thanks, Nick!
18:07:15 [Zakim]
... Chris_IAB, vincent, npdoty, Brooks, [Microsoft], schunter?, Susan_Israel, samsilberman, peterswire, [CDT], Keith_Scarborough, +1.202.331.aadd, +1.646.654.aaee, DAvid, hefferjr,
18:07:15 [Zakim]
... Chris_Pedigo, Aleecia, Lia, RichardWeaver, +1.917.974.aaff, justin, +1.609.258.aagg, Jonathan_Mayer, efelten_, hwest, +1.202.344.aahh, adrianba, Mike_Zaneis, Peder_Magee,
18:07:16 [bryan]
18:07:20 [Zakim]
... WileyS, +1.646.722.aaii, dsinger, +1.425.214.aajj, +1.425.455.aakk, +1.202.265.aall, Marc_G, +1.206.658.aamm, amyc?, Ted_Leung, +44.772.301.aann, +1.213.239.aaoo, schunter,
18:07:20 [Zakim]
... +1.917.318.aapp, Alan
18:07:24 [Zakim]
- +44.772.301.aann
18:07:25 [Zakim]
18:07:30 [npdoty]
rrsagent, draft minutes
18:07:30 [RRSAgent]
I have made the request to generate npdoty
18:07:47 [Zakim]
18:07:48 [Zakim]
T&S_Track(dnt)12:00PM has ended
18:07:48 [Zakim]
Attendees were BrendanIAB?, dwainberg, walter, +1.408.836.aaaa, Fielding, kulick, +31.65.141.aabb, +1.202.587.aacc, rvaneijk, moneill2, JeffWilson, Chris_IAB, vincent, npdoty,
18:07:48 [Zakim]
... Brooks, [Microsoft], schunter?, Susan_Israel, samsilberman, peterswire, [CDT], Keith_Scarborough, +1.202.331.aadd, +1.646.654.aaee, DAvid, hefferjr, Chris_Pedigo, Aleecia, Lia,
18:07:50 [Zakim]
... RichardWeaver, +1.917.974.aaff, justin, +1.609.258.aagg, Jonathan_Mayer, efelten_, hwest, +1.202.344.aahh, adrianba, Mike_Zaneis, Peder_Magee, WileyS, +1.646.722.aaii, dsinger,
18:07:50 [Zakim]
... +1.425.214.aajj, +1.425.455.aakk, +1.202.265.aall, Marc_G, +1.206.658.aamm, amyc?, Ted_Leung, +44.772.301.aann, +1.213.239.aaoo, schunter, +1.917.318.aapp, Alan
18:08:40 [Chapell]
zakim, aapp is Chapell
18:08:40 [Zakim]
sorry, Chapell, I do not recognize a party named 'aapp'
18:08:53 [schunter]
18:09:14 [Chapell]
Thanks - called in from mobile
18:10:30 [npdoty]
rrsagent, bye
18:10:30 [RRSAgent]
I see no action items
18:10:32 [npdoty]
Zakim, bye