18:17:20 RRSAgent has joined #dntd
18:17:20 logging to http://www.w3.org/2013/02/11-dntd-irc
18:17:28 vincent has joined #dntd
18:21:26 wseltzer has joined #dntd
18:23:18 wseltzer has changed the topic to: Phone: +1.617.761.6200, conference code 26634
18:23:28 zakim, this is 26634
18:23:28 wseltzer, I see Team_(dntd)18:30Z in the schedule but not yet started. Perhaps you mean "this will be 26634".
18:23:34 zakim, this will be 26634
18:23:34 ok, wseltzer; I see Team_(dntd)18:30Z scheduled to start in 7 minutes
18:24:34 dan_auerbach_ has joined #dntd
18:27:24 Team_(dntd)18:30Z has now started
18:27:31 +BerinSzoka
18:28:14 +??P6
18:28:29 zakim, ??P6 is me
18:28:29 +vincent; got it
18:28:35 fielding has joined #dntd
18:29:44 aleecia has joined #dntd
18:29:44 johnsimpson has joined #dntd
18:30:35 + +1.415.920.aaaa
18:31:18 am I in right group?
18:31:35 I was sent here though my last name puts me in C
18:31:58 this is last name with S, though right?
18:32:18 yes
18:32:26 Hard to know -- Auerbach, Fielding, Doty...
18:32:54 I think Nick is in all groups
18:32:59 I was just asked to lead this section 15 minutes ago or so
18:33:07 That's a good idea -- thanks, Roy
18:33:17 +johnsimpson
18:33:40 +bryan
18:34:29 Always an adventure
18:35:04 do we have the "questions"
18:35:09 npdoty has joined #dntd
18:35:14 -bryan
18:35:39 how many actually in the room?
18:36:04 + +1.650.723.aabb
18:36:09 sidstamm has joined #dntd
18:36:13 zakim, aabb is aleecia
18:36:13 +aleecia; got it
18:37:35 +[Mozilla]
18:37:36 let us know once the room is able to connect and we'll get started
18:37:43 Zakim, Mozilla has sidstamm
18:37:43 +sidstamm; got it
18:37:54 did everyone get the list of questions? or just the group leaders?
18:38:02 i can paste them into irc
18:38:08 if others haven't seen them
18:38:11 please do
18:38:11 I see Dan, John, Sid, Vincent, Aleecia. Presumably Wendy, like Nick, here to staff
18:38:15 Haven't seen them
18:38:16 zakim, unmute me
18:38:16 johnsimpson was not muted, johnsimpson
18:38:35 testing now. anybody hear me
18:38:40 nope
18:38:45 yes
18:38:48 i hear you
18:39:04 yes
18:39:04 robsherman__ has joined #dntd
18:39:15 + +1.425.214.aacc
18:39:58 zakim, mute me
18:39:58 johnsimpson should now be muted
18:39:59 robsherman has joined #dntd
18:40:37 are there "questions"?
18:40:42 schunter has joined #DNTD
18:40:53 zakim, who is here?
18:40:53 On the phone I see BerinSzoka, vincent, +1.415.920.aaaa, johnsimpson (muted), aleecia, [Mozilla], +1.425.214.aacc
18:40:54 AdamTurkel has joined #dntD
18:40:56 [Mozilla] has sidstamm
18:40:56 On IRC I see schunter, robsherman, sidstamm, npdoty, johnsimpson, aleecia, fielding, dan_auerbach_, wseltzer, vincent, RRSAgent, Zakim
18:41:23 where are the questions for the session?
18:41:27 we can be reading meanwhile?
18:41:42 I'd like to see them too
18:41:43 :)
18:41:54 seriously?
18:42:07 paste a url?
18:42:11 Upload the doc please
18:42:19 we're trying...
18:42:37 schunter has joined #DNTD
18:42:39 bryan has joined #dntd
18:42:44 thank you Rob
18:43:10 how many in actual room?
18:43:24 1. “Lifetime browsing history” is a phrase that is often used, but never defined clearly. What would LBH mean as a technical matter?
18:43:24 2. In light of this definition, what technical measures would suppress or delete LBH?
18:43:24 3. Tying LBH to the previous group discussions of “buckets” or “low-entropy cookies,” how can the latter continue while suppressing or deleting LBH?
18:43:24 4. Are there any compelling use cases for retaining detailed browsing history beyond a general time limit on retention?
18:43:25 5. If so, how would you limit those use cases consistent with the goals of: (1) limiting LBH; while (2) enabling “buckets” or “low-entropy cookies”?
18:43:32 1. “Lifetime browsing history” is a phrase that is often used, but never defined clearly. What would LBH mean as a technical matter?
18:43:50 zakim, unmute me
18:43:50 johnsimpson should no longer be muted
18:43:55 Mike Zaneis, Rob Sherman, Bryan Sullivan, Sam Sherman
18:43:59 Adam Turkel
18:44:04 s/Sam Sherman/Sam Silberman
18:44:23 phone: John Simpson, Dan Auerbach, Aleecia McDonald, Berin Szoka
18:44:24 q?
18:44:39 ... Sid Stamm
18:44:54 Room+: Wendy Seltzer, Matthias Schunter
18:45:05 wseltzer, I'm here as well you did not hear me?
18:45:09 Dan: going through questions high level, then will focus
18:45:13 (agreement)
18:45:34 Dan: lifetime browsing history defn -- what would LBH mean as a technical matter?
18:45:42 q?
18:45:44 Phone+ vincent Toubiana (thanks)
18:45:48 q+
18:45:54 tlr has joined #dntd
18:45:55 ack dan_auerbach_
18:46:10 Samsilberman has joined #dntD
18:46:21 ... If you have big table keyed with pseudonym and table has URIs and timestamps, that's what I think of as LBH
18:46:41 ... Assuming longer than short retention (1 week, 1 mo perhaps) that's my starting point for defn
18:46:52 ... use as working defn and keep going?
18:46:52 q?
18:46:53 q+
18:46:54 q+
18:46:59 Susan: ?
18:47:07 ack schunter
18:47:21 Matthias: too strong a requirement. Have URL and for some reason you know they came from same person, that's enough
18:47:36 ... if all articles without URLs, also LBH
18:47:54 ... all books someone has looked at, even without URIs, still LBH
18:48:04 q+ to say it means to me a set of URI data associated with an individual, compiled over a long period of time, and collected/maintained with that purpose in mind
18:48:06 Dan: any dataset known to be same person or device over time?
18:48:22 Matthias: if identifiers of books, not just URIs
18:48:25 Dan: agree
18:48:25 ack wseltzer
18:48:25 q?
18:48:45 ?: if no identifier but you know who it is?
18:48:56 s/?:/sherman:/
18:49:02 Matthias: know it's the same person, even if not who the person is, that's a LBH
18:49:15 q+
18:49:15 sherman: why? Why are we concerned if you cannot link?
18:49:24 Matthias: different question. We're answering what's a LBH
18:49:27 q?
18:49:28 q?
18:49:38 (cross talk)
18:49:49 q+ MikeZaneis
18:49:55 q+ Rachel
18:50:22 ATurkel has joined #dntd
18:50:27 ?: find what's alike, determine duration. An individual, to me, it says URI data with an individual whatever that is. Compiled over a *long* period of time. Collected and maintained with that purpose in mind.
18:50:30 ack bryan
18:50:30 bryan, you wanted to say it means to me a set of URI data associated with an individual, compiled over a long period of time, and collected/maintained with that purpose in mind
18:50:40 s/?/bryan
18:50:44 ... something specific, not what you can do with it, but the record that is collected and maintained
18:51:13 Mike: supportive of Peter's intro, thought we have identified an issue we can agree is a consumer privacy issue that might be able to be addressed
18:51:15 The history (if not linkable to any person) seems far less critical as compared to a LBH that can be associated with a person.
18:51:16 ack Mike
18:51:38 Mike: 3rd party collection of a *long* history, not defining long yet, point being that's "tracking"
18:51:43 zakim, wh ois on the phone?
18:51:43 I don't understand your question, tlr.
18:51:49 ... what we've focused on is the 3rd party tracking
18:51:55 zakim, who is on the phone?
18:51:55 On the phone I see BerinSzoka, vincent, +1.415.920.aaaa, johnsimpson, aleecia, [Mozilla], +1.425.214.aacc
18:51:57 ... tried to come up with more transparency and control
18:51:57 [Mozilla] has sidstamm
18:52:12 I'm confused about the "for that purpose" part of the definition... what is the purpose referenced by "that"?
18:52:18 ... not interested in lifetime browsing history, but want to agree on the scope, and more interested in the other questions on the list
18:52:34 ... let's get on to the next questions unless we're not just talking about 3rd party
18:52:46 Dan?: not sure we want to get too deep, other comments?
18:52:50 +1 to moving forward to the other questions
18:52:51 q+
18:52:57 ack wseltzer
18:52:58 q+
18:53:00 q?
18:53:14 ack Rachel
18:53:31 ack aleecia
18:53:38 rachel_thomas has joined #dntd
18:53:45 +1 to Aleecia
18:54:04 aleecia: would include 1st parties in LBH for defn, though perhaps not what we care about under DNT
18:54:07 ack aleecia
18:54:10 BerinSzoka_ has joined #DNTD
18:54:12 ack schunter
18:54:15 +q
18:54:28 matthias: wouldn't just discuss 3rd parties, but may constrain to just 3rd parties
18:54:40 dan: purpose of the dataset shouldn't be part of the defn
18:54:43 +1 dan
18:54:52 dan: 1st or 3rd party shouldn't be part of defn
18:54:57 q?
18:55:05 dan: moving to Q2
18:55:08 In light of this definition, what technical measures would suppress or delete LBH?
18:55:11 link to the questions?
18:55:18 thanks, Matthias!
18:55:32 [2 2. In light of this definition, what technical measures would suppress or delete LBH?]
18:55:42 dan: rough idea of defn. what tech measures suppress / delete?
18:56:04 peterswire has joined #dntd
18:56:22 matthias: long series of events, can regularly suppress linkability. every fixed time you start collecting fresh, not long-term any more
18:56:32 q?
18:56:51 could someone please share the list of questions?
18:56:51 ... if you use cookies and you throw away cookies and set new ones, unless you do new things, that breaks the linkability
18:56:56 rrsagent, make record world
18:57:08 dan: hear you're changing the pseudonym
18:57:10 Regularly breaking the linkability (e.g., by erasing cookies while not using any other linking-ability)
18:57:11 matthias: yes
18:57:20 dan: not enough. also storing IP address, can link
18:57:39 Not storing a dataset that can link two "subsequences" of the LBH.
18:57:41 dan: need strong notion beyond moving from one cookie to another.
18:57:45 MikeZaneis has joined #dntd
18:57:45 q?
18:57:50 q+
18:57:52 q?
18:57:54 ack dan_auerbach_
18:58:03 ack dan_auerbach_
18:58:09 ack wseltzer_cpdp
18:58:21 wendy: if we have long but unid'ed history, addition of one piece of linking data could tie that back
18:58:35 ... rotating identifiers to break into shorter periods of time might be useful
18:58:42 q+
18:58:48 dan: reasonable suggestion
18:59:02 Interesting question: re-linkability of sub-sequences.
18:59:10 ... fields that can link between records or data sets, important to look at everything, including time stamps
18:59:24 ... can correlate prior records to new ones
18:59:28 [what about fuzzing of data?]
18:59:50 ... broadly, want to look at all fields you are collecting and make sure none can correlate
19:00:01 ... can go from timestamp to a day or an hour
19:00:04 ack aleecia
19:00:22 dan: quickly through other qs then focus and make progress
19:00:41 bryan: confused about suppression
19:01:10 ... if it's impossible to correlate then you have suppression is that fair?
19:01:19 dan: yes, or make data less specific
19:01:24 If you cannot correlate two browsings that are a long time apart, then you suppressed the LBH.
19:01:31 bryan: url being one piece of the dataset, ok
19:01:53 ... different question: what is the tech that will enable decor over time is an arms race. not productive to get into details.
19:02:12 ... what we learned from HIPAA is best we've seen, don't know we'll do better
19:02:42 dan: should strive to do better. agree normative lang to specify a technique is not the way to go. but let's brainstorm
19:02:49 ... HIPAA missed the mark, we can do better
19:02:53 q+
19:03:15 ... let's at least explore even if we don't suggest a particular thing
19:03:49 ack aleecia
19:04:45 aleecia: want to find a nice balance that can suppress while still providing a benefit for privacy and maximize monetary benefit
19:04:49 aleecia: we could just delete all URIs. presumably there are ways the data is useful for industry / profitability - how do we do that more privacy protecting?
19:04:50 BillScannell___ has joined #dntd
19:05:03 matthias: can we delete after 90 or 60 days?
19:05:19 Rob: matthias is looking at me :-)
19:05:25 ... timeline is not a LBH
19:05:39 ... can click Like button to add things to record, but one-off basis
19:05:52 ... not LBH, it's one off, and it's affirmative action from the user
19:06:09 I'm unsure of the value, in this working group and DNT context, of focusing on techniques for ensuring long-term records of data are not correlatable. That is pretty deep science for this group. I think at most we can set objectives, and let the market develop techniques that meet the objectives.
19:06:10 Matthias: affirmative action of user is important, not our concern
19:06:33 rob: if you choose to use a tool, it's out of scope
19:06:44 (bryan, thank you for adding what i didn't capture well enough)
19:07:02 - +1.425.214.aacc
19:07:03 +W3C
19:07:11 dan: not talking about bits and pieces a user affirmatively adds. this is a background thing that happens without the user's knowledge
19:07:26 (speaker phone troubles)
19:07:43 -- resolved.
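
The Q2 suggestions above (throw away the cookie and set a new one on a fixed schedule, do not store the IP address, reduce timestamps to an hour or a day) could be sketched roughly as follows. This is only an illustration of those three ideas, not anything the group agreed on: the 30-day rotation period, the `Cookie` type, and the names `refresh_cookie` and `log_record` are all assumptions made for the example.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed rotation period; the discussion did not settle on a number.
ROTATION = timedelta(days=30)


@dataclass
class Cookie:
    pseudonym: str    # random value, not derived from any stable identifier
    issued: datetime  # when this pseudonym was set


def refresh_cookie(cookie: Optional[Cookie], now: datetime) -> Cookie:
    """Keep a fresh cookie; otherwise issue a new, unrelated random pseudonym.

    No mapping from old to new pseudonym is kept, so records written under
    different pseudonyms cannot be joined through this field.
    """
    if cookie is None or now - cookie.issued >= ROTATION:
        return Cookie(secrets.token_hex(8), now)
    return cookie


def log_record(cookie: Cookie, uri: str, now: datetime) -> dict:
    """Build a log record with an hour-granularity timestamp and no IP field."""
    coarse = now.replace(minute=0, second=0, microsecond=0)
    return {"pseudonym": cookie.pseudonym, "uri": uri, "hour": coarse.isoformat()}
```

As Dan notes, rotation alone is not enough if any other stored field (IP address, precise timestamp) still links the old and new records, which is why the record above carries only three deliberately coarse fields.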
19:07:59 dan: agree that the piece Rob & Matthias are talking about should go into the defn -- not a discrete set of user added items, but something automatic and regularly
19:08:14 (not sure I agree, but there is some line there)
19:08:18 zakim, W3C has Bryan, wseltzer, schunter, Mike_Zaneis, Rachel_Thomas, sherman, Adam_Turkel, Sam_Silberman
19:08:18 +Bryan, wseltzer, schunter, Mike_Zaneis, Rachel_Thomas, sherman, Adam_Turkel, Sam_Silberman; got it
19:08:42 Rob: plugins are short period of time, need for trouble shooting. Not kept more than 90 days.
19:08:49 ... then it's not identifiable form
19:09:10 Mike: attribution, analytics, targeting -- vary from short to long
19:09:25 Matthias: ad networks use for more than a year, is that normal for a campaign?
19:09:30 how long are "relatively short periods"
19:09:36 ... big data?
19:09:50 1 year: seasonal and campaigning.
19:09:57 Mike: varies. Over a year for seasonal campaigns to adjust inventory or for market research
19:10:03 ... need longer than a year
19:10:37 ... interesting discussion we can have is other ways to get the insights for an ad model but less identifiable or shorter retention
19:11:07 q?
19:11:07 ... you get wide ranges of time, and if first party even wider. Carriers will have lots of reasons to keep in identifiable format for longer.
19:11:11 q?
19:11:22 aleecia++
19:11:39 dan: makes sense. Digression to ad world, if anyone there can help me understand
19:12:15 peterswire has joined #dntd
19:12:23 ... for behavioral targeting, can conclude interested in sports apparel. Have a URL then to several buckets, male, 30-40.
19:12:30 ... break into profiles, or use raw URL
19:12:33 q+
19:13:08 rachel: can't speak to URL question, but isn't just you looked at a sports page. History across consumers, and over time, to reach conclusion not related to sports.
19:13:18 I think they use full URL when they do retargeting
19:13:23 ... use of crest means republican, colgate is democrat
19:13:38 q?
19:13:41 ... inferences and correlations, even if not identifiable
19:13:52 q+
19:14:06 I suspect that today, data is just kept to allow later re-mining with new algorithms.
19:14:36 Mike: ways data is currently used, if I were a marketer running super bowl ads, sponsor webpage for it plus TV commercial. 6 months later, want to know if someone came back to your site to get more info
19:14:47 ... would want to measure effectiveness of ad campaign
19:15:02 ... was it worth it to sponsor the site? Want to measure long term.
19:15:21 ... marketers, ad networks, would want to know which creative on that site was more effective.
19:15:40 ... immediate conversion may not be what builds longer-term brand recognition
19:15:42 q+
19:15:52 ack aleecia
19:15:53 ... insights are valuable throughout supply chain
19:16:42 aleecia: we've heard in the past that buckets change
19:17:07 bryan_ has joined #dntd
19:17:32 ... trying to predict/mine the correlations in advance is difficult
19:17:56 q?
19:18:08 ... tradeoffs will vary from company to company; some tech to bridge the gaps
19:18:14 of course, what Aleecia just said assumes that a significant percentage of the market will not be DNT users--which, I'm not sure we can assume, given Microsoft, etc.
19:18:42 aleecia: may be able to get the bulk of the value with buckets rather than URIs, do lose time / money if you need to start data collection from scratch on a new unforeseen topic. Different companies have different costs.
19:18:52 ... may be able to sample from non-DNT users
19:19:05 John: can't we draw the inferences and get rid of the URIs?
19:19:24 Mike: don't disagree. Identifying how data is used.
19:19:30 q+ to mention retargeting
19:19:46 ... What would be impacted if group changed focus to what you're jumping ahead to.
19:20:00 ack johnsimpson
19:20:17 John: think we need to understand that. If end goal is to in fact eliminate URIs, let's think of ways to make inferences necessary
19:20:23 q?
19:20:51 Mike: not just inferences. Cost per impression moves to cost per click or cost per action. Purchase funnel and ads paid for in different way
19:21:03 ... valuable to know how an action came about
19:21:13 ... important to the analytics of the internet
19:21:24 ... could carve out ad delivery and reporting
19:21:49 q+
19:22:01 ... perhaps this new approach on "data hygiene" for URIs -- sometimes URIs are really necessary
19:22:09 ... do we carve them out or can we do better than that?
19:22:18 ... can we find a better balance?
19:22:24 q?
19:22:28 ack next
19:22:47 SamSilberman has joined #dntd
19:23:02 matthias: question on campaign measurement. Not a good use. If you do a superbowl campaign, after 90 days wouldn't be able to know if actions were impressed by this campaign or by another
19:23:02 peterswire has joined #dntd
19:23:14 ... big reaction in first 30 days, maybe long tail, but not so big
19:23:38 q+ to ask sampling?
19:23:40 Mike: good use case, but your point is you don't need it for a life time. There is a shorter effective useful life for that URI in that example
19:23:52 ... for web analytics and attribution, but maybe not for a full year
19:24:04 ... do I want to pay an ad network a year later after they run an ad?
19:24:18 Matthias: use case makes sense, but longevity of browsing history is limited.
19:24:21 ... can cut it.
19:24:37 Dan: another question, let's get into use cases for over 90 days. Maybe we can bracket that.
19:24:42 ack
19:24:44 ack schunter
19:24:48 ack next
19:24:49 vincent, you wanted to mention retargeting
19:25:04 vincent: use case where you need full URL for retargeting need exact URL
19:25:10 ... know which products viewed
19:25:18 dan: need exact URL, or the product?
19:25:26 vincent: not sure
19:25:47 matthias: is seasonal common?
19:26:01 ... if valentine's day, view flowers, will a year later remember me?
19:26:22 Mike: long time period, likely to try mother's day
19:26:29 ... depends on who's doing it
19:26:57 ... is it the website, a 3rd party, how granular do they need -- it varies. Plethora of different business models
19:27:12 Matthias: know of any long-term focus companies?
19:27:32 Mike: tried to limit to 3rd party, easier to answer. If 1st party, answer is yes.
19:27:53 ... small publisher (missed) gets 90% of traffic in November. may re-target in November.
19:28:04 ... ad networks not as much, but 1st parties do
19:28:06 Q+
19:28:07 q+ SamSilberman
19:28:15 Matthias: 1st parties more likely to keep longer than 3rd parties
19:28:27 Mike: yes, more valuable than for 3rd parties
19:28:41 Mike: 3rd Q on low-entropy cookies, please describe?
19:28:48 Dan: different issue
19:29:02 ... keep them separate
19:29:08 ... move to client-side solutions
19:29:21 ... browser stores user info and selectively doles that out to advertisers
19:29:30 ... browser makes decisions about targeting
19:29:39 ... low-entropy cookies is a simple way to do this
19:29:57 ... instead of unique identifier, set a cookie for "sports person" on millions or thousands of users
19:30:08 ... small set of different sorts of cookies, all client side
19:30:16 ... don't need to retain it all on the server side
19:30:26 ... client-side will evolve over the next year or so
19:30:30 q+
19:30:33 q-
19:30:48 mike: my publisher members will *hate* that, but thanks for the description
19:30:53 q- later
19:31:11 ack dan_auerbach_
19:31:13 dan: example of super bowl, 6 months later, value of data drops off
19:31:28 ... assume a visit / impression is less valuable information than a click or an action
19:31:42 ... wondering relative weight of URIs for impression than click or action data
19:32:10 rob: for targeting, we don't do this and Mike just left, we should not lose sight of other use cases
19:32:20 ... might want to know if ad campaigns are performing well
19:32:39 ... might want to know looked at site after campaign ran and was due to that campaign
19:32:50 q?
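
Dan's description of low-entropy cookies above (a small, fixed set of interest labels shared by thousands or millions of users, chosen on the client instead of a unique identifier stored on the server) might look something like the sketch below. The bucket names, the host-to-bucket table, and the function names are hypothetical and chosen only for illustration; nothing here reflects an actual browser or ad-network API.

```python
import math
from collections import Counter
from typing import Iterable

# Hypothetical fixed label set; every user receives one of these values.
BUCKETS = ("sports", "news", "travel", "shopping", "other")

# Hypothetical client-side lookup from visited host to an interest bucket.
HOST_TO_BUCKET = {
    "scores.example": "sports",
    "headlines.example": "news",
    "flights.example": "travel",
    "store.example": "shopping",
}


def bucket_cookie(visited_hosts: Iterable[str]) -> str:
    """Pick the most frequent interest bucket for a locally held host list.

    Only the returned label would be placed in the cookie; the host list
    itself stays on the client.
    """
    counts = Counter(HOST_TO_BUCKET.get(h, "other") for h in visited_hosts)
    if not counts:
        return "other"
    return counts.most_common(1)[0][0]


def max_cookie_entropy_bits() -> float:
    """Upper bound on the information carried by one bucket cookie."""
    return math.log2(len(BUCKETS))
```

With five labels the cookie carries at most about 2.3 bits, far too little to single out one user, which is the property that distinguishes it from a unique-identifier cookie.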
19:32:54 rachel: super bowl is just a moment in time. Not the normal case.
19:33:22 peter: Q about 1st and 3rd party.
19:33:58 ... 3rd party networks have visibility across more sites. Can ask to delete data / portability from 1st parties, moving that way. Harder to do with 3rd parties users haven't seen.
19:34:00 q+
19:34:02 ... seeing as same?
19:34:22 Dan: 3rd parties more likely to not need data as long, either, as 3rd difference
19:34:27 ack peterswire
19:34:49 ack rachel_thomas
19:35:01 Rachel: FB letting you delete is different from Amazon letting you delete purchase history, which you could not do
19:35:23 Peter: transactions and financial, but not need URI details
19:35:32 ... would be purple shirt not the green shirt
19:35:36 q?
19:35:40 q+
19:35:41 q+
19:35:49 q?
19:35:54 sam: long tail and first party issues
19:36:04 ... seasonal business for our customers
19:36:17 ... want to know how they got the customer in the first place
19:36:59 ack SamSilberman
19:37:00 ... how do I acquire new customers
19:37:31 Rachel: if you include browsing history with any identifier, need small business to know XYZ identifier from what source, might be more useful than who the user is.
19:37:36 cross talk
19:37:51 Rachel: still on conversation about what would be necessary for this information?
19:37:54 my regrets, I have to drop off for a while.
19:37:57 q?
19:37:58 -[Mozilla]
19:38:00 ack next
19:38:02 wseltzer_cpdp, you wanted to ask sampling?
19:38:03 sidstamm has left #dntd
19:38:07 ack wseltzer_cpdp
19:38:25 Wendy: hearing some uses, and sampling could be effective. Others where it is not.
19:38:27 q-
19:38:44 ... could sample time slices or user segments.
19:38:57 ... retargeting is specific and sampling does not work
19:39:08 dan: been blurring these.
19:39:23 q?
19:39:26 ... high level statistics v. exact URI, should keep clear
19:39:43 Rob: Peter's question, assuming LBH is across sites over time
19:39:52 Dan: not so clear to me
19:39:58 Rachel jumps in: unclear
19:40:01 ack robsherman
19:40:07 Rob: went to WaPo -
19:40:08 yes yes
19:40:12 Rachel interrupts
19:40:23 Rob: do think it different for reasons Peter describes
19:40:39 Rob: can look at retention or not visit a first party, situation is different
19:40:41 [could be a question to consider: does an LBH across single site pose fewer user concerns than LBH across many sites?]
19:40:45 q+
19:40:45 ... doesn't require DNT
19:40:49 q+
19:41:13 Dan: not disagreeing, but for users, understanding what happens on FB is not always clear
19:41:22 ... may not have clear mental model on FB
19:41:37 ... may not affect DNT discussion though
19:41:39 q?
19:41:39 q?
19:41:47 zakim, close queue
19:41:47 ok, wseltzer, the speaker queue is closed
19:42:05 q?
19:42:11 ack next
19:42:38 john: if LBH, 1st, 3rd, 5th and 6th parties. May have different requirements though for 1st and 3rd party.
19:42:51 ... but LBH involves all the pages you view on a site if it's kept
19:43:02 ... what we do about that is different. But defn is not just x-site
19:43:06 q-
19:43:15 +1
19:43:27 I'd be happy to defn now and add "we may not care about 1st parties"
19:43:34 Rob: contexts are different
19:43:37 q?
19:43:43 zakim, reopen queue
19:43:43 ok, wseltzer, the speaker queue is open
19:43:43 See no reason not to defn...
19:44:02 topic: Understanding use cases for long-term retention
19:44:16 Rob: use cases for long periods. Wendy brought up keeping retarget data as different
19:44:39 Dan: other use cases beyond seasonal to need full URI 1 year later?
19:44:45 q+
19:44:46 ... anyone able to offer those?
19:44:52 ack ra
19:45:05 rachel: IP, fraud detection
19:45:18 ... verify users for IP perspective
19:45:26 ... access to accounts, subscription accounts
19:45:37 [IP as intellectual property]
19:45:57 Dan: worked in fraud detection in industry, but click data is more useful than impression
19:46:12 Rachel: fraud areas not just in delivery and reporting but also for IP
19:46:15 q?
19:46:18 what kind of IP issues?
19:46:20 Dan: would be interested in hearing more
19:46:32 ... can we learn more?
19:46:41 Rachel: will see about finding a resource
19:46:57 Sam: subpoenas for data
19:47:14 ... if court ordered, that's an exception, and you have to retain and produce it.
19:47:25 q+
19:47:26 matthias: that's a reason for keeping less data
19:47:51 Well one of the reasons not to keep data is precisely so it won't be subpoenaed.
19:47:51 ... large enterprises have retention policies to avoid costs of discovery
19:48:02 ?: as policy, don't keep what you don't need
19:48:19 Dan: we all agree you have to produce data if compelled
19:48:29 Mike: they can go on quite a long time
19:48:49 Sam?: fraud can be someone breaking into your system and need proof, that's first person
19:48:58 ... might want to retain that data
19:48:59 q?
19:49:01 s/Mike/Bryan/
19:49:10 Dan: permitted uses, that's interesting
19:49:24 Dan: what's needed over a year?
19:49:24 ack next
19:49:27 q?
19:49:49 Rob: bleeds into permitted uses
19:49:54 q+
19:50:02 ... things folks reasonably want to do beyond short span of time
19:50:15 ... we do a lot of analytical work on FB
19:50:42 ... fake accounts, child predators, don't disclose details
19:50:53 ... not fraud or security but site integrity
19:51:00 Dan: would "abuse" work?
19:51:10 Rob: in general, but don't know how you write that
19:51:15 q+
19:51:31 Bryan: terms of use. Need users to follow them.
19:51:52 Rob: broader than that, policy might not say "no child abusing" but we should deal with it
19:52:10 ?: have same thing
19:52:12 q?
19:52:15 Dan: need to end soon
19:52:16 q?
19:52:31 dan: go through queue then summarize
19:52:31 s/?:/Sam:/
19:52:39 q?
19:52:52 wendy: useful in this exercise, different data needs for different uses
19:53:22 ... might be the case that no one needs URIs plus time stamps plus sites visited, but someone needs URIs but fuzzy time, someone else needs both but for a subset of users for sampling
19:53:42 ... another is URIs and times at suspicion of fraudulent access
19:53:49 +1
19:53:50 ... the more specific we can be,
19:54:04 dan: great idea, and understanding tradeoffs would be great
19:54:10 ack aleecia
19:54:12 ack wseltzer
19:54:28 aleecia: let's write down definition of LBH
19:54:46 ... note we're not currently contemplating 1st parties and 3rd parties doing the same things
19:55:02 ... 3, let's try a strawman on a specific timeframe
19:55:48 Dan: talking about 1st and 3rd parties, not context of data collection
19:56:02 ... not an explicit user action generating the data, FB timeline isn't what we mean
19:56:29 ... collection of info derived from site visits, from same person / device, would be a LBH
19:56:34 q+
19:56:41 ... books looked at on AMZN would be LBH
19:56:56 (that sounds right to me)
19:57:17 Rachel: even if it's not connected to a unique id? If there's no connection, why is there concern?
19:57:31 ... the idea you would suppress something not identifiable expands the world
19:57:42 ... need some ability to be identified
19:58:01 Dan: has to be some sense in which there's knowledge that things are linked
19:58:10 [some anonymized info can easily be re-linked to an individual]
19:58:23 ... if collection of ISBN numbers and it's random, ok
19:58:37 ... if collection all from one person, that makes it a LBH without an identifier
19:58:42 Matthias: can re-identify
19:58:56 ... my browsing history for two months, use schunter.org regularly
19:59:03 ... could make a good guess it's from me
19:59:06 q+
19:59:12 ack ra
19:59:15 ... search histories have identifying terms
19:59:19 ... can make good guesses
19:59:27 rachel: if in buckets?
19:59:35 matthias: that would be ok 19:59:44 ... can you re-associate is the question 19:59:57 ... if "went to FB, GOOG," that could be ok 20:00:06 peterswire has joined #dntd 20:00:07 q? 20:00:18 rachel: important because (sorry, missed - please fill in) 20:00:30 matthias: if browsing history is shared, k-anon, typical 20:00:42 rachel: how do we get that in the defn? 20:00:45 q+ 20:01:00 matthias: in LBH, can only do top 10 sites :-) 20:01:14 rob: google.com/dansinbox is identifying 20:01:23 ... bucketing to google.com might be reasonable 20:01:24 s/rob/dan_auerbach 20:01:35 ... smaller sites into sports sites might be useful 20:01:38 (thanks!) 20:01:50 Dan: k-anon has no ambiguity 20:01:55 q? 20:01:57 ... can navigate those waters 20:02:01 q? 20:02:15 rob: being careful that we're not conflating LBH and de-id'ed data 20:02:18 ... in theory ... (afaik the def. contains "background knowledge" of the adversary) 20:02:19 ... two concepts 20:02:19 ack rob 20:02:55 rob: in example, amazon could say "here's the list of all the books a person looked at" not sensitive but valuable 20:03:13 ... different from "and I can tell Matthias is the person who looked at them" 20:03:18 ... no privacy problem 20:03:33 ... get worried if one of those books is "Matthias' web mailer" it's linkable. 20:03:44 Bryan?: what is an individual 20:03:52 ... if not tied to a person, not indivisual 20:04:04 ... not a history, just a record, if it's not tied 20:04:17 dan: hear you, important to keep LBH separate 20:04:19 q? 20:04:24 ... more on this tomorrow with Ed 20:04:31 ... papers on re-id 20:04:53 ... you have databases and can re-id 20:05:08 ... don't need to answer that now, just what is a LBH 20:05:33 q+ 20:06:08 aleecia: Papers, Netflix contest (Narayan & Shmatikov) Anonymized users can be id'd by reference to another database, and you dojn' thave control over others' databases 20:06:41 ... 
k-anonymity and buckets, ways of thinking about the long tail of re-identifiable data 20:06:53 q- 20:07:03 q+ 20:07:09 ack aleecia 20:07:11 q? 20:07:19 ... we don't have to solve it here, can set aside with "if you have an unlinkable data-set" 20:07:31 matthias: shorter histories -> easier k-anon 20:07:49 ... 4 days of sites, not full URLs, then many users will be the same 20:08:03 ... the longer, the more difficult to get k-anon 20:08:16 ... month-long history is not as likely as possible with full URLs 20:08:19 Dan: agree 20:08:45 matthias: longer the history, the more difficult the k-anon. the more data, the less likely users have the same profile 20:08:49 Dan: agree on that too 20:09:00 ("just agree" and "disagree" sound similar :-) 20:09:15 q+ to mention that the potential that unlinked data is somehow made linkable later is real, but should not impact the compliance of who recorded the unlinked data, instead it's the fault of the party that relinked the data. Thus a record of unlinked data does not represent a long-term browsing history. 20:09:20 Dan: if no timestamp, easier too. fewer fields -> easier to have de-linked data set 20:09:22 ack schu 20:09:23 ack schunter 20:09:28 ack bryan 20:09:28 bryan, you wanted to mention that the potential that unlinked data is somehow made linkable later is real, but should not impact the compliance of who recorded the unlinked data, 20:09:32 ... instead it's the fault of the party that relinked the data. Thus a record of unlinked data does not represent a long-term browsing history. 20:10:05 bryan: unlinked data that was recorded but later turns out to be linkable, that data as recorded doesn't represent a browsing history 20:10:22 ... that some future party can resurrect it doesn't make it a browsing history 20:10:28 -1 20:10:47 bryan: if there is a fault in this, it is the fault of the person who does the resurrection 20:11:08 ... if there's no link to an individual, it doesn't represent a browsing history 20:11:21 ... 
if in the future it's not the fault of the recording company 20:11:32 ... the only response if you disagree is not to record anything 20:11:41 matthias: netflix example is nice 20:11:50 bryan: get it's possible to re-link 20:12:33 ... but if you've done everything to the state of tech today, you've fulfilled expectations. If addition of other data that's put together and the user didn't authorize it, then it's the party who resurrected that data 20:12:41 Dan: grey area but need to end in 5 minutes 20:13:31 ... don't quite share that view. What if dataset was linked to Mr. Man, and he did bad things, and works at EFF, can get down to a few people. Then all it takes is one fact not in the db to identify it 20:13:46 Different argument: being able to relink *is* a current known threat, Bryan 20:13:47 q+ 20:13:55 We know this is real. We should account for it. 20:14:01 q- 20:14:05 Rachel: fault is not helpful yet 20:14:23 ... that would include identifiers. 20:14:43 Rachel: there is the possibility of a browser history that is not identifiable 20:15:06 Bryan: if no identifier, it's not linked, it's not a browsing history. Period. 20:15:13 (full disagreement from me) 20:15:25 Dan: we'll get back to defn 20:15:50 ... Bryan disagreeing on defn in that, don't think a set of ISBN numbers from a specific user are an LBH 20:15:58 ... don't know which user, just that it's one user 20:16:05 ... just a list of movies a person watched 20:16:20 Rob: don't think that's consensus for that. not an LBH 20:16:41 Dan: hearing no consensus there. But is automatic collection of data, rather than affirmative user choice 20:16:49 ... has to be retained, haven't picked a time limit 20:17:01 ... do we want to say a month as a working limit? 20:17:08 ... just as a defn 20:17:08 no. a day 20:17:08 +1 I agree that "fault" was not the intent of my point, but that the party that saves an unlinked (to a person) set of related browsing records is not recording an individual's browsing history.
20:17:19 Rob: 6 weeks, 90 days, 365 20:17:27 ... 30 days too short 20:17:30 i'm serious 20:17:35 I can live with 6 weeks 20:17:55 Mike: ad campaign for 1 month at least 20:18:01 ... and months are longer than 30 days 20:18:07 ... need to batch & process data 20:18:10 ... 30 doesn't work 20:18:34 Dan: not sure we want to link this to retention for de-id 20:18:44 Can we agree under 3 months? 20:18:52 Where we still debate, but under 3 months? 20:19:09 Dan: 1st parties may want to keep things longer, use cases for 3rd parties too 20:19:25 Matthias: 3rd parties less likely to need long-term retention 20:19:42 Mike: agree, but marketer may find more useful for longer 20:19:57 ... most ad networks won't use it for really long, but marketer may 20:20:09 Rachel: can use de-id'ed but need inferences 20:20:26 ?: seasonal is a common thing 20:20:30 Dan: anything else? 20:20:36 when is next session? 20:20:46 thanks, Dan 20:20:49 main room when? 20:20:59 thanks Wendy! 20:21:01 3:45pm main room 20:21:05 -W3C 20:21:05 thanks 20:21:06 - +1.415.920.aaaa 20:21:07 -johnsimpson 20:21:10 -vincent 20:21:10 vincent has left #dntd 20:21:12 -aleecia 20:21:13 johnsimpson has left #dntd 20:21:28 thanks, all! 20:21:52 -BerinSzoka 20:21:54 Team_(dntd)18:30Z has ended 20:21:54 Attendees were BerinSzoka, vincent, +1.415.920.aaaa, johnsimpson, bryan, +1.650.723.aabb, aleecia, sidstamm, +1.425.214.aacc, wseltzer, schunter, Mike_Zaneis, Rachel_Thomas, 20:21:54 ... sherman, Adam_Turkel, Sam_Silberman 20:37:24 rrsagent, draft minutes 20:37:24 I have made the request to generate http://www.w3.org/2013/02/11-dntd-minutes.html fielding 20:56:01 peterswire has joined #dntd 21:03:52 bryan has left #dntd 21:10:01 q? 21:16:49 schunter has joined #DNTD 21:28:05 q? 21:33:15 schunter has joined #DNTD 21:38:39 rrsagent, draft minutes 21:38:39 I have made the request to generate http://www.w3.org/2013/02/11-dntd-minutes.html tlr 21:39:36 q? 
21:40:03 peter, wrong channel 21:54:27 peterswire has joined #dntd 22:04:56 schunter has joined #DNTD 23:44:25 Zakim has left #dntd