See also: IRC log
<jmayer> BRB
<kulick> i am hearing nothing on the phone
<kulick> okay... just making sure... thx
<Joanne_> thanks Nick
<fielding> now being heard
<vinay> yep
<dwainberg> hello
<Joanne_> hello
can someone scribe this?
<fielding> npdoty, can you type the questions in irc when you get a chance?
“Lifetime browsing history” is a phrase that is often used, but never defined clearly. What would LBH mean as a technical matter?
In light of this definition, what technical measures would suppress or delete LBH?
<kulick> Is someone breathing heavily into their microphone? Could everyone please check if they are? It is making it difficult to hear on the phone.
<jmayer> back
<scribe> scribenick: npdoty
endpoints vs. ISPs or browsers or browser plugins
dwainberg: what is the concern?
jmayer: do we have ambiguity on that point?
<vinay> Are we being tasked to define a hypothetical scenario?
jmayer: the URLs that a user has visited
<fielding> "URLs the user has visited"? Does that include third-party URLs? Does it include single-site knowledge vs cross-site knowledge?
jmayer: difference between URLs on a single site, and URLs across multiple sites
dwainberg: kind of new to focus on browsing history
<jmayer> I don't think there's anything new here. The EFF/Mozilla/Stanford proposal focuses extensively on linkability of user activity.
<fielding> current discussion on phone is not relevant to this discussion
npdoty: a definition could be: "list of URLs a user visited on multiple sites"
<fielding> so, LBH is collection of URIs visited over time beyond the scope of a single first party?
<peterswire> #dntc
marc: but for a large first party (AOL owns many different publications), that party might know URLs I've visited on Huffington Post and other publications
jmayer: we've made progress on first vs. third party, can we agree with that?
room: yeah
paul: is the harm the transfer to a third party?
<fielding> assume there is no harm and solve the technical issue for the sake of not having to meet forever
jmayer: had the questions on harm already
npd: repeat of our high-level questions
<vinay> wouldn't LBH mean = the collection of all URLs the user visits (spanning all sites).
<fielding> let's assume lifetime == more than the current browsing session and less than browser product lifetime
marc: retention policies, and minimization policies
<kulick> agree with vinay
dwainberg: different amounts of time vs different breadths of sites -- not sure there's a quantitative limit
... domain vs. full URI; retained in a linkable or unlinked form
npd: could we accomplish business cases with just the domain?
<fielding> The parts of the URI that are needed to retain depends on who is doing the collecting.
npd: minimization by reducing to just the domain (rather than path or parameters) could help with privacy concerns
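Illustrative sketch (not from the meeting): one way the "reduce to just the domain" minimization could look in practice. The function name `minimize_url` and the level names are assumptions made for this example only; no participant proposed this code.

```python
from urllib.parse import urlsplit


def minimize_url(url: str, level: str = "domain") -> str:
    """Reduce a URL to the requested level of detail (hypothetical helper).

    level "full"        -> the URL unchanged (domain, path, parameters)
    level "domain_path" -> host plus path, query parameters dropped
    level "domain"      -> host only
    """
    parts = urlsplit(url)
    host = parts.hostname or ""  # hostname is lower-cased by urlsplit
    if level == "full":
        return url
    if level == "domain_path":
        return host + parts.path
    if level == "domain":
        return host
    raise ValueError(f"unknown level: {level}")
```

Whether the domain alone is "private enough" is a policy question the group left open; the sketch only shows that the reduction itself is mechanically simple.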
dwainberg: limit to a legitimate business purpose (not disclosed publicly but to an auditor)
<fielding> npdoty, I'd be shocked if folks who think "what you read" is private would be willing to accept domains as "private enough".
ronan: want time limits on retention in addition to amount of data collected
dwainberg: but that could fix the maximum too high
fielding: most concerns are about domain viewing, rather than page viewing
... technical: 1) not save the data
... 2) cryptographically hash the data
... ... strong enough to not be easily broken
... ... save categories/buckets associated with a URL, rather than the URL itself
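Illustrative sketch (not from the meeting): saving a category or bucket in place of the URL. The lookup table `CATEGORY_BY_DOMAIN`, its entries, and both function names are hypothetical; a real classifier would be far richer. The point is only that the URL is discarded once the bucket is derived.

```python
from urllib.parse import urlsplit

# Hypothetical domain-to-category table, invented for this example.
CATEGORY_BY_DOMAIN = {
    "sports.example": "sports",
    "recipes.example": "cooking",
}


def categorize(url: str) -> str:
    """Map a URL to a coarse interest bucket; unknown hosts share one bucket."""
    host = urlsplit(url).hostname or ""
    return CATEGORY_BY_DOMAIN.get(host, "uncategorized")


def record_visit(profile: set, url: str) -> None:
    """Store only the bucket; the URL itself is never written to the profile."""
    profile.add(categorize(url))
```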
dwainberg: for many businesses, it's true, but ability to target depends on the time collected
... converting URLs into interest categories can be done in a very short time
... reporting might need the domain or path for a longer period of time
<fielding> vinay: some ad reporting requires proof of the negative -- that a given ad did not appear next to a competitor or on a "bad" site
<dwainberg> (adding some notes, to be sure it's clear)
<dwainberg> For targeting purposes, most 3rd party biz models, have limited need for full URI to be retained -- some 2 secs, some 2 days, some not much longer. For targeting only, there is not a long term need for the URIs.
<dwainberg> However, for measurement, billing, etc, there is a longer term need to retain URI, or at least domain information.
npdoty: could retain full URI but not retain a user identifier
ronan: frequency capping might be a case that requires user identifier
dwainberg: attribution or conversion tracking
... what is the harm? if it's data breach, then that requires a different set of solutions
<fielding> it sounds like what we are saying is that the mechanical means to suppress LBH will have to differ based on the purpose and timeframe of permitted use
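Illustrative sketch (not from the meeting): retention that differs by purpose, as the remark above suggests. The record layout and the `purge` function are assumptions. The 2-day targeting window echoes one figure mentioned in the discussion; the 90-day reporting window is purely hypothetical, since no number was given for measurement or billing.

```python
DAY = 24 * 3600

# Seconds each purpose may keep a record. The reporting value is invented.
RETENTION_SECONDS = {
    "targeting": 2 * DAY,
    "reporting": 90 * DAY,
}


def purge(records: list, now: float) -> list:
    """Return only the records still inside their purpose's retention window.

    Each record is assumed to be a dict with "purpose" and "timestamp" keys.
    """
    return [
        r for r in records
        if now - r["timestamp"] <= RETENTION_SECONDS[r["purpose"]]
    ]
```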
npdoty: concerns identified have been multiple: data breach, government access, malicious use, or just the presence/retention: why does this site know that about me? (trying to give a very brief summary)
<fielding> Another way to look at it … one can disassociate LBH by either 1) reduce data collected about BH; or, 2) remove association of BH with the user/agent/device
+1 to fielding
Are there any compelling use cases for retaining detailed browsing history beyond a general time limit on retention?
If so, how would you limit those use cases consistent with the goals of: (1) limiting LBH; while (2) enabling “buckets” or “low-entropy cookies”?
defining browsing history: URLs (including domain, path, parameters) across multiple sites beyond a session (or request?)
<fielding> Leave it as a question: what would the user find as an acceptable lifetime for their BH? Browsers keep 14 days.
fielding: common default configuration of a browser is keeping history data for 14 days
ronan: but cache could potentially be a lot longer
dwainberg: we should be vague about the length of time
ronan: history might also refer to the content
npdoty: similarly sensitive profile in the ads that I've seen, not just the articles I've read online
<fielding> I am not following the freq capping use case -- it does not mean that you keep a list of every ad seen
ronan: need to keep a list / history of all the ads he had seen
dwainberg: data isn't kept in a single list, would have to join multiple tables
npdoty: does that make a distinction for a user concern?
dwainberg: less likely for an attacker to breach multiple tables/databases at the same time
... reduce the concern if you have good internal operational controls
<Joanne_> its 3:15
paul: "lifetime browsing history" is a scarier term than our scoped definition
... correctly captured the two general techniques (reducing data, or de-identifying)
... users using multiple devices
... most third parties don't have a Web-wide breadth
... can draw a bright line, but depends on purposes
<fielding> suppressed LBH
<fielding> it isn't quite the same as de-identification since there is still some potential of identifying within the context (e.g., user sends their own name in submit)
third dimension of time: keeping a history for only a minute or only an hour might satisfy suppressing lifetime browsing history
regarding technical measures, can suppress on any of those three
<fielding> hash by site, hash by campaign, hash with limited lifetime salt
deleting data (addressing time); reducing specificity of data (so that you have less than "domain"); removing association to a user (either "de-identified" or aggregation)
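Illustrative sketch (not from the meeting): one reading of "hash by site, hash by campaign, hash with limited lifetime salt". A keyed hash (HMAC-SHA256) is scoped to a site or campaign and to a time window, so identifiers from different scopes or different windows cannot be joined directly. The function name `pseudonym`, its parameters, and the one-day default window are all assumptions; the participants did not specify an algorithm.

```python
import hashlib
import hmac


def pseudonym(user_id: str, scope: str, secret: bytes,
              now: float, salt_lifetime_s: int = 86400) -> str:
    """Derive a scoped, time-limited identifier (hypothetical scheme).

    scope           -- a site or campaign name; different scopes give
                       unrelated identifiers for the same user
    salt_lifetime_s -- length of the window after which the identifier
                       changes, which limits how long history can be linked
    """
    window = int(now // salt_lifetime_s)  # index of the current time window
    message = f"{scope}|{window}|{user_id}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()
```

The linkage is only as limited as the handling of `secret`: anyone holding it can recompute identifiers for a known user, so this reduces association rather than removing it.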
dwainberg: use cases -- targeting is easier under suppressing history than financial reporting
fielding: can hash a user to a campaign rather than having a full list of ads seen by a user?
<fielding> more expensive to process identifier when hashed by campaign, but the process is trivially parallel (meaning it can be done at scale)
ronan: more computationally expensive to do so, though
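Illustrative sketch (not from the meeting): frequency capping that stores a counter per hashed (user, campaign) pair instead of a list of every ad a user has seen. The class `FrequencyCap` and its methods are hypothetical, and the in-memory counter stands in for whatever store a real ad system would use.

```python
import hashlib
import hmac
from collections import Counter


class FrequencyCap:
    """Count impressions per hashed (campaign, user) key (hypothetical design)."""

    def __init__(self, secret: bytes, max_impressions: int):
        self._secret = secret
        self._max = max_impressions
        self._counts = Counter()

    def _key(self, user_id: str, campaign_id: str) -> str:
        # The key differs per campaign, so counters for one user are not
        # directly joinable across campaigns without the secret.
        message = f"{campaign_id}|{user_id}".encode()
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def should_show(self, user_id: str, campaign_id: str) -> bool:
        """Return True and count the impression if the cap is not yet reached."""
        key = self._key(user_id, campaign_id)
        if self._counts[key] >= self._max:
            return False
        self._counts[key] += 1
        return True
```

Each lookup costs one extra hash computation, which matches the point raised above about added processing cost; the lookups are independent of one another, so they can run in parallel.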
dimensions: data -> full URI; domain & path; domain; extracted category data
association -> uid; de-id*; aggregate
time -> [continuous]
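Illustrative sketch (not from the meeting): the three dimensions above written as a small data model. All type and function names are assumptions, and the ordering of the enum values (higher means less revealing) is this example's reading of the order in which the scribe listed the options.

```python
from dataclasses import dataclass
from enum import IntEnum


class DataDetail(IntEnum):
    FULL_URI = 1
    DOMAIN_AND_PATH = 2
    DOMAIN = 3
    CATEGORY = 4  # extracted category data


class Association(IntEnum):
    USER_ID = 1
    DE_IDENTIFIED = 2
    AGGREGATE = 3


@dataclass(frozen=True)
class RetentionPolicy:
    detail: DataDetail
    association: Association
    max_age_seconds: float  # the continuous time dimension


def no_more_revealing(a: RetentionPolicy, b: RetentionPolicy) -> bool:
    """True if policy a is at least as suppressed as b on all three dimensions."""
    return (a.detail >= b.detail
            and a.association >= b.association
            and a.max_age_seconds <= b.max_age_seconds)
```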
<fielding> okay
<Joanne_> thanks
Present: vinay, Jonathan_Mayer, hefferjr, kulick, Joanne, Fielding, +1.617.253.aaaa
Scribe: npdoty
Date (from IRC log name): 11 Feb 2013
Minutes URL (guessed by scribe.perl): http://www.w3.org/2013/02/11-dntb-minutes.html