11 Feb 2013

See also: IRC log


Present: vinay, Jonathan_Mayer, hefferjr, kulick, Joanne, Fielding, +1.617.253.aaaa


<jmayer> BRB

<kulick> i am hearing nothing on the phone

<kulick> okay... just making sure... thx

<Joanne_> thanks Nick

<fielding> now being heard

<vinay> yep

<dwainberg> hello

<Joanne_> hello

can someone scribe this?

<fielding> npdoty, can you type the questions in irc when you get a chance?

“Lifetime browsing history” is a phrase that is often used, but never defined clearly. What would LBH mean as a technical matter?

In light of this definition, what technical measures would suppress or delete LBH?

<kulick> Is someone breathing heavily into their microphone? Could everyone please check if they are? It is making it difficult to hear on the phone.

<jmayer> back

<scribe> scribenick: npdoty

endpoints vs. ISPs, browsers, or browser plugins

dwainberg: what is the concern?

jmayer: do we have ambiguity on that point?

<vinay> Are we being tasked to define a hypothetical scenario?

jmayer: the URLs that a user has visited

<fielding> "URLs the user has visited"? Does that include third-party URLs? Does it include single-site knowledge vs cross-site knowledge?

jmayer: difference between URLs on a single site, and URLs across multiple sites

dwainberg: kind of new to focus on browsing history

<jmayer> I don't think there's anything new here. The EFF/Mozilla/Stanford proposal focuses extensively on linkability of user activity.

<fielding> current discussion on phone is not relevant to this discussion

npdoty: a definition could be: "list of URLs a user visited on multiple sites"

<fielding> so, LBH is collection of URIs visited over time beyond the scope of a single first party?

<peterswire> #dntc

marc: but for a large first party (AOL owns many different publications), that party might know URLs I've visited on Huffington Post and other publications

jmayer: we've made progress on first vs. third party, can we agree with that?

room: yeah

paul: is the harm the transfer to a third party?

<fielding> assume there is no harm and solve the technical issue for the sake of not having to meet forever

jmayer: had the questions on harm already

npd: repeat of our high-level questions

<vinay> wouldn't LBH mean = the collection of all URLs the user visits (spanning all sites).

<fielding> let's assume lifetime == more than the current browsing session and less than browser product lifetime

marc: retention policies, and minimization policies

<kulick> agree with vinay

dwainberg: different amounts of time vs different breadths of sites -- not sure there's a quantitative limit
... domain vs. full URI; retained in a linkable or unlinked form

npd: could we accomplish business cases with just the domain?

<fielding> The parts of the URI that need to be retained depend on who is doing the collecting.

npd: minimizing by reducing the URL to just the domain (rather than path or parameters) could help with privacy concerns
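
A minimal sketch of the domain-only minimization npd raises, assuming Python's standard `urllib.parse`. The function names `reduce_to_domain` and `reduce_to_domain_and_path` are hypothetical; the group did not specify an implementation.

```python
from urllib.parse import urlsplit


def reduce_to_domain(url: str) -> str:
    """Keep only the lowercased host; drop scheme, port, path, query and fragment."""
    host = urlsplit(url).hostname  # None when the URL has no network location
    if host is None:
        raise ValueError(f"no host in URL: {url!r}")
    return host


def reduce_to_domain_and_path(url: str) -> str:
    """Intermediate level: keep host and path, drop query parameters and fragment."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    return parts.hostname + parts.path
```

Which level is enough depends on who is collecting and for what purpose, as fielding notes above.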

dwainberg: limit to a legitimate business purpose (not disclosed publicly but to an auditor)

<fielding> npdoty, I'd be shocked if folks who think "what you read" is private would be willing to accept domains as "private enough".

ronan: want time limits on retention in addition to amount of data collected

dwainberg: but that could fix the maximum too high

fielding: most concerns are about domain viewing, rather than page viewing
... technical: not save the data
... 2) cryptographically hash the data
... ... strong enough to not be easily broken
... ... save categories/buckets associated with a URL, rather than the URL itself
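
The category/bucket idea fielding lists could look roughly like the following sketch. The lookup table `CATEGORY_BY_HOST` and its example hosts are invented for illustration; a real collector would use its own taxonomy.

```python
from typing import Set
from urllib.parse import urlsplit

# Hypothetical mapping from host to interest bucket (illustrative values only).
CATEGORY_BY_HOST = {
    "cars.example": "autos",
    "recipes.example": "food",
}


def categorize(url: str) -> str:
    """Map a URL to a coarse interest bucket based on its host."""
    host = urlsplit(url).hostname or ""
    return CATEGORY_BY_HOST.get(host, "uncategorized")


def record_visit(profile: Set[str], url: str) -> None:
    """Store only the bucket in the profile; the URL itself is not retained."""
    profile.add(categorize(url))
```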

dwainberg: for many businesses, it's true, but ability to target depends on the time collected
... converting URLs into interest categories can be done in a very short time
... reporting might need the domain or path for a longer period of time

<fielding> vinay: some ad reporting requires proof of the negative -- that a given ad did not appear next to a competitor or on a "bad" site

<dwainberg> (adding some notes, to be sure it's clear)

<dwainberg> For targeting purposes, most 3rd party biz models have limited need for full URI to be retained -- some 2 secs, some 2 days, some not much longer. For targeting only, there is not a long term need for the URIs.

<dwainberg> However, for measurement, billing, etc, there is a longer term need to retain URI, or at least domain information.

npdoty: could retain full URI but not retain a user identifier

ronan: frequency capping might be a case that requires user identifier

dwainberg: attribution or conversion tracking
... what is the harm? if it's data breach, then that requires a different set of solutions

<fielding> it sounds like what we are saying is that the mechanical means to suppress LBH will have to differ based on the purpose and timeframe of permitted use

npdoty: concerns identified have been multiple: data breach, government access, malicious use, or just the presence/retention: why does this site know that about me? (trying to give a very brief summary)

<fielding> Another way to look at it … one can disassociate LBH by either 1) reduce data collected about BH; or, 2) remove association of BH with the user/agent/device

+1 to fielding

Are there any compelling use cases for retaining detailed browsing history beyond a general time limit on retention?

If so, how would you limit those use cases consistent with the goals of: (1) limiting LBH; while (2) enabling “buckets” or “low-entropy cookies”?

defining browsing history: URLs (including domain, path, parameters) across multiple sites beyond a session (or request?)

<fielding> Leave it as a question: what would the user find as an acceptable lifetime for their BH? Browsers keep 14 days.

fielding: common default configuration of a browser is keeping history data for 14 days

ronan: but cache could potentially be a lot longer

dwainberg: we should be vague about the length of time

ronan: history might also refer to the content

npdoty: similarly sensitive profile in the ads that I've seen, not just the articles I've read online

<fielding> I am not following the freq capping use case -- it does not mean that you keep a list of every ad seen

ronan: need to keep a list / history of all the ads a user has seen

dwainberg: data isn't kept in a single list, would have to join multiple tables

npdoty: does that make a distinction for a user concern?

dwainberg: less likely for an attacker to breach multiple tables/databases at the same time
... reduce the concern if you have good internal operational controls

<Joanne_> its 3:15

paul: "lifetime browsing history" a scarier term than our scoped definition
... correctly captured the two general techniques (reducing data, or de-identifying)
... users using multiple devices
... most third parties don't have a Web-wide breadth
... can draw a bright line, but depends on purposes

<fielding> suppressed LBH

<fielding> it isn't quite the same as de-identification since there is still some potential of identifying within the context (e.g., user sends their own name in submit)

third dimension of time: keeping a history for only a minute or only an hour might satisfy suppressing lifetime browsing history

regarding technical measures, can suppress on any of those three

<fielding> hash by site, hash by campaign, hash with limited lifetime salt
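
One way to read fielding's three hashing options is sketched below, assuming a keyed hash (HMAC-SHA-256 from Python's standard library) is an acceptable stand-in for "cryptographically hash". The function names, the scope strings, and the time-window scheme are assumptions for illustration.

```python
import hashlib
import hmac


def scoped_id(secret: bytes, user_id: str, scope: str) -> str:
    """Keyed hash of user_id bound to a scope such as a site or a campaign.

    The same user gets unrelated values in different scopes, so records
    cannot be joined across scopes without the secret.
    """
    msg = f"{scope}\x00{user_id}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def rotating_id(secret: bytes, user_id: str, epoch_seconds: int,
                lifetime_seconds: int) -> str:
    """Keyed hash salted with the index of the current time window."""
    window = epoch_seconds // lifetime_seconds
    return scoped_id(secret, user_id, f"window:{window}")
```

In this sketch whoever holds `secret` can still recompute identifiers for past windows. A deployment that wants old windows to become unlinkable would presumably need to generate a fresh random salt per window and discard it afterwards.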

deleting data (addressing time); reducing specificity of data (so that you have less than "domain"); removing association to a user (either "de-identified" or aggregation)

dwainberg: use cases -- targeting is easier under suppressing history than financial reporting

fielding: can hash a user to a campaign rather than having a full list of ads seen by a user?

<fielding> more expensive to process identifier when hashed by campaign, but the process is trivially parallel (meaning it can be done at scale)

ronan: more computationally expensive to do so, though
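
The exchange between fielding and ronan on frequency capping can be illustrated with a small sketch: a counter keyed by a per-campaign hash of the user, rather than a list of every ad seen. The class `FrequencyCap`, the key `SECRET`, and the in-memory `Counter` are hypothetical choices, not something proposed in the meeting.

```python
import hashlib
import hmac
from collections import Counter

SECRET = b"hypothetical-server-side-key"  # placeholder value for illustration


def campaign_key(user_id: str, campaign: str) -> str:
    """Hash the user into a campaign-specific key that is unlinkable across campaigns."""
    msg = f"{campaign}\x00{user_id}".encode("utf-8")
    return hmac.new(SECRET, msg, hashlib.sha256).hexdigest()


class FrequencyCap:
    """Counts impressions per (user, campaign) hash; stores no list of ads or URLs."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self.counts = Counter()

    def should_show(self, user_id: str, campaign: str) -> bool:
        key = campaign_key(user_id, campaign)
        if self.counts[key] >= self.cap:
            return False
        self.counts[key] += 1
        return True
```

The extra cost ronan mentions shows up as one hash computation per impression. Each lookup is independent of the others, which is the sense in which fielding calls the work trivially parallel.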

dimensions: data -> full URI; domain & path; domain; extracted category data

association -> uid; de-id*; aggregate

time -> [continuous]

<fielding> okay

<Joanne_> thanks

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2013-02-11 21:39:23 $
