Privacy/DNT-Breakouts

From W3C Wiki

Main room

Star Conference Room, D463

IRC: #dnt

Phone: +1.617.761.6200, conference code 87225

Breakout rooms

Group A

IRC: #dnta

Phone: +1.617.761.6200, conference code 26631

Meeting room: G451

Last Names: A-D

Section Leader: Justin Brookman

rough minutes

Section Leader Day 2: Justin Brookman

day 2 minutes

Group B

IRC: #dntb

Phone: +1.617.761.6200, conference code 26632

Meeting room: Danny's office, 5th floor, W3C Offices

Last Names: E-L

Section Leader: Nick Doty

rough minutes

Section Leader Day 2: Frank Wagner / Nick Doty

day 2 minutes

Group C

IRC: #dntc

Phone: +1.617.761.6200, conference code 26633

Meeting room: G631

Last Names: M-R

Section Leader: David Singer

rough minutes

Section Leaders Day 2: Ed Felten / Thomas Roessler

day 2 minutes

Group D

IRC: #dntd

Phone: +1.617.761.6200, conference code 26634

Meeting room: Jeff's office, 5th floor, W3C Offices

Last Names: S-V

Section Leaders: Dan Auerbach / Wendy Seltzer

rough minutes

Section Leader Day 2: Yianni Lagos

day 2 minutes

Group E

IRC: #dnte

Phone: +1.617.761.6200, conference code 87225

Meeting room: Star Conference Room, D463

Last Names: W-Z

Section Leaders: Heather West / Thomas Roessler

rough minutes

Section Leader Day 2: Heather West

day 2 minutes

Tuesday break-out session

1. What term should be used to describe what is out-of-scope for DNT? “De-identified”, “unlinkable”, some other?

2. The FTC definition of de-identified is reproduced below. Are there any changes from it that should become the normative text for DNT on this topic?

3. What are some examples of technical measures that clearly ARE or ARE NOT strong enough to meet the de-identification standard?

4. When, if ever, should pseudonyms be permitted for information held in de-identified form? Is that the same as asking when a unique or persistent identifier should be permitted?


FTC Language that matches closely with bare bones:

data is not “reasonably linkable” to the extent that a company: (1) takes reasonable measures to ensure that the data is de-identified;

(2) publicly commits not to try to reidentify the data; and

(3) contractually prohibits downstream recipients from trying to re-identify the data.

Commission's definition of "de-identified":

"First, the company must take reasonable measures to ensure that the data is de-identified. This means that the company must achieve a reasonable level of justified confidence that the data cannot reasonably be used to infer information about, or otherwise be linked to, a particular consumer, computer, or other device."


From bare bones circulated to group:

3.6.1 Option 1: Unlinkable in Ordinary Course of Business

A party renders a dataset unlinkable when it: 1. takes [commercially] reasonable steps to de-identify data such that there is a high probability that it contains only information which could not be [reasonably] linked to a specific user, user agent, or device [in a production environment]; 2. publicly commits to retain and use the data in unlinkable fashion, and not to attempt to re-identify the data; and 3. contractually prohibits any third party that it transmits the unlinkable data to from attempting to re-identify the data.

Parties should provide transparency into their delinking process (to the extent that doing so will not reveal confidential details of security practices) so that external experts and auditors can assess whether the steps are reasonable given the particular data set.

3.6.2 Option 2: Unlinkable Data

A dataset is unlinkable when there is a high probability that it contains only information that could not be linked to a particular user, user agent, or device [by a skilled analyst]. A party renders a dataset unlinkable when either: 1. it publishes information that is sufficiently detailed for a skilled analyst to evaluate the implementation, or 2. it ensures that the dataset is at least 1024-unlinkable.
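The text above does not define "1024-unlinkable". As a rough sketch only, the following Python assumes a reading analogous to k-anonymity: a dataset is k-unlinkable when every distinct combination of retained attribute values is shared by at least k records. The function name `is_k_unlinkable` and the example attributes are hypothetical and are not taken from the bare bones text.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def is_k_unlinkable(records: Iterable[Sequence[Hashable]], k: int = 1024) -> bool:
    """Check an ASSUMED k-anonymity-style reading of "k-unlinkable".

    Each record is a sequence of retained attribute values. The dataset
    passes when every distinct record value appears at least k times,
    so no record can be narrowed down to fewer than k candidates.
    An empty dataset passes vacuously.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = Counter(tuple(record) for record in records)
    return all(count >= k for count in counts.values())


# Hypothetical example: two attribute combinations, 1024 records each.
dataset = [("US", "desktop")] * 1024 + [("DE", "mobile")] * 1024
print(is_k_unlinkable(dataset))  # True under the assumed definition

# A single record with a unique combination breaks the property.
print(is_k_unlinkable(dataset + [("FR", "tablet")]))  # False
```

Under this reading, the threshold is checked per group of identical records, so a single rare combination is enough to fail the test even in a very large dataset.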

Monday break-out session

High-level questions for group leaders:

1. “Lifetime browsing history” is a phrase that is often used, but never defined clearly. What would LBH mean as a technical matter?

2. In light of this definition, what technical measures would suppress or delete LBH?

3. Tying LBH to the previous group discussions of “buckets” or “low-entropy cookies,” how can the latter continue while suppressing or deleting LBH?

4. Are there any compelling use cases for retaining detailed browsing history beyond a general time limit on retention?

5. If so, how would you limit those use cases consistent with the goals of: (1) limiting LBH; while (2) enabling “buckets” or “low-entropy cookies”?


Background and more detailed set of questions for group leaders to consider:

1. Describing the task: what would it mean for a standard to provide that a user will have “no lifetime browsing history” (“LBH”) or “no long-term browsing history” (also “LBH”) across multiple sites? Roughly speaking: (a) a limit on specific content in referrers (such as search terms); (b) a limit on specific story titles on a newspaper site (“newspaper.com” is not suppressed, but “newspaper.com.specific story on a government leader’s personal life” is suppressed); (c) also suppressing “newspaper.com”; or (d) anything else?

[Note: Today we are focusing on this task; not taking a position in this exercise about what other mechanisms, inside or outside of DNT, may address user choice about target marketing.]

2. Given that task definition, what measures exist that could address or achieve suppression of LBH? Deletion? Of what? Delinking or de-identification? (Note – the group session on Tuesday will be on specific techniques of de-identification/delinking, after Ed Felten’s presentation on that subject.) Other ways so that the detailed URIs do not go past a certain time?

3. There is interest in “low-entropy cookies” or “buckets” continuing along with the limits on LBH. What would it mean, technically, to continue these while suppressing LBH? Where buckets or low-entropy cookies are used, how should the minimum bucket size be defined? Are any other dimensions relevant to designing what would qualify as a bucket or low-entropy cookie?

4. Are there any other big-picture things to consider if the DNT standard leads to suppression of LBH, while permitting low-entropy cookies?

5. What role for retention of IP addresses, for what purposes, in suppressing LBH?

6. Here is one option for “short term use”: “Operators may collect and retain data related to a communication in a third-party context for up to N weeks. During this time, operators may render data deidentified or perform processing of the data for any of the other permitted uses.” To what extent would this approach fit with the goal of suppressing LBH? If this approach to short term use is in the standard, are there any uses where details about the browsing history would be retained longer? What are those, and why? Length of time – how would you think about a possible time limit for “short term use”?

7. Moving to specific uses, we had the recent presentation from Media Rating Council about a general one-year retention, but with exceptions allowed where companies have cited privacy concerns. A different but similar audit function concerns financial payments – did a site deliver the promised advertisements? Query – many audit functions are based on sampling rather than having every transaction audited. To what extent has sampling been considered in the DNT process, and to what extent would retention of samples be consistent with suppressing LBH?

8. The MRC speaker said that most campaigns tested by his group are short-term, such as a few weeks or less. What about a presumption that campaigns are that length, but with an exceptions process if a campaign is, of its nature, longer-term?

9. What about cybersecurity, and keeping the detailed URIs? When asked about a one-year limit, one person mentioned Black Friday (the day after Thanksgiving) as an example where annual events are important for telling a denial-of-service attack from heavy shopping traffic. What is the relevance of highly detailed content of this sort over the long term? Are there suggestions for mitigating the risk that these security databases become the target of subpoenas or other requests that would reveal LBH?

10. How would other possible permitted uses interact with a limit on LBH? [Leader – you can refer to bare bones text for the current list.] If there is a market research permitted use (and some have objected to that), any reason to have the level of detail of the specific URI for more than the length of the short term use?

11. Wrapping up. In light of the discussion, is the goal of suppressing LBH a useful task to address in DNT process? Do you have a coherent way to do that? What are the pros and cons of working on this goal?
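As an aid to question 3 above on minimum bucket size, here is a minimal Python sketch of the arithmetic linking bucket size to cookie entropy. It assumes users are spread evenly across buckets and that a cookie carrying b bits distinguishes 2^b buckets. The function name `max_cookie_bits` and the figures used are illustrative assumptions, not values proposed by the group.

```python
def max_cookie_bits(population: int, min_bucket_size: int) -> int:
    """Largest whole number of cookie bits b such that 2**b evenly
    filled buckets each still hold at least min_bucket_size users.

    Assumes a uniform assignment of users to buckets, which real
    deployments may not achieve.
    """
    if population <= 0 or min_bucket_size <= 0:
        raise ValueError("population and min_bucket_size must be positive")
    max_buckets = population // min_bucket_size
    if max_buckets < 1:
        # Population is smaller than one bucket: no distinguishing bits.
        return 0
    # floor(log2(max_buckets)) computed with integer arithmetic only.
    return max_buckets.bit_length() - 1


# Hypothetical figures: one million users, at least 1024 users per bucket.
bits = max_cookie_bits(1_000_000, 1024)
print(bits)                    # 9
print(1_000_000 // 2 ** bits)  # 1953 users per bucket if evenly spread
```

The sketch shows the trade-off in the question: each extra bit in the cookie halves the expected bucket size, so a minimum bucket size directly caps how much the cookie can distinguish.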