Privacy/TPWG/Change Proposals on data minimization

From W3C Wiki

This page summarizes all change proposals on the topic of data minimization. It includes text for issues 31, 199 (with previous wiki page), 211 (with previous wiki page), 220, and 233.

Current Editors' Draft Text

4.2.1.2 Data Minimization, Retention and Transparency

Data collected by a party for permitted uses MUST be limited to the data reasonably necessary for such permitted uses. Such data MUST NOT be retained any longer than is proportionate to and reasonably necessary for such permitted uses.

A party MUST provide public transparency of the time periods for which data collected for permitted uses are retained. The party MAY enumerate different retention periods for different permitted uses. Data MUST NOT be used for a permitted use once the data retention period for that permitted use has expired. After there are no remaining permitted uses for given data, the data MUST be deleted or de-identified.
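As an illustration only, the retention rule in the paragraph above could be checked along the following lines. This is a minimal sketch, not part of the draft: the names RETENTION, usable_for and must_delete_or_deidentify are hypothetical, and the two retention periods are placeholder values, since the draft text leaves the actual periods to each party.

```python
from datetime import datetime, timedelta

# Hypothetical published retention periods, one per permitted use.
# The draft text does not fix these values; each party enumerates its own.
RETENTION = {
    "frequency_capping": timedelta(weeks=4),
    "billing_auditing": timedelta(days=180),
}


def usable_for(collected_at: datetime, now: datetime, retention=RETENTION) -> set:
    """Return the permitted uses whose retention period has not yet expired."""
    age = now - collected_at
    return {use for use, period in retention.items() if age <= period}


def must_delete_or_deidentify(collected_at: datetime, now: datetime,
                              retention=RETENTION) -> bool:
    """True once no permitted use remains for the data."""
    return not usable_for(collected_at, now, retention)
```

Under these assumed periods, a record collected 60 days ago could still be used for billing and auditing but no longer for frequency capping, and after 180 days it would have to be deleted or de-identified.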

A party that collects data for a permitted use MUST make reasonable data minimization efforts to ensure that only the data necessary for the permitted use is retained, and MUST NOT rely on unique identifiers if alternative solutions are reasonably available.

Proposal: clarify restrictions on unique identifiers

Proposal from Mike O'Neill as modified during June 25th Working Group call

Add to section on third-party compliance:

3. the third party MUST NOT store in the user-agent, or derive from data already stored in the user-agent, any unique identifiers other than with the user's explicit consent or to support permitted uses as defined within this recommendation.

Add to data minimization section:

If unique identifiers are relied upon then their duration SHOULD be limited to the maximum necessary for such permitted use.

Add to definitions:

A unique identifier is an arbitrary value held in, or derived from other data in, the user agent whose purpose is to identify the user agent in subsequent transactions to a particular web domain. It may be encoded for example as the name or value attribute of an HTTP cookie, as an item in DOM storage or recorded in some way in the cache.

The duration of a unique identifier is the maximum period of time it will be retained in the user agent. This could be implemented for example using the Expires or Max-Age attributes of an HTTP cookie so that it is automatically deleted by the user agent after the specified time period is exceeded.
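As a rough sketch of the Max-Age approach mentioned above, a server could bound the duration of an identifier cookie as follows. The helper name identifier_cookie, the cookie name "uid", its value, and the seven-day period are all hypothetical; only Python's standard http.cookies module is assumed.

```python
from http.cookies import SimpleCookie


def identifier_cookie(name: str, value: str, max_age_seconds: int) -> str:
    """Build a Set-Cookie header value whose lifetime is bounded by Max-Age."""
    cookie = SimpleCookie()
    cookie[name] = value
    # Max-Age tells the user agent to discard the cookie after this many seconds.
    cookie[name]["max-age"] = max_age_seconds
    cookie[name]["path"] = "/"
    return cookie[name].OutputString()


# Hypothetical identifier limited to 7 days (604800 seconds).
header_value = identifier_cookie("uid", "a1b2c3", 7 * 24 * 60 * 60)
```

The returned string would be sent as the value of a Set-Cookie response header. The user agent, not the server, then enforces the expiry.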

Browser fingerprinting is a method of tracking based on creating a unique identifier from other information either inherent in the content request or already stored in the user agent. Such an identifier may not need itself to be stored in the user-agent as it can be calculated again in subsequent transactions. It follows from this that its duration is effectively unlimited.
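To make the point about unlimited duration concrete, the sketch below derives an identifier purely from data present in a request. It is an assumption-laden illustration: the function name fingerprint and the choice of inputs (IP address plus three headers) are hypothetical, and practical fingerprinting may combine many more signals.

```python
import hashlib


def fingerprint(ip: str, headers: dict) -> str:
    """Derive an identifier from request data alone; nothing is stored in the user agent."""
    # Hypothetical choice of inputs for illustration.
    names = ("User-Agent", "Accept-Language", "Accept-Encoding")
    parts = [ip] + [headers.get(name, "") for name in names]
    # The same inputs always hash to the same value, so the identifier can be
    # recomputed on every request and has no expiry that the user agent can enforce.
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Because the value is recalculated rather than stored, clearing cookies or DOM storage does not change it. It changes only when the underlying request data changes.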

Old change Proposals

Change Proposals on Unique Identifiers

Proposal: Limits on unique identifiers in permitted uses

Proposal from Dan Auerbach; issue-199

For the permitted uses, except for short-term debugging:

Parties must not collect or use unique identifiers of users, user agents or devices in association with this data.

Proposal: Persistent identifiers

Proposal from Mike O'Neill; issue-199


Propose referring to "persistent identifier" instead of "unique identifiers" in each case, with the following definition:

A persistent identifier is an arbitrary value held in, or derived from other data in, the user agent whose purpose is to identify the user agent in subsequent transactions to a particular web domain. It may be encoded for example as the name or value attribute of an HTTP cookie, as an item in localStorage or recorded in some way in the cache.

The duration of a persistent identifier is the maximum period of time it will be retained in the user agent. This could be implemented for example using the Expires or Max-Age attributes of an HTTP cookie so that it is automatically deleted by the user agent after the specified time period is exceeded.

Browser fingerprinting is a method of tracking based on creating a persistent identifier from other information either inherent in the content request or already stored in the user agent. Such an identifier may not need itself to be stored in the user-agent as it can be calculated again in subsequent transactions. It follows from this that its duration is effectively unlimited.

Justification.

With the duration definition, restrictions on permitted uses could then limit the duration of persistent identifiers. Because browser fingerprinting cannot be given a finite duration, this tracking method should not be used when DNT is set, even for a permitted use. In practice, browser fingerprinting based solely on examining initial content requests is usually not an effective tracking method, because the combination of IP addresses and other headers is not sufficiently user-specific, but we should rule out at least the more complex form when DNT is set.

Change Proposal Retention Permitted Uses

Proposal: Limits on unique identifiers in permitted uses

Proposal from Dan Auerbach; issue-211

For each permitted use, the proposal adds a retention limit with the requirement that retention beyond a limit be justified in a privacy policy. Key changes around that are italicized below.

A third party MAY also use protocol information (e.g. HTTP header information and IP information) for any purpose, subject to a one week retention period. Limited retention of data beyond this period for debugging purposes may occur, provided the data is only used for debugging purposes and only retained as long as necessary for those purposes. If data is being retained for more than 6 months for debugging purposes, notice must be given in the privacy policy that some data is being retained for greater than 6 months for debugging.

Frequency capping

Regardless of DNT signal, protocol information may be collected, retained and used for up to 4 weeks to limit the number of times that a user sees a particular advertisement, often called frequency capping, as long as the data retained do not reveal the user’s browsing history. Parties must not construct profiles of users or user behaviors based on their ad frequency history, or otherwise alter the user’s experience.
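One way to read the frequency-capping text above is sketched below. It is only an illustration under stated assumptions: the class name FrequencyCap, the client_key argument (assumed to be derived from protocol information), and the cap of three impressions are hypothetical. Only the four-week retention limit comes from the proposal. The store keeps an ad identifier and timestamps, and never the page on which the ad appeared, so that it does not reveal browsing history.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(weeks=4)  # retention limit taken from the proposal text
MAX_IMPRESSIONS = 3             # hypothetical cap, not specified by the proposal


class FrequencyCap:
    def __init__(self):
        # Maps (client_key, ad_id) to impression timestamps; no page URLs are kept.
        self._seen = {}

    def should_show(self, client_key: str, ad_id: str, now: datetime) -> bool:
        key = (client_key, ad_id)
        # Drop records older than the retention period before counting.
        recent = [t for t in self._seen.get(key, []) if now - t < RETENTION]
        allowed = len(recent) < MAX_IMPRESSIONS
        if allowed:
            recent.append(now)
        self._seen[key] = recent
        return allowed
```

A production system would also need a periodic purge of keys that are never queried again, since this sketch only prunes a record when the same client and ad are seen once more.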

Billing and auditing

Regardless of DNT signal, protocol information may be collected, retained and used for billing and auditing for up to 6 months, or longer if notice is given in the privacy policy with an explanation of why the extra retention is necessary. This may include, for example, counting ad events, verifying positioning and quality of ad impressions, or data that an auditor explicitly requires to be held.

Security and Fraud

To the extent proportionate and reasonably necessary for detecting security risks and fraudulent or malicious activity, parties may collect, retain, and use protocol data regardless of a DNT signal for up to 6 months, or longer if notice is given in the privacy policy with an explanation of why the extra retention is necessary. This includes data reasonably necessary for enabling authentication/verification, detecting hostile and invalid transactions and attacks, providing fraud prevention, and maintaining system integrity. In the context of this specific permitted use, this information may be used to alter the user's experience in order to reasonably keep a service secure or prevent fraud. Data may be kept beyond 6 months or the published retention period for a specific ongoing investigation or for legal purposes, but general data collection for security and fraud must be limited to 6 months or the published retention period.

It is a best practice to approach security and fraud issues with a graduated response where appropriate, retaining the minimal amount of data that is necessary for security and fraud purposes, and expanding the scope of data retention only when it becomes necessary to do so once a particular issue has been discovered.

Proposal: Limit duration of persistent identifiers

Proposal from Mike O'Neill

To be added to the section "Data Minimization, Retention and Transparency":

If persistent identifiers are used then their duration SHOULD be limited to the maximum necessary for such permitted use.

Change Proposal Wording

Proposal from Jack Hobaugh; issue-233

Make compliance for this section party-neutral and replace "limited" with "minimized."