Fingerprinting Guidance for Web Specification Authors (Draft)

Abstract

Exposure of settings and characteristics of browsers can impact user privacy by allowing for browser fingerprinting. This document defines different types of fingerprinting, considers distinct levels of mitigation for the related privacy risks and provides guidance for Web specification authors on how to balance these concerns when designing new Web features.

1. Browser fingerprinting

1.1 What is fingerprinting?

In short, browser fingerprinting is the capability of a site to identify or re-identify a visiting user, user agent or device via configuration settings or other observable characteristics.

A more detailed list of types of fingerprinting is included below. A similar definition is provided by [RFC6973].

1.2 Privacy impacts and threat models

Browser fingerprinting can be used as a security measure (e.g. as means of authenticating the user). However, fingerprinting is also a potential threat to users' privacy on the Web. This document does not attempt to provide a single unifying definition of "privacy" or "personal data", but we highlight how browser fingerprinting might impact users' privacy. For example, browser fingerprinting can be used to:

identify a user
correlate a user’s browsing activity within and across sessions
track users without transparency or control

The privacy implications associated with each use case are discussed below. Following from the practice of security threat model analysis, we note that there are distinct models of privacy threats for fingerprinting. Defenses against these threats differ, depending on the particular privacy implication and the threat model of the user.

1.2.1 Identify a user

There are many reasons why users might wish to remain anonymous or unidentified online, including: concerns about surveillance, personal physical safety, concerns about discrimination against them based on what they read or write when using the Web. When a browser fingerprint is correlated with identifying information (like a real name), an application or service provider may be able to identify an otherwise pseudonymous user.

Users concerned about physical safety from, for example, a governmental adversary might employ onion routing systems such as Tor to limit network-level linkability but still face the danger of browser fingerprinting to correlate their Web-based activity.

Note

Is this privacy implication usefully distinct from unexpected correlation? How does this relate to linkability to other identities? [via TAG feedback]

1.2.2 Correlation of browsing activity

Browser fingerprinting provides privacy concerns even when real-world identities are not implicated. Some users may be surprised or concerned that an online party can correlate multiple visits (on the same or different sites) to develop a profile or history of the user. This concern may be heightened because (see below) it may occur without the user's knowledge or consent and tools such as clearing cookies do not prevent further correlation.

Browser fingerprinting also allows for tracking across origins [RFC6454]: different sites may be able to combine information about a single user even where a cookie policy would block accessing of cookies between origins, because the fingerprint is relatively unique and the same for all origins.

1.2.3 Tracking without transparency or user control

In contrast to other mechanisms defined by Web standards for maintaining state (e.g. cookies), browser fingerprinting allows for collection of data about user activity without clear indications that such collection is happening. Transparency can be important for end users, to understand how ongoing collection is happening, but it also enables researchers, policymakers and others to document or regulate privacy-sensitive activity. Browser fingerprinting also allows for tracking of activity without clear or effective user controls: a browser fingerprint cannot be cleared or re-set. (See the TAG finding on unsanctioned tracking [TAG-UNSANCTIONED].)

1.3 What can we do about it?

Advances in techniques for browser fingerprinting (see A. Research, below), particularly in active fingerprinting, suggest that elimination of the capability of browser fingerprinting by a determined adversary through solely technical means that are widely deployed is implausible. However, mitigations in our technical specifications are possible, as described below (5. Mitigations), and may achieve different levels of success (4. Feasibility).

Mitigations recommended here are simply mitigations, not solutions. Users of the Web cannot confidently rely on sites being completely unable to correlate traffic, especially when executing client-side code. A fingerprinting surface extends across all implemented Web features for a particular user agent, and even to other layers of the stack. In order to mitigate the risk as a whole, fingerprinting must be considered during the design and development of all specifications.

The TAG finding on Unsanctioned Web Tracking, including browser fingerprinting, includes description of the limitations of technical measures and encourages minimizing and documenting new fingerprinting surface [TAG-UNSANCTIONED]. The best practices below detail common actions that authors of specifications for Web features can take to mitigate the privacy impacts of browser fingerprinting.

3. Types of fingerprinting

3.1 Passive

Passive fingerprinting is browser fingerprinting based on characteristics observable in the contents of Web requests, without the use of any code executing on the client side.

Passive fingerprinting would trivially include cookies (often unique identifiers sent in HTTP requests) and the set of HTTP request headers and the IP address and other network-level information. The User-Agent string, for example, is an HTTP request header that typically identifies the browser, renderer, version and operating system. For some populations, the user agent string and IP address will commonly uniquely identify a particular user's browser [NDSS-FINGERPRINTING].

3.2 Active

For active fingerprinting, we also consider techniques where a site runs JavaScript or other code on the local client to observe additional characteristics about the browser. Techniques for active fingerprinting might include accessing the window size, enumerating fonts or plug-ins, evaluating performance characteristics, or rendering graphical patterns. Key to this distinction is that active fingerprinting takes place in a way that is potentially detectable on the client.

3.3 Cookie-like (setting/retrieving local state)

Users, user agents and devices may also be re-identified by a site that first sets and later retrieves state stored by a user agent or device. This cookie-like fingerprinting allows re-identification of a user or inferences about a user in the same way that HTTP cookies allow state management for the stateless HTTP protocol [RFC6265].

Cookie-like fingerprinting can also circumvent user attempts to limit or clear cookies stored by the user agent, as demonstrated by the "evercookie" implementation [EVERCOOKIE]. Where state is maintained across user agents (as in the case of common plugins with local storage), across devices (as in the case of certain browser syncing mechanisms) or across software upgrades, cookie-like fingerprinting can allow re-identification of users, user agents or devices where active and passive fingerprinting might not.

5. Mitigations

5.1 Weighing increased fingerprinting surface

The fingerprinting surface of a user agent is the set of observable characteristics that can be used in concert to identify a user, user agent or device or correlate its activity. Web specification authors regularly attempt to strike a balance between new functionality and fingerprinting surface. For example, feature detection functionality allows for progressive enhancement with a small addition to fingerprinting surface; detailed enumerations of plugins, fonts, connected devices may provide a large fingerprinting surface with minimal functional support.

Authors and Working Groups determine the appropriate balance between these properties on a case-by-case basis, given their understanding of the functionality, its likely implementations and the entropy of increased fingerprinting surface. However, given the distinct privacy impacts described above and in order to improve consistency across specifications, these practices provide some guidance:

Best Practice 1: Avoid unnecessary increases to the surface for passive fingerprinting.

Unless a feature cannot reasonably be designed in any other way, increased passive fingerprintability should be avoided. Passive fingerprinting surface allows for much easier fingerprinting, without opportunities for external detection (by users or third parties).

Best Practice 2: Prefer functionally-comparable designs that don't increase the surface for active fingerprinting.

If comparable functionality could be accomplished without increasing the surface for active fingerprinting, prefer the less fingerprintable alternative. Defining "equivalent" or "comparable" functionality can be difficult; use your best judgment and avoid unnecessary fingerprintability.

The difference between these practices recognizes that passive fingerprinting surface has lesser options for mitigation (lacking external detectability and client-side preventability) and greater feasibility for reduction.

Best Practice 3: Mark features that contribute to fingerprintability.

This feature may contribute to browser fingerprintability. Where a feature does contribute to the fingerprinting surface, indicate that impact, by explaining the effect (and any known implementer mitigations) and marking the relevant section with a fingerprinting icon, as this paragraph is.

Note

This practice (and this image) is drawn from the HTML5 specification, which uses it throughout. Can we get feedback from the HTML WG or from readers of that specification as to whether the practice has been useful?

5.2 Standardization

Specifications can mitigate against fingerprintability through standardization; by defining a consistent behavior, conformant implementations won't have variations that can be used for browser fingerprinting.

Randomization of certain browser characteristics has been proposed as a way to combat browser fingerprinting. While this strategy may be pursued by some implementations, we expect in general it will be more effective for us to standardize or null values rather than setting a range over which they can vary. The Tor Browser design provides more detailed information, but in short: it's difficult to measure how well randomization will work as a mitigation and it can be costly to implement in terms of usability (varying functionality or design in unwanted ways), processing (generating random numbers) and development (including the cost of introducing new security vulnerabilities).

Best Practice 4: Specify orderings and non-functional differences.

To reduce unnecessary entropy, specify aspects of API return values and behavior that don't contribute to functional differences. For example, if the ordering of return values in a list has no semantic value, specify a particular ordering (alphabetical order by a defined algorithm, for example) so that incidental differences don't expose fingerprinting surface.

Issue 1

TODO: why we don't typically recommend that we try to do this across user agent implementations ... that is, why we're not advocating for getting rid of the User Agent string

5.3 Detectability

Where a client-side API provides some fingerprinting surface, authors can still mitigate the privacy concerns via detectability. If client-side fingerprinting activity is to some extent distinguishable from functional use of APIs, user agent implementations may have an opportunity to prevent ongoing fingerprinting or make it observable to users and external researchers (including academics or relevant regulators) who may be able to detect and investigate the use of fingerprinting.

Best Practice 5: Design APIs to access only the entropy necessary.

Following the basic principle of data minimization [RFC6973], design your APIs such that a site can access (and does access by default) only the entropy necessary for particular functionality.

Authors might design an API to allow for querying of a particular value, rather than returning an enumeration of all values. User agents and researchers can then more easily distinguish between sites that query for one or two particular values (gaining minimal entropy) and those that query for all values (more likely attempting to fingerprint the browser); or implementations can cap the number of different values. For example, Tor Browser limits the number of fonts that can be queried with a browser.display.max_font_attempts preference.

For more information, see:

Data Minimization in Web APIs, Draft TAG Finding, September 2011
Device API Privacy Requirements, DAP Working Group Note, June 2010

Issue 2

TODO: Other examples?

Best Practice 6: Enable graceful degradation for privacy-conscious users or implementers.

If your specification exposes some fingerprinting surface (whether it's active or passive), some implementers (e.g. the Tor Browser) are going to be compelled to disable those features for certain privacy-conscious users. Following the principle of progressive enhancement, and to avoid further divergence (which might itself expose variation in users), consider whether some functionality in your specification is still possible if fingerprinting surface features are disabled.

Explicit hooks or API flags may be used so that browser extensions or certain user agents can easily disable specific features. For example, the origin-clean flag allows control over whether an image canvas can be read (a significant fingerprinting surface).

5.4 Clearing all local state

Features which enable storage of data on the client and functionality for client- or server-side querying of that data can increase the ease of cookie-like fingerprinting. Storage can vary between large amounts of data (for example, the Web Storage API) or just a binary flag (has or has not provided a certain permission; has or has not cached a single resource).

Best Practice 7: Avoid unnecessary new cookie-like local state mechanisms.

If functionality does not require maintaining client-side state in a way that is subsequently queryable (or otherwise observable), avoid creating a new cookie-like feature. Can the functionality be accomplished with existing HTTP cookies or an existing JavaScript local storage API?

Where features do require setting and retrieving local state, there are ways to mitigate the privacy impacts related to unexpected cookie-like behavior; in particular, you can help implementers prevent "permanent", "zombie", "super" or "evercookies".

Best Practice 8: Highlight any local state mechanisms to enable simultaneous clearing.

Clearly note where state is being maintained and could be queried and provide guidance to implementers on enabling simultaneous deletion of local state for users. Such functionality can mitigate the threat of "evercookies" because the presence of state in one such storage mechanism can't be used to persist and re-create an identifier. As a result, your design should not rely on saving and later querying data on the client beyond a user's clearing all local state. That is, you should not expect any local state information to be permanent.

Though not strictly browser fingerprinting, there are other privacy concerns regarding user tracking for features that provide local storage of data. Mitigations suggested in the Web Storage API specification include: white-listing, black-listing, expiration and secure deletion [WEBSTORAGE-user-tracking].

5.5 Do Not Track: a cooperative approach

Expressions of, and compliance with, a Do Not Track signal does not inhibit the capability of browser fingerprinting, but may mitigate some user concerns about fingerprinting, specifically around tracking as defined in those specifications [TRACKING-DNT] [TRACKING-COMPLIANCE] and as implemented by services that comply with those user preferences.

The use of DNT in this way typically does not require changes to other functional specifications. If your specification expects a particular behavior upon receiving a particular DNT signal, indicate that with a reference to [TRACKING-DNT]. If your specification introduces a new communication channel that could be used for tracking, you might wish to define how a DNT signal should be communicated.

Fingerprinting Guidance for Web Specification Authors (Draft)

W3C Interest Group Note 24 November 2015

Abstract

Status of This Document

Table of Contents

1. Browser fingerprinting

1.1 What is fingerprinting?

1.2 Privacy impacts and threat models

1.2.1 Identify a user

1.2.2 Correlation of browsing activity

1.2.3 Tracking without transparency or user control

1.3 What can we do about it?

2. Best Practices Summary

3. Types of fingerprinting

3.1 Passive

3.2 Active

3.3 Cookie-like (setting/retrieving local state)

4. Feasibility

4.1 Fingerprinting mitigation levels of success

4.2 Feasible goals for specification authors

5. Mitigations

5.1 Weighing increased fingerprinting surface

5.2 Standardization

5.3 Detectability

5.4 Clearing all local state

5.5 Do Not Track: a cooperative approach

A. Research

Testing

B. Acknowledgements

C. References

C.1 Informative references