Privacy Principles

Abstract

Privacy is an essential part of the Web. This document provides definitions for privacy and related concepts that are applicable worldwide as well as a set of privacy principles that should guide the development of the Web as a trustworthy platform. People using the Web would benefit from a stronger relationship between technology and policy, and this document is written to work with both.

This is a document containing technical guidelines. However, in order to put those guidelines in context we must first define some terms and explain what we mean by privacy.

The Web is for everyone ([For-Everyone]). It is "a platform that helps people and provides a net positive social benefit" ([ETHICAL-WEB], [design-principles]). One of the ways in which the Web serves people is by protecting them in the face of asymmetries of power, and this includes establishing and enforcing rules to govern the power of data.

The Web is a social and technical system made up of information flows. Because this document is specifically about privacy as it applies to the Web, it focuses on privacy with respect to information flows. Our goal is not to cover all possible privacy issues, but rather to provide enough background to support the Web community in making informed decisions about privacy and in weaving privacy into the architecture of the Web.

Few architectural principles are absolute, and privacy is no exception: privacy can come into tension with other desirable properties of an ethical architecture, and when that happens the Web community will have to work together to strike the right balance.

Information is power. It can be used to predict and to influence people, as well as to design online spaces that control people's behaviour. The collection and processing of information in greater volume, with greater precision and reliability, with increasing interoperability across a growing variety of data types, and at intensifying speed is leading to an unprecedented concentration of power that threatens private and public liberties. What's more, automation and the increasing computerisation of all aspects of our lives both increase the power of information and decrease the cost of a number of intrusive behaviours that would be more easily kept in check if the perpetrator had to be in the same room as the victim.

These asymmetries of information and of automation create significant asymmetries of power.

Data governance is the system of principles that regulate information flows. When people are involved in information flows, data governance determines how these principles constrain and distribute the power of information between different actors. Such principles describe the ways in which different actors may, must, or must not produce or process flows of information from, to, or about other actors ([GKC-Privacy], [IAD]).

It is important to keep in mind that not all people are equal in how they can resist the imposition of unfair principles: some people are more vulnerable and therefore in greater need of protection. This document focuses on the impact that differences in information power can have on people, but those differences can also impact other actors, such as companies or governments.

Principles vary from context to context ([Understanding-Privacy], [Contextual-Integrity]): people have different expectations of privacy at work, at a café, or at home for instance. Understanding and evaluating a privacy situation is best done by clearly identifying:

Its actors, which include the subject of the information as well as the sender and the recipient of the information flow. (Note that recipients might not always want to be recipients.)
The type of data involved in the information flow.
The principles that are in use in this context.

It is important to keep in mind that there are always privacy principles and that all of them imply different power dynamics. Some sets of principles may be more permissive, but that does not make them neutral — it means that they support the power dynamic that comes with permissive processing. We must therefore determine which principles best align with ethical Web values in Web contexts ([ETHICAL-WEB], [Why-Privacy]).

Information flows as used in this document refer to information exchanged or processed by actors. The information itself need not necessarily be personal data. Disruptive or interruptive information flowing to a person is in scope, as is de-identified data that can be used to manipulate people or that was extracted by observing people's behaviour on a website.

Information flows need to be understood from more than one perspective: there is the flow of information about a person (the subject) being processed or transmitted to any other actor, and there is the flow of information towards a person (the recipient). Recipients can have their privacy violated in multiple ways such as unexpected shocking images, loud noises while they intend to sleep, manipulative information, interruptive messages when their focus is on something else, or harassment when they seek social interactions.

On the Web, information flows may involve a wide variety of actors that are not always recognizable or obvious to a user within a particular interaction. Visiting a website may involve the actors that operate that site and its functionality, but also actors with network access, which may include: Internet service providers; other network operators; local institutions providing a network connection including schools, libraries or universities; government intelligence services; and malicious hackers who have gained access to the network or the systems of any of the other actors. High-level threats including surveillance may be pursued by these actors. Pervasive monitoring, a form of large-scale, indiscriminate surveillance, is a known attack on the privacy of users of the Internet and the Web [RFC7258].

Information flows may also involve other people — for example, other users of a site — which could include friends, family members, teachers, strangers, or government officials. Some threats to privacy, including both disclosure and harassment, may be particular to the other people involved in the information flow.

A person's autonomy is their ability to make decisions of their own personal will, without undue influence from other actors. People have limited intellectual resources and time with which to weigh decisions, and by necessity rely on shortcuts when making decisions. This makes their preferences, including privacy preferences, malleable and susceptible to manipulation ([Privacy-Behavior], [Digital-Market-Manipulation]). A person's autonomy is enhanced by a system or device when that system offers a shortcut that aligns more with what that person would have decided given arbitrary amounts of time and relatively unlimited intellectual ability; and autonomy is decreased when a similar shortcut goes against decisions made under such ideal conditions.

Affordances and interactions that decrease autonomy are known as deceptive patterns (or dark patterns). A deceptive pattern does not have to be intentional ([Dark-Patterns], [Dark-Pattern-Dark]).

Because we are all subject to motivated reasoning, the design of defaults and affordances that may impact autonomy should be the subject of independent scrutiny.

Given the large volume of potential data-related decisions in today's data economy, complete informational self-determination is impossible. This fact, however, should not be confused with the idea that privacy is dead. Studies show that people remain concerned over how their data is processed, feeling powerless and like they have lost agency ([Privacy-Concerned]). Careful design of our technological infrastructure can ensure that people's autonomy with respect to their own data is enhanced through appropriate defaults and choice architectures.

Privacy labour is the practice of having a person carry out the work of ensuring data processing of which they are the subject or recipient is appropriate, instead of putting the responsibility on the actors who are doing the processing. Data systems that are based on asking people for their consent tend to increase privacy labour.

More generally, implementations of privacy are often dominated by self-governing approaches that offload labour to people. This is notably true of the regimes descended from the Fair Information Practices (FIPs), a loose set of principles initially elaborated in the 1970s in support of individual autonomy in the face of growing concerns with databases. The FIPs generally assume that there is sufficiently little data processing taking place that any person will be able to carry out sufficient diligence to enable autonomy in their decision-making. Since they offload the privacy labour to people and assume perfect, unlimited autonomy, the FIPs do not forbid specific types of data processing but only place them under different procedural requirements. Such an approach is appropriate for actors that are processing data in the 1970s.

One notable issue with procedural, self-governing approaches to privacy is that they tend to have the same requirements in situations where people find themselves in a significant asymmetry of power with another actor — for instance a person using an essential service provided by a monopolistic platform — and those where a person and the other actor are very much on equal footing, or even where the person may have greater power, as is the case with small businesses operating in a competitive environment. They further do not consider cases in which one actor may coerce other actors into facilitating its inappropriate practices, as is often the case with dominant players in advertising or in content aggregation ([Consent-Lackeys], [CAT]).

Reference to the FIPs survives to this day. They are often referenced as "transparency and choice", which, in today's digital environment, is often an indication that inappropriate processing is being described.

Privacy principles are defined through social processes and, because of that, the applicable definition of privacy in a given context can be contested ([Privacy-Contested]). This makes privacy a problem of collective action ([GKC-Privacy]). Group-level data processing may impact populations or individuals, including in ways that people could not control even under the optimistic assumptions of consent. For example, based on group-level analysis, a company may know that site.example is predominantly visited by people of a given race or gender, and decide not to run its job ads there. Visitors to that page are implicitly having their data processed in inappropriate ways, with no way to discover the discrimination or seek relief ([Relational-Governance]).

What we consider is therefore not just the relation between the people who share data and the actors that invite that sharing ([Relational-Turn]), but also between the people who may find themselves categorised indirectly as part of a group even without sharing data. One key understanding here is that such relations may persist even when data is de-identified. What's more, such categorisation of people, voluntary or not, changes the way in which the world operates. This can produce self-reinforcing loops that can damage both individuals and groups ([Seeing-Like-A-State]).

In general, collective issues in data require collective solutions. Web standards help with data governance by defining structural controls in user agents and establishing or delegating to institutions that can handle issues of privacy. Governance will often struggle to achieve its goals if it works primarily by increasing individual control instead of by collective action.

Collecting data at large scales can have significant pro-social outcomes. Problems tend to emerge when actors process data for collective benefit and for self-dealing purposes at the same time. The self-dealing purposes are often justified as bankrolling the pro-social outcomes but this requires collective oversight to be appropriate.

There are different ways for people to become members of a group. Either they can join it deliberately, making it a self-constituted group such as when joining a club, or they can be classified into it by an external actor, typically a bureaucracy or its computerised equivalent ([Beyond-Individual]). In the latter case, people may not be aware that they are being grouped together, and the definition of the group may not be intelligible (for instance if it is created from opaque machine learning techniques).

Protecting group privacy can take place at two different levels. The existence of a group or at least its activities may need to be protected even in cases in which its members are guaranteed to remain anonymous. We refer to this as "group privacy". Conversely, people may wish to protect knowledge that they are members of the group even though the existence of the group and its actions may be well known (e.g. membership in a dissident movement under authoritarian rule), which we call "membership privacy". An example privacy violation for the former case is the fitness app Strava, which did not reveal individual behaviour or identity but published heat maps of popular running routes. In doing so, it revealed secret US bases around which military personnel took frequent runs ([Strava-Debacle], [Strava-Reveal-Military]).

When people do not know that they are members of a group, when they cannot easily find other members of the group so as to advocate for their rights together, or when they cannot easily understand why they are being categorised into a given group, their ability to protect themselves through self-governing approaches to privacy is largely eliminated.

One common problem in group privacy is when the actions of one member of a group reveal information that other members would prefer were not shared in this way (or at all). For instance, one person may publish a picture of an event in which they are featured alongside others while the other people captured in the same picture would prefer their participation not to be disclosed. Another example is sites that enable people to upload their contacts: the person performing the upload might be more open to disclosing their social networks than the people they are connected to are. Such issues do not necessarily admit simple, straightforward solutions but they need to be carefully considered by people building websites.

While transparency rarely helps enough to inform the individual choices that people may make, it plays a critical role in letting researchers and reporters inform our collective decision-making about privacy principles. This consideration extends the TAG's resolution on a Strong and Secure Web Platform to ensure that "broad testing and audit continues to be possible" where information flows and automated decisions are involved.

Such transparency can only function if there are strong rights of access to data (including data derived from one's personal data) as well as mechanisms to explain the outcomes of automated decisions.

The user agent acts as an intermediary between a person (its user) and the web. User agents implement, to the extent possible, the principles that collective governance establishes in favour of individuals. They seek to prevent the creation of asymmetries of information, and serve their user by providing them with automation to rectify automation asymmetries. Where possible, they protect their user from receiving intrusive messages.

The user agent is expected to align fully with the person using it and to operate exclusively in that person's interest. It is not the first party. The user agent serves the person as a trustworthy agent: it always puts that person's interest first. On some occasions, this can mean protecting that person from themselves by preventing them from carrying out a dangerous decision, or by slowing down the person in their decision. For example, the user agent will make it difficult for someone to connect to a site if it can't verify that the site is authentic. It will check that that person really intends to expose a sensitive device to a page. It will prevent that person from consenting to the permanent monitoring of their behaviour. Its user agent duties include ([Taking-Trust-Seriously]):

Duty of Protection: Protection requires user agents to actively protect their user's data, beyond simple security measures. It is insufficient to just encrypt at rest and in transit; the user agent must also limit retention, help ensure that only strictly necessary data is collected, and require guarantees from any actor with which it can reasonably be aware that data is shared.
Duty of Discretion: Discretion requires the user agent to make best efforts to enforce principles by taking care in the ways it discloses the personal data that it manages. Discretion is not confidentiality or secrecy: trust can be preserved even when the user agent shares some personal data, so long as it is done in an appropriately discreet manner.
Duty of Honesty: Honesty requires that the user agent try to give its user information of which the user agent can reasonably be aware, that is relevant to them and that will increase their autonomy, as long as they can understand it and there's an appropriate time to do so. This is almost never when the person is trying to do something else such as read a page or activate a feature. The duty of honesty goes well beyond that of transparency that is often included in older privacy regimes. Unlike transparency, honesty can't hide relevant information in complex legal notices and it can't rely on very short summaries provided in a consent dialog. If the person has provided consent to processing of their personal data, the user agent should inform the person of ongoing processing, with a level of obviousness that is proportional to the reasonably foreseeable impact of the processing.
Duty of Loyalty: Because the user agent is a trustworthy agent, it is held to be loyal to the person using it in all situations, including in preference to the user agent's implementer. When a user agent carries out processing that is not in the person's interest but instead benefits another actor (such as the user agent's implementer), that behaviour is known as self-dealing. Behaviour can be self-dealing even if it is done at the same time as processing that is in the person's interest; what matters is that it potentially conflicts with that person's interest. Self-dealing is always inappropriate. Loyalty is the avoidance of self-dealing.

These duties ensure the user agent will care for its user. In academic research, this relationship with a trustworthy agent is often described as "fiduciary" [Fiduciary-UA]. Some jurisdictions may have a distinct legal meaning for "fiduciary."

Many of the principles described in the rest of this document extend the user agent's duties and make them more precise.

While privacy principles are designed to work together and support each other, occasionally a proposal to improve how a system follows one privacy principle may reduce how well it follows another principle.

Principle: When confronted with an apparent tradeoff, first look for ways to improve all principles at once.

Given any initial design that doesn't perfectly satisfy all principles, there are usually some other designs that improve the situation for some principles without sacrificing anything about the other principles. Work to find those designs.

Another way to say this is to look for Pareto improvements before starting to trade off between principles.
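As a rough illustration of looking for Pareto improvements, the sketch below compares candidate designs that have been scored against several principles. The design names, principle names, and numeric scores are all hypothetical; real privacy properties rarely reduce to a single number, so this is an aid to reasoning rather than an evaluation method.

```python
from typing import Dict, List

# Principle name -> score, where higher is better (hypothetical scale).
Scores = Dict[str, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if design a is at least as good as b on every principle
    and strictly better on at least one (a Pareto improvement over b)."""
    at_least_as_good = all(a[p] >= b[p] for p in b)
    strictly_better = any(a[p] > b[p] for p in b)
    return at_least_as_good and strictly_better


def pareto_frontier(designs: Dict[str, Scores]) -> List[str]:
    """Return the names of designs that no other design dominates."""
    frontier = []
    for name, scores in designs.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in designs.items()
            if other_name != name
        )
        if not dominated:
            frontier.append(name)
    return frontier


# Hypothetical candidate designs, for illustration only.
designs = {
    "initial": {"minimization": 1, "transparency": 1, "anti_recognition": 1},
    "variant_a": {"minimization": 2, "transparency": 1, "anti_recognition": 1},
    "variant_b": {"minimization": 2, "transparency": 2, "anti_recognition": 0},
}

frontier = pareto_frontier(designs)
print(frontier)
```

Here "variant_a" improves on "initial" without sacrificing anything, so "initial" drops out. Choosing between "variant_a" and "variant_b" is a genuine tradeoff that the scores alone cannot settle.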

Once one is choosing between different designs at the Pareto frontier, the choice of which privacy principles to prefer is complex and depends heavily on the details of each particular situation. Note that people's privacy can also be in tension with non-privacy concerns. As discussed in the W3C TAG Ethical Web Principles, "it is important to consider the context in which a particular technology is being applied, the expected audience(s) for the technology, who the technology benefits and who it may disadvantage, and any power dynamics involved". Despite this complexity, there is a basic ground rule to follow:

Principle: If a service needs to collect extra data from its users in order to protect those or other users, it must take extra technical and legal measures to ensure that this data can't be then used for other purposes, like to grow the service.

This is a special case of the more general principle that data should not be used for more purposes than the data's subjects understood it was being collected for.

A service should explain how it uses people's data to protect them and other people, and how it might additionally use someone's data if it believes that person has broken the rules.

It is attractive to say that if someone violates the norms of a service they're using, then they sacrifice a proportionate amount of their privacy protections, but:

Often the service can only prevent the norm violation by also collecting data from innocent users. This extra collection is not always appropriate, especially if it allows pervasive monitoring ([RFC7258], [RFC7687]).
If a service operator wants to collect some extra data, it can be tempting for them to define norms and proportionality that allow them to do so.

The following examples illustrate some of the tensions:

Example 5: Preventing Profiling

Some actors on the web place a high value on building a detailed profile of each user's behavior, across websites. User agents are trying to enforce the principles in 2.1 Identity on the Web by blocking this profiling, but because the profiles are valuable, there's a large incentive to work around user agent measures, sometimes by using techniques that are very expensive or impossible to block. If user agent behavior causes websites to adopt these alternate tracking methods, the web as a whole won't respect the 2.1 Identity on the Web principles.

User agents can reduce the incentive to develop these alternate tracking methods by building APIs to facilitate the most common and least harmful uses of user profiles. However, those APIs usually still reveal some information about a user's behavior. For example, even the most privacy-respecting conversion attribution API will reveal a limited amount of information about each user in order to replace the use of profiles to measure the success of advertising campaigns. Even this small amount of information is still personal data.

As indicated above, different contexts require different principles. This section describes a set of principles designed to apply to the Web context in general. The Web is a big place, and we fully expect more specific contexts of the Web to add their own principles to further constrain information flows.

To the extent possible, user agents are expected to enforce these principles. However, this is not always possible and additional enforcement mechanisms are needed. One particularly salient issue is that a context is not defined in terms of who owns or controls it. Sharing data between different contexts of a single company is just as much a privacy violation as if the same data were shared between unrelated actors.

Principle: A user agent should help its user present the identity they want in each context they are in.

A person's identity is the set of characteristics that define them. Their identity in a context is the set of characteristics they present in that context. People frequently present different identities to different contexts, and also frequently share an identity among several contexts. People may also wish to present an ephemeral or anonymous identity, which is just a set of characteristics that is too small or unstable to be useful for following them through time.

Recognition is the act of realising that a given identity corresponds to the same person as another identity which may have been observed either in another context or in the same context but at a different time.

In order to uphold the above principle, sometimes a user agent needs to prevent recognition, for instance so that one site can't learn anything about its user's behavior on another site. Other times, the user agent needs to support recognition, for instance to help its user prove to one site that they have a particular identity on another site. Similarly, a user agent can help its user to separate or communicate identity across repeat visits to the same site.

There are several types of recognition that may take place. These rely on different methods and present different challenges.

Cross-context recognition is recognition between different contexts. It contributes to surveillance, correlation, and identification.

Cross-context recognition is only appropriate when the person being recognized can reasonably expect that recognition to happen and can control whether it does. Note that a person can use a piece of identifying information in two different contexts (e.g. their email or phone number) without that implying that they're using the same identity in both contexts. Unless there's some other indication that they intended to use a single identity, it is inappropriate to recognize them using that information, or to seek extra identifying information to help with cross-context recognition.

Systems which recognize people across contexts need to be careful not to apply the principles of one context in ways that violate the principles around use of information acquired in a different context. This is particularly true for vulnerable people, as recognising them in different contexts may force traits into the open that reveal their vulnerability. For example, if you meet your therapist at a party, you expect them to have different discussion topics with you than they usually would, and possibly even to pretend they don't know you.

Cross-site recognition is when a site determines with high probability that a visit to the site comes from the same person as another visit to a different site. In the usual case that the sites are different contexts, cross-site recognition is a privacy harm in the same cases as cross-context recognition.

Same-site recognition is when a single site discovers and uses the fact that two or more visits probably came from the same person.

A privacy harm occurs if a person reasonably expects that they'll be using a different identity for different visits to a single site, but the site recognizes them anyway. This harm can be accomplished through a variety of means detailed in 2.1.3 Recognition Methods.

Note that these categories overlap: cross-site recognition is usually cross-context recognition (and always recognizes across partitions); and same-site recognition is sometimes cross-context recognition (and may or may not involve multiple partitions).

A partition is the user agent's attempt to match how its user would understand a context. User agents don't have a perfect understanding of how their users experience the sites they visit, so they often need to approximate the boundaries between contexts when building partitions. In the absence of better information, a partition can be defined as:

a set of environments (roughly same-site and cross-site iframes, workers, and top-level pages)
whose top-level origins are in the same site (but see [PSL-Problems])
being visited within the same user agent installation (and browser profile, container, or container tab for user agents that support those features)
between the points in time at which the person or user agent clears that site's cookies and other storage (which is sometimes automatic at the end of each session).
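The default partition described in the list above can be sketched as a storage key. This is a minimal illustration, not how any particular user agent implements partitioning: the class names, the profile string, and the "storage epoch" counter used to model clearing are all assumptions made for the example, and the site of each top-level page is taken as given rather than computed.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PartitionKey:
    top_level_site: str  # site of the top-level page, e.g. "https://news.example"
    profile: str         # installation, profile, or container identifier
    storage_epoch: int   # bumped each time this site's storage is cleared


class PartitionedStorage:
    """Hypothetical storage that keeps embedded content's data per partition."""

    def __init__(self) -> None:
        self._buckets: Dict[Tuple[PartitionKey, str], Dict[str, str]] = {}
        self._epochs: Dict[Tuple[str, str], int] = {}

    def partition_key(self, top_level_site: str, profile: str) -> PartitionKey:
        epoch = self._epochs.get((top_level_site, profile), 0)
        return PartitionKey(top_level_site, profile, epoch)

    def bucket(self, top_level_site: str, profile: str,
               embedded_origin: str) -> Dict[str, str]:
        """Storage visible to embedded_origin inside the given partition."""
        key = (self.partition_key(top_level_site, profile), embedded_origin)
        return self._buckets.setdefault(key, {})

    def clear_site(self, top_level_site: str, profile: str) -> None:
        """Delete the partition's data and start a new partition for the site."""
        old = self.partition_key(top_level_site, profile)
        self._buckets = {k: v for k, v in self._buckets.items() if k[0] != old}
        self._epochs[(top_level_site, profile)] = old.storage_epoch + 1


storage = PartitionedStorage()
widget = "https://widget.example"

# The widget stores an identifier while embedded on news.example.
storage.bucket("https://news.example", "default", widget)["visitor"] = "abc"

# Embedded on shop.example, the same widget sees a different, empty bucket.
seen_on_shop = storage.bucket("https://shop.example", "default", widget)

# After the person clears news.example's storage, the identifier is gone.
storage.clear_site("https://news.example", "default")
after_clear = storage.bucket("https://news.example", "default", widget)
```

Keying on the top-level site rather than on the embedded origin alone is what keeps the widget from linking visits across the two sites in this sketch.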

When a user agent knows that a site includes multiple contexts, it should adjust its partitions accordingly, for instance by partitioning identities per subdomain or site path.

Where possible, user agents should prevent people from being recognized across partitions unless they intend to be recognized. Note that sites can do harm even if they can't be completely certain that visits come from the same person, so user agents should also take steps to prevent such probabilistic recognition. The Target Privacy Threat Model discusses the tradeoffs involved.

If a user agent can tell that its user is using a particular identity on a website, for example because the user used an API like Credential Management Level 1 to log into the site, it should make that active identity clear to the user.

The web platform offers many ways for a website to recognize that a person is using the same identity over time, including cookies, localStorage, indexedDB, CacheStorage, and other forms of storage. This allows sites to save the person's preferences, shopping carts, etc., and people have come to expect this behavior in some contexts.

People are unlikely to expect the recognition and will find it difficult to mitigate when it is automated, which can happen in different ways:

through the use of cross-site cookies,
by having someone navigate to a link that has been decorated with an identifier ([Nav-Tracking]),
by collecting the same piece of identifying information on both sites, or
by correlating the timestamps of an event that occurs nearly simultaneously on both sites (this is an example of a timing attack).
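Of the methods listed above, link decoration is one that a user agent can sometimes counter by removing query parameters that are suspected of carrying identifiers. The sketch below shows the idea only; the parameter names in the block list are hypothetical, and deciding which parameters are identifiers is the hard part that this example does not address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names, for illustration only.
SUSPECTED_IDENTIFIER_PARAMS = {"click_id", "visitor_id"}


def strip_link_decoration(url: str) -> str:
    """Return url with suspected identifier parameters removed from its query."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in SUSPECTED_IDENTIFIER_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


cleaned = strip_link_decoration(
    "https://shop.example/item?sku=42&click_id=abc123"
)
print(cleaned)
```

Parameters that the destination needs in order to function, such as "sku" here, are left untouched.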

In addition to recognition methods that can operate automatically across contexts, recognition can also be made persistent such that it will defeat potential mitigations like partitions or clearing one's cookies. This constitutes unsanctioned tracking ([UNSANCTIONED-TRACKING]) and can take multiple forms.

Fingerprinting consists of using attributes of the person's browser and platform that are consistent between two or more visits and probably unique to the person.

The attributes can be exposed as information about the person's device that is otherwise benign (as opposed to 2.4 Sensitive Information). For example:

language and time zone;
window size;
system preferences (such as dark mode or serif font).
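To see why individually benign attributes matter, the sketch below measures what fraction of a small, made-up set of visitors becomes unique once attributes are combined. The visitor records and attribute names are invented for the example and do not reflect any real population.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Invented visitor records, for illustration only.
visitors: List[Dict[str, object]] = [
    {"language": "en-US", "time_zone": "America/New_York",
     "window": "1920x1080", "dark_mode": False},
    {"language": "en-US", "time_zone": "America/New_York",
     "window": "1920x1080", "dark_mode": False},
    {"language": "fr-FR", "time_zone": "Europe/Paris",
     "window": "1366x768", "dark_mode": True},
    {"language": "en-US", "time_zone": "America/New_York",
     "window": "1440x900", "dark_mode": True},
]


def fingerprint(visitor: Dict[str, object], attributes: List[str]) -> Tuple:
    """Combine the chosen attributes into a single comparable value."""
    return tuple(visitor[a] for a in attributes)


def unique_fraction(records: List[Dict[str, object]],
                    attributes: List[str]) -> float:
    """Fraction of records whose fingerprint is shared with no other record."""
    counts = Counter(fingerprint(r, attributes) for r in records)
    unique = sum(1 for r in records if counts[fingerprint(r, attributes)] == 1)
    return unique / len(records)


language_only = unique_fraction(visitors, ["language"])
combined = unique_fraction(
    visitors, ["language", "time_zone", "window", "dark_mode"]
)
print(language_only, combined)
```

In this toy data, language alone singles out one visitor in four, while the combined attributes single out two in four. Each additional attribute can only keep the groups the same size or split them further.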

Preventing fingerprinting can be particularly challenging in cases that only affect a small group of people who use the web, for example people who configure their systems in unique ways, such as by using a browser with a very small number of users. As long as a tracker can't track a significant number of people, it's likely to be unviable to maintain the tracker. However, this doesn't excuse making small groups of people trackable when those people didn't choose to be in the group.

See [fingerprinting-guidance] for how to mitigate threats that result from fingerprinting.

Supercookies occur when a user agent stores data for a site but makes that data more difficult to clear than other cookies or storage. This typically happens because of a bug, because of features relating to cache storage and network state (e.g. ETag, HSTS), or because the browser restores the browser vendor's cookies when local state is cleared. Fingerprinting Guidance § Clearing all local state discusses how specifications can help user agents avoid this mistake.

Header enrichment happens when a network operator adds HTTP request headers to identify their customers to sites that they visit. It is unfortunately difficult for a user agent to mitigate header enrichment.

Cross-device communication is communication between code on one device and code running on another device. For example, sounds or light emitted from one device could be detected by a microphone or light sensor on another device [SILVERPUSH]. Cross-device communication enables cross-device tracking, a form of cross-context recognition, but it can also be used for other inappropriate information flows.

Principle: Sites, user agents, and other actors should minimize the amount of personal data they transfer between actors on the Web.

Data minimization limits the risks of data being disclosed or misused, and it also helps user agents more meaningfully explain the decisions their users need to make.

Principle: Web APIs should be designed to minimize the amount of data that sites need to request to carry out their users' goals and provide granularity and user controls over personal data that is communicated to sites.

Because personal data may be sensitive in unexpected ways, or have risks of future uses that could be unexpected or harmful, minimization as a principle applies to personal data that is not currently known to be identifying, sensitive, or otherwise potentially harmful.
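As an illustration of granularity controls, the sketch below coarsens a location before it is shared with a site. The `Position` type and `coarsen` helper are hypothetical names introduced for this example, and rounding coordinates is only one possible way to reduce precision; it is not a description of how any geolocation API behaves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    latitude: float
    longitude: float


def coarsen(position: Position, decimals: int) -> Position:
    """Round coordinates to `decimals` decimal places (hypothetical helper).

    Fewer decimals means a larger area: one decimal place of latitude is
    roughly 11 km, while four decimal places is roughly 11 m.
    """
    return Position(
        latitude=round(position.latitude, decimals),
        longitude=round(position.longitude, decimals),
    )


# A site that only needs a rough area can be given a coarse position.
precise = Position(latitude=48.858370, longitude=2.294481)
city_level = coarsen(precise, decimals=1)
```

A user agent could let the person choose the number of decimals, so that the site receives only as much precision as the person's goal requires.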

Note that this principle was further explored in an earlier TAG draft on Data Minimization in Web APIs.

Principle:

Websites sometimes use data in ways that aren't needed for the user's immediate goals. These uses are known as ancillary uses, and data that is primarily useful for ancillary uses is ancillary data.

Different users will want to share different kinds and amounts of ancillary data with websites, including possibly no ancillary data.

Aggregation or de-identification of data may make users interested in sharing ancillary data in cases where the user was otherwise not interested. These techniques may be especially useful and important when ancillary data contributes to a collective benefit in a way that reduces privacy threats to individuals (see collective privacy).

Principle: Sites and user agents should seek to understand and respect people's goals and preferences about use of data about them.

Agents should aggressively minimize ancillary data and should avoid burdening the user with additional privacy labor when deciding what ancillary data to send. To that end, user agents may employ user research, solicitation of general preferences, and heuristics about sensitivity of data or trust in a particular context. To facilitate site understanding of user preferences, user agents can provide browser-configurable signals to directly communicate common user preferences (such as a global opt-out).
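One existing example of a browser-configurable opt-out signal is Global Privacy Control, which a user agent sends as the request header `Sec-GPC: 1`. The sketch below shows how a site might read that signal. The `ancillary_uses_allowed` policy and its use names are assumptions made for illustration; what a site must do on receiving the signal depends on the applicable law and on the site's own commitments.

```python
from typing import Mapping, Set


def has_global_opt_out(headers: Mapping[str, str]) -> bool:
    """Return True if the request carries a `Sec-GPC: 1` header.

    HTTP header names are case-insensitive, so names are compared in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == "sec-gpc" and value.strip() == "1":
            return True
    return False


def ancillary_uses_allowed(headers: Mapping[str, str]) -> Set[str]:
    """Hypothetical site policy: no ancillary uses once the user has opted out."""
    if has_global_opt_out(headers):
        return set()
    return {"analytics", "ad_measurement"}
```

Reading a single general preference this way spares the person from answering a separate prompt on every site.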

Principle:

Data exposed for ancillary uses including telemetry and analytics may often reveal characteristics of user configuration, device, environment, or behavior that could be used as part of browser fingerprinting to identify users across sites. Revealing user preferences or other heuristics in providing or disabling functionality could also contribute to a browser fingerprint.

The many APIs available to websites expose lots of data that can be combined into information about people, web servers, and other things. We can divide that information into three categories:

Information that's fine to expose, for example because a person or group with sufficient authority intended to expose that information or to do something that necessarily exposes the information, or because it's not about people at all. For example:
- The geolocation and camera APIs ask whether a person wants to expose their data.
- The URL a person is visiting must be sent to a server in order to navigate to that URL, and known private-information-retrieval methods are too expensive to avoid that exposure.
- The distribution of Largest Contentful Paint timings for a website is about a website rather than about the people browsing it, even if the data that informs that measure can also reveal information about people.
Information that we don't want to expose and have a plausible plan for removing access to. For example, browsers are gradually removing the ability to join identities between different partitions.
Information that we'd rather not expose, but that we don't have a plausible plan for removing access to. For example:
- Some users are disappointed that the page they're visiting can discover which link they clicked to leave that page. We can't block that information because the page can use HTTP redirects to learn it, and redirection is a core feature of the web.
- Some users are disappointed that a page with permission to run JavaScript can record their pattern of interaction with that page. However, the page does this by using the same events it would use to make the page interactive, so we can't block this information access either.
These principles don't describe exactly how to distinguish acceptable information from information we'd rather not expose. API designers instead need to balance the harm to users from exposing information against the harm to users from blocking that exposure. When in doubt, designers should ensure that different user agents can help their users balance the costs in different ways.

The following subsections discuss how to review an API proposal that exposes data that provides a new way to infer each of the above categories of information. They explain how to leave the web better than you found it.

Principle: New APIs can add ways of getting acceptable information that are guarded at least as strongly as the existing ways.

Acceptable information exposure is always qualified by the (possibly empty) set of user-controlled settings or permissions that guard access to it. For example, the URLs of resources, the timing of link clicks, and the referrer chain within a single origin are not guarded by anything; the scroll position is guarded by the setting to turn off JavaScript; and access to the camera or geolocation is guarded by permission prompts.

Information that would be acceptable to expose under one set of access guards might be unacceptable under another set, so when an API designer intends to explain that their new API is acceptable because an existing acceptable API already exposes the same information, they must be careful to ensure that their new API is only available under a set of guards that's at least as strict. Without those guards, they need to make the argument from scratch, without relying on the existing API.
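One way to picture the "at least as strict" test is to model each API's access guards as a set and require the new API's set to contain every guard of the existing API. This is a simplified sketch: the guard labels are hypothetical, and real guards can differ in strength without being related by set inclusion, in which case the comparison needs human judgment.

```python
from typing import FrozenSet

# Hypothetical guard labels; real guards are user agent settings and permissions.
JAVASCRIPT_ENABLED = "javascript-enabled"
CAMERA_PERMISSION = "camera-permission"


def at_least_as_guarded(
    new_guards: FrozenSet[str], existing_guards: FrozenSet[str]
) -> bool:
    """True if every guard on the existing API also protects the new API."""
    return new_guards >= existing_guards


# Scroll position is guarded only by the setting that turns off JavaScript.
existing_scroll_api = frozenset({JAVASCRIPT_ENABLED})
unguarded_proposal: FrozenSet[str] = frozenset()
guarded_proposal = frozenset({JAVASCRIPT_ENABLED, CAMERA_PERMISSION})
```

Under this model, `unguarded_proposal` cannot rely on the existing API as precedent, while `guarded_proposal` can.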

Principle: If existing APIs provide access to some information, but we have a plan to change those APIs to prevent that access, new APIs must not be added that provide that same information without extra access guards that make the access to the information acceptable.

Principle: New APIs that provide access to undesirable information should not make that information easier to access, unless they add access guards that make the information acceptable.

If future web platform changes make it possible to remove other access to the undesirable information, it should be clear how to extend those changes to the proposed API.
If an existing browser does block access to the undesirable information, perhaps by breaking some experiences on the Web that other browsers don't wish to break, it should be clear how the more-private browser can also prevent the new API from exposing that information without breaking additional sites or user experiences.
When a developer is trying to access the undesirable information, a new API should be at least as difficult to use as the existing APIs. For example, it shouldn't require less code, less maintenance, or less runtime cost.

The third consideration can be surprising. In many other cases, we can think in terms of a threat model and use designs familiar from security to make information either available or unavailable. In this third case, however, we have to think more economically and consider the cost to a website of inferring the relevant information from whatever data the web's APIs expose. If the cost of inferring the undesirable information is high, fewer websites will gather it, and privacy will be generally better. If a new API makes the cost go down, more websites will start inferring the information, and overall privacy will worsen.

Usually, acceptable APIs in this category will be designed to expose data that makes some acceptable information easier to discover. For example, they might reveal a performance metric for a website directly instead of requiring it to be computed from the timing of onload events. The challenge for the new API's designer is to ensure that the data it exposes doesn't make it cheaper to compute information about people than it would have been through other methods.

Contributes to correlation, identification, secondary use, and disclosure.

Many pieces of information about someone could cause privacy harms if disclosed. For example:

Their location.
Video or audio from their camera or microphone.
The content of certain files on their filesystem.
Financial data.
Contacts.
Calendar entries.
Whether they are using assistive technology.

A particular piece of information may have different sensitivity for different people. Language preferences, for example, might typically seem innocent, but also can be an indicator of belonging to an ethnic minority. Precise location information can be extremely sensitive (because it's identifying, because it allows for in-person intrusions, because it can reveal detailed information about a person's life) but it might also be public and not sensitive at all, or it might be low-enough granularity that it is much less sensitive for many people.

When considering whether a class of information is likely to be sensitive to a person, consider at least these factors:

whether it serves as a persistent identifier (see severity in Mitigating Browser Fingerprinting in Web Specifications);
whether it discloses substantial information (including intimate details or inferences) about the person using the system or other people;
whether it can be revoked (as in determining whether a permission is necessary);
whether it enables other threats, like intrusion.

Principle: People have certain rights over data that is about themselves, and these rights should be facilitated by their user agent and the actors that are processing their data.

While data rights alone are not sufficient to satisfy all privacy principles for the Web, they do support self-determination and help improve accountability. Such rights include:

The right to access data about oneself.

This right includes both being able to review what information has been collected or inferred about oneself and being able to discover what actors have collected information about oneself. As a result, databases cannot be kept secret and data collected about people needs to be meaningfully discoverable by those people.

The right to erase data about oneself.

The right to erase applies whether or not a person is terminating use of a service altogether, though what data can be erased may differ between those two cases. On the Web, people may wish to erase data on their device, on a server, or both, and the distinctions may not always be clear.

The right to port data, including data one has stored with another actor, so it can easily be reused or transferred elsewhere.

Portability is needed to realize the ability for people to make choices about services with different data practices. Standards for interoperability are essential for effective re-use.

The right to correct data about oneself, to ensure that one's identity is properly reflected in a system.
The right to be free from automated decision-making based on data about oneself.

For some kinds of decision-making with substantial consequences, there is a privacy interest in being able to exclude oneself from automated profiling. For example, some services may alter the price of products (price discrimination) or offers for credit or insurance based on data collected about a person. Those alterations may be consequential (financially, say) and objectionable to people who believe those decisions based on data about them are inaccurate or unjust. As another example, some services may draw inferences about a user's identity, humanity, or presence based on facial recognition algorithms run on camera data. Because facial recognition algorithms and training sets are fallible and may exhibit certain biases, people may not wish to submit to decisions based on that kind of automated recognition.

The right to object, withdraw consent, and restrict use of data about oneself.

People may change their decisions about consent or may object to subsequent uses of data about themselves. Retaining rights requires ongoing control, not just at the time of collection.

The OECD Privacy Principles [OECD-Guidelines], [Records-Computers-Rights], and the [GDPR], among other places, include many of the rights people have as data subjects. These participatory rights by people over data about themselves are inherent to autonomy.

Principle: Whenever possible, processors should work with data that has been de-identified.

Data is de-identified when there exists a high level of confidence that no person described by the data can be identified, directly or indirectly (e.g. via association with an identifier, user agent, or device), by that data alone or in combination with other available information. Note that further considerations relating to groups are covered in the Collective Issues in Privacy section.

We talk of controlled de-identified data when:

The state of the data is such that the information that could be used to re-identify an individual has been removed or altered, and
there is a process in place to prevent attempts to re-identify people and the inadvertent release of the de-identified data. ([De-identification-Privacy-Act])

Different situations involving controlled de-identified data will require different controls. For instance, if the controlled de-identified data is only being processed by one actor, typical controls include making sure that the identifiers used in the data are unique to that dataset, that any person (e.g. an employee of the actor) with access to the data is barred (e.g. based on legal terms) from sharing the data further, and that technical measures exist to prevent re-identification or the joining of different data sets involving this data, notably against timing or k-anonymity attacks.
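As a rough illustration of one such technical measure, the sketch below checks whether a dataset is k-anonymous: every combination of quasi-identifier values must be shared by at least `k` records. The function name and sample records are assumptions for this example, and k-anonymity on its own does not rule out re-identification through other attributes or through joins with other datasets.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def is_k_anonymous(records: Iterable[Sequence[Hashable]], k: int) -> bool:
    """Check that each quasi-identifier combination appears at least k times.

    Each record is a sequence of quasi-identifier values, for example
    (country, age_band). An empty dataset is treated as trivially k-anonymous.
    """
    group_sizes = Counter(tuple(record) for record in records)
    return all(size >= k for size in group_sizes.values())


# Hypothetical sample data: the single ("DE", "40-49") record stands out.
sample = [("FR", "30-39"), ("FR", "30-39"), ("DE", "40-49")]
```

A processor could run a check like this before releasing or joining data, and generalise or suppress the records in groups that are too small.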

In general, the goal is to ensure that controlled de-identified data is used in a manner that provides a viable degree of oversight and accountability such that technical and procedural means to guarantee the maintenance of pseudonymity are preserved.

This is more difficult when the controlled de-identified data is shared between several actors. In such cases, typical controls that are representative of best practices include making sure that:

the identifiers used in the data are under the direct and exclusive control of the first party (the actor a person is directly interacting with) who is prevented by strict controls from matching the identifiers with the data;
when these identifiers are shared with a third party, they are made unique to that third party such that if they are shared with more than one third party these cannot then match them up with one another;
there is a strong level of confidence that no third party can match the data with any data other than that obtained through interactions with the first party;
any third party receiving such data is barred (e.g. based on legal terms) from sharing it further;
technical measures exist to prevent re-identification or the joining of different data sets involving this data, notably against timing or k-anonymity attacks; and
there exist contractual terms between the first party and third party describing the limited purpose for which the data is being shared.
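The requirement that identifiers be unique to each third party can be sketched with a keyed hash: the first party derives a different identifier for each recipient from a secret that only it holds. This is one possible construction under stated assumptions, not a prescribed mechanism. The function and parameter names are hypothetical, and the legal and contractual controls in the list above are still needed alongside it.

```python
import hashlib
import hmac


def directed_identifier(first_party_secret: bytes, user_id: str, recipient: str) -> str:
    """Derive an identifier for `user_id` that is specific to `recipient`.

    Without the first party's secret, two recipients cannot tell whether
    their identifiers refer to the same person.
    """
    # The NUL separator keeps ("ab", "c") and ("a", "bc") from colliding,
    # assuming neither value contains a NUL character.
    message = f"{recipient}\x00{user_id}".encode("utf-8")
    return hmac.new(first_party_secret, message, hashlib.sha256).hexdigest()


secret = b"example-secret-held-only-by-the-first-party"  # placeholder value
id_for_a = directed_identifier(secret, "user-123", "third-party-a.example")
id_for_b = directed_identifier(secret, "user-123", "third-party-b.example")
```

Because the derivation is deterministic, the same third party keeps receiving a stable identifier for the same person, while identifiers held by different third parties do not match.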

Note that controlled de-identified data, on its own, is not sufficient to make data processing appropriate.

Principle: Groups and various forms of institutions should best protect and support autonomy by making decisions collectively rather than individually to either prevent or enable data sharing, and to set defaults for data processing rules.

Privacy principles are often defined in terms of extending rights to individuals. However, there are cases in which deciding which principles apply is best done collectively, on behalf of a group.

One such case, which has become increasingly common with widespread profiling, is that of information relating to membership of a group or to a group's behaviour, as detailed in 1.2.1 Group Privacy. As Brent Mittelstadt explains, “Algorithmically grouped individuals have a collective interest in the creation of information about the group, and actions taken on its behalf.” ([Individual-Group-Privacy]) This justifies ensuring that grouped people can benefit from both individual and collective means to support their autonomy with respect to data processing. It should be noted that processing can be unjust even if individuals remain anonymous, not from the violation of individual autonomy but because it violates ideals of social equality ([Relational-Governance]).

Another case in which collective decision-making is preferable is for processing for which informed individual decision-making is unrealistic (due to the complexity of the processing, the volume or frequency of processing, or both). Expecting laypeople (or even experts) to make informed decisions relating to complex data processing, or to make decisions on a very frequent basis even if the processing is relatively simple, is unrealistic if we also want them to have reasonable levels of autonomy in making these decisions.

The purpose of this principle is to require that data governance provide ways to distinguish appropriate data processing without relying on individual decisions whenever the latter are impossible, which is often ([Relational-Governance], [Relational-Turn]).

Which forms of collective governance are recognised as legitimate will depend on domains. These may take many forms, such as governmental bodies at various administrative levels, standards organisations, worker bargaining units, or civil society fora.

It must be noted that, even though collective decision-making can be better than offloading privacy labour to individuals, it is not necessarily a panacea. When considering such collective arrangements it is important to keep in mind the principles that are likely to support viable and effective institutions at any level of complexity ([IAD]).

A good example of a failure in collective privacy decisions was the standardisation of the ping attribute. Search engines, social sites, and other algorithmic media in the same vein have an interest in knowing which sites that they link to people choose to visit (which in turn could improve the service for everyone). But people may have an interest in keeping that information private from algorithmic media companies (as do the sites being linked to, as that facilitates timing attacks to recognise people there). A person's exit through a specific link can either be tracked with JavaScript tricks or through bounce tracking, both of which are slow and difficult for user agents to defend against. The value proposition of the ping attribute in this context is therefore straightforward: by providing declarative support for this functionality it can be made fast (the browser sends an asynchronous notification to a ping endpoint after the person exits through a link) and the user agent can provide its user with the option to opt out of such tracking — or disable it by default.

Unfortunately, this arrangement proved to be unworkable on the privacy side (the performance gains, however, are real). What prevents a site from using ping for people who have it activated and bounce tracking for others? What prevents a browser from opting everyone out because it wishes to offer better protection by default? Given the contested nature of the ping attribute and the absence of a forcing function to support collective enforcement, the scheme failed to deliver improved privacy.

Principle: User agents must not help a device administrator surveil the people using the devices they administrate without those people's knowledge.

Principle: User agents should only tell a device administrator about user behavior when that disclosure is necessary to enforce reasonable constraints on use of the device.

Computing devices have owners, and those owners have administrator access to the devices in order to install and configure the programs, including user agents, that run on them. Sometimes, as in the cases of an employer providing a device to an employee, a friend loaning a device to their visitor, or a parent providing a device to their small child, the person using a device doesn't own the device or have administrator access to it. Other times, as in the cases of intimate partners or one relative helping another relative with their device, the owner and primary user of a device might not be the only person with administrator access. As a program running on a device, a user agent generally can't tell whether the administrator who has installed and configured it was authorized by the device's actual owner.

These relationships can involve power imbalances. A child may have difficulty accessing any computing devices other than the ones their parent provides. A victim of abuse might not be able to prevent their partner from having administrator access to their devices. An employee might have to agree to use their employer's devices in order to keep their job.

While a device owner has an interest and sometimes a responsibility to make sure their device is used in the ways they intended, the person using the device still has a right to privacy while using it. The above principles enforce this right to privacy in two ways:

User agent developers need to consider whether requests from device owners and administrators are reasonable, and refuse to implement unreasonable requests, even if that means fewer sales. Owner/administrator needs must not simply trump user needs in the priority of constituencies.
Even when information disclosure is reasonable, the person whose data is being disclosed needs to know about it so that they can avoid doing things that would lead to unwanted consequences.

Some administrator requests might be reasonable for some sorts of users, like employees or especially children, but not be reasonable for other sorts, like friends or intimate partners. In those cases, the user agent can explain what the administrator is going to learn in a way that also says what sort of user is expected to agree. Users in other classes can then react appropriately.

Online harassment is the "pervasive or severe targeting of an individual or group online through harmful behavior" [PEN-Harassment]. Harassment is a prevalent problem on the Web, particularly via social media. While harassment may affect any person using the Web, it may be more severe and its consequences more impactful for LGBTQ people, women, people in racial or ethnic minorities, people with disabilities, vulnerable people and other marginalized groups.

Note

Harassment is both a violation of privacy itself and can be magnified or facilitated by other violations of privacy.

Abusive online behavior may include: sending unwanted information; directing others to contact or bother a person ("dogpiling"); disclosing sensitive information about a person; posting false information about a person; impersonating a person; insults; threats; and hateful or demeaning speech.

Disclosure of identifying or contact information (including "doxxing") can be used, including by additional attackers, to send often persistent unwanted information that amounts to harassment. Disclosure of location information can be used, including by additional attackers, to intrude on a person's physical safety or space.

Mitigations for harassment include but extend beyond mitigations for unwanted information and other privacy principles. Harassment can include harmful activity with a wider distribution than just the target of harassment.

Principle: Systems that allow for communicating on the Web must provide an effective capability to report abuse.

Reporting mechanisms are mitigations, but may not prevent harassment, particularly in cases where hosts or intermediaries are supportive of or complicit in the abuse.

Note

Effective reporting is likely to require:

standardized mechanisms to identify abuse reporting contacts
visible, usable ways provided by sites and user agents to report abuse
identifiers to refer to senders and content
the ability to provide context and explanation of harms
people responsible for promptly responding to reports
tools for pooling mitigation information (see Unwanted information, below)

Receiving unsolicited information that either may cause distress or waste the recipient's time or resources is a violation of privacy.

Principle: User agents and other actors should take steps to ensure that their user is not exposed to unwanted information. Technical standards must consider the delivery of unwanted information as part of their architecture and must mitigate it accordingly.

Unwanted information covers a broad range of unsolicited communication, from messages that are typically harmless individually but that become a nuisance in aggregate (spam) to the sending of images that will cause shock or disgust due to their graphic, violent, or explicit nature (e.g. pictures of one's genitals). While it is impossible, in a communication system involving many people, to offer perfect protection against all kinds of unwanted information, steps can be taken to make the sending of such messages more difficult or more costly, and to make the senders more accountable. Examples of mitigations include:

Restricting what new users of a service can post, notably limiting links and media until they have interacted a sufficient number of times over a given period with a larger group. This helps to raise the cost of producing sockpuppet accounts and gives new users the occasion to understand local norms before posting.
Only accepting communication between people who have an established relationship of some kind, such as being part of a shared group. Protocols should consider requiring a handshake between people prior to enabling communication.
Requiring a deliberate action from the recipient before rendering media coming from an untrusted source.
Supporting the ability for people to block another actor such that they cannot send information again.
Pooling mitigation information, for instance shared block lists, shared spam-detection information, or public information about misbehaving actors. As always, the collection and sharing of information for safety purposes should be limited and placed under collective governance.
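Several of the mitigations above (handshakes, deliberate action before rendering media, and blocking) can be combined into a single delivery decision. The sketch below is a minimal model under assumed names: `RecipientSettings`, `delivery_decision`, and the returned labels are all hypothetical, and a real messaging system would need much more, such as rate limits and abuse reporting.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class RecipientSettings:
    contacts: Set[str] = field(default_factory=set)  # senders who completed a handshake
    trusted: Set[str] = field(default_factory=set)   # senders whose media may render directly
    blocked: Set[str] = field(default_factory=set)   # senders who may not send anything


def delivery_decision(settings: RecipientSettings, sender: str, has_media: bool) -> str:
    """Decide what to do with an incoming message (hypothetical labels)."""
    if sender in settings.blocked:
        return "reject"
    if sender not in settings.contacts:
        # No established relationship yet: ask the recipient first.
        return "request_handshake"
    if has_media and sender not in settings.trusted:
        # Media stays hidden until the recipient deliberately reveals it.
        return "deliver_with_media_hidden"
    return "deliver"
```

Checking the block list first means that a blocked sender cannot regain access by starting a new handshake.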

Issue 2

This section is still being refined. We expect additional principles to be added.

An individual may not realise when they disclose personal data that they are vulnerable or could become vulnerable. Some individuals may be more vulnerable to privacy risks or harm as a result of collection, misuse, loss or theft of personal data because of their attributes, interests, opinions or behaviour. Others may be vulnerable because of the situation or setting (e.g., where there is information asymmetry or other power imbalances), because they lack the capacity to fully assess the risks, or because choices are not presented in an easy-to-understand, meaningful way (e.g., deceptive patterns). Yet others may be vulnerable because they have not been consulted about their privacy needs and expectations, or considered in the decisions about the design of the product or service.

Sometimes communities of individuals are classed as “vulnerable”, typically children and the elderly, but anyone could become privacy vulnerable in a given context. Additional privacy protections may be needed for personal data of vulnerable individuals or sensitive information which could cause someone to become vulnerable if their personal data is collected, used or shared.

Even in populations of individuals classed as “vulnerable” (such as children), each individual is unique with their own desires and expectations for privacy. While sometimes others can help vulnerable individuals assess privacy risks and make decisions about privacy (such as parents, guardians and peers), everyone has their own right to privacy.

Principle: User agents and sites should allow for gracefully degraded user experience where some features or functionality may not be available because users have chosen stronger privacy protections (e.g., blocking tracking elements, sensor data or information about installed software or connected devices).

Principle: A user agent may only provide information about a ward to a guardian for the purpose of helping that guardian uphold their responsibilities to their ward. This system must include measures to help wards who realize that their guardian isn't acting in the ward's interest.

Some classes of vulnerable people tend to be unable to make good decisions about their own web use, and need a guardian to help them. Children are a widely recognized example of this class, with their parents often acting as their guardians. A person with a guardian is known as a ward.

Many legal systems treat these guardianship relationships as a set of rights that the guardian possesses. We prefer to instead think of the ward having a right to make informed decisions and exercise their autonomy. Their guardian then has an obligation to help their ward do so when the ward's abilities aren't sufficient, even if that conflicts with the guardian's desires. In practice, many wards discover that their guardian is not making decisions in the ward's best interest, and it's critical that such wards have a way to escape their misbehaving guardian.

Historically, the Web has provided exactly this escape route, and user agents should preserve that feature by correctly balancing a benevolent guardian's need to protect their ward from dangers against other wards' need to protect themselves from their misbehaving guardians.

Notifications and other interruptive UI can be a powerful way to capture attention. Depending on the operating system in use, a notification can appear outside of the browser context (for example, in a general notifications tray) or even cause a device to buzz or play an alert tone. Like all powerful features, notifications can be misused and can become an annoyance or even used to manipulate behaviour and thus reduce autonomy.

Principle: A user agent should help users control notifications and other interruptive UI that can be used to manipulate behavior.

User agents should provide UI that allows their users to audit which web sites have been granted permission to display alerts and to revoke these permissions. User agents should also apply some quality metric to the initial request for permissions to receive notifications (for example, disallowing sites from requesting permission on first visit).
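A quality metric of this kind could be as simple as the sketch below, which suppresses a permission request on a first visit. The `SiteEngagement` type, the visit threshold, and the user-gesture requirement are assumptions chosen for illustration; the text above only gives the first-visit rule as an example, and user agents may weigh very different signals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteEngagement:
    visits: int                      # visits to the site so far, including this one
    triggered_by_user_gesture: bool  # request followed a click or key press


def may_show_notification_prompt(engagement: SiteEngagement, min_visits: int = 2) -> bool:
    """Hypothetical heuristic: no prompt on first visit or without a user gesture."""
    return engagement.visits >= min_visits and engagement.triggered_by_user_gesture


first_visit = SiteEngagement(visits=1, triggered_by_user_gesture=True)
returning_visit = SiteEngagement(visits=5, triggered_by_user_gesture=True)
```

Here `first_visit` would not produce a prompt, while `returning_visit` would.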

Principle: Web sites should use notifications only for information that their users have specifically requested.

Web sites should tell their users what specific kind of information people can expect to receive, and how notifications can be turned off, when requesting permission to send interruptive notifications. Web sites should not request permission to send notifications when the user is unlikely to have sufficient knowledge (e.g. information about what kinds of notifications they are signing up for) to make an informed response. If it's unlikely that such information could have been provided then the user agent should apply mitigations (for example, warning about potential malicious use of the notifications API). Permissions should be requested in context.

Principle: Actors must not retaliate against people who protect their data against non-essential processing or exercise rights over their data.

Whenever people have the ability to cause an actor to process less of their data or to stop carrying out some given set of data processing that is not essential to the service, they must be allowed to do so without the actor retaliating, for instance by artificially removing an unrelated feature, by decreasing the quality of the service, or by trying to cajole, badger, or trick the person into opting back into the processing.

Principle: User agents should support people in choosing which information they provide to actors that request it, up to and including allowing users to provide arbitrary information.

Actors can invest time and energy into automating ways of gathering data from people and can design their products in ways that make it a lot easier for people to disclose information than not, whereas people typically have to manually wade through options, repeated prompts, and deceptive patterns. In many cases, the absence of data — when a person refuses to provide some information — can also be identifying or revealing. Additionally, APIs can be defined or implemented in rigid ways that can prevent people from accessing useful functionality. For example, I might want to look for restaurants in a city I will be visiting this weekend, but if my geolocation is forcefully set to match my GPS, a restaurant-finding site might only allow searches in my current location. In other cases, sites do not abide by data minimisation principles and request more information than they require. This principle supports people in minimising their own data.

User agents should make it simple for people to present the identity they wish to and to provide information about themselves or their devices in ways that they control. This helps people to live in obscurity ([Lost-In-Crowd], [Obscurity-By-Design]), including by obfuscating information about themselves ([Obfuscation]).

Principle: APIs should be designed such that data returned through an API does not assert a fact or make a promise on the user's behalf about the user or their environment.

Instead, the API could indicate a person's preference, a person's chosen identity, a person's query or interest, or a person's selected communication style.

For example, a user agent might support this principle by:

Generating domain-specific email addresses or other directed identifiers so that people can log into the site without becoming recognisable across contexts.
Offering the option to generate geolocation and accelerometry data with parameters specified by the user.
Uploading a stored video stream in response to a camera prompt.
Automatically granting or denying permission prompts based on user configuration.
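The first item in the list above can be sketched as follows. This is a minimal illustration rather than a description of any shipping user agent: it assumes the user agent holds a per-profile secret and operates (or has access to) a mail relay, and the function name `directed_email` and the `relay.example` domain are hypothetical.

```python
import hashlib
import hmac


def directed_email(profile_secret: bytes, site: str,
                   relay_domain: str = "relay.example") -> str:
    """Derive a stable, site-specific email alias (hypothetical scheme).

    The same site always receives the same alias, so the person can log in
    again, while two different sites receive unrelated aliases that cannot
    be joined without the profile secret.
    """
    # Keyed hash of the normalised site name; the key never leaves the user agent.
    digest = hmac.new(profile_secret, site.lower().encode("utf-8"),
                      hashlib.sha256).hexdigest()
    # Truncate to keep the local part short; 16 hex characters is an assumed length.
    return f"{digest[:16]}@{relay_domain}"
```

Deriving the alias from a keyed hash, rather than storing a random one per site, is a design choice of this sketch: it keeps the mapping reproducible across a person's devices, provided they share the same secret.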

Sites should include deception in their threat modeling and not assume that Web platform APIs provide any guarantees of consistency, currency, or correctness about the user. People often have control of the devices and software they use to interact with web sites. In response to site requests, people may arbitrarily modify or select the information they provide for a variety of reasons, including both malice and self-protection.
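On the site side, one way to reflect this in code is to model API-provided values as claims rather than facts. The sketch below is an assumption-laden illustration: the `ClaimedPosition` type and `parse_claimed_position` function are hypothetical, and the checks only establish that a value is well formed, not that it is true.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClaimedPosition:
    """A position as reported by the client; it may be chosen by the user."""
    latitude: float
    longitude: float
    verified: bool = False  # never set from client-provided data alone


def parse_claimed_position(latitude: float,
                           longitude: float) -> Optional[ClaimedPosition]:
    """Accept only well-formed coordinates and keep them marked as unverified."""
    if not (math.isfinite(latitude) and math.isfinite(longitude)):
        return None  # NaN or infinity: reject rather than guess
    if not (-90.0 <= latitude <= 90.0 and -180.0 <= longitude <= 180.0):
        return None  # outside the valid coordinate range
    return ClaimedPosition(latitude, longitude)
```

Code that receives a `ClaimedPosition` can use it to serve the request (for example, listing restaurants near the stated location) without relying on it where correctness matters.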

In any rare instances when an API must be defined as returning true current values, users may still configure their agents to respond with other information, for reasons including testing, auditing or mitigating forms of data collection, including browser fingerprinting.
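A user agent could implement this kind of configuration as a lookup that runs before any sensor value is returned to a site. The following is a minimal sketch under stated assumptions: `LocationSettings` and `position_for_site` are hypothetical names, and the precedence order (per-site override, then global override, then sensor reading) is one possible design rather than a requirement.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Position = Tuple[float, float]  # (latitude, longitude)


@dataclass
class LocationSettings:
    """User-controlled location configuration (hypothetical model)."""
    per_site: Dict[str, Position] = field(default_factory=dict)
    default_override: Optional[Position] = None


def position_for_site(site: str, sensor_position: Position,
                      settings: LocationSettings) -> Position:
    """Return the position the user agent reports to the given site."""
    if site in settings.per_site:
        return settings.per_site[site]    # the user chose a value for this site
    if settings.default_override is not None:
        return settings.default_override  # the user chose a value for all sites
    return sensor_position                # no override: report the sensor reading
```

With this approach, a person planning a trip could set a per-site position for a restaurant-finding site, while a tester or auditor could set a global override for every site.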
