Machine Interpretable Privacy Policies -- A fresh take on P3P

Dave Raggett <dsr at w3 dot org>, W3C

This work was conducted as part of the PrimeLife project with funding from the European Union's 7th Framework Programme. The work reported is experimental, and the examples shown are fictitious and taken from a working demonstrator.

Introduction

The W3C Platform for Privacy Preferences (P3P) 1.0 was published as a W3C Recommendation in April 2002 [1]. It defines a machine interpretable format for websites to express their privacy practices. A revised version (P3P 1.1) was published as a W3C Note in November 2006, but did not progress to Recommendation status [2].

In summary, a P3P policy describes the name and address of the business responsible for the website, its dispute resolution procedures, the means (if any) by which users can access personal data collected by the website, the kinds of data collected, the purposes for which the data will be used, the data retention policy, and the recipients of the data.

P3P supports a notice and consent model of privacy, in which websites describe their privacy policies, and users can review a policy before deciding whether to walk away or to proceed to interact with the site, thereby indicating their consent to that policy.

Rather than expecting users to review the privacy policy of each website they visit, a P3P-enabled web browser automatically compares the user's recorded preferences with the website's policy, and alerts the user only if there is a mismatch.

P3P offers considerable flexibility in how privacy policies are represented. This flexibility makes it very hard to express user preferences in a form that is practical for automatically comparing them with policies. The problem was recognized early in the development of P3P, and partially addressed through the introduction of compact policies. These were intended to make the comparison efficient, but cover only policy information related to cookies; the full P3P policy remains the authoritative statement of policy.

Browser support for P3P has been largely limited to Microsoft's Internet Explorer, which has included support for P3P compact policies since IE6. Microsoft's dominant market share has encouraged websites to implement P3P despite the lack of support from other browser vendors.

A fresh take on P3P

With increasing public awareness of the amount of information that websites collect, it seems timely to consider new approaches that cover more than just cookies, whilst keeping the user interface for expressing privacy preferences practical.

To investigate this, an experimental Firefox extension was developed. It had to support:

  1. automatic generation of a human readable version of the policy
  2. automatic comparison of the user's preferences with the policy
  3. automatic generation of a human readable report on any mismatches
  4. a user interface for viewing and changing user preferences

The scope was restricted to the data that websites can collect from HTTP requests during a session. This includes the IP address, cookies, the User-Agent header, the user's preferences for language and data formats (as conveyed by the Accept-Language and Accept headers), the requested URL, the date and time of day, and more.

To simplify the user interface for preferences, a subset of P3P was chosen. This has the following object model:

Note that this uses P3P's data categories rather than its taxonomy of data elements. The categories were found to be a much better fit for describing the kinds of data collected from HTTP requests.
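To illustrate why categories suit request data, the following JavaScript sketch maps items that a site can observe in an HTTP request to P3P 1.0 data categories, and derives the categories a policy would need to declare. The item names (clientIP, userAgent and so on) and the category assignments are assumptions loosely based on the P3P 1.0 category definitions; they are not the table used by the demonstrator.

```javascript
// Hypothetical mapping from request data items to P3P 1.0 data categories.
// The assignments are an illustration, not the demonstrator's own table.
const REQUEST_DATA_CATEGORIES = new Map([
  ["clientIP", "computer"],      // IP address of the connection
  ["userAgent", "computer"],     // User-Agent header: browser type and OS
  ["requestURL", "navigation"],  // the page requested (click-stream data)
  ["referer", "navigation"],     // Referer header: the previous page
  ["timestamp", "navigation"],   // date and time of the request
  ["searchText", "interactive"], // e.g. a query typed into a search form
  ["cookies", "state"],          // state management mechanisms such as cookies
]);

// Return the sorted, de-duplicated categories covering the observed items.
function categoriesFor(observedItems) {
  const categories = new Set();
  for (const item of observedItems) {
    const category = REQUEST_DATA_CATEGORIES.get(item);
    if (category === undefined) {
      throw new Error(`no P3P category known for request data item: ${item}`);
    }
    categories.add(category);
  }
  return [...categories].sort();
}

// A site logging IP address, user agent, URL and time of each request:
console.log(categoriesFor(["clientIP", "userAgent", "requestURL", "timestamp"]));
// → [ 'computer', 'navigation' ]
```

Because many distinct request items collapse onto a handful of categories, a user can express preferences over a short list of checkboxes rather than over every individual data element.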

This simple object model allows the preferences user interface to be presented as a set of grouped checkboxes, as shown below:

screenshot of preferences dialog

Accessing the policy

To reach a website, the user can type in a URL, follow a bookmark, or follow a link from another site, e.g. on the results page for a query on a search engine like Google. The browser extension intercepts the Firefox location change event and cancels the HTTP request before it is sent. The extension then sends an HTTP HEAD request to the website's root. The response is examined to find a reference to the site's generic privacy policy, expressed as an HTTP Link header (analogous to the HTML link element), e.g.

Link: <http://localhost/w3c/policy.json>;
   rel="http://primelife.eu/generic-privacy-policy"

This header is easy to add to pages generated via PHP. The policy URI is then dereferenced to obtain the policy itself. Note that P3P 1.0 defined a dedicated P3P HTTP header rather than using the generic Link header; the choice between the two could be revisited if and when this work is brought onto the standards track.
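A minimal sketch of this discovery step, in JavaScript like the extension itself. It uses the standard Fetch API and URL class rather than whatever the Firefox extension actually used, and the function names (parseLinkHeader, findPolicyURI, discoverPolicyURI, fetchPolicy) are hypothetical. The Link header parser is deliberately simple and is not a complete implementation of the Link header grammar.

```javascript
// Relation type taken from the Link header example above.
const POLICY_REL = "http://primelife.eu/generic-privacy-policy";

// Minimal Link header parser returning [{ uri, params }] per link-value.
// Simplification: assumes quoted parameter values contain no "<", ";" or ",".
function parseLinkHeader(headerValue) {
  const links = [];
  for (const [, uri, rest] of headerValue.matchAll(/<([^>]*)>([^<]*)/g)) {
    const params = Object.create(null);
    // Drop the comma separating this link-value from the next one.
    for (const part of rest.replace(/,\s*$/, "").split(";")) {
      const m = part.match(/^\s*([^=\s]+)\s*=\s*(?:"([^"]*)"|(\S*))\s*$/);
      if (m) params[m[1].toLowerCase()] = m[2] !== undefined ? m[2] : m[3];
    }
    links.push({ uri, params });
  }
  return links;
}

// Return the absolute URI of the generic privacy policy, or null if absent.
function findPolicyURI(headerValue, baseURL) {
  if (!headerValue) return null;
  for (const { uri, params } of parseLinkHeader(headerValue)) {
    const rels = (params.rel || "").split(/\s+/); // rel may list several types
    if (rels.includes(POLICY_REL)) return new URL(uri, baseURL).href;
  }
  return null;
}

// Send a HEAD request to the site root and look for the policy link.
// fetchFn defaults to the standard Fetch API; a stub can be injected for tests.
async function discoverPolicyURI(pageURL, fetchFn = fetch) {
  const root = new URL("/", pageURL).href;
  const response = await fetchFn(root, { method: "HEAD" });
  return findPolicyURI(response.headers.get("Link"), root);
}

// Dereference the policy URI and decode the JSON policy.
async function fetchPolicy(policyURI, fetchFn = fetch) {
  const response = await fetchFn(policyURI);
  if (!response.ok) throw new Error(`policy request failed: ${response.status}`);
  return response.json();
}

const header = '<http://localhost/w3c/policy.json>;\n   rel="http://primelife.eu/generic-privacy-policy"';
console.log(findPolicyURI(header, "http://localhost/")); // http://localhost/w3c/policy.json
```

Resolving the link against the site root means a relative reference such as </w3c/policy.json> works as well as the absolute URI in the example, and passing fetchFn explicitly lets the network steps be exercised with a stub.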

The object model for policies is decoupled from the on-the-wire transfer format, but in practice JSON [3] was the easiest transfer format to implement. Here is an example policy in JSON:

{
    "fullURI": null,
    "optURI": null,
    "name": "ACME widgets online inc.",
    "purposes": ["current", "admin", "tailoring", "individual-analysis" ],
    "recipients": [ "ours", "delivery", "same" ],
    "retention": "business-practices",
    "categories": [ "computer", "navigation", "interactive" ]
}
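The following sketch shows how such a policy might be parsed and checked against the object model before use. The permitted values are the P3P 1.0 vocabularies for purposes, recipients, retention and data categories; whether the demonstrator accepted all of them or only a subset is an assumption, as is the treatment of fullURI and optURI as optional strings.

```javascript
// P3P 1.0 vocabularies for the fields of the simplified object model.
// Assumption: the demonstrator may have accepted only a subset of these.
const VOCAB = {
  purposes: ["current", "admin", "develop", "tailoring", "pseudo-analysis",
    "pseudo-decision", "individual-analysis", "individual-decision",
    "contact", "historical", "telemarketing", "other-purpose"],
  recipients: ["ours", "delivery", "same", "other-recipient", "unrelated", "public"],
  retention: ["no-retention", "stated-purpose", "legal-requirement",
    "business-practices", "indefinitely"],
  categories: ["physical", "online", "uniqueid", "purchase", "financial",
    "computer", "navigation", "interactive", "demographic", "content",
    "state", "political", "health", "preference", "location",
    "government", "other-category"],
};

// Parse a JSON policy and check it against the object model, throwing on error.
function parsePolicy(jsonText) {
  const p = JSON.parse(jsonText);
  if (typeof p !== "object" || p === null || Array.isArray(p)) {
    throw new TypeError("policy must be a JSON object");
  }
  if (typeof p.name !== "string") throw new TypeError("name must be a string");
  // Treated here as optional links: null (or absent) means no link is given.
  for (const key of ["fullURI", "optURI"]) {
    if (p[key] != null && typeof p[key] !== "string") {
      throw new TypeError(`${key} must be a string or null`);
    }
  }
  for (const key of ["purposes", "recipients", "categories"]) {
    if (!Array.isArray(p[key])) throw new TypeError(`${key} must be an array`);
    for (const value of p[key]) {
      if (!VOCAB[key].includes(value)) throw new RangeError(`unknown ${key} value: ${value}`);
    }
  }
  if (!VOCAB.retention.includes(p.retention)) {
    throw new RangeError(`unknown retention value: ${p.retention}`);
  }
  return p;
}

const acme = parsePolicy(`{
  "fullURI": null, "optURI": null,
  "name": "ACME widgets online inc.",
  "purposes": ["current", "admin", "tailoring", "individual-analysis"],
  "recipients": ["ours", "delivery", "same"],
  "retention": "business-practices",
  "categories": ["computer", "navigation", "interactive"]
}`);
console.log(acme.name); // ACME widgets online inc.
```

Rejecting unknown tokens up front means the later comparison and text generation steps only ever see values from a closed vocabulary.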

Generating a human readable version of the privacy policy

The P3P 1.1 specification includes suggested text for each element in the taxonomy. This was copied into JavaScript and used to generate a human readable version of the policy. Here is an example:

screenshot of auto-generated human readable policy
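A sketch of the generation step: each policy token is looked up in a table of phrases and the results are assembled into sentences. The phrases below are illustrative paraphrases written for this example, not the P3P 1.1 suggested text, and they cover only the tokens in the example policy; any other token falls back to its raw value.

```javascript
// Illustrative phrases only (NOT the P3P 1.1 suggested text); they cover
// just the tokens used in the example policy.
const PHRASES = {
  purposes: {
    current: "to complete the activity you requested",
    admin: "to administer the website and its systems",
    tailoring: "to tailor the website to you during this visit",
    "individual-analysis": "to analyse your behaviour as an individual",
  },
  recipients: {
    ours: "the site and the agents acting on its behalf",
    delivery: "delivery services",
    same: "organisations following similar practices",
  },
  retention: {
    "business-practices": "kept according to the site's stated business practices",
  },
  categories: {
    computer: "information about your computer",
    navigation: "which pages you visit",
    interactive: "your interactions with the site",
  },
};

// Look up a phrase, falling back to the raw token when none is defined.
function phrase(group, token) {
  const table = PHRASES[group] || {};
  return Object.prototype.hasOwnProperty.call(table, token) ? table[token] : token;
}

// Assemble a short human readable summary of a policy object.
function describePolicy(policy) {
  const list = (group) => policy[group].map((t) => phrase(group, t)).join("; ");
  return [
    `${policy.name} collects ${list("categories")}.`,
    `It uses this data ${list("purposes")}.`,
    `The data may be shared with ${list("recipients")}.`,
    `Data is ${phrase("retention", policy.retention)}.`,
  ].join("\n");
}

console.log(describePolicy({
  name: "ACME widgets online inc.",
  purposes: ["current", "admin", "tailoring", "individual-analysis"],
  recipients: ["ours", "delivery", "same"],
  retention: "business-practices",
  categories: ["computer", "navigation", "interactive"],
}));
```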

The same text was also used for constructing a dialog summarising the mismatch between the user's preferences and the website's policy, for example:

screenshot of mismatch dialog

If the site's policy matches the user's preferences, or the user decides to override the mismatch, the browser extension then reissues the HTTP request for the original URL.
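The comparison and the decision to reissue the request can be sketched as follows, assuming (hypothetically) that preferences are stored as one set of acceptable values per checkbox group, and that any policy value outside the corresponding set counts as a mismatch. The function names are illustrative.

```javascript
// Checkbox groups in the simplified object model.
const GROUPS = ["purposes", "recipients", "retention", "categories"];

// Hypothetical preference shape: one Set of acceptable values per group.
// Any policy value missing from the matching Set is reported as a mismatch.
function findMismatches(policy, prefs) {
  const mismatches = [];
  for (const group of GROUPS) {
    // retention is a single value in the policy; the other groups are arrays
    const values = Array.isArray(policy[group]) ? policy[group] : [policy[group]];
    for (const value of values) {
      if (!prefs[group].has(value)) mismatches.push({ group, value });
    }
  }
  return mismatches;
}

// Plain-text report suitable for a mismatch dialog.
function mismatchReport(mismatches) {
  if (mismatches.length === 0) return "The site's policy matches your preferences.";
  return [
    "The site's policy conflicts with your preferences:",
    ...mismatches.map(({ group, value }) => `  - ${group}: ${value}`),
  ].join("\n");
}

// Reissue the original request only on a match or an explicit user override.
function shouldProceed(mismatches, userOverrides) {
  return mismatches.length === 0 || Boolean(userOverrides);
}

const examplePolicy = {
  purposes: ["current", "admin", "tailoring", "individual-analysis"],
  recipients: ["ours", "delivery", "same"],
  retention: "business-practices",
  categories: ["computer", "navigation", "interactive"],
};
const examplePrefs = {
  purposes: new Set(["current", "admin", "tailoring"]),
  recipients: new Set(["ours", "delivery", "same"]),
  retention: new Set(["no-retention", "stated-purpose", "business-practices"]),
  categories: new Set(["computer", "navigation", "interactive"]),
};
const found = findMismatches(examplePolicy, examplePrefs);
console.log(mismatchReport(found));
// The site's policy conflicts with your preferences:
//   - purposes: individual-analysis
console.log(shouldProceed(found, false)); // false
```

Treating preferences as sets of acceptable values keeps the comparison a simple membership test, which matches the grouped-checkbox user interface directly.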

The Firefox notification bar is shown when a site is found to lack a privacy policy.

screenshot of no policy notification

The Firefox notification bar is shown when a mismatch is found.

screenshot of mismatch notification

Clicking "View details" brings up the warning dialog shown earlier.

A local SQLite database was used to store the user's preferences and, as a performance optimization, to cache site policies.
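As a hedged illustration of the caching side, the following sketch keeps one policy per site origin in memory. A Map stands in for the SQLite database, the class name PolicyCache is hypothetical, and the one-day lifetime is an assumed value, since the expiry rules are not described here.

```javascript
// In-memory stand-in for the extension's SQLite policy cache, keyed by origin.
// The default one-day lifetime is a hypothetical choice.
class PolicyCache {
  constructor(maxAgeMs = 24 * 60 * 60 * 1000, now = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;             // injectable clock, e.g. for tests
    this.entries = new Map();   // origin -> { policy, fetchedAt }
  }

  static keyFor(url) {
    return new URL(url).origin; // one policy per scheme + host + port
  }

  // Returns the cached policy (possibly null), or undefined on a miss.
  get(url) {
    const key = PolicyCache.keyFor(url);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.fetchedAt > this.maxAgeMs) {
      this.entries.delete(key); // stale: force a fresh HEAD request
      return undefined;
    }
    return entry.policy;
  }

  set(url, policy) {
    this.entries.set(PolicyCache.keyFor(url), { policy, fetchedAt: this.now() });
  }
}

const cache = new PolicyCache();
cache.set("http://localhost/w3c/", { name: "ACME widgets online inc." });
console.log(cache.get("http://localhost/index.php").name); // ACME widgets online inc.
```

Storing null for a site records that it was found to lack a policy, so the "no policy" notification can be shown again without repeating the HEAD request, while undefined still signals a cache miss.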

Anonymising Proxies

Making an HTTP HEAD request to a website's root discloses the browser's external IP address to that site before the user has had a chance to review its policy. This can be avoided by routing the request through an anonymising HTTP proxy, which could be configured via a user preference.

Summary and suggestions for further work

This paper has described a fresh take on P3P that goes beyond the limitations of compact policies, whilst still enabling a simple user interface for setting preferences. The object model lends itself to the use of JSON as a policy transfer format. The restricted semantics of the machine readable policy, which covers only data collected from HTTP requests, are supplemented by a link to the site's full human readable policy.

A further consideration is privacy policies for other kinds of personal information collected by websites, for example credentials tied to a user's public or partial identity. Can the P3P taxonomies be extended to support these?

P3P and the approach described in this paper are couched in legal terms relevant to the obligations that websites take on towards their users. Websites also face the challenge of operationalizing privacy policies when it comes to controlling access to and use of personal data in the website's backend. This suggests the need to transform privacy policies into data handling policies. The PrimeLife project is looking at extending the XACML access control language to cover data handling policies; see H5.3.2 [4].

Widespread support for machine readable privacy policies is likely to require a legislative mandate, with measures in place to ensure that sites conform to the policies they disclose. However, such a mandate would only apply in countries with the corresponding laws, so the browser needs a way to verify which jurisdiction a given website is subject to. This could take the form of digital certificates issued by national agencies.

A separate issue is that many people aren't sufficiently motivated to set privacy preferences. One reason is the desire to just get to the website in question without bothering to review its policy. Another is a lack of the knowledge needed to make an informed decision. This points to the use of independent third parties to help with setting privacy preferences and to monitor the data handling practices of websites. Some progress has been made with the latter in the form of a browser extension (Privacy Dashboard) that tracks what information is collected by the websites you visit, together with a means to set your preferences on a site by site basis [5].


Further reading

[1] http://www.w3.org/TR/2002/REC-P3P-20020416/
[2] http://www.w3.org/TR/2006/NOTE-P3P11-20061113/
[3] http://www.json.org/
[4] http://www.primelife.eu/results/documents/activity-5-policies
[5] http://www.primelife.eu/results/opensource/76-dashboard