W3C Member Submission

Web Tracking Protection

W3C Member Submission 24 February 2011

This version:
http://www.w3.org/submissions/2011/SUBM-web-tracking-protection-20110224/
Latest version:
http://www.w3.org/submissions/web-tracking-protection/
Editors:
Andy Zeigler, Microsoft Corporation
Adrian Bateman, Microsoft Corporation
Eliot Graff, Microsoft Corporation

This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.


Abstract

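The Web Tracking Protection specification is designed to enable users to opt out of online tracking. The platform has two parts:

  • Filter lists, which limit the third-party content that a user agent requests.
  • A Do Not Track user preference, which is exposed to websites as an HTTP header and a DOM property.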

Together these technologies can be used to enforce privacy protection for users, and provide access to content and services that respect user privacy preferences.

A filter list contains parts of third-party URIs that a browser may access automatically when referenced within a web page that a user deliberately visits. Rules in a filter list may change the way the user agent handles third-party content. By limiting the calls to these websites and blocking resources from other web pages, the filter list limits the information other sites can collect about a user.

The Do Not Track user preference is a setting maintained by the user agent. It can be read by a webserver or client JavaScript. A webserver that respects the Do Not Track user preference will read this value and will not track the user when this setting is enabled.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.

This document was published by the Microsoft Corporation as a Member Submission.

By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.

1. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].

2. Introduction

This section is non-normative.

Today, consumers share information with more websites than the ones they see in the address bar of their browser. This is inherent in the design of the web, and it has potentially unintended consequences. As consumers visit one site, many other sites receive information about their activities. For example, when a webpage includes a third-party image file, such as a “web beacon”, the request can send IP address information, cookies, and referrer data. A third-party script can have additional impact on user privacy and can collect arbitrary data from the first-party webpage.

This situation results from how modern websites are built. A typical website today brings together content from many other websites while giving the impression that it is a single entity. When the browser calls any other website to request anything (an image, a cookie, HTML, a script that can execute), the browser explicitly provides information in order to get information. By limiting data requests to these sites, it is possible to limit the data available to these sites for collection and tracking.

A filter list contains parts of third-party URIs that a browser may access automatically when referenced within a web page a user deliberately visits. Rules in a filter list may change the way the user agent handles third-party content. By limiting the calls to these websites and blocking resources from other web pages, the filter list limits the information other sites can collect about a user.

3. Third-Party URIs

A third-party URI [URI] is a URI with a (second-level) domain name that differs from that of the top-level containing document. A user agent must evaluate any URIs that indicate a sub-document—such as an iframe or any URIs defined in any sub-documents—as third-party with respect to the topmost document. For example, consider a top-level document whose URI is http://www.microsoft.com. This page might contain an iframe whose src URI is http://www.example.com. If the page at http://www.example.com contains an img element whose src is http://www.example.com/img.png, the URI http://www.example.com/img.png is a third-party URI, as its domain name differs from that of the top-level page.

A third-party download is any potential HTTP download request to a third-party URI.

A user-agent must apply a filter list to third-party URIs only.
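
The comparison described in this section can be sketched in Python. This is an illustration only and not part of the specification: the function names are hypothetical, and treating the last two host labels as the second-level domain is a simplifying assumption that gives wrong answers for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def second_level_domain(uri: str) -> str:
    """Return the last two labels of the URI's host (naive assumption)."""
    host = urlsplit(uri).hostname or ""
    labels = host.lower().split(".")
    return ".".join(labels[-2:])


def is_third_party(uri: str, top_level_uri: str) -> bool:
    """Compare a URI against the topmost document, never a sub-document."""
    return second_level_domain(uri) != second_level_domain(top_level_uri)
```

With the example above, is_third_party("http://www.example.com/img.png", "http://www.microsoft.com") returns True even though the image shares a domain with the iframe that contains it, because the comparison is always made against the topmost document.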

3.1 Blocking Downloads

When a user agent requests a webpage and receives an HTTP response that returns a document, and the user or user agent has chosen to apply a filter list, the user agent must evaluate all third-party URIs that can generate a download request against this filter list. When a user agent blocks a download, that user agent should fire any events pertaining to a download error, if applicable.

4. List Format

A filter list is a UTF-8 encoded text file that contains a header, comments, settings, and rules. Filter lists are parsed in a stateless manner across lines, meaning that the ordering of the lines has no effect on the meaning of the file. The only exception to this is the header, which must be the first line of the file.

01 	FilterList
02 	#
03 	# Line 1 is a header. 
04 	#
05 	# Lines 2-12 are comments.
06 	# As a comment, any line that starts with a “#” character is ignored.
07 	#
08 	# Any line that begins with a “:” character is a setting, which is a key-value pair.
09 	# The key-value pair Expires = n specifies to wait n days before checking for an update to the list.
10	#
11 	# Using a setting.
12	# Check for an update to the list in 3 days.
13 	: Expires=3  
14 	#
15 	# Domain rule
16	# Allow all URIs from the example.com domain name.
17 	+d example.com
18 	# 
19 	# Substring rule
20	# Block any URI containing “spamspam”.
21 	- spamspam
22 	#
23 	# Wildcard character
24	# Block any URI that has a “foo” followed by a “bar”.
25 	- foo*bar
26 	#
27 	# Domain rule
28	# Block anything from exampleexample.com.
29 	-d exampleexample.com
30 	#
31	# Domain rule with optional path
32	# Block any URI from example.com that contains the substring “bad.js” in the URI path.
33 	-d example.com bad.js
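
A user agent could sort the lines of such a file by their first character. The following Python sketch shows one way to do that. It is not a normative parser: the function names are hypothetical, it assumes the file does not carry the line numbers shown in the listing above, and trimming surrounding whitespace and skipping blank lines are assumptions that the format does not state.

```python
def classify_line(line: str) -> str:
    """Classify one filter-list line by its leading character."""
    text = line.strip()  # assumption: surrounding whitespace is insignificant
    if not text:
        return "blank"
    if text.startswith("#"):
        return "comment"
    if text.startswith(":"):
        return "setting"
    if text.startswith("+") or text.startswith("-"):
        return "rule"
    return "unknown"


def parse_filter_list(source: str) -> dict:
    """Collect settings and rules; only the header's position matters."""
    lines = source.splitlines()
    if not lines or lines[0].strip() != "FilterList":
        raise ValueError("the first line must be the FilterList header")
    parsed = {"settings": {}, "rules": []}
    for line in lines[1:]:
        kind = classify_line(line)
        if kind == "setting":
            key, _, value = line.strip()[1:].partition("=")
            parsed["settings"][key.strip()] = value.strip()
        elif kind == "rule":
            parsed["rules"].append(line.strip())
    return parsed
```

Because each line is classified on its own, reordering the lines after the header changes only the order of the collected rules, not which rules are collected.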

4.2 Comments

A comment line must start with a number sign (#) character.

4.3 Settings

The filter list format supports settings in the form of key-value pairs. A settings line begins with a colon (:) and has two string values separated by an equal sign (=). If a setting is not recognized, the user agent must ignore that setting.

4.3.1 Expires

Expires = n

The Expires setting defines how frequently (in n days) the user-agent will check for updates to the list.

The value of n must be an integer between 1 and 30.

The following list file requests that the user agent check for updates to the list every 10 days (or the next time the user agent is launched, if more than 10 days have passed).

FilterList
: Expires = 10
+d example.org
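
Reading the Expires setting could look like the following Python sketch. The function name is hypothetical, and several behaviors are assumptions rather than requirements of this specification: the range 1 to 30 is treated as inclusive, the key is compared case-sensitively, and an out-of-range value raises an error.

```python
from typing import Optional


def parse_expires(line: str) -> Optional[int]:
    """Return n from an Expires setting line, or None for any other line."""
    text = line.strip()
    if not text.startswith(":"):
        return None
    key, sep, value = text[1:].partition("=")
    if not sep or key.strip() != "Expires":
        return None  # unrecognized settings are ignored
    days = int(value.strip())  # raises ValueError if n is not an integer
    if not 1 <= days <= 30:  # assumption: both bounds are inclusive
        raise ValueError("Expires must be between 1 and 30")
    return days
```

For the list above, parse_expires(": Expires = 10") returns 10.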

4.4 Rules

Rules are the primary component of a filter list. A rule is a line in a filter list that changes the way the user-agent handles third-party content.

Rules are matched against the URI of each third-party subdownload in a page. A URI that has a different second-level domain name than the URI in the address bar is a third-party URI.

The basic format for a rule is as follows:

FilterList
#
# Allow rule
+d string [string]
#
# Block rule
- string

4.4.1 Allow Rules

Allow rules allow content from the specified entity to function within the instance of the user agent. Allow rules must begin with a plus sign (+). Allow rules must be domain rules.

4.4.2 Block Rules

Block rules block content from the specified entity from functioning within the instance of the user agent. Block rules must begin with a minus sign (-). Block rules may be either domain rules or substring rules.

4.4.3 Domain Rules

Domain rules allow or block content on a particular domain. Domain rules must begin with the string “+d” (to allow content) or the string “-d” (to block content). For allow rules, the user-agent must evaluate the string specified in the domain part of the allow rule against the target URI, starting from the topmost domain label. An additional and optional string match may be specified to further limit the scope.

For example, the following allow domain rules allow the URI, http://www.subdomain.example.com/file.html.

+d example.com
+d subdomain.example.com

The following allow domain rules, with the optional string, also allow the URI, http://www.subdomain.example.com/file.html.

+d example.com file
+d example.com file.html
+d example.com html

The following allow domain rules fail to match and therefore fail to allow the URI, http://www.subdomain.example.com/file.html.

+d subdomain.example
#  does not match starting at the topmost domain label 	
#
+d othersubdomain.example.com
#  not a complete match of specified domain labels
#
+d example.com /path/file.html
#  “/path/file.html” is not a substring of /file.html
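
The allow-rule matching shown in these examples can be sketched in Python as follows. The function names are hypothetical. Matching the optional string against the URI path only, and ignoring wildcard characters in it, are simplifying assumptions made for this sketch.

```python
from typing import Optional
from urllib.parse import urlsplit


def allow_domain_matches(domain: str, uri: str) -> bool:
    """True if the rule's labels equal the host's rightmost labels."""
    host_labels = (urlsplit(uri).hostname or "").lower().split(".")
    rule_labels = domain.lower().split(".")
    # Compare label by label, starting from the topmost (rightmost) label.
    return host_labels[-len(rule_labels):] == rule_labels


def allow_rule_matches(domain: str, uri: str, optional: Optional[str] = None) -> bool:
    """Apply the domain part, then the optional string if one is given."""
    if not allow_domain_matches(domain, uri):
        return False
    # Assumption: the optional string is a plain substring of the path.
    return optional is None or optional in urlsplit(uri).path
```

For http://www.subdomain.example.com/file.html, the domain "subdomain.example" fails because the rightmost labels of the host are "example.com", and the optional string "/path/file.html" fails because it does not occur in "/file.html".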

For block rules, the user-agent must evaluate the string specified in the domain part of the block rule against any contiguous domain labels.

For example, the following block domain rules block the URI, http://www.subdomain.example.com/file.html.

-d example.com
-d subdomain.example.com

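The following block domain rules also block the URI, http://www.subdomain.example.com/file.html: the first three through the optional string, and the last because it names contiguous domain labels even though it does not start at the topmost label.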

-d example.com file
-d example.com file.html
-d example.com html
-d subdomain.example

The following block domain rules fail to match and therefore fail to block the URI, http://www.subdomain.example.com/file.html.

#
-d othersubdomain.example.com
#  not contiguous domain labels
#
-d example.com /path/file.html
#  "/path/file.html" is not a substring of /file.html
#
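
The difference from allow rules is that the domain part of a block rule may match any contiguous run of domain labels. A minimal Python sketch of that comparison follows. The function name is hypothetical, and lowercasing both sides before comparing is an assumption.

```python
from urllib.parse import urlsplit


def block_domain_matches(domain: str, uri: str) -> bool:
    """True if the rule's labels appear as a contiguous run in the host."""
    host_labels = (urlsplit(uri).hostname or "").lower().split(".")
    rule_labels = domain.lower().split(".")
    width = len(rule_labels)
    # Slide a window of the rule's width across the host labels.
    return any(
        host_labels[start:start + width] == rule_labels
        for start in range(len(host_labels) - width + 1)
    )
```

Here "subdomain.example" matches http://www.subdomain.example.com/file.html because those two labels sit next to each other in the host, whereas "othersubdomain.example.com" does not match any run of three labels.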

4.4.4 Substring Rules

Substring rules match a substring in a URI, blocking content. For example, the following substring rules block the URI, http://www.example.com/test.html.

- example
- exam
- test.html
- ex*le

However, the following substring rule does not match and therefore does not block the URI, http://www.example.com/test.html.

- test2
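
One way to implement this matching is to translate the rule into a regular expression, as in the following Python sketch. The function name is hypothetical, and case-sensitive comparison is an assumption, since this specification does not say how case is handled.

```python
import re


def substring_rule_matches(pattern: str, uri: str) -> bool:
    """True if the pattern occurs anywhere in the URI; * matches any run."""
    # Escape each literal piece, then join the pieces with a greedy ".*".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, uri) is not None
```

Escaping the literal pieces keeps characters such as "." in "test.html" from being read as regular-expression operators.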
			

4.4.5 The Wildcard Character

The wildcard character (*) may be used within a substring rule. The wildcard character must match zero or more characters of any kind.

Wildcard characters are greedy, meaning the wildcard will match as much text as possible.

The wildcard character must not be used in the string representing the domain within a domain rule. The wildcard character may be used in the optional string part of a domain rule.

The following example is valid because the wildcard character is used in the optional string part of the domain rule.

+d example.com sub*string				

The following rule is invalid because the wildcard character is used in the domain part of the domain rule.

# Invalid!
+d domain*.com substring 				
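
A list author could check this restriction before publishing a list. The following Python sketch is one possible check. The function name is hypothetical, and it assumes that the fields of a domain rule are separated by whitespace, with at most one optional string.

```python
def is_valid_domain_rule(rule: str) -> bool:
    """Check that a '+d' or '-d' rule keeps wildcards out of its domain part."""
    parts = rule.split()
    if len(parts) not in (2, 3) or parts[0] not in ("+d", "-d"):
        return False
    domain = parts[1]
    # The optional third field may contain "*"; the domain may not.
    return "*" not in domain
```

Applied to the two examples above, the check accepts "+d example.com sub*string" and rejects "+d domain*.com substring".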

5. Processing Filter Lists

5.1 Processing a Filter List

Filter Lists may contain allow rules, block rules, and even duplicate rules that match the same URI.

When a user agent evaluates a URI against a filter list, it must follow this algorithm:

  1. All allow rules in the filter list must be processed first. No duplicate removal or other processing can be done on the rules.
    • If the URI matches any allow rule, then the content at the URI must be allowed.
  2. All block rules must be processed.
    • If a URI matches any block rule, then the content at the URI must be blocked.
  3. If no rule matches, then the content must be allowed.

This algorithm effectively gives precedence to allow rules over block rules.
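
The three steps can be sketched in Python as follows. To keep the sketch independent of any particular rule syntax, each rule is represented as a function that takes a URI and reports whether it matches. That representation and the names used are assumptions of this example, not part of the specification.

```python
from typing import Callable, Iterable

Matcher = Callable[[str], bool]


def evaluate(uri: str, allow: Iterable[Matcher], block: Iterable[Matcher]) -> bool:
    """Return True if the content at the URI is allowed."""
    # Step 1: any matching allow rule allows the content.
    if any(rule(uri) for rule in allow):
        return True
    # Step 2: otherwise, any matching block rule blocks the content.
    if any(rule(uri) for rule in block):
        return False
    # Step 3: no rule matched, so the content is allowed.
    return True
```

Because the allow rules are consulted first, a URI that matches both an allow rule and a block rule is allowed.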

5.2 Processing Multiple Filter Lists

If a user-agent supports the use of multiple filter lists simultaneously, then all allow rules from all filter lists must be grouped together and all block rules from all filter lists must be grouped together, such that when the user agent evaluates a URI it first evaluates all allow rules from all filter lists and then evaluates all block rules from all lists. User-agents may remove duplicate rules in lists provided that the meaning of the rules is maintained after the removal of duplicate rules.
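
Grouping the rules of several lists might look like the following Python sketch. The dictionary layout with "allow" and "block" keys and the function name are hypothetical. Only exact duplicates are dropped, which leaves the meaning of the rules unchanged.

```python
from typing import Dict, List


def merge_filter_lists(lists: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Pool allow and block rules across lists, dropping exact duplicates."""
    merged: Dict[str, List[str]] = {"allow": [], "block": []}
    for filter_list in lists:
        for kind in ("allow", "block"):
            for rule in filter_list.get(kind, []):
                if rule not in merged[kind]:  # duplicate removal is optional
                    merged[kind].append(rule)
    return merged
```

The merged allow rules are then evaluated before the merged block rules, exactly as for a single list.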

6. Do Not Track User Preference

[NoInterfaceObject]
interface NavigatorDoNotTrack {
    readonly attribute DOMString doNotTrack;
};
Navigator implements NavigatorDoNotTrack;

Objects implementing the Navigator interface (e.g. the window.navigator object) must also implement the NavigatorDoNotTrack interface [NAVIGATOR]. An instance of NavigatorDoNotTrack would then be obtained by using binding-specific casting methods on an instance of Navigator.

Attributes
doNotTrack of type DOMString, readonly
DOM property that webpages can use to detect and respect the user’s preference not to be tracked.
No exceptions.

6.1 User Agents

The Do Not Track user preference is an HTTP header and a DOM property that webpages can use to detect and respect the user’s preference not to be tracked. By having both a header and a DOM property, websites can easily detect the user preference from both client and server code. When the Do Not Track user preference is set, the user-agent must apply the HTTP header to all HTTP requests, and the DOM property must be applied to all documents. The user agent is responsible for determining the user experience by which the Do Not Track user preference is enabled.

When the Do Not Track user preference is set, the HTTP request to the webserver for the document must have the following header:

DNT: 1

When the Do Not Track user preference is set on a document, the following expression must evaluate to true:

window.navigator.doNotTrack == "1"

6.2 Websites

Websites that track users across multiple first-party websites must check for the presence of the Do Not Track user preference. If a website detects that this preference is enabled, it must disable any tracking code or collection of data that can be used for tracking purposes, regardless of the level of identification of the user.
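
On the server side, detecting the preference amounts to inspecting the request headers. The following Python sketch assumes the headers are available as a mapping from header name to value. The function name is hypothetical, and what a website does to disable tracking once the preference is detected is outside the scope of this sketch.

```python
from typing import Mapping


def do_not_track_enabled(headers: Mapping[str, str]) -> bool:
    """True if the request carries the header 'DNT: 1'."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False
```

A request handler could call do_not_track_enabled(request_headers) before running any tracking code and skip that code when the result is True.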

A. Augmented Backus–Naur Form

The following is an example Augmented Backus–Naur Form (ABNF) [ABNF] grammar for the filter list format.

FilterList     =     Header [lines]
Header         =     [UTF8BOM] "FilterList" EOL
lines          =     line *(EOL [line])
line           =     comment / key-value / rule
comment        =     "#" *(VCHAR / WSP)
key-value      =     ":" *WSP ALPHA 1*31(ALPHA/DIGIT) *WSP "=" *WSP 1*32(ALPHA/DIGIT)
rule           =     allow-rule / block-rule
allow-rule     =     "+" domain-exp
block-rule     =     "-" (domain-exp / substring-exp)
domain-exp     =     "d" 1*WSP string [substring-exp]
substring-exp  =     1*WSP wcstring
string         =     1*(ALPHA/DIGIT)
wcstring       =     1*(ALPHA/DIGIT/"*")
UTF8BOM        =     %xEF %xBB %xBF
EOL            =     [CR] LF
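
As an illustration of the Header rule, the following Python sketch checks the first line of a filter list supplied as raw bytes. The function name is hypothetical. Python's "utf-8-sig" codec is used because it discards a leading UTF-8 byte order mark when one is present.

```python
def has_valid_header(data: bytes) -> bool:
    """Check the first line of raw filter-list bytes against the Header rule."""
    # "utf-8-sig" strips a leading UTF-8 byte order mark if one is present.
    text = data.decode("utf-8-sig")
    first_line, sep, _ = text.partition("\n")
    if not sep:
        return False  # the Header rule requires a terminating EOL
    if first_line.endswith("\r"):
        first_line = first_line[:-1]  # EOL permits one optional CR before LF
    return first_line == "FilterList"
```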

B. References

B.1 Normative references

[NAVIGATOR]
Ian Hickson, David Hyatt. Navigator interface in HTML5, Editor's draft (Work in progress). URL: http://dev.w3.org/html5/spec/webappapis.html#navigator
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[URI]
T. Berners-Lee; R. Fielding; L. Masinter. Uniform Resource Identifier (URI): Generic Syntax. January 2005. Internet RFC 3986. URL: http://www.ietf.org/rfc/rfc3986.txt

B.2 Informative references

[ABNF]
D. Crocker and P. Overell. Augmented BNF for Syntax Specifications: ABNF. January 2008. Internet RFC 5234. URL: http://www.ietf.org/rfc/rfc5234.txt