Copyright © 2011 Microsoft Corporation
This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
The Web Tracking Protection specification is designed to enable users to opt out of online tracking. The platform has two parts: filter lists and the Do Not Track user preference.
Together these technologies can be used to enforce privacy protection for users, and provide access to content and services that respect user privacy preferences.
A filter list contains parts of third-party URIs that a browser may access automatically when referenced within a web page that a user deliberately visits. Rules in a filter list may change the way the user agent handles third-party content. By limiting the calls to these websites and blocking resources from other web pages, the filter list limits the information other sites can collect about a user.
The Do Not Track user preference is a setting maintained by the user agent. It can be read by a webserver or client JavaScript. A webserver that respects the Do Not Track user preference will read this value and will not track the user when this setting is enabled.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was published by the Microsoft Corporation as a Member Submission.
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words must, must not, required, should, should not, recommended, may, and optional in this specification are to be interpreted as described in [RFC2119].
This section is non-normative.
Today, consumers share information with more websites than the ones they see in the address bar in their browser. This is inherent in the design of the web and simply how the web works, and it has potentially unintended consequences. As consumers visit one site, many other sites receive information about their activities. For example, when a webpage includes a third-party image file—such as a “web beacon”—IP address information, cookies, and referrer data can be sent. A third-party script can have additional impact on user privacy and can collect arbitrary data from the first-party webpage.
This situation results from how modern websites are built. Typically, a website today brings together content from many other websites while giving the impression that it is a single entity. When the browser calls any other website to request anything (an image, a cookie, HTML, a script that can execute), the browser explicitly provides information in order to get information. By limiting data requests to these sites, it is possible to limit the data available to these sites for collection and tracking.
A filter list contains parts of third-party URIs that a browser may access automatically when referenced within a web page a user deliberately visits. Rules in a filter list may change the way the user agent handles third-party content. By limiting the calls to these websites and blocking resources from other web pages, the filter list limits the information other sites can collect about a user.
A third-party URI [URI] is a URI with a (second-level) domain name that differs from that of the top-level containing document. A user agent must evaluate any URIs that indicate a sub-document (such as an iframe or any URIs defined in any sub-documents) as third-party with respect to the topmost document.
For example, consider a top-level document whose URI is http://www.microsoft.com. This page might contain an iframe whose src URI is http://www.example.com. If the page at http://www.example.com contains an img element whose src is http://www.example.com/img.png, the URI http://www.example.com/img.png is a third-party URI, as its domain name differs from that of the top-level page.
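The comparison described above can be sketched in Python. This is an illustration only, not part of the specification: the helper names `second_level_domain` and `is_third_party` are hypothetical, and treating the last two host labels as the second-level domain is a simplifying assumption that breaks for suffixes such as co.uk.

```python
from urllib.parse import urlsplit


def second_level_domain(uri: str) -> str:
    """Return the last two labels of the host, e.g. 'example.com'.

    Assumption: a naive two-label rule, chosen for brevity.
    """
    host = (urlsplit(uri).hostname or "").lower()
    return ".".join(host.split(".")[-2:])


def is_third_party(uri: str, top_level_uri: str) -> bool:
    """Compare against the topmost document, never against a sub-document."""
    return second_level_domain(uri) != second_level_domain(top_level_uri)


top = "http://www.microsoft.com"
# The image inside the example.com iframe is still judged against the top page.
print(is_third_party("http://www.example.com/img.png", top))
print(is_third_party("http://download.microsoft.com/a.js", top))
```

The key design point is that the second argument is always the URI of the topmost document, so content nested inside an iframe is not compared with the iframe's own URI.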
A third-party download is any potential HTTP download request to a third-party URI.
A user-agent must apply a filter list to third-party URIs only.
When a user agent issues a request for a webpage and receives an HTTP status code that returns a document, and the user or user agent has chosen to apply a filter list, all third-party URIs that can generate a download request must be evaluated against this filter list. When a user agent blocks a download, that user agent should fire any events pertaining to a download error, if applicable.
A filter list is a UTF-8 encoded text file that contains a header, comments, settings, and rules. Filter lists are parsed in a stateless manner across lines, meaning that the ordering of the lines has no effect on the meaning of the file. The only exception to this is the header, which must be the first line of the file.
    01 FilterList
    02 #
    03 # Line 1 is a header.
    04 #
    05 # Lines 2-11 are comments.
    06 # As a comment, any line that starts with a “#” character is ignored.
    07 #
    08 # Any line that begins with a “:” character is a setting, which is a key-value pair.
    09 # The key-value pair Expires = n specifies to wait n days before checking for an update to the list.
    10 #
    11 # Using a setting.
    12 # Check for an update to the list in 3 days.
    13 : Expires=3
    14 #
    15 # Domain rule
    16 # Allow all URIs from the example.com domain name.
    17 +d example.com
    18 #
    19 # Substring rule
    20 # Block any URI containing “spamspam”.
    21 - spamspam
    22 #
    23 # Wildcard character
    24 # Block any URI that has a “foo” followed by a “bar”.
    25 - foo*bar
    26 #
    27 # Domain rule
    28 # Block anything from exampleexample.com.
    29 -d exampleexample.com
    30 #
    31 # Domain rule with optional path
    32 # Block any URI from example.com that contains the substring “bad.js” in the URI path.
    33 -d example.com bad.js
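Because each line is interpreted on its own, a parser can classify lines by their first character without tracking any state. The following Python sketch illustrates this; the function names `classify_line` and `group_lines` are hypothetical, and returning "unknown" for anything else is an assumption, since the handling of unrecognized lines is not described here.

```python
from collections import defaultdict


def classify_line(line: str) -> str:
    """Classify one filter-list line by its first character."""
    if line.startswith("#"):
        return "comment"
    if line.startswith(":"):
        return "setting"
    if line.startswith(("+", "-")):
        return "rule"
    return "unknown"  # assumption: not specified in the text


def group_lines(body: str) -> dict:
    """Group the lines that follow the header; no line depends on another."""
    groups = defaultdict(list)
    for line in body.splitlines():
        groups[classify_line(line)].append(line)
    return dict(groups)


body = "# a comment\n: Expires=3\n+d example.com\n- spamspam\n-d example.com bad.js"
groups = group_lines(body)
print(groups["setting"])
print(groups["rule"])
```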
Filter lists must start with a single line that contains the string FilterList and is known as the header. The header line may start with the UTF-8 Byte Order Mark (BOM) (EF BB BF), which is ignored. Until this feature becomes standardized, vendors may use a prefix. For example, the Microsoft implementation uses msFilterList.
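A header check along these lines might look as follows in Python. It is a sketch under assumptions: the function name `has_valid_header` is hypothetical, and accepting msFilterList alongside FilterList is a choice made for illustration rather than a requirement.

```python
BOM = b"\xef\xbb\xbf"  # UTF-8 Byte Order Mark


def has_valid_header(raw: bytes, accepted=("FilterList", "msFilterList")) -> bool:
    """Return True if the first line of the file is an accepted header."""
    if raw.startswith(BOM):
        raw = raw[len(BOM):]  # the BOM is ignored
    first_line = raw.split(b"\n", 1)[0].rstrip(b"\r")
    return first_line.decode("utf-8", errors="replace") in accepted


print(has_valid_header(b"\xef\xbb\xbfFilterList\r\n# comment\n"))
print(has_valid_header(b"# comment\nFilterList\n"))
```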
The filter list format supports settings in the form of key-value pairs. A settings line begins with a colon (:) and has two string values separated by an equal sign (=). If a setting is not recognized, the user agent must ignore that setting.
The Expires setting defines how frequently (in n days) the user-agent will check for updates to the list.
The value of n must be an integer between 1 and 30.
The following list file requests that the user agent checks every 10 days (or the next time the user-agent is launched, if greater than 10 days) to see if there are updates to the list.
    FilterList
    : Expires = 10
    +d example.org
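Reading the Expires setting could be sketched as below. The helper names `parse_setting` and `expires_days` are hypothetical. Ignoring a value outside the range 1 to 30 is an assumption made here; the text requires the range but does not say how a user agent should react to a violation.

```python
def parse_setting(line: str):
    """Split ': key = value' into (key, value); return None if malformed."""
    if not line.startswith(":") or "=" not in line:
        return None
    key, value = line[1:].split("=", 1)
    return key.strip(), value.strip()


def expires_days(lines, default=None):
    """Return the Expires value in days, or `default` if none is usable."""
    for line in lines:
        setting = parse_setting(line)
        if setting is None:
            continue
        key, value = setting
        if key != "Expires":
            continue  # unrecognized settings are ignored
        if value.isascii() and value.isdigit() and 1 <= int(value) <= 30:
            return int(value)
        # Assumption: an out-of-range or non-integer value is skipped.
    return default


print(expires_days(["# comment", ": Expires = 10", "+d example.org"]))
```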
Rules are the primary component of a filter list. A rule is a line in a filter list that changes the way the user-agent handles third-party content.
Rules are matched against the URI of each third-party subdownload in a page. A URI that has a different second-level domain name than the URI in the address bar is a third-party URI.
The basic format for a rule is as follows:
    FilterList
    #
    # Allow rule
    +d string [string]
    #
    # Block rule
    - string
Allow rules allow content from the specified entity to function within the instance of the user agent. Allow rules must begin with a plus sign (+). Allow rules must be domain rules.
Block rules block content from the specified entity from functioning within the instance of the user agent. Block rules must begin with a minus sign (-). Block rules may be either domain rules or substring rules.
Domain rules allow or block content on a particular domain. Domain rules must begin with the string “+d” (to allow content) or the string “-d” (to block content). For allow rules, the user-agent must evaluate the string specified in the domain part of the allow rule against the target URI, starting from the topmost domain label. An additional and optional string match may be specified to further limit the scope.
For example, the following allow domain rules allow the URI http://www.subdomain.example.com/file.html.
    +d example.com
    +d subdomain.example.com
The following allow domain rules, with the optional string, also allow the URI http://www.subdomain.example.com/file.html.
    +d example.com file
    +d example.com file.html
    +d example.com html
The following allow domain rules fail to match and therefore fail to allow the URI http://www.subdomain.example.com/file.html.
    +d subdomain.example
    # does not match starting at the topmost domain label
    #
    +d othersubdomain.example.com
    # not a complete match of specified domain labels
    #
    +d example.com /path/file.html
    # “/path/file.html” is not a substring of /file.html
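The allow-rule matching illustrated above can be sketched in Python by comparing domain labels from the right, since the topmost label is the last one in the host name. The function name `allow_rule_matches` is hypothetical, and the optional string is treated as a plain substring of the URI path with no wildcard handling, which is a simplification.

```python
from urllib.parse import urlsplit


def allow_rule_matches(domain: str, uri: str, substring: str = "") -> bool:
    """Match an allow domain rule, anchored at the topmost domain label."""
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").lower().split(".")
    rule_labels = domain.lower().split(".")
    # The rule's labels must equal the rightmost labels of the host.
    if host_labels[-len(rule_labels):] != rule_labels:
        return False
    # An empty optional string always matches.
    return substring in parts.path


uri = "http://www.subdomain.example.com/file.html"
print(allow_rule_matches("subdomain.example.com", uri))
print(allow_rule_matches("subdomain.example", uri))
```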
For block rules, the user-agent must evaluate the string specified in the domain part of the block against any contiguous domain labels.
For example, the following block domain rules block the URI http://www.subdomain.example.com/file.html.
    -d example.com
    -d subdomain.example.com
The following block domain rules, with the optional string, also block the URI http://www.subdomain.example.com/file.html.
    -d example.com file
    -d example.com file.html
    -d example.com html
    -d subdomain.example
The following block domain rules fail to match and therefore fail to block the URI http://www.subdomain.example.com/file.html.
    #
    -d othersubdomain.example.com
    # not contiguous domain labels
    #
    -d example.com /path/file.html
    # "/path/file.html" is not a substring of /file.html
    #
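Block rules differ from allow rules in that the domain labels may match anywhere in the host name, as long as they are contiguous. A minimal Python sketch follows; `block_rule_matches` is a hypothetical name, and the optional string is again treated as a plain substring of the URI path without wildcard support.

```python
from urllib.parse import urlsplit


def block_rule_matches(domain: str, uri: str, substring: str = "") -> bool:
    """Match a block domain rule against any contiguous run of host labels."""
    parts = urlsplit(uri)
    host_labels = (parts.hostname or "").lower().split(".")
    rule_labels = domain.lower().split(".")
    n = len(rule_labels)
    # Slide a window of n labels across the host name.
    found = any(
        host_labels[i:i + n] == rule_labels
        for i in range(len(host_labels) - n + 1)
    )
    return found and substring in parts.path


uri = "http://www.subdomain.example.com/file.html"
print(block_rule_matches("subdomain.example", uri))
print(block_rule_matches("othersubdomain.example.com", uri))
```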
Substring rules match a substring in a URI, blocking content. For example, the following substring rules block the URI http://www.example.com/test.html.
    - example
    - exam
    - test.html
    - ex*le
However, the following substring rule does not match and therefore does not block the URI http://www.example.com/test.html:
    - test2
The wildcard character (*) may be used within a substring rule. The wildcard character must match 0 or more of any character. Wildcard characters are greedy, meaning the wildcard will match as much text as possible.
The wildcard character must not be used in the string representing the domain within a domain rule. The wildcard character may be used in the optional string part of a domain rule.
The following example is valid because the wildcard character is used in the optional string part of the domain rule.
    +d example.com sub*string
The following rule is invalid because the wildcard character is used in the domain part of the domain rule.
    # Invalid!
    +d domain*.com substring
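One way to implement substring matching with the wildcard is to translate the pattern into a regular expression. The sketch below uses Python's `re` module; the function name `substring_rule_matches` is hypothetical, and case-sensitive matching is an assumption because the text does not address case.

```python
import re


def substring_rule_matches(pattern: str, uri: str) -> bool:
    """Return True if `pattern` (with optional '*' wildcards) occurs in `uri`."""
    # Escape the literal pieces and join them with a greedy "match anything".
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, uri) is not None


uri = "http://www.example.com/test.html"
for pattern in ("example", "exam", "test.html", "ex*le", "test2"):
    print(pattern, substring_rule_matches(pattern, uri))
```

Because a rule only reports whether a match exists, greedy and non-greedy wildcards give the same yes-or-no answer; the greedy form is used here to mirror the wording of the text.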
Filter lists may contain allow rules, block rules, and even duplicate rules that match the same URI.
When a user agent evaluates a URI against a filter list, it must follow this algorithm:
This algorithm effectively gives precedence to allow rules over block rules.
If a user-agent supports the use of multiple filter lists simultaneously, then all allow rules from all filter lists must be grouped together and all block rules from all filter lists must be grouped together, such that when the user agent evaluates a URI it first evaluates all allow rules from all filter lists and then evaluates all block rules from all lists. User-agents may remove duplicate rules in lists provided that the meaning of the rules is maintained after the removal of duplicate rules.
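The precedence described above, applied across several lists, can be sketched in Python as follows. This is an inferred illustration rather than the normative algorithm: the `Rule` representation and the `evaluate` function are hypothetical, and allowing a URI that matches no rule at all is an assumption.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: ("+" or "-", predicate that tests a URI).
Rule = Tuple[str, Callable[[str], bool]]


def evaluate(uri: str, filter_lists: Sequence[List[Rule]]) -> str:
    """Evaluate all allow rules from all lists first, then all block rules."""
    rules = [rule for one_list in filter_lists for rule in one_list]
    if any(sign == "+" and matches(uri) for sign, matches in rules):
        return "allow"
    if any(sign == "-" and matches(uri) for sign, matches in rules):
        return "block"
    return "allow"  # assumption: unmatched URIs are not blocked


list_a = [("-", lambda u: "example.com" in u)]
list_b = [("+", lambda u: "example.com/ok" in u)]
print(evaluate("http://example.com/ok.js", [list_a, list_b]))
print(evaluate("http://example.com/bad.js", [list_a, list_b]))
```

Grouping rules by sign before evaluation means that an allow rule in one list overrides a block rule in another, regardless of the order in which the lists were loaded.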
    Navigator implements NavigatorDoNotTrack;
Objects implementing the Navigator interface (e.g. the window.navigator object) must also implement the NavigatorDoNotTrack interface [NAVIGATOR]. An instance of NavigatorDoNotTrack would then be obtained by using binding-specific casting methods on an instance of Navigator.
The Do Not Track user preference is an HTTP header and a DOM property that webpages can use to detect and respect the user’s preference not to be tracked. By having both a header and a DOM property, websites can easily detect the user preference from both client and server code. When the Do Not Track user preference is set, the user-agent must apply the HTTP header to all HTTP requests, and the DOM property must be applied to all documents. The user agent is responsible for determining the user experience by which the Do Not Track user preference is enabled.
When the Do Not Track user preference is set, the HTTP request to the webserver for the document must have the following header:
DNT: 1
When the Do Not Track user preference is set on a document, the following expression must evaluate to true:
    window.navigator.doNotTrack == "1"
Websites that track users across multiple first-party websites must check for the presence of the Do Not Track user preference. If a website detects that this preference is enabled, it must disable any tracking code or collection of data that can be used for tracking purposes, regardless of the level of identification of the user.
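On the server side, detecting the preference amounts to inspecting the request headers. The following Python sketch is framework-independent; the function name `do_not_track_enabled` and the use of a plain dictionary for the headers are assumptions made for illustration.

```python
def do_not_track_enabled(headers: dict) -> bool:
    """Return True if the request carries 'DNT: 1'.

    HTTP header names are case-insensitive, so the lookup ignores case.
    """
    for name, value in headers.items():
        if name.lower() == "dnt":
            return value.strip() == "1"
    return False


request_headers = {"Host": "www.example.com", "DNT": "1"}
if do_not_track_enabled(request_headers):
    print("tracking disabled for this request")
```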
The following example is an Augmented Backus–Naur Form (ABNF) [ABNF] for the Filter List format.
    FilterList    = Header [lines]
    Header        = [UTF8BOM] "FilterList" EOL
    lines         = line *(EOL [line])
    line          = comment / key-value / rule
    comment       = "#" *(VCHAR / WSP)
    key-value     = ":" ALPHA 1*31(ALPHA/DIGIT) *WSP "=" *WSP 1*32(ALPHA/DIGIT)
    rule          = allow-rule / block-rule
    allow-rule    = "+" domain-exp
    block-rule    = "-" domain-exp / substring-exp
    domain-exp    = "d" 1*WSP string [substring-exp]
    substring-exp = 1*WSP wcstring
    string        = 1*(ALPHA/DIGIT)
    wcstring      = 1*(ALPHA/DIGIT/"*")
    UTF8BOM       = %xEF %xBB %xBF
    EOL           = [CR] LF
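Individual productions can be transcribed into regular expressions for experimentation. The Python sketch below does this for the comment and key-value productions only; the names `COMMENT`, `KEY_VALUE`, `is_comment`, and `is_key_value` are hypothetical. Read literally, the key-value production admits no whitespace between the colon and the key, which the last line of the example demonstrates.

```python
import re

# comment = "#" *(VCHAR / WSP)
COMMENT = re.compile(r"#[\x21-\x7e \t]*")

# key-value = ":" ALPHA 1*31(ALPHA/DIGIT) *WSP "=" *WSP 1*32(ALPHA/DIGIT)
KEY_VALUE = re.compile(r":[A-Za-z][A-Za-z0-9]{1,31}[ \t]*=[ \t]*[A-Za-z0-9]{1,32}")


def is_comment(line: str) -> bool:
    return COMMENT.fullmatch(line) is not None


def is_key_value(line: str) -> bool:
    return KEY_VALUE.fullmatch(line) is not None


print(is_comment("# Domain rule"))
print(is_key_value(":Expires = 10"))
print(is_key_value(": Expires=3"))  # space after the colon is not in the grammar
```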
4.2 Comments
A comment line must start with a number sign (#) character.