Why do-not-track is a very special case

Position Paper from Rigo Wenning for the W3C Workshop on Web Tracking and User Privacy

1. Background

W3C has been exploring the capabilities of policy languages and privacy protocols for a long time. Our active participation in privacy research was motivated by the aim of determining a future path towards enhancing privacy on the Web while preserving web architecture and socially useful business models. With the PRIME project, the feasibility of data handling via a metadata system was successfully explored. PrimeLife subsequently tried to transform the scientific findings of PRIME into real advantages for everyday use. Both projects also explored the possibility of negotiating a privacy handshake to come to an agreement. Issues and options were explored in workshops organized by W3C.

All workshops explored issues related to finding a common understanding between a service and a user of said service on what actually happens to the data exchanged and whether collection of a certain type of data is really necessary. The workshops touched on all kinds of situations and data contexts.

But do-not-track is a special use case with a very limited scope within the wide privacy field. It isn't about the occasional digital footprints we leave online, but about the massive-scale funneling of digital life trails into huge data warehouses to be sold to the highest bidder. Tracking at this scale, at this very moment, still requires a sizable infrastructure and thus sizable players. Not every individual is tracking every other individual.

2. Policy for Agreement

According to Merriam-Webster's dictionary, an agreement is a harmony of opinion. Harmony requires two or more parties; in our case, these are the parties that share expectations. P3P, PPL and other policy languages always try to get to some final result of that type. This final result may be called an agreement, where user and service are on the same page about what data is transferred and what it is used for. This is normally achieved by exchanging and matching policy files. One party sends a first file with semantics in it. The other party receives and parses the file and matches it against established preferences. In case of a mismatch on the receiver side, human intervention is usually necessary to either adapt the preferences and reach a match or interrupt the interaction.

Without this matching procedure, there are just provisional expectations flying past each other. But the matching and its consequences strongly depend on the protocol, namely on who sends the first policy file to open the debate. This opens another field of possible misunderstanding.
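The matching procedure described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the `Statement` type, the vocabulary of data and purpose terms, and the `match_policy` function are all hypothetical and do not correspond to the actual syntax of P3P or PPL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One policy statement: which data is collected and for what purpose."""
    data: str
    purpose: str


def match_policy(offered, accepted):
    """Return the offered statements not covered by the receiver's preferences."""
    return [s for s in offered if s not in accepted]


# The service sends the first policy file (hypothetical vocabulary).
service_policy = [
    Statement("email", "account"),
    Statement("clickstream", "advertising"),
]

# The receiving side holds previously established preferences.
user_preferences = {
    Statement("email", "account"),
    Statement("clickstream", "site-statistics"),
}

mismatches = match_policy(service_policy, user_preferences)
if mismatches:
    # A mismatch calls for human intervention: adapt or stop the interaction.
    print("mismatch:", [(s.data, s.purpose) for s in mismatches])
else:
    print("agreement")
```

Note that the sketch hard-codes who speaks first: the service offers and the user side matches. Reversing the roles changes who has to adapt on a mismatch, which is the protocol dependency mentioned above.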

3. Terminology is important

To avoid such misunderstanding, experience in research has shown the paramount importance of a commonly agreed terminology. Without a common terminology, privacy semantics are such that people use the same words for at least three different things. For example, the term data handler looks promising, but it could be applied to a consumer receiving a cookie, to a service acquiring form data and processing it, or to a third party processing data on behalf of a service. Only if everyone in the discussion is sure which party they are talking about, which role that party has, and who says what to whom and when, is an informed exchange possible that prevents surprises (for all sides!) down the road. It is very important for services and businesses that track to realize that they MUST block their political reflexes to inject fuzziness into the terms by redefining some of them or by refusing a clear ontology. Even those who think they will benefit from ambiguity will pay a high price once ambiguous statements are enforced and court rulings are not predictable.

The challenge on terminology comes at a time when an established set of terms is reaching its limits. Apart from being emotionally and politically loaded in transatlantic relations, the clear and useful terms data subject and data controller from the European Data Protection Directive have lost their precision, as we are all data controllers today. The peak of this development can be seen in the advent of social networking. Fortunately, the discussion about do-not-track does not break the semantic framework mentioned above, which remains usable for this specific case, provided the political dimension of the terms does not create obstacles that make the definition of a new set of terms necessary.

4. Services are still industrialized

Despite the advanced IT landscape, our society is still concentrated on creating an offer once that is then consumed many times. This is true for B2B and B2C scenarios. Let's imagine a site with a lot of content that is designed for public consumption. The business plan requires some refinancing via advertisement. Advertisement networks are selected and their applications included in the system. It takes months and months of work to get such a site going. Legal agreements go hand in hand with practical arrangements. Data flows are contracted first and then put into place technically.

This also means that the terms and conditions of an industrialized service can only be offered by the service. Those terms and conditions are also there to describe the offer with all its components and the obligations and performances attached to it. It can't (yet) be the other way around. Let's imagine a site with a map. While browsing the map, a user can choose to have hotels shown to him. The hotel information comes from a reservation service and is dynamically loaded via an API. The map also contains weather information loaded from a weather service via another iFrame. Imagine now that a user is able to tell the map site that he doesn't want to be tracked for hotel information but that tracking for the weather forecast is fine. The relations are set up such that the mapping service has a pre-packaged deal with the hotel reservation service. Do-not-track would mean that the map server records the preference of the user, informs all its subservices and dynamically changes the configuration of the site, so that loading the iFrames no longer triggers tracking by the third parties that receive a request from the mapping service via iFrame. The service will not set up such an option just because one in 10,000 users wants it.
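To make the cost of such an option concrete, here is a minimal sketch of what the map server would have to do for each user. Everything in it is an assumption made for illustration: the subservice names, the placeholder URLs, and in particular the `notrack=1` query parameter, which stands for whatever signal the map site and each subservice would first have to agree on contractually.

```python
# Hypothetical subservices embedded by the map site; URLs are placeholders.
SUBSERVICES = {
    "hotels": "https://hotels.example/embed",
    "weather": "https://weather.example/embed",
}


def frame_urls(no_tracking):
    """Build the iFrame URLs for one user.

    `no_tracking` is the set of subservice names the user does not want to
    be tracked by. The `notrack=1` parameter is an invented signal that each
    subservice would have to honour under its deal with the map site.
    """
    urls = {}
    for name, base in SUBSERVICES.items():
        if name in no_tracking:
            urls[name] = base + "?notrack=1"
        else:
            urls[name] = base
    return urls


# The user from the example: no tracking for hotels, weather tracking is fine.
print(frame_urls({"hotels"}))
```

The code itself is trivial. The expensive part is outside it: every subservice has to accept, implement and contractually guarantee the meaning of the signal, once per pairing of site and subservice.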

5. Why do-not-track is a special case

If the assumptions above are true, do-not-track is a very special case. It is a special case because it does not follow the usual setup of policy languages, and also because it needs to describe the industrialized setup of information flows in a detailed way. One could also say that we must describe the do-not-track offer alongside the other offers of using services that include the exchange of information for tracking. This is why everybody is asking about the semantics of do-not-track: the exact expected behavior is needed in order to build a service that can accommodate the wish not to be tracked.

So while in the protocols and services conceived so far an offer is created by the service and accepted by the user via the continued use of the service, we have here a scenario where a standard would have to define exactly what sending the header means. Policy languages can help to handle data that was acquired together with a do-not-track header. Policy languages can help express the precise meaning of the presence of such a header. But the way do-not-track is set up, the only way not to deceive user and service is to have one single set of semantics that applies equally to the user sending the header and to the service declaring conformance with do-not-track. Every attempt to wiggle out of fixed semantics will have to alter the protocol and start with the service offering one or more policy options and allowing the user to choose.
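The contrast with policy matching shows up in how little a conforming service can vary. The following sketch assumes that the header is named `DNT` and that the value `1` expresses the wish not to be tracked, as in later W3C drafts on tracking preference expression; this paper itself fixes neither the name nor the value. The function name `tracking_allowed` is hypothetical.

```python
def tracking_allowed(headers):
    """Return False when the request carries a do-not-track signal.

    Assumes the header is named `DNT` and that the value "1" means
    "do not track"; the position paper itself does not fix either.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("dnt") != "1"


print(tracking_allowed({"DNT": "1"}))       # signal present
print(tracking_allowed({"Accept": "*/*"}))  # signal absent
```

The function takes no policy argument and returns no list of mismatches. There is nothing to negotiate: what a service must do when it returns `False` has to be defined once, outside the code, and must be the same for every service that claims conformance.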

This is what P3P tried to achieve, so why do-not-track? Do-not-track and a fixed set of policy terms have the advantage of being simple and clear. They do not allow for a mismatch of terms, only for false claims of conformance. The latter is something the law knows well how to deal with; the former is not so easy.

Summary:

Do-not-track is a nice new lightweight feature. But it can't be used to express a policy language. It can only express a fixed set of rules or expected behaviors without room for negotiation. The exact content of the fixed set of behaviors has to be determined in a social process and will be the same for all services claiming adherence or conformance. If do-not-track means A on site X and B on site Y, we are back to asking ourselves how we can determine A and B before engaging with a site, and we are back to square one. For the privacy content and the values, I refer to my position paper submitted to the API-Privacy Workshop 2010.


Created by Rigo Wenning (rigo@w3.org), last update $Id: wenning.html,v 1.2 2011/03/31 15:49:52 dsr Exp $