W3C has been working on and exploring the capabilities of policy languages and privacy protocols for a long time. Our active participation in privacy research was motivated by the aim of determining a future path toward enhancing privacy on the Web while preserving Web architecture and socially useful business models. With the PRIME project, the feasibility of data handling via metadata systems was successfully explored. PrimeLife subsequently tried to transform the scientific findings of PRIME into real advantages for everyday use. Both projects also explored the possibility of negotiating a privacy handshake to come to an agreement. Issues and options were explored in workshops organized by W3C.
All workshops explored issues related to finding a common understanding between a service and its users about what actually happens to the data exchanged and whether collecting a certain type of data is really necessary. The workshops touched on all kinds of situations and data contexts.
But do-not-track is a special use case within the wide privacy field, with a
very limited scope. It isn't about the few digital footprints we have left
online, but about the massive-scale funneling of digital life trails into huge
data warehouses, to be sold to the highest bidder. At this very moment,
operating at that scale still requires substantial infrastructure and thus
larger players. Not every individual is tracking every other individual.
According to Merriam-Webster's dictionary, an agreement is a harmony of
opinion. A harmony requires two or more parties, which in our case are the
user and the service. They share expectations. P3P, PPL and other policy
languages always try to reach a final result of that type. This final result
may be called an agreement, where user and service are on the same page about
what data is transferred and what it is used for. This is normally achieved by
exchanging and matching policy files. One party sends a first file carrying
the semantics. The other party receives and parses the file and matches it
against established preferences. In case of a mismatch on the receiving side,
human intervention is usually necessary, either to adapt until there is a
match or to interrupt the interaction.
Without this matching procedure, there are just provisional expectations flying past each other. But the matching and its consequences strongly depend on the protocol, namely on who sends the first policy file to open the debate. This opens another field of possible misunderstanding.
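The exchange-and-match procedure described above can be illustrated with a minimal sketch. It does not reproduce P3P or PPL syntax; the `Statement` type, the vocabulary of data items and purposes, and the `match_policy` helper are all hypothetical and serve only to show where a mismatch forces human intervention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """One policy statement: which data is used for which purpose (illustrative)."""
    data: str
    purpose: str


def match_policy(offered, accepted):
    """Return the offered statements that the receiver's preferences do not cover."""
    return [s for s in offered if s not in accepted]


# The service opens the exchange by sending its policy file first.
service_policy = [
    Statement("email", "account"),
    Statement("clickstream", "advertising"),
]

# The receiving side holds previously established preferences.
user_preferences = {
    Statement("email", "account"),
    Statement("clickstream", "site-analytics"),
}

mismatches = match_policy(service_policy, user_preferences)
if mismatches:
    # Mismatch on the receiving side: a human has to adapt or abort.
    decision = "ask-user"
else:
    decision = "proceed"
```

In this sketch the service sends first and the user agent matches. Reversing the roles would change who has to react to a mismatch, which is the protocol dependence noted above.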
To avoid such misunderstanding, research experience has shown the paramount
importance of a commonly agreed terminology. Where no common terminology is
established, privacy semantics are such that people use the same words to talk
about at least three different things. For example, the term data handler
looks promising, but it could be applied to a consumer receiving a cookie, to
a service acquiring form data and processing it, or to a third party
processing data on behalf of a service. Only if all participants in the
discussion are really sure which party they are talking about, which role that
party has, and who says what, when, and to whom, is an informed exchange
possible that prevents surprises (for all sides!) down the road. It is very
important for services and businesses engaged in tracking to realize that they
MUST block their political reflexes to inject fuzziness into the terms by
redefining some of them or by refusing a clear ontology. Even those who think
they will benefit from ambiguity will pay a high price once ambiguous
statements are enforced and court rulings are not predictable.
The terminology challenge comes at a time when an established set of terms is
reaching its limits. Apart from being emotionally and politically laden in
transatlantic relations, the clear and useful terms data subject and data
controller from the European Data Protection Directive have lost their
precision, as we are all data controllers today. The peak of this development
can be seen in the advent of social networking. Fortunately, the discussion
about do-not-track does not explode the semantic framework mentioned above,
which remains usable for this specific case, provided the political dimension
of the terms does not create obstacles that make the definition of a new set
of terms necessary.
Despite the advanced IT landscape, our society is still centered on creating an offer once that is then consumed many times. This is true for B2B and B2C scenarios. Let's imagine a site with a lot of content designed for public consumption. The business plan requires some financing via advertising. Advertising networks are selected and their applications are integrated into the system. It takes months and months of work to get such a site going. Legal agreements go hand in hand with practical arrangements. Data flows are contracted first and then put into place technically.
This also means that the terms and conditions of an industrialized service can
only be offered by the service. Those terms and conditions are also there to
describe the offer with all its components and the obligations and
performances attached to it. It can't (yet) be the other way around. Let's
imagine a site with a map. While browsing the map, a user can choose to have
hotels shown to him. The hotel information comes from a reservation service
and is dynamically loaded via an API. The map also contains weather
information loaded from a weather service via another iFrame. Imagine now that
a user is able to tell the map site that he doesn't want to be tracked for
hotel information, but that tracking for the weather forecast is fine. The
relations are set up such that the mapping service has a pre-packaged deal
with the hotel reservation service. Do-not-track would mean that the map
server records the user's preference, informs all its subservices, and
dynamically changes the configuration of the site so that loading the iFrames
no longer triggers tracking by the third parties that receive a request from
the mapping service. Just because one in 10,000 users wants this option, the
service will not set it up.
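To make concrete what such a per-user option would demand of the map site, here is a minimal sketch of the dynamic configuration step. The sub-service names, the URLs, and the `build_page` helper are hypothetical; the sketch only shows that the map server would have to carry the user's preference into every embed it generates and relay it to each sub-service.

```python
# Hypothetical sub-services embedded by the map site (names are illustrative).
EMBEDS = {
    "hotels": "https://hotels.example/embed",
    "weather": "https://weather.example/embed",
}


def build_page(no_track):
    """Return the embeds to load, flagged with the user's tracking preference.

    no_track is the set of sub-service names the user does not want to be
    tracked by. Each entry records whether tracking is allowed, which the
    map server would have to relay to the corresponding sub-service.
    """
    return [
        {"service": name, "url": url, "tracking_allowed": name not in no_track}
        for name, url in EMBEDS.items()
    ]


# A user who refuses tracking for hotels but accepts it for weather.
page = build_page(no_track={"hotels"})
```

Even this toy version presupposes that each sub-service honors the relayed flag, which is exactly the kind of arrangement that is normally contracted once in advance rather than configured per user.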
do-not-track is a special case
If the assumptions above are true, do-not-track is a very special case. It is
special because it does not follow the usual setup of policy languages, and
also because it needs to describe the industrialized setup of information
flows in a detailed way. One could also say that we must describe the
do-not-track offer alongside the other offers of using services that include
the exchange of information for tracking. And this is why everybody is asking
about the semantics of do-not-track: the exact expected behavior is needed in
order to build a service that can accommodate the wish not to be tracked.
So while in the protocols and services conceived so far an offer is created by
the service and accepted by the user through continued use of the service, we
have here a scenario where a standard would have to define exactly what
sending the header means. Policy languages can help to handle data that was
acquired together with a do-not-track header. Policy languages can also help
to express the precise meaning of the presence of such a header. But the way
do-not-track is set up, the only way not to deceive user and service is to
have one single set of semantics that applies equally to the user sending the
header and to the service declaring conformance with do-not-track. Every
attempt to wiggle out of fixed semantics will have to alter the protocol and
start with the service offering one or more policy options and allowing the
user to choose.
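The difference from policy matching can be shown with a short sketch of how a conforming service might interpret the header. It assumes the header is named `DNT` and that the value `1` expresses the wish not to be tracked; the text above fixes neither. The entries in `FIXED_SEMANTICS` are illustrative placeholders, not agreed semantics.

```python
# Assumption: the header is named "DNT" and the value "1" means
# "do not track"; the surrounding text does not fix a name or value.
DNT_HEADER = "dnt"

# The single, fixed set of behaviors a conforming service promises.
# These entries are illustrative placeholders, not agreed semantics.
FIXED_SEMANTICS = frozenset(
    {"no-third-party-profiling", "no-cross-site-identifiers"}
)


def obligations(headers):
    """Return the obligations triggered by a request's headers.

    Header names are compared case-insensitively. There is no negotiation:
    either the header is present with value "1" and the fixed set applies,
    or the request is handled as usual.
    """
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if normalized.get(DNT_HEADER) == "1":
        return FIXED_SEMANTICS
    return frozenset()
```

Note that the function has no per-site or per-user parameter for the meaning of the header. Any such parameter would reintroduce the policy exchange that the fixed semantics are meant to avoid.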
This is what P3P tried to achieve, so why do-not-track? Do-not-track with a
fixed set of policy terms has the advantage of being simple and clear. It does
not allow for a mismatch of terms, only for false claims of conformance. The
latter is something the law knows well how to deal with; the former is not so
easy.
Do-not-track is a nice new lightweight feature. But it can't be used to
express a policy language. It can only express a fixed set of rules or
expected behaviors, without room for negotiation. The exact content of that
fixed set of behaviors has to be determined in a social process and will be
the same for all services claiming adherence or conformance. Because if
do-not-track means A on site X and B on site Y, we are back to asking
ourselves how we can determine A and B before engaging with a site, and we are
back to square one. For the privacy content and the values, I refer to my
position paper submitted to the API-Privacy Workshop 2010.