Privacy - Why on earth should we care?

Position Paper from Rigo Wenning for the W3C Workshop on Privacy for Advanced Web APIs

1. Introduction

There are several concepts of privacy in the world. Consequently, W3C as such cannot have a fixed opinion of what Privacy is or should be. W3C's responsibility is to help build systems that work across the large variety of privacy concepts worldwide. In order to understand where and how Working Groups and developers can help, it is very useful to understand why Privacy is important and what the aim of this protection is.

Fortunately, a lot of harmonization has already been achieved via organizations like the OECD. There are common principles and guidelines. But most privacy legislation and regulation is formulated in a best-effort way. This means that seeking full compliance in a sector like the Web would be very hard to achieve. The high expectations set by most regulation are sometimes counterproductive: they put so much burden on doing the right thing that it becomes much easier to do nothing and put the effort into features that are more appreciated by users and customers. To counter that feeling, let us seek incremental improvements to Privacy on the Web platform.

As can be deduced from the above, this paper argues against purely administrative approaches towards privacy, against a bean-counting exercise over which single data item or type is transferred for which purpose and whether this is legitimate in certain jurisdictions but not in others. It is urgent for the Workshop participants to push their thinking one level up. New features and possibilities should be scrutinized in light of the goals behind Privacy constraints.

We are also confronted with the trouble that users do not care much about Privacy until there is an incident involving them. Large amounts of data can be collected and lie dormant in a silo. Only when they re-surface and do harm is the outcry big. But often, it is hard or impossible to remedy the situation, erase the data and re-start with a blank sheet.

2. Why Privacy

As an anecdote, let me remind you of the story behind the birth of the right to privacy, formulated by Louis Brandeis, later a Supreme Court Justice, in 1890. Brandeis had a law firm in Boston with a partner, Samuel D. Warren. At that time, the technical advances in photo cameras were such that journalists were just about able to carry them around. Warren was photographed walking arm in arm with a very nice woman who was not his wife: in those days, a rather grave sin. The photo was published in a newspaper. The consequences nearly killed the law firm and Warren's career. Brandeis formulated the right to be let alone. Today again, we have new technology and new challenges, but still the same society that sanctions certain behaviors, links, contacts and so on.

In the sixties and seventies, the mainframes used by governments and large corporations were opening new possibilities. Governments talked about registering the entire population in databases with a unique number per person. Alan Westin's famous books, Databases in a Free Society and Privacy and Freedom, triggered the larger discussion on data protection. The data protection aspect of Privacy was born. Westin and others demonstrated the dangers of large databases and warned against automatic decision making without human intervention. The chilling effects of surveillance and the power of superior knowledge held by authorities would endanger democracy. A peak was reached in 1983, when Germany planned a very large census that was not accepted by the population. The German federal constitutional court dismissed the census and created the right to informational self-determination: this basic right warrants the capacity of the individual to determine in principle the disclosure and use of his or her personal data. Limitations to this informational self-determination are allowed only in case of an overriding public interest. This principle is the basis for the European data protection system. In the US, the Notice & Choice approach was accompanied by some sectoral regulation. But all in all, the US system is significantly weaker.

Informational self-determination started a paradigm in which watchmen looked at every single data item, its transfer and its processing. As a long-time privacy advocate, I was in the dark for a long time, counting beans like so many of us: must this personal data be transferred, why, how, can it be done under a pseudonym, would a bug leak information, and so on. Is anonymity the ultimate goal? But all this did not satisfy my curiosity about why Privacy is such a big thing. What is the reason we do all this? Can we do without?

Beate Rössler from Amsterdam University opened my eyes with a book, "Der Wert des Privaten" (The Value of Privacy), in which she diligently and scientifically works out the variety of privacy concepts in philosophy, the fears, claims and complaints, leading to one overarching architectural term: autonomy! This is the overarching concept I found. It explains why we are talking about a human right. She showed that the various aspects of privacy, explained later in this paper, culminate in an environment where the individual adapts behavior because of justified or unjustified fears of consequences. Those consequences can range from a loss of image and standing in society up to penal sanctions. Every one of us sometimes does things wrong. Regulation is so complex that one necessarily infringes one rule or another. Thus, a system of surveillance, or even perceived surveillance, creates an environment where people hold back opinions, no longer act freely and constrain their behavior in various ways. This has a direct effect on the overall formation of opinion that is so important for a democratic society. Whatever we create in features and new technology, we constantly have to question whether what we do is in line with the overall goal: the personal autonomy of the people who use the Web. The term autonomy is meant as autonomy of decision. This is still broad and abstract, but with the examples given later, a system will appear and become comprehensible.

Privacy discussions always come up when society is confronted with a large technological step. With the advent of the Web, the mobile Web and social networking, we are confronted with such a step. The question is what distinguishes it from past technologies and justifies a different approach. It may be the fact that society is now networked at every moment in time. It is also important to determine the new dangers this technological step poses to our autonomy and our autonomy of decision.

3. Privacy has a thousand faces

A laptop is stolen or forgotten in a bus. There were 7 million health data records on it. The Press writes: Privacy incident!

A woman is filmed by a security camera in a bank. Her son leaves traces of dog shit on the carpet, and the bank correlates video and accounting information to determine her identity and invoice the cleaning of the carpet to her. The Press writes: Privacy incident!

A girl takes a picture of her friend and puts it under a Creative Commons license on a photo-sharing site. A large company uses the photo for a national advertising campaign. The friend is confronted with the issues around becoming a celebrity overnight without having chosen to be one. The Press writes: Privacy incident!

A boy publishes ugly party photos on a social networking site. Looking for a job, he wonders why he gets rejected all the time until somebody points him to the photos. He tries to erase the traces but does not succeed, as those photos were a hype at some point in time and are spread all over the place. The Press writes: Privacy incident!

A person searches for several diseases and now gets ads about health insurance and medical services on every site she visits. The Press writes: Privacy incident!

A person gets strange invoices for services. On asking, she discovers that someone collected enough information from her blog, her social networking presences, Twitter and other available sources to make services (like banks and shops) believe that this someone is really her. Her trouble is now to prove to each service that this someone is not really her, so that she does not have to pay all the invoices, gets her account re-credited and so on. The Press writes: Privacy incident!

3.1 Security

Very often, Privacy and Security are used interchangeably for things that are purely related to Security. This has a long tradition: already in the seventies, the data protection debate also addressed things like data quality (for decisions) and securing personal data. But there are no security technology questions specific to Privacy. Whether it is business data or personal data that is secured in a reliable way does not make much of a difference. The main discussion in this field is how much (expensive) security technology is applied to personal data. The concept of autonomy gives a vague answer: the stronger the impact on one's image, reputation and so on, the stronger the impact on one's autonomy, and the more security technology is needed. It is simply proportional to the value of the data. And the value of the data is determined by the goals of Privacy.

3.2 Image and reputation control

Social networks teach us that people do a lot to gain reputation and recognition. Reputation among peers is a great motivation. However we concretely define reputation, a given person judges how much confidence to place in another person by extrapolating past behavior to expected behavior. Now, if all past behavior is known, and recalling the remark above that we all make mistakes, forging our image towards society is also a matter of control over who knows what about us. This system is perfectly exposed in the behavior of movie stars and their aim to keep things private so as not to damage their image. In this extreme, some private space is also a necessity to keep the mind sane.

Every one of us has a variety of different roles in society. There is private life, work life, social engagement, friends, sports and so on. All those roles come with different rules and expectations, a matrix hardly transposable into a computer program. In all those roles, humans are rather good at displaying a certain image that may vary from role to role. If information from one role spills over to another role, the image is tainted, resulting in a risk of decreased social success. This is the very reason we see people fighting to prevent certain information about them from leaking to the public. And this is not only an issue for movie stars; it is an issue for everybody.

Spill-over can come from the deliberate release of information in one context that re-surfaces in another. It can come from the data trails we leave on the net, which surface in some way to people of a different role or context. The most tangible and urgent issue is people who communicate with friends on social networks and do not realize that an employer can look up that information, contradicting the polished image they present when applying for a new job. But information does not only spill over between roles. Time is also a very important axis to think along. Would you want to be confronted with the stupid things you did at age 16? Unfortunately, the Web to this day has no delete button, and making one would be a herculean task. So carefully releasing information is the only remedy we have so far.

3.3 Unwanted disclosure

If we cannot erase information and the only option is its careful release, this has consequences not only for oneself; it also creates responsibility for the data about others that one controls.

This may happen in a small context: compromising information can be released by the data subject (the person the information is about), but also by third parties. DAP is a good example, with the release of information from the address book. Imagine a person of doubtful renown whose address book contains your name and address. To all appearances there is a link between that person and yourself. Now imagine that this person of doubtful renown is a nice person who would not tell anyone that you are in his or her address book. But with DAP, a third party would now be able to access that address book and find your name. This third party may think: interesting! Or it may release this fact to many other people. The fact that someone is in President Obama's address book is a fact of interest. How could you find out? With DAP and no protections, that would not be a problem. This happens in practice today when a social networking platform asks for access to your entire address book.

This may happen in a bigger context: compromising information from third parties can be acquired by the service actually used. The trade in profiles falls into this category. Correlating large amounts of information can identify persons by patterns. Under such circumstances, users cannot really determine whether the release of a certain piece of information allows the service to identify them with the help of third-party data available to it.

But how would we know whether somebody knows something about us that may taint our image? On the Internet and the Web we are well aware of the effects of fear, uncertainty and doubt. In the analog world, two factors help the data subject: paper has to be handed over, and its spread is limited by the physical medium. So there is a willful act of some kind. The issue people fear most with the new devices and the networked society is that human senses are not able to detect what is going on. This can lead to the fear that everything one does is assumed to be known, which has been scientifically shown to have disastrous effects on the self-confidence with which one dares to defend minority opinions.

In this case, transparency is key, as the user should know that released information is correlated with third-party information. But if a service admitted this, only few users would still release data, as fear of the unknown and incomprehensible would press them into cautious behavior.

3.4 Equity in decisions

The equity aspect is less one of disclosure than of accuracy: how do we make sure information is accurate where accuracy is needed? Have you ever been put on the no-fly list or on a list of suspicious people by accident? How do you get off again if being on the list is in no way justified?

What about decisions concerning real people that are taken automatically and solely on the basis of the available data? They can be badly wrong and occupy the data subject for a long time with rectification tasks. The EU answer here is subject access. An open question remains whether the industry will take on the laborious task of defining an API for subject access, which in turn needs good identity management so as not to convey the information to the wrong people.

The most prominent question is in which situations we need accurate information and in which situations accurate information can even be harmful. In geolocation, we need accurate location information for rescue, but we also need the possibility of fuzzy or false information to defend against tracking by other people. In geolocation, we additionally have the issue that an exact location may allow physical access to a person, with potentially great harm. How is a device to distinguish between a malicious and a helpful third party tracking one's trail?
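The fuzzy-information idea can be sketched in a few lines. This is a hypothetical illustration, not part of any geolocation specification; the function name and the rounding policy (two decimal places, roughly a kilometer) are assumptions made for the example:

```typescript
// Hypothetical sketch: coarsen a precise position before it is handed to
// an untrusted site, while a trusted rescue service would receive full
// precision. Names and rounding policy are illustrative only.
interface Position {
  lat: number;
  lon: number;
}

// Rounding to 2 decimal places keeps the position to roughly a kilometer.
function coarsen(p: Position, decimals: number = 2): Position {
  const f = Math.pow(10, decimals);
  return {
    lat: Math.round(p.lat * f) / f,
    lon: Math.round(p.lon * f) / f,
  };
}

const exact: Position = { lat: 48.85837, lon: 2.29448 };
console.log(coarsen(exact)); // about { lat: 48.86, lon: 2.29 }
```

A browser could apply such a transformation by default and release the exact position only after an explicit user decision.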

Technically, this is very hard to find out, but there is a very simple solution: Ask the user! Under normal circumstances the user will know.

3.5 Fraud

Everybody talks about identity theft. Like network neutrality, identity theft is an umbrella term for many of the aspects already described above. One aspect is security. Identity theft is more or less a preparatory action that enables other, more harmful actions: draining accounts, ordering goods, obtaining administrative permissions and many more. Identity theft has more to do with Security than with the Privacy goals stated here. In fact, the remedy proposed today against identity theft is the promotion of strong identity systems. But such systems would make Privacy harder, as mostly only one identity (the governmental one) is proposed. This one digital identity is then used in all the roles a human has, online and off-line, and will foster information spill-over from one role to another with great potential for harm. The goal of strong identity systems is mainly to protect the service against financial losses in case a user can prove that the identity given to the service was fake.

But there is also a psychological aspect to this. We have seen that people build up their reputation with a lot of effort. A third party able to fake an identity can abuse that reputation, or destroy it. Not only since Freud have we known that the "me" is important to oneself. Another person being me can be quite disturbing, thus again influencing our autonomy of action and decision.

4. Who is responsible?

The ever-recurring discussion is who should bear the burden of implementing measures for the protection of the user: the people making sites or the developers making browsers?

4.1 Browsers

The argument is often heard that the browser is just a tool. This tool would only mediate the interaction between a service and a consumer or user. It is just an API, after all.

This argument falls short because the browser and/or API, by its features and possibilities, determines the framework of information exchange between the service and the user (see Lawrence Lessig's "code is law" for the full development of this idea). Consequently, the browser can open new fields of interaction, and DAP is the best example of this. As the browser defines the feature matrix, it also has some leverage concerning the limitation of features. So the browser can set hard limits by defining a reduced interaction matrix.

Consequently, the question for the browser to answer is, for example: does it make sense to allow a third party to access users' address books without user interaction? If people say no for privacy reasons, the browser can implement a hard limit that does not allow access without interaction (whatever interaction means here).
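Such a hard limit can be expressed in the very shape of the API. The sketch below is purely illustrative; none of the names come from an actual DAP draft. It assumes the browser supplies a consent prompt, and the point is structural: without a positive user decision, no data flows at all:

```typescript
// Hypothetical sketch of an address-book API with a built-in hard limit:
// the user's decision is a mandatory step, so a site cannot even formulate
// a request that bypasses the interaction.
type Contact = { name: string; email: string };

function requestContacts(
  askUser: (origin: string) => boolean, // the browser's consent prompt
  origin: string,
  book: Contact[]
): Contact[] {
  if (!askUser(origin)) {
    // The hard limit: no interaction, no data.
    throw new Error("user denied address book access for " + origin);
  }
  return book;
}
```

The design choice here is that access control lives in the API signature itself rather than in site-side good will; a site that never triggers the prompt simply never sees the data.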

But the browser cannot control exchanges within the defined matrix. As we know from the discussion around dual-use goods (think of cryptography, for example), within a defined environment of interaction, both good and bad behavior are possible. In this context, a browser can only help people. Even within the defined matrix of exchanges, bad behavior often follows recognizable patterns. A browser detecting such patterns may ask the user whether this is OK in the given context. If the browser is just a tool, who would want a car where they forgot to plan for brakes? But the browser or API will be unable to control such behavior entirely, as that would impose the developer's will on the user.

4.2. Services

Services are first in line when it comes to compliance with regulation. They have a difficult balance to strike between the quality of interaction, income from profile building, and user confidence gained by avoiding behavior that is seen as harmful.

As services use the framework given by the protocols and the browser, it is much harder to find technical solutions within the large variety of interaction possibilities. Therefore, best practices and technical solutions to implement them may be a good first step.

Agreements are very important in privacy. They are the proof of an informed decision. A service promises to do something with data, and a user agrees to it. This is the core of informational self-determination. One of the issues today is how to record the agreement of the user in an easy way and store it such that it can serve as evidence in case of dispute. This can be achieved in a rather lightweight way using timestamps and other means that make it hard to change the data recorded for the agreement on both sides.
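One lightweight way to make such a record tamper-evident is for both sides to keep a cryptographic hash over a canonical form of the agreement; changing any field afterwards changes the hash. A minimal sketch, assuming Node's crypto module and invented field names:

```typescript
import { createHash } from "crypto";

// Hypothetical consent record; the fields are illustrative only.
interface ConsentRecord {
  user: string;
  service: string;
  purpose: string;
  timestamp: string; // ISO 8601
}

// Hash a canonical serialization (keys in sorted order) so both parties
// can later detect any modification of the stored agreement.
function sealConsent(r: ConsentRecord): string {
  const canonical = JSON.stringify(r, Object.keys(r).sort());
  return createHash("sha256").update(canonical).digest("hex");
}

const record: ConsentRecord = {
  user: "alice@example.org",
  service: "photos.example",
  purpose: "display uploaded photos publicly",
  timestamp: "2010-07-09T22:08:14Z",
};
const seal = sealConsent(record);
// Any later change to the record yields a different seal.
```

A real deployment would add a trusted timestamp or a signature from each party, but even this simple scheme makes silent alteration of the agreement detectable.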

Services bear the responsibility to inform the user about the usage they will make of the user's personal data. The more information a service provides in machine-readable form, the more a browser will be able to help the user understand what is happening, provided the browser cares about this information.
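What "machine readable" could mean here can be sketched briefly. W3C's P3P work went in this direction; the structure below is a simplified, invented illustration and follows no actual specification:

```typescript
// Hypothetical machine-readable privacy statement a service might publish.
interface PrivacyStatement {
  purposes: string[];     // what the data will be used for
  retentionDays: number;  // how long the data is kept
  sharedWith: string[];   // third parties receiving the data
}

const statement: PrivacyStatement = {
  purposes: ["order-fulfillment"],
  retentionDays: 30,
  sharedWith: [],
};

// A browser could match such a statement against the user's preferences
// and warn before any data is released.
function acceptable(s: PrivacyStatement, maxRetentionDays: number): boolean {
  return s.retentionDays <= maxRetentionDays && s.sharedWith.length === 0;
}
```

The thresholds checked in `acceptable` are of course a user preference; the value of the machine-readable form is that the browser can evaluate them automatically instead of asking the user to read a policy document.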

But services also interact with other services. A service interacting with a consumer may subcontract another service to help fulfill the contract. One issue is how constraints from the context Service1 - user are transported to the relation Service1 - Service2. There are interoperability challenges to overcome. The API definition can have a secondary effect here if data types are defined and re-used in the context Service1 - Service2.

Created by Rigo Wenning (rigo@w3.org), last update $Id: privacy-ws-37.html,v 1.3 2010/07/09 22:08:14 roessler Exp $