The Transparency Paradox: Privacy design strategies for open information networks

Extended Abstract*

Daniel J. Weitzner
Technology & Society Domain Leader
World Wide Web Consortium

Principal Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

The Challenge of Transparency

The challenge that new computer, network, and sensor technologies pose for privacy is now beyond dispute. Could it be that the only way to protect fundamental privacy values is through greater exposure of personal information? After thousands of years of code-making, cryptographers learned that security by obscurity is no security at all.[1] While there are important differences between security and privacy, could it be that privacy by obscurity is about to go the way of security by obscurity? This paper suggests that system architects, privacy advocates, and government regulators ought to pay increasing attention to transparency-building strategies as cornerstones of privacy protection in social spaces increasingly infused with data gathering and analysis capacity. Treating transparency as a primary privacy value, as opposed to, for example, collection limitation and minimization, represents a not-so-subtle change from traditional privacy thinking. David Brin[2] is best known for suggesting that we embrace transparency, and his work has been treated with considerable skepticism in the privacy community. This paper will argue that fundamental changes in the technology we choose to adopt, as well as limitations in privacy protection regulatory frameworks, compel us to take a more careful look at what transparency has to offer.

The Motivation for Transparency

Three technical phenomena should encourage privacy-sensitive system designers to rethink their approach to privacy protection: first, the gradual demise of stove-pipe applications in favor of enterprise-wide data integration; second, the rapidly declining cost of web-scale query; and third, the rapid spread of sensor networks in both public and private settings. While the legal limits on intrusion have changed very little over the last thirty years (even with the Patriot Act), the actual ability of both police and private sector data collectors to learn about our private lives has increased dramatically. The level of privacy protection we experience has always been determined by a combination of what is legally permissible and what is technically possible. The dramatic privacy impact of cheap, web-scale data integration is visible in our lives today through the operation of existing systems such as credit card fraud detection networks, the location-aware vehicle guidance and telemetry systems built into many cars today (OnStar), transponder-based toll collection systems that also seem to monitor traffic flow, and the proximity cards tied to individual identity that are increasingly common in new office buildings. The United States Defense Department attracted a firestorm of criticism with its efforts to build data networks that would facilitate large-scale profiling of individuals in order to identify potential terrorists. This Terrorism Information Awareness (TIA) program directed the public's attention to the vast new surveillance and analysis powers that could be in the hands of the government. Responding to public fear, the US Congress required that DARPA cease work on TIA. While some privacy advocates claimed victory, it is worth looking more carefully at what was and was not actually accomplished: some suggest that all that happened was to push the development of this particular sort of surveillance technology out of public view and beyond the reach of congressional oversight. A final irony in the TIA debate is that the main TIA-originated research activity no longer funded at DARPA is the work on privacy protection.

The pervasive inferencing (and intrusion) envisioned by TIA is with us already, regardless of whether DARPA continues to fund it. When I receive a call from my credit card company inquiring whether I really intended to make a $3500 cash advance in Atlantic City the previous evening, I experience simultaneously the pros and cons of cheap, large-scale data integration and analysis. The credit card company has built a profile of me detailed and intrusive enough to know that I was not likely to be gambling late on a Saturday night, resulting in the prevention of a crime, but also in the creation of a highly invasive picture of my life. On the roads, hundreds of thousands of people implicitly consent to location tracking through systems such as EZPass in exchange for saving a few minutes and a few pennies at tollbooths. In many cases, individuals are compelled to use invasive technologies such as proximity/RFID cards for building security as a condition of employment or participation in various institutions (from universities to health clubs). Beyond convenience or institutional pressure, public safety will be a compelling justification for scaling up data collection. For example, in the aftermath of the weeks of attacks by snipers John Allen Muhammad and Lee Boyd Malvo in the Washington, DC area in 2002, the more than twenty police agencies involved in the joint investigation realized that, by the time several days had passed and five victims had been killed, their many different and disconnected investigators had run the snipers' license plate more than ten times. Given the disconnected state of their systems, the police could only learn this after the fact, but basic enterprise data integration techniques could have spotted this very real pattern on day eight, before the next victims were killed. No one would responsibly claim that the coincidence of sightings established guilt, but it certainly raised suspicion. The grim irony in this case is that such pattern analysis would have been perfectly legal, but was not done. One can only expect that the public, despite misgivings about loss of privacy, will at least come to tolerate, if not expect, the deployment of this sort of analytic capability.
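To make the point concrete, the following minimal sketch (with entirely hypothetical plate numbers, field layouts, and thresholds) shows the kind of basic cross-agency data integration the investigators lacked: each agency keeps its own log of license-plate sightings near incident scenes, and pooling those logs to count repeat plates surfaces exactly the sort of suspicion-raising pattern described above.

```python
# Hypothetical sketch of cross-agency plate-sighting integration.
from collections import Counter
from itertools import chain

# Each agency's log: (plate, timestamp, location) tuples -- illustrative data only.
agency_logs = {
    "agency_a": [("XDR-456", "2002-10-03T08:15", "Aspen Hill"),
                 ("KLM-901", "2002-10-03T08:20", "Aspen Hill")],
    "agency_b": [("XDR-456", "2002-10-07T18:02", "Bowie")],
    "agency_c": [("XDR-456", "2002-10-09T20:45", "Manassas")],
}

def repeat_plates(logs, threshold=3):
    """Return plates sighted near incident scenes at least `threshold` times
    across all agencies -- a suspicion-raising pattern, not proof of guilt."""
    counts = Counter(plate for plate, _, _ in chain.from_iterable(logs.values()))
    return {plate: n for plate, n in counts.items() if n >= threshold}

print(repeat_plates(agency_logs))  # {'XDR-456': 3}
```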

In the past, stove-piped applications (which limited data integration) and the formerly high cost of query across networks of data were among the best friends of privacy. But in an era where data integration is becoming easier across applications, and web-scale searching is offered free to the public by Google and others, we can no longer rely on technical impediments to privacy intrusion. The one thing we must NOT conclude in response to these hard problems, this paper will argue, is that privacy has somehow been superseded by 21st-century information technology. Indeed, the increased data collection and inferencing power in today's information environments makes support for fundamental privacy values all the more important.

Legal privacy protection options

What tools do we have to help manage the growing transparency of personal information on the Web? National laws of most democratic countries and international human rights treaties do contain baseline support for many important privacy principles. But especially in the realm of law enforcement access (wiretapping and other electronic surveillance), the basic approach of privacy laws is fundamentally inadequate to address the privacy-intruding power of today's interconnected data networks.

Most privacy laws place heavy reliance on limiting the collection of information. The European Union Data Protection Directive (considered the global gold standard for privacy protectiveness) requires that private sector data collectors limit collection to those data elements that are necessary for the original purpose of the transaction. The United States Constitution's Fourth Amendment also regulates what information (that is, evidence) may be collected, by limiting the evidence-gathering process to searches or seizures that are 'reasonable.' However, neither the Fourth Amendment nor wiretapping statutes impose any limits on what the government can do with information once it has been collected. In legal terms: there are collection limitations but no use limitations. From a practical standpoint, once the data is gathered, any kind of inferencing at all can be performed on the information collected.[3] The weakness of current privacy law is even more pronounced when you consider that much of the revealing transactional data (bank records, credit card bills, logs from location-aware sensors, and most other sensitive data held by third parties) is available to law enforcement (and often to the private sector, too) simply upon request. If the police believe it is relevant to an investigation, all they have to do is fill out a subpoena form and get the data; no judge or other impartial body assesses whether the data is actually required. What's more, in the post-9/11 world, it is easier for law enforcement to supplement analysis of evidence with publicly available data. Because privacy laws are mostly targeted at limitations on the collection of data but leave inferencing power largely unchecked, we have a serious gap.

Technical options for privacy protection

Are there technical tools that we might use to fill the gaps in existing law? The cryptography and computer security communities have historically devoted much attention to applying anonymization, pseudonymization, and de-identification techniques to privacy problems. Recognizing the huge leap in inferencing power and the corresponding gap in the regulation of inferencing, some have recently proposed using secure, private multiparty computation (SPMC) algorithms.[4] These complex cryptographic techniques allow the distributed evaluation of a function by a number of parties. They are currently the focus of much privacy-related research and are touted by some as the solution to the daunting privacy challenges associated with data mining. Most relevant for our purposes are functions that query a series of databases looking for certain patterns that could indicate terrorist activity, fraud, or other behavior of concern whose signature could be hidden in a large collection of distributed databases. SPMC's cryptographic magic allows the function to be evaluated over data held by each party in a manner that nevertheless enables each party to keep its own data secret from the others. Applied to the law enforcement context, these techniques are offered as a means of allowing far-reaching inferences across public and private data stores while limiting the amount of personal information available to the search agents. In this scenario, once these wide-ranging inferences turn up suspicious data, the personal information for those records alone would be turned over to law enforcement. How far does this go in closing the gaps in the regulation of powerful inferencing? In tightly controlled applications, these algorithms do appear promising (modulo the fact that leading cryptographic researchers say there is still a large amount of mathematics and engineering to do in order to make this work). Properly deployed, these algorithms could help assure that distributed database queries succeed while limiting the disclosure of personal information.
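As an illustration of the underlying idea (not of the specific approximation protocols cited in [4]), the toy sketch below uses additive secret sharing, a standard building block of secure multiparty computation, to let three hypothetical database holders learn the total number of records matching a watchlist pattern without any of them revealing its own count.

```python
# Toy illustration of the idea behind secure multiparty computation:
# additive secret sharing. This is a didactic sketch, not a deployable
# protocol, and it omits the machinery a real system would need.
import random

MODULUS = 2**61 - 1  # arithmetic is done modulo a large prime

def share(secret, n_parties):
    """Split `secret` into n additive shares that sum to it mod MODULUS."""
    shares = [random.randrange(MODULUS) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MODULUS)
    return shares

# Each database holder's private count of pattern matches (hypothetical).
private_counts = {"airline_db": 2, "toll_db": 0, "bank_db": 1}
n = len(private_counts)

# Every party splits its count into shares; any single share reveals nothing.
all_shares = [share(c, n) for c in private_counts.values()]

# Party i adds up the i-th share from everyone; the partial sums are then combined.
partial_sums = [sum(col) % MODULUS for col in zip(*all_shares)]
total = sum(partial_sums) % MODULUS
print(total)  # 3 -- the aggregate, with no individual count disclosed
```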

However, the privacy-protecting potential of SPMC may be limited when used in the real world across the Web. SPMC depends on operating in a closed universe where the functions to be evaluated are agreed upon in advance. Imagine that the Department of Homeland Security launches a series of queries to major airlines in order to spot terrorist activity. From these queries it may receive a list of 100 names to add to its watchlist. At the same time, other agencies may launch queries against an overlapping set of databases containing a variety of location-revealing sensor data. If those agencies compare notes (as they are now allowed to do under the post-9/11 USA Patriot Act), they may well be able to infer information that the original SPMC-protected queries were designed to keep private.
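The composition problem can be stated very simply. In the hypothetical sketch below (names and query outputs invented for illustration), each agency's query result is limited in isolation, yet intersecting the two result sets yields an inference that neither SPMC-protected query was meant to permit.

```python
# Hypothetical illustration of inference by composing separately "private" query results.
flight_watchlist = {"a. jones", "b. smith", "c. nguyen"}   # output of airline queries
toll_sensor_hits = {"b. smith", "c. nguyen", "d. ortiz"}   # output of location-sensor queries

# Once agencies compare notes, the joint inference is the intersection:
jointly_identified = flight_watchlist & toll_sensor_hits
print(jointly_identified)  # {'b. smith', 'c. nguyen'}
```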

Active Transparency - A Way Forward

So where does this leave us? Are we simply exposed, unable to secure basic dignity and control over our identities? Has the analytic power and data-gathering reach of today's information networks rendered privacy a disappearing artifact of simpler, less-networked times? I don't believe so, but in order to retain the dignity, control, and occasional solitude that are at the heart of privacy, we have to start designing systems differently. One common response from privacy advocates is to suggest (or demand) that certain systems simply should not be deployed (the MATRIX cross-jurisdictional law enforcement database)[5] or that privacy-invasive applications be redesigned (Google's Gmail)[6]. This application-by-application effort looks like a losing proposition: the problem is that the data is available, and this will only be more and more the case in the future. This paper will suggest instead that we concentrate on building positive capabilities into both laws and systems that enable greater accountability for privacy abuses, and will explore technical, legal, and social directions that can help steer the increasing transparency of our information spaces toward privacy-protecting ends.

First, we should embrace transparency as a design philosophy that helps people assure that information about them is not used contrary to legally permissible purposes or in violation of the agreements under which it was collected. Today, most network applications do a terrible job of providing transparent access to personal information. There are several areas in which technology designers can help provide active transparency to users; the paper will develop functional requirements for each.
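As a purely illustrative sketch of what active transparency could mean in code (the class and field names below are hypothetical, not requirements developed in the paper), an application might log every use of a person's data together with the purpose asserted for that use, and give the data subject the ability to query that log and compare actual uses against the purposes for which the data was collected.

```python
# Hypothetical sketch of an "active transparency" usage log.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List

@dataclass
class UsageEvent:
    subject: str        # whose data was used
    data_item: str      # e.g. "toll-transponder location record"
    used_by: str        # requesting party
    purpose: str        # purpose asserted at time of use
    when: datetime

@dataclass
class TransparencyLog:
    events: List[UsageEvent] = field(default_factory=list)

    def record(self, subject, data_item, used_by, purpose):
        """Record one use of a subject's data, with its asserted purpose."""
        self.events.append(UsageEvent(subject, data_item, used_by, purpose,
                                      datetime.now(timezone.utc)))

    def report_for(self, subject):
        """Everything the data subject is entitled to see about uses of their data."""
        return [e for e in self.events if e.subject == subject]

log = TransparencyLog()
log.record("alice", "toll-transponder location record", "toll_operator", "billing")
log.record("alice", "toll-transponder location record", "police_agency_x", "traffic study")
for event in log.report_for("alice"):
    print(event.used_by, "->", event.purpose)
```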

Second, protecting privacy in an increasingly transparent society will only be possible with privacy laws that are re-tuned to reflect the great expansion of data collection and inferencing capabilities. As demonstrated above, our privacy laws are directed more toward limiting the collection of information than toward controlling the uses to which collected data can be put. The paper will explore options for use-limitation regulations as a means of assuring basic privacy values in the face of increasingly exposed lives.

Finally, living with transparency will be a challenge. We aren't likely to get either legal or technical measures right the first time. Hence, we must devote resources to a wise and sustained dialogue between regulators, the citizenry and technical designers.


Notes & References

[*] Accepted for presentation at Location Privacy Workshop: Individual Autonomy as a Driver of Design

[1] Kerckhoffs's Principle: "Assume the (unauthorized) user knows all ciphering procedures." Auguste Kerckhoffs (1835-1903).

[2] Brin, D., The Transparent Society (1998).

[3] Center for Democracy and Technology, Privacy's Gap: The Largely Non-Existent Legal Framework for Government Mining of Commercial Data, May 28, 2003.

[4] See, for example, Feigenbaum, J., Ishai, Y., Malkin, T., Nissim, K., Strauss, M., and Wright, R., "Secure Multiparty Computation of Approximations (Extended Abstract)," in Proceedings of the 28th International Colloquium on Automata, Languages, and Programming, Lecture Notes in Computer Science, vol. 2076, Springer, Berlin, 2001, pp. 927-938.


Short Biography:

Daniel Weitzner heads the World Wide Web Consortium's Technology and Society activities. He is responsible for development of technology that enables the Web to address legal and public policy requirements, including the Platform for Privacy Preferences (P3P) and XML Security technologies. As a leading figure in the Internet policy community, he was the first to advocate user control technologies such as content filtering to protect children and avoid government censorship. These arguments played a critical role in the landmark Internet First Amendment case, Reno v. ACLU (1997). In 1994, he won legal protections for email and web logs in the Electronic Communications Privacy Act.

As Principal Research Scientist at MIT's Computer Science and Artificial Intelligence Laboratory, Weitzner teaches courses on Internet policy and technology design, and is a founding member of MIT's Center for Information Security and Privacy. Weitzner was a member of the National Academy of Sciences committee on Authentication Technologies and Their Privacy Implications.

Previously, Mr. Weitzner was co-founder and Deputy Director of the Center for Democracy and Technology, and Deputy Policy Director of the Electronic Frontier Foundation.

Weitzner has a law degree from Buffalo Law School and a B.A. in Philosophy from Swarthmore College. His writings have appeared in the Yale Law Journal, Global Networks, MIT Press, Computerworld, Wired Magazine, Social Research, Electronic Networking: Research, Applications & Policy, and The Whole Earth Review.

