W3C Workshop on Privacy and data usage control 04/05 October 2010, Cambridge (MA)

Minutes

I. Setting the scene

Introduction

(Lalana) Three issues for discussion in this workshop:

* what kind of technical solutions are there for privacy; what can we do with current technology
* challenges: what are the issues that stop us from having perfect privacy
* what concrete next steps can we take now to have reasonable privacy in the future

When presenting, please try to keep the broader topic of the workshop in mind.

Frederick Hirsch (Nokia): Putting device API privacy and policy into user context

The W3C DAP (Device APIs and Policy) working group is developing APIs to interact with devices: access contacts, notes, camera images, etc. They don't want to slow down technology that makes money while waiting for privacy, and don't want explicit privacy APIs; rather, users should give permission implicitly through their productive actions, e.g., permission to call a number in the contacts because the user picked that number.

David Chadwick: what about malware apps? The app can ask whether the user is okay with one thing, but then do a different thing.

We're good at APIs, but need help with policies. We have an XACML-like language (more compact than XACML) to define permissions. Privacy is more than access control alone: e.g., it may be fine for me to access someone's contacts, but not to publish them all over the Internet. A ruleset proposal under consideration at DAP focuses on sharing, secondary use, and retention; it is practical and pragmatic, see the W3C Privacy Workshop, July 2010. Open question: enforce at the browser level or at the application level? Timing is a concern: privacy work is lagging behind API development.
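
To make the three dimensions concrete, here is a minimal sketch, assuming an invented rule structure; the actual DAP ruleset syntax was still under discussion at the time, and every field name below is hypothetical:

```python
# Hypothetical sketch of a compact, XACML-like permission rule for a
# device API, covering the three dimensions mentioned above. The field
# names and structure are illustrative, not the actual DAP proposal.

from dataclasses import dataclass

@dataclass
class PrivacyRule:
    resource: str        # which device API resource the rule governs
    allow_sharing: bool  # may the recipient pass the data to third parties?
    secondary_use: bool  # may the data be used beyond the original purpose?
    retention_days: int  # how long the recipient may keep the data

def is_permitted(rule: PrivacyRule, action: str) -> bool:
    """Evaluate a requested action against a rule (simplified)."""
    if action == "share":
        return rule.allow_sharing
    if action == "secondary-use":
        return rule.secondary_use
    return True  # primary use implied by the user's productive action

# Example: contacts may be read for the user's own action, kept a week,
# but never redistributed.
contacts_rule = PrivacyRule("contacts", allow_sharing=False,
                            secondary_use=False, retention_days=7)
assert not is_permitted(contacts_rule, "share")
```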

Frank Wagner (Deutsche Telekom): Group privacy

Synchronizing address/contact information from a social network onto a cell phone; linking phone book entries to Google Street View.

Q: if you're concerned about privacy, why not block your address and phone number data online altogether?
A1: social decency: I want family & friends to find my place for a party, but don't want any stranger to show up.
A2: have to attach policies to data.
A3: but one can always cheat, copy-paste to avoid the policy; we need more "embarrassment" in case of breaking the policy.

Q: Any sort of auditing that data stays in the data center in Germany, and is not shipped into the cloud?
A: Yes, covered by contracts. But the cloud is not audited, as processing personal information in the cloud is simply forbidden by German data protection agencies.
Jacques Bus: Data protection agencies are aware of cloud evolutions and are changing their positions accordingly.
Rigo: in spite of legal provisions, there was a data loss at a Deutsche Telekom call center in Turkey; they had to bring their data centers back into the EU because of it.

Betsy Masiello (Google): Google privacy

Betsy unfortunately had to cancel due to sickness; Frederick summarizes the paper. DRM has been very unsuccessful on data stored in "stock" (see DVD protection, Sony's rootkit) but has been quite effective on "flowing" data (see music subscription services, YouTube, ...). Maybe privacy has to go the same way.

Rigo: music services systematically store IP addresses in log files -> are IP addresses personal data? A Swiss court recently said they were.
Jacques Bus: in the EU it depends on context.

Gregory Neven: I don't think "flows" were meant as log files, rather as streaming data, with policies applied as the data streams, enforced by the client that (dis)plays the stream. DRM has failed spectacularly for any data that finds its way to a user's hard drive (MP3, DivX), but has been quite successful for streaming services (YouTube, Spotify). But I wonder how this can be applied to very compact, non-redundant data, where even typing over a user's phone number is possible.
Jacques Bus: can't expect that it will be technically completely closed; it will always be possible to circumvent. We need accountability and redress in court, rather than just technical solutions.
Frederick Hirsch: agree with Greg's interpretation of the paper. Drawing out a map of stocks, flows, and attacks is a very interesting exercise; we should do the same thing for a privacy use case, even for a simple example.

Discussion

Thomas Roessler: even if the technology is in place, we can't do it without support from a social context.
Lalana Kagal: the use cases here are mainly about enterprises violating privacy; how about private users, e.g., cyberbullying?
JC: a privacy policy cannot prevent misuse, it can at most mitigate damage; its main goal is information: tell the data controller what is allowed and what is not.
David: design of a privacy framework is hard on the technical level, but even way harder on the social level.
Frederick: makes sense, but the process could take a long time, look at PKI; but what happens in the meantime? We seem to have a timing mismatch.
?: the key is design based on principles.
?: e.g., the From header in an email message: nothing prevents a fake name in the From header, which is one of the problems when prosecuting spammers. We should specify standards to carry policy info (e.g., this bit set means you shouldn't share it) as a "please don't" ethical standard -- even if you have the technical means to get around it. Maybe setting ethical standards is easier than preventing abuse.
Frederick: does it make sense to put something in a standard if they don't want to deploy it?
David: just make it required, so you're not compliant if you don't.
Thomas: important meta-point: the legitimacy of standards comes from the consensus process, and ethical standards become important within that consensus. If business interests are not in line with ethical statements, it will be hard to get them through in a technical standard; we have seen this in geolocation privacy and device APIs.
Frederick: how about not making it mandatory, but optional and informative, yet still in the standard? Maybe that changes the expectations people have about how standards are used.
Jacques Bus: be careful with words like "evil" and "ethical": they are very culture-dependent, but we need global consensus, cf. passports.

Rigo: irrational reaction of society to privacy issues, e.g., Germany's hysterical reaction to Street View.
Frederick: there exists an open-source Facebook alternative, funded by donations; maybe this is how it will fly: people see the problem and build an alternative from separate grassroots.
Thomas Roessler: the W3C working group on the federated social web is bringing out a report this week.
Jacques Bus: the EU also got some social networks together to agree on a set of rules; it was decided that most are reasonably compliant, which probably means the rules are not very strict.
?: global compliance, or EU?
Jacques Bus: the rules are European, maybe also used globally.
?: a website with 3rd-party advertisers that don't take privacy seriously: a user could sue the website for hosting ads from a malicious website and offering an API to it. If a judge said that the website is indeed liable, that would be a great help.
Thomas Roessler: that would be wonderful. An ad network tracking geolocation: there is a technical tweak: keep the origin of the top frame, so an ad from evil.com on a.com will ask "can I show to evil.com?". This is something the browser could implement; the question is what granularity is implemented.
?: businesses need accurate geolocation for their business model; if we ruin their business model, it won't work. We have to offer them ways to adhere to privacy that don't ruin their business.
Jacques Bus: that would be like accepting child labor in certain countries because that's their business model; I don't agree with the point of respecting any business model.

Thomas Roessler: on Twitter you can enable geolocation voluntarily, and can even change your location; as opposed to Facebook contact data, where I can upload my contacts to Facebook, so that Facebook will send emails to those contacts that are not on Facebook yet. I'm creeped out by the harvesting of consumer data, either on existing consumers or about prospective consumers.
Frederick: there exists a startup that scans the web for your Facebook and LinkedIn accounts when you're applying somewhere; they protect your employer from legal persecution.
Thomas Roessler: questions by Marc Rotenberg: "if you're applying for a job, do you want the employer to look at your profile?" and "if you're hiring, will you look at the social network profiles of candidates?" Both questions get lots of "yes" answers.

II. Keynote by Jacques Bus

Privacy in the Digital Environment: The European Perspective (Jacques Bus)

talk at a much higher, far more abstract level "I'm not a techie and I will never become a techie"

Maybe Watson was right (a world market for only five computers): there may end up being just five or so clouds. Will companies manage identity and other concepts normally the prerogative of the state?

What kind of tools do we need to enable person-to-person trust? Trust in systems as well as in people.

Creating trustworthy systems (reliability, transparency, etc.); services and tools for proper authentication/identification. "Solutions need an interdisciplinary approach (Web Science)."

Mireille Hildebrandt's working definition of privacy: "a reasonable measure of control on whether and to what extent one can be 'read' by what others in what context"

Technology & Innovation / Policy & Regulation / End-Users & Society

The RISEPTIS research group published a report of recommendations: "a techno-legal ecosystem". Research programs for Europe must be driven by consensus.

industry consortia, with some involvement from academia

International jurisdiction is all about cultures, which are all different; start multi-lateral and build future alliances. Lessig: "code is law" [2nd mention today]. Implement part of the law at the level of process compliance, which requires sufficient transparency and accountability. You can only do technology by also thinking about law and society; that will take a long time.

EU Research in Trustworthy ICT -- large research grants "projects must ensure strong interplay with legal, social and economic research in view of development of a techno-legal system that is usable, socially accepted and economically viable." although proposals did try to take this into account, they tended to be soft and not highest quality

"can't make a comprehensive presentation in this field..."

Q: you describe this as a complex problem involving social, technical, legal, etc. aspects: what needs to come first? Technology solutions to be adopted by policy makers, or vice versa?
JB: it needs to be an iteration. Introduce liability into software/technology, but this is very difficult to implement; we got to the point of 10 principles of responsibility, but politically couldn't move forward, with lots of lobbying to keep it that way. The only way to regulate a market properly is to create liability; that's the one thing to be done first. Identification is another thing that needs high priority; it would take something like 15 years to come up with a European framework for identity management.

Lalana: suggestions for how to provide transparency?
JB: a priori assurances on the systems that companies are using, plus a regular report which could then be audited, based on regulation to begin with; this could provide a stimulus to technologies that help with these problems. Define metrics with which companies can make assurances; technologies need to be prepared for that.

Frederick: controlling the inferences that will be made with information, what is the intent?
JB: control might be too strong, but you would want to know what kind of inferences are being made, and perhaps prohibit certain combinations or types of inferences that companies might regularly make. We should at least know as much as possible which inferences are made and what the consequences are.

Q: W3C interfaces a lot with implementers of the tools; a common refrain is that we won't build anything into a product until we see user demand. How do we create/demonstrate that user demand? How do we get that iteration started?
JB: one answer is to create regulations and laws to kick off that process. In the EU, data protection regulations led to an enormous development of technical systems trying to be ready for the implementation dates.

Frederick: how does this play out internationally, in different parts of the world? Is this just an EU focus?
JB: the European privacy commissioner believes that there is not much practical difference in privacy in EU/US discussions, and the ongoing discussions between the two are rather positive. Japan kind of copied the European framework, though they may not understand how it really works; they just wanted to be sure that their market was assured (and probably the American market as well). The identity metasystem paper (Kim Cameron) comes from America, although Kim considers himself European. We could work at this meta-level first, where there is sensible agreement across continents.

III. Privacy Annotations

Privacy annotation challenge: choose a language/semantics and debate whether it is lightweight or heavyweight based on the use case.

Privacy controls: which tool is required for which requirement/functionality/situation? A range of tools is required. What can we do to develop a suitable environment for such tools to flourish?

Data Envelopes for Moving Privacy-sensitive Data

A private data envelope allows the data owner to specify which actions are authorized on each data entity.

The presenter talked about the definition of the PDE (private data envelope).

Q: Is the PDE signed?

A: It could be signed if you intend to do policy enforcement

Q: The use case is focused at the enterprise level, sending content between departments?

Q: when you pass data from the first group to the second, does the system implicitly represent the possible choices, i.e., enumerate all possible implicit choices?

A: You could specify a "group" that covers a subset of people, and the PDE policy will apply to these people.
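
A minimal sketch of the envelope idea, assuming an invented structure and using an HMAC signature for illustration (the presenter's actual format was not specified in the minutes):

```python
# Minimal sketch of a private data envelope (PDE): data plus the actions
# the owner authorizes on it, optionally signed for policy enforcement.
# The structure and the use of HMAC here are illustrative assumptions.

import hashlib
import hmac
import json

def make_envelope(data: dict, authorized_actions: list[str],
                  signing_key: bytes | None = None) -> dict:
    envelope = {"data": data, "authorized_actions": authorized_actions}
    if signing_key:  # sign only if the receiver will enforce the policy
        payload = json.dumps(envelope, sort_keys=True).encode()
        envelope["signature"] = hmac.new(signing_key, payload,
                                         hashlib.sha256).hexdigest()
    return envelope

def may_perform(envelope: dict, action: str) -> bool:
    return action in envelope["authorized_actions"]

pde = make_envelope({"phone": "+1-555-0100"}, ["read", "call"],
                    signing_key=b"owner-secret")
assert may_perform(pde, "call") and not may_perform(pde, "publish")
```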

Nick Doty & Erik Wilde: Simple policy negotiation for location disclosure

Websites do not proactively tell you how they are going to use the location data. The presenters talked about GeoPriv and its pros & cons.

Proposal:

  1. Sites specify a range of policy options that fit their use case.
  2. Users choose (potentially automatically) from these ranges.
  3. Negotiated policy is returned attached to user data.
  4. Four fields
    1. Precision
    2. Sharing
    3. Retention
    4. Usage
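
To make the four-step proposal concrete, a minimal sketch follows; the field values and matching logic are assumptions for illustration, not the authors' specification:

```python
# Illustrative sketch of the four-field negotiation; the concrete values
# and matching logic are assumptions, not the authors' specification.

site_offer = {   # step 1: ranges of policy options the site can accept
    "precision": ["city", "neighborhood", "exact"],
    "sharing":   ["none", "affiliates"],
    "retention": ["1d", "30d"],
    "usage":     ["maps", "ads"],
}

user_prefs = {   # step 2: the user's (possibly automatic) choices,
    "precision": "city",   # one preferred value per field
    "sharing":   "none",
    "retention": "1d",
    "usage":     "maps",
}

def negotiate(offer: dict, prefs: dict) -> dict | None:
    """Step 3: pick the user's preference if the site offers it."""
    policy = {}
    for field, choice in prefs.items():
        if choice not in offer.get(field, []):
            return None  # no acceptable option; withhold the location
        policy[field] = choice
    return policy

# The negotiated policy would then travel attached to the location data.
print(negotiate(site_offer, user_prefs))
# {'precision': 'city', 'sharing': 'none', 'retention': '1d', 'usage': 'maps'}
```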

Q: Are the policy semantics to be serialized in JSON?

Q: Are the semantics in this proposal enough? Too specific? What about usability?

Discussion: does the general framework force you to use all the semantics provided? Usability on the developer's side or the user's side?

Q: Where do we enforce the policy? The technology (browser) should not give away the location if the website does not state its intent upfront to the user. Two points: choose the language and semantics appropriate for the challenges at hand.

Privacy controls, two points: marrying the meta-approach to the controls approach; importing a heavyweight policy framework to ...; we need more tools for more cases.

IV. Privacy Controls

Fuming Shih, Towards Supporting Contextual Privacy in Body Sensor Networks for Health Monitoring Service

Qs: How do you express contexts?

Ans: Using attribute/value pairs, for example the time and space of an activity: at a particular time and place, a signal like a heartbeat has been captured.
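
A minimal sketch of such an attribute/value context, with invented names and values:

```python
# Hypothetical attribute/value representation of a sensed event's context:
# a heartbeat reading captured at a particular time and place.

context = {
    "activity": "jogging",
    "time":     "2010-10-04T09:30:00Z",
    "location": "Charles River path",
}
reading = {"signal": "heartbeat", "bpm": 142, "context": context}

def matches(ctx: dict, **constraints) -> bool:
    """Check whether a context satisfies attribute/value constraints."""
    return all(ctx.get(k) == v for k, v in constraints.items())

# A contextual privacy rule could then key off these pairs, e.g. share
# heart-rate data only when it was captured during a monitored activity:
assert matches(reading["context"], activity="jogging")
```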

Qs: Do you plan to use P3P for access control (for medical data especially)?

Ans: We haven't dealt with access control yet. There are existing papers that extend P3P to include context information (both developer and user can be integrated).

Qs: Access control involves multiple parties such as the data subject, a substitute decision maker, a long-term care home, a physician accessing care data. There are different languages to express different roles, which is challenging. They in fact try to get rid of information very quickly rather than store it. How do you handle that?

Ans: We attach the policy to the data. We need a UI for physicians to view and respect the sticky policies; they can use the component to view and make use of the context of the data.

Comments between the two presentations:

We have sticky policies lying around. What does it mean to liberate the sticky policies and integrate them? There are challenges, and we need a concrete implementation. We need to know where the semantics come from (choose a language and semantics) and determine how heavyweight the system should be (varies by use case).

Michael Hart, Prevention and Reaction: Defending Privacy in the Web 2.0

Qs(by speaker): Name some popular tags?

Ans (by audience and speaker): secret/private and relationships/love.

Qs: What do you do with a scenario of a post with many tags (keywords)?

Ans: If we have policies with 2 tags we could take either the intersection or the union of the privacy policies. Not sure if users have a preference for one or the other; from user studies, the author learned that taking the union of the policies is the best way to go. Another interesting aspect to investigate is the sensitivity of a particular tag: it could play a role in which accesses to give to a particular person. E.g., a one-tag post/policy could be more sensitive than one with multiple tags; e.g., religious views could be more sensitive, and so the policy for religion should supersede any other policy.
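
A small sketch of the two combination strategies, assuming each tag's policy is represented as the set of readers it permits (an assumption about the representation; the paper's actual model may differ):

```python
# Sketch only: model each tag's policy as the set of readers it permits,
# then combine per-tag policies for a multi-tagged post. Whether "union"
# in the talk meant permitted readers or restrictions is an assumption.

tag_policy = {
    "secret":        {"spouse"},
    "relationships": {"spouse", "close-friends"},
}

def readers_union(tags, policies):
    """Permissive combination: anyone allowed by any tag may read."""
    return set().union(*(policies[t] for t in tags))

def readers_intersection(tags, policies):
    """Restrictive combination: only readers allowed by every tag."""
    return set.intersection(*(policies[t] for t in tags))

post_tags = ["secret", "relationships"]
print(readers_union(post_tags, tag_policy))         # {'spouse', 'close-friends'}
print(readers_intersection(post_tags, tag_policy))  # {'spouse'}
```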

Example scenario: when I invoke a tag by applying it to a post and want to keep the post from becoming public, I can limit it to certain groups (friends, people I know, and others).

Qs: Tags are things being described to the system, so extra data becomes available beyond what was intended (especially with respect to privacy). The general idea of tags is that they are ad-hoc, context-dependent, personal decisions based on small details that you don't want to express anywhere in writing. [Comments?]

Ans: Yes, I agree. We have implicit assumptions that the user is going to share certain information with the content provider (e.g., a blog). Users expect that the providers will respect their privacy (especially in this country). If I have a policy, I am giving away information. Authentication and authorization will provide trust. We need general rules plus tools. People don't have the mechanisms to achieve these goals, and don't understand the impact of their decisions. Here, we're at least giving them an option in the form of tools to connect with their readers. I agree that it would be better for users to have their own publishing warehouse and push data out themselves; however, that's unrealistic.

Comment: We need a third category (in addition to disclosed and incidental data): contrived/constructed information. Have a page/meta tag. E.g., newspaper articles generally have a heading (such as "advertisement") so that people are not misled. On the Web, we need a way to identify malicious content and fake pages. Create a protocol that says "This page is ...". In other words, categorize the free speech.

Comment: What are the incentives for people to blog?

Qs: Incidental data lies along a spectrum, with challenges along the spectrum (at one end: child pornography). Companies spend money fighting it out. Information is being displayed in commercial spaces like Facebook; they can't just walk away because they're in the middle of commercial litigation. Is Google's case in Italy fair? An outlier? These days people are more cautious, checking insurance policies, and so providers won't necessarily become advocates for end-users.

Ans: It is very difficult if users have to be their own advocates/forensic analysts; they don't have the necessary tools of a private investigator or HR manager. This is not a matter of blaming search engines or aggregators; these are the challenges of the current technology.

Qs: Lightweight vs. heavyweight mechanisms?

Ans: Our research focuses on the lightweight scenario: an end user wants to access policies immediately. The research shows a way to map language to complex rules. However, a lightweight language will not cut it for the legal aspect; for privacy legislation we need a more rigorous, heavyweight process. E.g., a Facebook user tagging every piece of data.

Comment: A way to tag 'private' and reduce access. We need software to make the environment react. E.g., I post something, and if I want to remove it after a few weeks, the system should not make that data available anymore. This can be done using a config/text file.
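
A minimal sketch of such time-based retention, assuming an invented config format:

```python
# Sketch of time-based retention: hide a post once its configured
# lifetime has elapsed. The config format here is an invented example.

from datetime import datetime, timedelta

retention_config = {"tagged:private": timedelta(weeks=3)}  # e.g. from a text file

def is_visible(posted_at: datetime, rule: str, now: datetime) -> bool:
    limit = retention_config.get(rule)
    return limit is None or now - posted_at < limit

post_time = datetime(2010, 9, 1)
print(is_visible(post_time, "tagged:private", datetime(2010, 10, 5)))  # False
```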

Qs: Can we have tags that are more private than others? E.g., in a blog, give access only to those who have already commented on my blog?

Ans: It is difficult to conduct such an experiment or see that behavior in a user study, for two reasons: 1. getting human subjects willing to participate; 2. having people submit their data to us with the cooperation of the platforms they're using.

Qs: In Flickr, we can have policies about 'co-worker' and 'relationship'. What about a situation where you want to restrict colleagues because the tag says 'relationship', but you also want to give colleagues access because it contains 'co-worker'?

Ans: The research deals with blogs, which involve a simple model; the real world involves interaction between people and personal lives. However, the tag-based ontology is simple. This is an open question: how to handle situations where policies are at odds. One probable solution: sensitivity. The system should err on the side of caution and infer which tags are more sensitive than others.

Qs: What ontology do you use? How do you express it?

Ans: Currently implemented as a plug-in for WordPress. We use simple predicates stored in a database. Policies are basically an association between tag(s) and person(s). Rules are mapped onto posts so that you can see what rules apply and create a policy; the policy in turn says which people have access. Based on the list of people, one can create access control and rules for anonymous access.

Qs: Do you have a predefined list of people that you know upfront?

Ans: Yes. How to make connections between unknown people is a separate issue. It is more concrete for users to create rules for people they know. We can have a special rule for anonymous user(s).

Comment: Right now, social networks offer one of two options: public vs anonymous. Being forced to make data public leads to disastrous consequences. One solution could be creating a different channel to post to; people who are genuinely interested in the blog (those who'll respect its privacy) will be able to read it. We can then have a handshaking protocol.

Points to conclude the session:

  1. Metadata approach versus controls approach
  2. Aren't we importing heavyweight identity management because we want to allow people to connect to people they don't already know?
  3. Threat model for privacy - Many types of tools needed

Suggestion:

Find which tool is suitable for which situation. General policies and negotiation can be done with tagging. Especially in the health scenario, we have to give data to doctors and other people. Have simple policies, create an overall framework to make decisions, and add simple tools. We can give tools to users and let them use them as they want; we can't do anything more.

Create an environment where tools flourish: content can be shared with a group of people. Danger: rules are created and then applied to everybody. E.g., the scenario where the browser displays commands is bothersome: decisions are then made for you (and this can be against your own privacy!).

V. Privacy by design

Question 1: the 7 embedded privacy principles. Very abstract principles. One issue: take the 7 principles to the HTML5 working group; do we expect SSL by default? How do you rephrase them?

I'll give you the first question I can ask. Wait a second, we're developing something that has a few category errors on the side... the set of things associated with a particular origin... control of where data can flow... we know how to do all of that. What does this mean for the design of the system, as opposed to the design at the application level? What does it really mean?

Presenter Take any of those - mapping against fair information principle, where are the FiPs. Map those two. Secondly, go into what does this mean. If privacy embedded by design - it looks like this and that.

One example: when we worked with the Future of Privacy Forum on the smart grid: if it has all the attributes, inclusive of all aspects, all CIOs, what does it look like? Privacy would look like this. Try to take broad subject areas like surveillance, where the attribute is, in the case of not dealing with the end user, that the company has to take the high-level view: they have to answer the question themselves and invoke help from the privacy commissioner.

Follow-up: the examples you describe are systems that are designed for specific purposes, by a fixed set of actors, for a fixed set of applications. The trouble we are having: a generative system, usable by anyone in this room for any use case. What does Privacy by Design mean for generative purposes?


Answer: The notion is to have the idea that this matters. Is there such a thing as privacy? What are the attributes, what can you build, what can you fix? Things move; keep that in mind. Think about the properties and elements of identities. How can you account for it? Is it hard? Yes, but we don't give up. There is no perfect solution. We commend you.

Generative models have no end-to-end ability. It is up to the end-user to worry about privacy; such systems may not be able to imbue privacy by design.

Is that really true?! When creating an API we have a list of use cases that drives the design. Not saying it is only about these ideas. A more complicated solution has an associated cost and may not have meaning for everyone. There is a multitude of cases where this is not a concern. 80/20.

Rigo: consideration for the privacy use case and functionality in the web. Can I have some control? Can it find data about me? Take functionality into account and you have privacy by design; 80 percent don't need it. The importance of the topic puts it into the 80/20. HTML5 says "we have this new kind of thing: we integrate, your phone has your schedule, tells you to stop talking". Someone complains about privacy and the functionality is not on their radar. We need a change of philosophy, an appreciation for the topic. It's hard to do privacy, but it's worthwhile.

Counterpoint: the critical point is that the functionality we add does not necessarily make it possible. I'm struggling: I'd like to find a way to effectively design privacy protection into the platform, to nudge towards more privacy and security, but I keep running into the brick wall of actually getting implementers to use those features. A universal system that will do something.

The issue is that we are not one monolithic system; there will be browsers that run out on the street. Best practice can instruct how to use the brake, but cannot make it happen.

In the health space: a union of nurses; the nurses want to do the right thing, and it is not the fault of the nurse when things happen. Give context; it will play out if the context fits. Go to all the players to see how it all can be facilitated. Early bits for the car.

Practical perspective: privacy-related products used to be expensive; what we now do during development is have a rule set. How can we implement nice privacy-enhancing tools into the tooling? Where can I buy it? Is it ready to run? Build privacy into the product. We think that is OK and transparent to the user.

Surveillance -> you can only reinsert pictures if you have a warrant. There is no forced market. A lot of biometric encryption; the true biometric-encryption market is not ready, but it works. Audit-trail work is wanted in health care legislation: doing this tracking to find inappropriate usages. Making a lot of money.

What's the next thing - what kind of software engineering am I applying it to?

Two or three ways. Organizational work: people spending time understanding the approach used by the company to implement it. How do I make it more like an ISO standard? For the electrical smart grid, work together with NIST etc. on detailed standards and performance standards to do these other levels of work. Like an environmental group: you have to come up with the vocabulary first, then the standards. It is not possible to take it off the shelf.

Should we accept a dispute over geolocation? It cannot be achieved peacefully.

When you think about doing e-passports with biometrics: at all the different levels, identify and elicit the usage of tools. The problem is that we have several players we do not trust; in the design of the technology, a number of key players will not follow the protocol.

Ability to trust, then verify.

Why not look at software/security development? How did we get that implemented, from a development standpoint? Take cues from the lessons learned.

Do it bit by bit.

Experience: if you go for real technology changes, you are washed away. It is much easier to push back violently than to make it happen. The privacy folks should learn how to create a concrete proposal that matters.

Geolocation: two distinct fights, one about transparency and one about user control over disclosure. Unclear on the first fight whether we are seeing implementations do what we would like them to.

Privacy by design is part of the discussion later this afternoon. Now it's time for lunch.

VI. Data usage control

Discussion on Yang's presentation: it is important to leverage social relations for nudging user behavior; one should however not require too much interactivity from users from a nudging perspective (risk of creating noise). More studies are needed to identify how to show the most relevant notifications/indications to the user.

General discussion:

Supporting User Privacy Preferences on Information Release in Open Scenarios
Self-signed vs. certified credentials and atomic properties, forming the user's portfolio.
Sensitivity labels over the properties form a partial order (specified by the end user; could be cumbersome?).
Joint sensitivity is computed from the elementary sensitivities of the properties (it can be more or less than a simple sum; associations also have values), plus disclosure limitations (number of elements) and forbidden views.
Solve a minimization problem given a server request.
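
A toy sketch of that minimization, simplifying joint sensitivity to a plain sum and ignoring association values, disclosure limitations, and forbidden views; the credentials and sensitivity values are invented:

```python
# Toy sketch: given a server request for a set of properties, find the
# cheapest (minimum joint sensitivity) set of credentials covering it.
# Brute force over subsets; joint sensitivity simplified to a plain sum.

from itertools import combinations

# Each credential certifies a set of properties and carries an
# elementary sensitivity (invented values for illustration).
portfolio = {
    "id-card":   ({"name", "age", "address"}, 5),
    "loyalty":   ({"name", "email"},          2),
    "mail-cert": ({"email"},                  1),
}

def min_disclosure(request: set, creds: dict):
    """Return the lowest-sensitivity credential set covering the request."""
    best, best_cost = None, float("inf")
    names = list(creds)
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            covered = set().union(*(creds[c][0] for c in subset))
            if request <= covered:  # request fully satisfied
                cost = sum(creds[c][1] for c in subset)  # additive simplification
                if cost < best_cost:
                    best, best_cost = subset, cost
    return best, best_cost

print(min_disclosure({"name", "email"}, portfolio))  # (('loyalty',), 2)
```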

Future work: sensitivity of non-atomic credentials, providing proofs, labels based on context, user-intuitive approaches for expressing preferences (need to care about side effects, consideration of previous disclosures).

Discussion on usability: can we leverage the entropy w.r.t. previous disclosures? What does each additional disclosed data item bring w.r.t. the history of previous disclosures (in the case of non-atomic credentials)?

Q: Why not use rules for preferences instead of sensitivity-level computation? A: You may end up with a set of non-comparable sets of credentials/properties.

Q: There is a lot of pressure on the user to determine the hierarchy; why is this not done by an expert?

Basic discussion: what is the goal? We haven't finished the overall discussions.

Discussions about goals of privacy and how this translates into technology.

More workshops are needed on the high-level scope, with those who are interested in determining new goals and how to translate them into technical measures and requirements. Privacy by design is probably still a bit too high-level. What do Privacy by Design and a reasonable expectation of privacy mean in terms of technology design?

Twofold results:

Solutions for the B2B business cases (example: making databases more intelligent)

Small solutions for specific use cases (geolocation example)

Bring the outcome together in the Device API working group. Two unique pieces: privacy icons for a simple representation of privacy policies, and UI.

Usability and the complex possibilities of user preferences are hard to realize.

Heavyweight control is to be translated into useful scenarios.

Bring the different contributions together.

The legal system doesn't work for our issues.

W3C privacy workshops are different from other W3C working groups

Do we need another workshop?

Impressions of the workshop

- B2B approach

- Feedback, orientation for the next steps of my work

Simple is king.

The group can represent the variety between law, privacy requirements, user perspective, and business models.

VII. Final Discussions

In the final discussions people re-assessed the past 2 days and the diverse topics that were treated. All felt that we are at a fundamental turning point in the challenges on privacy. This fundamental turning point would have to be addressed by a discussion on the fundamental values we want the Web to have and how to achieve them. Most participants found the workshop a very useful exercise and wanted to renew the experience.

The debate continued on concrete next steps. Discussions were mainly about how to help the Device API Working Group tackle privacy issues. The suggestion from Nick Doty, augmented by semantics from the Raggett paper, was seen as a simple but promising solution. Nick Doty was encouraged to bring this suggestion to the Device API Working Group.