Future of P3P Position Paper JRC

Position paper for "Future of P3P" workshop, Dulles, Virginia, USA, 12-13 November 2002.

This paper represents the views of the Joint Research Center of the European Commission and has been endorsed in general terms by a meeting of the Internet Task Force of the EU Article 29 working group.

Abstract

This paper discusses problem areas in P3P 1.0 and proposes possible solutions. It focuses on P3P's relationship with EU law but also covers all issues we feel should be given priority. It is divided into a section on short-term issues and an annex on longer-term issues. The short-term issues cover cookie management, presentation of human-readable information to users, jurisdictional semantics, purpose and recipient taxonomies, and issues with the APPEL preference exchange language. The long-term issues include fundamental issues around the syntax and provenance of the P3P vocabulary as well as consent issues. We then discuss a group of issues relating to the potential mismatch between "promise and practice", which we see as a fundamental problem in P3P 1.0. These issues include a taxonomy of security measures, the use of digital signatures to provide non-repudiatability and, last but not least, an outline architecture for the use of P3P in audit trail systems to provide a check on actual practices.

A technical analysis of problems with P3P v1.0 and possible solutions.

1. Introduction
2. Short Term Issues
3. Long Term issues Summary.
Annex 1. Background
Annex 2. Long Term Issues
5. Long Term Issues
5.1 Long term vocabulary issues.
5.2. Consent issues.
5.3. Repudiatability of policies.
5.4. Expression of security measures
5.5. Identity Management and management of data flows
5.6. Non-enforceability - automated audit trail systems.
Annex 3. Sample XML Digital Signature
REFERENCES

1. Introduction

This paper discusses problem areas in P3P 1.0 and proposes possible solutions. It focuses on P3P's relationship with EU law but also covers all issues we feel should be given priority.

In keeping with the theme of the workshop, we have focused on the issues which lend themselves to a more immediate solution. Since awareness of the long term issues is crucial to the planning process for short term issues, we include a summary of longer term issues with a full description in an annex. We plan to present these in detail at a workshop planned for spring 2003 in Europe.

The main section on short term issues covers: cookie management, presentation of human readable information to users, jurisdictional semantics, purpose and recipient taxonomies and APPEL preference exchange language issues. The long-term issues include fundamental issues around the syntax and provenance of the P3P vocabulary as well as consent issues. We then discuss a group of issues relating to addressing the potential mismatch between "promise and practice", which we see as a fundamental problem in P3P1.0. These issues include a taxonomy of security measures, use of digital signatures to provide non-repudiatability and an outline architecture for the use of P3P in audit trail systems to provide a check on actual practices.

2. Short Term Issues

2.1 Compliance with Data Protection Directive (DPD) principles.

2.1.1. Cookie Management

Problem:

There are two crucial data transfer events in the lifetime of a cookie. The first occurs when the cookie is set by a remote server using a "Set-Cookie" HTTP header. At this point the data that the data controller wishes to make persistent is stored on the user's computer. The second event is called "replay" and refers to the time when the information stored in the cookie is sent back to the server. In P3P, policy evaluation may be applied just before either of these two events takes place. According to P3P, when a server stores information on a user's own computer, no data transfer event has yet occurred.

However, according to the EU evaluation group [3], this event already constitutes data processing, and the rules of the directive must therefore be applied at set time. If data is stored, even on the user's own computer, this is an act on behalf of the controller which constitutes data processing. According to EU policy, therefore, cookie policies should be evaluated in relation to the time when cookies are set, not the time they are replayed.

This requirement considerably complicates the issue. Satisfying it is not a simple matter of applying P3P at the point of "cookie-set", because there are also counter-motivations for not requiring application of P3P at cookie-set time. These relate to the fact that cookies can be set at a domain level by multiple sub-hosts. For example, geocities.com controls thousands of sub-hosts (x.geocities.com), each of which may set a cookie that could be replayed to every host in the geocities domain. If P3P policies for cookies must be applied at set time, the setting host must be responsible for whatever is done by any of the thousands of hosts in the geocities domain. If, however, policies may be applied at replay time only, then each host is responsible for what it does with the cookie data.
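The scoping problem described above can be illustrated with a minimal sketch. It assumes the usual domain-suffix rule for deciding which hosts receive a domain-level cookie on replay; the function names and the host list are hypothetical and serve only as an illustration.

```python
def domain_matches(host, cookie_domain):
    """Return True if a cookie scoped to cookie_domain would be replayed to host."""
    host = host.lower().rstrip(".")
    domain = cookie_domain.lower().strip(".")
    # Either the host is the cookie domain itself, or it is a sub-host of it.
    return host == domain or host.endswith("." + domain)


def replay_targets(cookie_domain, hosts):
    """List the hosts that would receive the cookie on replay."""
    return [h for h in hosts if domain_matches(h, cookie_domain)]


# Hypothetical host names, for illustration only.
HOSTS = ["a.geocities.com", "b.geocities.com", "www.example.org"]

if __name__ == "__main__":
    # Cookie set at domain level: every sub-host receives it.
    print(replay_targets(".geocities.com", HOSTS))
    # Cookie restricted to the setting host: only that host receives it.
    print(replay_targets("a.geocities.com", HOSTS))
```

A policy evaluated at set time would have to hold for every host in the first list, whereas a cookie restricted to the setting host only binds that host.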

It is worth mentioning in this section that no current P3P user agent implementation except the JRC proxy implements a cookie management feature using full P3P policies (MS IE6 uses compact policies). Nor do the other implementations base their decisions in any way on the content of the purpose or recipient flags within a compact policy. This is a serious omission for implementations of a specification that must be able to operate within an environment which aims to protect users according to European law. Services contravening European data protection law may be physically outside European jurisdiction but are still subject to European law if they make use of equipment placed within the European Union [1]. It is also an obstacle to the progress of P3P implementation, as cookies are one of the areas where the full weight of P3P should be applied. We suggest that steps should be taken in future to ensure that implementations give at least the possibility of imposing some purpose and recipient criteria on cookies.
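As a rough sketch of what imposing purpose and recipient criteria on cookies could look like, the following reads the purpose and recipient tokens from a compact policy in a P3P response header and applies them at set time. The blocked sets stand for a hypothetical user preference. Treating a missing declaration as unacceptable, and ignoring the opt-in/opt-out suffixes, are simplifying assumptions of this example and not behavior required by the specification.

```python
import re

# Compact policy tokens for purposes and recipients (an optional a/i/o suffix is ignored here).
PURPOSE_TOKENS = {"CUR", "ADM", "DEV", "TAI", "PSA", "PSD",
                  "IVA", "IVD", "CON", "HIS", "TEL", "OTP"}
RECIPIENT_TOKENS = {"OUR", "DEL", "SAM", "UNR", "PUB", "OTR"}


def parse_compact_policy(p3p_header):
    """Return (purposes, recipients) found in the CP="..." part of a P3P header value."""
    purposes, recipients = set(), set()
    match = re.search(r'CP\s*=\s*"([^"]*)"', p3p_header)
    if not match:
        return purposes, recipients
    for token in match.group(1).split():
        base = token[:3].upper()  # drop the optional a/i/o suffix
        if base in PURPOSE_TOKENS:
            purposes.add(base)
        elif base in RECIPIENT_TOKENS:
            recipients.add(base)
    return purposes, recipients


def allow_set_cookie(p3p_header, blocked_purposes, blocked_recipients):
    """Decide at set time whether a cookie may be stored."""
    purposes, recipients = parse_compact_policy(p3p_header)
    if not purposes or not recipients:
        # Assumption of this sketch: an undeclared purpose or recipient is rejected.
        return False
    return not (purposes & blocked_purposes) and not (recipients & blocked_recipients)


if __name__ == "__main__":
    header = 'policyref="/w3c/p3p.xml", CP="CUR ADM TELo OUR UNR"'
    print(allow_set_cookie(header, {"TEL"}, {"UNR"}))
```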

Proposed Solution:

Short of altering the cookie concept itself, there is no way around the fact that cookies may be replayed to hosts within the same domain other than those which set them.

The P3P specification already contains the following caveat, which covers this problem: "Any host to which the cookie may be replayed MUST be able to honor all the policies associated with the cookie, regardless of whether that host declares a policy for that cookie." However, it still allows P3P to be applied at cookie send time, which according to the EU working groups contravenes the rights of the data subject. Despite the restrictions this puts on companies which host large sets of hosts, the only solution to this problem appears to be to specify that P3P must be applied at cookie set time and to maintain the above caveat in the specification. This is not as draconian as it might appear. If a controller does not want a cookie to be applied to an entire domain, it can easily restrict its application. If it does, and the cookie contains sensitive information, then the company should be prepared for the consequences of this information getting into the wrong hands.

2.1.2. Point of provision of purpose information to users

Problem:

[It is required] "to mention clearly the existence of automatic data collection procedures, before using such a method to collect any data." [2]

Human-readable information on data processing purposes should be conveyed to users prior to any act of data collection. P3P 1.0 does not provide mechanisms which adequately satisfy EU requirements in this respect. Although P3P policies can be used to pre-inform the user of privacy practices, the default implementation (for example in IE6), and the sense of the specification, is for the user to be informed ex post facto.

Proposed solution:

The solution to this problem is to emphasize in the P3P specification, and in any implementation guides, that in order to satisfy European law for acts of data collection involving PII, the display of human-readable elements must not be ex post facto. This raises user interface issues, because it is potentially very invasive to present all the information on data collection purpose prior to every collection event. We believe, however, that these issues can be solved, and we present a potential solution in section 5 below.

2.1.3. Geographical and Jurisdictional Semantics

Problem Description:

" Where it is anticipated that the data will be transmitted by the controller to countries outside the European Union, to indicate whether or not that country provides adequate protection of individuals with regard to the processing of their personal data within the meaning of Article 25 of Directive 95/46/EC. In that case, specific information must be provided on the identity and address of the recipients (physical and/or electronic address)"[2]

P3P, however, does not allow controllers to inform data subjects about envisaged transfers to third countries, because there is no taxonomy of geographical or jurisdictional information that can be attached to recipients.

Proposed solution:

Pragmatically speaking, what is required is the ability for a user agent to distinguish whether a recipient complies with the level of privacy protection provided within a particular jurisdiction, or whether instead a transaction should be blocked because the data is taken outside a territory where it is adequately protected. In order to achieve this for the purposes of EU data protection law, a simple binary attribute such as Euprotectionleveljurisdiction="Yes/No", indicating whether the recipient operates under a jurisdiction which protects the user's data to at least the same level as European law, would suffice.
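A user agent check based on such an attribute could be sketched as follows. The attribute name is taken from the text; its placement on the child elements of RECIPIENT, the omission of XML namespaces, and the treatment of a missing attribute as "No" are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

ATTR = "Euprotectionleveljurisdiction"  # proposed attribute; its placement is assumed here

# Hypothetical policy fragment, namespaces omitted for brevity.
STATEMENT = """
<STATEMENT>
  <RECIPIENT>
    <ours Euprotectionleveljurisdiction="Yes"/>
    <unrelated Euprotectionleveljurisdiction="No"/>
  </RECIPIENT>
</STATEMENT>
"""


def recipients_without_adequate_protection(statement_xml):
    """Return the recipient element names not flagged as offering EU-level protection."""
    root = ET.fromstring(statement_xml)
    flagged = []
    for recipient in root.iter("RECIPIENT"):
        for child in recipient:
            # Assumption: a missing attribute is treated like "No".
            if child.get(ATTR, "No").strip().lower() != "yes":
                flagged.append(child.tag)
    return flagged


if __name__ == "__main__":
    flagged = recipients_without_adequate_protection(STATEMENT)
    print("block" if flagged else "allow", flagged)
```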

Another quick solution is to use the provision existing in P3P 1.0 for text addresses within the ENTITY element.

In conjunction with a list of countries and information about their legal systems, this could allow the agent to determine whether the controller was outside EU jurisdiction. If the same semantics could be included within <RECIPIENTS/> elements, then this would offer a quick fix for this problem. It is clearly not ideal, because it requires, but does not provide, a standardization of the CDATA content within the DATA elements. It would also not be able to tell us, for example, whether a country was within a supranational area of jurisdiction such as the EU.
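The ENTITY-based approach can be sketched as below. The ENTITY fragment and the country table are hypothetical and deliberately incomplete, and namespaces are omitted. The example mainly shows why free-text country values need standardizing: an unrecognised spelling yields no answer at all.

```python
import xml.etree.ElementTree as ET

COUNTRY_REF = "#business.contact-info.postal.country"

# Hypothetical ENTITY fragment, namespaces omitted for brevity.
ENTITY = """
<ENTITY>
  <DATA-GROUP>
    <DATA ref="#business.name">Example Corp</DATA>
    <DATA ref="#business.contact-info.postal.country">USA</DATA>
  </DATA-GROUP>
</ENTITY>
"""

# Hypothetical, incomplete table: country name -> inside EU jurisdiction?
KNOWN_COUNTRIES = {"italy": True, "belgium": True, "usa": False}


def entity_country(entity_xml):
    """Return the free-text country declared in the ENTITY element, or None."""
    root = ET.fromstring(entity_xml)
    for data in root.iter("DATA"):
        if data.get("ref") == COUNTRY_REF:
            return (data.text or "").strip()
    return None


def within_eu(country_text):
    """Return True or False for known countries, None when the text is not recognised."""
    if country_text is None:
        return None
    return KNOWN_COUNTRIES.get(country_text.strip().lower())


if __name__ == "__main__":
    country = entity_country(ENTITY)
    print(country, within_eu(country))
    print("Italia", within_eu("Italia"))  # unrecognised spelling: no answer
```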

This solution, however, makes it difficult to determine jurisdiction, although it can be deduced to a certain extent from any country information contained. It would not, for example, be able to express the fact that a piece of information had passed within a US "safe harbour" zone. We suggest therefore that in the longer term, P3P should move to an RDF [6] + OWL [7] ontology solution. As mentioned below, regional and jurisdictional ontologies could then be plugged in or, as may be necessary in the case of the jurisdictional ontology, created from scratch. These would then provide the semantic richness to make this kind of distinction, as well as other useful distinctions which would follow from such information. For example, a taxonomy of jurisdictions and a dynamic ranking system which can be referred to in order to determine the relative levels of privacy protection would address the unsolved problems in expressing the EU directives. In the longer term, a specification should not show such obvious regional bias as the "Euprotectionleveljurisdiction" attribute mentioned above. Therefore a more neutral means of expressing the same thing must be developed.

2.2 User interface issues

The user interface is a very important problem to solve in the immediate future. P3P provides a wealth of information which, since its purpose is to improve the transparency of data processing practices, should somehow be available to users. It also provides a framework for specifying privacy preferences in a very granular way. However, studies have shown that most users are willing to sacrifice very little time in order to protect their privacy [7]. This poses serious challenges to anyone developing an agent implementation for P3P user preference specification and checking.

P3P 1.0 makes no mention of user interface issues, neither does the P3P implementation guide. There are good reasons for this, as it is generally outside of the scope of a specification to specify how it will be implemented. However, there are considerable problems to be overcome in this area and there is considerable interplay between the specification and its implementations. For example, P3P 1.0 provides several attributes to enable a user interface to display a human readable distillation of a policy. It is therefore equally crucial for subsequent versions of P3P to make concessions to the user interface, which take into account likely solutions. To this end, we would make the general remark that it is important for the specification group to work alongside implementers and researchers on such issues.

We now describe some of the specific challenges to be overcome and some possible solutions:

2.2.1. How to create meaningful preference interfaces.

In matching P3P policies, there is a huge range of options which an agent can look for. The W3C note on APPEL [8] gives a suggested specification for a matching algorithm and an interchangeable XML rule format, which is in fact the only existing interoperable format for preference files. However, implementing a user interface to the full range of possibilities within APPEL results in an extremely complex interface, and in fact only one such utility exists, designed as part of the JRC P3P project. The experience of designing this interface has suggested several points which are relevant to further development:

An interchangeable format for preference rules is very important, as it allows data protection professionals to disseminate minimum guidelines and default privacy protection levels to users who have neither the time nor the knowledge to create them themselves.

The user interface for this would be much improved by a move of P3P to a formalized ontology. If P3P were expressed within a formal ontology, tools for visualizing it could be used within user interfaces, and interface representations that are natural to users could be translated into RDF-based query languages for matching purposes. Ontology interfaces such as OILED [10] allow natural representations of conceptual frameworks to be made.

Interfaces should be sharply divided into sections for expert users and users who can choose between a small number of predetermined preference sets. However, it should be possible for users to make more advanced choices if they wish.

2.2.2. How to create meaningful feedback systems.

P3P 1.0 provides several different types of information, which can be fed back to the user during or after the P3P evaluation process. These types are:

P3P allows for a range of levels of feedback into the user experience - from a simple red light system as in Internet Explorer 6.0 - to a full page explanation of what happened in the evaluation in the resource. Users should be given a choice between levels of information they want to receive. This is achieved to a certain extent in existing implementations, however we feel that it could be improved upon if P3P were incorporated into a "tab" system such as is used to provide history information in IE. A "tab" system is a collapsible frame on the left hand side of every page, which can provide real-time information relating to pages displayed opposite, or other information, independent but simultaneous to that of the page being browsed. Netscape now makes a lot of use of a system of "tabs" to provide information such as bookmarks, news, history. This system is not currently used in any P3P system but would seem to be an ideal mode of providing feedback. The difference between this and providing information as IE and Netscape currently do is that if the user wishes, the tab can be a source of information which provides instantly available fresh information on the resources being accessed, simultaneous to page viewing. Such a tab could be configured, rather as the IE search tab to provide different levels of information.

The JRC's proxy implementation makes use of a similar system, which was implemented after experimentation with other systems. It uses an expandable floating privacy tab, which expands on mouseover and provides information on pages which have been evaluated in the current session. As it is part of a proxy server implementation, it cannot alter the browser interface and is inserted as a DHTML tab. However, it has many of the advantages of a tab.

2.2.3. Specification Issues for User Interfaces.

From the point of view of the specification, we would suggest 2 points for improvement, which are relevant to the user interface:

1. A formalized ontology would also help in the presentation of information to the user. It would allow implementers to leverage conceptual visualization techniques, which have been developed within these systems, with minimal effort. A clearer class structure, with a transparent relationship to natural language, would also help.

2. The specification could be designed so that the semantics are clearly divided into two levels of detail, one meant for quick summaries and the other for detailed presentations. For example, there could be a system of policy metadata, which would provide key descriptive terms according to a metadata schema. Another example is to provide an XSL stylesheet for summarizing policies. The current compact policy specification has gone some way towards this.

2.3 Short term vocabulary issues

Apart from user interfaces, the issue of a semantically transparent taxonomy is perhaps the greatest challenge for P3P and is inseparable from many of the issues above. Despite incompleteness in certain respects, P3P 1.0 provides a sound foundation for such a taxonomy. Although P3P is not expressed in standard ontology syntax, it represents, through the W3C processes, which have underpinned it, a five year process for a data protection vocabulary. We feel however that the taxonomy represented by the existing version of P3P could be improved in the following ways.

2.3.1. An issue of urgency is P3P's schema [17] for specifying categories of personal data, which currently has the following problems:

It is defined in a very cumbersome way (by a P3P specific pointer mechanism within a flat definition scheme) - this is highly problematic for implementations and could easily be rectified by a move to a formal ontology, or at least to a standard XML schema syntax using REFID's for multiple subclassing.

It has not been subjected to rigorous use case scenarios. For example, the following category description from the base data schema is the closest we can get to describing the http header information.

Element: http
Category: Navigation and Click-stream Data, Computer Information
Structure: httpinfo
Short display name: HTTP Protocol Information

However, HTTP header information cannot be described by any of the terms in the phrase "Navigation and Click-stream Data, Computer Information".

It does not have a well-defined semantics. It is not clear whether the items in the schema refer to categories, or to data objects. For example in discussions with the working group, we have seen the data schema term "email" described as referring to an email address, whilst we would argue that it refers to a class into which may be placed all data objects, which can be described as "email". There is an important difference because the term, "user" clearly does not refer to a user, but to his data. This needs to be rectified within the context of a more formal ontology development process, as described below.

Although it does provide a P3P-specific customization route, because of the non-standard syntax, this does not allow applications to leverage existing ontological frameworks and thus widen the descriptive power.

2.3.2. As mentioned above, the recipient taxonomy stands at the central point of any privacy taxonomy, and as such it is not sufficiently descriptive in P3P 1.0. Specifically, it needs to be altered to be able to express the requirements of the European directive. Furthermore, as has been noted elsewhere in this paper, there are more fundamental requirements which need to be addressed in order for the recipient taxonomy to be adequate. These include the ability to attach security and jurisdictional taxonomies as sub-trees of recipient instances.

2.3.3. The purpose specification taxonomy needs to be subjected to more rigorous use case scenarios. The evaluation group [4] felt that, given that the purpose specification of data collection is one of the most important elements in the taxonomy, the 12 cases provided did not cover what is required. Consider, for example, the following statement:

"DoubleClick does use information about your browser and Web surfing to determine which ads to show your browser."

P3P would cover the preceding sentence with the element <customization/> and possibly <individual-decision/> and <tailoring/>. However, none of these makes clear, and P3P cannot express, that the purpose is the advertising of third-party products. This is nevertheless something that would concern many users.

2.4. APPEL issues

Problems:

As we noted above, a preference exchange language is a very necessary part of P3P. However, there are various problems with the preference expression language. Constructing the logic of matching patterns is very complex, and involves various inherent contradictions. For example, the following rule looks for any information which is not the user's IP address or user agent string and blocks resources which ask for it.

This RULE will cause a block behavior for the following web site policy (only relevant parts quoted),

Note the presence of the "non-and" connective, which means - "only if not all sub-elements in the rule are present in the sub-elements of the matched element in the policy". This is true for the first policy snippet but not the second, which given that they have the same meaning is clearly unacceptable. We will look at solutions which address this problem below.
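The ambiguity can be reproduced with a simplified model of the "non-and" connective as defined in the preceding paragraph. This is not the full APPEL matching algorithm, and the two policies below are hypothetical illustrations (namespaces omitted), not the snippets referred to in the text. They declare the same data, grouped differently, yet produce different outcomes.

```python
import xml.etree.ElementTree as ET

# Data references listed in the rule (IP address and user agent string).
RULE_REFS = {"#dynamic.clickstream.clientip.fullip", "#dynamic.http.useragent"}

# Hypothetical policy: both items declared in one DATA-GROUP.
GROUPED = """
<POLICY><STATEMENT><DATA-GROUP>
  <DATA ref="#dynamic.clickstream.clientip.fullip"/>
  <DATA ref="#dynamic.http.useragent"/>
</DATA-GROUP></STATEMENT></POLICY>
"""

# Hypothetical policy: the same items declared in two statements.
SPLIT = """
<POLICY>
  <STATEMENT><DATA-GROUP><DATA ref="#dynamic.clickstream.clientip.fullip"/></DATA-GROUP></STATEMENT>
  <STATEMENT><DATA-GROUP><DATA ref="#dynamic.http.useragent"/></DATA-GROUP></STATEMENT>
</POLICY>
"""


def non_and(rule_refs, policy_element):
    """True only if not all rule sub-elements are present in the matched policy element."""
    present = {d.get("ref") for d in policy_element.findall("DATA")}
    return not rule_refs <= present


def rule_fires(policy_xml):
    """The rule blocks if any DATA-GROUP in the policy satisfies the non-and test."""
    root = ET.fromstring(policy_xml)
    return any(non_and(RULE_REFS, group) for group in root.iter("DATA-GROUP"))


if __name__ == "__main__":
    print("grouped:", rule_fires(GROUPED))
    print("split:  ", rule_fires(SPLIT))
```

Under this model the grouped policy does not trigger the rule, while the split policy does, although both declare exactly the same two data items.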

Proposed solutions:

We have already noted the benefits of moving APPEL to a version based on an OWL ontology of P3P, namely improvements in visualization techniques and reasoning. Given the work involved, this may however be considered a long-term objective.

A more immediate solution would be to make initial use of a standard query language for the condition-matching part of APPEL. Instead of using APPEL's somewhat quirky connective system and recursive matching algorithm, the rule condition could be specified by an XPath [14] query (or, by the time it becomes relevant, XPath 2.0 [12]). These query languages are designed to match arbitrary node sets with high efficiency. They have the advantage that developers are familiar with them and efficient algorithms exist to execute the queries. As it has become very clear that APPEL is not a language that will be written by anyone other than developers, or by ordinary users through a GUI, this is clearly the best approach.

For example, a rule in this format, which would solve the above ambiguity problem, would be:

<appel:RULE behavior="block" prompt="yes" promptmsg="Resource will use your home info beyond current purpose ">

"//DATA[not(contains(@ref,'dynamic.clickstream.clientip.fullip') or contains(@ref,'dynamic.http.useragent'))]"

It should be noted that the recent issue of the XPATH 2.0 [12] specification, which provides an even more powerful matching language, makes this an even more compelling solution.
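The intent of the query in the rule above (select every DATA element whose ref names neither the client IP address nor the user agent string) can be mirrored in plain Python. The standard library's ElementTree supports only a subset of XPath without functions such as not(), so this sketch emulates the predicate instead of evaluating the XPath expression itself. The three policies are hypothetical and namespaces are omitted.

```python
import xml.etree.ElementTree as ET

# Data items the rule tolerates.
EXEMPT = ("dynamic.clickstream.clientip.fullip", "dynamic.http.useragent")

# Hypothetical policies: the first two declare the same data, grouped differently.
GROUPED = """
<POLICY><STATEMENT><DATA-GROUP>
  <DATA ref="#dynamic.clickstream.clientip.fullip"/>
  <DATA ref="#dynamic.http.useragent"/>
</DATA-GROUP></STATEMENT></POLICY>
"""
SPLIT = """
<POLICY>
  <STATEMENT><DATA-GROUP><DATA ref="#dynamic.clickstream.clientip.fullip"/></DATA-GROUP></STATEMENT>
  <STATEMENT><DATA-GROUP><DATA ref="#dynamic.http.useragent"/></DATA-GROUP></STATEMENT>
</POLICY>
"""
EXTRA = """
<POLICY><STATEMENT><DATA-GROUP>
  <DATA ref="#dynamic.http.useragent"/>
  <DATA ref="#user.home-info.postal"/>
</DATA-GROUP></STATEMENT></POLICY>
"""


def non_exempt_refs(policy_xml):
    """Return the ref of every DATA element, at any depth, that is not exempt."""
    root = ET.fromstring(policy_xml)
    refs = []
    for data in root.iter("DATA"):
        ref = data.get("ref", "")
        if not any(name in ref for name in EXEMPT):
            refs.append(ref)
    return refs


def rule_fires(policy_xml):
    """The block rule fires when the node set selected by the query is non-empty."""
    return bool(non_exempt_refs(policy_xml))


if __name__ == "__main__":
    for name, policy in (("grouped", GROUPED), ("split", SPLIT), ("extra", EXTRA)):
        print(name, rule_fires(policy), non_exempt_refs(policy))
```

Because the query selects DATA nodes wherever they occur, the grouping of elements within the policy no longer affects the outcome.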

3. Long Term issues Summary.

We suggest that P3P should be based on an OWL ontology, which has been developed according to a formally documented ontology capture process. We give motivations and the many advantages of this approach.

We suggest that P3P is the ideal specification in which to incorporate a mechanism for obtaining consent (signed or otherwise) from users for data processing. We suggest using an http header mechanism to achieve this.

We discuss the issues around the use of XML digitally signed policies in order to increase consumer trust and provide a watertight route of legal recourse.

We outline reasons for the inclusion of a security measures taxonomy within P3P, most notably that this is required for policies to be able to express compliance with EU law. We stress the link between this and a later discussion of audit trails.

We explain the need for granular management of data flow within P3P or APPEL and the need to link to the syntax of XForms.

We discuss the idea of addressing the problem that P3P provides no enforceability by outlining in detail a proposal for using P3P and the proposed P3P ontology to provide an interoperable system for privacy audit trails. This is to address the question: "That's what they say - but what if they don't do what they say?"

Annex 1. Background

1. Research into P3P and a full participation in the development process of the standard, including the development of a reference implementation (see http://p3p.jrc.it), which was the first (and until now the only) implementation, which fully complies with the P3P1.0 April 2002 specification. It consists of an open source Java User Agent and a model e-commerce site. The agent was built specifically with the intention of demonstrating and evaluating P3P from a research perspective. The architecture was designed with the following objectives:

2. A meeting of privacy experts held on May 27th 2002. The meeting included a demonstration and covered concerns and issues, many of which are relevant to this paper. A report [4] was published with the findings of this meeting.

3. Research published in a peer-reviewed publication, "A Fully Compliant Research Implementation of the P3P Standard for Privacy Protection". This paper will be published at the European Symposium on Research in Computer Security (ESORICS) outlining our most important findings to a high level of detail. [3]

4. A special meeting of the Internet Task Force of the Art 29 WG on September 23rd 2002, Brussels.

Annex 2. Long Term Issues

5. Long Term Issues

5.1 Long term vocabulary issues.

1. The P3P taxonomy should be given a formally documented consensus process, which traces its approval by stakeholders and includes methods for eliciting expert knowledge from stakeholders who are not necessarily technical experts. The JRC is developing such a process in conjunction with Aberdeen University and has published a description of it.

2. It should be expressed in formal Ontological syntax according to the latest OWL specification of W3C. This has the obvious advantage of integrating it in a well grounded semantic model and allows developers to leverage a large corpus of work done on ontology visualization, RDF query languages and rule based matching systems. It also allows the taxonomy to hook into related ontologies, such as a full geographical ontology, which, as mentioned above, is necessary for compliance with European law.

3. Clear understanding of terms allows ease of translation between alternative ontologies. For example, if two competing data protection ontologies are developed but use the same standard, then a translation service between them can easily be set up.

4. Clear separation of vocabulary and syntax allows the same vocabulary to be plugged into different data protection systems.

5.2. Consent issues.

Problem description

The Article 29 working group has stated: "Internet users must have a real possibility of objecting …on-line by clicking a box" [2]. Any collection of personal data must have a specific opt-in mechanism - in other words, consent must be explicitly expressed.

Although P3P is able to check what a P3P policy states about consent, using the opt-in and opt-out attributes in the policy, it is not able to check that there is actually a mechanism in place for doing this. More specifically, the following should be provided:

An integration with XForms [5] to extract the semantics of consent boxes and validate claims of opt-in mechanisms.

Methods for expressing (possibly signed) consent. Although the requirements of the directive do not stipulate this, the specific requirement that users must have the option of explicit objection to data collection effectively requires that businesses can prove, in certain cases that consent was given. If some way of expressing signed consent were built into P3P, it would be a considerable aid to both parties and especially to businesses wishing to protect themselves against the consequences of disputes.

It may be argued that it is not within the remit of P3P to deal with the issue of consent, and that this should perhaps be addressed by the XForms group. However, consent for using personal data is an issue which relates specifically to data privacy, and is independent of whether that data is transmitted through forms or through, for example, HTTP headers. Therefore P3P is the ideal specification to include a mechanism for expressing consent.

Proposed Solution:

Here we outline a sketch of how such mechanisms might work. Full details would be a matter for the specification group.

5.2.1. Checking for an opt-in/out mechanism

a. There could be a specific attribute published within a namespace approved by the P3P specification, but mentioned within the XForms specification (alongside other proposed attributes such as the policy reference declaration), which expresses in a machine-readable way the fact that a check box or other form field is for expressing consent.

This would have the important advantage of providing a standard syntax useable by all form systems for expressing consent.

5.2.2. Requesting signed consent

We considered the possibility of a mechanism for expressing consent by including, within a policy, an element specifying the name and various other specifications for a hidden form field to be added to a POST operation, containing a signed statement, as specified in the element.

We therefore suggest that a mechanism could be provided for requesting and providing consent using http headers, which would also provide the option of asking for a signed consent.

In this case, an element would be added to the P3P policy similar to the following:

<DATA ref="...">
<CONSENTREQUEST method="..." headername="...">
<DATAREQUIRED certificate="X.509" algorithmtype="RSA" minkeylength="128">I agree that my data in this form will be published on the internet.
</DATAREQUIRED>
</CONSENTREQUEST>
</DATA>

CONSENTREQUEST could be inserted within a DATA element to state that the collection of this type of data requires the consent of the user, and how this consent should be sent.

The "method" attribute specifies that the consent should be sent using an HTTP header. The "headername" attribute specifies the name of the header which should contain the signature data. The DATAREQUIRED element contains the statement which must be signed to express consent. Its attributes express requirements on the signature, allowing flexibility in the signature types that are accepted.

5.2.3. Structure of message.

To be of any use, consent messages need to be stored in a structured way in the "back office" of the service provider. The most important requirement for the "back office" is that, in the case of a dispute, the message can be linked to the data for which it provides consent. This requirement, however, needs to be set against the possible loss of privacy involved should the message be linked with a unique identifier.

Because of this latter consideration, it should be left up to the service provider to link the consent message with a unique identifier binding it to the information, such that the possible privacy losses contained in such an identifier are appropriate to the situation. For example, if the subject is willing for their entire information to be retained indefinitely, then a hash of all or part of the information may be used. However, if they are not, then this is not appropriate, because such a hash could later be used to perform data mining operations on sensitive information. In this case, a hash of some form of session id might be more appropriate. Another solution is some form of escrowed key system, in which a legal authority requiring proof of consent could unlock the identifiers. This is overkill for most situations. In either situation, the date of the consent may be taken from the HTTP request headers.
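The two linking options can be sketched as follows. The choice of SHA-256, the record layout, and the sample values are assumptions of this example; the text only calls for "a hash" and does not prescribe a storage format.

```python
import hashlib
from email.utils import parsedate_to_datetime


def link_id(value):
    """Derive an identifier from a string (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def consent_record(statement, link_source, http_date):
    """Build a hypothetical back-office record for a consent message.

    link_source is either the submitted data itself (when the subject accepts
    indefinite retention) or a session id (when they do not).
    """
    return {
        "statement": statement,
        "link_id": link_id(link_source),
        "date": parsedate_to_datetime(http_date),  # taken from the HTTP Date header
    }


if __name__ == "__main__":
    statement = "I agree that my data in this form will be published on the internet."
    http_date = "Tue, 12 Nov 2002 10:00:00 GMT"
    # Option 1: bind the consent to the data itself.
    print(consent_record(statement, "name=Alice;city=Ispra", http_date)["link_id"])
    # Option 2: bind the consent to a session id only.
    print(consent_record(statement, "session-1234", http_date)["link_id"])
```

The second option reveals nothing about the submitted data, at the cost of requiring the service provider to keep the session id alongside the data for as long as consent may need to be proven.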

One possibility for structuring the messages themselves is to express them according to the proposed OWL P3P semantic model. For example, RDF statements could be constructed to formally express statements such as

"I am a data subject and I agree that the data objects transferred in this request may be transferred to third parties." (ontological terms underlined)

If such a consent statement were expressed using RDF statements, it would carry more legal weight because of its unambiguous and transparent semantics, and it would make management of different consent statements easier by making them easily processable by software agents.
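As a rough sketch of what such statements could look like, the consent sentence above can be written as RDF triples in N-Triples syntax. The ontology namespace, the property names (agreesToTransferOf, mayBeTransferredTo) and the request URI are all hypothetical placeholders; the proposed OWL P3P model would define the real terms.

```python
# Hypothetical namespace and request URI, used for illustration only.
ONT = "http://www.example.org/p3p-ontology#"
REQ = "http://www.example.org/requests/1234"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def triple(subject, predicate, obj):
    """Format one RDF statement in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def consent_statements(subject_uri, request_uri):
    """Express 'I am a data subject and I agree ...' as three triples."""
    data_objects = request_uri + "#dataObjects"
    return [
        triple(subject_uri, RDF_TYPE, ONT + "DataSubject"),
        triple(subject_uri, ONT + "agreesToTransferOf", data_objects),
        triple(data_objects, ONT + "mayBeTransferredTo", ONT + "ThirdParties"),
    ]
```

Each underlined ontological term in the sentence maps to one class or property URI, which is what makes the statement processable by software agents.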

5.3. Repudiatability of policies.

Problem:

A principal problem for P3P is that if a company's practices contravene its stated privacy policy, there is little technical framework to prove that the company made the statements which may have existed on its server at a given time. In other words, it is too easy for a company to repudiate its policy.

While P3P does increase the level of trust felt by consumers by providing more transparent and unambiguous information, it does not however provide any assurance as to the authenticity and integrity of this information.

Proposed solution:

XML signatures [14] offer an ideal solution to the problem of making a policy at a given URI non-repudiatable. XML signatures provide the opportunity to introduce assertions such as "X assures the content of this document" into the semantics of signed material. Also since P3P is entirely expressed in XML, it is pragmatic to use the XML version of asymmetric digital signatures to provide this assurance. The following section defines in detail how this might be achieved.

Joseph Reagle of W3C has already gone some way towards outlining the detail of this solution. We examine and build upon the proposals of Reagle [15] for the inclusion of XML digitally signed [14] policies within P3P. As Reagle has already set out most of the mechanisms for achieving this, we make only three minor additions to the technical specification. Our main aim is to look at possible technical problems with the use of the XML signature extension, and their solutions.

XML Digitally Signed Policies.

P3P enabled servers could have the possibility of providing an XML digital signature as part of their policy, or as a separate document referenced within the policy. This is easily accomplished provided that the correct syntax is incorporated into the P3P specification, as shown by Reagle. Reagle's example should however be modified in the following ways.

a.) Add an X.509 certificate bag to provide full non-repudiatability.

b.) Include a time stamp to comply with EU regulations.

c.) Require an additional signature over the PRF, which details which resources the policy applies to. Any signature that does not assure this information loses much of its legal significance. Note also that this signature cannot be bundled in with the policy signature because several PRFs may refer to the same policy. Furthermore, the person responsible for producing policy signatures may not even know the location of PRFs referring to the policy (as in the case of a standard-issue policy used by third parties). We suggest the addition of a "DISPUTES" element to the PRF identical to the DISPUTES element in the P3P policy, which allows the specification of a signature URI using the validation attribute.

The P3P process has two main components on the server: an XML policy and an XML PRF, which binds policies to resources. Semantically therefore, a P3P statement is a combination of the policy and the PRF, which binds a policy to a resource at a given time. The PRF, like the policy, has a validity timestamp.

However, Reagle's P3P extension does not include any binding signature for the P3P PRF. This is an oversight, because omission of a signature binding the policy to the resource it applies to negates the non-repudiatability of the statements being made. The importance of the PRF cannot be ignored. We therefore suggest that a signature also be provided for any PRFs used, and we show in the example signature the necessary extensions for a signature to be bound to a PRF. It is also worth mentioning the possibility of an additional signature over the human readable policy, which could be achieved by the same mechanism. There has been some discussion of the fact that there may be discrepancies between the human readable and the XML versions of privacy policies. Such a signature would ensure a commitment to consistency between both versions.
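To make the separation of signatures concrete, the following sketch computes one digest per document (policy, PRF and, optionally, the human readable policy), each destined for its own signature. It assumes the documents have already been canonicalized and it omits the signing step itself; the function names are illustrative, and SHA-1 is used only because the sample signature in Annex 3 names it as its digest method.

```python
import base64
import hashlib


def digest_value(document: bytes) -> str:
    """Base64-encoded SHA-1 digest, the form carried in a DigestValue element."""
    return base64.b64encode(hashlib.sha1(document).digest()).decode("ascii")


def separate_digests(documents):
    """Map each document name to its own digest.

    Each digest belongs in a separate signature, because several PRFs
    may refer to the same policy.
    """
    return {name: digest_value(body) for name, body in documents.items()}
```

A verifier would recompute the digest of the PRF it retrieved and compare it with the signed value, so that the binding of policy to resource is covered as well as the policy text.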

5.4. Expression of security measures

Problem description:

The European directive specifies that adequate security measures should be taken to protect data (95/46/EC Article 17). However, there is no means within P3P to express the level of security around personal data. This may also be seen as an issue of vocabulary, but we have included it in this section because it specifically relates to the EU directives. The reasons for the omission are clear. State-of-the-art security measures are constantly changing. Furthermore, it is very difficult to define security measures in any meaningful way. It might for example be stated that a database is password protected, but the password might in reality be "abc".

Proposed Solution:

There are several candidate schemas already in existence for classifying and describing security measures. It may be that these are able to provide some solution to this problem if incorporated within the P3P taxonomy. However, as mentioned above, any security taxonomy will either be too general to be useful, or would be out of date within a short space of time.

An additional solution, which may solve this problem, is therefore to provide the opportunity for third party security seals within policies. P3P already provides a placeholder for data protection seals within the DISPUTES element. However, these do not relate to security measures, only to data practices. The specific provision of a security seal placeholder would allow for a validation by a third party which would not constrain expressiveness to a taxonomy of security based around a changing and meaningless set of parameters. Instead, it would provide proof of a flexible and intelligent audit carried out by reputable individuals. It may also include a datestamp as an indication of the "freshness" of the seal. The expense of providing a meaningful seal may be a problem in itself.

Finally, we suggest that the incorporation of a framework for machine-understandable audit trails (see section…) may also provide some solution to this problem, as it would allow rapid and accurate assessment of security policies.

5.5. Identity Management and management of data flows

Problem:

P3P 1.0 makes no recommendations on how to link privacy policies to data transfer events, or on how to make decisions around such events. The W3C APPEL note, which we look at elsewhere in this document, makes recommendations on how to make such decisions. However, these recommendations are limited to only three basic behaviors. What is needed to make P3P into a really powerful tool within e-business is the ability to release data selectively based on privacy policies and the agent's level of trust in them.

One of the main problems existing on the Internet today is the amount of time and effort it takes to assess which bits of one's identity to release and which to withhold. Therefore it would be useful to extend P3P so that it could automate such decisions in a granular way.

To look at a specific example, the mobile device community has expressed interest in linking P3P with CC/PP (Composite Capability/Preference Profiles). In this case, it would be extremely powerful if a P3P-enabled agent with a rule base were able to reveal only selected device capabilities, basing its decision on the privacy policy and on which capabilities the service might need to know. Most client applications would benefit from such a capability, if it were made easy and robust. When filling in forms, users generally reveal only what is necessary, and if they do not trust the entity with the information which it claims to require, they will not go ahead with the data transfer at all.

It should be mentioned that the ability to selectively release data is strongly connected with identity management, and therefore any developments here should be linked into research in that area.

Proposed solution:

As this is an area where extensive further research is required, rather than describing a detailed solution, we will just outline the technical requirements of such a system, and briefly suggest their likely solution.

1. An ontology expressive enough to capture the various data types which might make up a composite identity (selective release of PII). This has already been discussed elsewhere in the paper.

2. Ways of linking that ontology to the data requests by client applications such as XForms and CC/PP [11]. In other words, there should be a common ontology between these specifications and P3P, or an effective translation between them.

3. A rule language and User interface expressive enough to allow selective release of information. This would most likely involve the definition of identities, in other words groups of information types, using a visual representation of a PII ontology and their linking to certain patterns recognized in policies. The identities would therefore become ad hoc classes within the ontology.

4. A clear specification, for each page, of what kind of information is being requested, and which of it is optional and which is required. Without this, the engine cannot decide what information to release in a particular case. As it stands, it would be very difficult for P3P to perform this function alone, because P3P policies are necessarily generalized between different resources and semantically they do not give any information about what is required on a particular page. A P3P policy says only:

"whatever the resource this policy is applied to, if you give us information x, we will do y"

What is needed in this new scenario is both this semantic and a second one stating what is required on a particular page. We suggest therefore that if the second semantic is provided by (e.g.) the XForm and linked in a granular way with P3P policies, this provides enough information for an agent to make a decision. For example a particular XForm might be able to express the semantic

"I require your email - this email address will be processed according to P3P policy Policy1, which can be found by means x."

On the part of P3P, it would simply require the capability to associate P3P policies at a more granular level than that of resources. In particular, in the case of XForms, it requires P3P policies to be associated with individual form fields. If a more general specification could be produced allowing association of policies with more diverse entities, this would open up the way for the application of P3P in other similar settings such as CC/PP, IRC (chat), etc.
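The decision an agent could make once policies are associated with individual form fields might be sketched as below. Everything here is an assumption made for illustration: FieldRequest is a hypothetical stand-in for what an XForm would declare, and trusted_policies stands in for the outcome of matching each policy against the user's rule base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRequest:
    name: str        # form field, e.g. "email"
    required: bool   # whether the page declares the field as required
    policy_id: str   # P3P policy associated with this individual field


def decide_release(requests, trusted_policies, user_data):
    """Return the data to release, or None to abandon the transfer."""
    release = {}
    for req in requests:
        if not req.required:
            continue  # optional field: withhold, revealing only what is necessary
        if req.policy_id not in trusted_policies or req.name not in user_data:
            return None  # a required field cannot be released: do not proceed
        release[req.name] = user_data[req.name]
    return release
```

This mirrors the behavior described for users filling in forms: only required fields are released, and a single untrusted required field stops the whole transfer.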

5.6. Non-enforceability - automated audit trail systems.

Problem:

P3P has sometimes been presented as an aid to the enforcement of data protection principles. However, discussion among data protection professionals brings up the obvious obstacle that P3P in its present form can only provide statements of companies' intentions about their data practices. This has no necessary connection with their actual practices and therefore effectively makes P3P powerless as an enforcement tool. In other words, P3P cannot guarantee that the promise matches the practice.

There is of course the inverse problem that companies may abide by the law to the letter, and yet not publish a P3P policy. Therefore P3P can guarantee neither that a company is within the law, nor that it is not.

Solution:

One solution to the first problem of non-enforceability, which is still somewhat within the realms of fantasy, but is however not inconceivable, is to use the taxonomy of P3P, perhaps somewhat extended, within a system for automated audit trails.

This solution can be compared to the solution adopted by restaurants who wish to make clients trust their hygiene practices. They put the kitchen in full view of the customers. In the same way, given a sufficiently standardized system, provided perhaps within P3P, servers could record their data processing events and security related events in such a way that authorized auditing agents could assess them in a measurable way against the regulatory standards, and perhaps against the additional standards of trust seals. The full details of such a system are beyond the scope of this paper. However, we present below a scenario which helps to view this set of extensions in a concrete way, and from there to extract some requirements for P3P 2.0.

Audit Trail Scenario:
1. User U submits their email address to company x1. This event is logged as: "Data submission event, data type emailaddress: stored in database D"

At this point, the data can take one of two paths, which must be clearly distinguished:

A. Data viewed by a moral agent M (a human individual) with certain legal responsibilities and perhaps risks, outlined in his profile, Pm.

2. Pm may contain information such as links to signed NDAs, commitments made by M, perhaps a trust profile, etc.

1.1. A set of commitments entered into by that agent as described in a P3P policy.

1.2. A pointer to how to find the audit trail left by that agent (anonymized versions may even be publicly available).

An important feature of such a system should be that any agent system A1 passing information to another agent system A2 should have a way of knowing whether A2 is also committed to saving audit trail information, and where and under what circumstances, this information could be accessed.

The crucial feature of such a system is that it must allow an information trail to be stipulated and subsequently followed, in order to track real privacy practices rather than privacy promises such as those contained in P3P policies.

This scenario suggests the following requirements for P3P 2.0:

a. A placeholder for description of audit trail
  -commitments
  -access conditions and locations.
  -seals
b. Improved recipient taxonomy to allow expression of privilege profiles p as in A.2.
c. A taxonomy for creating audit trail logs (for example, was data passed in encrypted form or not, and was it placed in a secure environment).
d. If competing systems exist, then a taxonomy for distinguishing between them. For example, a system should be able to understand the meaning of "if you release this information, it will be passed from an environment which uses audit trail system x8 to a system which uses audit trail system x9"
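A log entry built from such a taxonomy might, as a minimal sketch, look like the record below. The field names and the JSON serialization are assumptions made for illustration; the paper does not define a log format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class AuditEvent:
    event_type: str           # e.g. "data-submission"
    data_type: str            # e.g. "emailaddress"
    location: str             # e.g. "database D"
    encrypted: bool           # was the data passed in encrypted form?
    secure_environment: bool  # was it placed in a secure environment?
    audit_system: str         # which audit trail system wrote the entry, e.g. "x8"


def log_line(event):
    """Serialize one event as a JSON line for an auditing agent to read."""
    return json.dumps(asdict(event), sort_keys=True)
```

For step 1 of the scenario, the entry would be AuditEvent("data-submission", "emailaddress", "database D", True, True, "x8"), where the last three values are invented for the example.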

Annex 3. Sample XML Digital Signature

Sample XML signature of P3P policy. Note that a signature for the PRF would be identical except that the node marked with ***'s would refer to a policy reference file.

<Signature Id="Signature1" xmlns="http://www.w3.org/2000/09/xmldsig#">
<SignedInfo>
    <CanonicalizationMethod Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000907"/>
    <SignatureMethod Algorithm="http://www.w3.org/2000/09/xmldsig#dsa-sha1"/>
    <Reference URI="http://www.example.org/p3p.xml">
      <Transforms>
        <Transform Algorithm="http://www.w3.org/TR/2000/WD-xml-c14n-20000907"/>
      </Transforms>
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>j6lwx3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
    </Reference>
    <Reference URI="#Assurance1" Type="http://www.w3.org/2000/09/xmldsig#SignatureProperties">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>1342=-0KKAASIC!=123Adxdf</DigestValue>
    </Reference>
    <Reference URI="http://www.example.org/signaturePolicy.xml">
      <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1"/>
      <DigestValue>1234x3rvEPO0vKtMup4NbeVu8nk=</DigestValue>
    </Reference>

</X509Data>
<X509Data>
  <X509SubjectName>Subject of Certificate B</X509SubjectName>
</X509Data>
<X509Data>
  <X509Certificate>MIICXTCCA..</X509Certificate>
  <X509Certificate>MIICPzCCA...</X509Certificate>
  <X509Certificate>MIICSTCCA...</X509Certificate>
</X509Data>
</KeyInfo>

</SignatureProperty>
<SignatureProperty Id="TimeStamp1" Target="#MySecondSignature">
  <timestamp xmlns="http://www.ietf.org/rfcXXXX.txt">
    <date>19990908</date>
    <time>14:34:34:34</time>
  </timestamp>
</SignatureProperty>

REFERENCES

[1] See Working document on determining the international application of EU data protection law to personal data processing on the Internet by non-EU based web sites, adopted on the 30th of May 2002 (WP 56).

[2] Art 29 - Data Protection Working party: Recommendation 2/2001 on certain minimum requirements for collecting personal data on-line in the European Union; Opinion on P3P, 16 June 1998.

[3] "A fully compliant research implementation of the P3P standard for privacy protection: experiences and recommendations", Giles Hogben, Tom Jackson, Marc Wilikens. ESORICS 2002, Zurich, 14-16 October 2002. Springer Verlag.

[8] Ackerman, M. S., Cranor, L., and Reagle, J. (1999). Privacy in E-Commerce: Examining User Scenarios and Privacy Preferences. Proceedings of the ACM Conference in Electronic Commerce : 1-8. New York: ACM Press.

[16] Information Technology - Code of practice for information security management. BS ISO/IEC 7799-1:2000. British Standards Institution. 2000. ISBN 0 580 36958 7

Additional Background

[1] Opinion 1/98: Platform for Privacy Preferences (P3P) and the Open Profiling Standard (OPS), adopted on 16 June 1998.

[2] Working Document Privacy on the Internet - An integrated EU Approach to On-line Data Protection- adopted on 21st November 2000.
