Here is my draft text for addressing the confusion around identity terms in the spec.

http://www.w3.org/Bugs/Public/show_bug.cgi?id=167

Identity Definitions in the P3P Specification

In privacy regulations, guidelines, and papers about privacy, a variety of terms are used to describe data that identifies an individual to varying degrees. Some common terms, such as "personally identifiable information (PII)," are often left undefined or are the cause of heated debate. In different documents, "identity" can be tied to: 1) how the information can be or is being used, 2) how the information is stored, or 3) the type of information. The P3P Specification Working Group tried to capture all three of these ideas so that different implementers and users can make decisions based on the importance they place on these various definitions of identity. (1)

Identity Through Usage ("identified" data)

The most common term in the specification is "identified data," which focuses on how the information can be or is being used. "Identified data" is information that can reasonably be used by the data collector to identify an individual. Admittedly, this is a somewhat subjective standard.

For example, a data collector storing Internet Protocol (IP) addresses (which can be assigned dynamically, or can be static and therefore tied to a particular computer used by a single individual) should consider the IP address "identified data" only when an attempt is made to tie the exact addresses to past records, or to work with others to identify the specific individual or computer over a long period of time. In the more common case, where data collectors use IP addressing information in the aggregate or make no attempt to tie the IP address to a specific individual or computer over a long period of time, IP addresses are not considered identified, even though it is possible for someone (e.g., law enforcement agents with proper subpoena powers) to identify the individual based on the stored data.

Identity Through Storage ("non-identifiable" and "linked" data)

The working group also felt that data collectors should be able to acknowledge when they make specific attempts to anonymize, in storage, data that would otherwise be identifiable. The term "non-identifiable" data refers to how the information is stored.

For example, a data collector collecting and storing IP addresses but not using them should NOT call this data "non-identifiable," even in the common case where they have no plans to identify an actual individual or computer. However, if a Web site collects IP addresses but actively deletes all but the last four digits of this information, in order to determine short-term use while ensuring that a particular individual or computer cannot be consistently identified, then the data collector can and should call this information "non-identifiable."

Also, "non-identifiable" can be used in cases where no information is being collected at all. Since most Web servers are designed to keep Web logs for maintenance, this would most likely mean that the data collector has taken specific efforts to ensure the anonymity of users.

Under the above definitions, a lot of information could be "identifiable" (not specifically made anonymous) but not "identified" (reasonably able to be tied to an individual or computer).

Similarly, the term "linked" refers to how information is being stored in connection with a cookie. All data in a cookie or linked to a particular user must be disclosed in the cookie's policy. Using the terminology above, if the data collector collects "identifiable" information about the user, it is generally "linked" data.

Identity Through Information Type

The Working Group felt that different user agent implementations could be created to focus on different concerns around data type. Therefore, the working group enabled the creation of a robust data schema including broad categories of information that may be considered sensitive by certain user groups.
The Working Group hopes that a diverse set of user agents will be created to give users the ability to make identity decisions based on specific collections and types of data collected, if they desire to do so. For example, a user agent could allow users to opt to be prompted when a medical or financial identifier is being collected, independent of how that information is being used.

(1) More information on the debate and the definitions can be found in Lorrie Faith Cranor's book Web Privacy with P3P, O'Reilly, 2002.
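
As an illustration of the user agent behavior described under Identity Through Information Type, here is a minimal Python sketch of a preference check that flags sensitive data categories independent of the stated purpose. Everything in it is an assumption made for illustration: the Statement class, the categories_to_prompt function, and the simplified policy structure are hypothetical and are not defined by the draft or by any P3P implementation. The category names "health" and "financial" are modeled on P3P's category vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified policy statement: declared categories and purposes."""
    categories: frozenset
    purposes: frozenset = frozenset()


def categories_to_prompt(statements, watched):
    """Return the watched categories that a policy declares.

    Purposes are deliberately ignored: the user asked to be prompted
    whenever such data is collected, however it is used.
    """
    declared = set()
    for statement in statements:
        declared |= statement.categories
    return sorted(declared & set(watched))


# Illustrative policy: one statement for server logs, one for health data.
policy = [
    Statement(frozenset({"navigation", "computer"}), frozenset({"admin"})),
    Statement(frozenset({"health"}), frozenset({"current"})),
]

# The user has opted to be prompted about health and financial data.
print(categories_to_prompt(policy, {"health", "financial"}))
```

A user agent built along these lines would prompt for each returned category. One that cared about usage instead would inspect the purposes field, which this sketch leaves untouched.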