Privacy Spec Implementation Example

From Customer Experience Digital Data Community Group

Preamble

The Privacy Specification is intended to be an optional recommendation. Some simple web sites may not need a privacy definition and implementation. Other sites may fall under geographies or industries where strict compliance is legally mandated. The intention is to provide a lightweight framework and a set of standards by which different site implementers, tag management vendors and marketing and analytics vendors can work together, integrating their systems in a cohesive and vendor-neutral manner.

Goals and Guidelines

The following guidelines drive the actual Privacy Implementation specification as defined later in this document.

The privacy implementation should be as simple and clean as possible.
The balance between simplicity and comprehensiveness is a delicate and important matter. Something that is too weak to enable enforcement may cause the entire specification to be ignored, as will something that is too complex to follow.
Privacy information is metadata and should be handled accordingly.
In other words, information about the level of privacy, security, or sensitivity of portions of the data layer is data that describes the core data. It should not be mixed in with the data layer itself.
Implementation should be consistent, complete, and enforceable.
The site owner should be able to define a privacy policy that tag and data management platforms can interpret. The policy should be complete and sufficient. It should be possible to create a system that can parse and interpret privacy settings and can be used, if necessary, to enforce privacy settings.
Implementation and enforcement should be modular and vendor-independent.
If good design practices are followed, a privacy management (definition and enforcement) system should allow systems from different vendors to work together; it should be possible to replace vendors without completely disrupting the client’s installation. Systems should be loosely coupled whenever possible.
The privacy implementation should be complete and extendable.
The specification should provide concrete definitions for the primary standard privacy types. Vendors and page designers should have a set of categories that they can use straight away. It should also be possible to add to these categories or to break them down into more detailed sub-categories.

Primary structure for privacy data

If the root object for Customer Experience Digital Data is a top level JavaScript object named digitalData, then all metadata that defines privacy and security will be found under

digitalData.privacy

with the following sub-structures:

digitalData.privacy.types
A structure that defines the different "levels" of privacy available. The default value for this would be digitalData.privacy.types = [ "public", "private", "identifiable", "sensitive" ]; These default categories are defined later in the document, along with examples and information about how to extend these definitions.
digitalData.privacy.accessors
A structure that defines the different sorts of "consumers" or systems that are likely to want access to some or all of the customer digital data. The default value for this would be digitalData.privacy.accessors = [ "advertising", "analytics", "personalization", "social", "system" ]; These default categories are defined later in the document, along with examples and information about how to extend these definitions.
digitalData.privacy.policy
Maps accessors to types to determine what data should be available to any specific accessor or "consumer".
digitalData.privacy.mapping
A series of rules that maps the different digitalData.privacy.types onto the entire digitalData structure.

Additionally, there are two JavaScript methods that should be used to access all Customer Experience Digital Data:

digitalData.get(<optional> accessor identifier, <optional> data node);
A "getter" method by which a system identifies itself and then receives a copy of the digital data.
digitalData.set(digitalData, <optional> accessor identifier, <optional> data node);
A "setter" method by which a system identifies itself and requests that digital data be set or modified.

It is the W3C recommendation that these two basic APIs (described in detail below) be used to access Customer Experience Digital Data instead of accessing the digitalData.* structure directly. Some vendors may choose to ignore this recommendation, but any system that implements the Customer Experience Digital Data standard may choose to hide the primary data store and mandate the getter and setter be used as federated access methods. (In other words, the digitalData structure may appear to be blank to outside systems.)

More detailed discussion about these data access APIs is provided below.

Privacy Types

Data stored in the digitalData structure can have a wide range of privacy and security concerns. Some information, such as the top-level page identifier, needs to be accessible to all systems and is necessary to create a unified page identification system. On the other extreme, some information such as a credit card number or personally sensitive information such as political affiliation or medical history must be handled carefully and can lead to serious legal repercussions.

This specification defines the following basic "default" privacy types that must be available for all privacy implementations:

private
Information that alone identifies a site visitor. (e.g. Personally Identifiable Information, or PII)
identifiable
Information that identifies a site visitor in some combination with other Identifiable information. Identifiable is intended to describe the superset of PII that includes non-PII information.
sensitive
Information that is not identifying, but could be sensitive, in that it should not be shared. (e.g. medication, condoms, firearms, etc.) This field would be an attribute added by the site owner, as the same piece of information might be sensitive in certain contexts and not others.
public
Information that has no level of sensitivity and provides information about the web page itself rather than the page visitor. If no privacy level is defined for part of the customer data, it can be assumed to be public. In other words, "public" may be considered equivalent to null.

(needed: examples)

Extending the Privacy Types

For applications where these four basic categories are insufficient, there are two ways to extend the system: adding primary categories and dividing existing categories into sub-categories. Of these two approaches, the addition of new categories is highly recommended. This involves simply adding a new element to the digitalData.privacy.types array.

Sub-Categorization, Option 1

Sub-categorization can be achieved by replacing the parent string with a JavaScript object whose toString() and valueOf() methods return the parent name, and whose subcategories member holds an array of strings that define the possible sub-categories. For example:

var identifiable = { toString : function() { return "identifiable"; },
                     valueOf : function() { return "identifiable"; },
                     subcategories : [ "unique", "anonymous", "segmentation" ] };
digitalData.privacy.types = [ "private", identifiable, "sensitive", "public" ];

Here an expression like identifiable == "identifiable" evaluates to TRUE (important for compatibility with simple vendor implementations) but the additional sub-categorization information is available.
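One consequence of Option 1 is worth spelling out: only loose comparison (==) sees the parent name, while strict comparison (===) and Array.prototype.indexOf() do not. The sketch below illustrates this and shows one way a consumer might compare by name instead. The helpers typeName() and hasType() are hypothetical names introduced for illustration and are not part of the specification.

```javascript
// Category object from the example above.
var identifiable = { toString : function() { return "identifiable"; },
                     valueOf : function() { return "identifiable"; },
                     subcategories : [ "unique", "anonymous", "segmentation" ] };
var types = [ "private", identifiable, "sensitive", "public" ];

// Hypothetical helper: returns the category name for plain strings
// and for category objects alike (String() calls toString()).
function typeName(type) {
  return String(type);
}

// Hypothetical helper: looks a category up by name rather than by identity.
function hasType(typeList, name) {
  for (var i = 0; i < typeList.length; i++) {
    if (typeName(typeList[i]) === name) { return true; }
  }
  return false;
}

var looseMatch = (identifiable == "identifiable");    // true: the object is converted to a string
var strictMatch = (identifiable === "identifiable");  // false: an object is never === a string
var foundByIndexOf = types.indexOf("identifiable");   // -1: indexOf() uses strict equality
var foundByName = hasType(types, "identifiable");     // true
```

Simple vendor code that relies on indexOf() or === would therefore need a name-based lookup of this kind.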

Sub-Categorization, Option 2

If a category can be further broken down into sub-categories, it is represented by a standard JavaScript array instead of a string. The first string will be the name of the "parent" category itself, and subsequent elements will be various sub-categories. For example:

digitalData.privacy.types = [ "private", 
                              [ "identifiable", "unique", "anonymous", "segmentation" ] ,
                              "sensitive",
                              "public" ];

It is important to reiterate that this extension mechanism is highly discouraged, but is mentioned in case a richer privacy definition mechanism is needed that can provide multiple vendor interoperability.
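As a rough sketch of how a consumer might read the Option 2 layout, the two helpers below recover the flat list of parent names and the sub-categories of one parent. Both function names (parentNames and subcategoriesOf) are assumptions made for this example only.

```javascript
var types = [ "private",
              [ "identifiable", "unique", "anonymous", "segmentation" ],
              "sensitive",
              "public" ];

// Hypothetical helper: flattens the list back to parent category names.
function parentNames(typeList) {
  var names = [];
  for (var i = 0; i < typeList.length; i++) {
    var entry = typeList[i];
    names.push(Array.isArray(entry) ? entry[0] : entry);
  }
  return names;
}

// Hypothetical helper: returns the sub-categories of a parent,
// or an empty array when the parent has none.
function subcategoriesOf(typeList, parent) {
  for (var i = 0; i < typeList.length; i++) {
    var entry = typeList[i];
    if (Array.isArray(entry) && entry[0] === parent) {
      return entry.slice(1);
    }
  }
  return [];
}

var flat = parentNames(types);
var subs = subcategoriesOf(types, "identifiable");
```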

Accessor Types

As described above, the privacy specification requires well-defined categories for "types of privacy" or levels of sensitivity. On the other side of the privacy coin are the types of systems or "data consumers" that may wish to access some or all of the digitalData structure. Some of these systems, like the actual shopping cart system, may need access to some of the data in order for the web page to function. On the other hand, other systems such as marketing and advertising engines may store data about the user and may need to be blocked from sensitive data for legal reasons.

This specification defines the following basic "default" accessor types that must be available for all privacy implementations:

system
Part of the web site's underlying operating system. This system is considered the most trusted and secure and may require access to some or all of the data regardless of privacy type.
analytics
Part of an analytics or testing system that is used to monitor the "health" and usage of the website. None of the data used by this system will be used to target an individual user for marketing purposes, but it may be used to analyze how specific segments of the visitor population interact with the site.
personalization
Part of a system that will provide "personalized" or targeted content based on user characteristics or stored personal preferences. An accessor that is solely registered as a "personalization" provider may not store and maintain this information for advertising or marketing activities outside of this website. <confirmation of this definition is needed>
social
<definition needed> Information available to this system may be used in a social media context, especially to pre-defined groups of "circles" that the user maintains.
advertising
This accessor may store allowable data, associate it with this user, and provide advertising at a later time and/or on different websites or channels.

Extending the Accessor Types

The mechanism for extending the accessor type definition is the same as the one described for the privacy types. Again, it is highly recommended that a flat array be used for simplicity and interoperability.

Privacy Policy

The "privacy policy" is a simple mapping between privacy types and accessor categories. It is represented by an object literal where the member keys are each of the accessors and the values are arrays of the privacy types that accessor is allowed to access. For example:

digitalData.privacy.policy = { "system" : [ "private","identifiable","sensitive","public" ],
                               "analytics" : [ "identifiable", "public" ],
                               "personalization" : [ "private", "identifiable", "public" ],
                               "social" : [ "public" ],
                               "advertising" : [ ] };
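A policy lookup can then be a single array test. The sketch below is one possible reading, not a mandated API: the function name isAllowed is hypothetical, and the choice to deny accessors that are missing from the policy is an assumption.

```javascript
var policy = { "system" : [ "private", "identifiable", "sensitive", "public" ],
               "analytics" : [ "identifiable", "public" ],
               "personalization" : [ "private", "identifiable", "public" ],
               "social" : [ "public" ],
               "advertising" : [ ] };

// Hypothetical helper: true when the accessor category may read data of the
// given privacy type. Accessors not listed in the policy are denied (assumption).
function isAllowed(policy, accessor, privacyType) {
  var listed = Object.prototype.hasOwnProperty.call(policy, accessor);
  var allowedTypes = listed ? policy[accessor] : [];
  return allowedTypes.indexOf(privacyType) !== -1;
}

var analyticsSeesCart = isAllowed(policy, "analytics", "identifiable");
var analyticsSeesPii = isAllowed(policy, "analytics", "private");
```

Note that with the example policy above, the "advertising" accessor is denied even "public" data, because its array is empty.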

User-configuration of the Privacy Policy

It is very important to note that individual users should be able to have some influence over the strictness of a privacy policy. This may be done by existing mechanisms such as the Do Not Track setting on a browser or some other system settings. This W3C recommendation, however, will not give any specific guidelines or standards as to how this should be achieved. The privacy subcommittee recognizes the importance of this topic and may in the future release recommendations for vendor opt-in or opt-out solutions.

Privacy Mapping

Tying all the above definitions together, the digitalData.privacy.mapping array provides a clean and streamlined way to define the required level of privacy for various portions of the digitalData structure. This system allows a privacy definition that is as granular or as basic as needed, while staying short and succinct. It is partially inspired by the CSS rules system.

digitalData.privacy.mapping is an array of rules. Each element in the array is a key/value pair (an object literal) where the key is a string that defines the node or branch of the data structure and the value is one of the defined privacy types and/or one or more of the accessor types. Names preceded by an at-sign (@) are interpreted as accessors instead of privacy types.

digitalData.privacy.mapping = [
    { "page" : "public" }, // describes the page itself
    { "product" : "public" }, // further describes the product associated with the page
    { "cart" : "identifiable" },
    { "transaction" : "sensitive" },
    { "transaction total" : "private @analytics" }, // defined as private, but we make an exception for our analytics tools
    { "event" : "public" },
    { "privacy" : "public" },  // everyone should be able to see the general privacy definitions and policy
    { "privacy mapping" : "@system" }, // in this example, we decide we don't want to expose the exact policy mapping
    { "user" : "identifiable" },
    { "user segment" : "public" }
];

A special type "default" can be used to define the default privacy level for top-level elements that have not been defined. For example, one of the rules could have defined "default" : "public" and then the "product" and "event" rules could have been removed.

Specificity

More specific rules will override less specific rules. Hence, in the above example, all sub-elements of "transaction" would have a "sensitive" level (highest privacy) except for the "transaction total" which would be accessible by the analytics system.
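The specificity rule can be sketched as a longest-prefix match over the rule keys. The code below is a minimal illustration under several assumptions that the specification does not settle: rule keys are space-separated paths (as in "transaction total"), specificity is the number of path segments, a later rule wins a tie (as in CSS), and "default" matches every path with the lowest specificity. The names parseRuleValue and resolveRule are hypothetical.

```javascript
var mapping = [
  { "transaction" : "sensitive" },
  { "transaction total" : "private @analytics" },
  { "user" : "identifiable" },
  { "user segment" : "public" },
  { "default" : "public" }
];

// Hypothetical helper: splits a rule value such as "private @analytics"
// into privacy types and @-prefixed accessor exceptions.
function parseRuleValue(value) {
  var parsed = { types : [], accessors : [] };
  var tokens = value.split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === "") { continue; }
    if (tokens[i].charAt(0) === "@") {
      parsed.accessors.push(tokens[i].slice(1));
    } else {
      parsed.types.push(tokens[i]);
    }
  }
  return parsed;
}

// Hypothetical helper: returns the parsed value of the most specific rule
// whose key is a prefix of the requested path, or null if nothing matches.
function resolveRule(mapping, path) {
  var wanted = path.split(" ");
  var best = null;
  var bestLength = -1;
  for (var i = 0; i < mapping.length; i++) {
    for (var key in mapping[i]) {
      if (!Object.prototype.hasOwnProperty.call(mapping[i], key)) { continue; }
      var segments = (key === "default") ? [] : key.split(" ");
      var isPrefix = segments.length <= wanted.length;
      for (var j = 0; isPrefix && j < segments.length; j++) {
        if (segments[j] !== wanted[j]) { isPrefix = false; }
      }
      // ">=" lets a later rule of equal specificity win (assumption).
      if (isPrefix && segments.length >= bestLength) {
        best = mapping[i][key];
        bestLength = segments.length;
      }
    }
  }
  return best === null ? null : parseRuleValue(best);
}

var totalRule = resolveRule(mapping, "transaction total"); // the two-segment rule wins
var otherRule = resolveRule(mapping, "transaction id");    // falls back to "transaction"
var pageRule = resolveRule(mapping, "page");               // falls back to "default"
```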

Access Control and Privacy Implementation

There are two ways in which this privacy specification can be implemented:

  1. Voluntary "self-policing" of systems. Here a system (an accessor) needs to check the digitalData.privacy structure and determine whether it is appropriate to access a piece of information. In other words, reading data is a two-step process: (1) check the privacy settings to see if the data should be accessed and then (2) read the data if appropriate.
  2. Enforced access control. The privacy system is centralized. Here a central piece of logic or privacy delegate will inspect each data request and release only the digitalData items that are allowed. It will either refuse a request from an accessor that is deemed to be unauthorized or it will filter the results, pruning the nodes and sub-branches from the digitalData structure that are not allowed.

The W3C privacy specification does not stipulate which of these approaches will be used. It is recognized that different vendors and website designers/maintainers will have very different requirements. Some websites may entirely forgo the digitalData.privacy system, and for those websites the first choice would be preferable.

However, for those websites that have an actual need for a privacy system, it is strongly suggested that a central enforcement mechanism (the "privacy delegate") be used. Otherwise, the task of verifying that all website sub-systems (the various accessors) operate in a consistent manner will become extremely burdensome for the web site owner.

Regardless of the choice, this specification aims to be only as specific as necessary to create a standard that can be easily implemented by all vendors with a maximum degree of interoperability.

The Accessor Identifier

The Accessor Identifier is an optional (but recommended) parameter that tells the getter and setter methods which accessor is making a request. This W3C specification leaves all details of the accessor identifier to the implementation vendor and/or privacy delegate. Lenient systems may allow some sort of unique "system ID" to be passed as a string, or may even accept a plain accessor category name. More sensitive and secure systems may require that the Accessor Identifier be a JavaScript object with specific callable methods that implement a challenge-response algorithm.

By leaving this part of the specification open-ended, the best innovations are encouraged to develop. Any examples later in this document may use a simple accessor string 'for demonstration purposes only'.

Example

Tag Solutions Inc has developed a centralized tag and data management system that will be responsible for the creation and [most of] the population of the digitalData structure. AcmeAnalytics has a website tracking mechanism that wishes to obtain the Customer Experience Digital Data for this web page. The AcmeAnalytics system will use some sort of accessor identifier to identify itself. The site owner will have "registered" AcmeAnalytics with the Tag Solutions system so that it recognizes AcmeAnalytics by the accessor identifier "acme_analytics01" and internally it associates AcmeAnalytics as an "analytics" category of accessor.

AcmeAnalytics makes a request for the digitalData structure via the simple command:

var captureAllDigitalData = digitalData.get("acme_analytics01");

A copy of the entire digitalData structure will be returned, with only those nodes and branches that are accessible to an "analytics" accessor as defined by the digitalData.privacy structure.
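To make the flow concrete, here is a heavily simplified sketch of what the Tag Solutions side might do. Everything in it is an assumption for illustration: the registeredAccessors table, the branchTypes table (a stand-in for the full privacy mapping, with one privacy type per top-level branch), the sample data, and the decision to return an empty object to unregistered callers.

```javascript
// Hypothetical registry kept by the data management system:
// accessor identifier -> accessor category.
var registeredAccessors = { "acme_analytics01" : "analytics" };

var policy = { "system" : [ "private", "identifiable", "sensitive", "public" ],
               "analytics" : [ "identifiable", "public" ] };

// Simplified stand-in for the privacy mapping: one type per top-level branch.
var branchTypes = { "page" : "public", "cart" : "identifiable", "transaction" : "sensitive" };

var privateData = { page : { pageID : "home" },
                    cart : { items : 2 },
                    transaction : { total : 42.5 } };

var digitalData = {};
digitalData.get = function(accessorId) {
  var id = String(accessorId);
  var registered = Object.prototype.hasOwnProperty.call(registeredAccessors, id);
  var category = registered ? registeredAccessors[id] : null;
  var allowedTypes = (category && policy[category]) || [];
  var result = {};
  for (var branch in privateData) {
    if (!Object.prototype.hasOwnProperty.call(privateData, branch)) { continue; }
    var type = branchTypes[branch] || "public"; // no level defined: treat as public
    if (allowedTypes.indexOf(type) !== -1) {
      // Deep copy so the caller cannot modify the original data.
      result[branch] = JSON.parse(JSON.stringify(privateData[branch]));
    }
  }
  return result;
};

var captureAllDigitalData = digitalData.get("acme_analytics01");
// captureAllDigitalData holds "page" and "cart" but not "transaction"
```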

Data Node

This string defines the portion of the digitalData object that is being set or requested. The exact syntax is TBD and needs to be reconciled with the Dynamic Data subcommittee.

Example

In the previous example, maybe the AcmeAnalytics system is only interested in the digitalData.page branch. The Data Node would then be requested using a command such as:

var capturePageLevelData = digitalData.get("acme_analytics01", "page");
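Since the data node syntax is still to be decided, the following is only a sketch under the assumption of a dot-separated path such as "page.category". The helper selectNode and the sample data are hypothetical, and privacy filtering is left out to keep the example short.

```javascript
var privateData = { page : { pageID : "home", category : { primary : "landing" } },
                    user : { segment : "returning" } };

// Hypothetical helper: walks a dot-separated path (assumed syntax) and
// returns the node found there, or undefined when the path does not exist.
function selectNode(data, dataNode) {
  if (!dataNode) { return data; }
  var parts = dataNode.split(".");
  var current = data;
  for (var i = 0; i < parts.length; i++) {
    var isObject = current !== null && typeof current === "object";
    if (!isObject || !Object.prototype.hasOwnProperty.call(current, parts[i])) {
      return undefined;
    }
    current = current[parts[i]];
  }
  return current;
}

var digitalData = {};
digitalData.get = function(accessorId, dataNode) {
  // Privacy filtering omitted in this sketch; only node selection is shown.
  var node = selectNode(privateData, dataNode);
  return node === undefined ? undefined : JSON.parse(JSON.stringify(node));
};

var capturePageLevelData = digitalData.get("acme_analytics01", "page");
var captureCategory = digitalData.get("acme_analytics01", "page.category");
```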

Setter Method

Some analytics systems or 3rd party segmentation data vendors might actually want to add information to the digitalData structure. This would be done by a similar digitalData.set(data, <optional> accessor id, <optional> data node) method. There are three advantages to mandating this approach:

  1. It provides an enforceable and centralized security and privacy mechanism
  2. It is compatible with the work being done by the Dynamic Data sub-committee
  3. It allows easy addition of mechanisms such as data change event notifications or logging

Example

AcmeAnalytics manages information about display banner ads and some online search and on-site search systems. It is responsible for populating (or in some cases overriding) the search and referrer data in the digitalData structure. It will set that information via:

var originationData = { "page" : { "searchterm" : "health food drinks",
                                   "referringURL" : "healthnutdirectory.com" } };
digitalData.set(originationData, "acme_analytics01");

or if we use the optional data node parameter, this could also be done via:

var originationData = { "searchterm" : "health food drinks",  "referringURL" : "healthnutdirectory.com" };
digitalData.set(originationData, "acme_analytics01", "page");
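A setter that supports both call forms might look roughly like the sketch below. It assumes a dot-separated data node (the syntax is still TBD), merges plain objects recursively, and overwrites everything else. The helper mergeInto is a hypothetical name, and any authorization check on the accessor is omitted.

```javascript
var privateData = { page : { pageID : "home" } };

// Hypothetical helper: merges source into target. Plain objects are merged
// recursively; arrays and primitive values overwrite what was there.
function mergeInto(target, source) {
  for (var key in source) {
    if (!Object.prototype.hasOwnProperty.call(source, key)) { continue; }
    var value = source[key];
    var isPlainObject = value !== null && typeof value === "object" && !Array.isArray(value);
    if (isPlainObject) {
      var existing = target[key];
      if (existing === null || typeof existing !== "object" || Array.isArray(existing)) {
        target[key] = {};
      }
      mergeInto(target[key], value);
    } else {
      target[key] = value;
    }
  }
  return target;
}

var digitalData = {};
digitalData.set = function(data, accessorId, dataNode) {
  // Authorization of accessorId is omitted in this sketch.
  var target = privateData;
  if (dataNode) {
    var parts = dataNode.split("."); // assumed data node syntax
    for (var i = 0; i < parts.length; i++) {
      if (target[parts[i]] === null || typeof target[parts[i]] !== "object") {
        target[parts[i]] = {};
      }
      target = target[parts[i]];
    }
  }
  // Copy the incoming data first so the caller keeps no reference into the store.
  mergeInto(target, JSON.parse(JSON.stringify(data)));
};

// Both call forms from the examples above end up writing into the "page" branch.
digitalData.set({ "page" : { "searchterm" : "health food drinks" } }, "acme_analytics01");
digitalData.set({ "referringURL" : "healthnutdirectory.com" }, "acme_analytics01", "page");
```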

W3C Specification Option: No Data Node

It may be much simpler to omit the data node from the digitalData.get() method. All accessors would simply retrieve a full copy of the entire digitalData structure and would use or ignore elements as needed. There are two reasons for including the data node:

  1. It would be more compatible with the Dynamic Data Subgroup's work
  2. It might be easier to specify exactly which numbered object (e.g. user[3] or cart.items[3]) if data needs to be added to a member of an array.

Discussion Topic: Advanced Security for Accessor Identifiers

This specification does not give any details about how an Accessor Identifier should work, as this is best left to a data management system's implementation. In fact, the Accessor Identifier does not necessarily have to be a string!

In simple "trusted" situations, an accessor can pass a simple unique identifier such as the "acme_analytics01" ID used in the previous examples. An even more lenient system may simply allow the accessor to pass its accessor type, i.e.

digitalData.get("analytics"); // system just self-reports to be an analytics system.

On the other extreme, a challenge-response approach could be used. For example, the Accessor Identifier may be mandated to be an object that has a "challenge(key)" method that takes any key value and must return the appropriate response.

Example

Our newly secured Tag Solutions system registers AcmeAnalytics and specifies that it will pass some integer into the challenge(number) method, and AcmeAnalytics must perform a bitwise XOR operation with a special secret value (e.g. 0x3452) and return the result. AcmeAnalytics now accesses the digitalData structure via:

var accessorId = { toString : function() { return "acme_analytics01"; },
                   challenge : function(key) { return key ^ 0x3452; } };
var analyticsData = digitalData.get(accessorId);

This entire challenge system is simply provided as an example in order to demonstrate that innovative privacy delegates can be designed in a way that provides any level of desired robustness. By keeping this part of the W3C specification open, the most innovative ideas may be encouraged.
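For completeness, the verifying side of this exchange could be sketched as follows. The registry layout and the function name verifyAccessor are assumptions, and the XOR scheme is kept only because it mirrors the example; it is an illustration of the handshake and not a recommendation for real protection.

```javascript
// Hypothetical registry held by the privacy delegate:
// identifier -> accessor category and shared secret.
var registry = { "acme_analytics01" : { category : "analytics", secret : 0x3452 } };

// Hypothetical helper: returns the accessor category when the challenge is
// answered correctly, or null otherwise.
function verifyAccessor(accessorId) {
  var id = String(accessorId);
  if (!Object.prototype.hasOwnProperty.call(registry, id)) { return null; }
  if (typeof accessorId.challenge !== "function") { return null; }
  var entry = registry[id];
  var key = Math.floor(Math.random() * 0x10000); // fresh challenge value per request
  var expected = key ^ entry.secret;
  return accessorId.challenge(key) === expected ? entry.category : null;
}

var accessorId = { toString : function() { return "acme_analytics01"; },
                   challenge : function(key) { return key ^ 0x3452; } };
var verifiedCategory = verifyAccessor(accessorId);
```

A get implementation would call a check of this kind first and only then apply the policy for the returned category.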

Access and Implementation Recommendations

The W3C Customer Experience Digital Data Privacy Specification consists of everything above this point of the page and nothing more. This section discusses some recommendations for access and implementation that, if followed, should lead to a consistent and secure system.

Accessing Data

The entire W3C Customer Experience Digital Data Privacy Specification is optional. Some customer data management systems will opt to exclude any privacy or security mechanisms and, for the sake of simplicity, will exclude any digitalData.privacy components and will expose the digitalData structure as a nested JavaScript object as described in the core spec.

Vendors that access and modify the digitalData structure are strongly encouraged to first check for digitalData.get and digitalData.set methods and use them if they are present. E.g.

var capturePageId = (digitalData.get ? digitalData.get(myAccessorId) : digitalData).page.pageID;

Providing Default Accessors

In order to encourage the use of the getter and setter accessors, it is strongly recommended that the data provider provide basic "stub" implementations, even if no actual privacy subsystem is going to be used. There are two examples of how this could be done. The simplest example would be:

digitalData.get = function() { return this; }

Returning Copies of Digital Data

It is a best practice to return a copy or "clone" of the digitalData structure. Returning actual elements of the data structure is discouraged because these "pointers" allow the accessor to potentially change the original data structure.

Hiding the Original Data

If accessors are given getters and setters to access the Customer Experience Digital Data, the actual data should be protected. In other words, a direct reference such as digitalData.page should evaluate to undefined. It is a common misconception that the JavaScript language cannot support "private" data. In fact, it is relatively easy to hide data in a closure so that no other JavaScript code can reach it.

var digitalData = { };
(function(){
  var privateData = { page : { pageID : "uniqueID", pageName : "thePageName" } };
  digitalData.get = function(caller) {
    return JSON.parse(JSON.stringify(privateData)); // uses JSON library to make a deep copy.
  };
})();

This example stores data in a "privateData" object that is completely inaccessible except through the digitalData.get method, and even then only a deep copy of the data is returned.

Using Inversion of Control (IoC) and Delegate patterns to centralize privacy and security

If vendors and implementors adhere to a Delegate Pattern to centralize privacy and security, it will enable a modular design process that should simplify development and enable the creation of 3rd party privacy implementation and enforcement "modules" that vendors could use interchangeably. (This could/should include an Open Source reference implementation.)

The idea is simple: the privacy delegate will be a function that is passed two parameters: (1) the accessor type of the system requesting Customer Experience Digital Data and (2) the complete clone of the original digitalData that needs to be "pruned" of sensitive and unauthorized components. The delegate will handle all logic necessary to prune the unauthorized components, leaving the data that is safe for use by the requesting accessor.

The privacy delegate can then be "attached" to the digitalData system and potentially replaced with a different system at a later date.
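The attach-and-replace idea can be sketched in a few lines. The method name setPrivacyDelegate, the sample data, and the example delegate are all hypothetical. For brevity the accessor passes its category directly, which corresponds to the lenient mode discussed earlier.

```javascript
var digitalData = {};
(function() {
  var privateData = { page : { pageID : "home" },
                      user : { email : "visitor@example.com" } };

  // Default delegate: no pruning at all.
  var privacyDelegate = function(accessorType, clone) { return clone; };

  // Hypothetical attachment point; a stricter system could remove this
  // method after the first call so the delegate cannot be swapped again.
  digitalData.setPrivacyDelegate = function(delegate) {
    privacyDelegate = delegate;
  };

  digitalData.get = function(accessorType) {
    var clone = JSON.parse(JSON.stringify(privateData));
    return privacyDelegate(accessorType, clone);
  };
})();

// Example delegate: everything except the "system" accessor loses the user branch.
function simplePrivacyDelegate(accessorType, clone) {
  if (accessorType !== "system") {
    delete clone.user;
  }
  return clone;
}

digitalData.setPrivacyDelegate(simplePrivacyDelegate);
var analyticsView = digitalData.get("analytics");
var systemView = digitalData.get("system");
```

Because the delegate only ever receives a clone, a faulty or replaced delegate cannot damage the original data.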

In a similar and related manner, a "security delegate" could be incorporated to determine if the stated accessor is allowed to access the data at all. Although security and privacy delegates might seem to have slightly overlapping functions, the idea of a security delegate may be more appropriate for the digitalData.set operation, as a means of protecting an untrusted system from tampering with the digitalData.

A complete example of the privacy and security delegates in action can be seen on this web page. Visitors are encouraged to look at the source code of the accessor.js JavaScript file to see how the mechanisms work. The example is not only secure, but it is even tamper-proof, in that the digitalData system deletes the original references to the delegates after they are attached. A web page that defines the delegates immediately before the digitalData object creates a completely closed system.