RE: URLS/scoring

Shane,

 

Many thanks for answering our questions, but to help me understand I need to
ask a few more.

 

1a) Under the DAA proposal can a first party pass on UID+URL to a
third-party who then creates an aggregate score and adds it to a dataset?
This would, I assume, be "not tracking" and therefore the third party could
have collected the data themselves irrespective of DNT. Could this be done
using a frame element so the third party could also add its own UID to the
dataset?

 

1b) Would the data collected above then be out of scope so the third party
could deliver OBA (retargeting) to the user when they visited another
website, again irrespective of DNT?

 

2) In EU terms could a service-provider, referenced in the first party
compliance paragraph, be a Data Controller, either on their own or jointly
with the first party?

 

3) Could the retention of a cryptographic hash of a subsequently discarded
URL also be "not tracking"?

 

Mike

 

 

From: Shane Wiley [mailto:wileys@yahoo-inc.com] 
Sent: 11 July 2013 13:04
To: Jules Polonetsky; Paul Ohm
Cc: Jonathan Mayer; public-tracking@w3.org
Subject: RE: URLS/scoring

 

Jules,

 

Retargeting is a function of a Service Provider - not expressly cross-site
activity as in behavioral advertising - in this context.  So the data is
collected and used only on the behalf of the 1st party.

 

- Shane

 

From: Jules Polonetsky [mailto:julespol@futureofprivacy.org] 
Sent: Thursday, July 11, 2013 12:54 PM
To: Paul Ohm
Cc: Shane Wiley; Jonathan Mayer; public-tracking@w3.org
Subject: Re: URLS/scoring

 

So retargeting is restricted here, since it often indicates a visit to one
URL or a limited # of designated URLs?

(Apologies if I missed that amidst the flurry)


Jules Polonetsky 

Facebook.com/FutureofPrivacy

@JulesPolonetsky

 


On Jul 11, 2013, at 5:48 AM, Paul Ohm <paul.ohm@colorado.edu> wrote:

Would it be considered tracking if a particular cookie was scored high in
the category, "visited one of these two particular URLs," because it could
not possibly be reverse engineered to a single URL? 

And if "visited one of these two particular URLs" is somehow considered
tracking, what in any of the various draft spec texts leads us to that
conclusion?

And if "visited one of these two particular URLs" is considered tracking,
what about "visited one of these ten particular URLs"? Or "visited one of
these 100 particular URLs"?

In other words, is there a k-anonymity floor operating here? If so, what is
k?

On 7/10/2013 5:55 PM, Shane Wiley wrote:

Fair point Jonathan - and something I had expected we'd be able to provide
more clarity around in non-normative text.  The center point **text** is the
definition of Tracking.  As long as the resulting transformation to the ID
or the URL was something that could not be reverse engineered back to the
original ID and/or URL, then I would defend this as the information no
longer resulting in tracking.

 

For example, if a collected activity for cookie ID 1234 was obfuscated to a
single letter, then we'd have 26 possible buckets with no way of linking a
single aggregated result to an actual URL.

 

Cookie ID 1234,
http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane

-becomes-

Cookie ID 1234, "c", 1

 

Similarly.

 

Cookie ID 1234,
http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley 

-becomes-

Cookie ID 1234, "c", 2

 

While difficult to predefine in technical terms, as long as the resulting
"aggregate" doesn't allow for reverse engineering back to the actual event,
then tracking is not occurring.
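
The bucketing example above can be sketched roughly as follows. This is only an illustration: the rule used to pick the letter (first letter of the host name, ignoring a leading "www.") is an assumption inferred from the two sample URLs, and the names bucket_for, record and scores are hypothetical.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def bucket_for(url):
    """Assumed rule: first letter of the host name, ignoring a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host[:1].lower()

# cookie ID -> count of events per single-letter bucket
scores = defaultdict(Counter)

def record(cookie_id, url):
    # Only the bucket counter is updated; the URL itself is not stored.
    scores[cookie_id][bucket_for(url)] += 1

record(1234, "http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane")
record(1234, "http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley")

print(dict(scores[1234]))
```

Both visits land in the same "c" bucket, so the retained record is a cookie ID, a letter and a count, with no stored URL.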

 

ROT13 (character rotation by 13 places) doesn't work, as it can be reverse
engineered directly and could not be contained through administrative and
operational controls.  That's why we've recommended something more
significant, such as a keyed/secret hash where the key is further contained
from access outside of automated routines (that is, from humans), as a more
reasonable option (but there could be others that meet the same goal).
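
A minimal sketch of the contrast between ROT13 and a keyed hash, using Python's standard library. HMAC-SHA256 is assumed here as one possible keyed hash; the proposal does not name an algorithm, and SECRET_KEY and keyed_hash are hypothetical names.

```python
import codecs
import hashlib
import hmac

url = "http://www.carmaker.com/2013/trucks/sportedition.html"

# ROT13 is its own inverse: anyone can recover the original URL.
rot13_url = codecs.encode(url, "rot13")
recovered = codecs.decode(rot13_url, "rot13")

# Hypothetical key; in the scheme described it would be held only by
# automated routines and never exposed to people.
SECRET_KEY = b"hypothetical-key-known-only-to-automated-routines"

def keyed_hash(value, key=SECRET_KEY):
    """Keyed one-way hash (HMAC-SHA256 assumed as the construction)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

digest = keyed_hash(url)
print(recovered == url)
print(digest)
```

The digest cannot be recomputed without the key, but the same input and key always give the same digest, so the operational controls around the key still matter.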

 

- Shane    

 

From: Jonathan Mayer [mailto:jmayer@stanford.edu] 
Sent: Wednesday, July 10, 2013 11:55 PM
To: Shane Wiley
Cc: Lauren Gelman; Peter Swire; Justin Brookman; Rob van Eijk; Mike O'Neill;
public-tracking@w3.org
Subject: Re: URLS/scoring

 

Shane, 

 

Could you please identify the **text** that limits these exceptions from
"tracking"?  Once a URL is altered to something other than a plaintext URL
(e.g. applying ROT13), why is it still "tracking"?

 

Thanks,

Jonathan

 

On Wednesday, July 10, 2013 at 3:34 PM, Shane Wiley wrote:

Lauren,

 

I'm not following your "translation from English to Spanish" example, as
the Aggregate Scoring approach would be more akin to summarizing English
into basic sounds, which could be attributed to any number of words but
in and of themselves do not reveal the actual word the sound belongs to.

 

- Shane

 

From: Lauren Gelman [mailto:gelman@blurryedge.com] 
Sent: Wednesday, July 10, 2013 7:47 PM
To: Peter Swire
Cc: Jonathan Mayer; Shane Wiley; Justin Brookman; Rob van Eijk; Mike
O'Neill; public-tracking@w3.org
Subject: Re: URLS/scoring

 

 

The change proposed to limit the definition of tracking to URLs is
extraordinary.

 

Business works this way anyway: URLs are translated into segments, and
people are characterized using those. Segments and profiles, not lists of
URLs, are what get augmented and targeted.

 

I thought it was crazy a year ago when the compromise was made for DNT:1 to
permit collecting of information, in order to accommodate (IMHO broad)
permitted uses.  If collection is permitted in order to allow the business
to translate the URL into a segment, the exception has indeed, finally,
swallowed the rule.  

 

Allowing aggregate scoring is just like translating English URLs to Spanish
and then saying the Spanish ones are out of scope.  It ignores the fact that
if you collect multiple data points about a unique identifier, you can
eventually determine its personal characteristics.  There's no reason that
is limited to URLs; it applies equally to any translated characteristics.

 

Lauren Gelman

@laurengelman

BlurryEdge Strategies
415-627-8512

 

On Jul 10, 2013, at 11:14 AM, Peter Swire wrote:

 

Please correct me if I'm wrong.

 

My understanding is that "aggregate scoring" is not "tracking."

 

It therefore does not qualify either as "de-identified" or "de-linked."  It
is outside the scope of DNT under the DAA proposal.

 

Peter

 

 

 

Prof. Peter P. Swire

C. William O'Neill Professor of Law

                Ohio State University

240.994.4142

www.peterswire.net

 

Beginning August 2013:

Nancy J. and Lawrence P. Huang Professor

Law and Ethics Program

Scheller College of Business

Georgia Institute of Technology

 

 

From: Jonathan Mayer <jmayer@stanford.edu>
Date: Wednesday, July 10, 2013 12:40 PM
To: Shane Wiley <wileys@yahoo-inc.com>
Cc: Justin Brookman <jbrookman@cdt.org>, Rob van Eijk <rob@blaeu.com>, Mike
O'Neill <michael.oneill@baycloud.com>, "public-tracking@w3.org"
<public-tracking@w3.org>
Subject: Re: URLS/scoring
Resent-From: <public-tracking@w3.org>
Resent-Date: Wednesday, July 10, 2013 12:40 PM

 

Shane, 

 

Could you please explain where "Aggregate Scoring" would land in the DAA
proposal?  Is it "de-identified" data?  "Unlinked" data?

 

Thanks,

Jonathan

 

On Wednesday, July 10, 2013 at 9:11 AM, Shane Wiley wrote:

Justin,

 

It was my hope to add this as non-normative text as Aggregate Scoring is one
example of "not tracking" and we've been focused on normative text at this
point so that's why it's not included.

 

- Shane

 

From: Justin Brookman [mailto:jbrookman@cdt.org] 
Sent: Wednesday, July 10, 2013 4:40 PM
To: Rob van Eijk
Cc: Mike O'Neill; Shane Wiley; public-tracking@w3.org
Subject: Re: URLS/scoring

 

I had heard the idea floated in Sunnyvale (and before) but it was only
presented as a possibility --- in any event, scoring certainly ran counter
to the previous requirements in the compliance standard.  Mike Zaneis's
comments last week were the first time I thought I understood that the trade
associations were proposing that OBA/retargeting be allowed when DNT is
turned on.  And in any event, prior discussions are not really relevant ---
I'm just trying to figure out concretely what is on the table as far as the
DAA proposed DNT standard.

 

Jack's proposed revision of the definition of tracking helped me (I think)
to understand what is being offered, but I was just trying to flesh it out.
People keep referencing "scoring," but that term is neither defined nor used
in any of the proposals.

 

On Jul 10, 2013, at 11:33 AM, Rob van Eijk <rob@blaeu.com> wrote:

 

Justin, currently aggregated scoring happens in parallel with R-Y-G, and is
not part of the proposal. In Santa Clara Shane made it clear that all users,
regardless of DNT will be subject to aggregated scoring. Only an opt-out
cookie MAY prevent this collection, use and sharing.

Rob

Justin Brookman <jbrookman@cdt.org> wrote:

To be clear, I do not believe that the term "aggregate scoring" appears
either in the original DAA proposal or the amendments that Jack sent around
yesterday.  As I currently think I understand the proposal, when DNT:1 is
turned on, a third party may not use/retain the specific url/domain for OBA
(or other non-permitted purposes), but they may use/retain any derived
information about the url.

 

So an ad network may not retain/use the fact that I visited zappos.com/32145
for OBA (or other non-permitted purposes) but they may retain/use/sell/do
anything with a characterization of my unique ID as "interested in
shopping," "interested in shoes," or "interested in the Nike Pro Attack in
blue and green."  The unique ID could be a cookie, an email address, a name,
or anything else.

 

Justin Brookman
Director, Consumer Privacy
Center for Democracy & Technology
tel 202.407.8812
justin@cdt.org
http://www.cdt.org
@JustinBrookman
@CenDemTech

 

On Jul 10, 2013, at 11:15 AM, "Mike O'Neill" <michael.oneill@baycloud.com>
wrote:

 

[Keep ID, Remove URL = Aggregate Scoring] is a null

 

Because the individual is still profiled and their web activity can continue
to be appended to the profile

 

 

 

[Remove ID, Keep URL]  is a null

 

Because a) PII might be in URLs.

 

b) In reality ID has been replaced with an equivalent, though different,
ID' so web activity can continue to be appended.

 

 

From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 10 July 2013 15:42
To: Mike O'Neill
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Mike,

 

 

I support verifiability but am challenged with technical mechanisms to allow
this without breaking corporate confidentiality concerns.  This is why I
call it out as an area for future development to help build solutions to
this unique problem.

 

 

I've tried breaking the proposal down to the simplest form I can think of.
Let me know if this makes it more clear:

 

 

-----

 

If Tracking = ID + URLs, then Not Tracking = ID <> URL

 

 

Keep ID, Remove URL = Aggregate Scoring

Remove ID, Keep URL = De-Identification

Remove ID, Remove URL = De-Identification + De-Linking (now out of scope of
DNT)

 

-----

 

 

- Shane

 

 

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 3:10 PM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Shane,

 

 

I have not missed key points, and I know the DAA proposals mean continued
profiling; I just think that needs to be made clear. Perhaps you could give
an example where applying a hash to a UID would be useful.

 

 

There is not much difference between the retention of a profile based on
algorithmically examining a web history and the actual web history itself.
Both can be a basis for discrimination.

 

 

My point about verifiability is that without it, with only administrative
and operational controls, there will inevitably be demands for intrusive
regulation, which will not be good for industry. Verifiability is in fact
quite easy to ensure if tracking is constrained to cookies or even
localStorage, and that is all the more reason to rule out tracking by other
means such as fingerprinting.

 

 

Mike

 

 

 

From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 10 July 2013 14:36
To: Mike O'Neill
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Mike,

 

 

Perhaps you've not been on the calls as I believe you've missed a few of the
key points of this discussion.  I won't be able to provide a full recount
via email but I'll try to hit the high points for you:

 

 

1.      It's understood obfuscation comes with some risk and will need to be
bundled with operational and administrative controls to reach a reasonable
confidence that data will not be reverse engineered.  For example, data in
the yellow state is not shared publicly and/or with parties you don't feel
could protect the security of its composition.  While we've agreed on
transparency in this area - no one has requested external verifiability to
date which I believe would be somewhat impossible as a starting point.
Perhaps something to work on as a future goal (I believe the EFF would also
be interested in innovating techniques in this area - is that fair Lee?).

 

2.      Aggregate scoring will result in a profile.  The proposal does not
attempt to remove this concept but instead to ensure the result doesn't
include a user's historical cross-site activity.  This should not be
confused with de-identification and instead is simply another method to meet
the goal of "not tracking".

 

 

- Shane

 

 

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 2:02 PM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Shane,

 

 

As an example of why this "obfuscation" is pointless, let it be a simple
substitution cypher, so my UID (which happens to be "123456") is turned into
"987654". If I visit a website containing a reference to adco.com, that
server recognises me because the UID contains "123456" and builds up a
profile about me. They apply the transform to the UID and always get the
unique value "987654", which is stored in the profiling dataset. When I
visit other websites that also contain references to adco.com the same
process is repeated and my web activity is appended to the dataset, again
using "987654" as a key.

 

 

It makes no difference how complex the UID transformation is, as long as
it is 1-to-1.
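
A small sketch of this point, assuming one particular substitution cypher that happens to map "123456" to "987654" (each digit d becomes (10 - d) mod 10). The names transform, observe and profiles, and the example site names, are hypothetical.

```python
from collections import defaultdict

def transform(uid):
    """Stand-in 1-to-1 transform: a digit substitution (assumed for illustration)."""
    return "".join(str((10 - int(d)) % 10) for d in uid)

# transformed UID -> list of observed sites (the retained profile)
profiles = defaultdict(list)

def observe(uid, site):
    # The raw UID is never stored, only its transformed value.
    profiles[transform(uid)].append(site)

observe("123456", "news.example")
observe("123456", "shoes.example")
observe("555555", "news.example")

print(transform("123456"))
print(profiles["987654"])
```

Every visit by the same UID lands under the same transformed key, so the profile keeps growing exactly as it would with the original UID.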

 

 

Under the "DAA proposal" rules there is absolutely no diminution of adco's
ability to profile me.

 

 

If another party gets hold of the dataset they can also see my profile,
though not my original UID. If further records are shared they can be
connected to me by this other party because they have the same "987654"
UID. They may not be able to connect records containing "123456" to me
(unless they can crack the cypher or are given the key) but what would be
the point? If they have access to those data records they can already
profile me anyway.

 

 

If activity data in the dataset, collected with my consent, contains other
PII about me, such as my name, post code, website history etc.  they should
obfuscate that, perhaps using one way hash functions or aggregated scoring
algorithms. Since these datasets are a valuable corporate asset you would
expect them to be doing that anyway, but in any case that is legally
required in the EU.

 

 

As the Snowden revelations have highlighted "operational and administrative
controls" need to be closely monitored. In the case of security services
this can be (has to be) through impeccable judicial process under democratic
oversight. This would not be appropriate for commercial companies in a
competitive environment, so transparent technical procedures are necessary.

 

 

The "yellow" state should be recognisable to users and others through
inspection of user agent data or web logs.

 

 

Mike

 

 

 

From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 10 July 2013 12:14
To: Mike O'Neill
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Mike,

 

 

I respectfully disagree.  Obfuscating the ID breaks the association with the
actual user/device.  That said, I agree this has the risk of being reversed
so a blend of technical, operational, and administrative controls must be
brought to bear to keep this from occurring.

 

 

De-identification doesn't allow for profiling in a manner that could affect
a user's experience (no way to get back to the user). 

 

 

Do Not Track can be achieved by breaking the link between a unique ID and
cross-site activity (URLs) - and this could result in a profile of the
user's interest resulting from aggregate scoring - but this would not allow
a user's historical activity to be retrieved.

 

 

- Shane

 

 

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Wednesday, July 10, 2013 11:55 AM
To: Shane Wiley
Cc: public-tracking@w3.org
Subject: RE: issue-199

 

 

Hi Shane,

 

 

How can it be possible to remove the association between a device and a UID
other than by deleting it or ensuring it is deleted by the UA after a short
duration? If the UID is there (and present in every transport level
request if it is in a cookie) it uniquely points to the device where it is
stored or derived. This identity is available to the receiving server as
well as any actor with similar access to the data stream or the same
document origin.

 

 

If you transform the UID in retained data by setting it to another UID (say
by using a hash function), this does not break the association because there
is a 1-to-1 mapping. There is no practical point in doing it.

 

 

De-identified data can only be classed as such if there is no linkage. The
"yellow" state can be imagined as an intermediate stage before
de-identification but is only relevant for permitted uses (such as the
detection of unique visitors for analytics or frequency capping), and there
is no need for it to exist for more than a few hours.

 

 

If we end up defining de-identified as including the ability to link
individuals to a profile it would be a travesty, and people will see through
it. The arms race has already started with an explosion of blunt cookie and
script blockers. If there is not a sensible response to people's real
privacy concerns the usefulness of the web (and consequently the
profitability of many business models) will be severely diminished.

 

 

Mike

 

 

 

From: Shane Wiley [mailto:wileys@yahoo-inc.com]
Sent: 09 July 2013 19:30
To: Mike O'Neill; 'achapell'; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199
Subject: RE: issue-199

 

 

Mike,

 

 

Deidentification is about removing the association between a unique ID (any
source:  cookie, digital fingerprint, etc.) and the actual/specific
user/device.  In this context:

 

 

Red:  actual user/device

 

Yellow:  not actual user/device but events are linkable (and only usable for
analytics/reporting)

 

Green:  not actual user/device and events are not linkable (outside the
scope of DNT)

 

 

- Shane

 

 

From: Mike O'Neill [mailto:michael.oneill@baycloud.com]
Sent: Sunday, June 30, 2013 3:01 PM
To: 'achapell'; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199

 

 

Alan,

 

 

Persistent identifiers and their duration should be discussed as part of the
red/yellow/green permitted use debate. Browser fingerprinting identifiers
are qualitatively different from those stored in cookies or localStorage
because they are effectively infinite in duration, so I thought it best to
extend the defs. to make that clear.

 

 

 

Mike

 

 

 

From: achapell [mailto:achapell@chapellassociates.com]
Sent: 30 June 2013 22:39
To: michael.oneill@baycloud.com; npdoty@w3.org; tlr@w3.org
Cc: public-tracking@w3.org; jeff@democraticmedia.org
Subject: RE: issue-199

 

 

Do we want to specify technologies here?  

 

 

 

Cheers,

Alan Chapell
917 318 8440

 




-------- Original message --------
From: Mike O'Neill <michael.oneill@baycloud.com>
Date: 06/30/2013 3:33 PM (GMT-05:00)
To: Nicholas Doty <npdoty@w3.org>, tlr@w3.org
Cc: public-tracking@w3.org, jeff@democraticmedia.org
Subject: issue-199

 

Nick, Thomas

 

Dr Dix's letter reminded me that we need to have some reference to browser
fingerprinting being ruled out when DNT is set. I have amended the
definitions accordingly.

 

Do you want me to modify the wiki?

 

 

 

A persistent identifier is an arbitrary value held in, or derived from other
data in, the user agent whose purpose is to identify the user agent in
subsequent transactions to a particular web domain. It may be encoded for
example as the name or value attribute of an HTTP cookie, as an item in
localStorage or recorded in some way in the cache.

 

The duration of a persistent identifier is the maximum period of time it
will be retained in the user agent. This could be implemented for example
using the Expires or Max-Age attributes of an HTTP cookie so that it is
automatically deleted by the user agent after the specified time period is
exceeded.
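
As a rough illustration of a bounded duration, the following sketch builds a cookie whose Max-Age attribute limits how long the user agent retains the identifier. The cookie name "uid", the value and the six-hour lifetime are assumptions chosen for the example, not values from any proposal.

```python
from http.cookies import SimpleCookie

SIX_HOURS_IN_SECONDS = 6 * 60 * 60  # hypothetical duration

cookie = SimpleCookie()
cookie["uid"] = "123456"                          # the persistent identifier
cookie["uid"]["max-age"] = SIX_HOURS_IN_SECONDS   # user agent deletes it after this
cookie["uid"]["path"] = "/"

# Value for a Set-Cookie response header.
header_value = cookie["uid"].OutputString()
print(header_value)
```

Because the lifetime is carried in the header itself, it can be inspected from the user agent side.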

 

Browser fingerprinting is a method of tracking based on creating a
persistent identifier from other information either inherent in the content
request or already stored in the user agent. Such an identifier may not need
itself to be stored in the user-agent as it can be calculated again in
subsequent transactions. It follows from this that its duration is
effectively unlimited.
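
A minimal sketch of why such an identifier needs no storage: it is recomputed from request data each time. The choice of inputs (IP address plus three common request headers), the truncated SHA-256 digest and the function name fingerprint are assumptions for illustration only.

```python
import hashlib

def fingerprint(ip, headers):
    """Derive an identifier from request data alone (inputs are assumed)."""
    material = "|".join([
        ip,
        headers.get("User-Agent", ""),
        headers.get("Accept-Language", ""),
        headers.get("Accept-Encoding", ""),
    ])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

request_headers = {
    "User-Agent": "ExampleBrowser/1.0",
    "Accept-Language": "en-GB,en;q=0.8",
    "Accept-Encoding": "gzip, deflate",
}

first_visit = fingerprint("192.0.2.10", request_headers)
later_visit = fingerprint("192.0.2.10", request_headers)
print(first_visit == later_visit)
```

Nothing is written to the user agent, so there is no expiry to set: the same inputs give the same identifier on every later request.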

 

Justification.

 

With the duration definition, restrictions on permitted uses could then be
made that limit the duration of persistent identifiers. Because browser
fingerprinting cannot be given a finite duration this tracking method
should not be used when DNT is set even if it is for a permitted use. In
reality browser fingerprinting solely based on examining initial content
requests is usually not an effective tracking method because the combination
of IP addresses and other headers is not sufficiently user specific, but we
should rule out at least the more complex form when DNT is set.

 

Mike

 

 

 

 

 

 

 

Received on Thursday, 11 July 2013 12:57:24 UTC