Re: URLS/scoring

Would it be considered tracking if a particular cookie was scored high 
in the category, "visited one of these two particular URLs," because it 
could not possibly be reverse engineered to a single URL?

And if "visited one of these two particular URLs" is somehow considered 
tracking, what in any of the various draft spec texts leads us to that 
conclusion?

And if "visited one of these two particular URLs" is considered 
tracking, what about "visited one of these ten particular URLs"? Or 
"visited one of these 100 particular URLs"?

In other words, is there a k-anonymity floor operating here? If so, what 
is k?
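
To make the question concrete, here is a minimal sketch. It assumes the bucketing rule is "first letter of the host name", which is only my reading of the single-letter example quoted below and not something any draft specifies. The function names are hypothetical. For each bucket it counts how many distinct URLs could explain a hit, which is the k being asked about.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse


def bucket(url: str) -> str:
    """Hypothetical bucketing rule: first letter of the host, ignoring 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host[:1]


def aggregate_score(visited_urls):
    """Per-cookie score: how many visits fell into each bucket."""
    return Counter(bucket(u) for u in visited_urls)


def candidates_per_bucket(url_universe):
    """k for each bucket: how many distinct URLs could explain a hit in it."""
    groups = defaultdict(set)
    for u in url_universe:
        groups[bucket(u)].add(u)
    return {b: len(urls) for b, urls in groups.items()}


visits = [
    "http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane",
    "http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley",
]
print(aggregate_score(visits))        # Counter({'c': 2})
print(candidates_per_bucket(visits))  # {'c': 2}
```

If the universe of URLs were only these two, the "c" bucket would have k = 2, which is the "one of these two particular URLs" case above.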

On 7/10/2013 5:55 PM, Shane Wiley wrote:
>
> Fair point Jonathan – and something I had expected we’d be able to 
> provide more clarity around in non-normative text.  The center point 
> **text** is the definition of Tracking.  As long as the resulting 
> transformation to the ID or the URL was something that could not be 
> reverse engineered back to the original ID and/or URL, then I would 
> defend this as the information no longer resulting in tracking.
>
> For example, if a collected activity for cookie ID 1234 was obfuscated 
> to a single letter, then we’d have 26 possible buckets with no way of 
> linking a single aggregated result to an actual URL.
>
> Cookie ID 1234, 
> http://www.carmaker.com/2013/trucks/sportedition.html?username=Shane
>
> -becomes-
>
> Cookie ID 1234, “c”, 1
>
> Similarly…
>
> Cookie ID 1234, 
> http://www.candlesplus.com/aromacenter/vaniall.php?account_id=Wiley
>
> -becomes-
>
> Cookie ID 1234, “c”, 2
>
> While difficult to predefine in technical terms, as long as the 
> resulting “aggregate” doesn’t allow for reverse engineering back to 
> the actual event, then tracking is not occurring.
>
> ROT13 doesn’t work (character rotation of 13 places) as this can be 
> reverse engineered directly and wouldn’t be able to be contained 
> through administrative and operational controls. That’s why we’ve 
> recommended something more significant such as keyed/secret hash where 
> the key is further contained from access outside of automated routines 
> – aka, humans – as a more reasonable option (but there could be others 
> that meet the same goal).
>
> - Shane
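
For readers following the ROT13 versus keyed-hash distinction, a rough sketch using only the Python standard library. The key value and the helper name are hypothetical, and HMAC-SHA256 is just one possible "keyed/secret hash"; the thread does not name a specific construction.

```python
import codecs
import hashlib
import hmac

url = "http://www.carmaker.com/2013/trucks/sportedition.html"

# ROT13 is its own inverse: anyone can undo it without any secret.
scrambled = codecs.encode(url, "rot_13")
recovered = codecs.encode(scrambled, "rot_13")
print(recovered == url)  # True


def keyed_hash(value: str, key: bytes) -> str:
    """HMAC-SHA256 of value under key (assumed stand-in for a keyed hash)."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key; in the scenario described it would be held only by
# automated routines.
SECRET_KEY = b"hypothetical-key-held-by-automated-routines"
token = keyed_hash(url, SECRET_KEY)

# The output is deterministic: the same URL under the same key always yields
# the same token, so tokens remain comparable with each other.
print(keyed_hash(url, SECRET_KEY) == token)  # True
```

Note that the keyed hash is not invertible without the key, but anyone who holds the key and a list of candidate URLs can recompute tokens and match them, which is why the containment of the key matters.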
>
> *From:*Jonathan Mayer [mailto:jmayer@stanford.edu]
> *Sent:* Wednesday, July 10, 2013 11:55 PM
> *To:* Shane Wiley
> *Cc:* Lauren Gelman; Peter Swire; Justin Brookman; Rob van Eijk; Mike 
> O'Neill; public-tracking@w3.org
> *Subject:* Re: URLS/scoring
>
> Shane,
>
> Could you please identify the **text** that limits these exceptions 
> from "tracking"?  Once a URL is altered to something other than a 
> plaintext URL (e.g. applying ROT13), why is it still "tracking"?
>
> Thanks,
>
> Jonathan
>
> On Wednesday, July 10, 2013 at 3:34 PM, Shane Wiley wrote:
>
>     Lauren,
>
>     I’m not following your “translation from English to Spanish”
>     example, as the Aggregate Scoring approach would be more akin
>     to summarizing English into basic sounds, which could be
>     attributed to any number of words but in and of themselves do
>     not reveal the actual word the sound belongs to.
>
>     - Shane
>
>     *From:*Lauren Gelman [mailto:gelman@blurryedge.com]
>     *Sent:* Wednesday, July 10, 2013 7:47 PM
>     *To:* Peter Swire
>     *Cc:* Jonathan Mayer; Shane Wiley; Justin Brookman; Rob van Eijk;
>     Mike O'Neill; public-tracking@w3.org <mailto:public-tracking@w3.org>
>     *Subject:* Re: URLS/scoring
>
>     The change proposed to limit the definition of tracking to URLs is
>     extraordinary.
>
>     Business works this way anyway: URLs are translated into segments
>     and people are characterized using those. Segments and profiles,
>     not lists of URLs, are what get augmented and targeted.
>
>     I thought it was crazy a year ago when the compromise was made for
>     DNT:1 to permit collecting of information, in order to accommodate
>     (IMHO broad) permitted uses.  If collection is permitted in order
>     to allow the business to translate the URL into a segment, the
>     exception has indeed, finally, swallowed the rule.
>
>     Allowing aggregate scoring is just like translating English URLs
>     to Spanish and then saying the Spanish ones are out of scope.  It
>     ignores the fact that if you collect multiple data points about a
>     unique identifier, you can eventually determine its personal
>     characteristics.  That is not limited to URLs; it applies equally
>     to any translated characteristics.
>
>     Lauren Gelman
>
>     @laurengelman
>
>     BlurryEdge Strategies
>     415-627-8512
>
>     On Jul 10, 2013, at 11:14 AM, Peter Swire wrote:
>
>     Please correct me if I'm wrong.
>
>     My understanding is that "aggregate scoring" is not "tracking."
>
>     It therefore does not qualify either as "de-identified" or
>     "de-linked."  It is outside the scope of DNT under the DAA proposal.
>
>     Peter
>
>     Prof. Peter P. Swire
>
>     C. William O'Neill Professor of Law
>
>     Ohio State University
>
>     240.994.4142
>
>     www.peterswire.net <http://www.peterswire.net>
>
>     Beginning August 2013:
>
>     Nancy J. and Lawrence P. Huang Professor
>
>     Law and Ethics Program
>
>     Scheller College of Business
>
>     Georgia Institute of Technology
>
>     *From: *Jonathan Mayer <jmayer@stanford.edu
>     <mailto:jmayer@stanford.edu>>
>     *Date: *Wednesday, July 10, 2013 12:40 PM
>     *To: *Shane Wiley <wileys@yahoo-inc.com <mailto:wileys@yahoo-inc.com>>
>     *Cc: *Justin Brookman <jbrookman@cdt.org
>     <mailto:jbrookman@cdt.org>>, Rob van Eijk <rob@blaeu.com
>     <mailto:rob@blaeu.com>>, Mike O'Neill <michael.oneill@baycloud.com
>     <mailto:michael.oneill@baycloud.com>>, "public-tracking@w3.org
>     <mailto:public-tracking@w3.org>" <public-tracking@w3.org
>     <mailto:public-tracking@w3.org>>
>     *Subject: *Re: URLS/scoring
>     *Resent-From: *<public-tracking@w3.org
>     <mailto:public-tracking@w3.org>>
>     *Resent-Date: *Wednesday, July 10, 2013 12:40 PM
>
>     Shane,
>
>     Could you please explain where "Aggregate Scoring" would land in
>     the DAA proposal?  Is it "de-identified" data?  "Unlinked" data?
>
>     Thanks,
>
>     Jonathan
>
>     On Wednesday, July 10, 2013 at 9:11 AM, Shane Wiley wrote:
>
>         Justin,
>
>         It was my hope to add this as non-normative text as Aggregate
>         Scoring is one example of “not tracking” and we’ve been
>         focused on normative text at this point so that’s why it’s not
>         included.
>
>         - Shane
>
>         *From:*Justin Brookman [mailto:jbrookman@cdt.org]
>         *Sent:* Wednesday, July 10, 2013 4:40 PM
>         *To:* Rob van Eijk
>         *Cc:* Mike O'Neill; Shane Wiley; public-tracking@w3.org
>         <mailto:public-tracking@w3.org>
>         *Subject:* Re: URLS/scoring
>
>         I had heard the idea floated in Sunnyvale (and before) but it
>         was only presented as a possibility --- in any event, scoring
>         certainly ran counter to the previous requirements in the
>         compliance standard.  Mike Zaneis's comments last week were
>         the first time I thought I understood that the trade
>         associations were proposing that OBA/retargeting be allowed
>         when DNT is turned on.  And in any event, prior discussions
>         are not really relevant --- I'm just trying to figure out
>         concretely what is on the table as far as the DAA proposed DNT
>         standard.
>
>         Jack's proposed revision of the definition of tracking helped
>         me (I think) to understand what is being offered, but I was
>         just trying to flesh it out.  People keep referencing
>         "scoring," but that term is neither defined nor used in any of
>         the proposals.
>
>         On Jul 10, 2013, at 11:33 AM, Rob van Eijk <rob@blaeu.com
>         <mailto:rob@blaeu.com>> wrote:
>
>
>
>         Justin, currently aggregated scoring happens in parallel with
>         R-Y-G, and is not part of the proposal. In Santa Clara Shane
>         made it clear that all users, regardless of DNT will be
>         subject to aggregated scoring. Only an opt-out cookie MAY
>         prevent this collection, use and sharing.
>
>         Rob
>
>         Justin Brookman <jbrookman@cdt.org <mailto:jbrookman@cdt.org>>
>         wrote:
>
>         To be clear, I do not believe that the term "aggregate
>         scoring" appears either in the original DAA proposal or the
>         amendments that Jack sent around yesterday.  As I currently
>         think I understand the proposal, when DNT:1 is turned on, a
>         third party may not use/retain the specific url/domain for OBA
>         (or other non-permitted purposes), but they may use/retain any
>         derived information about the url.
>
>         So an ad network may not retain/use the fact that I visited
>         zappos.com/32145 <http://zappos.com/32145> for OBA (or other
>         non-permitted purposes) but they may retain/use/sell/do
>         anything with a characterization of my unique ID as
>         "interested in shopping," "interested in shoes," or
>         "interested in the Nike Pro Attack in blue and green."  The
>         unique ID could be a cookie, an email address, a name, or
>         anything else.
>
>         Justin Brookman
>         Director, Consumer Privacy
>         Center for Democracy & Technology
>         tel 202.407.8812
>         justin@cdt.org <mailto:justin@cdt.org>
>         http://www.cdt.org <http://www.cdt.org/>
>         @JustinBrookman
>         @CenDemTech
>
>         On Jul 10, 2013, at 11:15 AM, "Mike O'Neill"
>         <michael.oneill@baycloud.com
>         <mailto:michael.oneill@baycloud.com>> wrote:
>
>
>
>         [Keep ID, Remove URL = Aggregate Scoring] is a null
>
>         Because the individual is still profiled and their web
>         activity can continue to be appended to the profile
>
>         [Remove ID, Keep URL]  is a null
>
>         Because a) PII might be in URLs.
>
>         b) In reality ID has been replaced with an equivalent, though
>         different,  ID’ so web activity can continue to be appended.
>
>         *From:* Shane Wiley [mailto:wileys@yahoo-inc.com
>         <http://yahoo-inc.com/>]
>         *Sent:* 10 July 2013 15:42
>         *To:* Mike O'Neill
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Mike,
>
>         I support verifiability but am challenged with technical
>         mechanisms to allow this without breaking corporate
>         confidentiality concerns.  This is why I call it out as an
>         area for future development to help build solutions to this
>         unique problem.
>
>         I’ve tried breaking the proposal down to the simplest form I
>         can think of.  Let me know if this makes it more clear:
>
>         -----
>
>         If Tracking = ID + URLs, then Not Tracking = ID <> URL
>
>         Keep ID, Remove URL = Aggregate Scoring
>
>         Remove ID, Keep URL = De-Identification
>
>         Remove ID, Remove URL = De-Identification + De-Linking  (now
>         out of scope of DNT)
>
>         -----
>
>         - Shane
>
>         *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
>         *Sent:* Wednesday, July 10, 2013 3:10 PM
>         *To:* Shane Wiley
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Shane,
>
>         I have not missed key points, and know the DAA proposals mean
>         continued profiling, just think that needs to be made clear.
>         Perhaps you could give an example where applying a hash to a
>         UID would be useful.
>
>         There is not much difference between the retention of a
>         profile based on algorithmically examining a web history and
>         the actual web history itself. Both can be a basis for
>         discrimination.
>
>         My point about verifiability is that without it, with only
>         administrative and operational controls, there will
>         inevitably be demands for intrusive regulation, which will not
>         be good for industry. Verifiability is in fact quite easy to
>         ensure if tracking is constrained to cookies or even
>         localStorage, and that is all the more reason to rule out
>         tracking by other means such as fingerprinting.
>
>         Mike
>
>         *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
>         *Sent:* 10 July 2013 14:36
>         *To:* Mike O'Neill
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Mike,
>
>         Perhaps you’ve not been on the calls as I believe you’ve
>         missed a few of the key points of this discussion.  I won’t be
>         able to provide a full recount via email but I’ll try to hit
>         the high points for you:
>
>         1. It’s understood obfuscation comes with some risk and will
>         need to be bundled with operational and administrative
>         controls to reach a reasonable confidence that data will not
>         be reverse engineered.  For example, data in the yellow state
>         is not shared publicly and/or with parties that you don’t
>         feel could protect the security of its composition.  While
>         we’ve agreed on transparency in this area – no one has
>         requested external verifiability to date which I believe would
>         be somewhat impossible as a starting point. Perhaps something
>         to work on as a future goal (I believe the EFF would also be
>         interested in innovating techniques in this area – is that
>         fair Lee?).
>
>         2. Aggregate scoring will result in a profile.  The proposal
>         does not attempt to remove this concept but instead to ensure
>         the result doesn’t include a user’s historical cross-site
>         activity.  This should not be confused with de-identification
>         and instead is simply another method to meet the goal of “not
>         tracking”.
>
>         - Shane
>
>         *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
>         *Sent:* Wednesday, July 10, 2013 2:02 PM
>         *To:* Shane Wiley
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Shane,
>
>         As an example of why this “obfuscation” is pointless, let it be
>         a simple substitution cypher, so my UID (which happens to be
>         “123456”) is turned into “987654”. If I visit a website
>         containing a reference to adco.com <http://adco.com/> that
>         server recognises me because the UID contains “123456” and
>         builds up a profile about me. They apply the transform to the
>         UID and always get the unique value “987654”, which is stored
>         in the profiling dataset. When I visit other websites that
>         also contain references to adco.com <http://adco.com/> the same
>         process is repeated and my web activity is appended to the
>         dataset, again using “987654” as a key.
>
>         It makes no difference how complex  the UID transformation
>          is, as long as it is 1to1.
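
A small sketch of the 1to1 point, under stated assumptions: the transform below is an unkeyed SHA-256 used purely as a stand-in (effectively 1to1 for practical purposes), and the site names and function names are made up for illustration. Whatever the transform, every visit by the same UID lands under the same transformed key, so the profile keeps growing.

```python
import hashlib
from collections import defaultdict


def transform(uid: str) -> str:
    """Stand-in for any deterministic, effectively 1to1 UID transform."""
    return hashlib.sha256(uid.encode("utf-8")).hexdigest()


# Profiling dataset keyed by the transformed UID, never the original.
profiles = defaultdict(list)


def record_visit(uid: str, site: str) -> None:
    """Append activity under the transformed UID."""
    profiles[transform(uid)].append(site)


record_visit("123456", "news.example")
record_visit("123456", "shoes.example")
record_visit("654321", "news.example")

# Both visits by UID 123456 ended up in one profile.
print(profiles[transform("123456")])  # ['news.example', 'shoes.example']
```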
>
>         Under the “DAA proposal” rules there is absolutely no
>         diminution of adco’s ability to profile me.
>
>         If another party gets hold of the dataset they can also see my
>         profile, though not my original UID. If further records are
>         shared they can be connected  to me by this other party
>         because they have the same “987654” UID. They may not be able
>         to connect records containing “123456” to me (unless they can
>         crack the cypher or are given the key) but what would be the
>         point? If they have access to those data records they can
>         already profile me anyway.
>
>         If activity data in the dataset, collected with my consent,
>         contains other PII about me, such as my name, post code,
>         website history etc.  they should obfuscate that, perhaps
>         using one way hash functions or aggregated scoring algorithms.
>         Since these datasets are a valuable corporate asset you would
>         expect them to be doing that anyway, but in any case that is
>         legally required in the EU.
>
>         As the Snowden revelations have highlighted “operational and
>         administrative controls” need to be closely monitored. In the
>         case of security services this can be (has to be) through
>         impeccable judicial process under democratic oversight. This
>         would not be appropriate for commercial companies in a
>         competitive environment, so transparent technical procedures
>         are necessary.
>
>         The “yellow” state should be recognisable to users and others
>         through inspection of user agent data or web logs.
>
>         Mike
>
>         *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
>         *Sent:* 10 July 2013 12:14
>         *To:* Mike O'Neill
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Mike,
>
>         I respectfully disagree. Obfuscating the ID breaks the
>         association with the actual user/device. That said, I agree
>         this has the risk of being reversed so a blend of technical,
>         operational, and administrative controls must be brought to
>         bear to keep this from occurring.
>
>         De-identification doesn’t allow for profiling in a manner that
>         could affect a user’s experience (no way to get back to the
>         user).
>
>         Do Not Track can be achieved by breaking the link between a
>         unique ID and cross-site activity (URLs) – and this could
>         result in a profile of the user’s interest resulting from
>         aggregate scoring – but this would not allow a user’s
>         historical activity to be retrieved.
>
>         - Shane
>
>         *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
>         *Sent:* Wednesday, July 10, 2013 11:55 AM
>         *To:* Shane Wiley
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>
>         *Subject:* RE: issue-199
>
>         Hi Shane,
>
>         How can it be possible to remove the association between a
>         device and a UID other than deleting it or ensuring it is
>         deleted by the UA after a short duration? If the UID is there
>         (and present in every transport level request if it is in a
>         cookie) it uniquely points to the device where it is stored or
>         derived. This identity is available to the receiving server as
>         well as any actor with similar access to the data stream or
>         the same document origin.
>
>         If you transform the UID in retained data by setting it to
>         another UID (say by using a hash function), this does not
>         break the association because there is a 1to1 mapping. There
>         is no practical point in doing it.
>
>         De-identified data can only be classed as such if there is no
>         linkage. The “yellow” state can be imagined as an intermediate
>         stage before de-identification but is only relevant for
>         permitted uses (such as the detection of unique visitors for
>         analytics or frequency capping), and there is no need for it
>         to exist for more than a few hours.
>
>         If we end up defining de-identified as including the ability
>         to link individuals to a profile it would be a travesty, and
>         people will see through it. The arms race has already started
>         with an explosion of blunt cookie and script blockers. If
>         there is not a sensible response to people’s real privacy
>         concerns the usefulness of the web (and consequently the
>         profitability of many business models) will be severely
>         diminished.
>
>         Mike
>
>         *From:* Shane Wiley [mailto:wileys@yahoo-inc.com]
>         *Sent:* 09 July 2013 19:30
>         *To:* Mike O'Neill; 'achapell'; npdoty@w3.org
>         <mailto:npdoty@w3.org>; tlr@w3.org <mailto:tlr@w3.org>
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>;
>         jeff@democraticmedia.org <mailto:jeff@democraticmedia.org>
>         *Subject:* RE: issue-199
>
>         Mike,
>
>         Deidentification is about removing the association between a
>         unique ID (any source: cookie, digital fingerprint, etc.) and
>         the actual/specific user/device.  In this context:
>
>         Red: actual user/device
>
>         Yellow: not actual user/device but events are linkable (and
>         only usable for analytics/reporting)
>
>         Green: not actual user/device and events are not linkable
>         (outside the scope of DNT)
>
>         - Shane
>
>         *From:* Mike O'Neill [mailto:michael.oneill@baycloud.com]
>         *Sent:* Sunday, June 30, 2013 3:01 PM
>         *To:* 'achapell'; npdoty@w3.org <mailto:npdoty@w3.org>;
>         tlr@w3.org <mailto:tlr@w3.org>
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>;
>         jeff@democraticmedia.org <mailto:jeff@democraticmedia.org>
>         *Subject:* RE: issue-199
>
>         Alan,
>
>         Persistent identifiers and their duration should be discussed
>         as part of the red/yellow/green permitted use debate. Browser
>         fingerprinting identifiers are qualitatively different from
>         those stored in cookies or localStorage because they are
>         effectively infinite in duration, so I thought it best to
>         extend the defs. to make that clear.
>
>         Mike
>
>         *From:* achapell [mailto:achapell@chapellassociates.com]
>         *Sent:* 30 June 2013 22:39
>         *To:* michael.oneill@baycloud.com
>         <mailto:michael.oneill@baycloud.com>; npdoty@w3.org
>         <mailto:npdoty@w3.org>; tlr@w3.org <mailto:tlr@w3.org>
>         *Cc:* public-tracking@w3.org <mailto:public-tracking@w3.org>;
>         jeff@democraticmedia.org <mailto:jeff@democraticmedia.org>
>         *Subject:* RE: issue-199
>
>         Do we want to specify technologies here?
>
>         Cheers,
>
>         Alan Chapell
>         917 318 8440
>
>
>
>
>         -------- Original message --------
>         From: Mike O'Neill <michael.oneill@baycloud.com
>         <mailto:michael.oneill@baycloud.com>>
>         Date: 06/30/2013 3:33 PM (GMT-05:00)
>         To: Nicholas Doty <npdoty@w3.org
>         <mailto:npdoty@w3.org>>,tlr@w3.org <mailto:tlr@w3.org>
>         Cc: public-tracking@w3.org,jeff@democraticmedia.org
>         <mailto:public-tracking@w3.org,jeff@democraticmedia.org>
>         Subject: issue-199
>
>         Nick, Thomas
>
>         Dr Dix’s letter reminded me that we need to have some
>         reference to browser fingerprinting being ruled out when DNT
>         is set. I have amended the definitions accordingly.
>
>         Do you want me to modify the wiki?
>
>         A *persistent identifier* is an arbitrary value held in, or
>         derived from other data in, the user agent whose purpose is
>         to identify the user agent in subsequent transactions to a
>         particular web domain. It may be encoded for example as the
>         name or value attribute of an HTTP cookie, as an item in
>         localStorage or recorded in some way in the cache.
>
>         The *duration* of a persistent identifier is the maximum
>         period of time it will be retained in the user agent. This
>         could be implemented for example using the Expires or Max-Age
>         attributes of an HTTP cookie so that it is automatically
>         deleted by the user agent after the specified time period is
>         exceeded.
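
As an illustration of the duration definition, a minimal sketch using Python's standard http.cookies module. The cookie name "uid", the value and the three-hour duration are assumptions chosen for the example, not values from any draft.

```python
from http.cookies import SimpleCookie


def uid_cookie(uid: str, duration_seconds: int) -> str:
    """Build a Set-Cookie value whose Max-Age bounds the identifier's duration."""
    cookie = SimpleCookie()
    cookie["uid"] = uid                          # hypothetical cookie name
    cookie["uid"]["max-age"] = duration_seconds  # user agent deletes it afterwards
    cookie["uid"]["path"] = "/"
    return cookie["uid"].OutputString()


# Assumed duration of three hours, expressed in seconds.
header_value = uid_cookie("123456", 3 * 60 * 60)
print(header_value)
```

The point of the sketch is only that the duration is visible in the response header, so it can be inspected from the user agent side.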
>
>         *Browser fingerprinting* is a method of tracking based on
>         creating a persistent identifier from other information either
>         inherent in the content request or already stored in the user
>         agent. Such an identifier may not need itself to be stored in
>         the user-agent as it can be calculated again in subsequent
>         transactions. It follows from this that its duration is
>         effectively unlimited.
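
A deliberately simple sketch of why such an identifier needs no storage: it is recomputed from the request each time. The choice of headers, the use of SHA-256 and the function name are assumptions for illustration, and this shows only the weak form based on the initial content request.

```python
import hashlib

# Assumed set of request headers; real fingerprinting may use far more signals.
HEADER_NAMES = ("User-Agent", "Accept", "Accept-Language", "Accept-Encoding")


def fingerprint(request_headers: dict, ip_address: str) -> str:
    """Derive an identifier purely from information present in the request."""
    parts = [ip_address] + [
        f"{name.lower()}={request_headers.get(name, '')}" for name in HEADER_NAMES
    ]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


headers = {"User-Agent": "ExampleBrowser/1.0", "Accept-Language": "en-GB"}

# Nothing is stored in the user agent, yet the same value comes back on
# every later request with the same inputs, so there is nothing to expire.
first = fingerprint(headers, "192.0.2.10")
later = fingerprint(headers, "192.0.2.10")
print(first == later)  # True
```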
>
>         /Justification./
>
>         /With the duration definition, restrictions on permitted uses
>         could then be made that limit the duration of persistent
>         identifiers. Because browser fingerprinting cannot be
>         given a finite duration this tracking method should not be
>         used when DNT is set even if it is for a permitted use. In
>         reality browser fingerprinting solely based on examining
>         initial content requests is usually not an effective tracking
>         method because the combination of IP addresses and other
>         headers is not sufficiently user specific, but we should rule
>         out at least the more complex form when DNT is set./
>
>         Mike
>

Received on Thursday, 11 July 2013 09:47:31 UTC