tracking-ISSUE-146: Well-known URIs and maintainability for large sites [Tracking Preference Expression (DNT)]

http://www.w3.org/2011/tracking-protection/track/issues/146

Raised by: Matthias Schunter
On product: Tracking Preference Expression (DNT)

Ian Fette wrote:

The current proposal requires duplicating the entire website's namespace under /.well-known/dnt/. That is to say, if you request https://apis.google.com/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1 then I have to have a policy file under https://apis.google.com/.well-known/dnt/_/apps-static/_/js/gapi/googleapis_client,plusone/rt=j/ver=OjdQ3MbDCro.en./sv=1/am=!uchpBK-CNFmZrNLZSw/d=1

This is difficult for large sites for a number of reasons. 

1. Parts of the URL might be used as transient data, i.e. not representing an actual file but rather arguments to be passed to the server. This essentially means that I need to query whatever frontend service handled the original request, and the parameters specified as part of the URL may or may not still have meaning at that time.

2. The policy might depend on query parameters, which in the current draft are not sent. For example, both https://www.google.com/search?source=ig&hl=en&rlz=&q=microsoft&btnG=Google+Search and https://www.google.com/search?sugexp=chrome,mod=12&sourceid=chrome&ie=UTF-8&q=microsoft represent searches on Google for "microsoft", but they come from different sources (one came from iGoogle, the other from the Chrome omnibox) and therefore may have different logging policies. We may need the query parameters in this case to figure that out.

3. Creating this duplicate namespace means I've got additional mappings/rules for my load balancers / frontends. Depending on how much flexibility you have, this may be a small overhead or it may be quite large.

4. A URL that is used in both first and third party contexts has no way of knowing, under the current proposal, which of the two contexts it was used in. (Whether a site can know at all in any reliable manner if it is 1st/3rd party is, AFAIK, still an open issue in the current draft.)
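
To make the mapping concrete, here is a minimal Python sketch of how a client might derive the policy location under the current proposal. The function name and the decision to drop the query string are assumptions based only on the description above (the mirrored path and point 2), not on the draft text itself.

```python
from urllib.parse import urlsplit, urlunsplit

# Prefix named in the discussion above.
WELL_KNOWN_PREFIX = "/.well-known/dnt"


def mirrored_policy_url(request_url: str) -> str:
    """Hypothetical helper: mirror the request path under /.well-known/dnt.

    Assumption: the query string and fragment are dropped, since query
    parameters are described above as not being sent.
    """
    parts = urlsplit(request_url)
    mirrored_path = WELL_KNOWN_PREFIX + parts.path
    return urlunsplit((parts.scheme, parts.netloc, mirrored_path, "", ""))


# The two searches from point 2 come from different sources...
from_igoogle = "https://www.google.com/search?source=ig&hl=en&rlz=&q=microsoft&btnG=Google+Search"
from_omnibox = "https://www.google.com/search?sugexp=chrome,mod=12&sourceid=chrome&ie=UTF-8&q=microsoft"

# ...but both collapse to the same policy location once the query is gone.
print(mirrored_policy_url(from_igoogle))
print(mirrored_policy_url(from_omnibox))
```

Under these assumptions the server receives the same policy request for both searches and cannot tell them apart, which is the situation point 2 describes.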

What I had proposed in earlier discussions, and what I still maintain would be more workable for some large sites, is to instead have the request return (perhaps as an alternative to the current well-known location proposal) a "policy identifier". That is, the response could include something like 'Tk:3,maps' and then, if the client cared, it could fetch /.well-known/dnt/maps to get the policy identified by the token "maps". This avoids problems 1-4 listed above because, at the time of serving the request, I believe a site has better information about what policy applies to the request than when it is asked at a random later point in time at a different address.
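
A rough Python sketch of the client side of this proposal follows. The only syntax given above is the single example 'Tk:3,maps', so the parsing rule (a value, a comma, then a policy token) and the helper names are assumptions; the meaning of the leading "3" is not stated here and is treated as an opaque string.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit, urlunsplit


def parse_tk(value: str) -> Tuple[str, Optional[str]]:
    """Hypothetical parser for a Tk header value such as '3,maps'.

    Assumption: everything before the first comma is an opaque status
    value and everything after it is the policy identifier.
    """
    status, _, policy_id = value.partition(",")
    return status.strip(), (policy_id.strip() or None)


def policy_url(request_url: str, tk_value: str) -> Optional[str]:
    """Build the /.well-known/dnt/<token> URL on the responding origin.

    Returns None when the response carried no policy identifier.
    """
    _, policy_id = parse_tk(tk_value)
    if policy_id is None:
        return None
    parts = urlsplit(request_url)
    return urlunsplit(
        (parts.scheme, parts.netloc, "/.well-known/dnt/" + policy_id, "", "")
    )


# A client that cares about the policy fetches one short, stable URL,
# regardless of the path or query string of the original request.
print(policy_url("https://www.google.com/search?q=microsoft", "3,maps"))
```

In this sketch the policy location depends only on the token chosen by the server when it answered the request, so no mirrored namespace is needed.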

Received on Saturday, 12 May 2012 14:53:08 UTC