Bug 19028 - Support a rel attribute that restricts cookie transmission
Summary: Support a rel attribute that restricts cookie transmission
Status: RESOLVED LATER
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: All All
Importance: P3 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
Reported: 2012-09-25 21:55 UTC by contributor
Modified: 2013-02-08 02:45 UTC

Description contributor 2012-09-25 21:55:59 UTC
This was cloned from bug 11235 as part of operation LATER convergence.
Originally filed: 2010-11-05 13:15:00 +0000
Original reporter: Alexander Romanovich <alex@sirensclef.com>

================================================================================
 #0   Alexander Romanovich                            2010-11-05 13:15:16 +0000 
--------------------------------------------------------------------------------
One of the objectives of a CDN involves using a cookie-less domain to cut down on the amount of data that needs to be transferred to the server to make a request. I'm in a situation where I cannot use a cookie-less subdomain because of a need to use ".foo.com" for cookies so that states persist across a non-fixed set of subdomains on the host. For this reason I have a number of subresources that transmit cookie data needlessly on any given page.

It would seem useful to allow a "rel" attribute on script/link/img tags that tells the browser to not send cookies with that subrequest. This would make it possible and very easy for developers to cut down on the amount of cookie data being sent, especially in situations when they lack the authority or have a specific obstacle to creating a cookie-less CDN to serve the resources over.
================================================================================
 #1   Kyle Simpson                                    2010-11-05 14:00:36 +0000 
--------------------------------------------------------------------------------
+1. I think this is a fantastic idea for improved web performance optimization.
================================================================================
 #2   Julian Reschke                                  2010-11-05 14:04:04 +0000 
--------------------------------------------------------------------------------
Potential overlap with "noreferrer"?
================================================================================
 #3   Alexander Romanovich                            2010-11-05 14:18:51 +0000 
--------------------------------------------------------------------------------
I thought of adding this behavior to an existing rel value, but then the nomenclature would be a little misleading (since cookies and referrers are not related).

That said, the general idea of bundling this behavior with other ones triggered by the same attribute would be fine, so long as they're all things that a developer would want together, and not individually.
================================================================================
 #4   Anne                                            2010-11-08 09:10:59 +0000 
--------------------------------------------------------------------------------
Maybe we should have something like "anonymous" (similar to AnonXMLHttpRequest) which kills credentials, Referer, and Origin.
================================================================================
 #5   Julian Reschke                                  2010-11-08 09:19:58 +0000 
--------------------------------------------------------------------------------
Indeed.

It would be great if it could *replace* noreferrer, but that ship probably has sailed.
================================================================================
 #6   Alexander Romanovich                            2010-11-08 14:20:47 +0000 
--------------------------------------------------------------------------------
A rel="anonymous" would probably fit the bill perfectly (restricting cookies, HTTP auth, SSL certs, referrer, and origin). (Though according to this source, of the 3 types of requests I originally mentioned, the Origin header should only be sent with script requests: https://wiki.mozilla.org/Security/Origin)

I'm in the CMS business, and I'm thinking here of all the content we generate (particularly image thumbnails for individual news stories, etc. which would not be appropriate to make into sprites). Our product typically drives pretty large web sites, and the ability to use this flag globally in page output would probably have a dramatic effect across the board. Removing credentials and extra headers from these requests is an improvement, and would become an asset for security as well.
================================================================================
 #7   Kyle Simpson                                    2010-11-10 13:53:25 +0000 
--------------------------------------------------------------------------------
I've definitely been in favor of this proposal, especially the suppressing of cookies.

I ran it by Billy Hoffman (http://zoompf.com) and he brought up a good point that we need to consider.

There are apparently some servers/applications that are intentionally configured to log out a user session if a request is received that has no cookies. Honestly, I'm not actually sure how that would work, because I'm not sure how the server knows which session to kill if there was no cookie to identify to the server who the request came from. But, nevertheless, apparently this is a reality out there.

So, the obvious point is that anyone who relied on such functionality in their application (for whatever reason, intentional or not) couldn't use rel="anonymous" to suppress cookies without logging users out.

On the surface, my reaction was to say that such strange setups would just be unable to use this rel feature.

But Billy pointed out that such things can be used in a DoS attack. For instance, evil.com can have an <img> tag on it that points to an image on bank.com, and uses rel="anonymous" to force the user to be logged out. Now, in my opinion, this type of DoS is rather benign, but I guess it's real nonetheless.

So, this is what I propose:

We restrict the behavior of rel=anonymous to only work (at least in terms of cookies) if the resource is on the same domain (exactly) as the page domain. It would be silently ignored for requests to resources on other domains.

This should be fine for CDN usage, because CDNs in general are not sending out cookies. Or, rather, the issue we're trying to solve is much more about all the global cookies that are set on a local domain (like analytics tracking cookies, etc.) that are unnecessarily bogging down static resource requests. So, the vast majority of those requests will be to the same page-domain, which would benefit from the rel=anonymous behavior being discussed.
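
[Editorial sketch] The same-domain restriction proposed above could be expressed roughly as follows. This is only an illustration of the rule as described in this comment, not anything a browser implements; the function name `should_suppress_cookies` is hypothetical, and the sketch assumes both URLs are absolute.

```python
from urllib.parse import urlsplit


def should_suppress_cookies(page_url: str, resource_url: str, rel: str) -> bool:
    """Hypothetical check: honour rel=anonymous only for exact same-host requests."""
    # The rel attribute is a space-separated token list; look for "anonymous".
    if "anonymous" not in rel.lower().split():
        return False
    page_host = urlsplit(page_url).hostname
    resource_host = urlsplit(resource_url).hostname
    # Exact host match only; for any other host the keyword is silently ignored.
    return page_host is not None and page_host == resource_host


# Same host: cookies would be withheld.
print(should_suppress_cookies("http://www.foo.com/news",
                              "http://www.foo.com/img/thumb.png", "anonymous"))
# Cross-domain (the evil.com -> bank.com case): keyword ignored, cookies sent as usual.
print(should_suppress_cookies("http://evil.com/",
                              "http://bank.com/logo.png", "anonymous"))
```

Note that under an exact-match rule a sibling subdomain (e.g. static.foo.com referenced from www.foo.com) would also be treated as a different domain.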

Thoughts?

--Kyle
================================================================================
 #8   Anne                                            2010-11-10 14:04:07 +0000 
--------------------------------------------------------------------------------
If that is a real problem that would be a problem with XMLHttpRequest as well. Could you raise that on public-webapps@w3.org?
================================================================================
 #9   Kyle Simpson                                    2010-11-10 14:38:57 +0000 
--------------------------------------------------------------------------------
I think the mitigation of XHR is that normal XHR only works same-domain, and even CORS requires the server to handle the pre-flight authorization before a real cross-domain request can come in, whereas <script>, <link>, <img> etc can all freely make cross-domain requests.

Nevertheless, I'll check out that list. Do I need to join that WG before I can post?
================================================================================
 #10  Anne                                            2010-11-10 14:47:24 +0000 
--------------------------------------------------------------------------------
No need to join. And you are wrong as to how cross-origin XMLHttpRequest operates. It makes the request directly for simple GET requests. And this already works in Firefox/Safari/Chrome. And also works in Internet Explorer when using XDomainRequest. So I sort of think that vulnerability is already inherent to the platform.
================================================================================
 #11  Ian 'Hixie' Hickson                             2010-12-29 08:37:27 +0000 
--------------------------------------------------------------------------------
I agree that the problem described is a real one: that images, scripts, and style sheets are often served from separate domains to avoid sending cookies and that doing so is hard in some cases such as that described in comment 0. However, rel="" can't fix this since neither images nor scripts have a rel="" attribute. It would have to be something like a "nocookie" attribute or some such.

The usual solution is to just use an entirely separate domain (e.g. yimg.com).

Is this really common enough to warrant new syntax features in HTML?
================================================================================
 #12  Julian Reschke                                  2010-12-29 08:42:05 +0000 
--------------------------------------------------------------------------------
(In reply to comment #11)
> I agree that the problem described is a real one: that images, scripts, and
> style sheets are often served from separate domains to avoid sending cookies
> and that doing so is hard in some cases such as that described in comment 0.
> However, rel="" can't fix this  since neither images nor scripts have a rel=""
> attribute. It would have to be something like a "nocookie" attribute or some
> such.
> ...

*If* we decided to add new attributes, we of course *could* add @rel to img and script.
================================================================================
 #13  Alexander Romanovich                            2010-12-29 17:29:32 +0000 
--------------------------------------------------------------------------------
I can't comment on how common usage of an attribute like this would be across the web in general, but I can briefly describe the effect I would anticipate from such a feature in the content management world, which is what prompted me to file this request.

I work on a large-scale CMS in education. Looking at a typical client, there are images (most frequently thumbnails accompanying news stories, event listings, galleries, etc.) that appear often in list format on a large number of pages throughout any given site. It is not a situation where yimg would be desirable, as there are several reasons why our custom image manipulation/deployment tools are used by our clients on their local servers as opposed to remote hosting, and other reasons why these particular clients may not wish to use third-party remote hosting to store their image media in general. With Google Analytics alone, which these sites very commonly use, you're looking at a large chunk of cookie data being transmitted to the server per image request. (The same is true for script and link tags which the CMS embeds in templates.) These are high-traffic sites, and the CMS often powers more than one site running on the same machine (I have one client that has chosen to drive six sites with the CMS on one server).

In other words: frequent usage of script, link, and especially image tags throughout a web site, multiplied by at least the large chunk of Google Analytics cookie data per unique request, multiplied by the volume of users of a high-traffic site, multiplied by the number of sites on that server, translates to a lot of time spent receiving data from the end user, which as you know is much slower than sending data to them. Since it would be trivial to modify our codebase in order to apply a new attribute to content we generate, it would be an equally trivial process to cut bandwidth usage and server response time for users of our CMS, across the board. I would imagine adoption by WordPress, etc. would result in a similar story, as well as sites which simply display a large quantity of resources. The benefit here is that it provides a painless option to reduce bandwidth usage and resource load latency when the alternatives (which have been noted here) are either impossible or incompatible, or simply inconvenient.
================================================================================
 #14  Ian 'Hixie' Hickson                             2011-01-23 20:30:55 +0000 
--------------------------------------------------------------------------------
It's not clear to me that HTML is the right place for a solution for this. For instance, whatever solution we used you'd still want to be able to control whether images or fonts in a style sheet had cookies or not, you'd still want to be able to control what happened to resources imported by scripts (including those without underlying DOM nodes, like Worker and EventSource objects), and you'd want it to cover a whole raft of features even just within HTML, like <video src>, <source>, <object>, <iframe>, etc.

My recommendation at this stage would be to approach browser vendors and encourage them to experiment with different ideas for addressing this, so that we gain implementation experience.
================================================================================
 #15  Alexander Romanovich                            2011-01-23 21:24:53 +0000 
--------------------------------------------------------------------------------
I guess you're right that an HTML attribute would be too limiting with regard to controlling this functionality in all aspects of web browsing. But where would such an approach, with the larger scope you're describing, be implemented exactly?

I assume you're not talking about browser settings, since this is something a web developer would want to control on a case-by-case basis (since some requests require cookie transmission to maintain logins, for example). I suppose you could send a header along with the main resource that instructs the browser how to behave with respect to different kinds of future subresource requests on the page, but that would get tricky if it was the sole source of instruction about so many different types of requests. Otherwise, I could imagine HTML being the right place for controlling this so long as there are equivalent solutions that apply to the additional cases you mentioned (i.e. a request in the generic sense knows how to be anonymous, but the HTML attribute is just one of several ways to switch that flag on).

I'd be happy to approach some browser vendors, once I'm clearer on where the possibilities lie for implementing this, if not just an HTML attribute.
================================================================================
 #16  Kyle Simpson                                    2011-01-23 23:49:32 +0000 
--------------------------------------------------------------------------------
I agree that more thought would need to go into all the possible resource requests which would benefit from this type of functionality.

But it still seems like having `rel` as an attribute for all those different containers, with a value that said "suppress cookies", would be sufficient. If any of those containers don't yet support the `rel` attribute, it wouldn't seem too onerous to extend rel to those containers.

Even if we're discussing sub-requests (like a script loads more scripts), those requests are always done via dynamically creating one of the containers in question, in which case setting the `rel` property should suffice, right?

I think the only other concern would be if XHR requests should support a way to suppress in the same way, and I think that it should. I'm not sure `rel` for XHR would make much sense, but perhaps something like "sendCookies" or whatever.
================================================================================
 #17  Ian 'Hixie' Hickson                             2011-02-16 08:58:15 +0000 
--------------------------------------------------------------------------------
I don't really know what a good solution would be, unfortunately. Maybe some sort of API or declarative solution where you can whitelist URL prefixes that don't get cookies and so on?

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: none yet
Rationale: For administrative purposes I'm going to mark this one "LATER" until we have more implementation experience.
================================================================================
Comment 1 Alexander Romanovich 2013-02-06 21:43:22 UTC
What are the chances that you'd consider supporting a response header that would prompt the browser to whitelist certain URL prefixes for restriction of cookie transmission?

i.e. something like:

No-Cookies: /path/to/dir1/*;/path/to/resource

This would be sent with an HTML document, and then the browser would not send cookies for anything it requests from /path/to/dir1 or for the specific resource /path/to/resource, while making sub requests from that web page.

The advantage of this (over a rel attribute) would be that it can be taken into account for lazy-loaded resources/XHR/etc., and might be easier to implement than setting a manifest in a separate file somewhere a la application cache.
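
[Editorial sketch] To make the proposed matching concrete, here is one way a browser might interpret such a header value. The `No-Cookies` header is only a proposal from this comment, not an existing standard, and the function names below are hypothetical. The sketch assumes entries are separated by ";", that a trailing "/*" means "everything under this directory", and that any other entry must match the request path exactly.

```python
from typing import List


def parse_no_cookies(header_value: str) -> List[str]:
    """Split the hypothetical No-Cookies value into its non-empty entries."""
    return [entry.strip() for entry in header_value.split(";") if entry.strip()]


def path_matches(entry: str, path: str) -> bool:
    """Match one entry against a request path."""
    if entry.endswith("/*"):
        # "/path/to/dir1/*" covers anything below "/path/to/dir1/".
        return path.startswith(entry[:-1])
    # Anything else is treated as a single, specific resource.
    return path == entry


def cookies_suppressed(header_value: str, request_path: str) -> bool:
    """True if the request path falls under any entry of the policy."""
    return any(path_matches(entry, request_path)
               for entry in parse_no_cookies(header_value))


policy = "/path/to/dir1/*;/path/to/resource"
print(cookies_suppressed(policy, "/path/to/dir1/logo.png"))  # under dir1
print(cookies_suppressed(policy, "/account/settings"))       # not listed
```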
Comment 2 Kyle Simpson 2013-02-06 23:54:07 UTC
I think it's a good addition, but I don't think it'd be fully sufficient, because part of the desire here is to let an author control what cookies might be being sent to third-party domains as a result of referencing resources on that third-party domain.

For instance, imagine a site allows a user to express a preference to not have any tracking done on their activities, even with third-party ads that site may use. A site could make sure any such resources would have `rel=nocookie` on them to restrict the cookies the browser would otherwise set.
Comment 3 Alexander Romanovich 2013-02-07 14:08:09 UTC
In that case you could just allow hosts in the whitelist as well:

No-Cookies: /path/to/dir1/*;/path/to/resource;www.domain.com

This option would not necessitate the developer setting a rel attribute on a DOM-inserted tag referencing a resource but, more importantly, that's not the only scenario we'd be dealing with for script-initiated requests. There are lots of XHR requests that would benefit from this.

The other reason I like Ian's whitelist/prefix idea is that deployment would be greatly simplified. My application doesn't need to have to worry about generating rel attributes for stylesheets, scripts, images. I simply send an extra header with my documents and forget about it. I could even do that site-wide with Apache's mod_headers, for instance.
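
[Editorial sketch] With host entries allowed, a policy check might look roughly like this. Everything here is an assumption made for illustration: the `No-Cookies` header is only proposed in this thread, the function name `is_suppressed` is hypothetical, entries beginning with "/" are treated as paths on the document's own host, and any other entry is treated as an exact host name.

```python
from urllib.parse import urlsplit


def is_suppressed(header_value: str, page_url: str, request_url: str) -> bool:
    """Hypothetical check of a request URL against a No-Cookies policy."""
    page = urlsplit(page_url)
    request = urlsplit(request_url)
    for entry in (e.strip() for e in header_value.split(";")):
        if not entry:
            continue
        if entry.startswith("/"):
            # Path entry: assumed to apply only to the document's own host.
            if request.hostname != page.hostname:
                continue
            if entry.endswith("/*"):
                if request.path.startswith(entry[:-1]):
                    return True
            elif request.path == entry:
                return True
        elif request.hostname == entry.lower():
            # Host entry: every request to that exact host goes without cookies.
            return True
    return False


policy = "/path/to/dir1/*;/path/to/resource;www.domain.com"
page = "http://www.foo.com/index.html"
print(is_suppressed(policy, page, "http://www.domain.com/ads/banner.js"))
print(is_suppressed(policy, page, "http://www.foo.com/account"))
```

Whether path entries should also apply to other hosts, and whether host entries should cover subdomains, are open questions that the proposal does not settle.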
Comment 4 Kyle Simpson 2013-02-07 15:06:58 UTC
I understand the reasoning for the header approach, and appreciate the spirit of it. I have a few concerns:

1. A primary motivator for this feature request was performance. Especially on mobile devices with severely limited (or metered) bandwidth, the cost of lots of cookies (often 600 bytes or more per request) is very undesirable.

If we specify that you send a list (perhaps a complicated list depending on your needs) of paths and domains to suppress cookies on, and you do that in response headers, then I think the default tendency for most people will be that they turn on this header for all responses, which shifts the performance problem from requests to responses, but doesn't alleviate it by much.

So, there'd have to be an easy way to make sure that the response header was only sent on the initial HTML page. There are certain facilities in Apache that could accomplish the task, such as per-MIME-type rules. But that doesn't account for Ajax requests for HTML pages/snippets, which would still send the headers.

I think we'd be creating a system that was, by default, not all that helpful, without more education on fine tuning the Apache mechanisms so that the headers are only sent sparingly. User education/evangelism is useful but it's a moving target.


2. If we send it as a header, and multiple resources DO include the header, what should the browser do if it receives different/conflicting answers? Last answer wins? First answer wins? Merge the answers progressively?

If we only accept this header on HTML page requests, and ignore it on CSS and JS and such, can we reasonably distinguish between an Ajax request for an HTML page and a full HTML page request?


-----------

Here's a possible compromise that I think might address some of those concerns, but still alleviate having to put the policy (via `rel`) on all containers:

What about saying the policy can be specified as a <meta> tag only, with the same format you were suggesting, that could only be included in the <head> of an HTML page?
Comment 5 Julian Reschke 2013-02-07 15:31:31 UTC
This: http://tools.ietf.org/html/draft-nottingham-http-browser-hints-04#section-5.10 might be of interest.
Comment 6 Alexander Romanovich 2013-02-08 02:45:21 UTC
@kyle I don't have the details with me at this time, but a while back I ran some numbers on this with a handful of web sites. On the homepage of one of these sites, approximately 700 bytes of cookie data was being sent with each of at least 24 individual requests that appeared to not need cookies sent at all. That's 16800 bytes total for just that one page.

In a case like this, it's a clear win to send the above suggested header (just once, for the root index in question) in exchange for saving that amount of data across all those requests. Remember also that it's typically more expensive to upstream data than downstream. Of course, this profile may or may not prove to be widely representative, and it is certainly on the developer to deploy the feature in a way that produces the optimal tradeoff, as you have noted.
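
[Editorial sketch] The tradeoff can be put in rough numbers. The 700 bytes and 24 requests come from the comment above; the header value is the example proposed earlier in this thread, and treating the cost as a single header line sent once with the root document is an assumption. The function name is hypothetical.

```python
COOKIE_BYTES_PER_REQUEST = 700   # figure quoted in the comment above
COOKIELESS_REQUESTS = 24         # requests that did not need cookies


def net_bytes_saved(header_value: str) -> int:
    """Upstream cookie bytes avoided minus the downstream cost of one header line."""
    # One extra response header line, including the trailing CRLF.
    header_line = "No-Cookies: " + header_value + "\r\n"
    downstream_cost = len(header_line.encode("ascii"))
    upstream_saved = COOKIE_BYTES_PER_REQUEST * COOKIELESS_REQUESTS
    return upstream_saved - downstream_cost


upstream_saved = COOKIE_BYTES_PER_REQUEST * COOKIELESS_REQUESTS
print(upstream_saved)  # 16800 bytes of cookie data per page view
print(net_bytes_saved("/path/to/dir1/*;/path/to/resource"))
```

A longer policy list raises the downstream cost, but it is paid once per document while the cookie bytes are paid on every matching request.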

I'm definitely suggesting this header be sent only for full HTML documents, not for subresources or for AJAX requests. Again, it would be up to the developer to determine the logic for sending these. I think HTML snippets delivered via AJAX are trending downward, with application/json, etc. (potentially containing HTML segments) serving better to distinguish these types of requests. That said, I think your concerns are well warranted.

I believe it has been previously suggested that another possibility here would be to include some form of manifest embedded/attached to the HTML document, as opposed to a header. There is an approach to this that is quite tied to application cache, and doesn't seem suitably abstract to use for other purposes, but the precedent is there. I would be just as happy with an implementation like this, provided it is well thought out.

I know that you also have an interest in WebKit bug 30862, and I can't help but wonder if some form of manifest/instruction such as we're discussing here would be suited to solving that issue as well.

@julian The browser hints draft looks interesting. Will need more time to digest it thoroughly.