7918 – prefetching: allow site to deny

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	pre-LC1 HTML5 spec (editor: Ian Hickson)
Version:	unspecified
Hardware:	All All

Importance:	P3 enhancement
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:
Keywords:	NE

Depends on:
Blocks:

Reported:	2009-10-14 15:45 UTC by Nick Levinson
Modified:	2010-10-04 13:59 UTC
CC List:	6 users

See Also:

Attachments

Description Nick Levinson 2009-10-14 15:45:40 UTC

Prefetching should be deniable, because it demands bandwidth, especially when dynamic content makes it unlikely that the UA can accurately predict what will be fetched. Most site owners will happily keep speeds up but some will want to control costs.

Prefetching is good for visitors. If the specific predictions are highly accurate, and I imagine they would be on most sites, the bandwidth penalty is small and probably easily overcome by visitor satisfaction leading to stickiness, repeat visits, and a positive reputation for new visitors.

But the cost objection is right regarding sites where the arrangement of pages is less obvious, and especially where a page is dynamic, leading to more prefetch demands that may generate error pages or hits that don't correlate with visitors' actual interests, so that visitors don't even glance at much that has been prefetched at the site owner's expense. For example, if a visitor is reading a series of pages that have "Next Page ->" links and the link is to a script that generates a page based on how long the visitor stayed at the current page, the prefetch would probably be erroneous.

The absence of a link supporting prefetch does not bar prefetching (see section 4.3.1), so omitting the link is not sufficient protection.

Two solutions are proposed, one easier and the other more specific, and not mutually exclusive.

The easier solution would be a local file that could deny all prefetching sitewide. The file should be modeled on robots.txt, should be named use.txt, should be commentable like robots.txt, and should have one noncomment line, from which any leading whitespace may be stripped:

prefetch no

The UA, before prefetching anything at a domain, should be required to check for that file and its command.
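
As a rough illustration of the check described above, here is a minimal Python sketch of how a UA might read use.txt. The function name is hypothetical, and the details are assumptions not stated in the proposal: comments start with "#" as in robots.txt, the directive is matched case-insensitively, and a missing or silent file means prefetching is not denied.

```python
def prefetch_allowed(use_txt: str) -> bool:
    """Return False only if use.txt contains the directive 'prefetch no'."""
    for raw_line in use_txt.splitlines():
        # Assumption: '#' starts a comment, as in robots.txt.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        # Leading whitespace is already stripped; compare the two tokens.
        if line.lower().split() == ["prefetch", "no"]:
            return False
    # Assumption: a missing or silent file does not deny prefetching.
    return True
```

A UA would fetch the file once per domain and call this function before issuing any prefetch there.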

The reason for naming the file use.txt rather than prefetch.txt is to support growth of the file for other purposes that may be offered or prescribed in the future. If it is desired to add utility to use.txt before any other terms are semanticized or reserved, private use could be supported for any term beginning with "x-" or "data-" (your choice of which to adopt).

The more specific solution would be a per-page or per-element denial of prefetching by a link, a, or area element. An attribute could be rel="noprefetch" (i.e., don't prefetch the URL in the href), rev="noprefetch" (i.e., don't prefetch the page bearing the link element), or either noprefetch="true" (false being trivial) or prefetch="false" (true being trivial).
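
To make the per-element idea concrete, here is a minimal Python sketch, assuming the rel="noprefetch" form, of how a UA might collect the URLs it must not prefetch from a page. The class and function names are hypothetical; only the standard-library html.parser module is used.

```python
from html.parser import HTMLParser


class NoPrefetchCollector(HTMLParser):
    """Collect href values of link, a, and area elements marked rel="noprefetch"."""

    def __init__(self):
        super().__init__()
        self.denied = set()

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a", "area"):
            return
        attr = dict(attrs)
        # rel is a space-separated token list; match case-insensitively.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "noprefetch" in rel_tokens and attr.get("href"):
            self.denied.add(attr["href"])


def denied_urls(html: str) -> set:
    """Return the set of hrefs that the page asks the UA not to prefetch."""
    collector = NoPrefetchCollector()
    collector.feed(html)
    collector.close()
    return collector.denied
```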

I'll likely add the rel to RelExtensions soon but the other solutions don't go there.

If what is to be prefetched is from another website, one website should not be permitted to bar prefetching from another website. The website from which the resource is to be prefetched would have the option of denying prefetching and the UA would have to recognize that bar. For example, if a user visits example.com and the UA, whether parsing example.com or not, sees fit to prefetch from example.org, only example.org could bar the prefetching from example.org.
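
A small Python sketch of that rule follows, with hypothetical names. The set of denying hosts stands in for whatever mechanism each site uses to state its policy; the point is that the referring site is deliberately not an input.

```python
from urllib.parse import urlsplit


def may_prefetch(target_url: str, denying_hosts: set) -> bool:
    """Consult only the policy of the host that would serve the resource."""
    host = urlsplit(target_url).hostname
    # The page that triggered the prefetch is not a parameter: a site
    # cannot bar prefetching from another site.
    return host not in denying_hosts
```

With this shape, a denial by example.com has no effect on a prefetch from example.org, while a denial by example.org blocks it.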

If a UA wants to prefetch two items for sequential use and the first is denied, the second would not be prefetched before the first is fetched. If the first is allowed and the second is denied, the UA would prefetch the first and then fetch, but not prefetch, the second.

If a UA wants to prefetch two items for joint or parallel use and one is denied and the other is allowed, I don't know whether there should be an HTML5-prescribed default whereby just one or neither is prefetched or the UA should decide whether to prefetch just one or neither.
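
The sequential rule can be sketched in Python as follows. The function name and the is_denied callback are hypothetical, and extending the rule from two items to a longer sequence is an assumption.

```python
def plan_sequence(urls, is_denied):
    """Return the leading run of URLs that may be prefetched, in order."""
    plan = []
    for url in urls:
        if is_denied(url):
            # This item must be fetched normally; nothing after it is
            # prefetched until that fetch has happened.
            break
        plan.append(url)
    return plan
```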

Thank you.

--
Nick

Comment 1 Nick Levinson 2009-10-16 03:51:18 UTC

Amending the above as to a and area elements: Instead of noprefetch="true" or prefetch="false", use rel="noprefetch" with rev being meaningless.

Comment 2 Ian 'Hixie' Hickson 2009-10-21 08:21:44 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: Before we add this kind of thing to the spec, we need to do research to find if people need it, we need experimental implementations or a commitment from UA implementors that they'll implement it, and we need to examine what the motivations will be for UAs to support this or ignore it (for example, why would a browser support this, when it would just make their user's experience slower?).

Comment 3 Nick Levinson 2009-10-23 08:18:14 UTC

I'm reopening.

Use case additions:

--- Host bandwidth demand may nearly double, as with Bugzilla, says Mozilla in "[w]e found that some existing sites utilize the &lt;link rel=&quot;next&quot;&gt; tag with URLs containing query strings to reference the next document in a series of documents. Bugzilla is an example of such a site that does this, and it turns out that the Bugzilla bug reports are not cachable, so prefetching these URLs would nearly double the load on poor Bugzilla! It's easy to imagine other sites being designed like Bugzilla . . . ." https://developer.mozilla.org/en/Link_prefetching_FAQ (characters replaced by entities by me). Google agrees on the concept in saying, "Your users probably have other websites open in different tabs or windows, so don't hog all of their bandwidth. A modest amount of prefetching will make your site feel fast and make your users happy; too much will bog down the network and make your users sad. Prefetching only works when the extra data is actually used, so don't use the bandwidth if it's likely to get wasted." <http://code.google.com/speed/articles/prefetching.html>. See also "[s]ince Fasterfox constantly requests new files, it can cause many servers to overload much faster" via the Skatter Tech link, below.

--- URLs with query strings may yield uncachable pages, making prefetching them useless. Mozilla link, above.

--- Caching is increased, which is demanding of hardware. This is inferred from a 2001 paper from University of Texas at Austin, <http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6TYP-444F9RV-1&_user=10&_rdoc=1&_fmt=&_orig=search&_sort=d&_docanchor=&view=c&_searchStrId=1060473355&_rerunOrigin=google&_acct=C000050221&_version=1&_urlVersion=0&_userid=10&md5=c95e0ad0d99e082a730536d8f5b1937c>. I'm unclear on what happens if a user's low-capacity computer tries to prefetch a large file it can't hold, especially when it already has important content in memory or on disk.

--- Visitors' bandwidth: Some visitors apparently use ISPs who charge users for bandwidth (see megaleecher.net link below), and erroneous prefetches cost them more for visiting. That's separate from site owners' bandwidth costs.

--- Slowing: If a user has several downloads working at once, prefetching adds an unannounced burden that can noticeably slow everything. See the Google link, above.

--- Benchmarking is available for a limited and possibly non-Web case. IEEE says, "[u]nfortunately, many LDS ["[l]inked data structure"] prefetching techniques 1) generate a large number of useless prefetches, thereby degrading performance and bandwidth efficiency, 2) require significant hardware or storage cost, or 3) when employed together with stream-based prefetchers, cause significant resource contention in the memory system." IEEE reports benchmarks for a proposed hardware-based alternative: "[e]valuations show that the proposed solution improves average performance by 22.5% while decreasing memory bandwidth consumption by 25% over a baseline system that employs an effective stream prefetcher on a set of memory- and pointer-intensive applications." <http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4798232>, abstract.

--- Staleness, if prefetching is too early, is cited in the Google article linked to above.

--- If the current page is still downloading, prefetching the next can slow the current page's download. "It's also important to be careful not to prefetch files too soon, or you can slow down the page the user is already looking at." (per the Google article linked to above).

--- When a site owner restricts the number of times a file may be downloaded, prefetching causes the threshold to be exceeded too soon (http://forums.mozillazine.org/viewtopic.php?f=8&t=372533&start=0). 

--- Security re HTTPS: Mozilla says, re Firefox, "https:// . . . URLs are never prefetched for security reasons[.]" Mozilla link, above.

--- Security has been criticized, re cookies from unexpected sites, but that can be solved by turning cookies off either generally or from third-party sites, so I don't know if that's critical. It's discussed in the Mozilla page linked to above and the MegaLeecher comment linked to below.

--- Security: retrieving from a dangerous site via a safe site might result in caching a page that has a dangerous script without the user knowing. Mozilla disagrees ("[l]imiting prefetching to only URLs from the same server would not offer any increased browser security", via the above link), but I'm unclear why.

Implementations:

--- Firefox already allows turning prefetching off (<http://www.megaleecher.net/Firefox_Prefetch_Trick>, the article and the sole comment; see also the Mozilla link above).

--- Gnome's Epiphany browser reportedly does the same (http://ubuntu-tutorials.com/2008/03/20/how-to-disable-prefetching-in-firefox-epiphany/).

--- IE7 reportedly does the same, with difficulty (see the MegaLeecher link, above).

--- Blocking at the site is implemented using robots.txt against Fasterfox, a Firefox extension, and the extension checks the robots file (http://skattertech.com/2006/02/how-to-block-fasterfox-requests/). (I prefer not using a robots-sensitive grammar, since that conflates different problems into the same solution.)

--- Advice to return a 404 error when an X-moz: prefetch header is found in the request is attributed to Google and is in <http://www.vision-seo.com/search-engine-optimization/google.html#prefetchblock>, but that's based on a Mozilla instruction, which may not be stable (see the Mozilla page linked to above).

Rationale:

--- More UAs would do this if it's in the HTML5 spec. HTML5 should include it to serve the use cases above.

--- As the vast majority of sites would not want prefetch denial, the spec including it would not alter page authoring or burden UA prefetching, especially since, presumably, not prefetching is easier for a UA than prefetching.

All URLs in this post were accessed today.

Comment 4 Nick Levinson 2009-10-23 08:32:24 UTC

Correction: I meant that the spec including prefetch denial would not alter most page authoring. The solutions proposed for those authors wanting them have low burdens.

Comment 5 Ian 'Hixie' Hickson 2009-10-23 08:32:47 UTC

This is good stuff, but we still need actual implementation experience *of the proposal*, before this kind of thing is added to the spec proper.


> More UAs would do this if it's in the HTML5 spec.

I doubt that that is the case. Implementors don't implement things because they're in specs, they implement them because they want to implement them.

Comment 6 Nick Levinson 2010-01-28 03:34:34 UTC

Google Chromium's developers seem to feel it's not a priority unless it's in HTML5, and that the discussion should be within HTML5, not at the implementer's end (<http://code.google.com/p/chromium/issues/detail?id=27111>, as accessed 1-22-10 & 1-27-10). My requests to some other browser makers are apparently still pending.

I'm still puzzled by the requirement that an implementer commit before it's in the spec. That seems to be confusing the Chromium people. By definition, if their priority is standards compliance, adding a feature outside of compliance takes time away from compliance development and writing the feature into the spec increases the likelihood of it being in a browser as part of general compliance.

Thanks.

Comment 7 Nick Levinson 2010-04-11 21:15:56 UTC

This will help website owners but not UA makers, in general. So, UA makers won't want to implement this before it's in an external standard.

Standards compliance is part of market satisfaction. To meet the needs of users who expect industry-normed performance, UA makers will tend to prioritize meeting standards and product differentiation (subject to patent control) and will deprioritize adding features that don't speed up their performance, even though they help website owners.

Robots.txt is unreliable because that standard is no longer being developed and does not officially support anything like this. Thus, another mechanism is needed.

That would be HTML5 and a proposed use.txt file.

Comment 8 Maciej Stachowiak 2010-04-28 21:45:38 UTC

Nick, it looks like you meant to reopen this bug, rather than escalate to the tracker, since you said "I'm reopening". Therefore I am reopening this bug for the editor to consider the newly provided use cases. If you actually meant to escalate and not reopen, please provide a suggested title and text for the issue to be raised.

Comment 9 Jonas Sicking (Not reading bugmail) 2010-04-28 22:12:08 UTC

For what it's worth, in Firefox prefetch implementation we add the http header:

X-Moz: prefetch

to all prefetch requests. This allows sites to block any and all prefetch requests. It can simply close the TCP connection upon detecting this header.

One downside with your approach is that it causes even more requests to go to servers, since you have to first request the use.txt file. It also slows down performance since the two requests have to be made one after another.

The header approach does not suffer these problems.
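
For illustration, a site could act on that header with something like the following minimal Python WSGI sketch. The middleware name is hypothetical. A WSGI application cannot readily close the TCP connection as described above, so this sketch instead answers with a 404, following the advice attributed to Google earlier in this bug; that choice is an assumption, not a recommendation.

```python
def deny_prefetch(app):
    """WSGI middleware: refuse requests carrying 'X-Moz: prefetch'."""

    def wrapper(environ, start_response):
        # WSGI exposes the X-Moz request header as HTTP_X_MOZ.
        if environ.get("HTTP_X_MOZ", "").strip().lower() == "prefetch":
            body = b"Prefetching is not permitted.\n"
            start_response("404 Not Found", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Ordinary requests pass through untouched.
        return app(environ, start_response)

    return wrapper
```

No extra request is needed: the decision is made on the prefetch request itself.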

Comment 10 Maciej Stachowiak 2010-04-28 22:23:57 UTC

(In reply to comment #9)
> For what it's worth, in Firefox prefetch implementation we add the http header:
> 
> X-Moz: prefetch
> 
> to all prefetch requests. This allows sites to block any and all prefetch
> requests. It can simply close the TCP connection upon detecting this header.
> 
> One downside with your approach is that it causes even more requests to go to
> servers, since you have to first request the use.txt file. It also slows down
> performance since the two requests have to be made one after another.
> 
> The header approach does not suffer these problems.

This header sounds like a good approach, though of course ideally we would want one with a name other than "X-Moz".

Comment 11 Jonas Sicking (Not reading bugmail) 2010-04-28 22:25:57 UTC

We'd definitely be fine with giving it another name. Though we'd possibly include the old header as well for a release or two in case someone is using it.

Comment 12 Nick Levinson 2010-05-12 16:19:06 UTC

I support reopening. Thanks.

Universalizing from X-Moz might be good.

One concern, but maybe I misunderstand something: If X-Moz results in the TCP connection being closed and Google says to send a 404 error for X-Moz, does that interfere with fetching when the time comes for a timely fetch of the same file that denied prefetch? And does sending a 404 error interfere with viewing the current page? I'm unclear why a 404 error is supposed to be helpful to the visitor for a prefetch denial if fetching will proceed. Prefetch denial could be stated in a status bar, if that's helpful and wouldn't interfere with current-page viewing or timely fetching.

I'm happy with any solution that works. If there's a better way than a use.txt file, great.

Thank you.

Comment 13 Ian 'Hixie' Hickson 2010-08-16 22:19:01 UTC


Status: Rejected
Change Description: no spec change
Rationale: Rejecting this proposal based on the implementation feedback above. I would be happy to spec a particular HTTP header to include if browser vendors can come to an agreement on what header would be appropriate to send here; please file a new bug requesting that and documenting the interest from browser vendors if that is a satisfactory direction for you.