
Bug 12569 - "Resource" Package Support
Summary: "Resource" Package Support
Status: RESOLVED NEEDSINFO
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: All
OS: All
Importance: P2 enhancement
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Duplicates: 15287
Depends on:
Blocks:
 
Reported: 2011-04-29 04:53 UTC by Wojciech Hlibowicki
Modified: 2013-03-05 00:44 UTC
CC: 10 users

See Also:


Description Wojciech Hlibowicki 2011-04-29 04:53:30 UTC
With the overhead involved in each server request to load resources such as images, videos, and so on, most webmasters have begun doing everything possible to minimize the number of requests needed for a given page. Common techniques include merging images into a single sprite and using CSS to map each region, merging JavaScript files into one file, merging CSS files, and delaying the loading of images until they are scrolled into view, among other strategies.

What I am suggesting (I am not sure whether it is already included or has been suggested before, though I did spend some time looking) is a new kind of "link" in the <head> that lets a developer specify a compressed tarball to be used as a resource package for the page, along with a URL mask that maps resources on the page to the tarball. By default, the mask would match all URLs that start with the URL of the resource tarball.

For instance, with a resource tarball located at "http://domain.ext/res/main.tgz", an image referenced as '<img src="http://domain.ext/res/main.tgz/header.png" />' would match by default; the browser would find 'header.png' inside main.tgz and use that.
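
To make the idea concrete, here is a rough sketch of the default matching rule in Python (the function name and behaviour are purely illustrative; nothing like this exists in any spec yet):

  # Illustrative only: a resource URL matches the package when it starts with
  # the package URL, and the remainder of the path names the member inside
  # the tarball.
  def package_member(package_url, resource_url):
      """Return the member path inside the package, or None if unmatched."""
      prefix = package_url.rstrip("/") + "/"
      if resource_url.startswith(prefix):
          return resource_url[len(prefix):]   # e.g. "header.png"
      return None

  # package_member("http://domain.ext/res/main.tgz",
  #                "http://domain.ext/res/main.tgz/header.png") -> "header.png"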

This would greatly decrease the number of requests to the server, since all JavaScript, CSS, and images could be merged into one request. Also, the URL mask approach is very backwards compatible: a server can handle the URL "http://domain.ext/res/main.tgz/header.png" and deliver that individual file to browsers that do not understand the package.
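
For that fallback case, a minimal sketch of what a server-side handler might do with such a URL (paths and names are assumptions for illustration, not a proposed API):

  import tarfile

  # Illustrative only: when a non-supporting browser requests
  # http://domain.ext/res/main.tgz/header.png directly, serve just that
  # member out of the archive stored on disk.
  def serve_member(archive_path, member_name):
      with tarfile.open(archive_path, mode="r:gz") as tar:
          fileobj = tar.extractfile(member_name)
          if fileobj is None:
              raise FileNotFoundError(member_name)
          return fileobj.read()

  # serve_member("/var/www/res/main.tgz", "header.png")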

Another optimization, which would be in the hands of browsers and webmasters, is the ability to load resources from the tarball as it downloads, before the whole archive has arrived. Webmasters could then make sure the CSS file is at the start of the tarball, followed by the required JavaScript files, then the images in the order they are used.
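
As a rough illustration of why member order matters, here is a client-side sketch that reads a gzipped tar strictly in arrival order (the URL is the example above, and the print call stands in for real CSS/JS/image handling):

  import tarfile
  import urllib.request

  # Illustrative only: stream mode ("r|gz") reads the archive sequentially and
  # never seeks, so each member becomes usable as soon as its bytes arrive.
  with urllib.request.urlopen("http://domain.ext/res/main.tgz") as resp:
      with tarfile.open(fileobj=resp, mode="r|gz") as tar:
          for member in tar:
              if not member.isfile():
                  continue
              data = tar.extractfile(member).read()
              print(member.name, len(data))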

I would be very grateful for a feature like this, and I believe it starts with adding it to the specification, since starting with an implementation can reflect poorly on the first implementer. Lastly, if this is the wrong place to suggest this, please direct me to where I can submit it.
Comment 1 Henri Sivonen 2011-04-29 12:05:16 UTC
This has been suggested before:
http://people.mozilla.com/~jlebar/respkg/
Comment 2 Kyle Simpson 2011-04-29 12:36:14 UTC
Here's some more info on previous proposals/discussions about this type of idea:

http://ajaxian.com/archives/resource-packages-making-a-faster-web-via-packaging

http://www.stevesouders.com/blog/2009/11/18/fewer-requests-through-resource-packages/

And here's some criticisms I wrote about the idea, back in the fall of 2009:

http://blog.getify.com/2009/11/resource-packaging-silver-bullet/
Comment 3 Wojciech Hlibowicki 2011-04-29 21:49:06 UTC
(In reply to comment #2)
> Here's some more info on previous proposals/discussions about this type of
> idea:
> 
> http://ajaxian.com/archives/resource-packages-making-a-faster-web-via-packaging
> 
> http://www.stevesouders.com/blog/2009/11/18/fewer-requests-through-resource-packages/
> 
> And here's some criticisms I wrote about the idea, back in the fall of 2009:
> 
> http://blog.getify.com/2009/11/resource-packaging-silver-bullet/

I completely understand your criticism, but I believe it is too pessimistic. Why should everyone be held back by what a few might do badly? I have personally employed most of the optimization techniques you mention, including some you have not, like serving assets from a domain with no cookies set on it to reduce request header size, and I feel I could go much further with a feature like this. Also, the problem of waiting for megabytes to download is easily solved by processing the package as it downloads, though that does not help if the main CSS file is placed at the end of the package.

My other idea to solve the same problem was to add support to the HTTP protocol for sending multiple files in response to the request for the main page, so that a page can automatically start the download of its resources; the browser could send a sort of package "etag" with the request, which the server would check to see whether any of the files have changed since then.

And the last one was to allow multiple simultaneous requests in the HTTP protocol, so that browsers do not have to send new headers with every request and the server can send the requested files back serially.
Comment 4 Kyle Simpson 2011-04-29 22:27:11 UTC
(In reply to comment #3)

> Also, the problem with waiting for MBs
> to download is easily solved by processing the package as it downloads, though
> that does not help if the main CSS file is placed at the end of the package. 

This doesn't at all address the fact that the ZIP file has to be transferred serially, byte-by-byte, meaning the files are in fact NOT downloaded in parallel, which creates a situation where the page will load MUCH slower than it normally would, by disabling all parallel downloading.

As for your idea of having the files download in parallel somehow... the whole point of having a single file is to prevent multiple files from being delivered. I don't think you can have it both ways.

Now, if you're suggesting that some sort of "manifest" file could be delivered to the browser to tell it what all the files in the ZIP are, sure... but how will that help at all, if the browser still has to wait for each file to show up as part of the single ZIP file stream?

What we'd REALLY need for an idea like this to not hamper performance is for the browser (and server) to do some sort of parallel bit downloading of a single file, similar to bit-torrent kind of thing, where a single giant ZIP file could be delivered in simultaneous/parallel chunks, bit-by-bit. If you wanna argue for THAT, sure, go ahead. That'd be awesome. But it's a LONG way from being possible, as all web servers and all web browsers would have to support that. If either a webserver or a browser didn't, and the HTML served up suggested that single manifest.ZIP file, then this view of the site would be excruciatingly slow because all resources would default to the serial loading, worse than like IE3.0 kinda days.

Moreover, separate cacheability is a concern. If I have a big manifest.ZIP file with all my resources in it, and I change one tiny file in that manifest, then doesn't the entire ZIP file change (its file signature/size/timestamp certainly do)? So the browser has to re-download the entire ZIP file. Lots and lots of wasted download of stuff that didn't change.

Unless you're really suggesting that browser and server need to be able to negotiate on a single pipeline streaming connection, wherein all resources can be piped through the one connection. But then we're back to serial loading, which is *much* slower (even though you save the overhead of the additional HTTP request/response).

All in all, I think this idea is flawed from the start. I think it has no chance of actually improving performance in the real world. There are just too many fundamental problems with the paradigm of packaging files up together: it loses the huge performance benefit of files being loaded separately in parallel and cached separately. Any paradigm where you lose those two things is just not going to work.

NOW, if we're merely talking about saving the TCP overhead of establishing a new connection for each file (while files are still requested individually, and in parallel), then that IS something valuable. And it already exists and is in widespread use. It's called "Keep-Alive".
Comment 5 Wojciech Hlibowicki 2011-04-30 06:12:09 UTC
(In reply to comment #4)
> This doesn't at all address the fact that the ZIP file has to be transferred
> serially, byte-by-byte, meaning the files are in fact NOT downloaded in
> parallel, which creates a situation where the page will load MUCH slower than
> it normally would, by disabling all parallel downloading.

Parallel downloading is more of a workaround than a solution. Ideally it would be better for both the end user and the servers to minimize connections and achieve a higher overall throughput; the problem is that every file requested carries the overhead of sending a request with headers, and when requests are made in parallel that sending delay becomes nearly negligible, because both directions of the connection are being used.

The other issue it addresses is that, on systems set up to divide network resources evenly per connection, it gives you a greater slice of the bandwidth on either side: more connections means a greater overall share for the application in question.

Since not every issue can be solved by other means, a smart webmaster can still divide the resources into multiple packages, with the highest-priority items added first to each package. This would also address some of the 'waste' you mentioned when a package is modified for one tiny image. To be honest, most people will figure out how to set up these 'packages' sensibly and won't be merging everything under the sun, especially if they anticipate changes.

> As for your idea of having the files download in parallel somehow... the whole
> point of having a single file is to prevent multiple files from being
> delivered. I don't think you can have it both ways.

That was an alternative suggestion. The main issue here is not multiple file downloads; it is the overhead of making the requests and the amount of data sent for each file requested from the server. That amount is not much on its own, but on a home connection that is already in use it can add up to enough that, taken together, it creates a delay in loading all the resources even when they are fetched in parallel.


> Now, if you're suggesting that some sort of "manifest" file could be delivered
> to the browser to tell it what all the files in the ZIP are, sure... but how
> will that help at all, if the browser still has to wait for each file to show
> up as part of the single ZIP file stream?
> 
> What we'd REALLY need for an idea like this to not hamper performance is for
> the browser (and server) to do some sort of parallel bit downloading of a
> single file, similar to bit-torrent kind of thing, where a single giant ZIP
> file could be delivered in simultaneous/parallel chunks, bit-by-bit. If you
> wanna argue for THAT, sure, go ahead. That'd be awesome. But it's a LONG way
> from being possible, as all web servers and all web browsers would have to
> support that. If either a webserver or a browser didn't, and the HTML served up
> suggested that single manifest.ZIP file, then this view of the site would be
> excruciatingly slow because all resources would default to the serial loading,
> worse than like IE3.0 kinda days.

I would not argue for that, as it would be more complex for browsers to implement and would bring more problems to iron out, along with the fact that a heavily used website would quickly become bogged down as you increase the number of connections. It would be more cost efficient to minimize the number of connections.

> Moreover, separate cacheability is a concern. If I have a big manifest.ZIP
> file with all my resources in it, and I change one tiny file in that manifest,
> then doesn't the entire ZIP file change (its file signature/size/timestamp
> certainly do)? So the browser has to re-download the entire ZIP file. Lots
> and lots of wasted download of stuff that didn't change.

Well, you can always include version numbers or hashes within the ZIP file and have the browser download the difference between its version and the server's; perhaps you could build on the technology behind BitTorrent and download only the diff of the two ZIP files. Or just trust that a webmaster will be competent enough not to build one big manifest with everything under the sun in it. Ideally, the packages would be split up: one for resources used across the whole site (main JavaScript libraries, layout-specific images, and global CSS files), then, depending on how many resources an individual page requires, either load the rest the way you would now or package them into another package or two.


The other ideas are not meant to remove parallel downloading, but to optimize it further by allowing the browser to request multiple resources in series over one connection, and then do the same across multiple parallel connections. The aim is mainly to reduce the amount of data spent on headers by sharing them; and before you point out that headers can change from request to request, you could include the ability to mark which elements can be transferred this way, stating which can be grouped and which might change cookies or might change because data changes during a load.

Perhaps another, simpler solution would be to have a way to mark certain elements to state that minimal headers can be sent when requesting them, such as no referrer/cookies/user agent/etc.

> All in all, I think this idea is flawed from the start. I think it has no
> chance of actually improving performance in the real world. There are just too
> many fundamental problems with the paradigm of packaging files up together: it
> loses the huge performance benefit of files being loaded separately in parallel
> and cached separately. Any paradigm where you lose those two things is just not
> going to work.

The main problem I want to solve here is the amount of data sent with each request and the round-trip time required to make a request. Since most internet connections typically have an upstream speed of about 1/10th of the download speed, we can theorize that the typical 300-1000 bytes required to make a request translate, in ideal conditions, to 3 kB-10 kB of data we could have been downloading instead. Even in less ideal conditions you can typically download more than you can send in the same time period, so any savings in the amount sent means a big boost in the amount you can receive. This is why most people have been merging JavaScript and CSS files and creating sprite images to reduce the overall number of requests.
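
As a back-of-the-envelope illustration of that trade-off (every number below is an assumption, not a measurement):

  # Illustrative only: upstream cost of request headers on an asymmetric link
  # where download bandwidth is roughly 10x the upload bandwidth.
  requests_per_page = 50      # assumed number of resources on a page
  bytes_per_request = 700     # assumed average size of one request's headers
  asymmetry = 10              # assumed download/upload bandwidth ratio

  upstream_bytes = requests_per_page * bytes_per_request   # 35,000 bytes sent
  download_equivalent = upstream_bytes * asymmetry         # ~350,000 bytes' worth of download time
  print(upstream_bytes, download_equivalent)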



> NOW, if we're merely talking about saving the TCP overhead of establishing a
> new connection for each file (while files are still requested individually,
> and in parallel), then that IS something valuable. And it already exists and
> is in widespread use. It's called "Keep-Alive".

I am not sure whether you are being sarcastic, but I will give you the benefit of the doubt and assume you are being genuine. My main concern is not connection overhead; it is the overhead involved in making a request to the server.

To be honest, you can point out flaws in every idea and system and can always attribute them to some theoretical incompetent person out there; such people will always exist, but should we really stifle progress because of them? A competent webmaster would take a package system and use it to greatly increase the efficiency of a site, not to mention the savings from no longer having to sprite and merge images, along with the extra CSS rules and bytes avoided by packaging all the images into one package instead. Ideally a webmaster would take this idea and create 2-3 packages optimized for the overall loading speed of the site, which could easily double the speed of a page compared with the best and most extreme optimization techniques currently available.
Comment 6 Robin Berjon 2013-01-21 15:57:46 UTC
Mass move to "HTML WG"
Comment 7 Robin Berjon 2013-01-21 16:00:26 UTC
Mass move to "HTML WG"
Comment 8 Edward O'Connor 2013-02-28 23:30:05 UTC
*** Bug 15287 has been marked as a duplicate of this bug. ***
Comment 9 Travis Leithead [MSFT] 2013-03-05 00:44:44 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the Editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this document:


   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Additional Information Needed
Change Description: No spec change
Rationale: 

Based on the use cases outlined, it seems like a new transmission protocol such as HTTP 2.0 (http://en.wikipedia.org/wiki/HTTP_2.0) would fit the bill nicely, without the complications of a new link handling scheme.

You are welcome to consider producing an extension spec based on any use cases you believe are not satisfied by a proposal such as HTTP 2.0. The process for creating an extension spec may be found here:

http://www.w3.org/html/wg/wiki/ExtensionHowTo