This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 23348 - The loaded property doesn't make sense when the body is compressed
Summary: The loaded property doesn't make sense when the body is compressed
Status: RESOLVED WONTFIX
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: XHR
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Anne
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-25 01:22 UTC by Marcelo Volmaro
Modified: 2013-09-26 15:55 UTC
CC List: 4 users

See Also:


Attachments

Description Marcelo Volmaro 2013-09-25 01:22:49 UTC
As far as I understand from the current draft, when the body (defined as entity_body) is sent in compressed form, the "loaded" property will return values much larger than the value you can actually read from the "total" property.

That doesn't make any sense, as it renders the event information unusable. Right now, most browsers agree on returning values as stated above; for example, I get "loaded: 16845, total: 2324" on one of the test cases.

I believe that instead of entity_body, a new entity should be defined that is the actual transferred body, without any decoding. Alternatively, the Progress Events specification needs to be adjusted so that instead of the entity_body, another definition is used that reflects the actual transferred body rather than the decoded body.
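
For reference, a minimal sketch of what I'm doing (the URL is hypothetical; any response served with Content-Encoding: gzip and a Content-Length header reproduces this):

  var xhr = new XMLHttpRequest();
  xhr.open("GET", "/api/data.json");  // hypothetical endpoint, served gzipped
  xhr.onprogress = function (e) {
    // With the behavior described above, e.loaded counts decompressed bytes
    // while e.total comes from Content-Length (compressed bytes), so loaded > total.
    console.log("loaded: " + e.loaded + ", total: " + e.total);
  };
  xhr.send();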
Comment 1 Anne 2013-09-25 12:58:41 UTC
http://xhr.spec.whatwg.org/ seems pretty clear on this. I don't see how the HTTP entity body would suddenly be the uncompressed form. That's not what's being transferred.
Comment 2 Marcelo Volmaro 2013-09-25 14:02:45 UTC
I don't see how either, but it is what all browsers seem to do. Maybe they are all wrong. 

I believe the problem is on the definition of the entity body, as defined in http://tools.ietf.org/html/rfc2616#section-7.2 "The entity-body is obtained from the message-body by decoding any Transfer-Encoding that might have been applied to ensure safe and proper transfer of the message."

So, since the entity body is decompressed, the reported length doesn't hold any relationship to the total.
Comment 3 Glenn Maynard 2013-09-25 14:53:20 UTC
If you want the amount of data received over the network, you want the RFC2616 message body, not the entity body.
Comment 4 Marcelo Volmaro 2013-09-25 15:16:27 UTC
@Glenn Maynard: Agreed. But that's not what the definition in http://xhr.spec.whatwg.org/ says. In fact, under http://xhr.spec.whatwg.org/#response-entity-body-0, the body is defined as the RFC 2616 entity body.

So, as I originally commented, maybe that definition (entity body) should be replaced by the definition of message body.
Comment 5 Glenn Maynard 2013-09-25 15:27:36 UTC
That's what I just said.
Comment 6 Boris Zbarsky 2013-09-25 16:31:01 UTC
I think there is some confusion here.  In general, you start off with some data (1).  Then you apply content-encodings to get a different set of data (2).  Then you apply transfer-encodings to get a third set of data (3).

What's transferred over the network is (3): the RFC 2616 message-body.

The RFC 2616 entity-body is (2).  This is typically what Content-Length refers to.

The data the application actually receives from the HTTP library in browsers is typically (1).  That's certainly what happens in Gecko.

The current spec says to show the number of bytes of (2) that have been transferred for "loaded", so the number will in some way correspond to "total".

Actual UAs simply don't have that information right now, so they're showing the number of bytes the HTTP layer has handed them instead, which are bytes of (1), not (2).
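
To put concrete numbers on it (reusing the sizes from comment 0; the chunked line is just for illustration):

  (1) original data:                                   16845 bytes
  (2) after Content-Encoding: gzip (entity-body):       2324 bytes  <- what Content-Length describes
  (3) after Transfer-Encoding: chunked (message-body):  the same 2324 bytes plus chunk framing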

(In reply to Marcelo Volmaro from comment #2)
> I don't see how either, but it is what all browsers seem to do. Maybe they
> are all wrong. 

Per spec, yes they are.

This is not a spec/definition problem; it's a pure implementation problem as far as I can tell.
Comment 7 Glenn Maynard 2013-09-25 17:18:42 UTC
(In reply to Boris Zbarsky from comment #6)
> I think there is some confusion here.

There's some confusion about the meaning of the Content-Length header.  The Content-Length is the size of the entity body, that is, *after* reversing Transfer-Encoding.  It's not the number of bytes coming over the network (the message body).  So, the message body doesn't need to be used here.

>  In general, you start off with some
> data (1).  Then you apply content-encodings to get a different set of data
> (2).  Then you apply transfer-encodings to get a third set of data (3).
>
> What's transferred over the network is (3): the RFC 2616 message-body.

(Note that you're describing this in reverse, from the POV of the sender rather than the receiver.  That confused me a bit below, until I noticed it and flipped it around mentally.)

> This is not a spec/definition problem; it's a pure implementation problem as
> far as I can tell.

Browsers should at least zero the field if they can't implement it, not give bogus data that will turn into a web-compat issue.

A possible change would be for .total and .loaded to be the number of bytes after decoding C-E, so they correspond with the data actually visible to the XHR result.  That's more consistent, and for .loaded it's what implementations are already doing.

However, that would mean .total wouldn't be known when Content-Encoding is in use, and .lengthComputable would be false.  That would fix the inconsistent results the OP is seeing, but means you can't show a progress bar for C-E: gzip.

The advantage of using the entity body length is that the relative length is known even with C-E, so you can show a progress bar more of the time.  That's better, but only if browsers will implement it.  If implementors are planning to fix this, I think the spec is fine the way it is now.
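
For what it's worth, page code would handle either outcome the same way (a sketch; updateBar and showIndeterminate are hypothetical helpers on the page):

  var xhr = new XMLHttpRequest();
  xhr.onprogress = function (e) {
    if (e.lengthComputable) {
      updateBar(e.loaded / e.total);  // a percentage is meaningful
    } else {
      showIndeterminate();            // size unknown, e.g. C-E: gzip under the change above
    }
  };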
Comment 8 Marcelo Volmaro 2013-09-25 17:44:13 UTC
But it seems that most servers (well, at least IIS with .NET and Apache with PHP) set the Content-Length to the length of the compressed data, not the length of the uncompressed data (so, to the length of the message body, not the length of the entity body).

That's the problem. Right now, in my tests both servers set the Content-Length to 2324 (the size of the gzipped body), while the real size is 16845.

Since the body can be loaded in one packet, I only get one progress event, which shows me that out of a total of 2324 bytes the browser loaded 16845. That's totally wrong.
Comment 9 Boris Zbarsky 2013-09-25 17:52:31 UTC
Marcelo, you're still being confused, I think.

There are _two_ kinds of HTTP "compression".  One (Content-Encoding) affects Content-Length.  One (Transfer-Encoding) does not.  The entity-body is the data after Content-Encoding but before Transfer-Encoding; the Content-Length is by definition the length of the entity-body in HTTP.

The message-body, which is what Transfer-Encoding produces, is a transient phenomenon that can be changed in transit (e.g. a proxy is allowed to change the Transfer-Encoding, but not the Content-Encoding).  Note that Transfer-Encoding can make the message bigger as well as smaller (e.g. see "chunked" Transfer-Encoding).  Which is why, in practice, no one should care about the message-body.

The XMLHttpRequest spec currently calls for both "total" and "loaded" to reflect the data after Content-Encoding but before Transfer-Encoding.

Browsers currently show "total" per the spec, but "loaded" after undoing Content-Encoding.  As in, they're not doing what the spec says.
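
Illustrative headers for the two cases (note that RFC 2616 forbids sending Content-Length together with a non-identity Transfer-Encoding):

  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Content-Length: 2324          <- length of the entity-body, i.e. the gzipped bytes

  HTTP/1.1 200 OK
  Content-Encoding: gzip
  Transfer-Encoding: chunked    <- message-body is chunk-framed; no Content-Length sent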
Comment 10 Marcelo Volmaro 2013-09-25 21:04:01 UTC
Boris: I get that, and that's what I understand. But I can't believe that 4 different browser makers (Opera, Mozilla, Apple and Google) all agreed to disagree with the specifications. So, either the specifications are not clear, or I really don't know what to think about these companies. I can even understand that Apple/Google may have the same behavior, as they started from the same codebase. But not Opera (and I'm talking about Opera 12, with the old Presto engine) and/or Mozilla.

I see two paths here: 
- One where the content is sent uncompressed: this works as expected on all browsers.
- One where the content is sent compressed (gzip/deflate, it doesn't matter): this is not working as expected, but all browsers agreed on not working as expected :)

I really don't care whether total/loaded is calculated from the number of transferred bytes or from the size of the uncompressed body (which may or may not be the same as the number of transferred bytes). I believe it would be easier if they were calculated from the number of transferred bytes, so you don't need to know the uncompressed size beforehand. And I believe that is what needs to be clarified in the specifications.
Comment 11 Glenn Maynard 2013-09-25 23:24:11 UTC
I'm not seeing inconsistent .total and .loaded.  My test case is here: https://zewt.org/~glenn/test-xhr-progress-events.html.  It downloads a 5000000-byte text file, compressed to 1370401 bytes with Content-Encoding: gzip.  I set this up to ensure a Content-Length header is sent (Apache's internal deflate support uses chunked, which defeats this test).

- Firefox 23 (needs retesting with 24) gives:

(total, loaded, lengthComputable)
0, 3153971, false
5000000, 5000000, false

It gives a bunch of progress messages with an e.total of 0, an e.loaded counting the decompressed bytes so far, and a false e.lengthComputable. This doesn't match the spec (as I understand it), but it matches the suggestion I made above.

(The final message has an e.total set to the final size, which seems fine.  e.lengthComputable is still false, which is a bug, but that doesn't matter here.)

- Chrome 29:

0, 5336, false test-xhr-progress-events.html:6
0, 2699630, false test-xhr-progress-events.html:6
0, 4865055, false test-xhr-progress-events.html:6
0, 5000000, false 

It does the same thing as Firefox, except without e.total set in the final message.

- IE10:

 18446744073709552000, 4096, false 
 18446744073709552000, 8192, false 
 ...
 18446744073709552000, 4994966, false 
 18446744073709552000, 4999062, false 
 18446744073709552000, 5000000, false 

(Wow, that's broken: a garbage .total, and no onprogress rate limiting at all.  Even better, its network console lies: the Content-Encoding header isn't shown at all, even though sniffing the connection I can see that it's being sent.  I guess IE is still IE.)  But, the .loaded value matches the others.

So, I'd recommend the spec use the number of post-Content-Encoding bytes for .total and .loaded, rather than the entity body (in between C-E and T-E).
Comment 12 Marcelo Volmaro 2013-09-26 00:49:39 UTC
Opera 12 gives me:

1370401, 2854194, true
1370401, 3229076, true
1370401, 3602736, true
1370401, 3928678, true
1370401, 4271418, true
1370401, 4402656, true
1370401, 4530428, true
1370401, 4900050, true
1370401, 5023252, true
1370401, 5397756, true
1370401, 5740824, true
1370401, 5841832, true
1370401, 6197970, true
1370401, 6328104, true
1370401, 6823498, true
1370401, 7074506, true
1370401, 7439940, true
1370401, 7755220, true
1370401, 8120916, true
1370401, 8493980, true
1370401, 8618854, true
1370401, 8997026, true
1370401, 9240456, true
1370401, 9465826, true
1370401, 9665664, true

Content-Encoding: gzip
Content-Length: 1370401

My interpretation of this is that total is correct (as it is fetched from the Content-Length header), but loaded is wrong. Why? Because you can't know the uncompressed total until all the content has been loaded.
Comment 13 Boris Zbarsky 2013-09-26 02:07:19 UTC
Marcelo,

> So, either the specifications are not clear,

The specification used to say something somewhat different from what it does now.  It was changed somewhat recently (last year?), for exactly the reason you describe.  Updating to the current spec draft for XHR doesn't seem like a high priority for browsers, for various reasons.

Note also that reporting the length of uncompressed data received is an obvious consequence of the HTTP library handling decompression, so it could in fact happen easily in different browsers, because having the HTTP library handle decompression is a very reasonable design decision.

> And I believe that is what needs to be clarified in the specifications.

It's already very clear in the specification as it is today.

> but loaded is wrong. Why?

Because getting the number of entity-body bytes processed so far is not trivial with most HTTP libraries?

Glenn,

> My test case is here: https://zewt.org/~glenn/test-xhr-progress-events.html.

In Gecko this testcase is fundamentally not doing what you'd think because Gecko uses progress notifications from the HTTP library to drive its "total" reporting, but it needs to ask for those at send() time.  Try adding your onprogress listener before you call send()...

Yes, it's broken.  It's all very broken.
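
Concretely, the ordering that gets sane progress data looks like this (a sketch; url and logProgress stand in for whatever the test page actually uses):

  var xhr = new XMLHttpRequest();
  xhr.open("GET", url);
  xhr.onprogress = logProgress;  // registered before send(), so Gecko can
  xhr.send();                    // request real progress data from the HTTP library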
Comment 14 Glenn Maynard 2013-09-26 02:41:41 UTC
I didn't test Opera.  Since it's switching to WebKit, it's not a very interesting test target any longer.

(In reply to Boris Zbarsky from comment #13)
> In Gecko this testcase is fundamentally not doing what you'd think because
> Gecko uses progress notifications from the HTTP library to drive its "total"
> reporting, but it needs to ask for those at send() time.  Try adding your
> onprogress listener before you call send()...
> 
> Yes, it's broken.  It's all very broken.

(I suppose I don't really need to know the details here, but how can the order matter?  The events aren't fired until returning from the script anyway.)

I don't think this detail affects my suggestion to use the post-C-E values instead of the in-between-CE-and-TE values.  That already matches the behavior for .loaded in all browsers, so it seems like the only change browsers would need to make is to ensure that total == 0 and lengthComputable == false if it doesn't know the size. (Chrome already does that, IE10 almost does that except for setting total to 18446744073709552000 instead of 0, and Firefox sometimes does that, apparently depending on celestial alignment.)
Comment 15 Boris Zbarsky 2013-09-26 02:52:13 UTC
> but how can the order matter? 

It matters because the way XHR in Gecko asks the network library for sane progress events is via a boolean in the data it hands the library when it starts the network load.  That data can't change once the load is started.  So Gecko has to decide inside send() whether it wants sane progress information or not.  If not, then it'll still synthesize some sort of progress events based on how much data it's gotten, but they won't be terribly useful, as you discovered.

> I don't think this detail affects my suggestion to use the post-C-E values
> instead of the in-between-CE-and-TE values.

There is no sane way to use the post-C-E value for "total".  You don't know what that is until the load is done.

Might as well spec "total" as always returning 0 and lengthComputable as always being false, and be done with it: that's all UAs will _ever_ be able to do for C-E cases if you want post-C-E values.
Comment 16 Glenn Maynard 2013-09-26 03:18:58 UTC
(In reply to Boris Zbarsky from comment #15)
> There is no sane way to use the post-C-E value for "total".  You don't know
> what that is until the load is done.

That's what I said:
> browsers would need to make is to ensure that total == 0 and lengthComputable
> == false if it doesn't know the size.


> Might as well spec "total" as always returning 0 and lengthComputable as
> always being false, and be done with it: that's all UAs will _ever_ be able to do
> for C-E cases if you want post-C-E values.

Err, no.  It's only 0/false if C-E is in use, not always.  That's true for a lot of smaller files (scripts, CSS), but probably uncommon for large files, where it's most important to have the total (since large files are typically internally compressed, e.g. video and music).

If vendors are actually planning on fixing this to match spec, then leaving the spec alone is fine.  It's a better end result, since progress bars will be available more often.  But, with strong interop on the post-C-E behavior, I wonder if it's worth the risk (of only some browsers changing, and ending up losing the interop we have).
Comment 17 Marcelo Volmaro 2013-09-26 14:22:16 UTC
So, to sum up: if the content is compressed, there is no way to get progress (in terms of a percentage of a total). The only thing I can get is the number of bytes downloaded AFTER the decompression.

I refuse to believe that there is no way, in the system, to get the number of downloaded bytes (and I don't think that a specification needs to be tied to what a system can/can't do; doing so means that we are tied to what the lowest implementation of a system can do). I really don't know how the HTTP system libraries work so I can't comment on that, but doing a small test with .NET, I can get Glenn's file, compressed, and get the number of downloaded bytes from the stream. 

I don't think .NET is downloading the file through the HTTP libraries in uncompressed form and re-compressing it just to give me a compressed stream. 

Also, if I try to download the file, this is what I get:
- With Opera, I get a progress bar over the downloaded bytes. Just what I expect.
- With Chrome, I get an undetermined download progress.
- With Firefox, I get a progress bar that shows the compressed size, but the file continues downloading even after the number of downloaded bytes has grown far beyond the total.

So, .NET and Opera work as I expected; Chrome/Firefox don't. Since there are at least two implementations (on Windows) that can read/determine the correct number of downloaded bytes, I don't think the problem is with the underlying HTTP libraries.
Comment 18 Boris Zbarsky 2013-09-26 14:41:21 UTC
> So, to sum up: if the content is compressed, there is no way to get
> progress (in terms of a percentage of a total).

In UAs as things stand.

> I refuse to believe that there is no way, in the system, to get the number
> of downloaded bytes

There is currently no way, in Gecko, to get that information.  The network library simply doesn't keep track of it.  Obviously that can be fixed, with enough effort.

> (and I don't think that a specification needs to be tied
> to what a system can/can't do

And it's not: right now it specifies something Gecko currently cannot do, for example.

> I really don't know how the HTTP
> system libraries work so I can't comment on that,

I don't think any browser uses a system HTTP library: they typically don't do what browsers need to do.

> but doing a small test
> with .NET, I can get Glenn's file, compressed, and get the number of
> downloaded bytes from the stream. 

Yes, but the point is a browser wants to get the file _uncompressed_ in this case.  And in fact pretty much always (there is only one exception I can think of, and that's when the file is being saved and has a .gz or .gzip extension).

> I don't think the problem is with the
> underlying HTTP libraries.

You just talked about 4 different HTTP libraries.
Comment 19 Glenn Maynard 2013-09-26 15:02:28 UTC
(In reply to Glenn Maynard from comment #16)
> I wonder if it's worth the risk (of only some browsers changing, and ending
> up losing the interop we have).

On the other hand, maybe it's a fairly innocuous interop issue even if that does happen.  I'll reclose this.  If it ends up looking like the change to use post-C-E byte counts is needed then we should file a separate bug, since it's not actually related to this one: the "out of sync loaded and total" issue is a browser bug in Firefox and old-Opera.
Comment 20 Marcelo Volmaro 2013-09-26 15:55:18 UTC
(In reply to Boris Zbarsky from comment #18)
 
> > but doing a small test
> > with .NET, I can get Glenn's file, compressed, and get the number of
> > downloaded bytes from the stream. 
> 
> Yes, but the point is a browser wants to get the file _uncompressed_ in this
> case.  And in fact pretty much always (there is only one exception I can
> think of, and that's when the file is being saved and has a .gz or .gzip
> extension).

The browser uses (by what you said) its own HTTP library to fetch data. What it then does with that data depends on who asked for it, but the data is transferred as a bunch of bytes. From my point of view, the script asks the JS engine for some data, and the engine asks the HTTP library for it. It is OK (and makes total sense) that the JS engine receives the data uncompressed, since the compression is part of the underlying transmission protocol, and it is even OK that the underlying HTTP library feeds that data to the JS engine uncompressed. But the progress event can't be computed by the JS engine. It has to be computed by the HTTP library. Then the library can report the correct progress back to the JS engine.
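
In other words, something like this (pseudocode of the layering I have in mind; all of the names are made up):

  // Inside the browser; none of this is visible to page script.
  httpLibrary.fetch(url, {
    ondata: function (decompressedChunk) {
      jsEngine.deliver(decompressedChunk);                  // script sees decoded bytes
    },
    onrawbytes: function (receivedSoFar) {
      jsEngine.fireProgress(receivedSoFar, contentLength);  // progress uses wire bytes
    }
  });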

I don't know how that's being handled right now in each browser. If the specification states what I said above (basically, that you can always get a proper progress indicator if the Content-Length is specified), and right now all browsers are doing it wrong, that's OK. It is their fault and they may or may not fix it. But it is clear that they are the ones doing things wrong.