22731 – Should atob() trim spaces, or not?

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: Should atob() trim spaces, or not?

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	Needs Impl Interest
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

URL:	http://www.whatwg.org/specs/web-apps/...
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-07-19 07:38 UTC by contributor
Modified:	2014-09-26 22:08 UTC
CC List:	7 users

Description contributor 2013-07-19 07:38:19 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html
Multipage: http://www.whatwg.org/C#atob
Complete: http://www.whatwg.org/c#atob
Referrer: http://www.whatwg.org/specs/web-apps/current-work/multipage/workers.html

Comment:
Based on my testing, step 3 (remove all space characters from input) does not
match the current behavior of WebKit, Blink, Firefox and IE10. Should the
specification be updated to match the behavior of all major browsers?

Posted from: 91.154.118.240
User agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36

Comment 1 Chris Dumez 2013-07-19 07:42:14 UTC

For example, the following throws an InvalidCharacterError in every browser I could test:
atob("abcd ");

This case is part of the following test suite and is expected to succeed:
http://www.w3c-test.org/html/tests/submission/AryehGregor/base64.html
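
A minimal sketch of the two behaviours being compared, assuming the decoding steps are roughly: optionally remove the space characters (U+0009, U+000A, U+000C, U+000D, U+0020), drop trailing padding, reject anything outside the base64 alphabet, then decode. The names `decodeStrict` and `decodeStrippingSpaces` are hypothetical and used only for illustration; this is not any browser's actual implementation.

```javascript
// Sketch only: a hand-rolled decoder so both behaviours can be compared
// without depending on what a particular browser's atob() does.
const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode base64 text made of alphabet characters plus optional trailing "="
// padding. Throws on anything else, including spaces.
function decodeStrict(input) {
  let data = input;
  if (data.length % 4 === 0) {
    data = data.replace(/={1,2}$/, ""); // drop one or two padding characters
  }
  if (data.length % 4 === 1 || /[^A-Za-z0-9+\/]/.test(data)) {
    throw new Error("InvalidCharacterError");
  }
  let bits = 0; // pending bits not yet emitted as a byte
  let bitCount = 0; // how many pending bits are held in `bits`
  let output = "";
  for (const ch of data) {
    bits = (bits << 6) | ALPHABET.indexOf(ch);
    bitCount += 6;
    if (bitCount >= 8) {
      bitCount -= 8;
      output += String.fromCharCode((bits >> bitCount) & 0xff);
      bits &= (1 << bitCount) - 1; // keep only the leftover bits
    }
  }
  return output;
}

// Same decoder, but with the space-removal step applied first.
function decodeStrippingSpaces(input) {
  return decodeStrict(input.replace(/[\t\n\f\r ]/g, ""));
}

// With space removal, "abcd " decodes to the same string as "abcd".
console.log(decodeStrippingSpaces("abcd ") === decodeStrict("abcd")); // true

// Without it, the trailing space makes the input invalid.
try {
  decodeStrict("abcd ");
} catch (e) {
  console.log(e.message); // "InvalidCharacterError"
}
```

The only difference between the two functions is the single `replace()` call, which is the step this bug is about.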

Comment 2 Ian 'Hixie' Hickson 2013-10-23 22:09:30 UTC

Seems like a case where improving the implementations might be in order... unless anyone depends on the exception. It's common to have white space in base64 data.

Comment 3 Simon Pieters 2013-10-24 11:01:25 UTC

Yeah, the space stripping was a deliberate change to ease the burden on authors and to avoid wasting memory on creating a new string without the whitespace.

Comment 4 Alexey Proskuryakov 2013-11-06 18:21:21 UTC

Breaking on non-alphabet characters is an explicit recommendation of RFC 4648, backed by security considerations (see <http://tools.ietf.org/html/rfc4648#section-12>).

It seems like a bad idea to diverge from the authoritative spec and from all implementations at the same time.

Comment 5 Ian 'Hixie' Hickson 2013-11-06 19:25:23 UTC

The security issues seem pretty minor, but at the end of the day, it's up to the implementors whether they want to do this or not.

Comment 6 Chris Dumez 2013-11-07 17:37:40 UTC

FYI, this is now implemented in Blink:
https://code.google.com/p/chromium/issues/detail?id=314682

Comment 7 Alexey Proskuryakov 2013-11-07 17:53:41 UTC

> at the end of the day, it's up to the implementors whether they want to do this or not.

This doesn't really add up. Can't exactly the same be said of every spec mistake, word for word?

Having a 1:1 mapping between original and encoded forms is a really nice trait; adding randomness to the process is strange, to say the least.

Comment 8 Ian 'Hixie' Hickson 2013-11-07 22:37:02 UTC

> This doesn't really add up. Can't exactly the same be said of every spec
> mistake, word for word?

Yes. At the end of the day, the WHATWG HTML spec is just going to spec whatever browsers end up implementing; my job as editor is just to propose what I think (based on the data and arguments I can find and that people bring up) is the optimal behaviour. I don't file bugs on browsers to implement what I spec, or apply pressure in the form of patches, test cases, or even, except in rare cases, directed advocacy, because I want implementors to carefully consider the text and independently review it and decide whether it's sane or not before implementing.

On this particular feature, there are minor security concerns on the one side (people who compare base64 data before decoding it, I guess? I don't really understand how we end up with a security problem here), and there are minor performance wins on the other side (spaces in base64 data are common, and not requiring that scripts strip the spaces is a minor win). The back-compat issues don't seem major (it's unlikely that people are relying on this throwing an exception on spaces in a way that they'd act worse if it ignored spaces, as far as I can tell). Thus, on the whole, the current text seems like a win to me. However, I could be wrong, and if implementors think I am then they shouldn't implement it, and if they don't, then we'll move the spec back to what they do implement.
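
For context on the performance argument, this is roughly what a script has to do when `atob()` throws on spaces: build a copy of the input with the whitespace removed, then decode the copy. It is a sketch that assumes a runtime with a global `atob()` (browsers, or Node.js 16 and later); the helper name `decodeWrappedBase64` and the sample string are made up for illustration.

```javascript
// Hypothetical helper: decode base64 text that may be wrapped across lines
// or indented, as is common in MIME- or PEM-style data.
function decodeWrappedBase64(wrapped) {
  // Allocates a second string without the space characters. This is the
  // extra copy that a space-stripping atob() would make unnecessary.
  const compact = wrapped.replace(/[\t\n\f\r ]+/g, "");
  return atob(compact);
}

// Illustrative input: "hello world" encoded and split over two lines.
const wrapped = "aGVsbG8g\n  d29ybGQ=";
console.log(decodeWrappedBase64(wrapped)); // "hello world"
```

The helper gives the same result whether or not the underlying `atob()` strips spaces itself, so scripts that already do this are not affected by either outcome of this bug.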

Comment 9 Alexey Proskuryakov 2013-11-08 07:20:35 UTC

Ian, was there any evidence of this being an actual measurable performance improvement on any real life web sites?

We already had interoperability across all browsers, supported by an RFC recommendation and security concerns, as minor as those may be. The behavior was conceptually cleaner; we don't want to be liberal in what we accept unless absolutely necessary.

The bar was quite high, and I do not think that this change came anywhere close to meeting it.

Comment 10 Ian 'Hixie' Hickson 2013-11-08 23:51:09 UTC

I don't have a strong feeling about this. (As far as I recall, this wasn't even something I originally specced, it was Aryeh's good work.) As noted in comment 8, for me it's just a minor performance win vs a minor security risk; I don't really see this as a particularly important issue one way or the other. If the conclusion from implementors is that this is a mistake, then let's change it.

Comment 11 Alexey Proskuryakov 2013-11-21 17:21:56 UTC

So, let's change it back?

Comment 12 Ms2ger 2013-11-21 19:58:19 UTC

FWIW: Firefox matches the spec as of Firefox 27 (currently on Aurora); see https://bugzilla.mozilla.org/show_bug.cgi?id=711180

Comment 13 Alexey Proskuryakov 2013-11-21 20:14:25 UTC

Let's fix the spec quickly then, before the change is shipped.

Comment 14 Chris Dumez 2013-11-21 20:26:15 UTC

Firefox and Blink already follow the specification, so it looks like there was interest from browser vendors.

There is also a patch up for review to implement this in WebKit, so we are actually really close to cross-browser support.

Comment 15 Alexey Proskuryakov 2013-11-21 20:49:19 UTC

Was there an analysis of these issues performed by Mozilla or Google?

Again, this is a basic feature of Base64 encoding. Generally, it's a very desirable trait for any encoding that you can't add undetectable noise to it. Many applications of Base64 (notably JOSE, used in WebCrypto) additionally forbid padding with '=' characters. So for security applications, space and '=' padding is forbidden.

Do we really want to re-evaluate all applications of Base64 just to make this small silly change?
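
The "undetectable noise" point can be illustrated with a short sketch: once spaces are ignored, many distinct strings decode to the same bytes, so comparing encoded text no longer tells you whether the payloads match. The sample strings and the helper names `stripSpaces` and `isStrictBase64` are assumptions for illustration only, and the strict check covers just the alphabet and padding placement, not every canonical-form rule.

```javascript
// Remove the characters that a space-stripping decoder would ignore.
function stripSpaces(text) {
  return text.replace(/[\t\n\f\r ]/g, "");
}

// True only for text made of complete 4-character groups from the base64
// alphabet, with "=" padding allowed in the final group.
function isStrictBase64(text) {
  return /^(?:[A-Za-z0-9+\/]{4})*(?:[A-Za-z0-9+\/]{2}==|[A-Za-z0-9+\/]{3}=)?$/.test(text);
}

// Four different strings that all carry the same payload, "abc" ("YWJj").
const variants = ["YWJj", "YW Jj", "YWJj\n", " Y W J j "];

// After stripping, they are indistinguishable.
const distinctPayloads = new Set(variants.map(stripSpaces));
console.log(distinctPayloads.size); // 1

// A strict check accepts only the first form, so the noise is detectable.
console.log(variants.filter(isStrictBase64)); // ["YWJj"]
```

Code that compares or signs the encoded text rather than the decoded bytes is where the difference between the two behaviours would show up.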

Comment 16 Ian 'Hixie' Hickson 2013-12-10 22:23:19 UTC

cc'ing abarth who may have an opinion about the security implications, since Chrome does this.

Comment 17 Adam Barth 2014-02-07 23:44:14 UTC

I'm not sure I understand what the security issue is supposed to be. The RFC seems worried about covert channels, which aren't usually a threat model we worry about in the web platform.

Comment 18 Ian 'Hixie' Hickson 2014-09-26 21:26:38 UTC

Given the lay of the land, I think the logical thing to do is to leave the spec as-is instead of changing it to match the Safari behaviour. I understand that this is not something where everyone is on the same page, but I don't see how else to make progress.

Comment 19 Alexey Proskuryakov 2014-09-26 22:08:54 UTC

It's not "Safari behavior", it's everyone's behavior before this essentially unmotivated spec change was made :-/