This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 20611 - Specify the text encoding for JWK key format
Summary: Specify the text encoding for JWK key format
Status: RESOLVED FIXED
Alias: None
Product: Web Cryptography
Classification: Unclassified
Component: Web Cryptography API Document
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Mark Watson
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-01-09 00:24 UTC by Mark Watson
Modified: 2014-02-20 00:57 UTC
CC List: 4 users

See Also:


Attachments

Description Mark Watson 2013-01-09 00:24:11 UTC
Section 14 (Crypto Interface), KeyFormat enum. We need to specify the character encoding for the JWK case.

Suggestion: insert 'UTF-8 encoded' before 'JSON'
Comment 1 Ryan Sleevi 2013-01-21 16:20:36 UTC
DOMString encodings are UTF-16, so doing so would force all callers to do a manual conversion when manipulating JWK objects in JavaScript, which is presumed to be the common case.
Comment 2 Mark Watson 2013-01-21 17:21:28 UTC
Are you saying we should specify the encoding as UTF-16, or that we should not specify the encoding? It's not clear from your comment.

Unless you have [1], converting from UTF-8 to a JavaScript string is indeed a pain. I believe you need to create a Blob from the ArrayBuffer and then use FileReader's readAsText(blob, encoding).
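
For illustration, a rough sketch of that approach (utf8BufferToString is just a name I'm making up; the decoded string arrives asynchronously via the callback):

function utf8BufferToString(buffer, callback) {
   // Wrap the ArrayBuffer in a Blob and let FileReader decode it as UTF-8
   var blob = new Blob([buffer], { type: "application/json" });
   var reader = new FileReader();
   reader.onload = function() {
      callback(reader.result); // a JS string decoded from the UTF-8 bytes
   };
   reader.readAsText(blob, "UTF-8");
}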

But UTF-16 is not straightforward either: you can copy the 16-bit units one by one, but you have to watch out for the 4-byte sequences in UTF-16 (surrogate pairs), which JS strings, being effectively UCS-2, treat as two separate 16-bit units rather than one character.
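
Something like the following sketch, assuming the ArrayBuffer holds UTF-16 code units in the platform's byte order (utf16BufferToString is an invented name):

function utf16BufferToString(buffer) {
   var units = new Uint16Array(buffer);
   var str = "";
   for (var i = 0; i < units.length; i++) {
      // Copying unit by unit keeps surrogate pairs as two 16-bit values,
      // which is how JS strings store them anyway
      str += String.fromCharCode(units[i]);
   }
   return str;
}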

UTF-8 would be convenient if you were going to build the exported string into a larger UTF-8 message and wanted to avoid the bloat of UTF-16. Equally if you're going to encrypt the resulting ArrayBuffer.

I don't know what will be most common: parsing the JWK into a JS object, or applying some other transformation?

How widely supported is [1]?

[1] http://encoding.spec.whatwg.org/
Comment 3 Ryan Sleevi 2013-01-24 01:09:19 UTC
I'm not sure I agree with you on the UTF-8 preference, because that incurs additional pain for any JavaScript user using any of the JOSE keys as DOMStrings, which we imagine will be the common case for JOSE users (given that any use with XHR will return DOMStrings).

Further, there are additional complexities regarding JWK, particularly as it applies to wrapped keys, because JWK (like JSON) lacks a canonical format, unlike all of the other supported key formats.

Let's bring it to the list for discussion to see if we can find a resolution here.
Comment 4 Alexey Proskuryakov 2013-11-08 23:21:33 UTC
Did this ever get to the mailing list? I couldn't find that in archives.

+1 to specifying UTF-8 as suggested by Mark.

It seems like the regular scenario for JWK will be getting an ArrayBuffer straight from XMLHttpRequest. And the JWK spec requires using UTF-8 with the application/jwk+json MIME type.

If someone builds a JWK from scratch, they already likely have to perform various conversions manually, such as base64url. Adding a UTF-8 conversion to that wouldn't make much of a difference.

Besides, how often are JWK serialized strings going to be anything but pure ASCII? In that case, one will only need a simple asciiToArrayBuffer() helper for this kind of work.
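
For example, such a helper could be as simple as this sketch, which assumes the input really is ASCII-only:

function asciiToArrayBuffer(str) {
   var bytes = new Uint8Array(str.length);
   for (var i = 0; i < str.length; i++) {
      bytes[i] = str.charCodeAt(i); // for ASCII, the code unit is the UTF-8 byte
   }
   return bytes.buffer;
}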
Comment 5 Ryan Sleevi 2013-11-09 05:45:58 UTC
(In reply to Alexey Proskuryakov from comment #4)
> Did this ever get to the mailing list? I couldn't find that in archives.

No. For whatever reason, the lists are not getting auto-cc'd.

> 
> +1 to specifying UTF-8 as suggested by Mark.

Can you explain how this is compatible with JS / DOMStrings, which deal in UTF-16? Isn't that part of the problem?

If you pass a DOMString, how do you interpret it? Is it UTF-16? Is it a series of bytes that are 'really' UTF-8? Is it only valid ASCII characters (even though JWK totally supports the full range, as it's 'simply' JSON)?

> 
> It seems like the regular scenario for JWK will be getting an ArrayBuffer
> straight from XMLHttpRequest. And JWK spec requires using UTF-8 with
> application/jwk+json MIME type.

JWK requires using UTF-8. But that will be converted to UTF-16 when exposed through XMLHttpRequest, as per DOMString conversion rules.

> 
> If someone builds a JWK from scratch, they already likely have to perform
> various conversions manually, such as base64url. Adding an UTF-8 conversion
> to that wouldn't make much of a difference.
> 
> Besides, how often are JWK serialized strings going to be anything by pure
> ASCII? In this case, one will only need a simple asciiToArrayBuffer() helper
> for this kind of work.

I'm extremely uncomfortable with this position, because it's inconsistent with the statement that JWK "is JSON". You're instead arbitrarily restricting it to a specific subset of JSON that is not compatible with JSON.parse (among other things).

Have I missed something here in the reading of DOMString, WebIDL, and ES? Isn't that the crux of the conflict? If I get a response from XHR in a DOMString, what are those bytes? How are they exposed to JS? And how are they interpreted by the DOM bindings when converting back from DOMString into a system that the underlying cryptographic library supports.
Comment 6 Alexey Proskuryakov 2013-11-09 19:00:31 UTC
I think that an example of expected usage may be the best way to explain my thinking.

var req = new XMLHttpRequest();
req.open("GET", url, true);
req.responseType = "arraybuffer"; // deliver the response as an ArrayBuffer, not a DOMString
req.send();
req.onload = function() {
   crypto.subtle.importKey("jwk", req.response, null).then(...)
};

req.response is already an ArrayBuffer; there are no DOM strings involved.

This is not an unprecedented problem, strings are mapped to UTF-8 typed arrays elsewhere in the web platform. See for example <https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding>.

If web developers end up needing to convert DOM strings to UTF-8 ArrayBuffers often, and if copy/pasting strToUTF8Arr code from this page becomes a problem for them, we can always add a built-in function for this. But that's outside WebCrypto scope, we should just use UTF-8 here.
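
For reference, a manual conversion in the spirit of that strToUTF8Arr code could look roughly like this (a sketch, not a copy of the MDN version):

function strToUTF8Arr(str) {
   var bytes = [];
   for (var i = 0; i < str.length; i++) {
      var c = str.charCodeAt(i);
      // Combine a surrogate pair into a single code point
      if (c >= 0xD800 && c <= 0xDBFF && i + 1 < str.length) {
         var lo = str.charCodeAt(i + 1);
         if (lo >= 0xDC00 && lo <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (lo - 0xDC00);
            i++;
         }
      }
      if (c < 0x80) {
         bytes.push(c);
      } else if (c < 0x800) {
         bytes.push(0xC0 | (c >> 6), 0x80 | (c & 0x3F));
      } else if (c < 0x10000) {
         bytes.push(0xE0 | (c >> 12), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
      } else {
         bytes.push(0xF0 | (c >> 18), 0x80 | ((c >> 12) & 0x3F), 0x80 | ((c >> 6) & 0x3F), 0x80 | (c & 0x3F));
      }
   }
   return new Uint8Array(bytes);
}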
Comment 7 Mountie Lee 2013-11-14 07:30:28 UTC
Unicode can be recommended but cannot be forced, considering users of local encodings (EUC-KR, GB2312, Shift-JIS...).
Comment 8 Mark Watson 2013-11-14 08:06:13 UTC
This is for JWK which is a JSON structure and therefore by definition Unicode (See http://www.ietf.org/rfc/rfc4627.txt?number=4627 Section 3).

The default encoding according to this RFC is UTF-8. Since we are using ArrayBuffers for input/output to/from the WebCrypto operations that will accept/produce JWK objects, it seems natural to use this encoding.

If conversion is especially problematic and the use case of manipulating a JWK as a JavaScript object is expected to be common, then we could consider supporting multiple encodings for import/export. This would require us to add an encoding parameter to the operation specification somehow.
Comment 9 Alexey Proskuryakov 2013-11-18 19:05:23 UTC
I agree that it's extremely unlikely that any other encoding will ever be needed, given that MIME type registration for application/jwk+json requires UTF-8.

In any case, the answer will be the Encoding spec <http://encoding.spec.whatwg.org>, which also provides a trivially easy way to convert between DOM strings and UTF-8 Uint8Arrays. For implementations that implement WebCrypto but don't implement the Encoding spec, it's still doable in JS, as in the link given in comment 6.
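
With the Encoding spec, the conversion is essentially a one-liner in each direction; for example (jwkString is just a placeholder for the serialized JWK text):

var bytes = new TextEncoder().encode(jwkString);     // DOM string -> UTF-8 Uint8Array
var str = new TextDecoder("utf-8").decode(bytes);    // UTF-8 bytes -> DOM string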
Comment 10 Mark Watson 2013-12-10 17:59:12 UTC
Ryan,

Replying explicitly to your questions. I think we are essentially waiting on you on this one:

(In reply to Ryan Sleevi from comment #5)
> (In reply to Alexey Proskuryakov from comment #4)
> > Did this ever get to the mailing list? I couldn't find that in archives.
> 
> No. For whatever reason, the lists are not getting auto-cc'd.
> 
> > 
> > +1 to specifying UTF-8 as suggested by Mark.
> 
> Can you explain how this is compatible with JS / DOMStrings, which deals in
> UTF-16? Isn't that part of the problem?
> 
> If you pass a DOMString, how do you interpret it? Is it UTF-16? Is it a
> series of bytes that are 'really' UTF-8? Is it only valid ASCII characters
> (even though JWK totally supports the full rage, as it's 'simply' JSON)

A DOMString is a sequence of 16-bit units, each of which is a UTF-16 code unit. Mostly, each 16-bit unit is a single Unicode character, but sometimes it can take two 16-bit units to represent one Unicode character.

An ECMAScript string, by contrast, is just a sequence of 16-bit integers. There seem to be suggestions that these are generally considered to be UCS-2, which I understand means UTF-16 without the two-unit characters, but elsewhere it says they are usually interpreted as UTF-16.
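
A quick illustration of the two-unit case:

var s = "\uD83D\uDE00"; // U+1F600, stored as a surrogate pair
s.length;               // 2 -- JS strings count 16-bit units, not Unicode characters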

A UTF-8 message body read from XMLHttpRequest as a DOMString will be converted to UTF-16. If you read it as an ArrayBuffer it will stay in UTF-8.

This is clear and outside the scope of WebCrypto. What we are asking is what format the ArrayBuffer input/output of import/export should be. It can work either way:
- if we specify this to be UTF-16, then conversion to a JS String for input to JSON.parse is easy, as it can be done 16-bit unit by 16-bit unit (modulo looking out for the surrogate pairs if JS String really is UCS-2). However, conversion to UTF-8 for serialization in an outgoing message might be harder. If one's outgoing message is entirely text, one could construct the whole message in a String and then hope that XMLHttpRequest serializes this as UTF-8, though I am not sure if it can be made to do that.
- if we specify this to be UTF-8, then conversion to a JS string is harder, unless you have support for the encoding specification. But still it is not hard. Building an outgoing message which is explicitly UTF-8 in an ArrayBuffer is very easy.

So, it depends which is the most likely use case. It seems to me that UTF-8 will make the more common case easiest and the less common case not too hard. UTF-16 makes both cases a little tricky.
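
Concretely, if we pick UTF-8 and the Encoding spec is available, both directions are short; a sketch with illustrative variable names (exportedJwkBuffer, jwk):

// exportedJwkBuffer: a UTF-8 ArrayBuffer as produced by export
var jwk = JSON.parse(new TextDecoder("utf-8").decode(exportedJwkBuffer));
// ...and building an explicitly UTF-8 ArrayBuffer for import or an outgoing message
var jwkBuffer = new TextEncoder().encode(JSON.stringify(jwk)).buffer;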

> 
> > 
> > It seems like the regular scenario for JWK will be getting an ArrayBuffer
> > straight from XMLHttpRequest. And JWK spec requires using UTF-8 with
> > application/jwk+json MIME type.
> 
> JWK requires using UTF-8. But that will be converted to UTF-16 when exposed
> to the XMLHttpRequest, as per DOMString conversion rules.
> 
> > 
> > If someone builds a JWK from scratch, they already likely have to perform
> > various conversions manually, such as base64url. Adding an UTF-8 conversion
> > to that wouldn't make much of a difference.
> > 
> > Besides, how often are JWK serialized strings going to be anything by pure
> > ASCII? In this case, one will only need a simple asciiToArrayBuffer() helper
> > for this kind of work.
> 
> I'm extremely uncomfortable with this position, because it's inconsistent
> with the statement that JWK "is JSON". You're instead arbitrarily
> restricting it to a specific subset of JSON that is not compatible with
> JSON.parse (among other things).

Yep - there's no reason for us to restrict to ASCII, though a specific application might know that the JWKs it will deal with all meet that restriction.

> 
> Have I missed something here in the reading of DOMString, WebIDL, and ES?
> Isn't that the crux of the conflict? If I get a response from XHR in a
> DOMString, what are those bytes?

Unicode characters represented in UTF-16. But if you get them in an ArrayBuffer they retain their original wire encoding.

> How are they exposed to JS? And how are
> they interpreted by the DOM bindings when converting back from DOMString
> into a system that the underlying cryptographic library supports.

AFAIK, we don't have DOMString inputs/outputs to our methods. It's all ArrayBuffers.
Comment 11 Mark Watson 2014-02-05 23:06:32 UTC
We essentially have two choices here: UTF-8 or UTF-16.

The JWK specification recommends UTF-8 wherever it considers encoding (for example for the application/jwk+json MIME type registration).

Since all of the existing (pre-standard) WebCrypto implementations generate/consume UTF-8, I think we should consider this as 'input from implementors' and resolve this as UTF-8.

If I don't hear otherwise, I will go ahead and make this change in the Editor's Draft.
Comment 12 Mark Watson 2014-02-20 00:57:02 UTC
Resolved in favor of UTF-8 after offline discussion with Ryan.

https://dvcs.w3.org/hg/webcrypto-api/rev/66bec4453de5