Bug 17620 - Add steps to convert a sequence of Unicode characters to a DOMString
Status: RESOLVED WONTFIX
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL
Version: unspecified
Hardware: All All
Importance: P2 enhancement
Assigned To: Cameron McCormack
QA Contact: public-webapps-bugzilla
Reported: 2012-06-27 20:31 UTC by Joshua Bell
Modified: 2012-06-28 22:12 UTC

Description Joshua Bell 2012-06-27 20:31:26 UTC
The Web API proposed in http://wiki.whatwg.org/wiki/StringEncoding requires interpreting DOMString code units as an encoding of Unicode characters, so that DOMStrings can be encoded to and decoded from other binary encodings.

WebIDL defines "steps to convert a DOMString to a sequence of Unicode characters" at http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

The proposed API requires the reverse as well, and defines "steps to convert a sequence of Unicode characters to a DOMString" at http://wiki.whatwg.org/wiki/StringEncoding#Steps_to_convert_a_sequence_of_Unicode_characters_to_a_DOMString

Would it be possible to add the latter to the WebIDL specification so that both directions are defined in one place?

The proposed text could be (sans-formatting, using _ for subscript and ^ for superscript):

The following algorithm defines a way to convert a sequence of Unicode characters to a DOMString:
1. Let U_0...U_n-1 be the sequence of Unicode characters, where n is its length.
2. Initialize i to 0.
3. Initialize S to be an empty sequence of code units.
4. While i < n:
    1. Let c be the code point of the Unicode character in U at index i
    2. If c ≥ 2^16, then:
        1. Append to S a code unit equal to (c - 2^16) / 2^10 + 0xD800, where "/" represents integer division.
        2. Append to S a code unit equal to (c - 2^16) % 2^10 + 0xDC00, where "%" represents the remainder of an integer division.
    3. Otherwise, append to S a code unit equal to c.
    4. Set i to i+1
5. Return the IDL DOMString value that represents the sequence of code units S.
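
For illustration only, the steps above could be sketched in Python roughly as follows. The function name `code_points_to_code_units` is hypothetical, and the sketch assumes the input is a list of integer code points that are valid Unicode characters (so no surrogate code points); it returns a list of 16-bit code units rather than an actual DOMString.

```python
def code_points_to_code_units(code_points):
    """Convert Unicode code points to a list of UTF-16 code units.

    Hypothetical helper; assumes every input value is a valid Unicode
    character (0..0x10FFFF, excluding the surrogate range).
    """
    units = []
    for c in code_points:
        if c >= 2**16:
            # Supplementary character: split into a surrogate pair.
            offset = c - 2**16
            units.append(offset // 2**10 + 0xD800)  # lead surrogate
            units.append(offset % 2**10 + 0xDC00)   # trail surrogate
        else:
            # BMP character: the code unit equals the code point.
            units.append(c)
    return units


# U+0061 stays one code unit; U+1F600 becomes the pair D83D DE00.
print([hex(u) for u in code_points_to_code_units([0x61, 0x1F600])])
```

The explicit index `i` from the proposed text is replaced by a `for` loop here, which should be equivalent since the steps only ever advance one character at a time.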
Comment 1 Cameron McCormack 2012-06-28 02:38:36 UTC
I think the reason I didn't include this reverse algorithm is that there's only one correct way of converting a Unicode string into UTF-16 code units. (Going the other way you need to deal with illegal UTF-16 sequences, so there were some different approaches we could have taken.) So you could probably just write a single line in your spec saying, for example:

  Let s be the DOMString that represents the sequence of code units resulting
  from encoding the sequence of Unicode characters t as UTF-16.

I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But if you think the above is not precise enough I guess I can add your suggested text.
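
As a rough illustration of the asymmetry described here, the Python sketch below encodes with the built-in UTF-16 codec (the "single line" approach) and decodes with an explicit loop that has to pick a policy for unpaired surrogates. Both function names are hypothetical, and replacing each unpaired surrogate with U+FFFD is shown as one possible policy, not as a quotation of the WebIDL text.

```python
import struct

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def utf16_encode(code_points):
    """Encode code points to UTF-16 code units via Python's codec."""
    data = "".join(chr(c) for c in code_points).encode("utf-16-be")
    return list(struct.unpack(">%dH" % (len(data) // 2), data))


def code_units_to_code_points(units):
    """Decode code units; unpaired surrogates become U+FFFD (assumed policy)."""
    out = []
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        is_lead = 0xD800 <= u <= 0xDBFF
        if is_lead and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # Well-formed surrogate pair: combine into one code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Unpaired surrogate: this is where a policy choice is needed.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(u)
            i += 1
    return out
```

Encoding needs no such branch because every valid Unicode character has exactly one UTF-16 representation; only the decoding direction has ill-formed inputs to handle.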
Comment 2 Joshua Bell 2012-06-28 15:53:15 UTC
(In reply to comment #1)
> I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But
> if you think the above is not precise enough I guess I can add your suggested
> text.

You're right, that should be fine. If anyone complains about it not being spelled out, it shouldn't be a problem to add it later since, as you point out, there's only one way to do it.
Comment 3 Cameron McCormack 2012-06-28 22:12:18 UTC
Thanks.