This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 17620 - Add steps to convert a sequence of Unicode characters to a DOMString
Status: RESOLVED WONTFIX
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL
Version: unspecified
Hardware: All All
Importance: P2 enhancement
Target Milestone: ---
Assignee: Cameron McCormack
QA Contact: public-webapps-bugzilla
Reported: 2012-06-27 20:31 UTC by Joshua Bell
Modified: 2012-06-28 22:12 UTC

Description Joshua Bell 2012-06-27 20:31:26 UTC
The Web API proposed in http://wiki.whatwg.org/wiki/StringEncoding requires interpretation of DOMString code units as an encoding of Unicode characters for the purpose of encoding and decoding DOMStrings to other binary encodings.

WebIDL defines "steps to convert a DOMString to a sequence of Unicode characters" at http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

The proposed API requires the reverse as well, and defines "steps to convert a sequence of Unicode characters to a DOMString" at http://wiki.whatwg.org/wiki/StringEncoding#Steps_to_convert_a_sequence_of_Unicode_characters_to_a_DOMString

Would it be possible to add the latter to the WebIDL specification so that both directions are defined in one place?

The proposed text could be (sans-formatting, using _ for subscript and ^ for superscript):

The following algorithm defines a way to convert a sequence of Unicode characters to a DOMString:
1. Let U_0...n-1 be the sequence of Unicode characters
2. Initialize i to 0
3. Initialize S to be an empty sequence of code units
4. While i < n
    1. Let c be the code point of the Unicode character in U at index i
    2. If c ≥ 2^16, then:
        1. Append to S a code unit equal to (c - 2^16) / 2^10 + 0xD800, where "/" represents integer division.
        2. Append to S a code unit equal to (c - 2^16) % 2^10 + 0xDC00, where "%" represents the remainder of an integer division.
    3. Otherwise, append to S a code unit equal to c.
    4. Set i to i+1
5. Return the IDL DOMString value that represents the sequence of code units S.
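
For illustration only, the proposed steps could be sketched in Python roughly as follows. The function name `code_points_to_code_units` is hypothetical, and the sketch assumes the input is a list of integer code points that are valid Unicode characters (no surrogate code points, nothing above 0x10FFFF); it does not validate them.

```python
def code_points_to_code_units(code_points):
    """Convert a sequence of Unicode code points to UTF-16 code units.

    Assumes every input value is a valid Unicode character code point;
    no validation is performed in this sketch.
    """
    units = []
    for c in code_points:
        if c >= 0x10000:  # 2^16: outside the BMP, needs a surrogate pair
            offset = c - 0x10000
            units.append(offset // 0x400 + 0xD800)  # high surrogate (step 4.2.1)
            units.append(offset % 0x400 + 0xDC00)   # low surrogate (step 4.2.2)
        else:
            units.append(c)  # BMP character: one code unit (step 4.3)
    return units
```

For example, the code point 0x1F600 would come out as the pair 0xD83D, 0xDE00, while 0x41 passes through unchanged.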
Comment 1 Cameron McCormack 2012-06-28 02:38:36 UTC
I think the reason I didn't include this reverse algorithm is that there's only one correct way of converting a Unicode string into UTF-16 code units (whereas going the other way you need to deal with illegal UTF-16 sequences, so there were some different approaches we could have taken). So you could probably just write a single line in your spec saying, for example:

  Let s be the DOMString that represents the sequence of code units resulting
  from encoding the sequence of Unicode characters t as UTF-16.

I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But if you think the above is not precise enough I guess I can add your suggested text.
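
As a rough illustration of that asymmetry, here is a Python sketch that leans on the standard UTF-16 codec instead of re-specifying the algorithm. The helper names `utf16_code_units` and `decode_code_units` are hypothetical. Encoding has a single outcome, while decoding an unpaired surrogate forces a choice of error policy, shown here using Python's own `errors` modes, which are not necessarily the options WebIDL considered.

```python
import struct


def utf16_code_units(text):
    """Encode text as UTF-16 and return the code units as integers."""
    data = text.encode("utf-16-be")  # big-endian, no byte order mark
    return list(struct.unpack(f">{len(data) // 2}H", data))


def decode_code_units(units, errors="strict"):
    """Decode UTF-16 code units back to text under the given error policy."""
    data = struct.pack(f">{len(units)}H", *units)
    return data.decode("utf-16-be", errors=errors)


# Encoding: only one correct result.
units = utf16_code_units("A\U0001F600")

# Decoding: a lone high surrogate is illegal, so the policy matters.
lone = [0xD800, 0x41]
try:
    decode_code_units(lone)  # strict mode rejects the sequence
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
replaced = decode_code_units(lone, errors="replace")  # substitutes U+FFFD
```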
Comment 2 Joshua Bell 2012-06-28 15:53:15 UTC
(In reply to comment #1)
> I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But
> if you think the above is not precise enough I guess I can add your suggested
> text.

You're right, that should be fine. If anyone complains about not having it detailed it shouldn't be problematic to add it later since, as you point out, there's only one way to do it.
Comment 3 Cameron McCormack 2012-06-28 22:12:18 UTC
Thanks.