<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>17620</bug_id>
          
          <creation_ts>2012-06-27 20:31:26 +0000</creation_ts>
          <short_desc>Add steps to convert a sequence of Unicode characters to a DOMString</short_desc>
          <delta_ts>2012-06-28 22:12:18 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WebAppsWG</product>
          <component>WebIDL</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>WONTFIX</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>enhancement</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Joshua Bell">jsbell</reporter>
          <assigned_to name="Cameron McCormack">cam</assigned_to>
          <cc>mike</cc>
    
    <cc>public-script-coord</cc>
          
          <qa_contact>public-webapps-bugzilla</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>69460</commentid>
    <comment_count>0</comment_count>
    <who name="Joshua Bell">jsbell</who>
    <bug_when>2012-06-27 20:31:26 +0000</bug_when>
    <thetext>The Web API proposed in http://wiki.whatwg.org/wiki/StringEncoding requires interpretation of DOMString code units as an encoding of Unicode characters for the purpose of encoding and decoding DOMStrings to other binary encodings.

WebIDL defines &quot;steps to convert a DOMString to a sequence of Unicode characters&quot; at http://dev.w3.org/2006/webapi/WebIDL/#dfn-obtain-unicode

The proposed API requires the reverse as well, and defines &quot;steps to convert a sequence of Unicode characters to a DOMString&quot; at http://wiki.whatwg.org/wiki/StringEncoding#Steps_to_convert_a_sequence_of_Unicode_characters_to_a_DOMString

Would it be possible to add the latter to the WebIDL specification so that both directions are defined in one place?

The proposed text could be (sans-formatting, using _ for subscript and ^ for superscript):

The following algorithm defines a way to convert a sequence of Unicode characters to a DOMString:
1. Let U_0...n-1 be the sequence of Unicode characters
2. Initialize i to 0
3. Initialize S to be an empty sequence of code units
4. While i &lt; n
    1. Let c be the code point of the Unicode character in U at index i
    2. If c ≥ 2^16, then:
        1. Append to S a code unit equal to (c - 2^16) / 2^10 + 0xD800, where &quot;/&quot; represents integer division.
        2. Append to S a code unit equal to (c - 2^16) % 2^10 + 0xDC00, where &quot;%&quot; represents the remainder of an integer division.
    3. Otherwise, append to S a code unit equal to c.
    4. Set i to i+1
5. Return the IDL DOMString value that represents the sequence of code units S.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>69468</commentid>
    <comment_count>1</comment_count>
    <who name="Cameron McCormack">cam</who>
    <bug_when>2012-06-28 02:38:36 +0000</bug_when>
    <thetext>I think the reason I didn&apos;t include this reverse algorithm is that there&apos;s only one correct way of converting a Unicode string into UTF-16 code units. Going the other way, you need to deal with illegal UTF-16 sequences, so there were some different approaches we could have taken. So you could probably just write a single line in your spec saying, for example:

  Let s be the DOMString that represents the sequence of code units resulting
  from encoding the sequence of Unicode characters t as UTF-16.
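
As a rough illustration of why a single line like that should be unambiguous, here is a minimal Python sketch. It compares the stepwise conversion proposed in comment 0 against the built-in utf-16-be codec in Python. The function names and the sample string are hypothetical, chosen only for this illustration, and the sketch assumes the input contains no lone surrogate code points.

```python
# Hypothetical helpers for illustration only; they are not part of WebIDL
# or of the StringEncoding proposal.

def code_points_to_code_units(code_points):
    """Convert Unicode code points to a list of UTF-16 code units,
    following the stepwise algorithm proposed in comment 0."""
    units = []
    for c in code_points:
        if c >= 0x10000:
            # Supplementary character: emit a high and a low surrogate.
            offset = c - 0x10000
            units.append(0xD800 + offset // 0x400)
            units.append(0xDC00 + offset % 0x400)
        else:
            # BMP character: the code unit equals the code point.
            units.append(c)
    return units


def utf16_code_units(text):
    """Code units produced by the built-in UTF-16 (big-endian) codec."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Assumed sample: one ASCII, one Latin-1 and one supplementary character.
sample = "a\u00e9\U0001F600"
stepwise = code_points_to_code_units([ord(ch) for ch in sample])
assert stepwise == utf16_code_units(sample)
```

Both paths should yield the same code units for any sequence of Unicode scalar values, which is the sense in which there is only one correct conversion.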

I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But if you think the above is not precise enough I guess I can add your suggested text.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>69494</commentid>
    <comment_count>2</comment_count>
    <who name="Joshua Bell">jsbell</who>
    <bug_when>2012-06-28 15:53:15 +0000</bug_when>
    <thetext>(In reply to comment #1)
&gt; I guess I just want to avoid re-specifying the UTF-16 encoding algorithm.  But
&gt; if you think the above is not precise enough I guess I can add your suggested
&gt; text.

You&apos;re right, that should be fine. If anyone complains about the lack of detail, it shouldn&apos;t be a problem to add it later since, as you point out, there&apos;s only one way to do it.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>69515</commentid>
    <comment_count>3</comment_count>
    <who name="Cameron McCormack">cam</who>
    <bug_when>2012-06-28 22:12:18 +0000</bug_when>
    <thetext>Thanks.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>