<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>24198</bug_id>
          
          <creation_ts>2014-01-03 08:53:13 +0000</creation_ts>
          <short_desc>The Encoding Standard should use bitwise operations instead of multiplication, division and exponentiation when natural</short_desc>
          <delta_ts>2014-04-14 13:33:44 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>WHATWG</product>
          <component>Encoding</component>
          <version>unspecified</version>
          <rep_platform>All</rep_platform>
          <op_sys>All</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>FIXED</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>minor</bug_severity>
          <target_milestone>Unsorted</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Henri Sivonen">hsivonen</reporter>
          <assigned_to name="Anne">annevk</assigned_to>
          <cc>cowan</cc>
    
    <cc>duerst</cc>
    
    <cc>jsbell</cc>
    
    <cc>mike</cc>
    
    <cc>www-international</cc>
          
          <qa_contact>sideshowbarker+encodingspec</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>97957</commentid>
    <comment_count>0</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2014-01-03 08:53:13 +0000</bug_when>
    <thetext>The Encoding Standard systematically avoids specifying bitwise operations and this leads to contortions like multiplying by 64 raised to the nth power in the UTF-8 algorithms.

This only serves to obfuscate how the algorithms should actually be implemented. It would be more useful if the specification used bitwise operations in cases like this. Even if it might be argued that bitwise operations are optimizations that don&apos;t belong in a spec, UTF-8 in particular is designed to be implemented using bitwise operations, so the spec&apos;s style amounts to obfuscation and pessimization rather than implementation detail avoidance.
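
<![CDATA[To make the contrast concrete, here is a minimal sketch (not the spec's text) of accumulating a UTF-8 code point both ways. The function names are hypothetical, and the input is assumed to be a well-formed sequence, so no range or overlong checks are shown.

```python
def decode_multiply(lead_bits, continuation):
    """Accumulate a code point by multiplying by powers of 64."""
    n = len(continuation)
    code_point = lead_bits * 64 ** n
    for i, byte in enumerate(continuation):
        # Each continuation byte contributes (byte - 0x80) at its position.
        code_point += (byte - 0x80) * 64 ** (n - 1 - i)
    return code_point


def decode_bitwise(lead_bits, continuation):
    """Accumulate the same code point with shifts and masks."""
    code_point = lead_bits
    for byte in continuation:
        # Make room for six bits, then append the low six bits of the byte.
        code_point = (code_point << 6) | (byte & 0x3F)
    return code_point


# U+20AC EURO SIGN is encoded as the bytes E2 82 AC.
lead, tail = 0xE2, [0x82, 0xAC]
assert decode_multiply(lead - 0xE0, tail) == 0x20AC
assert decode_bitwise(lead & 0x0F, tail) == 0x20AC
```

Both functions give the same result for well-formed input; the second one shows directly which bits come from which byte.]]>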

For encodings other than UTF-8, this editorial style obfuscates whether an algorithm can be implemented entirely using ALU operations or whether multiplication or division instructions are actually needed, which is something that would be nice to see at a glance.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>97960</commentid>
    <comment_count>1</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2014-01-03 10:34:43 +0000</bug_when>
    <thetext>(In reply to Henri Sivonen from comment #0)
&gt; For encodings other than UTF-8, this editorial style obfuscates whether an
&gt; algorithm can be implemented entirely using ALU operations or whether
&gt; multiplication or division instructions are actually needed, which is
&gt; something that would be nice to see at a glance.

Looks like these tend to be multiplications by a constant, so the answer is &quot;yes&quot; and one might *hope* that a compiler takes care of it.

Anyway, since these operations are most likely things like &quot;take these bits out of this number and concatenate them to these other bits from this other number&quot;, talking about multiplications instead of masks and shifts obscures what&apos;s happening, which isn&apos;t nice.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>97988</commentid>
    <comment_count>2</comment_count>
    <who name="John Cowan">cowan</who>
    <bug_when>2014-01-04 00:41:32 +0000</bug_when>
    <thetext>It&apos;s important to remember that division is not the same thing as right shifting, particularly in C/C++, where the effect of right-shifting a negative number is undefined.  This matters because C/C++ is still the usual implementation language for browsers.  So it&apos;s safe to change multiplications to left shifts, but any divisions must be carefully scrutinized.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>98025</commentid>
    <comment_count>3</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-01-06 13:56:10 +0000</bug_when>
    <thetext>Henri, note that in particular encoders are very slow the way they are written now. Creating a dedicated lookup table would be much quicker. Do you think we should have that too?

(The Encoding standard actually does define left and right shifts, and logical OR and AND, and uses them. Just not for utf-8...)</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>98060</commentid>
    <comment_count>4</comment_count>
    <who name="Martin Dürst">duerst</who>
    <bug_when>2014-01-07 06:46:17 +0000</bug_when>
    <thetext>(In reply to Anne from comment #3)
&gt; Henri, note that in particular encoders are very slow the way they are
&gt; written now. Creating a dedicated lookup table would be much quicker. Do you
&gt; think we should have that too?

I&apos;m not Henri, but I don&apos;t think speed is important in the spec. But make it clear to the reader that other ways of implementation giving the same result may be (much) faster.

&gt; (The Encoding standard actually does define left and right shifts, and
&gt; logical OR and AND, and uses them. Just not for utf-8...)

Do you mean bit-wise OR and AND? If you actually have these operations well-defined already, and if there are no issues along the lines of those pointed out by John at #c2, then using these operations for UTF-8 would be the way to go. I can&apos;t imagine not doing UTF-8 with these operations if available.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>98061</commentid>
    <comment_count>5</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2014-01-07 06:51:23 +0000</bug_when>
    <thetext>(In reply to Anne from comment #3)
&gt; Henri, note that in particular encoders are very slow the way they are
&gt; written now. Creating a dedicated lookup table would be much quicker. Do you
&gt; think we should have that too?

I don&apos;t think we should have that sort of optimization in the spec.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103741</commentid>
    <comment_count>6</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-11 12:36:54 +0000</bug_when>
    <thetext>The terminology section covers these operations. However, it&apos;s not entirely clear to me how to convert the existing algorithms in a straightforward manner.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103804</commentid>
    <comment_count>7</comment_count>
    <who name="Henri Sivonen">hsivonen</who>
    <bug_when>2014-04-14 07:34:21 +0000</bug_when>
    <thetext>(In reply to Anne from comment #6)
&gt; The terminology section covers these operations. However, it&apos;s not entirely
&gt; clear to me how to convert the existing algorithms in a straightforward
&gt; manner.

What does that mean to implementors? It&apos;s pretty uncool if it isn&apos;t entirely clear to them, either.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103810</commentid>
    <comment_count>8</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-14 10:46:08 +0000</bug_when>
    <thetext>I do not disagree, but that does not bring me closer to being able to fix this.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>103816</commentid>
    <comment_count>9</comment_count>
    <who name="Anne">annevk</who>
    <bug_when>2014-04-14 13:33:44 +0000</bug_when>
    <thetext>https://github.com/whatwg/encoding/commit/cf0ad0e43d9870ca1890e8f4f85853f926c59e04

I have not removed multiplication, division, and modulo from the other algorithms: they help explain how the indexes work, and the constants involved are not powers of two, so converting them would not be straightforward (as I understand it from Simon, who helped me do this).
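
<![CDATA[A small sketch of why the index arithmetic does not map onto shifts and masks. The row length of 190 and the lead base of 0x81 are assumptions chosen for illustration, and the function names are hypothetical.

```python
ROW_LENGTH = 190  # assumed row length; deliberately not a power of two
LEAD_BASE = 0x81  # assumed value of the first lead byte


def to_pointer(lead, column):
    """Combine a lead byte and a column offset into an index pointer."""
    return (lead - LEAD_BASE) * ROW_LENGTH + column


def from_pointer(pointer):
    """Split an index pointer back into a lead byte and a column offset."""
    row, column = divmod(pointer, ROW_LENGTH)
    return row + LEAD_BASE, column


pointer = to_pointer(0x82, 5)
assert from_pointer(pointer) == (0x82, 5)

# With a power-of-two row length such as 64, the split is a shift and a mask.
assert divmod(pointer, 64) == (pointer >> 6, pointer & 0x3F)
```

With a row length of 64 the quotient and remainder are just the high and low bit fields of the pointer; with 190 no single shift and mask gives the same split, so the division and modulo stay visible.]]>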

Please reopen if you think this is not sufficient.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>