This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 24198 - The Encoding Standard should use bitwise operations instead of multiplication, division and exponentiation when natural
Summary: The Encoding Standard should use bitwise operations instead of multiplication...
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: Encoding (show other bugs)
Version: unspecified
Hardware: All All
: P2 minor
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+encodingspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-03 08:53 UTC by Henri Sivonen
Modified: 2014-04-14 13:33 UTC (History)
5 users (show)

See Also:


Attachments

Description Henri Sivonen 2014-01-03 08:53:13 UTC
The Encoding Standard systematically avoids specifying bitwise operations and this leads to contortions like multiplying by 64 raised to the nth power in the UTF-8 algorithms.

This only serves to obfuscate how the algorithms should actually be implemented. It would be more useful if the specification used bitwise operations in cases like this. Even if it might be argued that bitwise operations are optimizations that don't belong in a spec, UTF-8 in particular is designed to be implemented using bitwise operations, so the spec's style amounts to obfuscation and pessimization rather than implementation detail avoidance.

For encodings other than UTF-8, this editorial style obfuscates whether an algorithm can be implemented entirely using ALU operations or whether multiplication or division instructions are actually needed, which is something that would be nice to see at a glance.
Comment 1 Henri Sivonen 2014-01-03 10:34:43 UTC
(In reply to Henri Sivonen from comment #0)
> For encodings other than UTF-8, this editorial style obfuscates whether an
> algorithm can be implemented entirely using ALU operations or whether
> multiplication or division instructions are actually needed, which is
> something that would be nice to see at a glance.

Looks like these tend to be multiplications by a constant, so the answer is "yes" and one might *hope* a compiler to take care of it.

Anyway, since these operations are most likely things like "take these bits out of this number and concatenate them to these other bits from this other number", talking about multiplications instead of masks and shifts obscures what's happening, which isn't nice.
Comment 2 John Cowan 2014-01-04 00:41:32 UTC
It's important to remember that division is not the same thing as right shifting, particularly in C/C++, where the effect of right-shifting a negative number is undefined.  This matters because C/C++ is still the usual implementation language for browsers.  So it's safe to change multiplications to left shifts, but any divisions must be carefully scrutinized.
Comment 3 Anne 2014-01-06 13:56:10 UTC
Henri, note that in particular encoders are very slow the way they are written now. Creating a dedicated lookup table would be much quicker. Do you think we should have that too?

(The Encoding standard actually does define left and right shifts, and logical OR and AND, and uses them. Just not for utf-8...)
Comment 4 Martin Dürst 2014-01-07 06:46:17 UTC
(In reply to Anne from comment #3)
> Henri, note that in particular encoders are very slow the way they are
> written now. Creating a dedicated lookup table would be much quicker. Do you
> think we should have that too?

I'm not Henri, but I don't think speed is important in the spec. But make it clear to the reader that other ways of implementation giving the same result may be (much) faster.

> (The Encoding standard actually does define left and right shifts, and
> logical OR and AND, and uses them. Just not for utf-8...)

Do you mean bit-wise OR and AND? If you actually have these operations well-defined already, and if there are no issues along the lines of those pointed out by John at #c2, then using these operations for UTF-8 would be the way to go. I can't immagine not doing UTF-8 with these operations if available.
Comment 5 Henri Sivonen 2014-01-07 06:51:23 UTC
(In reply to Anne from comment #3)
> Henri, note that in particular encoders are very slow the way they are
> written now. Creating a dedicated lookup table would be much quicker. Do you
> think we should have that too?

I don't think we should have that sort of optimizations in the spec.
Comment 6 Anne 2014-04-11 12:36:54 UTC
The terminology section covers these operations. However, it's not entirely clear to me how to convert the existing algorithms in a straightforward manner.
Comment 7 Henri Sivonen 2014-04-14 07:34:21 UTC
(In reply to Anne from comment #6)
> The terminology section covers these operations. However, it's not entirely
> clear to me how to convert the existing algorithms in a straightforward
> manner.

What does that mean to implementors? It's pretty uncool if it isn't entirely clear to them, either.
Comment 8 Anne 2014-04-14 10:46:08 UTC
I do not disagree, but that does not bring me closer to being able to fix this.
Comment 9 Anne 2014-04-14 13:33:44 UTC
https://github.com/whatwg/encoding/commit/cf0ad0e43d9870ca1890e8f4f85853f926c59e04

I have not removed multiplication, division, and modulo from other algorithms as they help understand how the indexes work and they are not in powers of two so would not be straightforward as I understand it from Simon who helped me do this.

Please reopen if you think this is not sufficient.