24198 2014-01-03 08:53:13 +0000 The Encoding Standard should use bitwise operations instead of multiplication, division and exponentiation when natural 2014-04-14 13:33:44 +0000 1 1 1 Unclassified WHATWG Encoding unspecified All All RESOLVED FIXED P2 minor Unsorted 1 hsivonen annevk cowan duerst jsbell mike www-international sideshowbarker+encodingspec oldest_to_newest 97957 0 hsivonen 2014-01-03 08:53:13 +0000 The Encoding Standard systematically avoids specifying bitwise operations and this leads to contortions like multiplying by 64 raised to the nth power in the UTF-8 algorithms. This only serves to obfuscate how the algorithms should actually be implemented. It would be more useful if the specification used bitwise operations in cases like this. Even if it might be argued that bitwise operations are optimizations that don't belong in a spec, UTF-8 in particular is designed to be implemented using bitwise operations, so the spec's style amounts to obfuscation and pessimization rather than implementation detail avoidance. For encodings other than UTF-8, this editorial style obfuscates whether an algorithm can be implemented entirely using ALU operations or whether multiplication or division instructions are actually needed, which is something that would be nice to see at a glance. 97960 1 hsivonen 2014-01-03 10:34:43 +0000 (In reply to Henri Sivonen from comment #0) > For encodings other than UTF-8, this editorial style obfuscates whether an > algorithm can be implemented entirely using ALU operations or whether > multiplication or division instructions are actually needed, which is > something that would be nice to see at a glance. Looks like these tend to be multiplications by a constant, so the answer is "yes" and one might *hope* a compiler to take care of it. Anyway, since these operations are most likely things like "take these bits out of this number and concatenate them to these other bits from this other number", talking about multiplications instead of masks and shifts obscures what's happening, which isn't nice. 97988 2 cowan 2014-01-04 00:41:32 +0000 It's important to remember that division is not the same thing as right shifting, particularly in C/C++, where the effect of right-shifting a negative number is undefined. This matters because C/C++ is still the usual implementation language for browsers. So it's safe to change multiplications to left shifts, but any divisions must be carefully scrutinized. 98025 3 annevk 2014-01-06 13:56:10 +0000 Henri, note that in particular encoders are very slow the way they are written now. Creating a dedicated lookup table would be much quicker. Do you think we should have that too? (The Encoding standard actually does define left and right shifts, and logical OR and AND, and uses them. Just not for utf-8...) 98060 4 duerst 2014-01-07 06:46:17 +0000 (In reply to Anne from comment #3) > Henri, note that in particular encoders are very slow the way they are > written now. Creating a dedicated lookup table would be much quicker. Do you > think we should have that too? I'm not Henri, but I don't think speed is important in the spec. But make it clear to the reader that other ways of implementation giving the same result may be (much) faster. > (The Encoding standard actually does define left and right shifts, and > logical OR and AND, and uses them. Just not for utf-8...) Do you mean bit-wise OR and AND? If you actually have these operations well-defined already, and if there are no issues along the lines of those pointed out by John at #c2, then using these operations for UTF-8 would be the way to go. I can't immagine not doing UTF-8 with these operations if available. 98061 5 hsivonen 2014-01-07 06:51:23 +0000 (In reply to Anne from comment #3) > Henri, note that in particular encoders are very slow the way they are > written now. Creating a dedicated lookup table would be much quicker. Do you > think we should have that too? I don't think we should have that sort of optimizations in the spec. 103741 6 annevk 2014-04-11 12:36:54 +0000 The terminology section covers these operations. However, it's not entirely clear to me how to convert the existing algorithms in a straightforward manner. 103804 7 hsivonen 2014-04-14 07:34:21 +0000 (In reply to Anne from comment #6) > The terminology section covers these operations. However, it's not entirely > clear to me how to convert the existing algorithms in a straightforward > manner. What does that mean to implementors? It's pretty uncool if it isn't entirely clear to them, either. 103810 8 annevk 2014-04-14 10:46:08 +0000 I do not disagree, but that does not bring me closer to being able to fix this. 103816 9 annevk 2014-04-14 13:33:44 +0000 https://github.com/whatwg/encoding/commit/cf0ad0e43d9870ca1890e8f4f85853f926c59e04 I have not removed multiplication, division, and modulo from other algorithms as they help understand how the indexes work and they are not in powers of two so would not be straightforward as I understand it from Simon who helped me do this. Please reopen if you think this is not sufficient.