Bugzilla – Bug 16604
RFE: add unsigned byte as synonym for octet
Last modified: 2013-06-17 01:14:08 UTC
Would it be possible to add the type 'unsigned byte' as a synonym for 'octet'? Doing so would improve symmetry with the other signed and unsigned types in the spec (short/unsigned short, etc.).
Why do we have both byte and octet to begin with? This is confusing with many specifications that use byte as terminology and treat e.g. 0xFF as byte. I'd prefer renaming octet to byte and dropping the current byte.
From the standpoint of the typed array and WebGL specs it's essential to have both signed and unsigned byte concepts, so dropping support for signed bytes doesn't work for us.
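To illustrate why both concepts matter, here is a minimal sketch (in Python, purely for illustration) of the same 8-bit pattern read as an unsigned value (Web IDL's octet, as in a Uint8Array) and as a signed value (Web IDL's byte, as in an Int8Array). The helper names are hypothetical and are not taken from any spec.

```python
def as_unsigned_byte(raw: bytes) -> int:
    """Interpret a single byte as an unsigned 8-bit integer (0..255)."""
    return int.from_bytes(raw, byteorder="big", signed=False)


def as_signed_byte(raw: bytes) -> int:
    """Interpret a single byte as a two's-complement signed 8-bit integer (-128..127)."""
    return int.from_bytes(raw, byteorder="big", signed=True)


raw = bytes([0xFF])
print(as_unsigned_byte(raw))  # 255
print(as_signed_byte(raw))    # -1
```

The stored bits are identical in both cases; only the interpretation differs, which is why the two types cannot simply be merged.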
Since the other integer types in Web IDL are signed, and use the "unsigned" modifier for the unsigned variants, it seems most symmetric to do the same for byte.
"byte" should (and indeed does, in almost all contexts) mean 0..255, IMHO.
I recommend "short int" and "unsigned short int" if you need symmetric names, with "byte" as a synonym for "unsigned short int". Or alternatively, "tinyint" and "unsigned tinyint".
short and unsigned short already exist in Web IDL and map to C's int16_t / uint16_t. Those types are also needed for the typed array spec and likely elsewhere.
"tiny" / "tiny int" and unsigned variants could work. Not sure about the potential for namespace collisions with existing code.
Changing byte to be an unsigned type has downsides. It requires updating all existing HTML5 specs which refer to that type, and would imply introducing a "signed byte" type which is again asymmetric with how the other integer types in Web IDL behave.
I would still prefer "byte" and "unsigned byte", but would also prefer the pair of types "signed byte" and "byte" over introducing a new "tiny" concept.
How many specifications are we talking about here? I cannot find usage of this within HTML itself, for instance. We could also name the type "short short": just as we have "long long" for longer than long, "short short" could be shorter than short.
I still think that if we go down this route we should just be allowed to define custom ranges (e.g., percent [0-100], angle [0-360], richter [0-10.0], etc.)
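As a rough sketch of what such a custom range could look like (this is not Web IDL syntax, and RangedType with its check method is a hypothetical name invented for this example), a ranged type is just a name plus inclusive bounds and a validation step:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangedType:
    """Hypothetical named numeric type with inclusive bounds."""
    name: str
    minimum: float
    maximum: float

    def check(self, value: float) -> float:
        """Return value unchanged if it is in range, otherwise raise ValueError."""
        if not (self.minimum <= value <= self.maximum):
            raise ValueError(
                f"{value} is outside {self.name} [{self.minimum}, {self.maximum}]"
            )
        return value


# Ranges taken from the examples in the comment above, plus octet for comparison.
percent = RangedType("percent", 0, 100)
angle = RangedType("angle", 0, 360)
octet = RangedType("octet", 0, 255)
```

Under this view octet would be one more instance of a range, although that ignores the storage-size aspect raised elsewhere in this thread.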
Nice context for byte vs octet - http://www.tcpipguide.com/free/t_BinaryInformationandRepresentationBitsBytesNibbles-3.htm
Careful with "short int" and "unsigned short int": I believe octet generally maps to C/C++ signed/unsigned char, not to signed/unsigned int (though with basic data type lengths you can never be sure).
Ultimately, even though you might confuse byte with short with octet, you will rarely confuse octet with something that is not 8 bits long.
Also, with octet, it's not about range, it's about storage size, although if an octet is unsigned (which I think it has to be), it should go from 0 to 255.
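The link between storage size and range can be sketched as follows, assuming two's-complement representation for the signed case. The function name integer_range is made up for this example.

```python
from typing import Tuple


def integer_range(bits: int, signed: bool) -> Tuple[int, int]:
    """Return the inclusive (min, max) range of an integer stored in `bits` bits."""
    if signed:
        # Two's complement: one bit pattern more on the negative side.
        return (-(1 << (bits - 1)), (1 << (bits - 1)) - 1)
    return (0, (1 << bits) - 1)


print(integer_range(8, signed=False))   # (0, 255)        unsigned 8-bit
print(integer_range(8, signed=True))    # (-128, 127)     signed 8-bit
print(integer_range(16, signed=False))  # (0, 65535)      unsigned 16-bit
print(integer_range(16, signed=True))   # (-32768, 32767) signed 16-bit
```

Fixing the size at 8 bits leaves only the signedness open, which is the question the naming discussion is really about.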
Bits, bytes, words and longs, how many were going to St Ives?
octet came from OMG IDL, where it already meant an unsigned 8-bit integer type. When we needed to introduce a signed type, because OMG IDL didn't have one, I used byte, since Java's byte type is signed, rather than introduce "signed octet".
I agree the current names are a bit sucky, though. But I'm not really in favour of introducing synonyms. Seeing different names for the same concept in different specs, depending on the style preferences of the author, will be confusing.
I think there are sufficiently few specs using byte that updating them if we decide to change the names will be easy enough. (It might even just be WebGL and the Typed Arrays spec.)
If people think the status quo is unacceptable, then I think my next preference would be to have "byte" and "signed byte", or "octet" and "signed octet". The former looks and sounds nicer, but the latter more strongly feels like something unsigned to me. (And true enough, technically a byte doesn't necessarily imply 8 bits, but I don't think that's a real enough concern.)
byte and signed byte works. E.g. http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html and HTML and many other specifications always talk about bytes in the unsigned sense. I think it would make sense if we did that consistently throughout the platform standards.
I don't think it's worth changing at this point.