This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
Currently the spec says: 'Authors must use the utf-8 encoding and must use the "utf-8" label to identify it.'

Given the label matching is done case-insensitively, it is not entirely clear whether authors must use this label case-sensitively or not. This should be clarified, preferably to allow either case (there is no practical benefit of requiring it to be lowercased).

We should also make the "utf8" label conforming. Making this non-conforming is of no practical benefit and makes a large number of documents non-conforming.
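To illustrate why the case question comes up, here is a minimal sketch of case-insensitive label matching, assuming that lookup strips ASCII whitespace and folds only ASCII letters. The `LABELS` table is a deliberately tiny, hypothetical subset and `get_encoding` is an illustrative name, not an API of any particular library.

```python
# Hypothetical, partial label table for illustration only.
LABELS = {"utf-8": "UTF-8", "utf8": "UTF-8"}

ASCII_WHITESPACE = "\t\n\f\r "


def ascii_lower(s):
    # Fold only A-Z, so non-ASCII characters (e.g. U+212A KELVIN SIGN)
    # are never mapped onto ASCII letters.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def get_encoding(label):
    # Strip surrounding ASCII whitespace, fold ASCII case, then look up.
    key = ascii_lower(label.strip(ASCII_WHITESPACE))
    return LABELS.get(key)  # None when the label is unknown


print(get_encoding("UTF-8"))
print(get_encoding(" utf8 "))
print(get_encoding("\u212Aoi8"))
```

Under these assumptions "UTF-8", "utf-8" and "Utf-8" all resolve to the same encoding, so a consumer cannot tell them apart. Whether an author is allowed to write each of them is a separate conformance question, and that is the one raised here.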
(In reply to Geoffrey Sneddon from comment #0)
> Currently the spec says: 'Authors must use the utf-8 encoding and must use
> the "utf-8" label to identify it.'
>
> Given the label matching is done case-insensitively, it is not entirely
> clear whether authors must use this label case-sensitively or not. This
> should be clarified, preferably to allow either case (there is no practical
> benefit of requiring it to be lowercased).

Agreed.

> We should also make the "utf8" label conforming. Making this non-conforming
> is of no practical benefit and makes a large number of documents
> non-conforming.

This looks innocuous at first. However, in some products (in particular Oracle Databases), the label "utf8" is used for a variant of UTF-8 where characters outside the BMP are encoded with two surrogates, with a total of 6 bytes. For security reasons, this is prohibited in UTF-8.
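The difference between the two byte forms can be sketched as follows. This is an illustration of the surrogate-based variant described above (often called CESU-8), not code from any database product; `cesu8_encode` is a hypothetical helper name.

```python
def cesu8_encode(text):
    """Encode text so that each non-BMP character becomes two
    3-byte surrogate sequences (6 bytes) instead of 4 UTF-8 bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)    # high (lead) surrogate
            low = 0xDC00 | (v & 0x3FF)   # low (trail) surrogate
            for surrogate in (high, low):
                # "surrogatepass" lets Python emit the 3-byte form
                # that strict UTF-8 forbids.
                out += chr(surrogate).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8")
    return bytes(out)


ch = "\U0001F600"                    # a character outside the BMP
print(ch.encode("utf-8").hex())      # 4 bytes in UTF-8
print(cesu8_encode(ch).hex())        # 6 bytes in the surrogate variant

try:
    cesu8_encode(ch).decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict UTF-8 decoder rejects it:", exc.reason)
```

A strict UTF-8 decoder, such as Python's, refuses the 6-byte form, which is why data labelled "utf8" by a producer using this variant may not round-trip through a conforming UTF-8 consumer.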
Yeah, only utf-8 was intentional. Clarified the case stuff. https://github.com/whatwg/encoding/commit/61af3cdf199b4ab86babd47b7d48bb328c54a702