Bug 19931 - Should not prefer byte order mark with UTF-8
Status: RESOLVED DUPLICATE of bug 13392
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML/XHTML Compat. Authoring Guide (ed: Eliot Graff)
Version: unspecified
Hardware: All All
Importance: P2 normal
Assigned To: Eliot Graff
QA Contact: HTML WG Bugzilla archive list
Keywords: externalComments, NE
Reported: 2012-11-10 15:47 UTC by bugz.ate.my.horse
Modified: 2012-11-10 18:49 UTC (History)
Description bugz.ate.my.horse 2012-11-10 15:47:02 UTC
In the section "Specifying a Document's Character Encoding", it is stated that polyglot markup uses UTF-8. It then says that the preferred way to indicate this encoding is with a Byte Order Mark (BOM).

I feel this is not advisable, because: UTF-8 does not require a BOM [3]; a BOM could cause problems with applications (apparently MSIE has or had a problem) and with programming languages (apparently including Java [4][5]); and it causes otherwise valid ASCII to stop being ASCII.
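To illustrate the last point, here is a minimal Python sketch. The sample markup and the helper name is_ascii are made up for illustration and are not taken from the guide. It shows that prepending the UTF-8 BOM (the bytes EF BB BF) to a pure-ASCII document makes it fail ASCII decoding, and that a plain UTF-8 decoder leaves the BOM in the text as U+FEFF unless a BOM-aware codec is used.

```python
import codecs

# Hypothetical pure-ASCII document, used only for illustration.
ascii_doc = b"<!DOCTYPE html><title>t</title>"

# The same document with the UTF-8 byte order mark prepended.
with_bom = codecs.BOM_UTF8 + ascii_doc


def is_ascii(data: bytes) -> bool:
    """Return True if every byte in data decodes as ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


# A plain UTF-8 decode keeps the BOM as a leading U+FEFF character.
plain_decoded = with_bom.decode("utf-8")

# The BOM-aware "utf-8-sig" codec strips it.
sig_decoded = with_bom.decode("utf-8-sig")

print(is_ascii(ascii_doc), is_ascii(with_bom))
print(repr(plain_decoded[0]), sig_decoded == ascii_doc.decode("ascii"))
```

Tools that only expect ASCII, or that decode as plain UTF-8 without stripping the mark, would see the extra bytes or the extra U+FEFF character at the start of the file.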

As such, I would swap the preferred method for indicating UTF-8 inside the document and add a note about using the BOM.

* By using <meta charset="UTF-8"/> (the HTML encoding declaration) (preferred).
* By using the Byte Order Mark (BOM) character (could cause problems in some situations).
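As a rough sketch of how an author could check which of the two indications a file uses, the Python below looks for a leading BOM and for a meta charset declaration near the start of the document. The names encoding_indicators and strip_bom are hypothetical, the regular expression is a simplification and not a real HTML parser, and the 1024-byte window is an assumption based on where HTML expects the encoding declaration to appear.

```python
import codecs
import re

# Simplified pattern for <meta charset="UTF-8"/>; not a full HTML parser.
META_CHARSET = re.compile(
    rb'''<meta\s+charset\s*=\s*["']?utf-8["']?\s*/?>''',
    re.IGNORECASE,
)


def encoding_indicators(document: bytes) -> dict:
    """Report whether the document starts with a UTF-8 BOM and whether
    a meta charset declaration appears in its first 1024 bytes."""
    has_bom = document.startswith(codecs.BOM_UTF8)
    has_meta = META_CHARSET.search(document[:1024]) is not None
    return {"bom": has_bom, "meta": has_meta}


def strip_bom(document: bytes) -> bytes:
    """Return the document without a leading UTF-8 BOM, if one is present."""
    if document.startswith(codecs.BOM_UTF8):
        return document[len(codecs.BOM_UTF8):]
    return document


# Hypothetical sample document using only the meta declaration.
doc = b'<!DOCTYPE html><html><head><meta charset="UTF-8"/></head></html>'
print(encoding_indicators(doc))
print(encoding_indicators(codecs.BOM_UTF8 + doc))
```

A file that reports only the meta declaration follows the ordering proposed above; strip_bom could be used to remove the mark from files that carry both.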


References: 
[1] https://en.wikipedia.org/wiki/Byte_order_mark#UTF-8
[2] https://en.wikipedia.org/wiki/UTF-8#Byte_order_mark
[3] http://www.unicode.org/faq/utf_bom.html#bom5
[4] http://bugs.sun.com/view_bug.do?bug_id=6378911
[5] http://bugs.sun.com/view_bug.do?bug_id=4508058
Comment 1 Leif Halvard Silli 2012-11-10 18:49:14 UTC
We are waiting for the editor to take action on bug 13392

*** This bug has been marked as a duplicate of bug 13392 ***