This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The BOM is rarely used, and unfamiliar to many users. Making it the 'preferred' way to indicate UTF-8 Character Encoding in polyglot is unhelpful and potentially off-putting.

The XML Core WG requests the relevant paragraph in section 3, Specifying a Document's Character Encoding, be changed to read as follows:

Polyglot markup uses the UTF-8 character encoding, the only character encoding for which both HTML and XML require support. HTML requires UTF-8 to be explicitly declared to avoid fallback to a legacy encoding [HTML5]. For XML, UTF-8 is an encoding default. As such, character encoding may be left undeclared in XML with the result that UTF-8 is still supported [XML10].

Polyglot markup declares the UTF-8 character encoding in the following ways, which may be used separately or in combination:

* Within the document
  . By using <meta charset="UTF-8"/> (the HTML encoding declaration) -- preferred
  . By using the Byte Order Mark (BOM) character.
* Outside the document
  . . .

Submitted on behalf of the XML Core WG
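For illustration only, the two in-document options in the proposed text could be sketched in Python roughly as follows. The function name `serialize_polyglot` and the sample markup are hypothetical and are not taken from the polyglot spec; the sketch only shows that the same UTF-8 bytes can carry a meta declaration, a BOM, or both.

```python
import codecs

# Hypothetical minimal document; the markup is illustrative only.
TEMPLATE = (
    '<!DOCTYPE html>\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">\n'
    '<head>{meta}<title>Example</title></head>\n'
    '<body><p>caf\u00e9</p></body>\n'
    '</html>\n'
)


def serialize_polyglot(use_meta=True, use_bom=False):
    """Return UTF-8 bytes that declare the encoding via meta, BOM, or both."""
    meta = '<meta charset="UTF-8"/>' if use_meta else ''
    body = TEMPLATE.format(meta=meta).encode('utf-8')
    # codecs.BOM_UTF8 is the three bytes EF BB BF (U+FEFF encoded as UTF-8).
    return (codecs.BOM_UTF8 if use_bom else b'') + body
```

Calling `serialize_polyglot(use_meta=True, use_bom=True)` gives the "in combination" case mentioned in the proposed wording.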
(In reply to comment #0)

OBJECTION: I disagree with the XML Core WG's justification for the proposed change. But if the following arguments do not convince you, then I could live with *not* declaring any particular method as "preferred".

ARGUMENTS: The motivation for declaring the BOM as the preferred method is polyglotness. Has the XML Core WG considered the arguments about how the BOM makes HTML *more* polyglot? In particular, have you considered the following 3 points?

1) The BOM makes it possible to skip an HTML-specific element.
2) The BOM takes effect in both HTML and XML.
3) The BOM makes the encoding handling of HTML and XML more equal, as it leads HTML parsers to behave more like XML parsers.

Explanation of point 3):

#XML: Because it would trigger a fatal error, XML parsers do not permit users to accidentally or manually override the encoding of a polyglot XML file, regardless of how the encoding is signalled. Hence, in an XML parser, such a file is encoding safe in the sense that manual or accidental overriding (of the UTF-8 encoding) is impossible.

#HTML: HTML always allows encoding overriding. Except when there is a BOM: "the byte order mark (also known as BOM) is considered more authoritative than anything else." <http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#decode-and-encode> Meaning overriding is impossible. (Already implemented in IE, Chrome, Webkit.)
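To make point 3) concrete, here is a toy Python model of the precedence described in that quote. It is a sketch under stated assumptions, not the actual algorithm of any browser or of the Encoding spec: the function name `choose_html_encoding`, the simplified meta regex, the 1024-byte scan window, and the `windows-1252` fallback are all assumptions chosen for illustration.

```python
import codecs
import re

# Simplified stand-in for a parser's meta prescan (assumption, not the real rules).
META_RE = re.compile(rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_-]+)', re.IGNORECASE)


def choose_html_encoding(data, user_override=None, fallback='windows-1252'):
    """Pick an encoding for HTML bytes; a UTF-8 BOM beats every other signal."""
    if data.startswith(codecs.BOM_UTF8):
        # The BOM is treated as authoritative, so a user override is ignored.
        return 'utf-8'
    if user_override:
        # Without a BOM, a manual override wins over the in-document declaration.
        return user_override.lower()
    match = META_RE.search(data[:1024])
    if match:
        return match.group(1).decode('ascii').lower()
    # No signal at all: fall back to a legacy encoding (assumed default).
    return fallback
```

In this model, a document that starts with a BOM stays UTF-8 even when `user_override='ISO-8859-1'` is passed, whereas the same document with only `<meta charset="UTF-8"/>` would be reinterpreted as ISO-8859-1.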
Bug 13392 discusses the same issue. Thus I am marking this bug as a duplicate. The arguments put forward here have already more or less been put forward there, including the idea of simply removing the word "preferred" [1]:

]] I suggest removing " (preferred)" to avoid a long debate on whether to endorse Leif's preference or the i18n Core WG's preference. [[

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13392#c3

*** This bug has been marked as a duplicate of bug 13392 ***