16908 – BOM should not be recommended

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 16908 - BOM should not be recommended

Status:	RESOLVED DUPLICATE of bug 13392

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Eliot Graff
QA Contact:	HTML WG Bugzilla archive list

Reported:	2012-05-02 14:15 UTC by Henry S. Thompson
Modified:	2012-05-03 19:00 UTC
CC List:	7 users

Description Henry S. Thompson 2012-05-02 14:15:27 UTC

The BOM is rarely used and is unfamiliar to many users. Making it the 'preferred' way to indicate the UTF-8 character encoding in polyglot markup is unhelpful and potentially off-putting.

The XML Core WG requests the relevant paragraph in section 3, Specifying a Document's Character Encoding, be changed to read as follows:

   Polyglot markup uses the UTF-8 character encoding, the only character
   encoding for which both HTML and XML require support. HTML requires
   UTF-8 to be explicitly declared to avoid fallback to a legacy encoding
   [HTML5]. For XML, UTF-8 is an encoding default. As such, character
   encoding may be left undeclared in XML with the result that UTF-8 is
   still supported [XML10].

   Polyglot markup declares the UTF-8 character encoding in the following
   ways, which may be used separately or in combination:

   * Within the document
     . By using <meta charset="UTF-8"/> (the HTML encoding
        declaration) -- preferred
     . By using the Byte Order Mark (BOM) character.

   * Outside the document
     . . .

Submitted on behalf of the XML Core WG
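
For illustration only, the two in-document declarations named in the proposed text can be detected with a few lines of Python. This is a minimal sketch, not part of the XML Core WG's request: the helper name utf8_declarations and the regular expression are hypothetical, the pattern is a simplification rather than a real HTML parser, and the 1024-byte window is an assumption based on HTML's rule that the encoding declaration must appear near the start of the document.

```python
import codecs
import re

# Simplified pattern for <meta charset="UTF-8"/> (hypothetical; not a full HTML parser).
META_CHARSET_RE = re.compile(
    rb'<meta\s+charset\s*=\s*["\']?utf-8["\']?\s*/?>', re.IGNORECASE
)


def utf8_declarations(data: bytes) -> dict:
    """Report which in-document UTF-8 declarations are present in raw bytes."""
    return {
        # The UTF-8 BOM is the byte sequence EF BB BF at the very start.
        "bom": data.startswith(codecs.BOM_UTF8),
        # Assumption: only the first 1024 bytes are scanned for the declaration.
        "meta_charset": META_CHARSET_RE.search(data[:1024]) is not None,
    }


if __name__ == "__main__":
    doc = codecs.BOM_UTF8 + b'<html><head><meta charset="UTF-8"/></head></html>'
    print(utf8_declarations(doc))
```

Because the proposal allows the methods to be used "separately or in combination", the sketch reports each one independently instead of picking a winner.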

Comment 1 Leif Halvard Silli 2012-05-02 17:08:38 UTC

(In reply to comment #0)

    OBJECTION:

I disagree with the XML Core WG's justification for the proposed change. But if the following arguments do not convince you, then I could live with *not* declaring any particular method as "preferred".

    ARGUMENTS: 

The motivation for declaring the BOM as the preferred method is polyglotness.

Has the XML Core WG considered arguments about how the BOM makes HTML *more* polyglot? In particular, have you considered the following 3 points?

1) The BOM makes it possible to omit an HTML-specific element.
2) The BOM takes effect in both HTML and XML.
3) The BOM makes encoding handling in HTML and XML more alike,
    as it leads HTML parsers to behave more like XML parsers.

    Explanation of point 3):  

#XML: Because it would trigger a fatal error, XML parsers do not permit users to accidentally or manually override the encoding of a polyglot XML file, regardless of how the encoding is signalled. Hence, in an XML parser, such a file is encoding-safe in the sense that manual or accidental overriding (of the UTF-8 encoding) is impossible.

#HTML: HTML always allows the encoding to be overridden, except when there is a BOM: "the byte order mark (also known as BOM) is considered more authoritative than anything else."
<http://dvcs.w3.org/hg/encoding/raw-file/tip/Overview.html#decode-and-encode>
This means overriding is impossible when a BOM is present. (Already implemented in IE, Chrome, and WebKit.)
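
To make the precedence argument concrete, here is a rough Python sketch of the order described above: a BOM, if present, beats a manual override and any other declaration. It is only an illustration under stated assumptions, not the algorithm from the Encoding draft. The function name effective_html_encoding, its parameters, and the windows-1252 fallback are all hypothetical, and only the UTF-8 BOM is handled.

```python
import codecs
from typing import Optional


def effective_html_encoding(
    data: bytes,
    user_override: Optional[str] = None,   # e.g. an encoding picked from a browser menu
    declared: Optional[str] = None,        # e.g. from HTTP or <meta charset>
    fallback: str = "windows-1252",        # assumption: a legacy, locale-dependent default
) -> str:
    """Pick the encoding an HTML consumer would use, in simplified form."""
    # A UTF-8 BOM is treated as more authoritative than anything else.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # Without a BOM, a manual override wins over what the document declares.
    if user_override:
        return user_override
    if declared:
        return declared
    # Nothing declared: fall back to a legacy encoding.
    return fallback


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + b"<p>text</p>"
    print(effective_html_encoding(with_bom, user_override="iso-8859-1"))
    print(effective_html_encoding(b"<p>text</p>", user_override="iso-8859-1", declared="utf-8"))
```

In this sketch the first call still yields UTF-8 despite the override, while the second call, which has no BOM, is overridden. That difference is the behaviour point 3 relies on.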

Comment 2 Leif Halvard Silli 2012-05-03 19:00:29 UTC

Bug 13392 discusses the same issue, so I am marking this bug as a duplicate. The arguments put forward here have more or less already been made there, including the idea of simply removing the word "preferred": [1]

]]  I suggest removing " (preferred)" to avoid a long debate on whether
     to endorse Leif's preference or the i18n Core WG's preference. [[

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=13392#c3

*** This bug has been marked as a duplicate of bug 13392 ***