This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 2519 - Byte-order-mark is compulsory in XML for UTF-16
Summary: Byte-order-mark is compulsory in XML for UTF-16
Status: CLOSED FIXED
Alias: None
Product: XPath / XQuery / XSLT
Classification: Unclassified
Component: Serialization 1.0
Version: Candidate Recommendation
Hardware: PC Linux
Importance: P2 normal
Target Milestone: ---
Assignee: Scott Boag
QA Contact: Mailing list for public feedback on specs from XSL and XML Query WGs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-11-13 09:30 UTC by Colin Adams
Modified: 2006-02-17 20:14 UTC
CC List: 0 users

See Also:


Attachments

Description Colin Adams 2005-11-13 09:30:26 UTC
The wording of the serialization parameter byte-order-mark allows (and indeed
requires) an implementation to omit the byte order mark when byte-order-mark="no"
is specified.
This is invalid behaviour when encoding="UTF-16" and method="xml" or
method="xhtml", as XML requires a byte order mark for UTF-16.
Accordingly, I think that sections 5.1.11 and 6.1.11 should state some specific
rule for when encoding="UTF-16" (either to ignore the byte-order-mark parameter,
or to make it a serialization error).
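To make the two cases concrete, here is a minimal Python sketch of the bytes produced for UTF-16 output with byte-order-mark="yes" versus byte-order-mark="no". The helper `serialize_utf16` is hypothetical (it is not part of any serializer API), and it assumes big-endian output for simplicity.

```python
import codecs


def serialize_utf16(markup: str, byte_order_mark: bool) -> bytes:
    """Hypothetical helper: encode already-serialized markup as big-endian UTF-16.

    byte_order_mark=True mimics byte-order-mark="yes" (FE FF is written first);
    byte_order_mark=False mimics byte-order-mark="no" (no BOM at all).
    """
    body = markup.encode("utf-16-be")  # the utf-16-be codec never writes a BOM itself
    return codecs.BOM_UTF16_BE + body if byte_order_mark else body


with_bom = serialize_utf16("<a/>", byte_order_mark=True)
without_bom = serialize_utf16("<a/>", byte_order_mark=False)
print(with_bom.hex(" "))     # fe ff 00 3c 00 61 00 2f 00 3e
print(without_bom.hex(" "))  # 00 3c 00 61 00 2f 00 3e
```

The second byte sequence is the case the report is about: output labelled UTF-16 that carries no BOM.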
Comment 1 Michael Rys 2005-11-14 07:12:47 UTC
If the environment into which you serialize the data does guarantee the UTF-16 
encoding (such as specific string types in some programming languages and 
databases), there is no reason to require the BOM. It actually could make 
further processing more complex. Also note that the XML parsers allow for the 
encoding to be provided through external means.
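The point about external encoding information can be sketched as a receiver that honours a BOM when one is present and otherwise relies on a byte order supplied from outside the data (for example by the host environment or a transport-level charset label). `decode_utf16` and its `external_encoding` parameter are hypothetical names used only for illustration.

```python
import codecs
from typing import Optional


def decode_utf16(data: bytes, external_encoding: Optional[str] = None) -> str:
    """Hypothetical receiver for UTF-16 XML text.

    A leading BOM decides the byte order; otherwise the caller must supply the
    byte order out of band, e.g. "utf-16-be" or "utf-16-le".
    """
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[len(codecs.BOM_UTF16_BE):].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[len(codecs.BOM_UTF16_LE):].decode("utf-16-le")
    if external_encoding is None:
        raise ValueError("no byte order mark and no external encoding information")
    return data.decode(external_encoding)


print(decode_utf16(b"\xfe\xff\x00<\x00a\x00/\x00>"))       # <a/>
print(decode_utf16(b"<\x00a\x00/\x00>\x00", "utf-16-le"))  # <a/>
```

Raising an error rather than guessing makes explicit that, without a BOM, the byte order has to come from somewhere outside the bytes; this sketch does not attempt the first-bytes sniffing described in XML 1.0 Appendix F.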
Comment 2 Colin Adams 2005-11-14 07:22:53 UTC
While these two points are true, they do not negate the requirement in
section 4.3.3 of Extensible Markup Language (XML) 1.0 (Third Edition)
and Extensible Markup Language (XML) 1.1. The word MUST is used.
Comment 3 David Carlisle 2005-11-14 11:41:15 UTC
(In reply to comment #2)
> While these two points are true, they do not negate the requirement in
> section 4.3.3 of Extensible Markup Language (XML) 1.0 (Third Edition)
> and Extensible Markup Language (XML) 1.1. The word MUST is used.

This is true of complete documents, but XSLT also has a requirement
to support fragments, for example multiple top level elements. These fragments
are often combined by some post process and having the bom there may well be
inconvenient.
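The inconvenience described here is easy to reproduce: if each fragment is serialized separately with its own BOM, naive byte-level concatenation leaves a second BOM inside the text, where a decoder treats it as the character U+FEFF (ZERO WIDTH NO-BREAK SPACE) rather than as a byte order mark. A minimal Python sketch, assuming big-endian fragments; `fragment_with_bom` is a hypothetical helper.

```python
import codecs


def fragment_with_bom(markup: str) -> bytes:
    # Hypothetical: one fragment serialized on its own with byte-order-mark="yes".
    return codecs.BOM_UTF16_BE + markup.encode("utf-16-be")


combined = fragment_with_bom("<a/>") + fragment_with_bom("<b/>")

# Python's "utf-16" codec consumes only the BOM at the very start of the input;
# the second FE FF survives as a U+FEFF character between the two elements.
text = combined.decode("utf-16")
print(text == "<a/>\ufeff<b/>")  # True
```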

This is essentially the same issue as the XML declaration. An XML declaration is
similarly mandatory for an XML document if the encoding is not UTF-8/UTF-16, and
XSLT 1.0 mandated that it be added. This turned out to be too restrictive, and
XSLT 2.0 allows people to request that it be omitted, even though this may make the
result not well-formed, on the assumption that they know what they are doing...

Comment 4 Colin Adams 2005-11-14 12:16:42 UTC
But what you are describing is an external parsed general entity, and the
requirement for a BOM with encoding of UTF-16 is still mandatory for such entities.
If a BOM is inconvenient for a particular application, then you can serialize in
UTF-16BE or UTF-16LE, where the BOM is not compulsory (in fact it is forbidden
by the Unicode standard).
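The alternative suggested here can be checked against Python's standard codecs: the byte-order-specific "utf-16-be" and "utf-16-le" codecs write no BOM, and when decoding they do not strip a leading FE FF or FF FE either, so U+FEFF is kept as ordinary text. This only illustrates codec behaviour; it is not a statement about what any particular XML serializer does.

```python
import codecs

doc = "<a/>"

be = doc.encode("utf-16-be")  # no BOM: 00 3c 00 61 00 2f 00 3e
le = doc.encode("utf-16-le")  # no BOM: 3c 00 61 00 2f 00 3e 00

# A BOM prepended anyway is not consumed by the byte-order-specific codec;
# it decodes as a leading U+FEFF character.
with_stray_bom = (codecs.BOM_UTF16_BE + be).decode("utf-16-be")

print(be.hex(" "))
print(le.hex(" "))
print(with_stray_bom == "\ufeff" + doc)  # True
```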
Comment 5 Joanne Tong 2006-02-17 20:14:40 UTC
The XSL and XQuery working groups discussed this comment on Feb 1, 2006 and
decided that no technical change is required. This decision was taken in
order to support the use cases where different resulting XML fragments need to
be concatenated and the BOM would add additional complexity to that process.
A non-normative note will be added to the specification to explain why the
combination of UTF-16 and no BOM is allowed. This note would point out that
the serialization specification can output XML fragments that may not be
well-formed external general parsed entities.
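The concatenation use case behind this resolution can be sketched as follows: fragments are serialized with byte-order-mark="no", concatenated, and a single BOM is written once at the front of the assembled result when that result is meant to stand alone as UTF-16. `assemble_utf16` is a hypothetical post-processing step, not something the Serialization specification defines, and big-endian output is assumed.

```python
import codecs
from typing import Iterable


def assemble_utf16(fragments: Iterable[str], byte_order_mark: bool = True) -> bytes:
    """Hypothetical post-process: join BOM-less UTF-16BE fragments, then add at most one BOM."""
    body = b"".join(fragment.encode("utf-16-be") for fragment in fragments)
    return codecs.BOM_UTF16_BE + body if byte_order_mark else body


result = assemble_utf16(["<a/>", "<b/>"])
print(result.decode("utf-16"))  # <a/><b/>
```

Because the individual fragments carry no BOM, the joined text contains no stray U+FEFF characters, and the decision about a BOM is made once for the final output.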

I am marking this bug as CLOSED.  Please reopen this bug within one week if 
you feel this resolution is unacceptable.  

Thank you for raising the comment.

Joanne