XML 1.1 Second Edition Specification Errata


This document records all known errors in the Second Edition of the Extensible Markup Language (XML) 1.1 Specification; for updates see the latest version.

The errata are numbered, classified as Substantive or Editorial, and listed in reverse chronological order of their date of publication in each category. Changes to the text of the spec are indicated thus: deleted text, new text, modified text. Substantive corrections are proposed by the XML Core Working Group, which has consensus that they are appropriate; they are not to be considered normative until approved by a Call for Review of Proposed Corrections or a Call for Review of an Edited Recommendation.

Please email error reports to xml-editor@w3.org.

Substantive errata

Errata as of 2007-12-05


Section 4.3.3 Character Encoding in Entities

Add a new paragraph following the second paragraph, to read:

If the replacement text of an external entity is to begin with the character U+FEFF, and no text declaration is present, then a Byte Order Mark MUST be present, whether the entity is encoded in UTF-8 or UTF-16.

Provide unambiguous behavior by working around the inherent ambiguity of the BOM in Unicode.
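
The ambiguity this erratum addresses can be illustrated with a minimal Python sketch. The helper names `encode_entity` and `decode_entity` are hypothetical, and the sketch assumes an external entity with no text declaration, so the encoding is known only from the BOM or out of band. It is not a description of how any particular XML processor works.

```python
import codecs

def encode_entity(replacement_text: str, encoding: str = "utf-8") -> bytes:
    """Serialize entity text, always writing a Byte Order Mark first."""
    if encoding == "utf-8":
        # Python's plain "utf-8" codec writes no BOM, so add it explicitly.
        return codecs.BOM_UTF8 + replacement_text.encode("utf-8")
    if encoding == "utf-16":
        # Python's "utf-16" codec writes a BOM in native byte order.
        return replacement_text.encode("utf-16")
    raise ValueError("this sketch only handles utf-8 and utf-16")

def decode_entity(data: bytes, encoding: str = "utf-8") -> str:
    """Decode entity bytes, consuming at most one leading BOM."""
    codec = "utf-8-sig" if encoding == "utf-8" else "utf-16"
    return data.decode(codec)

text = "\ufeffabc"  # replacement text that begins with U+FEFF

# With a BOM present, the leading U+FEFF survives the round trip.
roundtrip_utf8 = decode_entity(encode_entity(text, "utf-8"), "utf-8")
roundtrip_utf16 = decode_entity(encode_entity(text, "utf-16"), "utf-16")

# Without a BOM, the first U+FEFF is itself taken for the BOM and lost.
lossy = decode_entity(text.encode("utf-8"), "utf-8")
```

In this sketch the BOM-less UTF-8 case silently drops the first character, which is the situation the added paragraph rules out.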

Errata as of 2007-08-15


Section 1.1 Origin and Goals

Amend the first paragraph after the list of goals, so that it reads:

This specification, together with associated standards (Unicode [Unicode] and ISO/IEC 10646 [ISO10646] for characters, Internet RFC 4646 [RFC1766] and the Language Subtag Registry [IANA-LANGCODES] for language identification tags, ISO 639 [ISO639] for language name codes, and ISO 3166 [ISO3166] for country name codes), provides all the information necessary to understand XML Version 1.1 and construct computer programs to process it.

Section A.1 Normative References

Change the [IETF RFC 3066] entry so that it points to IETF RFC 4646.

Section A.2 Other References

Change the [IANA-LANGCODES] entry so that it points to the new registry at http://www.iana.org/assignments/language-subtag-registry.

RFC 3066 has been replaced by RFC 4646. The old registry pointed to by the IANA-LANGCODES entry is now stale and closed. With the new registry, reference to ISO 639 and ISO 3166 is no longer necessary (and may even be harmful in the future, because of stability concerns).


Section 2.2 Characters

Amend the [#xFDD0-#xFDDF] range in the list of discouraged characters to read [#xFDD0-#xFDEF].

"#xFDDF" was a typo, as can be ascertained by consulting the Unicode standard in which the whole [#xFDD0-#xFDEF] range was introduced as a block in one version (3.1) to serve as "non-characters" for internal use.
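
As a small illustration of the corrected range, the following Python sketch tests whether a code point falls in the [#xFDD0-#xFDEF] block. The function name `in_fdd0_noncharacter_block` is hypothetical, and the sketch covers only this one range, not the full list of discouraged characters in Section 2.2.

```python
def in_fdd0_noncharacter_block(code_point: int) -> bool:
    """Return True for code points in the corrected range [#xFDD0-#xFDEF]."""
    return 0xFDD0 <= code_point <= 0xFDEF

# Number of code points in the corrected range.
block_size = sum(in_fdd0_noncharacter_block(cp) for cp in range(0x110000))
```

A code point such as #xFDE0 lies inside the corrected range but would have been missed by the mistyped upper bound #xFDDF.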

Editorial errata

Errata as of 2008-01-18


Section 4.1 Character and Entity References

Change the first sentence of the text of the Entity Declared VC as follows:

In a document with an external subset or parameter entity references, if the document is not standalone (either "standalone='no'" is specified or there is no standalone declaration), then the Name given in the entity reference MUST match that in an entity declaration.

The existing wording was ambiguous and did not explicitly address the case of an absent standalone declaration.

Errata as of 2007-12-05


Section C Expansion of Entity and Character References

Append the following at the end of the Appendix:

In the following example

<!DOCTYPE foo [
<!ENTITY x "&lt;">
]>
<foo attr="&x;"/>

the replacement text of x is the four characters "&lt;" because references to general entities in entity values are bypassed. The replacement text of lt is a character reference to the less-than character, for example the five characters "&#60;" (see 4.6 Predefined Entities). Since neither of these contains a less-than character the result is well-formed.

If the definition of x had been

<!ENTITY x "&#60;">

then the document would not have been well-formed, because the replacement text of x would be the single character "<" which is not permitted in attribute values (see WFC: No < in Attribute Values).

This is an editorial clarification of a case that remained confusing.
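
The two cases described above can be checked with a short Python sketch using the standard library's `xml.parsers.expat` module. Two assumptions apply: Expat implements XML 1.0 rather than 1.1, and the sketch relies on the rule being the same for this case; the documents omit an XML declaration for that reason. The helper name `attr_value_or_none` is hypothetical.

```python
import xml.parsers.expat

# Entity value uses the predefined entity reference "&lt;".
WELL_FORMED = b'<!DOCTYPE foo [\n<!ENTITY x "&lt;">\n]>\n<foo attr="&x;"/>'
# Entity value uses the character reference "&#60;" instead.
NOT_WELL_FORMED = b'<!DOCTYPE foo [\n<!ENTITY x "&#60;">\n]>\n<foo attr="&x;"/>'

def attr_value_or_none(document: bytes):
    """Return the value of 'attr' on the root element, or None if the
    parser reports a well-formedness error."""
    captured = {}

    def on_start(name, attrs):
        # attrs maps attribute names to their fully expanded values.
        captured.update(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    try:
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return None
    return captured.get("attr")

first = attr_value_or_none(WELL_FORMED)
second = attr_value_or_none(NOT_WELL_FORMED)
```

In the first document the attribute value ends up as a single less-than character, produced only after the reference to lt is expanded; in the second the parser rejects the document.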


Section 4.3.3 Character Encoding in Entities

Change the second sentence of the first paragraph to read:

The terms "UTF-8" and "UTF-16" in this specification do not apply to related character encodings, including but not limited to UTF-16BE, UTF-16LE, or CESU-8.

Section E Autodetection of Character Encodings

Change the last sentence of the first paragraph to read:

We will consider these cases in turn.

The former reading in 4.3.3 still caused some confusion, especially with respect to UTF-16BE and UTF-16LE.
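
One way to see why the labels are kept distinct is to compare the bytes produced under each of them. The following Python sketch uses Python's own codec names (`utf-16`, `utf-16-be`, `utf-16-le`) as stand-ins for the corresponding encoding labels; that mapping is an assumption of the sketch, not something stated in the specification.

```python
import codecs

sample = "a"

# Python's "utf-16" codec writes a BOM in the platform's native byte order.
utf16 = sample.encode("utf-16")
# The "utf-16-be" and "utf-16-le" codecs fix the byte order and write no BOM.
utf16_be = sample.encode("utf-16-be")
utf16_le = sample.encode("utf-16-le")

has_bom = utf16.startswith(codecs.BOM_UTF16)
```

The byte streams differ in whether a BOM is present, which is why text about "UTF-16" cannot simply be read as covering UTF-16BE or UTF-16LE.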


Section 5.2 Using XML Processors

Amend the last sentence of the second item of the bulleted list so that it reads:

For example, a non-validating processor may fail to normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities, or in the internal subset after an unread parameter entity reference.

Improve the informativeness of the example sentence.

Errata as of 2007-09-25


Section 4.1 Character and Entity References

Remove a duplicate "to" from the last paragraph of the description of the "Entity Declared" WFC, so that it reads:

Note that non-validating processors are not obligated to read and process entity declarations occurring in parameter entities or in the external subset; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.

This was a typo.


Section 5.2 Using XML Processors

In the last sentence of the last paragraph, remove a superfluous space just before a closing parenthesis.

This was a typo.

Last updated $Date: 2008/01/18 18:24:58 $ by $Author: jigsaw $