XML 1.1 Second Edition Specification Errata

Abstract

This document records all known errors in the Second Edition of the Extensible Markup Language (XML) 1.1 Specification; for updates see the latest version.

The errata are numbered, classified as Substantive or Editorial, and listed in reverse chronological order of their date of publication in each category. Changes to the text of the spec are indicated thus: deleted text, new text, modified text. Substantive corrections are proposed by the XML Core Working Group, which has consensus that they are appropriate; they are not to be considered normative until approved by a Call for Review of Proposed Corrections or a Call for Review of an Edited Recommendation.

Please email error reports to xml-editor@w3.org.

Substantive errata

E07

Section 4.3.3 Character Encoding in Entities: Add a new paragraph following the second paragraph, to read:

If the replacement text of an external entity is to begin with the character U+FEFF, and no text declaration is present, then a Byte Order Mark MUST be present, whether the entity is encoded in UTF-8 or UTF-16.
Rationale: Provide unambiguous behavior by working around the inherent ambiguity of the BOM in Unicode.
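
The ambiguity that E07 works around can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `serialize_entity` and `read_entity` are hypothetical helper names, the entity is assumed to be UTF-8 with no text declaration, and the reader is assumed to strip one leading Byte Order Mark, as Python's `utf-8-sig` codec does.

```python
import codecs

def serialize_entity(replacement_text: str) -> bytes:
    """Hypothetical writer: encode an external entity (no text declaration) as UTF-8.

    Following E07, an explicit BOM is emitted when the replacement text
    itself begins with U+FEFF.
    """
    encoded = replacement_text.encode("utf-8")
    if replacement_text.startswith("\ufeff"):
        return codecs.BOM_UTF8 + encoded
    return encoded

def read_entity(data: bytes) -> str:
    """Hypothetical reader: decode UTF-8, stripping at most one leading BOM."""
    return data.decode("utf-8-sig")

text = "\ufeffdata"

# With the explicit BOM, the leading U+FEFF of the replacement text survives.
round_tripped = read_entity(serialize_entity(text))

# Without it, the leading U+FEFF is mistaken for a BOM and silently dropped.
lossy = read_entity(text.encode("utf-8"))
```

Here `round_tripped` equals the original text, while `lossy` has lost its first character.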

E01

Section 1.1 Origin and Goals: Amend the first paragraph after the list of goals, so that it reads:

This specification, together with associated standards (Unicode [Unicode] and ISO/IEC 10646 [ISO10646] for characters, Internet RFC 4646 [RFC1766] and the Language Subtag Registry [IANA-LANGCODES] for language identification tags, ISO 639 [ISO639] for language name codes, and ISO 3166 [ISO3166] for country name codes), provides all the information necessary to understand XML Version 1.1 and construct computer programs to process it.
Section A.1 Normative References: Change the [IETF RFC 3066] entry so that it points to IETF RFC 4646.
Section A.2 Other References: Change the [IANA-LANGCODES] entry so that it points to the new registry at http://www.iana.org/assignments/language-subtag-registry.
Rationale: RFC 3066 has been replaced by RFC 4646. The old registry pointed to by the IANA-LANGCODES entry is now stale and closed. With the new registry, reference to ISO 639 and ISO 3166 is no longer necessary (and may even be harmful in the future, because of stability concerns).

E02

Section 2.2 Characters: Amend the [#xFDD0-#xFDDF] range in the list of discouraged characters to read [#xFDD0-#xFDEF].
Rationale: "#xFDDF" was a typo, as can be ascertained by consulting the Unicode standard in which the whole [#xFDD0-#xFDEF] range was introduced as a block in one version (3.1) to serve as "non-characters" for internal use.
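
The effect of the correction can be checked with a short Python sketch. The constant and function names are illustrative assumptions, not part of the specification.

```python
# Range as corrected by E02, and the range as originally (mis)printed.
CORRECTED_RANGE = range(0xFDD0, 0xFDEF + 1)
TYPO_RANGE = range(0xFDD0, 0xFDDF + 1)

def is_discouraged_fdd0_block(ch: str) -> bool:
    """Return True if ch falls in the corrected [#xFDD0-#xFDEF] range."""
    return ord(ch) in CORRECTED_RANGE

# Code points that the typo left out of the discouraged list.
missed_by_typo = [cp for cp in CORRECTED_RANGE if cp not in TYPO_RANGE]
```

The corrected range covers 32 code points; the misprinted one covered only the first 16, so U+FDE0 through U+FDEF were omitted.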

Editorial errata

E09

Section 4.1 Character and Entity References: Change the first sentence of the text of the Entity Declared VC as follows:

In a document with an external subset or parameter entity references, if the document is not standalone (either "standalone='no'" is specified or there is no standalone declaration), then the Name given in the entity reference MUST match that in an entity declaration.
Rationale: The existing wording was ambiguous and did not explicitly address the case of an absent standalone declaration.

E08

Section C Expansion of Entity and Character References

Append the following at the end of the Appendix:

In the following example

<!DOCTYPE foo [ 
<!ENTITY x "&lt;"> 
]> 
<foo attr="&x;"/>

the replacement text of x is the four characters "&lt;" because references to general entities in entity values are bypassed. The replacement text of lt is a character reference to the less-than character, for example the five characters "&#60;" (see 4.6 Predefined Entities). Since neither of these contains a less-than character the result is well-formed.

If the definition of x had been

<!ENTITY x "&#60;">

then the document would not have been well-formed, because the replacement text of x would be the single character "<" which is not permitted in attribute values (see WFC: No < in Attribute Values).

Rationale: This is an editorial clarification of a case that remained confusing.
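
The two cases in E08 can be tried out with Python's standard library parser. This is a sketch under an assumption: `xml.etree.ElementTree` is backed by Expat, which implements XML 1.0 rather than XML 1.1, but the entity expansion rules exercised here are expected to behave the same way. The helper names are illustrative.

```python
import xml.etree.ElementTree as ET

# Entity value contains a general entity reference, which is bypassed
# when the replacement text is built.
WELL_FORMED = '<!DOCTYPE foo [<!ENTITY x "&lt;">]><foo attr="&x;"/>'

# Entity value contains a character reference, which is expanded at once,
# so the replacement text is a literal "<".
NOT_WELL_FORMED = '<!DOCTYPE foo [<!ENTITY x "&#60;">]><foo attr="&x;"/>'

def attr_value(doc: str) -> str:
    """Parse doc and return the value of the root element's attr attribute."""
    return ET.fromstring(doc).get("attr")

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts doc, False on a parse error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False
```

Under these assumptions the first document parses and the attribute value is the single character "<", while the second is rejected as not well-formed.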

E06

Section 4.3.3 Character Encoding in Entities: Change the second sentence of the first paragraph to read:

The terms "UTF-8" and "UTF-16" in this specification do not apply to related character encodings, including but not limited to UTF-16BE, UTF-16LE, or CESU-8.
Section E Autodetection of Character Encodings: Change the last sentence of the first paragraph to read:

We will consider these cases in turn.
Rationale: The former reading in 4.3.3 still caused some confusion, especially with respect to UTF-16BE and UTF-16LE.
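
One practical difference between the labels can be seen in how Python's codecs treat them. This is an illustrative sketch only: the codec names are Python's own, not labels defined by the specification, and the variable names are assumptions.

```python
import codecs

# Python's "utf-16" codec writes a Byte Order Mark, then code units in
# the platform's native byte order.
as_utf16 = "a".encode("utf-16")

# The "utf-16-be" and "utf-16-le" codecs fix the byte order and write no BOM.
as_utf16be = "a".encode("utf-16-be")
as_utf16le = "a".encode("utf-16-le")

starts_with_bom = as_utf16[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```

The same one-character string therefore yields four bytes under the first label and two bytes under the others, which is why the labels are not interchangeable.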

E05

Section 5.2 Using XML Processors: Amend the last sentence of the second item of the bulleted list so that it reads:

For example, a non-validating processor may fail to normalize attribute values, include the replacement text of internal entities, or supply default attribute values, where doing so depends on having read declarations in external or parameter entities, or in the internal subset after an unread parameter entity reference.
Rationale: Improve the informativeness of the example sentence.

E03

Section 4.1 Character and Entity References: Remove a duplicate "to" from the last paragraph of the description of the "Entity Declared" WFC, so that it reads:

Note that non-validating processors are not obligated to read and process entity declarations occurring in parameter entities or in the external subset; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.
Rationale: This was a typo.

Errata by publication date

Substantive errata

2007-12-05: E07
2007-08-15: E01, E02

Editorial errata

2008-01-18: E09
2007-12-05: E08, E06, E05
2007-09-25: E03, E04