Running log of potential errata for XML 1.0 3rd edition and XML 1.1 1st edition

This document is an internal working document of the XML Core WG. It maintains a running log of potential errata to the XML 1.0 spec, 3rd edition (dated 2004-02-04) and to the XML 1.1 spec, 1st edition (dated 2004-02-04). It is therefore the successor to the Running log of potential errata for XML 1.0 (second edition). It is meant to be a living document, frequently updated as new errata are discovered and as they are disposed of by the WG.

When a potential erratum is resolved, its entry in this document is moved to the Resolved cases section and, if appropriate (it is a real erratum, not a false alarm or a request for enhancement that cannot be resolved by an erratum), the official XML 1.0 3rd edition errata page or the official XML 1.1 1st edition errata page (or both) are updated.

Shortcuts:

Unresolved: PE122 PE139 PE148
In countdown: None
Resolved (and published if appropriate): PE123 PE124 PE125 PE126 PE127 PE128 PE129 PE130 PE131 PE132 PE133 PE134 PE135 PE136 PE137 PE138 PE140 PE141 PE142 PE143 PE144 PE145 PE146 PE147 PE149

Potential errata

PE122

Revisiting E15 (from second edition errata)

Status: To be examined

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From Jonathan Marsh:

I've been looking into E15 more fully.  Besides the backward
compatibility cost, there appears to be an implementability issue.

Microsoft parsers allow empty and element-only content to contain entity
references as long as those references expand to whitespace or to
nothing.  To do otherwise involves a substantial reworking of the parser
implementation strategy in use, making such a change very expensive, as
well as breaking any documents previously relying on this behavior
(though the number of such documents is likely to be small).

Our implementation difficulties are surfaced in the spec through an
obvious inconsistency in the spec.  Validation of attributes is done
after entity expansion (according to E20), but prior to character entity
expansion in elements (according to E15).  There appears to be no clear
reason why these contexts must differ.

Microsoft parsers accept all documents conformant to this erratum, but
may also accept some documents (which are unlikely to occur in the wild)
which do not conform to the constraints of this erratum.  In particular,
we fail the following test cases (by parsing each document without
error):

E15a.xml:

<!DOCTYPE foo [
<!ELEMENT foo EMPTY>
<!ENTITY empty "">
]>
<foo>&empty;</foo>

E15g.xml:

<!DOCTYPE foo [
<!ELEMENT foo (foo*)>
]>
<foo><foo/>&#32;<foo/></foo>

E15h.xml

<!DOCTYPE foo [
<!ELEMENT foo (foo*)>
<!ENTITY space "&#38;#32;">
]>
<foo><foo/>&space;<foo/></foo>

Discussion

From Jonathan Marsh:

> 3) Microsoft is problematic because unclear, Microsoft needs to tell us
> what issue exactly they have with E15.

A description of the problem with E15 is at
http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2003OctDec/0227.html.

The main objections are:
1) difficulty of implementation
2) entity expansion in attributes and elements is treated inconsistently (E20 vs. E15)
3) not backward compatible with deployed versions of MSXML

I note that failure to support E15 does not affect the infoset of the parsed document.


I would like to propose a solution, but I'm actually having trouble understanding
the erratum and how it led to the test cases we fail.  Perhaps somebody could help
me understand the test cases better.

In http://www.w3.org/TR/2003/PER-xml-20031030/PER-xml-20031030-review.html#elementvalid,
I find:

"... however, a reference to an internal entity with a literal value consisting
of character references expanding to white space does match S, since its
replacement text is the white space resulting from expansion of the character
references."

I think this specifically makes the following test case valid:

  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  <!ENTITY space "&#32;">
  ]>
  <foo><foo/>&space;<foo/></foo>

If that is the case, it is hard to see why the test cases we have problems
with are not valid:

E15g.xml:
  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  ]>
  <foo><foo/>&#32;<foo/></foo>

E15h.xml
  <!DOCTYPE foo [
  <!ELEMENT foo (foo*)>
  <!ENTITY space "&#38;#32;">
  ]>
  <foo><foo/>&space;<foo/></foo>

And for consistency the similar situation for EMPTY content:

E15a.xml:
  <!DOCTYPE foo [
  <!ELEMENT foo EMPTY>
  <!ENTITY empty "">
  ]>
  <foo>&empty;</foo>

From Richard Tobin:

I think the idea is that all the pointless possibilities that can be ruled
out, are ruled out.

> I think this specifically makes the following test case valid:
>
>   <!DOCTYPE foo [
>   <!ELEMENT foo (foo*)>
>   <!ENTITY space "&#32;">
>   ]>
>   <foo><foo/>&space;<foo/></foo>

Yes.  Entity references to space-separated sequences of elements have to be
valid, and this is just an empty sequence.  The character reference can't
be ruled out because it's gone by the time you know what the entity is
being used for.

> If that is the case, it is hard to see why the test cases we have
> problems with are not valid:
>
> E15g.xml:
>   <!DOCTYPE foo [
>   <!ELEMENT foo (foo*)>
>   ]>
>   <foo><foo/>&#32;<foo/></foo>
>
> E15h.xml
>   <!DOCTYPE foo [
>   <!ELEMENT foo (foo*)>
>   <!ENTITY space "&#38;#32;">
>   ]>
>   <foo><foo/>&space;<foo/></foo>

Well, if you regard it as a deficiency that the first case can't be ruled
out, then it's a deficiency that does not apply to these cases.

> And for consistency the similar situation for EMPTY content:
>
> E15a.xml:
>   <!DOCTYPE foo [
>   <!ELEMENT foo EMPTY>
>   <!ENTITY empty "">
>   ]>
>   <foo>&empty;</foo>

There is no good use for an entity reference in an EMPTY element, so
it can be ruled out.
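
The four documents quoted in this discussion can be compared in a non-validating parser. What follows is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (built on expat), which checks well-formedness only and so cannot report the E15 validity errors. The DOCS table, its keys and the parsed_shape helper are names invented for this illustration. The sketch only shows that all four documents are well-formed and that the three element-content cases yield the same parsed content.

```python
import xml.etree.ElementTree as ET

# Test documents copied from the discussion above; the dictionary keys are
# labels chosen for this sketch only.
DOCS = {
    "E15a": '<!DOCTYPE foo [<!ELEMENT foo EMPTY><!ENTITY empty "">]>'
            '<foo>&empty;</foo>',
    "valid": '<!DOCTYPE foo [<!ELEMENT foo (foo*)><!ENTITY space "&#32;">]>'
             '<foo><foo/>&space;<foo/></foo>',
    "E15g": '<!DOCTYPE foo [<!ELEMENT foo (foo*)>]>'
            '<foo><foo/>&#32;<foo/></foo>',
    "E15h": '<!DOCTYPE foo [<!ELEMENT foo (foo*)><!ENTITY space "&#38;#32;">]>'
            '<foo><foo/>&space;<foo/></foo>',
}


def parsed_shape(source):
    """Return (child tag names, character data following the first child).

    For an element without children, the second item is the element's own
    character data instead.
    """
    root = ET.fromstring(source)  # raises ET.ParseError if not well-formed
    data = root[0].tail if len(root) else root.text
    return [child.tag for child in root], data or ""


for name, source in DOCS.items():
    print(name, parsed_shape(source))
```

Because the parser used here does not validate, the distinction drawn by E15 is invisible to it: the three (foo*) documents are indistinguishable after parsing.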

PE139

Changing XML spec to use IRIs for system IDs

Status: To be examined

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From Richard Tobin:

In 4.2.2, replace the paragraph beginning "System identifiers (and other ..."
and the following 3-item list with:

System identifiers (and other XML strings meant to be used as URI
references) are converted to URI references as described in [IRIs RFC
3987].  They MAY contain characters that, according to [new URIs
RFC3986], must be escaped before a URI can be used to retrieve the
referenced resource.  XML processors MUST escape them as described in
section 3.1 of [IRIs RFC 3987].

We may want to include this existing text:

 Since escaping is not always a fully reversible process, it MUST be
 performed only when absolutely necessary and as late as possible in a
 processing chain. In particular, neither the process of converting a
 relative URI to an absolute one nor the process of passing a URI
 reference to a process or software component responsible for
 dereferencing it SHOULD trigger escaping.

but I am uncertain as to whether [Base URI] in the infoset should have
them escaped.
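
The escaping step referred to in the proposed text can be sketched as follows. This is a simplified illustration and not the full algorithm of section 3.1 of RFC 3987: it only percent-encodes each non-ASCII character as its UTF-8 octets, leaves every ASCII character (including an existing %) untouched, and ignores the optional treatment of internationalized host names. The function name iri_to_uri and the example identifier are hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII characters of a system identifier.

    Assumption of this sketch: ASCII characters are passed through as-is,
    so no ASCII character is ever escaped here.
    """
    pieces = []
    for ch in iri:
        if ord(ch) < 0x80:
            pieces.append(ch)
        else:
            # One %HH triplet per octet of the character's UTF-8 encoding.
            pieces.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(pieces)


print(iri_to_uri("http://example.org/r\u00e9sum\u00e9.dtd"))
```

Applying the function as late as possible, immediately before retrieval, is in line with the existing text quoted above.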

Discussion

From the minutes of 2005-04-20:

But [Richard] points out that he doesn't suggest we make this
change until and unless we change the references to 2396
to 3986.

Richard suggests we defer this erratum for now.

CONSENSUS to defer this erratum for now.

PE148

Review of MUSTs/SHOULDs/MAYs

Status: To be examined

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From John Cowan:

Grosso, Paul scripsit:

> ACTION to John:  Search for use of must and should
> related to the behavior of applications in XML 1.x.

Except as noted, these comments apply to both XML 1.0 and XML 1.1.

Dubious MUSTs:

in the definition of "match": replace "MUST be" with "are", since this is a
definition, not a specification of behavior.

in the definition of "enumerated attributes":  this is not really a definition,
and should be sorted into two sentences, a definition with "is" and a specification
with "MUST".

Section 3.4: the explanations of INCLUDE and IGNORE should read "MUST be processed"
instead of "MUST be considered".  Considering is not something XML processors
are equipped to do.

Dubious SHOULDs:

The remark in Section 2.13 (of XML 1.1) that "XML applications that create
XML 1.1 output from either XML 1.1 or XML 1.0 input SHOULD ensure that
the output is fully normalized" is a constraint on applications, but
it is on applications that *write* XML, and as such is analogous to a
constraint on document authors.

A similar remark is found at the end of Section 5.1 in XML 1.1, and should
also be left alone.

Dubious MAYs:

It seems to me that most of the MAYs in the document are not genuine RFC 2119
MAYs (truly optional behavior) but represent choices that XML provides to
document authors.  I recommend that only the following ones be retained:

	definitions of "error", "fatal error", and "at user option"
	in 2.5, "an XML processor MAY"
	in 2.10, "an XML processor MAY report the error or MAY recover"
	in 2.13 (XML 1.1), "the processor MAY, at user option, ignore"
	in 3.2, "an XML processor MAY issue a warning"
	in 3.3, "At user option, an XML processor MAY issue a warning"
	also in 3.3, "an XML processor MAY at user option issue a warning"
	in 4.2, "an XML processor MAY issue a warning"
	in 4.3.3, "entities encoded in UTF-8 MAY begin"
	in 4.4.3, "the processor MAY, but need not, include"
	in 4.7, "They MAY additionally resolve the external identifier"

All other MAYs should be changed to plain "may".
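
A search of the kind requested in the action item can be sketched as follows. This is only an illustration, assuming the spec text is available as a plain string; the count_keywords helper is hypothetical, and the second sentence of the sample is invented for the example.

```python
import re
from collections import Counter

# Upper-case RFC 2119 keywords only; the two-word forms are listed first so
# that "MUST NOT" is not counted as a plain "MUST".
KEYWORDS = re.compile(r"\b(MUST NOT|SHOULD NOT|MUST|SHOULD|MAY)\b")


def count_keywords(text: str) -> Counter:
    """Count upper-case RFC 2119 keywords; lower-case uses are ignored."""
    return Counter(KEYWORDS.findall(text))


sample = ("an XML processor MAY report the error or MAY recover. "
          "Documents may contain comments, but they MUST NOT be nested.")
print(count_keywords(sample))
```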

Discussion

ACTION to Henry: Review the MAYs again and create a marked up version with changes.

Resolved and (when appropriate) published

PE123 Errors in productions 1 & 78

Status: Published 2004-04-28

Category: Substantive

Impacts: 1.1

Problem statement

From Richard Tobin:

This seems to be wrongly formatted - there is no space between
RestrictedChar and Char*.  And since the relative precedence of
concatenation and subtraction are not specified, there ought to be
parentheses.

  [1]  document  ::=   prolog element Misc*- Char* RestrictedCharChar*

should be

  [1]  document  ::=   (prolog element Misc*) - (Char* RestrictedChar Char*)

The parenthesis problem also applies to production 78 (extParsedEnt).

Resolution

Publish two corrections, one Editorial (to add the missing spaces in [1]) and one Substantive (to introduce the parens in [1] and [78]).

The Editorial one:

Section 2.1 Well-Formed XML Documents: Amend production [1] to read (spaces added in two places):
[1] document ::= prolog element Misc* - Char* RestrictedChar Char*

The Substantive one:

Section 2.1 Well-Formed XML Documents: Amend production [1] to read (parentheses introduced):
[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )
Section 4.3.2 Well-Formed Parsed Entities: Amend production [78] to read (parentheses introduced):
[78] extParsedEnt ::= ( TextDecl? content ) - ( Char* RestrictedChar Char* )
Rationale: Since the relative precedence of concatenation and subtraction are not specified, there ought to be parentheses.
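
With the parentheses in place, the right-hand operand ( Char* RestrictedChar Char* ) denotes every string that contains at least one RestrictedChar, so the subtraction excludes exactly those strings. The exclusion can be sketched as follows. The RestrictedChar ranges are taken from production [2a] of XML 1.1 and are not quoted in this log; the function name passes_exclusion is hypothetical, and the left-hand operand (the actual document grammar) is not checked.

```python
import re

# RestrictedChar per XML 1.1 production [2a] (brought in from that spec as
# an assumption of this sketch): #x1-#x8, #xB-#xC, #xE-#x1F, #x7F-#x84,
# #x86-#x9F.
RESTRICTED_CHAR = re.compile(r"[\x01-\x08\x0b\x0c\x0e-\x1f\x7f-\x84\x86-\x9f]")


def passes_exclusion(text: str) -> bool:
    """True if text does NOT match ( Char* RestrictedChar Char* )."""
    return RESTRICTED_CHAR.search(text) is None


print(passes_exclusion("<a>ok\n\t</a>"))
print(passes_exclusion("<a>\x01</a>"))
```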

PE124 Inheritance of xml:lang

Status: Published 2004-06-23

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From Martin Dürst:

I'm writing to you based on an action item from the I18N WG (Core TF).
We have become aware of your message to the Chairs of the RDF WG at
http://lists.w3.org/Archives/Member/w3c-xml-core-wg/2003OctDec/0192.html:

   The XML Core WG has reviewed the RDF/XML draft and finds no issues to
   raise. We observe that the RDF/XML use of xml:lang is different than
   might be expected by an application unaware of the special semantics
   of RDF. However, the XML Recommendation only describes the meaning of
   xml:lang as an "intent", so we accept that application-specific
   semantics are allowed. And such semantics seem necessary to correctly
   capture the RDF object model in an XML serialization.

We have been really surprised when we learned about this message,
and we request the XML Core WG to reconsider it
(if necessary with wider discussion, e.g. on the XML Plenary list).
We clearly think that the statements in the above email are wrong
and damaging to interoperability.

We would like to note the following points from our discussion:
(http://www.w3.org/mid/4.2.0.58.J.20031209130728.06ef90a8@localhost)

- The XML spec says:
   "The intent declared with xml:lang is considered to apply to all
    attributes and content of the element where it is specified, unless
    overridden with an instance of xml:lang on another element within that
    content."

...

Resolution

Section 2.12 Language Identification

Amend the first paragraph after the example to read:

The language specified by xml:lang applies to the element where it is specified (including the values of its attributes), and to all elements in its content unless overridden with another instance of xml:lang. In particular, the empty value of xml:lang is used on an element B to override a specification of xml:lang on an enclosing element A, without specifying another language. Within B, it is considered that there is no language information available, just as if xml:lang had not been specified on B or any of its ancestors. Applications determine which of an element's attribute values and which parts of its character content, if any, are treated as language-dependent values described by xml:lang.

Rationale

The original text confusingly used the word "intent" and the phrase "is considered to". The new text clarifies that xml:lang specifies a language, but also makes it clear that applications may ignore the specification when not relevant.
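
The inheritance and override rule of the amended paragraph can be sketched as follows. This is an illustration only, assuming Python's xml.etree.ElementTree; the function name effective_languages is hypothetical, and results are keyed by tag name, which works only because each tag in the sample document is unique.

```python
import xml.etree.ElementTree as ET

# Expanded name under which ElementTree reports the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(element, inherited=None, result=None):
    """Map each element's tag to its effective language (None if unknown)."""
    if result is None:
        result = {}
    declared = element.get(XML_LANG)
    if declared is None:
        language = inherited      # no xml:lang here: inherit from ancestors
    elif declared == "":
        language = None           # empty value: no language information
    else:
        language = declared       # a new specification overrides the old one
    result[element.tag] = language
    for child in element:
        effective_languages(child, language, result)
    return result


doc = ET.fromstring('<A xml:lang="en"><B xml:lang=""><C/></B><D/></A>')
print(effective_languages(doc))
```

In the sample, B and its child C carry no language information, while D still inherits "en" from A.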

PE125

Hexadecimal character references in Note

Status: Published 2004-07-15

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From G. Ken Holman:

I note in the gray table in the Note in Section 2.2 that a number of the
hexadecimal references that begin ranges are missing lower case "x"
indicating the number is hexadecimal.

In fact, lines 2 through the end are missing this lower case "x" in the
start character reference for every range.

Resolution

Section 2.2 Characters

Amend the list of code points in the note following production [2] so that it reads:

[#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].
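
The regular structure of the corrected list (three fixed ranges, then the last two code points of each of planes 1 to 16) makes it easy to regenerate and check mechanically. A minimal sketch follows; the function names are hypothetical.

```python
def discouraged_ranges():
    """Return the ranges of the corrected list as (first, last) pairs."""
    ranges = [(0x7F, 0x84), (0x86, 0x9F), (0xFDD0, 0xFDDF)]
    # Last two code points of planes 1 through 16.
    ranges += [((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
               for plane in range(1, 17)]
    return ranges


def format_range(first, last):
    """Format a range in the notation of the note, with the "x" on both ends."""
    return f"[#x{first:X}-#x{last:X}]"


formatted = [format_range(first, last) for first, last in discouraged_ranges()]
print(", ".join(formatted) + ".")
```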

PE126

Wrong link to #dt-extent

Status: Published 2004-07-15

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From joseph:

4.7 Notation Declarations


First paragraph :


The first link should be


<a title="Unparsed Entity"
href="#dt-unparsed"
>unparsed entities</a>


instead of


<a title="External Entity"
href="#dt-extent"
>unparsed entities</a>

Resolution

Section 4.7 Notation Declarations: In the first paragraph, change the target location of the link corresponding to the text "unparsed entities" from "#dt-extent" to "#dt-unparsed".

PE127 1.0 example in 1.1

Status: Published 2004-07-15

Category: Editorial

Impacts: 1.1

Problem statement

From joseph:

C Expansion of Entity and Character References


(Clarification) Forth example:


Should (?) it better be


<?xml version='1.1'?>


instead of


<?xml version='1.0'?>

Resolution

Section C Expansion of Entity and Character References: In the fourth example, change <?xml version='1.0'?> to <?xml version='1.1'?>.

PE128

Missing ] in appendix I

Status: Published 2004-07-15

Category: Editorial

Impacts: 1.1

Problem statement

From joseph:

Appendix I - Suggestions for XML Names


The sixth list entry of the ordered list :


A closing bracket ']' is missing in:

"The interlinear annotation characters ([#xFFF9-#xFFFB) should not be used in names."


It should be

([#xFFF9-#xFFFB])

Resolution

Section I Suggestions for XML Names: The sixth item in the numbered list of suggestions should read as follows:

6. The interlinear annotation characters ([#xFFF9-#xFFFB]) should not be used in names.
Rationale: The closing ] was missing.

PE129

Missing links from non-terminal S

Status: Published 2004-07-15

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From G. Ken Holman:

A minor nit: I note a number of productions do not correctly tag the "S"
particle as being a production ([3]), hence it is not linked to the production.

Productions [60], [62] and [63] are the ones I see off hand ... there may
be more.

Discussion

FY: there are none other than [60], [62] and [63].

Resolution

In productions [60], [62] and [63], make a link from all instances of the non-terminal S to its defining production ([3]).

PE130

Missing paren in 5.2

Status: Published 2004-08-25

Category: Editorial

Impacts: 1.1

Problem statement

From G. Ken Holman:

A right parenthesis is missing at the end of the last paragraph in section 5.2.

Discussion

It's actually not at the end, but just before the SHOULD.

Resolution

Section 5.2 Using XML Processors: Restore a missing closing parenthesis in the last sentence of the last paragraph, so that it reads:

... Applications which require DTD facilities not related to validation (such as the declaration of default attributes and internal entities that are or may be specified in external entities) SHOULD use validating XML processors.
Rationale: The parenthesis exists in XML 1.0 Third edition and was unintentionally lost in 1.1.

PE131 Space or S in XML decl.

Status: Published 2004-08-25

Category: Substantive

Impacts: 1.1

Problem statement

From G. Ken Holman:

I note production [32] restricts the white-space characters in advance of
the standalone declaration to be only spaces.  I would have expected the S
production, and indeed it would seem that Xerces supports the S production
and not just spaces.

Can you clarify the need for only spaces?

Discussion

We discussed at one point restricting the whitespace in the XML (and text) declaration to just #x20, but other related productions ([23], [24], [25], [80]) use S. Did we goof on this one?

Background:

Resolution

Section 2.9 Standalone Document Declaration: Amend production [32] so that it reads:
[32] SDDecl ::= S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
Rationale: The S in production [32] of XML 1.0 was mistakenly changed to #x20 in XML 1.1, making it inconsistent with related productions [23], [24], [25] and [80].
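
The amended production can be transcribed into a regular expression to see which inputs it accepts. The following is a sketch only; S is taken from production [3] ((#x20 | #x9 | #xD | #xA)+) and Eq from production [25] (S? '=' S?), and the names SDDECL and is_sddecl are hypothetical.

```python
import re

S = r"[\x20\x09\x0D\x0A]+"                         # production [3]
EQ = rf"(?:{S})?=(?:{S})?"                         # production [25]
QUOTED = r"""(?:'(?:yes|no)'|"(?:yes|no)")"""      # matching quote pairs only
SDDECL = re.compile(rf"{S}standalone{EQ}{QUOTED}")  # production [32] as amended


def is_sddecl(text: str) -> bool:
    """True if the whole string matches the amended production [32]."""
    return SDDECL.fullmatch(text) is not None


print(is_sddecl(" standalone='yes'"))
print(is_sddecl("\tstandalone = \"no\""))
print(is_sddecl(" standalone='yes\""))
```

With S restored, a tab or line break before "standalone" is accepted, as in the related productions.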

PE132

Validity of default attribute values (again)

Status: Published 2004-11-20

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From Dieter Köhler:

At 15:41 15.09.2004 -0400, François Yergeau wrote:
>Dieter Köhler a écrit :
>>In XML 1.0, 3rd ed., sec. 3.3.2 "Attribute Defaults" the case of a
>>default attribute not matching an enumaration is not considered.  Example:
>><!ATTLIST list
>>           type    (bullets|ordered|glossary)  "fancy">
>>There exists no VC which forbids this, because the VC Attribute Default
>>Value Syntactically Correct requires only that the default value meets
>>the "syntactic constraints of the declared attribute type", which is
>>Nmtoken in the case of Enumerations.  I would suggest adding to prod.
>>[60] a VC similar to VC Enumeration of prod [59].
>
>Please take a look at erratum E9 to the Second Edition
>(http://www.w3.org/XML/xml-V10-2e-errata#E9), which resulted in the
>situation you now see in the Third Edition.  In a nutshell, your example
>is valid, until a <list> element appears in the instance without a "type"
>attribute; the validator then applies the default value "fancy", which is
>not valid.

The current XML test-suite contains two examples of invalid XML files using
default attribute values which do not match an enumeration.  These are the
test files of ID "attr16" and "ibm-invalid-P60-ibm60i03.xml".  This seems
to be a strong indication that there exists the expectation that this type
of attribute list declarations should be considered illegal.

>The argument is that this is required for SGML compatibility and that this
>is actually what the makers of XML 1.0 wanted.  Please reply if you think
>otherwise, but be aware that errata can only fix actual errors in the
>spec, not undo what one might consider a bad design decision.

Could you clarify what "SGML compatibility" in particular means.  My simple
understand is that it means: "Every valid XML file is also a valid SGML
file".  According to this definition, stronger VCs in XML are
harmless.  Consequently, a VC as suggested by me does not question SGML
compatibility in general.

Discussion

attr16.xml:

<!DOCTYPE root [

<!ELEMENT root EMPTY>
<!ATTLIST root
    value   (brittannica | worldbook) "encarta"
    >
    <!-- tests the "attribute default legal" vc -->
]>

<root value="brittannica"/>

ibm60i03.xml:

<?xml  version="1.0"?>
<!-- validity test for Production 60-->
<!DOCTYPE test
 [
 <!ELEMENT test ANY>
 <!ELEMENT a EMPTY>
 <!ELEMENT b EMPTY>
 <!ELEMENT attr EMPTY>
 <!ATTLIST attr value (a|b) "c">
  ]>
<test>
The default value specified for an attribute does not meet the
lexical constraints of the declared attribute type.
</test>

Resolution

Section 3.3.2 Attribute Defaults

Amend the "Attribute Default Value Syntactically Correct" validity constraint so that it reads:

Validity constraint: Attribute Default Value Syntactically Correct

The declared default value MUST meet the syntactic constraints of the declared attribute type. That is, the default value of an attribute:

of type IDREF or ENTITY must match the Name production;
of type IDREFS or ENTITIES must match the Names production;
of type NMTOKEN must match the Nmtoken production;
of type NMTOKENS must match the Nmtokens production;
of an enumerated type (either a NOTATION type or an enumeration) must match one of the enumerated values.

Note that only the syntactic constraints of the type are required here; other constraints (e.g. that the value be the name of a declared unparsed entity, for an attribute of type ENTITY) will be reported by a validating parser only if an element without a specification for this attribute actually occurs.

Rationale

It was not clear what "meet the syntactic constraints" meant.
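
For the enumerated case, the amended constraint reduces to a membership test on the declared values. A minimal sketch, applied to the declarations quoted in this entry; the function name default_matches_enumeration is hypothetical and the sketch does not parse full ATTLIST declarations.

```python
def default_matches_enumeration(enumeration: str, default: str) -> bool:
    """Check a declared default against an enumeration such as "(a|b)"."""
    tokens = [token.strip()
              for token in enumeration.strip().strip("()").split("|")]
    return default in tokens


# attr16.xml, ibm60i03.xml and a valid variant of the example from the
# problem statement.
print(default_matches_enumeration("(brittannica | worldbook)", "encarta"))
print(default_matches_enumeration("(a|b)", "c"))
print(default_matches_enumeration("(bullets|ordered|glossary)", "ordered"))
```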

PE133

CDATA sections, PIs and Comments in Mixed and ANY content models

Status: Published 2004-11-08

Category: Substantive

Impacts: 1.0 1.1

Problem statement

From Glenn Marcy:

As things stand now, the following appear to be the case:

From Section 3, "Validity Constraint: Element Valid"

- Mixed and ANY content models do not allow CDATA sections.

- ANY content models do not allow PIs or Comments as well.

Does anyone else find this surprising?

Resolution

Section 3 Logical Structures

Amend list item 3 of the "Element Valid" validity constraint so that it reads:

The declaration matches Mixed, and the content (after replacing any entity references with their replacement text) consists of character data (including CDATA sections), comments, PIs and child elements whose types match names in the content model.

Amend list item 4 of the "Element Valid" validity constraint so that it reads:

The declaration matches ANY, and the content (after replacing any entity references with their replacement text) consists of character data, CDATA sections, comments, PIs and child elements whose types have been declared.

Rationale

The spec was not clear that CDATA sections, comments and PIs may occur in elements with content models Mixed and ANY. Section 2.5 says that "Comments MAY appear anywhere in a document outside other markup". Section 2.7 says that "CDATA sections MAY occur anywhere character data may occur". The case for PIs is a little bit less clear, but is supported by the wording for Mixed in this VC, by the equal treatment of PIs and Comments in productions [27] Misc and [29] markupdecl, as well as by widespread practice.
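
The amended item 3 can be read as a check over the child nodes of an element. The following is a sketch under stated assumptions: it uses Python's xml.dom.minidom, which expands entity references while parsing, and the function name content_matches_mixed and its allowed_names parameter (the names listed in the Mixed content model) are hypothetical.

```python
from xml.dom import Node, minidom

# Node kinds that the amended wording allows anywhere in Mixed content.
ALWAYS_ALLOWED = {
    Node.TEXT_NODE,
    Node.CDATA_SECTION_NODE,
    Node.COMMENT_NODE,
    Node.PROCESSING_INSTRUCTION_NODE,
}


def content_matches_mixed(source: str, allowed_names) -> bool:
    """Check the document element's content against a Mixed content model."""
    element = minidom.parseString(source).documentElement
    for node in element.childNodes:
        if node.nodeType in ALWAYS_ALLOWED:
            continue
        if node.nodeType == Node.ELEMENT_NODE and node.tagName in allowed_names:
            continue
        return False
    return True


print(content_matches_mixed("<p>text<![CDATA[x]]><!--c--><?pi d?><b/></p>", {"b"}))
print(content_matches_mixed("<p>text<i/></p>", {"b"}))
```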

PE134

Non-ascii chars in XML/text declaration

Status: Resolved, not an erratum

Category: Substantive

Impacts: 1.1

Problem statement

From Bjoern Hoehrmann:

  I am unable to find your public response that formally addresses an
issue raised by David Carlisle on the XML 1.1 Proposed Recommendation
which is publicly archived at

  http://lists.w3.org/Archives/Public/xml-editor/2003OctDec/0048.html

which you have apparently rejected (the Recommendation contains the same
apparently contradictory text). The relevant text in the Recommendation
is:

[...]
  To simplify the tasks of applications, the XML processor MUST behave
  as if it normalized all line breaks in external parsed entities
  (including the document entity) on input, before parsing, by
  translating all of the following to a single #xA character:

  [...]

  The characters #x85 and #x2028 cannot be reliably recognized and
  translated until an entity's encoding declaration (if present) has
  been read. Therefore, it is a fatal error to use them within the XML
  declaration or text declaration.
[...]

I do not understand this either. Please point me to your response to
David which will hopefully answer the following questions:

  * Why is it (in theory) not possible to recognize these characters
    reliably or (in theory) with less reliability than recognizing
    any other character such as U+0020?

  * How can a processor detect this error if it is not possible to
    recognize the offending characters reliably?

  * How can a processor detect this error if it is not possible that
    these characters are present when parsing the XML declaration due
    to line break normalization?

Resolution

No change to the spec. Respond to the commenter's 3 questions as follows:

Q: Why is it (in theory) not possible to recognize these characters reliably or (in theory) with less reliability than recognizing any other character such as U+0020?

A: Although it is possible in theory to recognize these characters, it was chosen to forbid them in order to simplify the implementation of Autodetection of Character Encodings along the lines of Appendix E. The sentence "The characters #x85 and #x2028 cannot be reliably recognized and translated until an entity's encoding declaration (if present) has been read." is not strictly true, but since it is not normative (it only explains why #x85 and #x2028 are forbidden) we do not feel compelled to amend the spec.

Q: How can a processor detect this error if it is not possible to recognize the offending characters reliably?

A: It is certainly possible to recognize those characters reliably after determining the encoding. If the encoding cannot be determined, then the point is moot, the processor cannot continue anyway.

Q: How can a processor detect this error if it is not possible that these characters are present when parsing the XML declaration due to line break normalization?

A: The statement in 2.11 "...it is a fatal error to use [U+0085 or U+2028] within the XML declaration or text declaration" creates an obligation for the processor to detect them before performing line break normalization. Yes, this is somewhat ugly since a layer of processing that could be entirely context-free now depends on the state of the parser (is within an XML/text declaration, or not).
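
The ordering described in the last answer (check the declaration first, then normalize) can be sketched as follows. This is an illustration only: it operates on text that has already been decoded, the list of line-break sequences is the one given in section 2.11 of XML 1.1 (elided as [...] in the excerpt above), and the function name normalize_line_breaks and the declaration-matching pattern are assumptions of the sketch.

```python
import re

# Sequences translated to #xA per XML 1.1 section 2.11; the two-character
# forms come first so that they are replaced by a single #xA.
LINE_BREAKS = re.compile("\r\n|\r\x85|\x85|\u2028|\r")

# "<?xml" not followed by a name character, up to the first "?>".
XML_DECL = re.compile(r"<\?xml(?![\w.:-]).*?\?>", re.DOTALL)


def normalize_line_breaks(text: str) -> str:
    """Reject U+0085/U+2028 inside the declaration, then normalize."""
    decl = XML_DECL.match(text)
    if decl and re.search("[\x85\u2028]", decl.group(0)):
        raise ValueError("U+0085 or U+2028 within the XML or text declaration")
    return LINE_BREAKS.sub("\n", text)


print(repr(normalize_line_breaks("<?xml version='1.1'?>\x85<a/>\r\n")))
```

The check has to run on the unnormalized input, which is the context dependence the answer calls "somewhat ugly".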

PE135

When to check entity WFness according to 4.3.2

Status: Published 2005-01-09

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From Jeff Rafter:

 > The requirement is that "Each of the parsed entities which is
 > referenced directly or indirectly within the document is well-formed."
 > An entity is certainly *declared* in the prolog, but it is not
 > *referenced* there.

It can be referenced there in an attribute value declaration but that is 
really not the point. I have to agree with Karl, and must admit to being 
very frustrated by the opaqueness of the recommendation in this area. I 
have spent three days retrofitting a parser to perform a check that was 
not necessary-- and I am sure I am not the only person to have done so.

 > The requirement is that "Each of the parsed entities which is
 > referenced directly or indirectly within the document is well-formed."

Is *far* more clear than

   "4.3.2 The document entity is well-formed if it matches the
    production labeled <document>."

Which links to:

   "[1] document ::= prolog element Misc*"

If at that point you return to your reading in 4.3.2 you encounter:

   "An internal general parsed entity is well-formed if its
    replacement text matches the production labeled content."

By then you already missed the boat. You forgot to check what the 
implications of a wellformed *textual object* is-- which happens to 
include the <document> production-- but is not referenced anywhere in 
section 4.

I agree that there needs to be some sort of erratum here to clarify 
things. At worst, I would love to see "well-formed" in "An internal 
general parsed entity is well-formed" be turned into a link that points 
to http://www.w3.org/TR/REC-xml/#sec-well-formed.

But a clarification sentence of something like:

   "Internal general parsed entities should only be checked for\
    well-formedness if they are <included> or <included in literal>"

would help immensely. For most implementers this is a confusing area and 
can be clarified quite easily.

Resolution

Section 4.3.2 Well-Formed Parsed Entities: Add a note after the first paragraph, reading:

Note:
Only parsed entities that are referenced directly or indirectly within the document are required to be well-formed.
Rationale: This is a clarification, a reminder of the definition of well-formed documents in section 2.1 Well-Formed XML Documents (third item of the numbered list).
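
The effect of the note can be observed with a parser that follows it. A minimal sketch, assuming Python's xml.etree.ElementTree (built on expat); the entity name bad and the helper is_well_formed are invented for the illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(source: str) -> bool:
    """True if the parser accepts the document without a well-formedness error."""
    try:
        ET.fromstring(source)
    except ET.ParseError:
        return False
    return True


# The replacement text "<b>" does not match the content production,
# because the start-tag is never closed.
DECL = '<!DOCTYPE a [<!ENTITY bad "<b>">]>'

print(is_well_formed(DECL + "<a/>"))          # entity declared, never referenced
print(is_well_formed(DECL + "<a>&bad;</a>"))  # entity referenced in content
```

Only the second document is rejected: the entity's replacement text is examined when the reference is expanded, not when the entity is declared.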

PE136

XML 1.1 processors accepting XML 1.0 documents

Status: Published 2005-01-09

Category: Substantive

Impacts: 1.1

Problem statement

From David Carlisle:

I believe there is a slight inconsistency in the specification of the
behaviour on 1.0 documents:


2.8 Prolog and Document Type Declaration
says:
XML 1.1 processors SHOULD accept XML 1.0 documents as well. 

5.1 Validating and Non-Validating Processors
says:
XML 1.1 processors MUST be able to process both XML 1.0 and XML 1.1 documents. 


SHOULD or MUST?

Resolution

Section 2.8 Prolog and Document Type Declaration: Delete the first sentence of the last paragraph:

XML 1.1 processors should accept XML 1.0 documents as well.
Rationale: The sentence was inconsistent with another one in 5.1: "XML 1.1 processors must be able to process both XML 1.0 and XML 1.1 documents."

PE137

Improper RFC2119 "MAY"

Status: Published 2005-01-09

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From Tim Bray:

Section 2.:

"XML document MAY in addition be valid if it meets certain further 
constraints.]"

Er, that MAY is not an RFC2119 "MAY", it's just an ordinary English 
conditional.  It's saying that there is another class of XML documents 
called "valid" defined by meeting extra constraints. -Tim

Resolution

Section 2 Documents: Amend the first paragraph to read:

[Definition: A data object is an XML document if it is well-formed, as defined in this specification. In addition, the XML document is valid if it meets certain further constraints.]
Rationale: The second sentence was cast as normative but is in fact merely descriptive.

PE138

Further fix to E05

Status: Published 2005-01-09

Category: Editorial

Impacts: 1.0 1.1

Problem statement

From Peter Agricola:

This refers to my First Message about this error

(Sent: Thursday, June 10, 2004 1:00 PM)

and to E05 of XML 1.1 Errata (resp. E03 of XML 1.0 Errata):




   Not only the target location of the link have to be

   changed, also the value of the 'title' attribute has

   to be changed, actually to "Unparsed Entity" !

Resolution

Section 4.7 Notation Declarations: In the first paragraph, change the title attribute of the link corresponding to the text "unparsed entities" from "External Entity" to "Unparsed Entity".

PE140

Conflict between Standalone Document Declaration VC in [32] and Entity Declared WFC in [68]

Status: Resolved, not an erratum

Category: Substantive

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

There seem to exist several inconsistencies with regard to Prod. [68] WFC: 
Entity Declared and the VC: Entity Declared.

1) Prod. [32] VC: Standalone Document Declaration implies that it is a VC 
if standalone="yes" and entity references (other than to amp, lt, gt, apos, 
quot) appear in the document which are defined in the external DTD subset, 
but prod. [68] WFC: Entity Declared implies that it is a WFC instead.

Resolution

It was decided at the March 2006 FTF meeting to say that "we have fiddled this wording enough and we aren't going to fiddle it any more for fear of making it worse."

PE141

"external parameter entities" is ambiguous

Status: Published 2005-11-02

Category: Editorial

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

There seem to exist several inconsistencies with regard to Prod. [68] WFC: 
Entity Declared and the VC: Entity Declared.

[...]

2) Prod. [68] VC: Entity Declared starts: "In a document with an external 
subset or external parameter entities ..." The term "external parameter 
entities" is ambiguous, because it can either refer to PE references or PE 
declarations.  Shouldn't it read: "... or references to external parameter 
entities ..."?

Resolution

Section 4.1 Character and Entity References: Change the first sentence of the description of the Entity Declared Validity Constraint, so that it begins:

In a document with an external subset or external parameter entity references with "standalone='no'",...
Rationale: It wasn't clear that "external parameter entities" meant references, as opposed to declarations of same.

PE142

Entity Declared WFC in [68]

Status: Resolved, not an erratum

Category: Editorial

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

There seem to exist several inconsistencies with regard to Prod. [68] WFC: 
Entity Declared and the VC: Entity Declared.

[...]

3) The second paragraph of the Prod. [68] WFC: Entity Declared says:

"Note that non-validating processors are not obligated to read and process 
entity declarations occurring in parameter entities or in the external 
subset; for such documents, the rule that an entity must be declared is a 
well-formedness constraint only if standalone='yes'."

Does this mean that if standalone="no", a missing declaration is always a 
VC or only if a non-validating processor is used?  If it is always a VC the 
start of the VC: Entity Declared can be changed to "In a document with 
"standalone='no'", the Name ...".  If it depends on the type of the 
processor used, for clarity the words "for such documents" should in my 
opinion be replaced by "when a non-validating processor is used on such 
documents".

Resolution

It was decided at the March 2006 FTF meeting to say that "we have fiddled this wording enough and we aren't going to fiddle it any more for fear of making it worse."

PE143

The "No < in Attribute Values" WFC in prods. [41] and [60]

Status: Published 2006-04-25

Category: Substantive

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

I am wondering why the prod. [41] WFC: "No External Entity References" 
applies not to default attribute values in attribute declarations while 
WFC: "No < in Attribute Default" does?  See prod. [60] WFC: "No < in 
Attribute Default", which is silently linked to prod. [41] WFC: "No < in 
Attribute Default" (shouldn't the reference perhaps be made legible?). In 
other words: Why is it required to check for '<' in external entities when 
references to such entities must not appear in actual attribute 
values?  Example:

<!DOCTYPE doc [
   <!ENTITY bar1 SYSTEM "bar.ent" >
   <!ELEMENT doc EMPTY >
   <!ATTLIST doc
             foo CDATA "&bar1;" >
]>
<doc foo="blabla" />

Here, the XML processor must check whether the "bar.ent" file contains a 
'<' even though a "&bar1;" attribute value would result in any case in a 
WFC violation.

Another line of argument would be the following:  Since the above document 
is wellformed (in case that bar.ent does not contain a '<') although the 
default attribute value refers to an external entity, and the reference to 
an external entity comes only into play, when no default attribute value 
would have been specified like in:

<!DOCTYPE doc [
   <!ENTITY bar1 SYSTEM "bar.ent" >
   <!ELEMENT doc EMPTY >
   <!ATTLIST doc
             foo CDATA "&bar1;" >
]>
<doc />

Why is it then necessary to check for '<' in default attribute values at 
all, as long as the default value is not actually used?  In other words: It 
would make no difference for the infoset, if prod. [60] WFC: "No < in 
Attribute Default" would be dropped.

Resolution

Section 3.3.2 Attribute Defaults: To the list of constraints of production [60] "DefaultDecl", add a reference to the "No External Entity References" WFC.
Rationale: This was an oversight. It is just as important not to have references to external entities in attribute value defaults as in attribute values (production [41]).
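
As an illustration of what the added constraint covers, the following is a minimal sketch that scans an internal DTD subset for attribute defaults that directly reference an external general entity. It is not a conforming DTD parser: the function name `external_refs_in_defaults` and the regular expressions are assumptions made for this example, and indirect references (through an internal entity that itself refers to an external one) are not followed.

```python
import re

# Hypothetical helpers for illustration only; regular expressions merely
# approximate DTD syntax and do not replace a real XML processor.
EXTERNAL_ENTITY = re.compile(r'<!ENTITY\s+([^\s%]+)\s+(?:SYSTEM|PUBLIC)\b')
ATTLIST = re.compile(r'<!ATTLIST\b(.*?)>', re.DOTALL)
QUOTED = re.compile(r'"([^"]*)"|\'([^\']*)\'')
ENTITY_REF = re.compile(r'&([^#;&\s][^;&\s]*);')


def external_refs_in_defaults(internal_subset):
    """Return names of external general entities referenced directly
    in attribute default values of the given internal subset text."""
    # General entities declared with SYSTEM or PUBLIC are external.
    external = set(EXTERNAL_ENTITY.findall(internal_subset))
    offenders = []
    for attlist in ATTLIST.finditer(internal_subset):
        # Quoted strings inside an ATTLIST declaration are default values.
        for double_quoted, single_quoted in QUOTED.findall(attlist.group(1)):
            for name in ENTITY_REF.findall(double_quoted or single_quoted):
                if name in external:
                    offenders.append(name)
    return offenders
```

Applied to the internal subset from the problem statement, this sketch reports `bar1`, whether or not the `doc` element actually relies on the default.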

PE144

Definition of ANY

Status: Resolved, not an erratum

Category: Editorial

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

In prod. [39] VC: Element Valid, paragraph 4, the condition that the types 
of the child elements of an element of type ANY must have been declared 
seems to be superfluous, because this condition is also verified when each 
child element itself is tested for VC: Element Valid.  Also note that the 
other three paragraphs do not contain an equivalent condition.

Discussion

As is clarified in a follow-up message at http://lists.w3.org/Archives/Public/xml-editor/2005JulSep/0013.html, making the suggested change would not change the conditions under which a document is rated valid, so this is editorial. We could accept the suggestion, or add a note (from Tim Bray?) or do nothing.

Resolution

Do nothing. WG members consider that the superfluous text is nevertheless useful and should remain.

PE145

Prescriptive keywords in sec. 2.10 and 2.12

Status: Resolved, not an erratum

Category: Editorial

Impacts: 1.0 1.1

Problem
statement

From Dieter Köhler:

In the third paragraph of sec. 2.10 of the XML 1.0 spec., the word "should" 
needs to be capitalized: "... white space SHOULD be preserved ..."

In sect. 2.12. in the sentence starting with "When available, ...", the 
words "may" and "should" need to be capitalized.

Resolution

For sec. 2.10: Rejected. The "should" there is not an RFC 2119 SHOULD; it is just a prose "should".

For sect. 2.12: Rejected. The sentence is in a note, and language in a note cannot be normative.

PE146

Extend note on control chars in 2.2

Status: Published 2005-11-02

Category: Editorial

Impacts: 1.1

Problem
statement

From Jeremy Carroll:

Summary,

I suggest inserting the text:

[#x1-#x8],[#xB-#xC],[#xE-#x1F],

immediately before the text

[#x7F-#x84],

in the note in section 2.2 of XML 1.1 Rec.

http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets


=======

Long version ....:


I am reviewing the rules to do with legal characters in xsd:anyURI.

Drilling down drew me to section 2.2 of XML 1.0,

http://www.w3.org/TR/2004/REC-xml-20040204/#charsets

which I compared with section 2.2 of XML 1.1

http://www.w3.org/TR/2004/REC-xml11-20040204/#charsets


===

It seems that there is a substantive change to permit control characters 
in the range #x1-#x1F in addition to #x9 | #xA | #xD permitted in 1.0.

However the note that follows has not been updated to reflect this, it 
says ...
[[
  The characters defined in the following ranges are also discouraged. 
They are either control characters or permanently undefined Unicode 
characters:

[#x7F-#x84],
]]
which seems to be the same in both versions, whereas in XML 1.1 
presumably the note would be more complete if it included a 
discouragement to use [#x1-#x8],[#xB-#xC],[#xE-#x1F]

Thus, I suggest inserting
[#x1-#x8],[#xB-#xC],[#xE-#x1F],
at the beginning of the list of discouraged characters in the note.
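
The difference described in this problem statement can be checked mechanically. The sketch below transcribes the Char production of each specification (section 2.2) as code point ranges and computes which code points match Char in XML 1.1 but not in XML 1.0. The names `XML10_CHAR`, `XML11_CHAR` and `newly_allowed_ranges` are assumptions made for this example.

```python
# Char production ranges, transcribed from section 2.2 of each specification.
XML10_CHAR = [(0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
              (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]
XML11_CHAR = [(0x1, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF)]


def in_ranges(code_point, ranges):
    """True if code_point lies in any inclusive (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)


def newly_allowed_ranges():
    """Return contiguous ranges that match Char in XML 1.1 but not in XML 1.0."""
    result = []
    start = None
    previous = None
    for code_point in range(0x110000):
        is_new = (in_ranges(code_point, XML11_CHAR)
                  and not in_ranges(code_point, XML10_CHAR))
        if is_new:
            if start is None:
                start = code_point
            previous = code_point
        elif start is not None:
            # The current run of newly allowed code points has ended.
            result.append((start, previous))
            start = None
    if start is not None:
        result.append((start, previous))
    return result
```

Under these transcriptions, the result is exactly the three ranges proposed for insertion: [#x1-#x8], [#xB-#xC] and [#xE-#x1F].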

Resolution


Section 2.2 Characters

Add ranges to the list of ranges in the note at the end of the section, so that it reads:

[#x1-#x8], [#xB-#xC], [#xE-#x1F], [#x7F-#x84], [#x86-#x9F], [#xFDD0-#xFDDF],
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
[#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
[#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
[#x10FFFE-#x10FFFF].

Rationale

The added ranges correspond to control characters that are newly allowed in XML 1.1. They should be discouraged just as much as the control characters that were already allowed, but discouraged, in XML 1.0.
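
For readers who want to test text against the amended note, here is a minimal sketch of a lookup over the ranges listed above. The names `DISCOURAGED_RANGES` and `is_discouraged` are assumptions made for this example. The last sixteen ranges follow the pattern #xNFFFE-#xNFFFF for planes 1 through 16, so they are generated rather than written out.

```python
# Discouraged ranges from the amended note in section 2.2 of XML 1.1.
DISCOURAGED_RANGES = [
    (0x1, 0x8), (0xB, 0xC), (0xE, 0x1F),
    (0x7F, 0x84), (0x86, 0x9F), (0xFDD0, 0xFDDF),
] + [
    # The last two code points of planes 1..16: [#x1FFFE-#x1FFFF] .. [#x10FFFE-#x10FFFF].
    ((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF)
    for plane in range(1, 0x11)
]


def is_discouraged(code_point):
    """True if code_point falls in one of the discouraged ranges."""
    return any(low <= code_point <= high for low, high in DISCOURAGED_RANGES)


def discouraged_in(text):
    """Return the discouraged characters found in text, in order of appearance."""
    return [ch for ch in text if is_discouraged(ord(ch))]
```

Note that #x85 is deliberately absent from the list, so `is_discouraged(0x85)` is false, as are the white space characters #x9, #xA and #xD.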

PE147

Conflicting use of "semantic(s)"

Status: Published 2005-12-07

Category: Editorial

Impacts: 1.0 1.1

Problem
statement

From Norm Walsh:

Mike Kay wrote the following on xml-dev:

> The XML specification does not use the term "semantic structure". It uses
> the word "semantic" twice:
> 
> (a) to say that it does not constrain the semantics of elements and
> attributes, other than those whose names begin with "xml"
> 
> (b) in 3.3.1, to say that the tokenized attributes such as ID, IDREFS "have
> varying lexical and semantic constraints".
> 
> These two statements are unfortunately contradictory, ...

We probably should be a little more careful.

Resolution

Section 3 Logical Structures: Amend the first sentence after production [39], so that it begins:

This specification does not constrain the application semantics, use, or (beyond syntax) names of the element types and attributes,...
Section 3.3.1 Attribute Types: Amend the second sentence of the first paragraph, so that it reads:

The string type may take any literal string as a value; the tokenized types are more constrained.
Rationale: The two uses of "semantic(s)" were somewhat conflicting.

PE149

Fix PE141

Status: Published 2006-04-25

Category: Substantive

Impacts: 1.0 1.1

Problem
statement

From Richard Tobin:

> We note that the resolution to PE141 has already made 
> what we believe is an appropriate wording change in this area.

I think we misread the erratum at the meeting.  It changes

  In a document with an external subset or external parameter entities ...

to

  In a document with an external subset or external parameter entity 
  references...

As I think we agreed at the meeting, it should read 

  In a document with an external subset or parameter entity references...

so we need to fix this erratum.

It makes no difference whether the parameter entity is external or
internal: minimal processors don't have to process it.

Resolution

Section 4.1 Character and Entity References: Change the first sentence of the description of the Entity Declared Validity Constraint, so that it begins:

In a document with an external subset or parameter entity references with "standalone='no'",...
Rationale: It makes no difference whether the parameter entity is external or internal: minimal processors don't have to process it.

Last updated $Date: 2006/04/25 21:16:57 $ by $Author: fyergeau $