Errata Candidates for XInclude-1.0

Potential Errata

PEX1 From Elliotte Harold
Fatal XInclude errors in unactivated fallbacks
 After having read the final recommendation, I think I've changed my mind 
 on what happens here, even though no text relevant to this has changed. 
 I think the relevant sentence is:

 Other content (text, processing instructions, comments, elements not in 
 the XInclude namespace, descendants of child elements) is not 
 constrained by this specification and is ignored by the XInclude 
 processor, that is, it has no effect on include processing, and does not 
 appear in the children properties of the result infoset.

 I am now hypothesizing that fatal XInclude errors hidden inside 
 unactivated fallbacks are not fatal, and should not be reported. In any 
 case, this is what I am going to implement in XOM 1.0 (and what has been 
 implemented in the last several releases). If anyone disagrees with 
 that, please holler.
Add a new paragraph between paragraphs 3 and 4 of section 3.2:
    The content of xi:fallback elements is ignored unless a resource
    error occurs while processing the surrounding xi:include element. In
    particular, apparent fatal errors caused by the presence, absence,
    or content of elements and attributes inside the xi:fallback element
    must not be reported in xi:fallback elements that are ignored.
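The proposed paragraph implies that a processor looks inside an xi:fallback only after a resource error has activated it. The Python sketch below illustrates that ordering with xml.etree.ElementTree. The `fetch` callable and the `ResourceError` and `FatalXIncludeError` classes are hypothetical names for this illustration, and only a few fatal conditions are checked.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"


class ResourceError(Exception):
    """Hypothetical: the included resource could not be retrieved."""


class FatalXIncludeError(Exception):
    """Hypothetical: a fatal XInclude error that stops processing."""


def resolve_include(include, fetch):
    """Return the nodes replacing one xi:include element (simplified sketch).

    `fetch` is a hypothetical callable that takes the xi:include element and
    returns a list of nodes, or raises ResourceError.
    """
    if include.get("href") is None and include.get("xpointer") is None:
        raise FatalXIncludeError("xi:include needs href or xpointer")
    fallbacks = [c for c in include if c.tag == XI + "fallback"]
    if len(fallbacks) > 1:
        raise FatalXIncludeError("more than one xi:fallback child")
    try:
        return fetch(include)
    except ResourceError:
        if not fallbacks:
            raise FatalXIncludeError("resource error and no xi:fallback")
        # Only now is the fallback activated, so only now are its children
        # examined; a broken nested xi:include can raise from this point on.
        # (Text content inside the fallback is omitted for brevity.)
        result = []
        for node in fallbacks[0]:
            if node.tag == XI + "include":
                result.extend(resolve_include(node, fetch))
            else:
                result.append(node)
        return result
```

With a successful fetch, a malformed xi:include hidden inside the fallback is never examined, so nothing is reported; only when the fallback is activated does it produce a fatal error.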

PEX2 From Bjoern Hoehrmann
URI or IRI errors handling
Dear XML Core Working Group, the specification states in section 3.1:

A value that results in a syntactically invalid URI or IRI should
be reported as a fatal error, but some implementations may find it
impractical to distinguish this case from a resource error. 

This seems to assume that such resource identifiers necessarily result
in a resource error. Please change the text to state explicitly whether
an implementation would be considered conforming if it does not report
a fatal error and processing does not result in a resource error. This
is possible, e.g., if processing yields an IRI that meets the generic
constraints of the IRI specification but not the specific requirements
of its scheme, and a request is made via an IRI-aware protocol whose
servers are designed to tolerate such faults, possibly as licensed by
the protocol specification. The text implies that such implementations
would be non-conforming.
There will be no change to the spec.

We don't expect implementors of XInclude to implement 
IRI processing, so whatever ends up happening with 
respect to IRIs isn't really the fault of the XInclude 
processor. However, we don't want to license such behavior, 
so we don't want to change our current wording here. 
PEX3 From Bjoern Hoehrmann
What is an error
Dear XML Core Working Group, the specification states in section 3:

  XInclude processors must stop processing when encountering errors
  other than resource errors, which must be handled as described in
  4.4 Fallback Behavior.

It is not clear whether the intention is to further describe the
requirements for "fatal errors" or whether this refers to other kinds of
errors. If the latter, it is not defined what constitutes an "error";
please define explicitly the conditions to which this requirement
applies; for example, assuming the fragment is non-conforming, state
whether it applies to

  <xi:include accept="foo" ... />

(I assume this would be non-conforming as RFC 2616 does not allow such
syntax for the Accept header).

It seems the current text turns all "errors" into "fatal errors", in
which case it would make sense to say, e.g., "errors are either resource
errors or fatal errors".
This is a duplicate of PEX7.

PEX4 From Bjoern Hoehrmann
The encoding attr should trigger Accept-Charset
In order to minimize encoding errors for parse="text" processing,
please change the definition of the encoding attribute to require that,
if the attribute has a legal value, the encoding is supported, and the
protocol supports such action, the server is informed of the encoding
attribute value; e.g. for encoding="iso-8859-2" and an HTTP request,
the request includes

  Accept-Charset: iso-8859-2

such that the server has a chance to provide a proper representation.
This comment has been made obsolete by the removal of accept-charset from the final REC.
PEX5 From Bjoern Hoehrmann
XML encoding detection in parse="text"
The specification states in section 4.3:

  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 

It is not clear whether this also applies to other Media Types such as
"message" or "image", e.g. for Message/Email+XML or image/svg+xml.
Please clearly indicate to which types this applies.

I am concerned that future revisions of RFC 3023, or registrations of
MIME types that differ from the types registered so far, might
contradict the requirements of the document. For example, it has been
proposed that there be no charset parameter for image/svg+xml; thus,
without special knowledge of the image/svg+xml MIME type, XInclude
processors would seem to be required to consider a charset parameter
that is illegal for image/svg+xml resources, which would make them
non-conforming to the image/svg+xml registration. RFC 3023 might also be
revised to make it a fatal error for, e.g., an application/xml resource
to have a charset parameter different from the encoding that would be
determined by XML rules; XInclude would then seem to contradict such a
requirement. Please include a discussion of how such events will be
handled for XInclude.

As indicated in

the processing of the encoding attribute does not seem to be well-defined.
For example, HTTP/1.1 requires implementations to assume ISO-8859-1 for
all text/* resources without a charset parameter. This means that an
encoding can be determined for every text/* resource without further
processing of the content, so from the first definition of the encoding
attribute it would seem that the attribute is ignored for all text/*
types. One could, however, read this section so that this default is
not considered external encoding information, in which case the encoding
attribute would apply to, e.g., text/plain resources. A good first step
to improve the definition of the attribute would be to reference section
4.3 for the definition of the attribute rather than defining it in two
places.

It is not clear how text/xml resources without a charset parameter are
to be processed; the text is, again,

  * if the media type of the resource is text/xml, application/xml, or
    matches the conventions text/*+xml or application/*+xml as described
    in XML Media Types [IETF RFC 3023], the encoding is recognized as
    specified in XML, otherwise 

Processing text/xml resources according to XML would mean processing
the resource as if it were application/xml, which would be inconsistent
with RFC 3023. Please state clearly what the actual processing
requirements are and whether they are consistent with MIME, HTTP/1.1,
and RFC 3023 (note that RFC 3023 itself describes how it contradicts
HTTP/1.1), including a more precise definition of what is considered
"external encoding information".

Please include a strong warning that this processing can result in
choosing the wrong encoding for many resources, because inline encoding
information or type-specific defaults are ignored.
We think the XInclude spec is clear enough, in that RFC 3023
already says *+xml types are XML. As for svg+xml, which
doesn't have a charset parameter, one just does the usual
fallback for determining the charset.

Leave the spec as is.
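The resolution relies on the order in which section 4.3 chooses an encoding for parse="text". The sketch below assumes the four-step order implied by the quoted item and by the "third item" and "last item" references in PEX6 and PEX7: external information such as a charset parameter, then XML rules for XML media types, then the encoding attribute, then UTF-8. The function name and its return values are hypothetical. The media-type test reads the text/*+xml and application/*+xml conventions literally, so image/svg+xml does not match; a processor following the Working Group's broader "*+xml" reading would widen the pattern.

```python
import re

# Literal reading of the conventions quoted from section 4.3.
_XML_MEDIA_TYPE = re.compile(r"(text|application)/(xml|[a-z0-9.!#$&^_-]+\+xml)")


def choose_encoding(media_type, charset=None, encoding_attr=None):
    """Return (rule, encoding) for a parse="text" resource (sketch).

    `media_type` is the bare type/subtype, without parameters.
    """
    if charset:
        # 1. External encoding information, e.g. an explicit charset parameter.
        return ("external", charset)
    if media_type and _XML_MEDIA_TYPE.fullmatch(media_type.strip().lower()):
        # 2. XML media type: detect as specified in XML (BOM, encoding
        #    declaration); the detection itself is not shown here.
        return ("xml-rules", None)
    if encoding_attr:
        # 3. The encoding attribute on xi:include.
        return ("encoding-attr", encoding_attr)
    # 4. Default.
    return ("default", "UTF-8")
```

Note that this sketch treats only an explicit charset parameter as external information; whether protocol defaults such as HTTP/1.1's ISO-8859-1 for text/* also count is exactly the ambiguity raised in the comment.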
PEX6 From Bjoern Hoehrmann
parse="text" and Byte-Order Mark
Please define what happens if UTF-8 has been determined by the last item
in the list and the resource starts with U+FEFF, i.e., whether this is
to be considered a byte order mark or a character.
We will add the following clarification that applies
to all encodings:

When the first character is interpreted as a BOM,
it should be discarded.  It is interpreted as a
BOM in UTF-8, UTF-16, and UTF-32 encodings; it is
not interpreted as a BOM in the UTF-16LE, UTF-16BE,
UTF-32LE, and UTF-32BE encodings.
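In Python, for example, this clarification maps fairly directly onto the standard codecs: 'utf-16' and 'utf-32' already consume a leading BOM, 'utf-8-sig' drops one, and the explicit-endian codecs keep U+FEFF as an ordinary character. The helper name decode_text is hypothetical, and the sketch assumes the encoding name has already been chosen.

```python
def decode_text(data: bytes, encoding: str) -> str:
    """Decode a parse="text" resource, discarding a BOM only where one is recognized."""
    if encoding.lower().replace("_", "-") in ("utf-8", "utf8"):
        # 'utf-8-sig' strips a single leading U+FEFF, if present.
        return data.decode("utf-8-sig")
    # Python's 'utf-16' and 'utf-32' codecs consume a leading BOM themselves,
    # while 'utf-16-le', 'utf-16-be', 'utf-32-le' and 'utf-32-be' keep U+FEFF
    # as a character, as the clarification requires.
    return data.decode(encoding)
```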

PEX7 From Bjoern Hoehrmann
Illegal value in encoding attribute
It is not clear what happens if the encoding attribute has an illegal
value; for example, encoding="ISO_8859-7:1987" would be illegal, as the
EncName production in XML 1.0 does not include ":". This is not
described as a fatal error in the specification; thus, if the third item
applies and the encoding attribute has an illegal value, applications
might choose to ignore the attribute and continue with the last item, or
ignore the fact that the attribute value is illegal and process the
document using the ISO-8859-7 encoding. Please define processing in
this case more clearly. (Note that ISO_8859-7:1987 is a legal,
registered name, so it would not be a resource error due to an
unsupported encoding if the implementation supports the encoding and
the registered name.)
See also PEX2 and PEX3.

We will change the wording for the encoding attribute
in section 3.1 to change the sentence

"The value of this attribute is an EncName as defined
in XML specification, section 4.3.3, rule [81]."

to say something like

"The value of this attribute SHOULD be a valid encoding name."

So an invalid encoding attribute isn't a fatal XInclude
error, though it might cause a resource error at some point.
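Under the proposed wording, a processor might check the attribute against XML 1.0's EncName production, [A-Za-z] ([A-Za-z0-9._] | '-')*, only to warn, and report a resource error when it cannot actually use the encoding. The sketch below uses Python's codecs registry as a stand-in for the processor's set of supported encodings; the function names and the ResourceError class are hypothetical.

```python
import codecs
import re
import warnings

# XML 1.0 production [81]: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENCNAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


class ResourceError(Exception):
    """Hypothetical: triggers xi:fallback processing."""


def is_encname(value: str) -> bool:
    return _ENCNAME.fullmatch(value) is not None


def codec_for(encoding_attr: str) -> str:
    """Map an encoding attribute value to a codec name (sketch)."""
    if not is_encname(encoding_attr):
        # SHOULD-level requirement: warn, but do not stop processing.
        warnings.warn(f"encoding={encoding_attr!r} is not an EncName")
    try:
        return codecs.lookup(encoding_attr).name
    except LookupError:
        # An unsupported encoding surfaces as a resource error instead.
        raise ResourceError(f"unsupported encoding {encoding_attr!r}") from None
```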

PEX8 From Bjoern Hoehrmann
encoding attribute priority
The specification states in section 3.1:


    When parse="text", it is sometimes impossible to correctly detect
    the encoding of the text resource. The encoding attribute specifies
    how the resource is to be translated. The value of this attribute is
    an EncName as defined in XML specification, section 4.3.3, rule
    [81]. The encoding attribute has no effect when parse="xml". 

It is not clear from the text above when the encoding attribute takes
priority over other information. If using this attribute means that
other encoding information such as the value of the charset parameter
for text/* types is ignored, please state this explicitly in the text.
If this is a fallback value when there is no other reliable information
please state that explicitly.
This comment was against the PR draft. We believe we've now made this clear in section 4.3 of the Rec.
PEX9 From Bjoern Hoehrmann
encoding attribute content
With respect to the encoding attribute in section 3.1:

It is not clear whether the requirements for character encodings in the
XML 1.0 Recommendation also apply to this attribute, for example, it is
recommended there that unregistered labels use the x- prefix. Please
state explicitly whether these requirements apply to this attribute.
Duplicate of PEX7.
PEX10 From Bjoern Hoehrmann
accept-* attributes improperly defined
The specification states in section 3.1:


    The value of the accept attribute may be used by the XInclude 
    processor to aid in content negotiation. When the XInclude processor
    fetches a resource via HTTP, it should place the value of the accept
    attribute, if one exists, in the HTTP request as an Accept header as
    described in section 14.1 of [IETF RFC 2616]. Values containing
    characters outside the range #x20 through #x7E are disallowed in
    HTTP headers, and must be flagged as fatal errors. 

The lexical space seems to be unconstrained (it does not say, for
example, that the content must match the respective production rule in
RFC 2616). This makes validation of the accept attribute useless and may
cause illegal HTTP headers to be formed which is highly undesirable.
Please change the specification to clearly define the lexical space of
the attribute value.

The claim in the last sentence is simply false: HTTP headers may contain
characters outside that range. Specifically, the Accept header allows
quoted-string tokens, which include TEXT tokens defined as

  TEXT = <any OCTET except CTLs, but including LWS>

in RFC 2616. Please change the specification to give accurate
information and possibly refine the processing requirements to take into
account that such characters are allowed.

The same comments apply to the definition of the "accept-language"
attribute in the same section of the document.

First, the comment suggests we misstate what RFC 2616 says, so we need
to investigate that.

Next, we need to decide what to do about error checking the value of the
accept-* attributes. We are pretty sure we want to avoid linefeeds in
these attribute values, and we are pretty sure we don't want to require
complete validation of these values, so we appear to be leaning toward
doing some checking based on character ranges. Per RFC 2616, section 2.2,
the correct character range includes (among other things) linefeeds,
which we don't want to allow for security reasons. Therefore, we want
to be more restrictive than RFC 2616 requires. So, we plan to change:

    Values containing characters outside the range #x20 through #x7E are
    disallowed in HTTP headers, and must be flagged as fatal errors.

instead to be worded as follows:

    Values containing characters outside the range #x20 through #x7E must
    be flagged as fatal errors.
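The revised sentence amounts to a simple character-range check on the accept and accept-language values, deliberately stricter than RFC 2616 so that line feeds can never reach an HTTP header. A minimal sketch follows; check_header_attribute and FatalXIncludeError are hypothetical names.

```python
class FatalXIncludeError(Exception):
    """Hypothetical: a fatal XInclude error that stops processing."""


def check_header_attribute(name: str, value: str) -> str:
    """Reject accept/accept-language values with characters outside #x20-#x7E."""
    for ch in value:
        if not 0x20 <= ord(ch) <= 0x7E:
            # Catches CR and LF (header injection) as well as non-ASCII text.
            raise FatalXIncludeError(f"{name} contains U+{ord(ch):04X}")
    return value
```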
PEX11 From Bjoern Hoehrmann
Reference to RFC 2279 should be RFC 3629
The specification refers to RFC 2279, which has been obsoleted by RFC
3629; please either update the reference or include a discussion in the
document of why it references the obsoleted RFC.
We will update the reference from 2279 to 3629.

PEX12 From Elliotte Harold
John Cowan wrote:

> That's definitely not the case: TagSoup reports id attributes on all
> elements as being of type ID.  So the problem is downstream of TagSoup,
> and I don't know where.  My guess would be that the implementation of
> identity transforms does not preserve the IDness of @id.

I recently discovered that almost everything to do with IDs is broken by 
design in DOM2. (Why does that not surprise me?) The practical impact is 
that it is impossible to implement a fully-conformant XInclude processor 
in pure DOM2. You have to use implementation-dependent methods or
leave out support for XPointers that depend on ID-type attributes.
We sympathize. Unfortunately, we can't change the DOM, but at least some
of these problems are said to be fixed in DOM Level 3.
PEX13 From Mike Brown
Normalize newlines when parse="text"?
I have a quick question about XIncludes. When processing an xi:include 
element with parse="text", must newlines in the included document be 
normalized to LF? I was having trouble finding any definitive info on 
this in the XInclude, Infoset and Character Model specs.
The question is "When processing an xi:include element with parse="text",
must newlines in the included document be normalized to LF?"

Our answer is "no". Such normalization would be part of XML processing,
not text processing.
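One practical consequence: a processor written in a language whose text readers translate newlines by default has to switch that translation off for parse="text". In Python, for example, io.TextIOWrapper turns "\r\n" and "\r" into "\n" unless newline="" is passed. The helper name read_text_resource is hypothetical.

```python
import io


def read_text_resource(data: bytes, encoding: str = "utf-8") -> str:
    """Decode a parse="text" resource without newline normalization."""
    # newline="" disables universal-newline translation, so CR LF and CR
    # sequences in the resource reach the result unchanged.
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return stream.read()
```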
PEX14 From Elliotte Harold
What if encoding is not an EncName
ERH asks "What should I do if the encoding attribute
does not satisfy [XML 1.0's EncName] production [81]?"
Duplicate of PEX3 and PEX7.
PEX15 From Elliotte Harold
XPointers with percent escapes: what type of error
ERH says:

 Since the xpointer attribute is not a URI reference,
 %-escaping must not appear in the XPointer.... [He
 suggests clarifying that such would be] a resource
 error rather than a fatal error.

We simply need to insert
a sentence like "%-escaping is not done in XPointers, so '%' is
simply an ordinary character in the value of the xpointer
attribute."
PEX16 From Henry S. Thompson
XInclude, schema validity-assessment, xml:base and xml:lang
XInclude requires xml:base fixup, which adds xml:base
attributes to a document.  This causes problems
validating the result against the original schema
if that schema doesn't mention xml:base.

Norm wants the XML Schema group to have a mode that
says "just assume all xml:* attributes are okay".

Henry points out we even have problems with validation
against DTDs in this case.

It was suggested that we add to the XInclude spec:
"An XInclude processor may, at user option, suppress
xml:base and/or xml:lang fixup."

Note: since this is "at user option" [see the XML spec
for the definition of "at user option"], all XInclude processors
MUST support xml:base and xml:lang fixup, but they MAY
provide a user-specifiable option to suppress such fixup.
Add to the XInclude specification:
"An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup."
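A processor might expose this "at user option" behavior as flags that default to performing both fixups. The sketch below is deliberately simplified: it takes already computed base URI and language values as arguments instead of deriving them from the infoset, and the function and parameter names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def apply_fixups(top, base_uri, included_lang, parent_lang,
                 *, base_fixup=True, lang_fixup=True):
    """Add xml:base / xml:lang to a top-level included element (simplified)."""
    if base_fixup and base_uri is not None:
        # Replaces any xml:base already present on the included element.
        top.set(XML_NS + "base", base_uri)
    if lang_fixup and included_lang is not None and included_lang != parent_lang:
        top.set(XML_NS + "lang", included_lang)
    return top


# Usage: suppress xml:base fixup so the result can still be validated
# against a schema or DTD that does not declare xml:base.
elem = apply_fixups(ET.Element("chapter"), "chap1.xml", "en", "en", base_fixup=False)
```

Keeping both flags on by default reflects the note that processors must support fixup and may only offer suppression as an option.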
PEX17 From François Yergeau
Rewording for IRI reference attributes
Section 4.1.1 had a definition for an IRI reference embedded in the specification,
since the RFC defining IRIs was a work in progress. This mail suggests a rewording of that section:
The value of this attribute is an XML resource identifier as defined in 
[XML 1.1], which is interpreted as an IRI Reference as defined in RFC 
3987 [IETF RFC 3987], after the escaping procedure described in [XML 
1.1] is applied.  If necessary for the implementation, the value may be 
further converted to a URI reference as described in [XML 1.1].
Accept this as an erratum pointing to this wording in the new editions of XML. Update
the section with the new wording in the upcoming second edition.
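The "further converted to a URI reference" step is a mechanical escaping. The sketch below assumes the character set listed for this conversion in XML 1.1, section 4.2.2 (control characters, space, ", <, >, \, ^, `, {, |, }, DEL, and every character above #x7F), with each such character encoded as UTF-8 and every byte written as %HH; check it against the XML 1.1 text before relying on it. The function name is hypothetical.

```python
# ASCII characters escaped when converting an XML resource identifier to a
# URI reference (assumed from XML 1.1, section 4.2.2); controls, DEL and
# non-ASCII characters are handled by the range test below.
_UNSAFE_ASCII = set(' "<>\\^`{|}')


def to_uri_reference(value: str) -> str:
    out = []
    for ch in value:
        cp = ord(ch)
        if cp < 0x20 or cp >= 0x7F or ch in _UNSAFE_ASCII:
            # Encode the character as UTF-8 and write each byte as %HH.
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)
```

Note that '%' itself is left alone, so existing escapes in the value survive the conversion unchanged.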