Errata Candidates for XInclude-1.0

Potential Errata

PEX1	From Elliotte Harold
Fatal XInclude errors in unactivated fallbacks
Problem Statement	After having read the final recommendation, I think I've changed my mind on what happens here, even though no text relevant to this has changed. I think the relevant sentence is: "Other content (text, processing instructions, comments, elements not in the XInclude namespace, descendants of child elements) is not constrained by this specification and is ignored by the XInclude processor, that is, it has no effect on include processing, and does not appear in the children properties of the result infoset." I am now hypothesizing that fatal XInclude errors hidden inside unactivated fallbacks are not fatal and should not be reported. In any case, this is what I am going to implement in XOM 1.0 (and what has been implemented in the last several releases). If anyone disagrees with that, please holler.
resolution	Add a new paragraph between paragraphs 3 and 4 of section 3.2: "The content of xi:fallback elements is ignored unless a resource error occurs while processing the surrounding xi:include element. In particular, apparent fatal errors caused by the presence, absence, or content of elements and attributes inside the xi:fallback element must not be reported in xi:fallback elements that are ignored."
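As an illustration of this resolution, here is a minimal Python sketch that uses xml.etree.ElementTree. The fallback's content is examined only after a resource error, so a malformed xi:include inside an unused fallback is never reported. The names `process_include`, `fetch`, `ResourceError`, and `FatalError` are hypothetical. Attribute checking is deliberately reduced to the bare minimum needed to show the control flow.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"


class ResourceError(Exception):
    """The included resource could not be retrieved (hypothetical type)."""


class FatalError(Exception):
    """An XInclude fatal error: processing must stop (hypothetical type)."""


def process_include(include, fetch):
    """Return the replacement nodes for one xi:include element.

    `fetch(href)` is an assumed loader that returns a list of elements or
    raises ResourceError. This is a simplified sketch, not a full processor.
    """
    href = include.get("href")
    if href is None:
        # Simplification: the spec also permits an omitted href with xpointer.
        raise FatalError("xi:include without href")
    fallbacks = [child for child in include if child.tag == XI + "fallback"]
    if len(fallbacks) > 1:
        raise FatalError("more than one xi:fallback child")
    try:
        return fetch(href)
    except ResourceError:
        if not fallbacks:
            raise FatalError(f"resource error for {href!r} and no xi:fallback") from None
    # Reached only after a resource error: the fallback is now activated,
    # so errors inside it are examined and reported.
    result = []
    for child in fallbacks[0]:
        if child.tag == XI + "include":
            result.extend(process_include(child, fetch))
        else:
            result.append(child)
    return result
```

When `fetch` succeeds, the function returns before it touches the fallback's children. This is the "ignored unless a resource error occurs" behavior.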
PEX2	From Bjoern Hoehrmann
URI or IRI errors handling
Problem Statement	Dear XML Core Working Group, http://www.w3.org/TR/2004/PR-xinclude-20040930/ states in section 3.1: [...] A value that results in a syntactically invalid URI or IRI should be reported as a fatal error, but some implementations may find it impractical to distinguish this case from a resource error. [...] This seems to assume that such resource identifiers necessarily result in a resource error. Please change the text to state explicitly whether an implementation would be considered conforming if it does not report a fatal error and processing does not result in a resource error. This is possible, for example, if processing yields an IRI that meets the generic constraints of the IRI specification but not the specific requirements of its scheme, and a request is made via an IRI-aware protocol whose server is designed to tolerate such faults, possibly as licensed by the protocol specification. The text implies that such implementations would be non-conforming.
resolution	There will be no change to the spec. We don't expect implementors of XInclude to implement IRI processing, so whatever ends up happening with respect to IRIs isn't really the fault of the XInclude processor. However, we don't want to license such behavior, so we don't want to change our current wording here.
PEX3	From Bjoern Hoehrmann
What is an error
Problem Statement	Dear XML Core Working Group, http://www.w3.org/TR/2004/PR-xinclude-20040930/ states in section 3: [...] XInclude processors must stop processing when encountering errors other than resource errors, which must be handled as described in 4.4 Fallback Behavior. [...] It is not clear whether the intention is to further describe requirements for "fatal errors" or whether this refers to other kinds of errors. If the latter, it is not defined what constitutes an "error". Please define explicitly the conditions to which this requirement applies. For example, assuming the fragment is non-conforming, does it apply to <xi:include accept="foo" ... />? (I assume this would be non-conforming, as RFC 2616 does not allow such syntax for the Accept header.) The current text seems to turn all "errors" into "fatal errors", in which case it would make sense to say, e.g., "errors are either resource errors or fatal errors".
resolution	This is a duplicate of PEX7.
PEX4	From Bjoern Hoehrmann
The encoding attr should trigger Accept-Charset
Problem Statement	In order to minimize encoding errors for parse="text" processing, please change the definition of the encoding attribute to require the following: if the attribute has a legal value, the encoding is supported, and the protocol supports such action, then the server is informed of the encoding attribute value. For example, for encoding="iso-8859-2" and an HTTP request, the request would include Accept-Charset: iso-8859-2, so that the server has a chance to provide a proper representation.
resolution	This comment is made obsolete by the fact that we have removed accept-charset from the final REC.
PEX5	From Bjoern Hoehrmann
XML encoding detection in parse="text"
Problem Statement	http://www.w3.org/TR/2004/PR-xinclude-20040930/ states in section 4.3: [...] * if the media type of the resource is text/xml, application/xml, or matches the conventions text/*+xml or application/*+xml as described in XML Media Types [IETF RFC 3023], the encoding is recognized as specified in XML, otherwise [...] It is not clear whether this also applies to other media types such as "message" or "image", e.g. Message/Email+XML or image/svg+xml. Please indicate clearly to which types this applies. I am concerned that future revisions of RFC 3023, or the registration of MIME types that differ from the types registered so far, might contradict the requirements of the document. For example, it has been proposed that image/svg+xml have no charset parameter. Without special knowledge of the image/svg+xml MIME type, XInclude processors would then seem to be required to take into account an illegal charset parameter on image/svg+xml resources, which would make them non-conforming to the image/svg+xml registration. RFC 3023 might also be revised to make it a fatal error if, e.g., an application/xml resource has a charset parameter that differs from the encoding determined by XML rules; XInclude would then seem to contradict such a requirement. Please include a discussion of how such events will be handled for XInclude. As indicated in http://lists.w3.org/Archives/Public/www-xml-xinclude-comments/2004Dec/0005.html, the processing of the encoding attribute seems not well-defined. For example, HTTP/1.1 requires implementations to assume ISO-8859-1 for all text/* resources without a charset parameter. An encoding can therefore be determined for every text/* resource without further processing of the content. Going by the first definition of the encoding attribute, the attribute would then seem to be ignored for all text/* types.
One could, however, read this section so that this is not considered external encoding information, and the encoding attribute would thus apply to, e.g., text/plain resources. A good first step toward improving the definition of the attribute would be to reference section 4.3 for its definition rather than defining it in two places. It is not clear how text/xml resources without a charset parameter are to be processed; the text is, again, [...] * if the media type of the resource is text/xml, application/xml, or matches the conventions text/*+xml or application/*+xml as described in XML Media Types [IETF RFC 3023], the encoding is recognized as specified in XML, otherwise [...] Processing text/xml resources according to XML would mean processing the resource as if it were application/xml, which would be inconsistent with RFC 3023. Please state clearly what the actual processing requirements are and indicate clearly whether they are consistent with MIME, HTTP/1.1, and RFC 3023. Note that RFC 3023 contradicts HTTP/1.1, as described in RFC 3023. This should include a more precise definition of what is considered "external encoding information". Please include a strong warning that this processing can result in choosing the wrong encoding for many resources, since inline encoding information or type-specific defaults are ignored.
resolution	We think the XInclude spec is clear enough, in that RFC 3023 already says *+xml types are XML. As for image/svg+xml, which doesn't have a charset parameter, one just applies the usual fallback for determining the encoding. Leave the spec as is.
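The media-type test at issue can be made concrete with a small Python sketch. By default, the hypothetical `is_xml_media_type` follows the quoted text literally: text/xml, application/xml, text/*+xml, and application/*+xml. The `any_top_level` flag is an assumption of ours. It models one reading of the resolution ("*+xml are of type XML"), under which image/svg+xml also counts.

```python
def is_xml_media_type(media_type: str, any_top_level: bool = False) -> bool:
    """Decide whether a media type gets XML encoding rules (sketch).

    Parameters such as "; charset=..." are stripped before matching.
    any_top_level=True accepts any */*+xml type (an assumed reading).
    """
    essence = media_type.split(";", 1)[0].strip().lower()
    top, _, subtype = essence.partition("/")
    if not subtype:
        return False
    if essence in ("text/xml", "application/xml"):
        return True
    if subtype.endswith("+xml") and len(subtype) > len("+xml"):
        return any_top_level or top in ("text", "application")
    return False
```

Under the literal reading, image/svg+xml falls through to the later items of the list. This is where the "usual fallback" in the resolution comes into play.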
PEX6	From Bjoern Hoehrmann
parse="text" and Byte-Order Mark
Problem Statement	Please define what happens if UTF-8 has been determined by the last item in the list and the resource starts with U+FEFF, i.e., whether this is to be considered a byte order mark or a character.
resolution	We will add the following clarification that applies to all encodings: When the first character is interpreted as a BOM, it should be discarded. It is interpreted as a BOM in UTF-8, UTF-16, and UTF-32 encodings; it is not interpreted as a BOM in the UTF-16LE, UTF-16BE, UTF-32LE, and UTF-32BE encodings.
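Here is a minimal Python sketch of the BOM rule, assuming the encoding name has already been chosen. Python's "utf-8-sig", "utf-16", and "utf-32" codecs each consume one leading BOM. Their endian-specific counterparts ("utf-16-le" and so on) keep U+FEFF as a character. The function name `decode_text_resource` is hypothetical. Raising ValueError for an unknown encoding stands in for whatever resource error a real processor would report.

```python
import codecs

# Canonical codec names for which a leading U+FEFF is a BOM (per the
# resolution), mapped to Python codecs that strip exactly one BOM.
_BOM_STRIPPING = {"utf-8": "utf-8-sig", "utf-16": "utf-16", "utf-32": "utf-32"}


def decode_text_resource(data: bytes, encoding: str) -> str:
    try:
        canonical = codecs.lookup(encoding).name  # e.g. "UTF-16LE" -> "utf-16-le"
    except LookupError as exc:
        raise ValueError(f"unsupported encoding {encoding!r}") from exc
    # UTF-16LE/BE and UTF-32LE/BE are not in the table, so U+FEFF survives.
    return data.decode(_BOM_STRIPPING.get(canonical, canonical))
```

For example, decoding the bytes FF FE 68 00 69 00 as "UTF-16" yields "hi". Decoding the same bytes as "UTF-16LE" yields "\ufeffhi".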
PEX7	From Bjoern Hoehrmann
Illegal value in encoding attribute
Problem Statement	It is not clear what happens if the encoding attribute has an illegal value. For example, encoding="ISO_8859-7:1987" would be illegal, as the EncName production in XML 1.0 does not include ":". This is not described as a fatal error in the specification. Thus, if the third item applies and the encoding attribute has an illegal value, applications might choose to ignore the attribute and continue with the last item. Alternatively, they might ignore that the attribute value is illegal and process the document using the ISO-8859-7 encoding. Please define processing in this case more clearly. (Note that ISO_8859-7:1987 is a legal, registered name. It would therefore not be a resource error due to an unsupported encoding if the implementation supports the encoding and the registered name.)
resolution	See also PEX2 and PEX3. We will change the wording for the encoding attribute in section 3.1, replacing the sentence "The value of this attribute is an EncName as defined in XML specification, section 4.3.3, rule [81]." with something like "The value of this attribute SHOULD be a valid encoding name." An invalid encoding attribute is therefore not a fatal XInclude error, though it might cause a resource error at some point.
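Under the revised wording, a processor might check the attribute against XML 1.0 production [81], EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*, and only warn on failure. Here is a minimal Python sketch; `is_enc_name` and `encoding_attribute_warning` are hypothetical helpers, not part of any existing API.

```python
import re

# XML 1.0 production [81]: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENC_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def is_enc_name(value: str) -> bool:
    return _ENC_NAME.fullmatch(value) is not None


def encoding_attribute_warning(value: str):
    """Return a warning message for an invalid encoding attribute, or None.

    Per the resolution this is not a fatal XInclude error; a processor may
    still hit a resource error later if it cannot use the named encoding.
    """
    if is_enc_name(value):
        return None
    return f"encoding attribute {value!r} is not a valid EncName"
```

With this check, "ISO_8859-7:1987" from PEX7 fails only because of the ":". The underscore is permitted by EncName.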
PEX8	From Bjoern Hoehrmann
encoding attribute priority
Problem Statement	http://www.w3.org/TR/2004/PR-xinclude-20040930/ in section 3.1 states: [...] encoding When parse="text", it is sometimes impossible to correctly detect the encoding of the text resource. The encoding attribute specifies how the resource is to be translated. The value of this attribute is an EncName as defined in XML specification, section 4.3.3, rule [81]. The encoding attribute has no effect when parse="xml". [...] It is not clear from the text above when the encoding attribute takes priority over other information. If using this attribute means that other encoding information such as the value of the charset parameter for text/* types is ignored, please state this explicitly in the text. If this is a fallback value when there is no other reliable information please state that explicitly.
resolution	This comment was against the PR draft. We believe we've now made this clear in section 4.3 of the Rec.
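The priority order in section 4.3 of the Rec is as follows. External encoding information comes first. Next, XML rules apply for XML media types. Then the encoding attribute is used. UTF-8 is the final default. The Python sketch below encodes that order. The function names are hypothetical, and `sniff_xml_encoding` is a heavily reduced stand-in for XML autodetection: it handles only a UTF-8 or UTF-16 BOM and an ASCII-compatible encoding declaration.

```python
import codecs
import re
from typing import Optional

# Simplified match for an encoding declaration in ASCII-compatible input.
_XML_DECL = re.compile(rb"<\?xml[^>]*?\sencoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']")


def sniff_xml_encoding(data: bytes) -> str:
    """Reduced XML encoding detection (UTF-32, EBCDIC, BOM-less UTF-16 omitted)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    match = _XML_DECL.match(data)
    return match.group(1).decode("ascii") if match else "UTF-8"


def choose_encoding(data: bytes, external_charset: Optional[str],
                    is_xml_type: bool, encoding_attr: Optional[str]) -> str:
    # 1. External encoding information, e.g. a charset parameter, wins.
    if external_charset:
        return external_charset
    # 2. XML media types: the encoding is recognized as specified in XML.
    if is_xml_type:
        return sniff_xml_encoding(data)
    # 3. The encoding attribute, if one exists.
    if encoding_attr:
        return encoding_attr
    # 4. Otherwise UTF-8.
    return "UTF-8"
```

In this ordering, the encoding attribute acts as a fallback and never overrides a charset parameter.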
PEX9	From Bjoern Hoehrmann
encoding attribute content
Problem Statement	w.r.t. http://www.w3.org/TR/2004/PR-xinclude-20040930/ in section 3.1: It is not clear whether the requirements for character encodings in the XML 1.0 Recommendation also apply to this attribute. For example, XML 1.0 recommends that unregistered labels use the x- prefix. Please state explicitly whether these requirements apply to this attribute as well.
resolution	Duplicate of PEX7
PEX10	From Bjoern Hoehrmann
accept-* attributes improperly defined
Problem Statement	http://www.w3.org/TR/2004/PR-xinclude-20040930/ in section 3.1 states: [...] accept The value of the accept attribute may be used by the XInclude processor to aid in content negotiation. When the XInclude processor fetches a resource via HTTP, it should place the value of the accept attribute, if one exists, in the HTTP request as an Accept header as described in section 14.1 of [IETF RFC 2616]. Values containing characters outside the range #x20 through #x7E are disallowed in HTTP headers, and must be flagged as fatal errors. [...] The lexical space seems to be unconstrained; for example, the text does not say that the content must match the respective production rule in RFC 2616. This makes validation of the accept attribute useless and may cause illegal HTTP headers to be formed, which is highly undesirable. Please change the specification to define the lexical space of the attribute value clearly. The claim in the last sentence is simply false: HTTP headers may contain characters outside that range. Specifically, the Accept header allows quoted-string tokens, which include TEXT tokens, defined in RFC 2616 as TEXT = <any OCTET except CTLs, but including LWS>. Please change the specification to give accurate information, and possibly refine the processing requirements to take into account that such characters are allowed. The same comments apply to the definition of the "accept-language" attribute in the same section of the document.
resolution	First, the comment suggests we misstate what RFC 2616 says, so we need to investigate that. Next, we need to decide what to do about error checking the values of the accept-* attributes. We are pretty sure we want to avoid linefeeds in these attribute values, and we are pretty sure we don't want to require complete validation of these values. We therefore appear to be leaning toward checking based on character ranges. Per RFC 2616, section 2.2, the correct character range includes (among other things) linefeeds, which we don't want to allow for security reasons. Therefore, we want to be more restrictive than RFC 2616 requires. So, we plan to change "Values containing characters outside the range #x20 through #x7E are disallowed in HTTP headers, and must be flagged as fatal errors." to read as follows: "Values containing characters outside the range #x20 through #x7E must be flagged as fatal errors."
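The planned rule is a plain character-range check that needs no full RFC 2616 parser. Here is a minimal Python sketch of it for accept and accept-language values; `check_accept_attribute` and `FatalError` are hypothetical names.

```python
class FatalError(Exception):
    """An XInclude fatal error (hypothetical type)."""


def check_accept_attribute(name: str, value: str) -> str:
    """Reject any character outside #x20-#x7E, as the reworded rule requires.

    This deliberately does not validate against the RFC 2616 grammar; it only
    keeps CR, LF and non-ASCII characters out of the generated header.
    """
    for index, ch in enumerate(value):
        if not 0x20 <= ord(ch) <= 0x7E:
            raise FatalError(
                f"{name}: character U+{ord(ch):04X} at index {index} is outside #x20-#x7E"
            )
    return value
```

Because CR and LF fall outside the range, a value such as "text/html\r\nX-Injected: 1" is refused before it can split the HTTP request into extra header lines.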
PEX11	From Bjoern Hoehrmann
Reference to RFC 2279 should be RFC 3629
Problem Statement	http://www.w3.org/TR/2004/PR-xinclude-20040930/ refers to RFC 2279 which has been obsoleted by RFC 3629; please either update the reference or include a discussion in the document why it references the obsoleted RFC.
resolution	We will update the reference from 2279 to 3629.
PEX12	From Elliotte Harold
getElementById
Problem Statement	John Cowan wrote: "That's definitely not the case: TagSoup reports id attributes on all elements as being of type ID. So the problem is downstream of TagSoup, and I don't know where. My guess would be that the implementation of identity transforms does not preserve the IDness of @id." I recently discovered that almost everything to do with IDs is broken by design in DOM2. (Why does that not surprise me?) The practical impact is that it is impossible to implement a fully conformant XInclude processor in pure DOM2. You have to use implementation-dependent methods or leave out support for XPointers that depend on ID-type attributes.
resolution	We sympathize. Unfortunately, we can't change the DOM, but at least some of these problems are said to be fixed in DOM Level 3.
PEX13	From Mike Brown
Normalize newlines when parse="text"?
Problem Statement	I have a quick question about XIncludes. When processing an xi:include element with parse="text", must newlines in the included document be normalized to LF? I was having trouble finding any definitive info on this in the XInclude, Infoset and Character Model specs.
resolution	The question is "When processing an xi:include element with parse="text", must newlines in the included document be normalized to LF?" Our answer is "no". Such normalization would be part of XML processing, not text processing.
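In Python, one easy way to violate this answer by accident is to read the resource through a text stream with default settings. That applies universal-newline translation, which converts CR and CRLF to LF. Here is a minimal sketch using the standard io module; the function name `read_text_resource` is hypothetical.

```python
import io


def read_text_resource(raw: bytes, encoding: str) -> str:
    """Decode a parse="text" resource without newline normalization.

    newline="" returns CR and CRLF to the caller untranslated; the default
    newline=None would convert both to LF, which is XML-style processing.
    """
    with io.TextIOWrapper(io.BytesIO(raw), encoding=encoding, newline="") as stream:
        return stream.read()
```

Here, b"a\r\nb" comes back as "a\r\nb". A TextIOWrapper with default settings would instead return "a\nb".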
PEX14	From Elliotte Harold
What if encoding is not an EncName
Problem Statement	ERH asks: "What should I do if the encoding attribute does not satisfy [XML 1.0's] production [81] EncName?"
resolution	Duplicate of PEX3 and PEX7.
PEX15	From Elliotte Harold
XPointers with percent escapes: what type of error
Problem Statement	ERH says: Since the xpointer attribute is not a URI reference, %-escaping must not appear in the XPointer.... [He suggests clarifying that such would be] a resource error rather than a fatal error.
resolution	We simply need to insert a sentence like "%-escaping is not done in XPointers, so '%' is simply an ordinary character in the value of the xpointer attribute."
PEX16	From Henry S. Thompson
XInclude, schema validity-assessment, xml:base and xml:lang
Problem Statement	XInclude requires xml:base fixup, which adds xml:base attributes to a document. This causes problems validating the result against the original schema if that schema doesn't mention xml:base. Norm wants the XML Schema group to have a mode that says "just assume all xml:* attributes are okay". Henry points out that we have problems even with validation against DTDs in this case. It was suggested that we add to the XInclude spec: "An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup." Note that because this is "at user option" (see the XML spec for the definition of "at user option"), all XInclude processors MUST support xml:base and xml:lang fixup, but they MAY provide a user-specifiable option to suppress such fixup.
resolution	Add to the XInclude specification "An XInclude processor may, at user option, suppress xml:base and/or xml:lang fixup."
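The "at user option" semantics means that fixup is performed by default and a user can switch it off. Here is a minimal Python sketch of that pattern on top-level included elements, using xml.etree.ElementTree. The function `fix_up` and its parameters are hypothetical. The fixup itself is simplified: it does not compute relative URIs. It assumes that an included item with no language receives xml:lang="".

```python
import xml.etree.ElementTree as ET

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def fix_up(top_level, included_base, parent_base, included_lang, parent_lang,
           suppress_base=False, suppress_lang=False):
    """Simplified xml:base / xml:lang fixup on top-level included elements.

    Fixup is on by default (processors MUST support it); the suppress_*
    flags model the optional user switch allowed by this erratum.
    """
    for element in top_level:
        if not suppress_base and included_base != parent_base:
            element.set(XML_BASE, included_base)
        if not suppress_lang and included_lang != parent_lang:
            # Assumption: an item with no language gets an empty xml:lang.
            element.set(XML_LANG, included_lang or "")
    return top_level
```

A user who validates the result against a schema that does not mention xml:base could then call `fix_up(..., suppress_base=True)`.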
PEX17	From François Yergeau
Rewording for IRI reference attributes
Problem Statement	Section 4.1.1 contained its own definition of an IRI reference, since the RFC defining IRIs was still a work in progress. This mail suggests a rewording of that section: ------------- The value of this attribute is an XML resource identifier as defined in [XML 1.1], which is interpreted as an IRI Reference as defined in RFC 3987 [IETF RFC 3987], after the escaping procedure described in [XML 1.1] is applied. If necessary for the implementation, the value may be further converted to a URI reference as described in [XML 1.1]. -------------
resolution	Accept this as an erratum pointing to this wording in the new editions of XML. Update the section with the new wording in the upcoming second edition.
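The escaping step referred to in the new wording can be sketched in Python. Our understanding of the escape list in XML 1.1 (section 4.2.2) covers the following: control characters #x0-#x1F and #x7F, space, the characters < > " { } | \ ^ `, and every character above #x7F. Each such character is escaped as %HH for every byte of its UTF-8 encoding. The function name `escape_resource_identifier` is hypothetical. Please treat the character list as an assumption to be checked against XML 1.1 itself.

```python
# Printable characters that must be escaped in addition to controls
# (#x0-#x1F, #x7F) and all characters above #x7F.
_MUST_ESCAPE = set(' <>"{}|\\^`')


def escape_resource_identifier(value: str) -> str:
    """Apply the XML 1.1-style escaping to an XML resource identifier."""
    out = []
    for ch in value:
        code = ord(ch)
        if code <= 0x1F or code >= 0x7F or ch in _MUST_ESCAPE:
            # Escape each UTF-8 byte as %HH with upper-case hex digits.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)
```

An existing "%" is left alone, so already-escaped sequences such as "%20" pass through unchanged.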