Bugzilla – Bug 8206
[Ser] Serialization requires no escaping of < in URI attribute with XHTML
Last modified: 2010-06-29 13:53:39 UTC
A colleague pointed out this problem in the "Character Expansion" step of the phases of serialization. Suppose the output method is XHTML and the escape-uri-attributes serialization parameter has the value "yes". For any URI attribute, step 3a. requires that URI escaping be applied and that steps 3b. through 3e. be skipped.
The URI escaping is described in three steps: i) Unicode normalization; ii) percent encoding as described for fn:escape-html-uri; and iii) escaping "according to HTML rules any characters (such as < and &) where HTML requires escaping. For example, replace < with &lt;."
For other attributes, step 3e. would cause a less-than character to be replaced with &lt; or an equivalent character reference.
It's not clear which HTML rules apply here - those of the various HTML recommendations, those of the HTML output method, or both. If this is a reference to the rules of the HTML output method, alone or together with the requirements of the relevant HTML recommendation, it must be noted that Section 7.2 of the Serialization specification actually prohibits a less-than character from being escaped. It states, "The HTML output method MUST NOT escape '<' characters occurring in attribute values."
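To make the conflict concrete, here is a minimal Python sketch of the three URI-escaping sub-steps as described above. It is an illustration only, not text from the specification: the function names are hypothetical, NFC is assumed as the normalization form, fn:escape-html-uri is approximated as "percent-encode the UTF-8 octets of every character outside the printable ASCII range 32 to 126", and step iii is reduced to handling & and < (quotes and other special cases of the output methods are ignored).

```python
import unicodedata


def escape_html_uri(value: str) -> str:
    """Approximate fn:escape-html-uri: leave printable ASCII (32-126)
    unchanged and percent-encode the UTF-8 octets of everything else."""
    out = []
    for ch in value:
        if 32 <= ord(ch) <= 126:
            out.append(ch)
        else:
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


def escape_uri_attribute(value: str, method: str) -> str:
    """Hypothetical helper mirroring sub-steps i) to iii) of step 3a.

    `method` is "html" or "xhtml"; other output methods are not modelled.
    """
    # i) Unicode normalization (NFC is an assumption of this sketch).
    normalized = unicodedata.normalize("NFC", value)
    # ii) Percent-encoding in the style of fn:escape-html-uri.
    encoded = escape_html_uri(normalized)
    # iii) Escaping according to the rules of the output method.
    encoded = encoded.replace("&", "&amp;")
    if method == "xhtml":
        # XML rules: "<" must be escaped in attribute values.
        encoded = encoded.replace("<", "&lt;")
    # For "html", "<" is left alone, following the Section 7.2 rule
    # that "<" in attribute values MUST NOT be escaped.
    return encoded


print(escape_uri_attribute("a<b?x=1&y=é", "xhtml"))
print(escape_uri_attribute("a<b?x=1&y=é", "html"))
```

Under these assumptions the XHTML call yields an escaped less-than while the HTML call leaves it untouched, which is exactly the case where reading "HTML rules" as the rules of the HTML output method would produce output that is not well-formed XML.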
I've reduced the priority/severity to mark this as an editorial issue. I don't believe there's any doubt about the intent - that the XHTML output method should be able to escape less-than characters that appear in any attribute value.
I propose the following change:
. In Section 4, bullet 3.a.iii, change "escape according to HTML rules" to "escape according to XML or HTML rules, as determined by the applicable output method, "
At its call of 2009-11-12, the XSL WG observed that the response proposed in comment #1 did not address the question of whether "HTML rules" referred to the rules of the HTML output method or the rules of one of the HTML Recommendations. The consensus was that this referred to the rules of the output method. The following changes were proposed and accepted during the call:
. In Section 4, bullet 3.a.iii, change "escape according to HTML rules any characters (such as < and &) where HTML requires escaping" to
"escape according to the rules of the XML or HTML output method, whichever is applicable, any characters that require escaping"
. In Section 4, bullet 3.e, change "escape according to XML or HTML rules" to
"escape according to the rules of the XML or HTML output method, whichever is applicable,"
Ratification of this decision by the XQuery Working Group is pending.
As this bug is editorial, I will go ahead and make this Serialization erratum SE.E18, with the edits as described in comment #2. I will apply the changes in the next revision of the Serialization 1.0 errata document as well as to the working draft of Serialization 1.1.