Errata for XSLT 2.0 and XQuery 1.0 Serialization

10 April 2009

Latest version:
http://www.w3.org/XML/2007/qt-errata/xslt-xquery-serialization-errata.html
Editors:
Scott Boag, IBM Watson Research Center http://www.watson.ibm.com/
Henry Zongaro, IBM Toronto Software Lab http://www.ibm.com/software/ca/en/torontolab/

Abstract

This document addresses errors in the XSLT 2.0 and XQuery 1.0 Serialization Recommendation published on 23 January 2007. It records all errors that, at the time of this document's publication, have solutions that have been approved by the XSL Working Group and the XML Query Working Group. For updates, see the latest version of this document.

The errata are numbered, and are listed in reverse chronological order of their date of origin. Each erratum is classified as Substantive, Editorial, or Markup. These categories are defined as follows:

Each entry contains the following information:

Colored boxes and shading are used to help distinguish new text from old; however, these visual cues are not essential to an understanding of the change. The styling of old and new text approximates its appearance in the published Recommendation, but is not normative. Hyperlinks are shown underlined in the erratum text, but the links are not live.

A number of indexes appear at the end of the document.

Substantive corrections are proposed by the XSL Working Group and the XML Query Working Group (part of the XML Activity), where there is consensus that they are appropriate; they are not to be considered normative until approved by a Call for Review of Proposed Corrections or a Call for Review of an Edited Recommendation.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string [SEerrata] in the subject line of your report, whether made in Bugzilla or in email. Each Bugzilla entry and email message should contain only one error report. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Status of this Document

This is a public draft. None of the errata reported in this document have been approved by a Call for Review of Proposed Corrections or a Call for Review of an Edited Recommendation. As a consequence, they must not be considered to be normative.

The Working Group does not intend to progress these errata to normative status; instead, it intends to publish a second edition of the Recommendation incorporating these errata, and to progress the second edition to normative status.

Table of Contents

  Errata

     SE.E11   This erratum makes clear which parts of the recommendation are not considered to be normative.

     SE.E10   This erratum specifies the syntactic constraints on the values of the doctype-public and doctype-system serialization parameters.

     SE.E9   This erratum makes previously non-normative text that describes how the xhtml and html output methods must behave if the indent parameter has the value yes into normative text.

     SE.E8   This erratum ensures that Unicode normalization applies to all characters that might be adjacent in the serialized result produced by the text output method, including those that are in text nodes that are separated by element nodes in the data model instance.

     SE.E7   This erratum clarifies how elements with empty content models are to be serialized under the HTML and XHTML output methods.

     SE.E6   This erratum ensures that the sequence normalization process preserves any type annotations associated with nodes in the input sequence.

     SE.E5   This erratum aligns the description of the effect of the include-content-type serialization parameter of the HTML output method with that of the XHTML output method.

     SE.E4   This erratum clarifies how descendant elements of an XML island must be serialized according to the HTML output method.

     SE.E3   This erratum corrects an editorial error concerning the currently registered XHTML media types.

     SE.E2   This erratum corrects an editorial error concerning the number of phases of serialization.

     SE.E1   This erratum places constraints on the type of string that is valid for the doctype-public attribute of xsl:output.

  Indexes

    Index by affected section

    Index by Bugzilla entry


SE.E11 - editorial

See Bug 6376

Description

This erratum makes clear which parts of the recommendation are not considered to be normative.

History

6 Jan 2009: Proposed

29 Jan 2009: Proposed

5 Feb 2009: Accepted

Change

In 1.1 Terminology (second paragraph):

Insert after the text:

[Definition] As is indicated in 10 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria. A serializer is not REQUIRED to directly provide a programming interface that permits a user to set serialization parameters or to provide an input sequence for serialization.

The following:

In this document, material labeled as "Note" and examples are provided for explanatory purposes and are not normative.

SE.E10 - substantive

See Bug 6466

Description

This erratum specifies the syntactic constraints on the values of the doctype-public and doctype-system serialization parameters.

History

29 Jan 2009: Proposed

5 Feb 2009: Proposed

10 Feb 2009: Accepted

Changes

  1. In 3 Serialization Parameters (first table, first table body, third row, second column):

    Replace the text:

    A string of Unicode characters. This parameter may be absent.

    With:

    A string of PubidChar (XML) characters. This parameter may be absent.

  2. In 3 Serialization Parameters (first table, first table body, fourth row, second column):

    Replace the text:

    A string of Unicode characters. This parameter may be absent.

    With:

    A string of Unicode characters that does not include both an apostrophe (#x27) and a quotation mark (#x22) character. This parameter may be absent.
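
The two constraints above can be checked mechanically. The following Python sketch is illustrative only and is not part of the erratum. The function names are hypothetical, and the character class is transcribed from the PubidChar production of XML 1.0.

```python
import re

# PubidChar production from XML 1.0:
#   #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
_PUBID_CHARS = re.compile(r"[ \r\na-zA-Z0-9\-'()+,./:=?;!*#@$_%]*")


def is_valid_doctype_public(value: str) -> bool:
    """Return True if every character of value is a PubidChar."""
    return _PUBID_CHARS.fullmatch(value) is not None


def is_valid_doctype_system(value: str) -> bool:
    """Return True unless value contains both an apostrophe and a quotation mark."""
    return not ("'" in value and '"' in value)
```

A system identifier containing only one of the two quote characters remains acceptable, presumably because the serializer can then delimit the literal with the other one.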

SE.E9 - substantive

See Bug 5993

Description

This erratum makes previously non-normative text that describes how the xhtml and html output methods must behave if the indent parameter has the value yes into normative text.

History

2 Oct 2008: Proposed

10 Feb 2009: Accepted

Changes

  1. In 6.1.3 XHTML Output Method: the indent Parameter (first paragraph):

    Replace the text:

    If the indent parameter has the value yes, the serializer MAY add or remove whitespace as it serializes the result tree, so long as it does not change the way that a conforming HTML user agent would render the output.

    With:

    If the indent parameter has the value yes, the serializer MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

    • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

    • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those elements in the XHTML namespace in the %inline category of any of the XHTML 1.0 DTD's, in the %inline.class category of the XHTML 1.1 DTD, and elements in the XHTML namespace with local names ins and del if they are used as inline elements (i.e., if they do not contain element children).

    • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those in the XHTML namespace with local names pre, script, style, and textarea.

  2. In 6.1.3 XHTML Output Method: the indent Parameter (starting at first note, first paragraph):

    Replace the text:

    This rule can be satisfied by observing the following constraints:

    • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

    • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those elements in the XHTML namespace in the %inline category of any of the XHTML 1.0 DTD's, in the %inline.class category of the XHTML 1.1 DTD, and elements in the XHTML namespace with local names ins and del if they are used as inline elements (i.e., if they do not contain element children).

    • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those in the XHTML namespace with local names pre, script, style, and textarea.

    With:

    The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.

  3. In 7.4.3 HTML Output Method: the indent Parameter (first paragraph):

    Replace the text:

    If the indent parameter has the value yes, then the HTML output method MAY add or remove whitespace as it serializes the result tree, so long as it does not change the way that a conforming HTML user agent would render the output.

    With:

    If the indent parameter has the value yes, then the HTML output method MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

    • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

    • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTD's, as well as the ins and del elements if they are used as inline elements (i.e., if they do not contain element children).

    • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.

  4. In 7.4.3 HTML Output Method: the indent Parameter (starting at first note, first paragraph):

    Replace the text:

    This rule can be satisfied by observing the following constraints:

    • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

    • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTD's, as well as the ins and del elements if they are used as inline elements (i.e., if they do not contain element children).

    • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.

    With:

    The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.
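
As a rough illustration of how a serializer might apply the constraints that this erratum makes normative, the following Python sketch decides whether whitespace may be added before or after an element under the HTML output method. It is a simplification under stated assumptions: the function names are hypothetical, the inline set is only a small subset of the HTML 4.01 %inline category, and element names are compared case-insensitively, which suits HTML but not XHTML.

```python
from typing import Iterable

# Formatted elements named in the constraints above.
FORMATTED = {"pre", "script", "style", "textarea"}

# Illustrative subset of the HTML 4.01 %inline category (not the full list).
INLINE_SUBSET = {"a", "b", "code", "em", "i", "img", "span", "strong"}


def is_inline(name: str, has_element_children: bool = False) -> bool:
    """Treat ins and del as inline only when they have no element children."""
    name = name.lower()
    if name in ("ins", "del"):
        return not has_element_children
    return name in INLINE_SUBSET


def may_indent_around(name: str, ancestors: Iterable[str],
                      has_element_children: bool = False) -> bool:
    """Return True if whitespace may be added before or after the element."""
    if any(a.lower() in FORMATTED for a in ancestors):
        return False  # the element sits inside pre, script, style, or textarea
    return not is_inline(name, has_element_children)
```

For example, under these assumptions a p element directly inside body may be indented, whereas an em element inside that p may not.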

SE.E8 - substantive

See Bug 5441

Description

This erratum ensures that Unicode normalization applies to all characters that might be adjacent in the serialized result produced by the text output method, including those that are in text nodes that are separated by element nodes in the data model instance.

History

13 Mar 2008: Proposed

10 Feb 2009: Accepted

Changes

  1. In 4 Phases of Serialization (first numbered list, second item, second paragraph):

    Replace the text:

    In the cases of the XML and XHTML output methods, this phase also produces the following:

    • the XML or text declaration; and

    • empty element tags (except for the attribute values);

    In the case of the text output method, this phase has no effect.

    With:

    In the cases of the XML and XHTML output methods, this phase also produces the following:

    • the XML or text declaration; and

    • empty element tags (except for the attribute values);

    In the case of the text output method, this phase replaces the single document node produced by sequence normalization with a new document node that has exactly one child, which is a text node. The string value of the new text node is the string value of the document node that was produced by sequence normalization.

  2. In 8 Text Output Method (first paragraph):

    Replace the text:

    The Text output method serializes the instance of the data model by outputting the string value of the document node created by sequence normalization, without any escaping.

    With:

    The Text output method serializes the instance of the data model by outputting the string value of the document node created by the markup generation step of the phases of serialization without any escaping.
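
The practical effect of merging the string value into a single text node can be seen with a small Python sketch. It is illustrative only: it assumes that the normalization-form parameter is NFC, and it uses Python's ElementTree in place of a real data model instance.

```python
import unicodedata
import xml.etree.ElementTree as ET

# "e" and a combining acute accent (U+0301) sit in different text nodes,
# separated by an element boundary.
doc = ET.fromstring("<doc>e<x>&#x301;</x></doc>")
text_nodes = list(doc.itertext())

# Normalizing each text node on its own leaves the pair uncomposed.
per_node = "".join(unicodedata.normalize("NFC", t) for t in text_nodes)

# Normalizing the single merged string value composes them into U+00E9.
merged = unicodedata.normalize("NFC", "".join(text_nodes))
```

Here per_node keeps two characters, while merged holds the single precomposed character, which is the result the erratum is intended to guarantee for the text output method.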

SE.E7 - substantive

See Bug 5300

Description

This erratum clarifies how elements with empty content models are to be serialized under the HTML and XHTML output methods.

History

3 Dec 2007: Proposed

18 Mar 2008: Accepted

Changes

  1. In 6 XHTML Output Method (first bulleted list, second item):

    Replace the text:

    Given an XHTML element whose content model is EMPTY, the serializer MUST use the minimized tag syntax, for example <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents. The serializer MUST include a space before the trailing />, e.g. <br />, <hr /> and <img src="karen.jpg"  alt="Karen" />.

    With:

    If an element that has no children is an XHTML element with an EMPTY content model, the serializer MUST use the minimized tag syntax, for example <br />, as the alternative syntax <br></br> allowed by XML gives uncertain results in many existing user agents. The serializer MUST include a space before the trailing />, e.g. <br />, <hr /> and <img src="karen.jpg" alt="Karen" />.

  2. In 7.1 Markup for Elements (second paragraph):

    Replace the text:

    The HTML output method MUST NOT output an end-tag for empty elements. For HTML 4.0, the empty elements are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.

    With:

    The HTML output method MUST NOT output an end-tag for an empty element if the element type has an empty content model. For HTML 4.0, the element types that have an empty content model are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.
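
The difference between the two output methods for childless elements can be sketched in Python as follows. This is not a normative algorithm: the function name is hypothetical, attributes are ignored, the HTML 4.0 list above is reused for XHTML as a simplifying assumption, and the start-tag/end-tag form for other childless elements is assumed rather than taken from this erratum.

```python
# Element types with an empty content model in HTML 4.0, as listed above.
EMPTY_CONTENT_MODEL = {
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
}


def serialize_childless(name: str, method: str) -> str:
    """Serialize an attribute-less element that has no children."""
    if method not in ("html", "xhtml"):
        raise ValueError(f"unsupported method: {method}")
    if name in EMPTY_CONTENT_MODEL:
        # XHTML: minimized tag with a space before "/>"; HTML: no end-tag.
        return f"<{name} />" if method == "xhtml" else f"<{name}>"
    # Assumption: other childless elements keep a start-tag/end-tag pair.
    return f"<{name}></{name}>"
```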

SE.E6 - substantive

See Bug 5458

Description

This erratum ensures that the sequence normalization process preserves any type annotations associated with nodes in the input sequence.

History

11 Mar 2008: Proposed

18 Mar 2008: Accepted

Changes

  1. In 2 Sequence Normalization (third paragraph):

    Replace the text:

    Where the process of converting the input sequence to a normalized sequence indicates that a value MUST be cast to xs:string, that operation is defined in Section 17.1.2 Casting to xs:string and xs:untypedAtomic of [XQuery 1.0 and XPath 2.0 Functions and Operators]. The steps in computing the normalized sequence are:

    With:

    Where the process of converting the input sequence to a normalized sequence indicates that a value MUST be cast to xs:string, that operation is defined in Section 17.1.2 Casting to xs:string and xs:untypedAtomic of [XQuery 1.0 and XPath 2.0 Functions and Operators]. Where a step in the sequence normalization process indicates that a node should be copied, the copy is performed in the same way as an XSLT xsl:copy-of instruction that has a validation attribute whose value is preserve and has a select attribute whose effective value is the node, as described in Section 11.9.2 Deep Copy of [XSL Transformations (XSLT) Version 2.0], or equivalently in the same way as an XQuery content expression as described in Step 1e of Section 3.7.1.3 Content of [XQuery 1.0: An XML Query Language], where the construction mode is preserve. The steps in computing the normalized sequence are:

  2. In 2 Sequence Normalization (first note, first code section):

    Replace the text:

    <xsl:document>
      <xsl:copy-of select="$seq"/>
    </xsl:document>

    With:

    <xsl:document>
      <xsl:copy-of select="$seq" validation="preserve"/>
    </xsl:document>

  3. In 2 Sequence Normalization (first note, second code section):

    Replace the text:

    document {
      for $s in $seq return
        if ($s instance of document-node())
        then $s/child::node()
        else $s
    }

    With:

    declare construction preserve;
    
    document {
      for $s in $seq return
        if ($s instance of document-node())
        then $s/child::node()
        else $s
    }

  4. In 5.1.3 XML Output Method: the indent Parameter (first bulleted list, fourth item):

    Replace the text:

    Whitespace characters SHOULD NOT be added in places where the characters would constitute significant whitespace, for example, in the content of an element whose content model is known to be mixed.

    With:

    Whitespace characters SHOULD NOT be added in places where the characters would constitute significant whitespace, for example, in the content of an element that is annotated with a type other than xs:untyped or xs:anyType, and whose content model is known to be mixed.

SE.E5 - substantive

See Bug 5439

Description

This erratum aligns the description of the effect of the include-content-type serialization parameter of the HTML output method with that of the XHTML output method.

History

4 Feb 2008: Proposed

18 Mar 2008: Accepted

Change

In 7.4.13 HTML Output Method: the include-content-type Parameter (third paragraph):

Replace the text:

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type" MUST be discarded.

With:

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without consideration of case and leading or trailing spaces, MUST be discarded.
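
The comparison described in the replacement text might be implemented along the following lines. This Python sketch is illustrative only, and the function name is hypothetical.

```python
def is_content_type_http_equiv(value: str) -> bool:
    """Compare an http-equiv value with "Content-Type" as the erratum describes."""
    # Assumption: str.strip() removes all leading and trailing whitespace,
    # which is slightly broader than the "spaces" named in the erratum.
    return value.strip().lower() == "content-type"
```

Under this reading, a value such as " content-TYPE " identifies a meta element that must be discarded, whereas "Content Type" (with an internal space) does not.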

SE.E4 - substantive

See Bug 5433

Description

This erratum clarifies how descendant elements of an XML island must be serialized according to the HTML output method.

History

4 Feb 2008: Proposed

18 Mar 2008: Accepted

Change

In 7.1 Markup for Elements (first numbered list, fifth item):

Insert after the text:

When serializing an element whose name is in a non-null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element MUST be serialized as if they were descendants of a div element.

The following:

, except for the influence of the cdata-section-elements serialization parameter on any text node children of the element.

SE.E3 - editorial

See Bug 5066

Description

This erratum corrects an editorial error concerning the currently registered XHTML media types.

History

24 Oct 2007: Proposed

20 Nov 2007: Accepted

Change

In 6.1.13 XHTML Output Method: the include-content-type Parameter (first note, second code):

Replace the text:

application/xhtml-xml

With:

application/xhtml+xml

SE.E2 - editorial

See Bug 4557

Description

This erratum corrects an editorial error concerning the number of phases of serialization.

History

15 May 2007: Proposed

20 Nov 2007: Accepted

Change

In 4 Phases of Serialization (first paragraph):

Replace the text:

Serialization comprises three phases of processing (preceded optionally by the sequence normalization process described in 2 Sequence Normalization).

With:

Serialization comprises five phases of processing (preceded optionally by the sequence normalization process described in 2 Sequence Normalization).

SE.E1 - substantive

Superseded by Erratum SE.E10

See Bug 4372

Description

This erratum places constraints on the type of string that is valid for the doctype-public attribute of xsl:output.

History

15 May 2007: Proposed

15 Nov 2007: Amended

20 Nov 2007: Accepted

Change

In 3 Serialization Parameters (first table, first table body, third row, second column):

Insert after the text:

A string of Unicode characters. This parameter may be absent.

The following:

It is an error if doctype-public does not conform to the syntax of PubidLiteral (XML).

Index by affected section

1.1 Terminology

SE.E11

2 Sequence Normalization

SE.E6

3 Serialization Parameters

SE.E1 SE.E10

4 Phases of Serialization

SE.E2 SE.E8

5.1.3 XML Output Method: the indent Parameter

SE.E6

6 XHTML Output Method

SE.E7

6.1.3 XHTML Output Method: the indent Parameter

SE.E9

6.1.13 XHTML Output Method: the include-content-type Parameter

SE.E3

7.1 Markup for Elements

SE.E4 SE.E7

7.4.3 HTML Output Method: the indent Parameter

SE.E9

7.4.13 HTML Output Method: the include-content-type Parameter

SE.E5

8 Text Output Method

SE.E8

Index by Bugzilla entry

Bug #4372: SE.E1

Bug #4557: SE.E2

Bug #5066: SE.E3

Bug #5300: SE.E7

Bug #5433: SE.E4

Bug #5439: SE.E5

Bug #5441: SE.E8

Bug #5458: SE.E6

Bug #5993: SE.E9

Bug #6376: SE.E11

Bug #6466: SE.E10