W3C

XSLT and XQuery Serialization 3.0

W3C Working Draft 14 June 2011

This version:
http://www.w3.org/TR/2011/WD-xslt-xquery-serialization-30-20110614/
Latest version:
http://www.w3.org/TR/xslt-xquery-serialization-30/
Previous versions:
http://www.w3.org/TR/2010/WD-xslt-xquery-serialization-30-20101214/, http://www.w3.org/TR/2009/WD-xslt-xquery-serialization-11-20091215/
Editor:
Henry Zongaro, IBM Canada Lab - Toronto Site <http://www.ibm.com/software/ca/en/canadalabs/toronto_lab.html>

See also translations.

This document is also available in these non-normative formats: XML and Change markings relative to previous Working Draft.


Abstract

This document defines serialization of an instance of the data model as defined in [XQuery and XPath Data Model (XDM) 3.0] into a sequence of octets. Serialization is designed to be a component that can be used by other specifications such as [XSL Transformations (XSLT) Version 3.0] or [XQuery 3.0: An XML Query Language].

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is one document in a set of seven documents that are being progressed to Recommendation together (XQuery 3.0, XQueryX 3.0, XSLT 3.0, Data Model 3.0, Functions and Operators 3.0, Serialization 3.0, XPath 3.0).

This is a Working Draft as described in the Process Document. It was jointly developed by the W3C XSL Working Group and the W3C XML Query Working Group, each of which is part of the XML Activity. The Working Groups expect to advance this specification to Recommendation Status.

This public Working Draft makes a number of substantive technical changes (as well as many editorial changes), including new features, adopted since the previous Working Draft was published. Please note that this Working Draft of XSLT and XQuery Serialization 3.0 represents the second version of a previous W3C Recommendation.

This specification is designed to be referenced normatively from other specifications defining a host language for it; it is not intended to be implemented outside a host language. The implementability of this specification has been tested in the context of its normative inclusion in host languages defined by the XQuery 3.0 and XSLT 3.0 (expected in July/August 2011) specifications; see the XQuery 3.0 implementation report for details. The Working Groups expect that there will also be a member-only XSLT 3.0 implementation report in the future.

This document incorporates changes made against the previous publication of the Working Draft. Changes to this document since the previous publication of the Working Draft are detailed in F Revision Log.

Please report errors in this document using W3C's public Bugzilla system (instructions can be found at http://www.w3.org/XML/2005/04/qt-bugzilla). If access to that system is not feasible, you may send your comments to the W3C XSLT/XPath/XQuery public comments mailing list, public-qt-comments@w3.org. It will be very helpful if you include the string “[SER30]” in the subject line of your report, whether made in Bugzilla or in email. Please use multiple Bugzilla entries (or, if necessary, multiple email messages) if you have more than one comment to make. Archives of the comments and responses are available at http://lists.w3.org/Archives/Public/public-qt-comments/.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by groups operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the XML Query Working Group and also maintains a public list of any patent disclosures made in connection with the deliverables of the XSL Working Group; those pages also include instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Table of Contents

1 Introduction
    1.1 Terminology
2 Sequence Normalization
3 Serialization Parameters
    3.1 Setting Serialization Parameters by Means of a Data Model Instance
4 Phases of Serialization
5 XML Output Method
    5.1 The Influence of Serialization Parameters upon the XML Output Method
        5.1.1 XML Output Method: the version Parameter
        5.1.2 XML Output Method: the encoding Parameter
        5.1.3 XML Output Method: the indent and suppress-indentation Parameters
        5.1.4 XML Output Method: the cdata-section-elements Parameter
        5.1.5 XML Output Method: the omit-xml-declaration and standalone Parameters
        5.1.6 XML Output Method: the doctype-system and doctype-public Parameters
        5.1.7 XML Output Method: the undeclare-prefixes Parameter
        5.1.8 XML Output Method: the normalization-form Parameter
        5.1.9 XML Output Method: the media-type Parameter
        5.1.10 XML Output Method: the use-character-maps Parameter
        5.1.11 XML Output Method: the byte-order-mark Parameter
        5.1.12 XML Output Method: the escape-uri-attributes Parameter
        5.1.13 XML Output Method: the include-content-type Parameter
6 XHTML Output Method
    6.1 The Influence of Serialization Parameters upon the XHTML Output Method
        6.1.1 XHTML Output Method: the version Parameter
        6.1.2 XHTML Output Method: the encoding Parameter
        6.1.3 XHTML Output Method: the indent and suppress-indentation Parameters
        6.1.4 XHTML Output Method: the cdata-section-elements Parameter
        6.1.5 XHTML Output Method: the omit-xml-declaration and standalone Parameters
        6.1.6 XHTML Output Method: the doctype-system and doctype-public Parameters
        6.1.7 XHTML Output Method: the undeclare-prefixes Parameter
        6.1.8 XHTML Output Method: the normalization-form Parameter
        6.1.9 XHTML Output Method: the media-type Parameter
        6.1.10 XHTML Output Method: the use-character-maps Parameter
        6.1.11 XHTML Output Method: the byte-order-mark Parameter
        6.1.12 XHTML Output Method: the escape-uri-attributes Parameter
        6.1.13 XHTML Output Method: the include-content-type Parameter
7 HTML Output Method
    7.1 Markup for Elements
    7.2 Writing Attributes
    7.3 Writing Character Data
    7.4 The Influence of Serialization Parameters upon the HTML Output Method
        7.4.1 HTML Output Method: the version Parameter
        7.4.2 HTML Output Method: the encoding Parameter
        7.4.3 HTML Output Method: the indent and suppress-indentation Parameters
        7.4.4 HTML Output Method: the cdata-section-elements Parameter
        7.4.5 HTML Output Method: the omit-xml-declaration and standalone Parameters
        7.4.6 HTML Output Method: the doctype-system and doctype-public Parameters
        7.4.7 HTML Output Method: the undeclare-prefixes Parameter
        7.4.8 HTML Output Method: the normalization-form Parameter
        7.4.9 HTML Output Method: the media-type Parameter
        7.4.10 HTML Output Method: the use-character-maps Parameter
        7.4.11 HTML Output Method: the byte-order-mark Parameter
        7.4.12 HTML Output Method: the escape-uri-attributes Parameter
        7.4.13 HTML Output Method: the include-content-type Parameter
8 Text Output Method
    8.1 The Influence of Serialization Parameters upon the Text Output Method
        8.1.1 Text Output Method: the version Parameter
        8.1.2 Text Output Method: the encoding Parameter
        8.1.3 Text Output Method: the indent and suppress-indentation Parameters
        8.1.4 Text Output Method: the cdata-section-elements Parameter
        8.1.5 Text Output Method: the omit-xml-declaration and standalone Parameters
        8.1.6 Text Output Method: the doctype-system and doctype-public Parameters
        8.1.7 Text Output Method: the undeclare-prefixes Parameter
        8.1.8 Text Output Method: the normalization-form Parameter
        8.1.9 Text Output Method: the media-type Parameter
        8.1.10 Text Output Method: the use-character-maps Parameter
        8.1.11 Text Output Method: the byte-order-mark Parameter
        8.1.12 Text Output Method: the escape-uri-attributes Parameter
        8.1.13 Text Output Method: the include-content-type Parameter
9 Character Maps
10 Conformance

Appendices

A References
    A.1 Normative References
    A.2 Informative References
B Schema for Serialization Parameters
C Summary of Error Conditions
D List of URI Attributes
E Checklist of Implementation-Defined Features (Non-Normative)
F Revision Log (Non-Normative)
    F.1 Changes since XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)
        F.1.1 Changes applied for the first Public Working Draft
        F.1.2 Changes applied for the second Public Working Draft
        F.1.3 Changes applied for the third Public Working Draft
    F.2 Changes incorporated in the Second Edition


1 Introduction

This document defines serialization of the W3C XQuery and XPath Data Model 3.0 (XDM), which is the data model of at least [XML Path Language (XPath) 3.0], [XSL Transformations (XSLT) Version 3.0], and [XQuery 3.0: An XML Query Language], and any other specifications that reference it.

In this document, examples and material labeled as "Note" are provided for explanatory purposes and are not normative.

Serialization is the process of converting an instance of the [XQuery and XPath Data Model (XDM) 3.0] into a sequence of octets. Serialization is well-defined for most data model instances.

1.1 Terminology

In this specification, where they appear in upper case, the words "MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY", "REQUIRED", and "RECOMMENDED" are to be interpreted as described in [RFC2119].

[Definition: As is indicated in 10 Conformance, conformance criteria for serialization are determined by other specifications that refer to this specification. A serializer is software that implements some or all of the requirements of this specification in accordance with such conformance criteria.] A serializer is not REQUIRED to directly provide a programming interface that permits a user to set serialization parameters or to provide an input sequence for serialization.

Certain aspects of serialization are described in this specification as implementation-defined or implementation-dependent.

[Definition: Implementation-defined indicates an aspect that MAY differ between serializers, but whose actual behavior MUST be specified either by another specification that sets conformance criteria for serialization (see 10 Conformance) or in documentation that accompanies the serializer.]

[Definition: Implementation-dependent indicates an aspect that MAY differ between serializers, and whose actual behavior is not REQUIRED to be specified either by another specification that sets conformance criteria for serialization (see 10 Conformance) or in documentation that accompanies the serializer.]

[Definition: In some instances, the sequence that is input to serialization cannot be successfully converted into a sequence of octets given the set of serialization parameter (3 Serialization Parameters) values specified. A serialization error is said to occur in such an instance.] In some cases, a serializer is REQUIRED to signal such an error. What it means to signal a serialization error is determined by the relevant conformance criteria (10 Conformance) to which the serializer conforms. In other cases, there is an implementation-defined choice between signaling a serialization error and performing a recovery action. Such a recovery action will allow a serializer to produce a sequence of octets that might not fully reflect the usual requirements of the parameter settings that are in effect.

[Definition: Where this specification indicates that two strings are to be compared without regard to case, the serializer MUST translate any characters in the range #x41 (LATIN CAPITAL LETTER A) to #x5A (LATIN CAPITAL LETTER Z), inclusive, to the corresponding lower-case letters in the range #x61 (LATIN SMALL LETTER A) to #x7A (LATIN SMALL LETTER Z) only for the purposes of making the comparison. The comparison succeeds if the two strings are the same length and the code point of each character in the first string is equal to the code point of the character in the corresponding position in the second string.]
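As a non-normative illustration, the comparison described above can be sketched in Python. The function names are hypothetical and are not defined by this specification. The sketch folds only the letters A to Z, which is why it does not use Python's built-in str.lower(), since that method also changes characters outside the range #x41 to #x5A.

```python
def fold_ascii_upper(s: str) -> str:
    """Translate only #x41-#x5A (A-Z) to #x61-#x7A (a-z); leave all else alone."""
    return "".join(
        chr(ord(ch) + 0x20) if "A" <= ch <= "Z" else ch
        for ch in s
    )


def equal_ignoring_case(a: str, b: str) -> bool:
    """Compare two strings without regard to case, in the sense defined above."""
    fa, fb = fold_ascii_upper(a), fold_ascii_upper(b)
    # Same length, and equal code points at every corresponding position.
    return len(fa) == len(fb) and all(x == y for x, y in zip(fa, fb))
```

Under this sketch, "UTF-8" and "utf-8" compare as equal, while "É" and "é" do not, because neither character lies in the translated range.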

Many terms used in this document are defined in the XPath specification [XML Path Language (XPath) 3.0] or the Data Model specification [XQuery and XPath Data Model (XDM) 3.0]. Particular attention is drawn to the following:

2 Sequence Normalization

An instance of the data model that is input to the serialization process is a sequence. Prior to serializing a sequence using any of the output methods whose behavior is specified by this document (3 Serialization Parameters), the serializer MUST first compute a normalized sequence for serialization; it is the normalized sequence that is actually serialized. [Definition: The purpose of sequence normalization is to create a sequence that can be serialized as a well-formed XML document or external general parsed entity, that also reflects the content of the input sequence to the extent possible.] [Definition: The result of the sequence normalization process is a result tree.]

The normalized sequence for serialization is constructed by applying all of the following rules in order, with the initial sequence being input to the first step, and the sequence that results from any step being used as input to the subsequent step. For any implementation-defined output method, it is implementation-defined whether this sequence normalization process takes place.

Where the process of converting the input sequence to a normalized sequence indicates that a value MUST be cast to xs:string, that operation is defined in Section 18.1.2 Casting to xs:string and xs:untypedAtomic FO30 of [XQuery and XPath Functions and Operators 3.0]. Where a step in the sequence normalization process indicates that a node should be copied, the copy is performed in the same way as an XSLT xsl:copy-of instruction that has a validation attribute whose value is preserve and has a select attribute whose effective value is the node, as described in Section 11.9.2 Deep Copy XT30 of [XSL Transformations (XSLT) Version 3.0], or equivalently in the same way as an XQuery content expression as described in Step 1e of Section 3.8.1.3 Content XQ30 of [XQuery 3.0: An XML Query Language], where the construction mode is preserve. The steps in computing the normalized sequence are:

  1. If the sequence that is input to serialization is empty, create a sequence S1 that consists of a zero-length string. Otherwise, copy each item in the sequence that is input to serialization to create the new sequence S1.

  2. For each item in S1, if the item is atomic, obtain the lexical representation of the item by casting it to an xs:string and copy the string representation to the new sequence; otherwise, copy the item, which will be a node, to the new sequence. The new sequence is S2.

  3. For each subsequence of adjacent strings in S2, copy a single string to the new sequence equal to the values of the strings in the subsequence concatenated in order, each separated by a single space. Copy all other items to the new sequence. The new sequence is S3.

  4. For each item in S3, if the item is a string, create a text node in the new sequence whose string value is equal to the string; otherwise, copy the item to the new sequence. The new sequence is S4.

  5. For each item in S4, if the item is a document node, copy its children to the new sequence; otherwise, copy the item to the new sequence. The new sequence is S5.

  6. For each subsequence of adjacent text nodes in S5, copy a single text node to the new sequence equal to the values of the text nodes in the subsequence concatenated in order. Any text nodes with values of zero length are dropped. Copy all other items to the new sequence. The new sequence is S6.

  7. It is a serialization error [err:SENR0001] if an item in S6 is an attribute node, a namespace node or a function item. Otherwise, construct a new sequence, S7, that consists of a single document node and copy all the items in the sequence, which are all nodes, as children of that document node.

S7 is the normalized sequence.

The result tree rooted at the document node that is created by the final step of this sequence normalization process is the instance of the data model to which the rules of the appropriate output method are applied. If the sequence normalization process results in a serialization error, the serializer MUST signal the error.
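The seven steps above can be sketched in Python over a deliberately tiny stand-in for the data model. This is an illustration under stated assumptions, not a conforming implementation: the classes Text, Attribute, Element and Document, the helper cast_to_string, and the use of Python callables to stand for function items are all hypothetical. Namespace nodes are not modelled, and the cast to xs:string is approximated (only booleans are special-cased).

```python
import copy
from dataclasses import dataclass, field


class SerializationError(Exception):
    """Raised where the specification calls for a serialization error."""


@dataclass
class Text:
    value: str


@dataclass
class Attribute:
    name: str
    value: str


@dataclass
class Element:
    name: str
    children: list = field(default_factory=list)


@dataclass
class Document:
    children: list = field(default_factory=list)


NODE_TYPES = (Text, Attribute, Element, Document)


def cast_to_string(atom) -> str:
    # Assumption: a rough stand-in for the xs:string cast.
    if isinstance(atom, bool):
        return "true" if atom else "false"
    return str(atom)


def normalize(seq) -> Document:
    items = list(seq)
    # Step 1: an empty input becomes a single zero-length string.
    s1 = [copy.deepcopy(i) for i in items] if items else [""]
    # Step 2: atomic values become strings; nodes (and callables) pass through.
    s2 = [i if isinstance(i, NODE_TYPES) or callable(i) else cast_to_string(i)
          for i in s1]
    # Step 3: join each run of adjacent strings, separated by single spaces.
    s3 = []
    for i in s2:
        if isinstance(i, str) and s3 and isinstance(s3[-1], str):
            s3[-1] = s3[-1] + " " + i
        else:
            s3.append(i)
    # Step 4: each string becomes a text node.
    s4 = [Text(i) if isinstance(i, str) else i for i in s3]
    # Step 5: each document node is replaced by its children.
    s5 = []
    for i in s4:
        s5.extend(i.children if isinstance(i, Document) else [i])
    # Step 6: merge adjacent text nodes, then drop zero-length ones.
    s6 = []
    for i in s5:
        if isinstance(i, Text) and s6 and isinstance(s6[-1], Text):
            s6[-1] = Text(s6[-1].value + i.value)
        else:
            s6.append(i)
    s6 = [i for i in s6 if not (isinstance(i, Text) and i.value == "")]
    # Step 7: attribute nodes and function items are errors; wrap the rest.
    for i in s6:
        if isinstance(i, Attribute) or callable(i):
            raise SerializationError("err:SENR0001")
    return Document(children=s6)
```

For example, the input sequence (1, 2, an element a, "x", a document containing the text "y" and an element b) yields a document whose children are the text node "1 2", the element a, the text node "xy" and the element b.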

Note:

The sequence normalization process for a sequence $seq is equivalent to constructing a document node using the XSLT instruction:

<xsl:document>
  <xsl:copy-of select="$seq" validation="preserve"/>
</xsl:document>

or the XQuery expression:

declare construction preserve;

document {
  for $s in $seq return
    if ($s instance of document-node())
    then $s/child::node()
    else $s
}

This process results in a serialization error [err:SENR0001] if the sequence contains parentless attribute or namespace nodes.

3 Serialization Parameters

There are a number of parameters that influence how serialization is performed. Host languages MAY allow users to specify any or all of these parameters, but they are not REQUIRED to be able to do so. However, the host language specification MUST specify how the value of all applicable parameters is to be determined.

It is a serialization error [err:SEPM0016] if a parameter value is invalid for the given parameter. It is the responsibility of the host language to specify how invalid values should be handled at the level of that language.

The following serialization parameters are defined:

Serialization parameter name
    Permitted values for parameter
byte-order-mark
    One of the enumerated values yes or no. This parameter indicates whether the serialized sequence of octets is to be preceded by a Byte Order Mark. (See Section 5.1 of [Unicode Encoding].) The actual octet order used is implementation-dependent. If the encoding defines no Byte Order Mark, or if the Byte Order Mark is prohibited for the specific Unicode encoding or implementation environment, then this parameter is ignored.
cdata-section-elements
    A list of expanded QNames, possibly empty.
doctype-public
    A string of PubidChar XML characters. This parameter may be absent.
doctype-system
    A string of Unicode characters that does not include both an apostrophe (#x27) and a quotation mark (#x22) character. This parameter may be absent.
encoding
    A string of Unicode characters in the range #x21 to #x7E (that is, printable ASCII characters); the value SHOULD be a charset registered with the Internet Assigned Numbers Authority [IANA], [RFC2278] or begin with the characters x- or X-.
escape-uri-attributes
    One of the enumerated values yes or no.
include-content-type
    One of the enumerated values yes or no.
indent
    One of the enumerated values yes or no.
media-type
    A string of Unicode characters specifying the media type (MIME content type) [RFC2046]; the charset parameter of the media type MUST NOT be specified explicitly in the value of the media-type parameter. If the destination of the serialized output is annotated with a media type, this parameter MAY be used to provide such an annotation. For example, it MAY be used to set the media type in an HTTP header.
method
    An expanded QName with a null namespace URI, and the local part of the name equal to one of xml, xhtml, html or text, or having a non-null namespace URI. If the namespace URI is non-null, the parameter specifies an implementation-defined output method.
normalization-form
    One of the enumerated values NFC, NFD, NFKC, NFKD, fully-normalized, none or an implementation-defined value.
omit-xml-declaration
    One of the enumerated values yes or no.
standalone
    One of the enumerated values yes, no or omit.
suppress-indentation
    A list of expanded QNames, possibly empty.
undeclare-prefixes
    One of the enumerated values yes or no.
use-character-maps
    A list of pairs, possibly empty, with each pair consisting of a single Unicode character and a string of Unicode characters.
version
    A string of Unicode characters.

The value of the method parameter is an expanded QName. If the value has a null namespace URI, then the local name identifies a method specified in this document and MUST be one of xml, html, xhtml, or text; in this case, the output method specified MUST be used for serializing. If the namespace URI is non-null, then it identifies an implementation-defined output method; the behavior in this case is not specified by this document.
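As a non-normative illustration, some of the permitted-value constraints listed above lend themselves to a simple check. The following Python sketch covers only the enumerated parameters, doctype-system, encoding and method; the function name is_valid and the representation of the method value as a (namespace URI, local name) tuple are assumptions made for this example. A host language could report a value rejected by such a check as [err:SEPM0016].

```python
YES_NO = {"yes", "no"}

# Parameters whose permitted values are a fixed enumeration.
ENUMERATED = {
    "byte-order-mark": YES_NO,
    "escape-uri-attributes": YES_NO,
    "include-content-type": YES_NO,
    "indent": YES_NO,
    "omit-xml-declaration": YES_NO,
    "undeclare-prefixes": YES_NO,
    "standalone": {"yes", "no", "omit"},
}

BUILT_IN_METHODS = {"xml", "html", "xhtml", "text"}


def is_valid(name: str, value) -> bool:
    """Check a parameter value against the permitted values listed above."""
    if name in ENUMERATED:
        return value in ENUMERATED[name]
    if name == "doctype-system":
        # Must not contain both an apostrophe and a quotation mark.
        return not ("'" in value and '"' in value)
    if name == "encoding":
        # Every character must lie in the range #x21 to #x7E.
        return all(0x21 <= ord(c) <= 0x7E for c in value)
    if name == "method":
        namespace_uri, local_name = value
        if namespace_uri:
            # Non-null namespace: an implementation-defined output method.
            return True
        return local_name in BUILT_IN_METHODS
    # Other parameters are not checked in this sketch.
    return True
```

Here is_valid("indent", "true") is false, since only yes and no are permitted, while is_valid("method", ("http://example.org/ext", "jsp")) is true because the namespace URI is non-null.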

In those cases where they have no important effect on the content of the serialized result, details of the output methods defined by this specification are left unspecified and are regarded as implementation-dependent. Whether a serializer uses apostrophes or quotation marks to delimit attribute values in the XML output method is an example of such a detail.

The detailed semantics of each parameter will be described separately for each output method for which it is applicable. If the semantics of a parameter are not described for an output method, then it is not applicable to that output method.

Implementations MAY define additional serialization parameters, and MAY allow users to do so. For this purpose, the name of a serialization parameter is considered to be a QName; the parameters listed above are QNames in no namespace, while any additional serialization parameters must have names that are namespace-qualified. If the serialization method is one of the four methods xml, html, xhtml, or text, then the additional serialization parameters MAY affect the output of the serializer to the extent (but only to the extent) that this specification leaves the output implementation-defined or implementation-dependent. For example, such parameters might control whether namespace declarations on an element are written before or after the attributes of the element, or they might define the number of space or tab characters to be inserted when the indent parameter is set to yes; but they could not instruct the serializer to suppress the error that occurs when the HTML output method encounters characters that are not permitted (see error [err:SERE0014]).

3.1 Setting Serialization Parameters by Means of a Data Model Instance

A host language MAY provide, by reference to this section, a mechanism by which the settings of serialization parameters are supplied in the form of an instance of the data model as specified in [XQuery and XPath Data Model (XDM) 3.0]. The instance of the data model used to determine the settings of serialization parameters MUST be processed as if by the procedure described below.

With the exception of the use-character-maps parameter, the setting of each serialization parameter is equal to the result of evaluating the XQuery expression

(validate lax { document { . } })
   /output:serialization-parameters
   /output:*[local-name() eq $param-name]/data(@value)

with the supplied instance of the data model as the context item, the param-name variable having as its value a value of type xs:string equal to the local part of the name of the particular serialization parameter, and the other components of the dynamic context and static context as specified in the subsequent tables. If in any case evaluating this expression would yield an error, serialization error [err:SEPM0017] results.

If the result of evaluating this expression for a particular serialization parameter is the empty sequence:

  1. if the parameter is either cdata-section-elements or suppress-indentation and the result of evaluating the expression

    (validate lax { document { . } })
       /output:serialization-parameters
       /output:*[local-name() eq $param-name]
    

    with the same settings of the static context and dynamic context is not an empty sequence, the setting of the parameter is the empty list;

  2. otherwise, the setting of the parameter is unspecified.
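The extraction rules above can be approximated without an XQuery processor. The following Python sketch uses the standard xml.etree.ElementTree module; the function name read_parameter is hypothetical, and None is used here to stand for "unspecified". It performs no schema validation, so it returns the lexical tokens of the value attribute rather than typed values, and it does not detect the conditions that would give rise to [err:SEPM0017].

```python
import xml.etree.ElementTree as ET

OUTPUT_NS = "http://www.w3.org/2010/xslt-xquery-serialization"
LIST_PARAMS = {"cdata-section-elements", "suppress-indentation"}


def read_parameter(root: ET.Element, param_name: str):
    """Return the setting of one parameter, or None if it is unspecified."""
    if root.tag != f"{{{OUTPUT_NS}}}serialization-parameters":
        return None
    # Namespace-qualified lookup, like output:*[local-name() eq $param-name].
    element = root.find(f"{{{OUTPUT_NS}}}{param_name}")
    if element is None:
        return None
    value = element.get("value")
    if param_name in LIST_PARAMS:
        # Element present but no tokens in @value: the empty list (rule 1).
        return value.split() if value else []
    # Element present without @value: treated as unspecified (rule 2).
    return value
```

With this sketch, an output:cdata-section-elements element whose value attribute is empty gives the empty list, whereas a parameter whose element does not appear at all remains unspecified.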

The components of the static context used in evaluating the XQuery expressions are as defined in the following table.

Static Context Component
    Setting
XPath compatibility mode
    false
Statically known namespaces
    The pair (output,http://www.w3.org/2010/xslt-xquery-serialization)
Default element/type namespace
    "none"
Default function namespace
    http://www.w3.org/2005/xpath-functions
In-scope schema types, In-scope element declarations, Substitution groups, In-scope attribute declarations
    As defined by the schema for serialization parameters (B Schema for Serialization Parameters) and any additional implementation-defined in-scope schema components
In-scope variables
    {param-name}
Context item static type
    node()
Function signatures
    {fn:data($arg as item()*) as xs:anyAtomicType*}
Statically known collations
    { (http://www.w3.org/2005/xpath-functions/collation/codepoint, The Unicode codepoint collation) }
Default collation
    The Unicode codepoint collation
Construction mode
    strip
Ordering mode
    ordered
Default order for empty sequences
    least
Boundary space policy
    strip
Copy-namespaces mode
    (preserve,inherit)
Base URI
    Absent
Statically known documents
    None
Statically known collections
    None
Statically known default collection type
    node()*
Statically known decimal formats
    None

The remaining components of the dynamic context used in evaluating the XQuery expressions are as defined in the following table.

Dynamic Context Component
    Setting
Context position
    1
Context size
    1
Variable values
    The param-name variable has a value of type xs:string equal to the local part of the name of the serialization parameter under consideration
Function implementations
    The implementation of fn:data
Current dateTime
    Absent
Implicit timezone
    Absent
Available documents
    None
Available collections
    None
Default collection
    None

In the case of the use-character-maps parameter, the expression

(validate lax { document { . } })
  /output:serialization-parameters/output:use-character-maps
  /output:character-map[@character eq $char]/string(@map-string)

is evaluated for each Unicode character that is permitted in an XML document. The dynamic context and static context used to evaluate the expression are as defined above, except that in-scope variables is the set {char} and the value of the variable "char" is a value of type xs:string of length one whose value is the Unicode character under consideration. If the result of evaluating the expression is not an empty sequence, the pair consisting of the Unicode character and the result of evaluating the expression is part of the list of pairs in the value of the use-character-maps parameter. It is a serialization error [err:SEPM0018] if the result of evaluating this expression for any character is a sequence of length greater than one.
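As a non-normative sketch, the list of pairs for use-character-maps can be collected in Python with xml.etree.ElementTree. Rather than evaluating the expression once per Unicode character, this example walks the output:character-map elements and groups them by character, which is assumed to give the same pairs. The names read_character_maps and SerializationError are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"output": "http://www.w3.org/2010/xslt-xquery-serialization"}


class SerializationError(Exception):
    """Raised where the specification calls for a serialization error."""


def read_character_maps(root: ET.Element) -> list:
    """Collect (character, map-string) pairs from output:use-character-maps."""
    pairs = {}
    path = "output:use-character-maps/output:character-map"
    for mapping in root.findall(path, NS):
        char = mapping.get("character")
        if char in pairs:
            # More than one mapping for the same character.
            raise SerializationError(f"err:SEPM0018: duplicate map for {char!r}")
        pairs[char] = mapping.get("map-string")
    return list(pairs.items())
```

A document that maps the same character twice is rejected by this sketch, mirroring the case in which the expression would return a sequence of length greater than one.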

Using the same settings of the components of the dynamic context and static context, serialization error [err:SEPM0019] results if the result of evaluating the following expression is not true — that is, if the data model instance specifies more than one setting for any particular serialization parameter.

(document { . })/output:serialization-parameters
   /(count(distinct-values(*/node-name(.))) eq (count(*)))
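The same check can be sketched in Python, again as a non-normative illustration. The function name duplicated_parameters is hypothetical; it returns the names of any child elements that appear more than once, so that a non-empty result corresponds to the condition reported as [err:SEPM0019].

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicated_parameters(root: ET.Element) -> list:
    """Return the expanded names of child elements that occur more than once."""
    # ElementTree tags have the form {namespace-uri}local-name, so two
    # children compare equal only if both namespace and local name match.
    counts = Counter(child.tag for child in root)
    return sorted(tag for tag, n in counts.items() if n > 1)
```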

Note:

A serializer or implementation of a host language does not need to be accompanied by an XQuery processor nor by a general-purpose schema validator in order to meet the requirements of this section. It merely needs to be capable of extracting values from an XDM instance that conforms to the schema for serialization parameters, while checking that the constraints implied by the schema and additional constraints implied by the XQuery validate expression or explicitly stated in this section are satisfied.

The host language MAY provide additional mechanisms for overriding the values of any serialization parameters specified through the mechanism defined in this section, as well as additional mechanisms for specifying the values of any serialization parameters whose values remain unspecified after applying the mechanism defined in this section.

If the instance of the data model contains elements or attributes that are in a namespace other than http://www.w3.org/2010/xslt-xquery-serialization, the implementation MAY interpret them to specify the values of implementation-defined serialization parameters in an implementation-defined manner.

The following XML document, if converted to a data model instance and processed using the mechanism described in this section, would specify the settings of the method, version and indent serialization parameters with the values xml, 1.0 and yes, respectively.

<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:method value="xml"/>
  <output:version value="1.0"/>
  <output:indent value="yes"/>
</output:serialization-parameters>

The following document would specify the setting of the cdata-section-elements serialization parameter, with a value consisting of the pair of expanded QNames (http://example.org/book/chapter,heading) and (http://example.org/book,footnote).

<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
       xmlns:book="http://example.org/book"
       xmlns="http://example.org/book/chapter">
  <output:cdata-section-elements value="heading book:footnote"/>
</output:serialization-parameters>

The following document would specify the value of the method serialization parameter with the value html.

Notice that in this example, the default namespace declaration in scope has no effect on the interpretation of the setting of the method parameter.

<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
       xmlns="http://example.org/ext">
  <output:method value="html"/>
</output:serialization-parameters>
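The difference between the two preceding examples can be sketched as follows. This is a non-normative Python illustration: the function resolve_qname and its use_default_namespace flag are hypothetical, and the in-scope namespaces are assumed to be supplied as a dictionary from prefix to namespace URI, with the empty string as the key for the default namespace.

```python
def resolve_qname(lexical: str, namespaces: dict, use_default_namespace: bool):
    """Turn a lexical QName into a (namespace URI, local name) pair."""
    prefix, colon, local = lexical.partition(":")
    if not colon:
        # Unprefixed name: the default namespace applies only on request.
        uri = namespaces.get("") if use_default_namespace else None
        return (uri, lexical)
    if prefix not in namespaces:
        raise KeyError(f"undeclared namespace prefix: {prefix}")
    return (namespaces[prefix], local)


# In-scope namespaces of the cdata-section-elements example above.
book_ns = {
    "": "http://example.org/book/chapter",
    "book": "http://example.org/book",
}
cdata_names = [resolve_qname(q, book_ns, True)
               for q in "heading book:footnote".split()]

# In-scope namespaces of the method example: the default is ignored.
method = resolve_qname("html", {"": "http://example.org/ext"}, False)
```

In this sketch cdata_names holds the two expanded QNames given in the text for the cdata-section-elements example, and method has no namespace URI, so it still identifies the html output method.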

The following document would specify the value of the method serialization parameter with value equal to the expanded QName (http://example.org/ext, jsp), and the use-character-maps parameter with value equal to the list of pairs («, <%), (», %>).

<output:serialization-parameters
       xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
       xmlns:ext="http://example.org/ext">
  <output:method value="ext:jsp"/>
  <output:use-character-maps>
    <output:character-map character="&#xAB;" map-string="&lt;%"/>
    <output:character-map character="&#xBB;" map-string="%&gt;"/>
  </output:use-character-maps>
</output:serialization-parameters>

4 Phases of Serialization

Serialization comprises five phases of processing (preceded optionally by the sequence normalization process described in 2 Sequence Normalization).

For an implementation-defined output method, any of these phases MAY be skipped or MAY be performed in a different order than is specified here. For the output methods defined in this specification, these phases are carried out sequentially as follows:

  1. For the XHTML and HTML output methods, a meta element is added to the normalized sequence, and any existing meta element is discarded, as controlled by the include-content-type parameter.

  2. Markup generation produces the character representation of those parts of the serialized result that describe the structure of the normalized sequence. In the cases of the XML, HTML and XHTML output methods, this phase produces the character representations of the following:

    • the document type declaration;

    • start tags and end tags (except for attribute values, whose representation is produced by the character expansion phase);

    • processing instructions; and

    • comments.

    In the cases of the XML and XHTML output methods, this phase also produces the following:

    • the XML or text declaration; and

    • empty element tags (except for the attribute values).

    In the case of the text output method, this phase replaces the single document node produced by sequence normalization with a new document node that has exactly one child, which is a text node. The string value of the new text node is the string value of the document node that was produced by sequence normalization.

  3. Character expansion is concerned with the representation of characters appearing in text and attribute nodes in the normalized sequence. For each text and attribute node, the following rules are applied in sequence.

    1. If the node is an attribute that is a URI attribute value and the escape-uri-attributes parameter is set to require escaping of URI attributes, apply URI escaping as defined below, and skip rules 2-5. Otherwise, continue with rule 2.

      [Definition: URI escaping consists of the following three steps applied in sequence to the content of URI attribute values:]

      1. normalize to NFC using the method defined in Section 5.4.6 fn:normalize-unicode FO30

      2. percent-encode any special characters in the URI using the method defined in Section 6.4 fn:escape-html-uri FO30

      3. escape according to the rules of the XML or HTML output method, whichever is applicable, any characters that require escaping, and any characters that cannot be represented in the selected encoding. For example, replace < with &lt;. (See also section 7.3 Writing Character Data)

      [Definition: The values of attributes listed in D List of URI Attributes are URI attribute values. Attributes are not considered to be URI attributes simply because they are namespace declaration attributes or have the type annotation xs:anyURI.]

    2. If the node is a text node whose parent element is selected by the rules of the cdata-section-elements parameter for the applicable output method, create CDATA sections as described below, and skip rules 3-5. Otherwise, continue with rule 3.

      Apply the following two processes in sequence to create CDATA sections:

      1. Unicode Normalization if requested by the normalization-form parameter.

      2. apply changes as detailed in the description of the cdata-section-elements parameter for the applicable output method.

    3. Apply character mapping as determined by the use-character-maps parameter for the applicable output method. For characters that were substituted by this process, skip rules 4 and 5. For the remaining characters that were not modified by character mapping, continue with rule 4.

    4. Apply Unicode Normalization if requested by the normalization-form parameter.

      [Definition: Unicode Normalization is the process of removing alternate representations of equivalent sequences from textual data, to convert the data into a form that can be binary-compared for equivalence, as specified in [UAX #15: Unicode Normalization Forms]. For specific recommendations for character normalization on the World Wide Web, see [Character Model for the World Wide Web 1.0: Normalization].]

      The meanings associated with the possible values of the normalization-form parameter are defined in section 5.1.8 XML Output Method: the normalization-form Parameter.

      Continue with rule 5.

    5. Escape according to the rules of the XML or HTML output method, whichever is applicable, any characters (such as < and &) where XML or HTML requires escaping, and any characters that cannot be represented in the selected encoding. For example, replace < with &lt;. (See also section 7.3 Writing Character Data). For characters such as > where XML defines a built-in entity but does not require its use in all circumstances, it is implementation-dependent whether the character is escaped.

  4. Indentation, as controlled by the indent parameter and the suppress-indentation parameter, MAY add or remove whitespace according to the rules defined by the applicable output method.

  5. Encoding, as controlled by the encoding parameter, converts the character stream produced by the previous phases into an octet stream.

    Note:

    Serialization is only defined in terms of encoding the result as a stream of octets. However, a serializer may provide an option that allows the encoding phase to be skipped, so that the result of serialization is a stream of Unicode characters. The effect of any such option is implementation-defined, and a serializer is not required to support such an option.
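
The three steps of URI escaping in the character expansion phase can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: the function names are hypothetical, the percent-encoding step assumes that fn:escape-html-uri leaves the printable ASCII characters (code points 32 to 126) untouched and encodes everything else as UTF-8 octets, and the final step only handles the characters &, < and " in an attribute value delimited by double quotes.

```python
import unicodedata

def percent_encode_non_ascii(value):
    """Step 2: percent-encode every character outside printable ASCII (32..126)."""
    out = []
    for ch in value:
        if 32 <= ord(ch) <= 126:
            out.append(ch)
        else:
            out.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(out)

def escape_attribute_markup(value):
    """Step 3 (simplified): escape markup-significant characters in an attribute value."""
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace('"', "&quot;"))

def uri_escape(value):
    """Apply the three steps in sequence: NFC, percent-encoding, markup escaping."""
    normalized = unicodedata.normalize("NFC", value)  # step 1
    return escape_attribute_markup(percent_encode_non_ascii(normalized))
```

The order matters: normalizing first means that a decomposed "e" followed by a combining acute accent is percent-encoded as the single precomposed character.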

5 XML Output Method

The XML output method serializes the normalized sequence as an XML entity that MUST satisfy the rules for either a well-formed XML document entity or a well-formed XML external general parsed entity, or both. A serialization error [err:SERE0003] results if the serializer is unable to satisfy those rules, except for content modified by the character expansion phase of serialization, as described in 4 Phases of Serialization. The effects of the character expansion phase could result in the serialized output being not well-formed, but will not result in a serialization error. If a serialization error results, the serializer MUST signal the error.

If the document node of the normalized sequence has a single element node child and no text node children, then the serialized output is a well-formed XML document entity, and the serialized output MUST conform to the appropriate version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1]. If the normalized sequence does not take this form, then the serialized output is a well-formed XML external general parsed entity, which, when referenced within a trivial XML document wrapper like this:

<?xml version="version"?>
<!DOCTYPE doc [
<!ENTITY e SYSTEM "entity-URI">
]>
<doc>&e;</doc>

where entity-URI is a URI for the entity, and the value of the version pseudo-attribute is the value of the version parameter, produces a document which MUST itself be a well-formed XML document conforming to the corresponding version of the XML Namespaces Recommendation [XML Names] or [XML Names 1.1].

[Definition: A reconstructed tree may be constructed by parsing the XML document and converting it into an instance of the data model as specified in [XQuery and XPath Data Model (XDM) 3.0].] The result of serialization MUST be such that the reconstructed tree is the same as the result tree except for the following permitted differences:

A consequence of this rule is that certain characters MUST be output as character references, to ensure that they survive the round trip through serialization and parsing. Specifically, CR, NEL and LINE SEPARATOR characters in text nodes MUST be output respectively as "&#xD;", "&#x85;", and "&#x2028;", or their equivalents; while CR, NL, TAB, NEL and LINE SEPARATOR characters in attribute nodes MUST be output respectively as "&#xD;", "&#xA;", "&#x9;", "&#x85;", and "&#x2028;", or their equivalents. In addition, the non-whitespace control characters #x1 through #x1F and #x7F through #x9F in text nodes and attribute nodes MUST be output as character references.

For example, an attribute with the value "x" followed by "y" separated by a newline will result in the output "x&#xA;y" (or with any equivalent character reference). The XML output cannot be "x" followed by a literal newline followed by a "y" because after parsing, the attribute value would be "x y" as a consequence of the XML attribute normalization rules.

Note:

XML 1.0 did not permit an XML processor to normalize NEL or LINE SEPARATOR characters to a LINE FEED character. However, if a document entity that specifies version 1.1 invokes an external general parsed entity with no text declaration or a text declaration that specifies version 1.0, the external parsed entity is processed according to the rules of XML 1.1. For this reason, NEL and LINE SEPARATOR characters in text and attribute nodes must always be escaped using character references, regardless of the value of the version parameter.

XML 1.0 permitted control characters in the range #x7F through #x9F to appear as literal characters in an XML document, but XML 1.1 requires such characters, other than NEL, to be escaped as character references. An external general parsed entity with no text declaration or a text declaration that specifies a version pseudo-attribute with value 1.0 that is invoked by an XML 1.1 document entity must follow the rules of XML 1.1. Therefore, the non-whitespace control characters in the ranges #x1 through #x1F and #x7F through #x9F must always be escaped, regardless of the value of the version parameter.
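
The round-trip rule for whitespace and control characters can be sketched as a small Python function. The name escape_for_round_trip is hypothetical, and the sketch deliberately ignores every other escaping rule (for example < and &) as well as the separate question of whether a given control character is permitted at all by the selected version of XML.

```python
def escape_for_round_trip(value, in_attribute):
    """Replace characters that would not survive parsing with character references."""
    always = {0x0D, 0x85, 0x2028}      # CR, NEL, LINE SEPARATOR
    attribute_only = {0x0A, 0x09}      # NL and TAB are normalized in attribute values
    out = []
    for ch in value:
        cp = ord(ch)
        # Non-whitespace control characters #x1-#x1F, and #x7F-#x9F.
        is_control = (0x1 <= cp <= 0x1F and cp not in (0x9, 0xA, 0xD)) or 0x7F <= cp <= 0x9F
        if cp in always or is_control or (in_attribute and cp in attribute_only):
            out.append("&#x{:X};".format(cp))
        else:
            out.append(ch)
    return "".join(out)
```

With in_attribute set to true, the value "x", newline, "y" becomes x&#xA;y, matching the example above; in a text node the newline is left as it is.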

It is a serialization error [err:SEPM0004] to specify the doctype-system parameter, or to specify the standalone parameter with a value other than omit, if the instance of the data model contains text nodes or multiple element nodes as children of the root node. The serializer MUST either signal the error, or recover by ignoring the request to output a document type declaration or standalone parameter.

5.1 The Influence of Serialization Parameters upon the XML Output Method

5.1.1 XML Output Method: the version Parameter

The version parameter specifies the version of XML and the version of Namespaces in XML to be used for outputting the instance of the data model. The version output in the XML declaration (if an XML declaration is not omitted) MUST correspond to the version of XML that the serializer used for outputting the instance of the data model. The value of the version parameter MUST match the VersionNum production of the XML Recommendation [XML10] or [XML11]. A serialization error [err:SESU0013] results if the value of the version parameter specifies a version of XML that is not supported by the serializer; the serializer MUST signal the error.

This document provides the normative definition of serialization for the XML output method if the version parameter has either the value 1.0 or 1.1. For any other value of the version parameter, the behavior is implementation-defined. In that case the implementation-defined behavior MAY supersede all other requirements of this recommendation.

If the serialized result would contain an NCName that contains a character that is not permitted by the version of Namespaces in XML specified by the version parameter, a serialization error [err:SERE0005] results. The serializer MUST signal the error.

If the serialized result would contain a character that is not permitted by the version of XML specified by the version parameter, a serialization error [err:SERE0006] results. The serializer MUST signal the error.

For example, if the version parameter has the value 1.0, and the instance of the data model contains a non-whitespace control character in the range #x1 to #x1F, a serialization error [err:SERE0006] results. If the version parameter has the value 1.1 and a comment node in the instance of the data model contains a non-whitespace control character in the range #x1 to #x1F or a control character other than NEL in the range #x7F to #x9F, a serialization error [err:SERE0006] results.

5.1.2 XML Output Method: the encoding Parameter

The encoding parameter specifies the encoding to be used for outputting the instance of the data model. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error, or recover by using UTF-8 or UTF-16 instead. The serializer MUST NOT use an encoding whose name does not match the EncName production of the XML Recommendation [XML10].

When outputting a newline character in the instance of the data model, the serializer is free to represent it using any character sequence that will be normalized to a newline character by an XML parser, unless a specific mapping for the newline character is provided in a character map (see 9 Character Maps).

When outputting any other character that is defined in the selected encoding, the character MUST be output using the correct representation of that character in the selected encoding.

It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where XML recognizes character references (that is, in the value of an attribute node or text node), then the character MUST be output as a character reference. A serialization error [err:SERE0008] occurs if such a character appears in a context where character references are not allowed (for example, if the character occurs in the name of an element). The serializer MUST signal the error.

For example, if a text node contains the character LATIN SMALL LETTER E WITH ACUTE (#xE9), and the value of the encoding parameter is US-ASCII, the character MUST be serialized as a character reference. If a comment node contains the same character, a serialization error [err:SERE0008] results.
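
The example above can be reproduced with a short Python sketch. The function names and the SerializationError class are hypothetical; the sketch relies on Python's standard xmlcharrefreplace error handler, which emits decimal character references, and it leaves out all other escaping that a real serializer would perform.

```python
class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

def encode_text(value, encoding):
    """Text and attribute content: unrepresentable characters become character references."""
    return value.encode(encoding, errors="xmlcharrefreplace")

def encode_comment(value, encoding):
    """Comment content: character references are not recognized, so encoding must succeed."""
    try:
        return value.encode(encoding)
    except UnicodeEncodeError:
        raise SerializationError("SERE0008") from None
```

The same character is therefore recoverable in a text node but fatal in a comment, because a comment offers no syntax for a character reference.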

5.1.3 XML Output Method: the indent and suppress-indentation Parameters

The indent and suppress-indentation parameters control whether the serializer MAY adjust the whitespace in the serialized result so that a person will find it easier to read. If the indent parameter has the value yes, the serializer MAY output whitespace characters in addition to the whitespace characters in the instance of the data model. It MAY also elide from the output whitespace characters that occurred in the instance of the data model or replace such whitespace characters with other whitespace characters.

[Definition: The term content has the same meaning as the term content defined in Section 3.1 Start-Tags, End-Tags, and Empty-Element Tags of [XML10].] [Definition: The immediate content of an element is the part of the content of the element that is not also in the content of a child element of that element.]

If the indent parameter has the value no, the serializer MUST NOT output any additional whitespace characters, and MUST NOT elide or replace whitespace characters. If the indent parameter has the value yes, the serializer MUST use an algorithm for dealing with whitespace characters that satisfies all of the following constraints. If more than one constraint applies, the serializer MUST apply the most restrictive constraint. That is, if any applicable constraint indicates that whitespace MUST NOT be added, elided or replaced, that constraint prevails; if an applicable constraint indicates that whitespace SHOULD NOT be added, elided or replaced, while all other applicable constraints indicate that whitespace MAY be added, elided or replaced, whitespace SHOULD NOT be added, elided or replaced.

  • Whitespace characters MAY be added adjacent to a text node only if the text node contains only whitespace characters. Whitespace characters in such a text node MAY also be elided or replaced. For example, a tab MAY be inserted as a replacement for existing spaces.

  • Whitespace characters MAY be added, elided or replaced in the immediate content of an element whose type annotation is xs:untyped or xs:anyType and that has element node children, in the immediate content of an element whose content model is element only, or outside the content of any element.

  • Whitespace characters MUST NOT be added, elided or replaced in the immediate content of an element whose content model is known to be simple or empty.

  • Whitespace characters SHOULD NOT be added, elided or replaced in places where the characters would constitute significant whitespace, for example, in the immediate content of an element that is annotated with a type other than xs:untyped or xs:anyType, and whose content model is known to be mixed.

  • Whitespace characters MUST NOT be added, elided or replaced in the content of an element whose expanded QName is a member of the list of expanded QNames in the value of the suppress-indentation parameter.

  • Whitespace characters MUST NOT be added, elided or replaced in a part of the result document that is controlled by an xml:space attribute with value preserve. (See [XML10] for more information about the xml:space attribute.)

Editorial note  
The text above has been revised anticipating adoption of the response to Bugzilla bug 6808.

Note:

The effect of these rules is to ensure that whitespace is only added in places where (a) XSLT's <xsl:strip-space> declaration could cause it to be removed, and (b) it does not affect the string value of any element node with simple content. It is usually not safe to indent document types that include elements with mixed content.

Note:

The whitespace added may possibly be based on whitespace stripped from either the source document or the stylesheet (in the case of XSLT), or guided by other means that might depend on the host language, in the case of an instance of the data model created using some other process.
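
The rule that the most restrictive applicable constraint prevails amounts to taking a maximum over an ordered set of permission levels. The following Python sketch shows only that combination step; the Permission type and the function name are hypothetical, and deciding which constraints apply at a given position is left to the caller.

```python
from enum import IntEnum

class Permission(IntEnum):
    """Ordered from least to most restrictive."""
    MAY = 0
    SHOULD_NOT = 1
    MUST_NOT = 2

def most_restrictive(applicable):
    """Combine the constraints that apply at one position in the output.

    Raises ValueError if no constraint applies, since that case is
    outside the scope of this sketch.
    """
    return max(applicable)
```

For instance, a position inside mixed content that is also covered by xml:space="preserve" combines SHOULD_NOT with MUST_NOT and yields MUST_NOT.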

5.1.4 XML Output Method: the cdata-section-elements Parameter

The cdata-section-elements parameter contains a list of expanded QNames. If the expanded QName of the parent of a text node is a member of the list, then the text node MUST be output as a CDATA section, except in those circumstances described below.

If the text node contains the sequence of characters ]]>, then the currently open CDATA section MUST be closed following the ]] and a new CDATA section opened before the >.

If the text node contains characters that are not representable in the character encoding being used to output the instance of the data model, then the currently open CDATA section MUST be closed before such characters, the characters MUST be output using character references or entity references, and a new CDATA section MUST be opened for any further characters in the text node.

CDATA sections MUST NOT be used except where they have been explicitly requested by the user, either by using the cdata-section-elements parameter, or by using some other implementation-defined mechanism.

Note:

This is phrased to permit an implementor to provide an option that attempts to preserve CDATA sections present in the source document.
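
The two exceptions described in this section can be sketched in Python. The function name write_cdata is hypothetical, hexadecimal character references are one of several acceptable forms, and returning an empty string for an empty text node is an assumption of this sketch rather than something stated above.

```python
def write_cdata(text, encoding):
    """Serialize the string value of a text node as one or more CDATA sections."""
    out = []
    run = []  # characters waiting to be written inside a CDATA section

    def flush():
        if run:
            # Close after "]]" and reopen before ">" wherever "]]>" occurs.
            body = "".join(run).replace("]]>", "]]]]><![CDATA[>")
            out.append("<![CDATA[" + body + "]]>")
            run.clear()

    for ch in text:
        try:
            ch.encode(encoding)
            run.append(ch)
        except UnicodeEncodeError:
            # Unrepresentable character: close the section and use a character reference.
            flush()
            out.append("&#x{:X};".format(ord(ch)))
    flush()
    return "".join(out)
```

Splitting on ]]> happens within each run, so the reference for an unrepresentable character always sits between two complete CDATA sections.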

5.1.5 XML Output Method: the omit-xml-declaration and standalone Parameters

The XML output method MUST output an XML declaration if the omit-xml-declaration parameter has the value no. The XML declaration MUST include both version information and an encoding declaration. If the standalone parameter has the value yes or the value no, the XML declaration MUST include a standalone document declaration with the same value as the value of the standalone parameter. If the standalone parameter has the value omit, the XML declaration MUST NOT include a standalone document declaration; this ensures that it is both an XML declaration (allowed at the beginning of a document entity) and a text declaration (allowed at the beginning of an external general parsed entity).

A serialization error [err:SEPM0009] results if the omit-xml-declaration parameter has the value yes, and

  • the standalone parameter has a value other than omit; or

  • the version parameter has a value other than 1.0 and the doctype-system parameter is specified.

The serializer MUST signal the error.

Otherwise, if the omit-xml-declaration parameter has the value yes, the XML output method MUST NOT output an XML declaration.
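
The interaction between these parameters can be summarized in a short Python sketch. The function name, the argument names and the SerializationError class are hypothetical; parameter values are passed as the strings yes, no and omit, and an unspecified doctype-system parameter is represented by None.

```python
class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

def xml_declaration(version, encoding, omit_xml_declaration,
                    standalone="omit", doctype_system=None):
    """Return the XML declaration to output, or an empty string if it is omitted."""
    if omit_xml_declaration == "yes":
        if standalone != "omit" or (version != "1.0" and doctype_system is not None):
            raise SerializationError("SEPM0009")
        return ""
    declaration = '<?xml version="{}" encoding="{}"'.format(version, encoding)
    if standalone in ("yes", "no"):
        declaration += ' standalone="{}"'.format(standalone)
    return declaration + "?>"
```

When standalone is omit, the result has the form of both an XML declaration and a text declaration, which is why it remains usable at the start of an external general parsed entity.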

5.1.6 XML Output Method: the doctype-system and doctype-public Parameters

If the doctype-system parameter is specified, the XML output method MUST output a document type declaration immediately before the first element. The name following <!DOCTYPE MUST be the name of the first element, if any. If the doctype-public parameter is also specified, then the XML output method MUST output PUBLIC followed by the public identifier and then the system identifier; otherwise, it MUST output SYSTEM followed by the system identifier. The internal subset MUST be empty. The doctype-public parameter MUST be ignored unless the doctype-system parameter is specified.
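
A minimal Python sketch of these rules follows. The function name is hypothetical, None stands for a parameter that was not specified, and the sketch assumes that neither identifier contains a double quote, so it does not choose between quoting styles.

```python
def doctype_declaration(first_element_name, doctype_system=None, doctype_public=None):
    """Return the document type declaration, or an empty string if none is required."""
    if doctype_system is None:
        # doctype-public is ignored unless doctype-system is specified.
        return ""
    if doctype_public is not None:
        return '<!DOCTYPE {} PUBLIC "{}" "{}">'.format(
            first_element_name, doctype_public, doctype_system)
    return '<!DOCTYPE {} SYSTEM "{}">'.format(first_element_name, doctype_system)
```

For an XHTML document, passing html as the name together with a public and a system identifier produces the familiar PUBLIC form with an empty internal subset.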

5.1.7 XML Output Method: the undeclare-prefixes Parameter

The Data Model allows an element node that binds a non-empty prefix to have a child element node that does not bind that same prefix. In Namespaces in XML 1.1 ([XML Names 1.1]), this can be represented accurately by undeclaring prefixes. For each such prefix that the child element node does not bind, if the undeclare-prefixes parameter has the value yes, the output method is XML or XHTML, and the version parameter value is greater than 1.0, the serializer MUST undeclare the prefix on the child element. If the undeclare-prefixes parameter has the value no and the output method is XML or XHTML, then the undeclaration of prefixes MUST NOT occur.

Consider an element x:foo with four in-scope namespaces that associate prefixes with URIs as follows:

  • x is associated with http://example.org/x

  • y is associated with http://example.org/y

  • z is associated with http://example.org/z

  • xml is associated with http://www.w3.org/XML/1998/namespace

Suppose that it has a child element x:bar with three in-scope namespaces:

  • x is associated with http://example.org/x

  • y is associated with http://example.org/y

  • xml is associated with http://www.w3.org/XML/1998/namespace

If namespace undeclaration is in effect, it will be serialized this way:

<x:foo xmlns:x="http://example.org/x"
       xmlns:y="http://example.org/y"
       xmlns:z="http://example.org/z">
       
       <x:bar xmlns:z="">...</x:bar>
       
</x:foo>

In Namespaces in XML 1.0 ([XML Names]), prefix undeclaration is not possible. If the output method is XML or XHTML, the value of the undeclare-prefixes parameter is yes, and the value of the version parameter is 1.0, a serialization error [err:SEPM0010] results; the serializer MUST signal the error.
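
The x:foo and x:bar example can be expressed as a set difference over in-scope prefixes. In the Python sketch below, the function name and the SerializationError class are hypothetical, in-scope namespaces are modelled as dictionaries from prefix to URI, and any version other than 1.0 is treated as greater than 1.0.

```python
class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

def prefix_undeclarations(parent_namespaces, child_namespaces, undeclare_prefixes, version):
    """Return the xmlns:prefix="" attributes to write on the child element."""
    if undeclare_prefixes != "yes":
        return []
    if version == "1.0":
        raise SerializationError("SEPM0010")
    return [
        'xmlns:{}=""'.format(prefix)
        for prefix in sorted(parent_namespaces)
        # Only non-empty prefixes other than xml can be undeclared this way.
        if prefix and prefix != "xml" and prefix not in child_namespaces
    ]

FOO = {
    "x": "http://example.org/x",
    "y": "http://example.org/y",
    "z": "http://example.org/z",
    "xml": "http://www.w3.org/XML/1998/namespace",
}
BAR = {
    "x": "http://example.org/x",
    "y": "http://example.org/y",
    "xml": "http://www.w3.org/XML/1998/namespace",
}
```

With these two maps, version 1.1 and undeclare-prefixes set to yes, the only attribute produced is xmlns:z="", as in the serialized example above.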

5.1.8 XML Output Method: the normalization-form Parameter

The normalization-form parameter is applicable to the XML output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.

The meanings associated with the possible values of the normalization-form parameter are as follows:

If the value of the parameter is fully-normalized, then no relevant construct of the parsed entity created by the serializer may start with a composing character. The term relevant construct has the meaning defined in section 2.13 of [XML11]. If this condition is not satisfied, a serialization error [err:SERE0012] MUST be signaled.

Note:

Specifying fully-normalized as the value of this parameter does not guarantee that the XML document output by the serializer will in fact be fully normalized as defined in [XML11]. This is because the serializer does not check that the text is include normalized, which would involve checking all external entities that it refers to (such as an external DTD). Furthermore, the serializer does not check whether any character escape generated using character maps represents a composing character.
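
As an illustration of the minimum support required here, the Python sketch below handles only the values NFC and none and reports every other value as unsupported. The function name and the SerializationError class are hypothetical, and a real serializer is free to support further values of the parameter.

```python
import unicodedata

class SerializationError(Exception):
    """Hypothetical exception carrying a serialization error code."""

def apply_normalization_form(text, normalization_form):
    """Apply Unicode Normalization as requested by the normalization-form parameter."""
    if normalization_form == "none":
        return text
    if normalization_form == "NFC":
        return unicodedata.normalize("NFC", text)
    # This sketch supports nothing beyond the two mandatory values.
    raise SerializationError("SESU0011")
```

Under NFC, a letter followed by a combining accent is replaced by the precomposed character where one exists, which is what makes the output safe to compare octet by octet.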

5.1.9 XML Output Method: the media-type Parameter

The media-type parameter is applicable to the XML output method. See 3 Serialization Parameters for more information.

5.1.10 XML Output Method: the use-character-maps Parameter

The use-character-maps parameter is applicable to the XML output method. The result of serialization using the XML output method is not guaranteed to be well-formed XML if character maps have been specified. See 9 Character Maps for more information.

5.1.11 XML Output Method: the byte-order-mark Parameter

The byte-order-mark parameter is applicable to the XML output method. See 3 Serialization Parameters for more information.

Note:

The byte order mark may be undesirable under certain circumstances; for example, to concatenate resulting XML fragments without additional processing to remove the byte order mark. Therefore this specification does not mandate the byte-order-mark parameter to have the value yes when the encoding is UTF-16, even though the XML 1.0 and XML 1.1 specifications state that entities encoded in UTF-16 must begin with a byte order mark. Consequently, this specification does not guarantee that the resulting XML fragment, without a byte order mark, will not cause an error when processed by a conforming XML processor.

5.1.12 XML Output Method: the escape-uri-attributes Parameter

The escape-uri-attributes parameter is not applicable to the XML output method. It is the responsibility of the host language to specify whether an error occurs if this parameter is specified in combination with the XML output method, or if the parameter is simply dropped.

5.1.13 XML Output Method: the include-content-type Parameter

The include-content-type parameter is not applicable to the XML output method. It is the responsibility of the host language to specify whether an error occurs if this parameter is specified in combination with the XML output method, or if the parameter is simply dropped.

6 XHTML Output Method

The XHTML output method serializes the instance of the data model as XML, using the HTML compatibility guidelines defined in the XHTML specification.

It is entirely the responsibility of the person or process that creates the instance of the data model to ensure that the instance of the data model conforms to the [XHTML 1.0] or [XHTML 1.1] specification. It is not an error if the instance of the data model is invalid XHTML. Equally, it is entirely under the control of the person or process that creates the instance of the data model whether the output conforms to XHTML 1.0 Strict, XHTML 1.0 Transitional, or any other specific definition of XHTML.

The serialization of the instance of the data model follows the same rules as for the XML output method, with the general exceptions noted below and parameter-specific exceptions in 6.1 The Influence of Serialization Parameters upon the XHTML Output Method. These differences are based on the HTML compatibility guidelines published in Appendix C of [XHTML 1.0], which are designed to ensure that as far as possible, XHTML is rendered correctly on user agents designed originally to handle HTML.

Note:

Appendix C of [XHTML 1.0] describes a number of compatibility guidelines for users of XHTML who wish to render their XHTML documents with HTML user agents. In some cases, such as the guideline on the form empty elements should take, only the serialization process itself has the ability to follow the guideline. In such cases, those guidelines are reflected in the requirements on the serializer described above.

In all other cases, the guidelines can be adhered to by the instance of the data model that is input to the serialization process. The guideline on the use of whitespace characters in attribute values is one such example. Another example is that xml:lang="..." does not serialize to both xml:lang="..." and lang="..." as required by some legacy user agents. It is the responsibility of the person or process that creates the instance of the data model that is input to the serialization process to ensure it is created in a way that is consistent with the guidelines. No serialization error results if the input instance of the data model does not adhere to the guidelines.

6.1 The Influence of Serialization Parameters upon the XHTML Output Method

6.1.1 XHTML Output Method: the version Parameter

The behavior of the version parameter for the XHTML output method is described in 5.1.1 XML Output Method: the version Parameter.

6.1.2 XHTML Output Method: the encoding Parameter

The behavior of the encoding parameter for the XHTML output method is described in 5.1.2 XML Output Method: the encoding Parameter.

6.1.3 XHTML Output Method: the indent and suppress-indentation Parameters

If the indent parameter has the value yes, the serializer MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

  • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

  • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those elements in the XHTML namespace in the %inline category of any of the XHTML 1.0 DTDs, in the %inline.class category of the XHTML 1.1 DTD, and elements in the XHTML namespace with local names ins and del if they are used as inline elements (i.e., if they do not contain element children).

  • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being those in the XHTML namespace with local names pre, script, style, and textarea.

  • Whitespace characters MUST NOT be added in the content of an element whose expanded QName is a member of the list of expanded QNames in the value of the suppress-indentation parameter.

Note:

The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.

The HTML definition of whitespace is different from the XML definition: see section 9.1 of the [HTML] 4.01 specification.

6.1.4 XHTML Output Method: the cdata-section-elements Parameter

The behavior of the cdata-section-elements parameter for the XHTML output method is described in 5.1.4 XML Output Method: the cdata-section-elements Parameter.

6.1.5 XHTML Output Method: the omit-xml-declaration and standalone Parameters

The behavior of the omit-xml-declaration and standalone parameters for the XHTML output method is described in 5.1.5 XML Output Method: the omit-xml-declaration and standalone Parameters.

Note:

As with the XML output method, the XHTML output method specifies that an XML declaration will be output unless it is suppressed using the omit-xml-declaration parameter. Appendix C.1 of [XHTML 1.0] provides advice on the consequences of including, or omitting, the XML declaration.

6.1.6 XHTML Output Method: the doctype-system and doctype-public Parameters

The behavior of the doctype-system and doctype-public parameters for the XHTML output method is described in 5.1.6 XML Output Method: the doctype-system and doctype-public Parameters.

6.1.7 XHTML Output Method: the undeclare-prefixes Parameter

The behavior of the undeclare-prefixes parameter for the XHTML output method is described in 5.1.7 XML Output Method: the undeclare-prefixes Parameter.

6.1.8 XHTML Output Method: the normalization-form Parameter

The behavior of the normalization-form parameter for the XHTML output method is described in 5.1.8 XML Output Method: the normalization-form Parameter.

6.1.9 XHTML Output Method: the media-type Parameter

The behavior of the media-type parameter for the XHTML output method is described in 5.1.9 XML Output Method: the media-type Parameter.

6.1.10 XHTML Output Method: the use-character-maps Parameter

The behavior of the use-character-maps parameter for the XHTML output method is described in 5.1.10 XML Output Method: the use-character-maps Parameter.

6.1.11 XHTML Output Method: the byte-order-mark Parameter

The behavior of the byte-order-mark parameter for the XHTML output method is described in 5.1.11 XML Output Method: the byte-order-mark Parameter.

6.1.12 XHTML Output Method: the escape-uri-attributes Parameter

If the escape-uri-attributes parameter has the value yes, the XHTML output method MUST apply URI escaping to URI attribute values, except that relative URIs MUST NOT be absolutized.

Note:

This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by using methods defined in Section 6.2 fn:encode-for-uri FO30 and Section 6.3 fn:iri-to-uri FO30.

6.1.13 XHTML Output Method: the include-content-type Parameter

If the instance of the data model includes a head element in the XHTML namespace, and the include-content-type parameter has the value yes, the XHTML output method MUST add a meta element as the first child element of the head element, specifying the character encoding actually used.

For example,

<head>
<meta http-equiv="Content-Type" content="text/html; charset=EUC-JP" />
...

The content type SHOULD be set to the value given for the media-type parameter.

Note:

It is recommended that the host language use as default value for this parameter one of the MIME types ([RFC2046]) registered for XHTML. Currently, these are text/html (registered by [RFC2854]) and application/xhtml+xml (registered by [RFC3236]). Note that some user agents fail to recognize the charset parameter if the content type is not text/html.

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without regard to case after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison, MUST be discarded.

Note:

This process removes possible parameters in the attribute value. For example,

<meta http-equiv="Content-Type" content="text/html;version='3.0'" />

in the data model instance would be replaced by,

<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
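The handling of the head element described above can be sketched in Python as follows. This is an illustrative sketch only, not part of the normative text: the function names and the representation of child elements as (name, attributes) pairs are assumptions made for the example, and namespace handling is omitted.

```python
def is_content_type_meta(http_equiv):
    """Return True if an http-equiv value matches "Content-Type".

    Leading and trailing spaces are stripped solely for the comparison,
    and the comparison is made without regard to case.
    """
    return http_equiv.strip(" ").lower() == "content-type"


def rebuild_head_children(children, media_type, encoding):
    """Hypothetical helper: children is a list of (name, attributes) pairs."""
    # Discard every existing meta child that declares a Content-Type.
    kept = [
        (name, attrs)
        for name, attrs in children
        if not (name == "meta"
                and is_content_type_meta(attrs.get("http-equiv", "")))
    ]
    # The new meta element becomes the first child of the head element.
    new_meta = ("meta", {"http-equiv": "Content-Type",
                         "content": f"{media_type}; charset={encoding}"})
    return [new_meta] + kept
```

Because the existing element is discarded rather than edited, any parameters it carried (such as version='3.0' in the Note above) do not survive in the output.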

7 HTML Output Method

The HTML output method serializes the instance of the data model as HTML.

For example, the following XSL stylesheet generates HTML output:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" version="4.0"/>
<xsl:template match="/">
  <html>
    <xsl:apply-templates/>
  </html>
</xsl:template>
...
</xsl:stylesheet>

In the example, the version attribute of the xsl:output element indicates the version of the HTML Recommendation [HTML] to which the serialized result is to conform.

It is entirely the responsibility of the person or process that creates the instance of the data model to ensure that the instance of the data model conforms to the HTML Recommendation [HTML]. It is not an error if the instance of the data model is invalid HTML. Equally, it is entirely under the control of the person or process that creates the instance of the data model whether the output conforms to HTML. If the result tree is valid HTML, the serializer MUST serialize the result in a way that conforms with the version of HTML specified by the version serialization parameter.

Editorial note  
Need to take into account HTML 5.0, per request made in Bugzilla bug 6129.

7.1 Markup for Elements

The HTML output method MUST NOT output an element differently from the XML output method unless the expanded QName of the element has a null namespace URI. [Definition: An element whose expanded QName has a non-null namespace URI MUST be output as XML. This is known as an XML Island.] If the expanded QName of the element has a null namespace URI, but the local part of the expanded QName is not recognized as the name of an HTML element, the element MUST be output in the same way as a non-empty, inline element such as span. In particular:

  1. If the result tree contains namespace nodes for namespaces other than the XML namespace, the HTML output method MUST represent these namespaces using attributes named xmlns or xmlns:prefix in the same way as the XML output method would represent them when the version parameter is set to 1.0.

  2. If the result tree contains elements or attributes whose names have a non-null namespace URI, the HTML output method MUST generate namespace-prefixed QNames for these nodes in the same way as the XML output method would do when the version parameter is set to 1.0.

  3. Where special rules are defined later in this section for serializing specific HTML elements and attributes, these rules MUST NOT be applied to an element or attribute whose name has a non-null namespace URI. However, the generic rules for the HTML output method that apply to all elements and attributes, for example the rules for escaping special characters in the text and the rules for indentation, MUST be used also for namespaced elements and attributes.

  4. When serializing an element whose name is not defined in the HTML specification, but that is in the null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a span element. The descendants of such an element MUST be serialized as if they were descendants of a span element.

  5. When serializing an element whose name is in a non-null namespace, the HTML output method MUST apply the same rules (for example, indentation rules) as when serializing a div element. The descendants of such an element MUST be serialized as if they were descendants of a div element, except for the influence of the cdata-section-elements serialization parameter on any text node children of the element.

The HTML output method MUST NOT output an end-tag for an empty element if the element type has an empty content model. For HTML 4.0, the element types that have an empty content model are area, base, basefont, br, col, frame, hr, img, input, isindex, link, meta and param. For example, an element written as <br/> or <br></br> in an XSLT stylesheet MUST be output as <br>.

Note:

The markup generation step of the phases of serialization only creates start tags and end tags for the HTML output method, never XML-style empty element tags. As such, a serializer MUST serialize an HTML element that has no children, but whose content model is not empty, using a pair of adjacent start and end element tags, or as a solitary start tag if permitted by the context.

The HTML output method MUST recognize the names of HTML elements making the comparison without regard to case. For example, elements named br, BR or Br MUST all be recognized as the HTML br element and output without an end-tag.
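The rules for empty elements and for case-insensitive name recognition can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name is hypothetical, attributes are ignored, and the element is assumed to be in the null namespace and to have no children.

```python
# Element types with an empty content model in HTML 4.0, as listed above.
HTML4_EMPTY_ELEMENTS = frozenset({
    "area", "base", "basefont", "br", "col", "frame", "hr",
    "img", "input", "isindex", "link", "meta", "param",
})


def serialize_childless_element(name):
    """Serialize a null-namespace element that has no children."""
    # Names are recognized without regard to case, but written as given.
    if name.lower() in HTML4_EMPTY_ELEMENTS:
        return f"<{name}>"          # no end-tag for an empty content model
    return f"<{name}></{name}>"     # adjacent start and end tags otherwise
```

For instance, serialize_childless_element("BR") produces a solitary start tag, whereas a childless div is written as a start tag immediately followed by an end tag.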

The HTML output method MUST NOT perform escaping for any text node descendant, nor for any attribute of an element node descendant, of a script or style element.

For example, a script element created by an XQuery direct element constructor or an XSLT literal result element, such as:

<script>if (a &lt; b) foo()</script>

or

<script><![CDATA[if (a < b) foo()]]></script>

MUST be output as

<script>if (a < b) foo()</script>
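The suppression of escaping inside script and style elements might be sketched in Python as follows. The function name and the list-of-ancestor-names representation are assumptions made for illustration, and the set of characters escaped outside these elements is a simplification of the general escaping rules.

```python
UNESCAPED_CONTENT_ELEMENTS = frozenset({"script", "style"})


def serialize_text(text, ancestor_names):
    """Serialize a text node given the names of its ancestor elements."""
    # Text anywhere below a script or style element is written verbatim.
    if any(name.lower() in UNESCAPED_CONTENT_ELEMENTS
           for name in ancestor_names):
        return text
    # Elsewhere, markup-significant characters are escaped (simplified).
    return (text.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))
```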

A common requirement is to output a script element as shown in the example below:

<script type="application/ecmascript">
      document.write ("<em>This won't work</em>")
</script>

This is invalid HTML, for the reasons explained in section B.3.2 of the [HTML] 4.01 specification. Nevertheless, it is possible to output this fragment, using either of the following constructs:

Firstly, by use of a script element created by an XQuery direct element constructor or an XSLT literal result element:

<script type="application/ecmascript">
      document.write ("<em>This won't work</em>")
</script>

Secondly, by constructing the markup from ordinary text characters:

<script type="application/ecmascript">
      document.write ("&lt;em&gt;This won't work&lt;/em&gt;")
</script>

As the [HTML] specification points out, the correct way to write this is to use the escape conventions for the specific scripting language. For JavaScript, it can be written as:

<script type="application/ecmascript">
      document.write ("&lt;em&gt;This will work&lt;\/em&gt;")
</script>

The [HTML] 4.01 specification also shows examples of how to write this in various other scripting languages. The escaping MUST be done manually; it will not be done by the serializer.

7.2 Writing Attributes

The HTML output method MUST NOT escape "<" characters occurring in attribute values.

A boolean attribute is an attribute with only a single allowed value in any of the HTML DTDs, where the allowed value is equal without regard to case to the name of the attribute. The HTML output method MUST output any boolean attribute in minimized form if and only if the value of the attribute node actually is equal to the name of the attribute making the comparison without regard to case.

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<OPTION selected="selected">

MUST be output as

<OPTION selected>
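The minimization rule for boolean attributes can be sketched in Python as follows. The function name is hypothetical, the attribute list is the set commonly identified as boolean in the HTML 4.01 DTDs and should be checked against those DTDs before being relied upon, and escaping of the attribute value is omitted for brevity.

```python
# Assumed list of boolean attributes drawn from the HTML 4.01 DTDs.
BOOLEAN_ATTRIBUTES = frozenset({
    "checked", "compact", "declare", "defer", "disabled", "ismap",
    "multiple", "nohref", "noresize", "noshade", "nowrap", "readonly",
    "selected",
})


def serialize_attribute(name, value):
    """Write one attribute, minimizing it only when the rule allows."""
    # Minimize if and only if the attribute is boolean and its value
    # equals its name, compared without regard to case.
    if name.lower() in BOOLEAN_ATTRIBUTES and value.lower() == name.lower():
        return name
    return f'{name}="{value}"'
```

Note that a boolean attribute whose value differs from its name, such as selected="yes", is not minimized by this sketch, in keeping with the "if and only if" wording above.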

The HTML output method MUST NOT escape a & character occurring in an attribute value immediately followed by a { character (see Section B.7.1 of the HTML Recommendation [HTML]).

For example, a start-tag created using the following XQuery direct element constructor or XSLT literal result element

<BODY bgcolor='&amp;{{randomrbg}};'>

MUST be output as

<BODY bgcolor='&{randomrbg};'>
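The two attribute-value rules given in this section ("<" is never escaped, and "&" is not escaped when immediately followed by "{") might be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that attribute values are delimited by double quotation marks.

```python
import re

# Matches an ampersand that is NOT immediately followed by "{".
_AMP_NOT_BEFORE_BRACE = re.compile(r"&(?!\{)")


def escape_attribute_value(value):
    """Escape an attribute value for the HTML output method (sketch)."""
    # "&{" sequences are left alone; other ampersands are escaped.
    escaped = _AMP_NOT_BEFORE_BRACE.sub("&amp;", value)
    # "<" is deliberately left unescaped; only the delimiter is escaped.
    return escaped.replace('"', "&quot;")
```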

See 7.4 The Influence of Serialization Parameters upon the HTML Output Method for additional directives on how attributes may be written.

7.3 Writing Character Data

The HTML output method MAY output a character using a character entity reference in preference to using a numeric character reference, if an entity is defined for the character in the version of HTML that the output method is using. Entity references and character references SHOULD be used only where the character is not present in the selected encoding, or where the visual representation of the character is unclear (as with &nbsp;, for example).

When outputting a sequence of whitespace characters in the instance of the data model, within an element where whitespace is treated normally (but not in elements such as pre and textarea), the HTML output method MAY represent it using any sequence of whitespace that will be treated in the same way by an HTML user agent. See section 3.5 of [XHTML Modularization] for some additional information on handling of whitespace by an HTML user agent.

Certain characters are permitted in XML, but not in HTML. For example, the control characters #x7F-#x9F are permitted in both XML 1.0 and XML 1.1, and the control characters #x1-#x8, #xB, #xC and #xE-#x1F are permitted in XML 1.1, but none of these is permitted in HTML. It is a serialization error [err:SERE0014] to use the HTML output method when such characters appear in the instance of the data model. The serializer MUST signal the error.

The HTML output method MUST terminate processing instructions with > rather than ?>. It is a serialization error [err:SERE0015] to use the HTML output method when > appears within a processing instruction in the data model instance being serialized.
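The two error conditions in this section can be sketched in Python as follows. The exception class and function names are hypothetical, and the character ranges checked are only the example ranges named above rather than a complete list of characters that HTML disallows.

```python
class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def _is_disallowed_in_html(ch):
    cp = ord(ch)
    return (0x01 <= cp <= 0x08 or cp in (0x0B, 0x0C)
            or 0x0E <= cp <= 0x1F or 0x7F <= cp <= 0x9F)


def check_html_characters(text):
    """Signal SERE0014 if text contains one of the example control characters."""
    for ch in text:
        if _is_disallowed_in_html(ch):
            raise SerializationError(
                "SERE0014", f"character #x{ord(ch):X} is not permitted in HTML")
    return text


def serialize_processing_instruction(target, data):
    """Write a processing instruction terminated with ">" rather than "?>"."""
    if ">" in data:
        raise SerializationError(
            "SERE0015", "'>' appears within a processing instruction")
    return f"<?{target} {data}>" if data else f"<?{target}>"
```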

7.4 The Influence of Serialization Parameters upon the HTML Output Method

7.4.1 HTML Output Method: the version Parameter

The version parameter indicates the version of the HTML Recommendation [HTML] to which the serialized result is to conform. If the serializer does not support the version of HTML specified by this parameter, it MUST signal a serialization error [err:SESU0013].

This document provides the normative definition of serialization for the HTML output method if the version parameter has the lexical form of a value of type decimal whose value is 1.0 or greater, but no greater than 4.01. For any other value of the version parameter, the behavior is implementation-defined. In that case the implementation-defined behavior MAY supersede all other requirements of this recommendation.
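The test for whether a given version value falls within the normatively defined range might be sketched in Python as follows. The function name is hypothetical, and the regular expression is an approximation of the xs:decimal lexical form (optional sign, digits, optional fractional part, no exponent).

```python
import re
from decimal import Decimal

# Approximation of the lexical form of xs:decimal.
_XS_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def version_is_normatively_defined(version):
    """Return True if this document defines the behavior for the version."""
    if not _XS_DECIMAL.fullmatch(version):
        return False  # not a decimal: behavior is implementation-defined
    return Decimal("1.0") <= Decimal(version) <= Decimal("4.01")
```

A value for which this function returns False is not necessarily an error; it only means that the behavior is implementation-defined.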

7.4.2 HTML Output Method: the encoding Parameter

The encoding parameter specifies the encoding to be used. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding. The serializer MUST signal the error.

It is possible that the instance of the data model will contain a character that cannot be represented in the encoding that the serializer is using for output. In this case, if the character occurs in a context where HTML recognizes character references, then the character MUST be output as a character entity reference or decimal numeric character reference; otherwise (for example, in a script or style element or in a comment), the serializer MUST signal a serialization error [err:SERE0008].
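The fallback for characters that cannot be represented in the output encoding can be sketched in Python as follows. The exception class, function name and the references_recognized flag are hypothetical; the flag stands in for the serializer's knowledge of whether the current context (for example, ordinary element content as opposed to a script element) recognizes character references. The sketch always uses decimal numeric character references and never entity references.

```python
class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def represent_in_encoding(text, encoding, references_recognized=True):
    """Replace unrepresentable characters by decimal character references."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)      # probe: can the encoding represent ch?
            out.append(ch)
        except UnicodeEncodeError:
            if not references_recognized:
                raise SerializationError(
                    "SERE0008",
                    f"character #x{ord(ch):X} cannot be represented in {encoding}")
            out.append(f"&#{ord(ch)};")
    return "".join(out)
```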

See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.

7.4.3 HTML Output Method: the indent and suppress-indentation Parameters

If the indent parameter has the value yes, then the HTML output method MAY add or remove whitespace as it serializes the result tree, if it observes the following constraints.

  • Whitespace MUST NOT be added other than before or after an element, or adjacent to an existing whitespace character.

  • Whitespace MUST NOT be added or removed adjacent to an inline element. The inline elements are those included in the %inline category of any of the HTML 4.01 DTDs, as well as the ins and del elements if they are used as inline elements (i.e., if they do not contain element children).

  • Whitespace MUST NOT be added or removed inside a formatted element, the formatted elements being pre, script, style, and textarea.

  • Whitespace characters MUST NOT be added in the content of an element whose expanded QName is a member of the list of expanded QNames in the value of the suppress-indentation parameter.

Note:

The effect of the above constraints is to ensure any insertion or deletion of whitespace would not affect how a conforming HTML user agent would render the output, assuming the serialized document does not refer to any HTML style sheets.

Note that the HTML definition of whitespace is different from the XML definition (see section 9.1 of the [HTML] specification).

7.4.4 HTML Output Method: the cdata-section-elements Parameter

The cdata-section-elements parameter is not applicable to the HTML output method, except in the case of XML Islands.

7.4.5 HTML Output Method: the omit-xml-declaration and standalone Parameters

The omit-xml-declaration and standalone parameters are not applicable to the HTML output method.

7.4.6 HTML Output Method: the doctype-system and doctype-public Parameters

If the doctype-public or doctype-system parameters are specified, then the HTML output method MUST output a document type declaration immediately before the first element. The name following <!DOCTYPE MUST be HTML or html. If the doctype-public parameter is specified, then the output method MUST output PUBLIC followed by the specified public identifier; if the doctype-system parameter is also specified, it MUST also output the specified system identifier following the public identifier. If the doctype-system parameter is specified but the doctype-public parameter is not specified, then the output method MUST output SYSTEM followed by the specified system identifier.
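The construction of the document type declaration can be sketched in Python as follows. The function name is hypothetical, None is used to represent an unspecified parameter, and the sketch assumes that neither identifier contains a double quotation mark.

```python
def html_doctype(doctype_public=None, doctype_system=None):
    """Build the document type declaration, or "" if none is required."""
    if doctype_public is None and doctype_system is None:
        return ""
    if doctype_public is not None:
        # PUBLIC, the public identifier, then the system identifier if given.
        decl = f'<!DOCTYPE html PUBLIC "{doctype_public}"'
        if doctype_system is not None:
            decl += f' "{doctype_system}"'
    else:
        # Only the system identifier was specified.
        decl = f'<!DOCTYPE html SYSTEM "{doctype_system}"'
    return decl + ">"
```

Unlike the XML output method, a public identifier may appear here without a system identifier, which is why the two parameters are tested independently.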

7.4.7 HTML Output Method: the undeclare-prefixes Parameter

The undeclare-prefixes parameter is not applicable to the HTML output method.

7.4.8 HTML Output Method: the normalization-form Parameter

The normalization-form parameter is applicable to the HTML output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.
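The handling of the normalization-form parameter might be sketched in Python as follows. The exception class and function name are hypothetical. The sketch supports the forms provided by Python's unicodedata module (NFC, NFD, NFKC and NFKD) in addition to none, and treating every other value as unsupported is a choice made for this example only.

```python
import unicodedata


class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


# Forms this sketch supports; NFC and none are the required minimum.
_SUPPORTED_FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def apply_normalization_form(text, normalization_form):
    """Normalize text according to the normalization-form parameter."""
    if normalization_form == "none":
        return text
    if normalization_form in _SUPPORTED_FORMS:
        return unicodedata.normalize(normalization_form, text)
    raise SerializationError(
        "SESU0011", f"unsupported normalization form: {normalization_form}")
```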

7.4.9 HTML Output Method: the media-type Parameter

The media-type parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information. See 7.4.13 HTML Output Method: the include-content-type Parameter regarding how this parameter is used with the include-content-type parameter.

7.4.10 HTML Output Method: the use-character-maps Parameter

The use-character-maps parameter is applicable to the HTML output method. See 9 Character Maps for more information.

7.4.11 HTML Output Method: the byte-order-mark Parameter

The byte-order-mark parameter is applicable to the HTML output method. See 3 Serialization Parameters for more information.

7.4.12 HTML Output Method: the escape-uri-attributes Parameter

If the escape-uri-attributes parameter has the value yes, the HTML output method MUST apply URI escaping to URI attribute values, except that relative URIs MUST NOT be absolutized.

Note:

This escaping is deliberately confined to non-ASCII characters, because escaping of ASCII characters is not always appropriate, for example when URIs or URI fragments are interpreted locally by the HTML user agent. Even in the case of non-ASCII characters, escaping can sometimes cause problems. More precise control of URI escaping is therefore available by setting escape-uri-attributes to no, and controlling the escaping of URIs by using methods defined in Section 6.2 fn:encode-for-uri FO30 and Section 6.3 fn:iri-to-uri FO30.

7.4.13 HTML Output Method: the include-content-type Parameter

If there is a head element, and the include-content-type parameter has the value yes, the HTML output method MUST add a meta element as the first child element of the head element specifying the character encoding actually used.

For example,

<HEAD>
<META http-equiv="Content-Type" content="text/html; charset=EUC-JP">
...

The content type MUST be set to the value given for the media-type parameter.

If a meta element has been added to the head element as described above, then any existing meta element child of the head element having an http-equiv attribute with the value "Content-Type", making the comparison without regard to case after first stripping leading and trailing spaces from the value of the attribute solely for the purposes of comparison, MUST be discarded.

Note:

This process removes possible parameters in the attribute value. For example,

<meta http-equiv="Content-Type" content="text/html;version='3.0'"/>

in the data model instance would be replaced by,

<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>

8 Text Output Method

The Text output method serializes the instance of the data model by outputting the string value of the document node created by the markup generation step of the phases of serialization without any escaping.

A newline character in the instance of the data model MAY be output using any character sequence that is conventionally used to represent a line ending in the chosen system environment.

8.1 The Influence of Serialization Parameters upon the Text Output Method

8.1.1 Text Output Method: the version Parameter

The version parameter is not applicable to the Text output method.

8.1.2 Text Output Method: the encoding Parameter

The encoding parameter identifies the encoding that the Text output method MUST use to convert sequences of characters to sequences of bytes. Serializers are REQUIRED to support values of UTF-8 and UTF-16. A serialization error [err:SESU0007] occurs if the serializer does not support the encoding specified by the encoding parameter. The serializer MUST signal the error. If the instance of the data model contains a character that cannot be represented in the encoding that the serializer is using for output, the serializer MUST signal a serialization error [err:SERE0008].
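The encoding step of the Text output method can be sketched in Python as follows. The exception class and function name are hypothetical, and the set of supported encodings is simply whatever the Python runtime provides. Note that Python's UTF-16 codec writes a byte order mark, which this sketch does not relate to the byte-order-mark parameter.

```python
class SerializationError(Exception):
    """Hypothetical exception type carrying a serialization error code."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code


def serialize_as_text(string_value, encoding="UTF-8"):
    """Encode a string value without any escaping (sketch)."""
    try:
        return string_value.encode(encoding)
    except LookupError:
        # The runtime does not know the requested encoding.
        raise SerializationError("SESU0007", f"unsupported encoding: {encoding}")
    except UnicodeEncodeError:
        # No character references are available in text output.
        raise SerializationError(
            "SERE0008", f"character cannot be represented in {encoding}")
```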

8.1.3 Text Output Method: the indent and suppress-indentation Parameters

The indent and suppress-indentation parameters are not applicable to the Text output method.

8.1.4 Text Output Method: the cdata-section-elements Parameter

The cdata-section-elements parameter is not applicable to the Text output method.

8.1.5 Text Output Method: the omit-xml-declaration and standalone Parameters

The omit-xml-declaration and standalone parameters are not applicable to the Text output method.

8.1.6 Text Output Method: the doctype-system and doctype-public Parameters

The doctype-system and doctype-public parameters are not applicable to the Text output method.

8.1.7 Text Output Method: the undeclare-prefixes Parameter

The undeclare-prefixes parameter is not applicable to the Text output method.

8.1.8 Text Output Method: the normalization-form Parameter

The normalization-form parameter is applicable to the Text output method. The values NFC and none MUST be supported by the serializer. A serialization error [err:SESU0011] results if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer; the serializer MUST signal the error.

8.1.9 Text Output Method: the media-type Parameter

The media-type parameter is applicable to the Text output method. See 3 Serialization Parameters for more information.

8.1.10 Text Output Method: the use-character-maps Parameter

The use-character-maps parameter is applicable to the Text output method. See 9 Character Maps for more information.

8.1.11 Text Output Method: the byte-order-mark Parameter

The byte-order-mark parameter is applicable to the Text output method. See 3 Serialization Parameters for more information.

8.1.12 Text Output Method: the escape-uri-attributes Parameter

The escape-uri-attributes parameter is not applicable to the Text output method.

8.1.13 Text Output Method: the include-content-type Parameter

The include-content-type parameter is not applicable to the Text output method.

9 Character Maps

The use-character-maps parameter is a list of characters and corresponding string substitutions.

Character maps allow a specific character appearing in a text or attribute node in the instance of the data model to be replaced with a specified string of characters during serialization. The string that is substituted is output "as is," and the serializer performs no checks that the resulting document is well-formed. This mechanism can therefore be used to introduce arbitrary markup in the serialized output. See Section 23.1 Character MapsXT30 of [XSL Transformations (XSLT) Version 3.0] for examples of using character mapping in XSLT.

Character mapping is applied to the characters that actually appear in a text or attribute node in the instance of the data model, before any other serialization operations such as escaping or Unicode Normalization are applied. If a character is mapped, then it is not subjected to XML or HTML escaping, nor to Unicode Normalization. The string that is substituted for a character is not validated or processed in any way by the serializer, except for translation into the target encoding. In particular, it is not subjected to XML or HTML escaping, it is not subjected to Unicode Normalization, and it is not subjected to further character mapping.

Character mapping is not applied to characters in text nodes whose parent elements are listed in the cdata-section-elements parameter, nor to characters for which output escaping has been disabled (disabling output escaping is an [XSL Transformations (XSLT) Version 3.0] feature), nor to characters in attribute values that are subject to URI escaping defined for the HTML and XHTML output methods, unless URI escaping has been disabled using the escape-uri-attributes parameter in the output definition.

On serialization, occurrences of a character specified in the use-character-maps parameter in text nodes and attribute values are replaced by the corresponding string from the use-character-maps parameter.

Note:

Using a character map can result in non-well-formed documents if the string contains XML-significant characters. For example, it is possible to create documents containing unmatched start and end tags, references to entities that are not declared, or attributes that contain tags or unescaped quotation marks.

If a character is mapped, then it is not subjected to XML or HTML escaping.

A serialization error [err:SERE0008] occurs if character mapping causes the output of a string containing a character that cannot be represented in the encoding that the serializer is using for output. The serializer MUST signal the error.
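The order of operations described in this section, in which character mapping is applied before escaping and mapped characters bypass escaping altogether, can be sketched in Python as follows. The function names are hypothetical, the character map is represented as a dictionary from single characters to strings, and the escaping shown is a simplified stand-in for the escaping rules of the XML and HTML output methods. Unicode Normalization and encoding are omitted.

```python
def _escape_character(ch):
    """Simplified escaping for a single unmapped character."""
    return {"&": "&amp;", "<": "&lt;", ">": "&gt;"}.get(ch, ch)


def serialize_with_character_map(text, character_map):
    """Apply a character map to a text node, then escape what was not mapped."""
    out = []
    for ch in text:
        if ch in character_map:
            # The substitute is output "as is": it is not escaped and is
            # not subjected to further character mapping.
            out.append(character_map[ch])
        else:
            out.append(_escape_character(ch))
    return "".join(out)
```

Because the substitute string is copied through unchecked, a map such as {"«": "<b>"} introduces markup into the output, which illustrates how a character map can produce a document that is not well-formed.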

10 Conformance

[Definition: Serialization is intended primarily as a component of a host language such as [XSL Transformations (XSLT) Version 3.0] or [XQuery 3.0: An XML Query Language].] Therefore, this document relies on specifications that use it to specify conformance criteria for Serialization in their respective environments. Specifications that set conformance criteria for their use of Serialization MUST NOT change the semantic definitions of Serialization as given in this specification, except by subsetting and/or compatible extensions. It is the responsibility of the host language to specify how serialization errors should be handled.

Certain facilities in this specification are described as producing implementation-defined results. A claim that asserts conformance with this specification MUST be accompanied by documentation stating the effect of each implementation-defined feature. For convenience, a non-normative checklist of implementation-defined features is provided at E Checklist of Implementation-Defined Features.

A References

A.1 Normative References

Character Model for the World Wide Web 1.0: Normalization
Character Model for the World Wide Web 1.0: Normalization, Richard Ishida, François Yergeau, Addison Phillips, et al., Editors. World Wide Web Consortium, 27 Oct 2005. This version is http://www.w3.org/TR/2005/WD-charmod-norm-20051027/. The latest version is available at http://www.w3.org/TR/charmod-norm/.
XQuery and XPath Data Model (XDM) 3.0
XQuery and XPath Data Model (XDM) 3.0, Norman Walsh, John Snelson, Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/WD-xpath-datamodel-30-20101214/. The latest version is available at http://www.w3.org/TR/xpath-datamodel-30/.
XQuery and XPath Functions and Operators 3.0
XQuery and XPath Functions and Operators 3.0, Michael Kay, Editor. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/WD-xpath-functions-30-20101214/. The latest version is available at http://www.w3.org/TR/xpath-functions-30/.
HTML
HTML 4.01 Specification, Arnaud Le Hors, David Raggett, and Ian Jacobs, Editors. World Wide Web Consortium, 24 Dec 1999. This version is http://www.w3.org/TR/1999/REC-html401-19991224/. The latest version is available at http://www.w3.org/TR/html401/.
IANA
Character Sets. Internet Assigned Numbers Authority. Jan 2005.
RFC2046
Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types, N. Freed, N. Borenstein. Network Working Group, IETF, Nov 1996.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, S. Bradner. Network Working Group, IETF, Mar 1997.
RFC2278
IANA Charset Registration Procedures, N. Freed and J. Postel. Network Working Group, IETF, Jan 1998.
RFC2854
The 'text/html' Media Type, D. Connolly, L. Masinter. Network Working Group, IETF, Jun 2000.
RFC3236
The 'application/xhtml+xml' Media Type, M. Baker and P. Stark. Network Working Group, IETF, Jan 2002.
Unicode Encoding
Unicode Character Encoding Model, Unicode Consortium. Unicode Standard Annex #17.
UAX #15: Unicode Normalization Forms
Unicode Normalization Forms, Unicode Consortium. Unicode Standard Annex #15.
XHTML 1.0
XHTML™ 1.0 The Extensible HyperText Markup Language (Second Edition), Steven Pemberton, Editor. World Wide Web Consortium, 01 Aug 2002. This version is http://www.w3.org/TR/2002/REC-xhtml1-20020801/. The latest version is available at http://www.w3.org/TR/xhtml1/.
XHTML 1.1
XHTML™ 1.1 - Module-based XHTML, Shane McCarron and Murray Altheim, Editors. World Wide Web Consortium, 31 May 2001. This version is http://www.w3.org/TR/2001/REC-xhtml11-20010531. The latest version is available at http://www.w3.org/TR/xhtml11/. XHTML™ 1.1 - Module-based XHTML - Second Edition, Shane McCarron and Masayasu Ishikawa, Editors. World Wide Web Consortium, 16 Feb 2007. This version is http://www.w3.org/TR/2007/WD-xhtml11-20070216. The latest version is available at http://www.w3.org/TR/xhtml11/.
XML10
Extensible Markup Language (XML) 1.0 (Fifth Edition), Tim Bray, François Yergeau, Jean Paoli, et al., Editors. World Wide Web Consortium, 26 Nov 2008. This version is http://www.w3.org/TR/2008/REC-xml-20081126/. The latest version is available at http://www.w3.org/TR/xml/.
XML11
Extensible Markup Language (XML) 1.1 (Second Edition), Eve Maler, Jean Paoli, John Cowan, et al., Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml11-20060816. The latest version is available at http://www.w3.org/TR/xml11/.
XML Names
Namespaces in XML 1.0 (Third Edition), Dave Hollander, Richard Tobin, Tim Bray, et al., Editors. World Wide Web Consortium, 08 Dec 2009. This version is http://www.w3.org/TR/2009/REC-xml-names-20091208/. The latest version is available at http://www.w3.org/TR/xml-names/.
XML Names 1.1
Namespaces in XML 1.1 (Second Edition), Dave Hollander, Tim Bray, Andrew Layman, and Richard Tobin, Editors. World Wide Web Consortium, 16 Aug 2006. This version is http://www.w3.org/TR/2006/REC-xml-names11-20060816. The latest version is available at http://www.w3.org/TR/xml-names11/.
XML Schema
XML Schema Part 1: Structures Second Edition, David Beech, Henry S. Thompson, Murray Maloney, and Noah Mendelsohn, Editors. World Wide Web Consortium, 28 Oct 2004. This version is http://www.w3.org/TR/2004/REC-xmlschema-1-20041028/. The latest version is available at http://www.w3.org/TR/xmlschema-1/.
XML Path Language (XPath) 3.0
XML Path Language (XPath) 3.0, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/WD-xpath-30-20101214/. The latest version is available at http://www.w3.org/TR/xpath-30/.
XQuery 3.0: An XML Query Language
XQuery 3.0: An XML Query Language, Jonathan Robie, Don Chamberlin, Michael Dyck, John Snelson, Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/WD-xquery-30-20101214/. The latest version is available at http://www.w3.org/TR/xquery-30/.
XSL Transformations (XSLT) Version 3.0
XSL Transformations (XSLT) Version 3.0 (expected), Michael Kay, Editor. World Wide Web Consortium, (not yet published but anticipated in July or August 2011; see the list of XSLT specifications)

A.2 Informative References

XHTML Modularization
XHTML™ Modularization 1.1, Shane McCarron, Subramanian Peruvemba, Mark Birbeck, et al., Editors. World Wide Web Consortium, 08 Oct 2008. This version is http://www.w3.org/TR/2008/REC-xhtml-modularization-20081008. The latest version is available at http://www.w3.org/TR/xhtml-modularization/. Modularization of XHTML™ 1.0 - Second Edition. World Wide Web Consortium, 18 Feb 2004. This version is http://www.w3.org/TR/2004/WD-xhtml-modularization-20040218. The latest version is available at http://www.w3.org/TR/xhtml-modularization/.
XQuery 1.0 and XPath 2.0 Data Model
XQuery 1.0 and XPath 2.0 Data Model (XDM) (Second Edition), Norman Walsh, Mary Fernández, Ashok Malhotra, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xpath-datamodel-20101214/. The latest version is available at http://www.w3.org/TR/xpath-datamodel/.
XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)
XSLT 2.0 and XQuery 1.0 Serialization (Second Edition), W3C Recommendation, Henry Zongaro, Norman Walsh, Joanne Tong, et al., Editors. World Wide Web Consortium, 14 December 2010. This version is http://www.w3.org/TR/2010/REC-xslt-xquery-serialization-20101214/

B Schema for Serialization Parameters

The following schema describes the structure of a Data Model instance that can be used to specify the settings of serialization parameters using the mechanism described in 3.1 Setting Serialization Parameters by Means of a Data Model Instance.

A copy of this schema is available at http://www.w3.org/2010/xslt-xquery-serialization/schema-for-parameters-for-xslt-xquery-serialization.xsd.

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
      targetNamespace="http://www.w3.org/2010/xslt-xquery-serialization"
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
      elementFormDefault="qualified">

  <xs:annotation>
    <xs:documentation>
      This is a schema for serialization parameters for
      XSLT and XQuery Serialization 3.0.

      This schema is available for use under the conditions of the
      W3C Software License published at
      http://www.w3.org/Consortium/Legal/copyright-software-19980720 
      
      It defines a schema for XML Infoset instances with which a user of
      a host language MAY specify serialization parameters for use in
      serializing an instance of the XQuery and XPath Data Model.  It
      also provides hooks that allow the inclusion of implementation-
      defined serialization parameters and implementation-defined
      modifiers to serialization parameters.
    </xs:documentation>
  </xs:annotation>

  <xs:simpleType name="QNames-type">
    <xs:list itemType="xs:QName"/>
  </xs:simpleType>

  <xs:simpleType name="yes-no-type">
    <xs:restriction base="xs:token">
      <xs:enumeration value="no"/>
      <xs:enumeration value="yes"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="yes-no-omit-type">
    <xs:restriction base="xs:token">
      <xs:enumeration value="no"/>
      <xs:enumeration value="omit"/>
      <xs:enumeration value="yes"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="char-type">
    <xs:restriction base="xs:string">
      <xs:maxLength value="1"/>
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="encoding-string-type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[A-Za-z]([A-Za-z0-9._]|-)*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="method-type">
    <xs:union>
      <xs:simpleType>
        <xs:restriction base="xs:string">
          <xs:enumeration value="html"/>
          <xs:enumeration value="text"/>
          <xs:enumeration value="xml"/>
          <xs:enumeration value="xhtml"/>
        </xs:restriction>
      </xs:simpleType>
      <xs:simpleType>
        <xs:restriction base="xs:QName">
          <xs:pattern value=".*:.*"/>
        </xs:restriction>
      </xs:simpleType>
    </xs:union>
  </xs:simpleType>

  <xs:simpleType name="pubid-char-string-type">
    <xs:restriction base="xs:string">
      <xs:pattern value="([- \r\n\ta-zA-Z0-9'()+,./:=?;!*#@$_%])*"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:simpleType name="system-id-string-type">
    <xs:restriction base="xs:string">
      <xs:pattern value="[^']*|[^&quot;]*"/>
    </xs:restriction>
  </xs:simpleType>

  <!--
     - Base type of all serialization parameter types
    -->
  <xs:complexType name="base-param-type">
    <xs:complexContent>
      <xs:restriction base="xs:anyType">
        <xs:anyAttribute namespace="##other" processContents="lax"/>
      </xs:restriction>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Generic string serialization parameters
    -->
  <xs:complexType name="string-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="xs:string" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter type for "yes" or "no"
     - serialization parameters
    -->
  <xs:complexType name="yes-no-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:yes-no-type" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter type for list of xs:QName
     - serialization parameters
    -->
  <xs:complexType name="QNames-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:QNames-type" use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter type for "yes", "no" or "omit"
     - serialization parameters
    -->
  <xs:complexType name="yes-no-omit-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:yes-no-omit-type"
              use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Abstract head of the substitution group to which every
     - serialization parameter element belongs
    -->
  <xs:element name="serialization-parameter-element"
        abstract="true"
        type="output:base-param-type"/>

  <!--
     - Serialization parameter element for byte-order-mark parameter
    -->
  <xs:element id="byte-order-mark" name="byte-order-mark" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for cdata-section-elements parameter
    -->
  <xs:element id="cdata-section-elements" name="cdata-section-elements" type="output:QNames-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter type for doctype-public parameter
    -->
  <xs:complexType name="doctype-public-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:pubid-char-string-type"
              use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter element for doctype-public parameter
    -->
  <xs:element id="doctype-public" name="doctype-public" type="output:doctype-public-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter type for doctype-system parameter
    -->
  <xs:complexType name="doctype-system-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:system-id-string-type"
              use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter element for doctype-system parameter
    -->
  <xs:element id="doctype-system" name="doctype-system" type="output:doctype-system-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter type for encoding parameter
    -->
  <xs:complexType name="encoding-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:encoding-string-type"
              use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter element for encoding parameter
    -->
  <xs:element id="encoding" name="encoding" type="output:encoding-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for escape-uri-attributes parameter
    -->
  <xs:element id="escape-uri-attributes" name="escape-uri-attributes" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for include-content-type parameter
    -->
  <xs:element id="include-content-type" name="include-content-type" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for indent parameter
    -->
  <xs:element id="indent" name="indent" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for media-type parameter
    -->
  <xs:element id="media-type" name="media-type" type="output:string-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter type for method parameter
    -->
  <xs:complexType name="method-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:attribute name="value" type="output:method-type"
                use="required"/>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter element for method parameter
    -->
  <xs:element id="method" name="method" type="output:method-param-type"
        substitutionGroup="output:serialization-parameter-element"/>
  <!--
     - Serialization parameter element for normalization-form parameter
    -->
  <xs:element id="normalization-form" name="normalization-form" type="output:string-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for omit-xml-declaration parameter
    -->
  <xs:element id="omit-xml-declaration" name="omit-xml-declaration" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for standalone parameter
    -->
  <xs:element id="standalone" name="standalone" type="output:yes-no-omit-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for suppress-indentation parameter
    -->
  <xs:element id="suppress-indentation" name="suppress-indentation" type="output:QNames-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for undeclare-prefixes parameter
    -->
  <xs:element id="undeclare-prefixes" name="undeclare-prefixes" type="output:yes-no-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter type for use-character-maps
     - parameter
    -->
  <xs:complexType name="use-character-maps-param-type">
    <xs:complexContent>
      <xs:extension base="output:base-param-type">
        <xs:sequence>
          <xs:element name="character-map" minOccurs="0"
                  maxOccurs="unbounded">
            <xs:complexType>
              <xs:attribute name="character" type="output:char-type"/>
              <xs:attribute name="map-string" type="xs:string"/>
              <xs:anyAttribute namespace="##other"
                     processContents="lax"/>
            </xs:complexType>
          </xs:element>
          <xs:any minOccurs="0" namespace="##other"
                     processContents="lax"/>
        </xs:sequence>
      </xs:extension>
    </xs:complexContent>
  </xs:complexType>

  <!--
     - Serialization parameter element for use-character-maps parameter
    -->
  <xs:element id="use-character-maps" name="use-character-maps"
        type="output:use-character-maps-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <!--
     - Serialization parameter element for version parameter
    -->
  <xs:element id="version" name="version"
        type="output:string-param-type"
        substitutionGroup="output:serialization-parameter-element"/>

  <xs:element name="serialization-parameters">
    <xs:complexType>
      <xs:sequence>
        <xs:element ref="output:serialization-parameter-element"
              minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
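
As a non-normative illustration, an instance intended to be valid against the schema above might look like the following sketch. The parameter values shown (the XHTML 1.0 Strict identifiers, the character map entry, and the ex prefix with its namespace URI) are assumptions chosen for the example; nothing in this specification requires them.

```xml
<output:serialization-parameters
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization"
      xmlns:ex="http://example.org/ns">
  <!-- One of the four predefined output methods -->
  <output:method value="xhtml"/>
  <output:version value="1.0"/>
  <output:indent value="yes"/>
  <output:omit-xml-declaration value="no"/>
  <output:standalone value="omit"/>
  <output:doctype-public value="-//W3C//DTD XHTML 1.0 Strict//EN"/>
  <output:doctype-system
        value="http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"/>
  <!-- A space-separated list of xs:QName values; the ex prefix is hypothetical -->
  <output:cdata-section-elements value="ex:code ex:listing"/>
  <!-- This parameter has no value attribute; it carries character-map children -->
  <output:use-character-maps>
    <output:character-map character="&#xA0;" map-string="&amp;nbsp;"/>
  </output:use-character-maps>
</output:serialization-parameters>
```

Because the schema sets elementFormDefault to qualified, the locally declared character-map element is in the same namespace as the parameter elements, while its character and map-string attributes are unqualified.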

C Summary of Error Conditions

This document uses the err prefix which represents the same namespace URI (http://www.w3.org/2005/xqt-errors) as defined in [XML Path Language (XPath) 3.0]. Use of this namespace prefix binding in this document is not normative.

err:SENR0001

It is an error if an item in S6 in sequence normalization is an attribute node or a namespace node.

err:SERE0003

It is an error if the serializer is unable to satisfy the rules for either a well-formed XML document entity or a well-formed XML external general parsed entity, or both, except for content modified by the character expansion phase of serialization.

err:SEPM0004

It is an error to specify the doctype-system parameter, or to specify the standalone parameter with a value other than omit, if the instance of the data model contains text nodes or multiple element nodes as children of the root node.

err:SERE0005

It is an error if the serialized result would contain an NCName that contains a character that is not permitted by the version of Namespaces in XML specified by the version parameter.

err:SERE0006

It is an error if the serialized result would contain a character that is not permitted by the version of XML specified by the version parameter.

err:SESU0007

It is an error if an output encoding other than UTF-8 or UTF-16 is requested and the serializer does not support that encoding.

err:SERE0008

It is an error if a character that cannot be represented in the encoding that the serializer is using for output appears in a context where character references are not allowed (for example if the character occurs in the name of an element).

err:SEPM0009

It is an error if the omit-xml-declaration parameter has the value yes, and the standalone parameter has a value other than omit; or the version parameter has a value other than 1.0 and the doctype-system parameter is specified.

err:SEPM0010

It is an error if the output method is xml or xhtml, the value of the undeclare-prefixes parameter is yes, and the value of the version parameter is 1.0.

err:SESU0011

It is an error if the value of the normalization-form parameter specifies a normalization form that is not supported by the serializer.

err:SERE0012

It is an error if the value of the normalization-form parameter is fully-normalized and any relevant construct of the result begins with a combining character.

err:SESU0013

It is an error if the serializer does not support the version of XML or HTML specified by the version parameter.

err:SERE0014

It is an error to use the HTML output method if characters which are permitted in XML but not in HTML appear in the instance of the data model.

err:SERE0015

It is an error to use the HTML output method when > appears within a processing instruction in the data model instance being serialized.

err:SEPM0016

It is an error if a parameter value is invalid for the defined domain.

err:SEPM0017

It is an error if evaluating an expression in order to extract the setting of a serialization parameter from a data model instance would yield an error.

err:SEPM0018

It is an error if evaluating an expression in order to extract the setting of the use-character-maps serialization parameter from a data model instance would yield a sequence of length greater than one.

err:SEPM0019

It is an error if an instance of the data model used to specify the settings of serialization parameters specifies the value of the same parameter more than once.
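
Two of the parameter errors above can be sketched with the element vocabulary of the schema in Appendix B, assuming the settings are supplied as a data model instance. Both instances are illustrative only. The first repeats the indent parameter, which the schema's content model does not forbid, so the condition described for err:SEPM0019 would be expected to apply.

```xml
<!-- Specifies indent twice: expected to raise err:SEPM0019 -->
<output:serialization-parameters
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:indent value="yes"/>
  <output:indent value="no"/>
</output:serialization-parameters>
```

The second combines the xml output method, version 1.0, and undeclare-prefixes with the value yes, which matches the condition described for err:SEPM0010.

```xml
<!-- xml method, version 1.0, undeclare-prefixes="yes": expected to raise err:SEPM0010 -->
<output:serialization-parameters
      xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:method value="xml"/>
  <output:version value="1.0"/>
  <output:undeclare-prefixes value="yes"/>
</output:serialization-parameters>
```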

D List of URI Attributes

The following attributes are declared as type %URI or %UriList for a given HTML or XHTML element, with the exception of the name attribute for element A, which is not a URI type. The name attribute for element A should be escaped as is recommended by the HTML Recommendation [HTML] in Appendix B.2.1.

Attributes  Elements
action      FORM
archive     OBJECT
background  BODY
cite        BLOCKQUOTE, DEL, INS, Q
classid     OBJECT
codebase    APPLET, OBJECT
data        OBJECT
datasrc     BUTTON, DIV, INPUT, OBJECT, SELECT, SPAN, TABLE, TEXTAREA
for         SCRIPT
href        A, AREA, BASE, LINK
longdesc    FRAME, IFRAME, IMG
name        A
profile     HEAD
src         FRAME, IFRAME, IMG, INPUT, SCRIPT
usemap      IMG, INPUT, OBJECT
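
As a rough, non-normative illustration of how this list is used: when the escape-uri-attributes parameter has the value yes, the HTML output method applies URI escaping to the attributes listed above. The sketch below assumes the escaping behaves like the fn:escape-html-uri function, which percent-encodes the UTF-8 octets of characters outside the printable US-ASCII range. The element content and the URI are made up for the example.

```xml
<!-- Element in the data model instance (hypothetical content) -->
<a href="http://example.org/café">menu</a>

<!-- Possible serialized form under the html output method with
     escape-uri-attributes="yes": é (U+00E9) is written as the
     percent-encoded octets of its UTF-8 representation -->
<a href="http://example.org/caf%C3%A9">menu</a>
```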

E Checklist of Implementation-Defined Features (Non-Normative)

This appendix provides a summary of Serialization features whose effect is explicitly implementation-defined. The conformance rules (see 10 Conformance) require vendors to provide documentation that explains how these choices have been exercised.

  1. For any implementation-defined output method, it is implementation-defined whether the sequence normalization process takes place. (See 2 Sequence Normalization)
  2. If the namespace URI is non-null for the method serialization parameter, then the parameter specifies an implementation-defined output method. (See 3 Serialization Parameters)
  3. The effect of additional serialization parameters on the output of the serializer, where the name of such a parameter must be namespace-qualified, is implementation-defined or implementation-dependent. The extent of this effect on the output must not override the provisions of this specification. (See 3 Serialization Parameters)
  4. The effect of providing an option that allows the encoding phase to be skipped, so that the result of serialization is a stream of Unicode characters, is implementation-defined. The serializer is not required to support such an option. (See 4 Phases of Serialization)
  5. If an implementation supports a value of the version parameter for the XML or XHTML output method for which this document does not provide a normative definition, the behavior is implementation-defined. (See 5.1.1 XML Output Method: the version Parameter)
  6. A serializer may provide an implementation-defined mechanism to place CDATA sections in the result tree. (See 5.1.4 XML Output Method: the cdata-section-elements Parameter)
  7. If the value of the normalization-form parameter is not NFC, NFD, NFKC, NFKD, fully-normalized, or none, then the meaning of the value and its effect is implementation-defined. (See 5.1.8 XML Output Method: the normalization-form Parameter)
  8. If an implementation supports a value of the version parameter for the HTML output method for which this document does not provide a normative definition, the behavior is implementation-defined. (See 7.4.1 HTML Output Method: the version Parameter)

F Revision Log (Non-Normative)

There are two categories of changes that have been made to this document: those that have also been incorporated into [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)], and those made since that document was published.

F.1 Changes since XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)

The following sections detail the changes to the serialization specification since the publication of [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)]. All changes are of a minor nature. No change introduces an incompatibility, unless indicated below. Informally, and unless indicated otherwise below, this means the following. Suppose a sequence does not rely on any feature of [XQuery and XPath Data Model (XDM) 3.0] that was not available in [XQuery 1.0 and XPath 2.0 Data Model], and a set of serialization parameters does not include the suppress-indentation parameter. Then the serialized result or error that a serializer conforming to [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)] would produce is consistent with the requirements of this specification for the same sequence and serialization parameters, with the suppress-indentation parameter added with an empty value.

F.1.1 Changes applied for the first Public Working Draft

The following changes have been applied since the publication of [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)]. None of these changes introduces an incompatibility with [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)].

  • Applied decision of Bugzilla bug 6723 (Erratum SE.E13), clarifying how HTML elements that have no children but whose content model is not empty are serialized.

  • Applied decision of Bugzilla bug 6732 (Erratum SE.E12), clarifying for which versions of XML and HTML this document makes normative statements.

  • Take into account presence of function items in a sequence that is to be serialized.

  • Miscellaneous minor editorial improvements.

F.1.2 Changes applied for the second Public Working Draft

The following changes have been applied since the first Public Working Draft of this specification was published. None of these changes introduces an incompatibility with [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)], unless otherwise indicated below.

  • Added definition of suppress-indentation serialization parameter as requested in Bugzilla bug 6535.

  • Applied changes for Bugzilla bug 7829 (erratum SE.E14), clarifying how minimized attributes are handled under the rules of the HTML output method.

  • Applied changes for Bugzilla bug 8245 (erratum SE.E15), correcting a serialization error that mentions which control characters are not permitted under the rules of the HTML output method.

  • Applied changes for Bugzilla bug 7823 (erratum SE.E16), clarifying how the script and style elements are handled for the HTML output method.

  • Applied changes for Bugzilla bug 8651 (erratum SE.E17), clarifying what it means to compare without regard to case.

  • Applied changes for Bugzilla bug 8206 (erratum SE.E18), clarifying what it means to escape according to HTML or XML rules.

  • Relaxed rules for the XML output method that specify where a serializer is permitted to add whitespace, as requested in Bugzilla bug 6808. This introduces an incompatibility only inasmuch as the serialized results produced by a serializer conforming to this specification could differ from the results a serializer that adheres to [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)] would be permitted to produce.

  • Defined a mechanism for specifying serialization parameter settings in the form of a data model instance, as suggested for resolving Bugzilla bug 9302.

  • Editorial change to replace all uses of the words legal and illegal with more appropriate terms.

F.1.3 Changes applied for the third Public Working Draft

The following changes have been applied since the second Public Working Draft of this specification was published. None of these changes introduces an incompatibility with [XSLT 2.0 and XQuery 1.0 Serialization (Second Edition)], unless otherwise indicated below.

  • Applied changes for Bugzilla bug 11635 (erratum SE.E19), correcting description of error err:SEPM0010.

F.2 Changes incorporated in the Second Edition

The following table summarizes the errata that have been applied to this document; each is described in detail in the Errata to the first edition. The rationale for each erratum is explained in the corresponding Bugzilla database entry.

Erratum Bugzilla Category Description
E1 4372 substantive This erratum places constraints on the type of string that is valid for the doctype-public attribute of xsl:output.
E2 4557 editorial This erratum corrects an editorial error concerning the number of phases of serialization.
E3 5066 editorial This erratum corrects an editorial error concerning the currently registered XHTML media types.
E4 5433 substantive This erratum clarifies how descendant elements of an XML island must be serialized according to the HTML output method.
E5 5439 substantive This erratum aligns the description of the effect of the include-content-type serialization parameter of the HTML output method with that of the XHTML output method.
E6 5458 substantive This erratum ensures that the sequence normalization process preserves any type annotations associated with nodes in the input sequence.
E7 5300 substantive This erratum clarifies how elements with empty content models are to be serialized under the HTML and XHTML output methods.
E8 5441 substantive This erratum ensures that Unicode normalization applies to all characters that might be adjacent in the serialized result produced by the text output method, including those that are in text nodes that are separated by element nodes in the data model instance.
E9 5993 substantive This erratum makes previously non-normative text that describes how the xhtml and html output methods must behave if the indent parameter has the value yes into normative text.
E10 6466 substantive This erratum specifies the syntactic constraints on the values of the doctype-public and doctype-system serialization parameters.
E11 6376 editorial This erratum makes clear which parts of the recommendation are not considered to be normative.