The following steps form the HTML fragment serialization
algorithm. The algorithm takes as input a DOM
Element or Document, referred to as the node, and either returns a string or raises an
exception.
This algorithm serializes the children of the node being serialized, not the node itself.
Let s be a string, and initialize it to the empty string.
For each child node of the node, in tree order, run the following steps:
Let current node be the child node being processed.
Append the appropriate string from the following list to s:
If current node is an Element:
Append a U+003C LESS-THAN SIGN (<)
character, followed by the element's tag name. (For nodes
created by the HTML parser, Document.createElement(), or
Document.renameNode(), the tag name will be lowercase.)
For each attribute that the element has, append a U+0020
SPACE character, the attribute's name (which, for attributes
set by the HTML parser or by
Element.setAttribute(), will be lowercase), a
U+003D EQUALS SIGN (=) character, a
U+0022 QUOTATION MARK (") character, the attribute's value, escaped as described below in attribute
mode, and a second U+0022 QUOTATION MARK (") character.
While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.
Append a U+003E GREATER-THAN SIGN (>) character.
If current node is an element that is serialized without an end tag,
such as a br, img, or wbr element, then continue on to the next child
node at this point.
If current node is a
listing element, append
a U+000A LINE FEED (LF) character.
Append the value of running the HTML fragment
serialization algorithm on the current
node element (thus recursing into this algorithm for
that element), followed by a U+003C LESS-THAN SIGN (
<) character, a U+002F SOLIDUS (
/) character, the element's tag name again,
and finally a U+003E GREATER-THAN SIGN (>) character.
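If current node is a Text or CDATASection node:
If one of the ancestors of current node is an element whose text is
serialized unescaped (such as a script, style, or
plaintext element), then append the value of current node's
data DOM attribute literally.
Otherwise, append the value of current node's
data DOM attribute, escaped as described below.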
If current node is a Comment:
Append the literal string <!-- (U+003C
LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS,
U+002D HYPHEN-MINUS), followed by the value of current node's
data DOM attribute, followed by the literal string -->
(U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
SIGN).
If current node is a ProcessingInstruction:
Append the literal string <? (U+003C
LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value
of current node's
target DOM attribute, followed by a single
U+0020 SPACE character, followed by the value of current node's
data DOM attribute, followed by a single U+003E GREATER-THAN SIGN
(>) character.
If current node is a DocumentType:
Append the literal string <!DOCTYPE (U+003C
LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL
LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL
LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL
LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL
LETTER E), followed by a space (U+0020 SPACE), followed by the
value of current node's
name DOM attribute, followed by the literal
string > (U+003E GREATER-THAN SIGN).
Other node types (e.g. Attr) cannot
occur as children of elements. If, despite this, they somehow do
occur, this algorithm must raise an exception.
The result of the algorithm is the string s.
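The steps above can be sketched in Python as follows. This is a minimal illustration, not a conforming implementation: the Node class, its "kind" strings, and the three element sets are hypothetical stand-ins for a real DOM, and the element sets are deliberately small illustrative subsets rather than the complete lists. The escape helper follows the escaping rules given in this section.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed, illustrative subsets only; the complete lists are not reproduced here.
NO_END_TAG = {"br", "img", "wbr"}             # serialized without an end tag
RAW_TEXT = {"script", "style", "plaintext"}   # descendant text is not escaped
LEADING_LF = {"listing"}                      # a line feed follows the start tag


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for a DOM node."""
    kind: str                  # "element", "text", "comment", "pi", "doctype"
    name: str = ""             # tag name, PI target, or doctype name
    data: str = ""
    attrs: dict = field(default_factory=dict)   # insertion order is stable
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def append(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def escape(text: str, attribute_mode: bool = False) -> str:
    text = text.replace("&", "&amp;").replace("\u00a0", "&nbsp;")
    if attribute_mode:
        return text.replace('"', "&quot;")
    return text.replace("<", "&lt;").replace(">", "&gt;")


def has_raw_text_ancestor(node: Node) -> bool:
    ancestor = node.parent
    while ancestor is not None:
        if ancestor.kind == "element" and ancestor.name in RAW_TEXT:
            return True
        ancestor = ancestor.parent
    return False


def serialize(node: Node) -> str:
    """Serialize the children of node, not node itself."""
    s = ""
    for child in node.children:
        if child.kind == "element":
            s += "<" + child.name
            for attr_name, value in child.attrs.items():
                s += " " + attr_name + '="' + escape(value, True) + '"'
            s += ">"
            if child.name in NO_END_TAG:
                continue
            if child.name in LEADING_LF:
                s += "\n"
            s += serialize(child) + "</" + child.name + ">"
        elif child.kind == "text":
            s += child.data if has_raw_text_ancestor(child) else escape(child.data)
        elif child.kind == "comment":
            s += "<!--" + child.data + "-->"
        elif child.kind == "pi":
            s += "<?" + child.name + " " + child.data + ">"
        elif child.kind == "doctype":
            s += "<!DOCTYPE " + child.name + ">"
        else:
            raise ValueError("unexpected node type: " + child.kind)
    return s


root = Node("element", "div")
p = root.append(Node("element", "p", attrs={"title": 'a "b"'}))
p.append(Node("text", data="1 < 2 & 3"))
root.append(Node("element", "br"))
print(serialize(root))  # <p title="a &quot;b&quot;">1 &lt; 2 &amp; 3</p><br>
```

Note that the div itself does not appear in the output, since only the children of the node passed in are serialized.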
Escaping a string (for the
purposes of the algorithm above) consists of replacing any
occurrences of the "&" character by the string
"&amp;", any occurrences of the
U+00A0 NO-BREAK SPACE character by the string "&nbsp;",
and, if the algorithm was invoked in
the attribute mode, any occurrences of the
""" character by the string "&quot;",
or if it was not, any occurrences of the
"<" character by the string "&lt;"
and any occurrences of the ">" character by the string
"&gt;".
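As an illustration of the two modes, the escaping rules might be written as the following Python function. The function name and the attribute_mode parameter are assumptions made for this sketch. The ampersand is replaced first so that the ampersands introduced by the later replacements are not escaped a second time.

```python
def escape(text: str, attribute_mode: bool = False) -> str:
    # Ampersands first, so "&lt;" and friends are not double-escaped.
    text = text.replace("&", "&amp;")
    text = text.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        # Attribute values are always wrapped in double quotes by the serializer.
        text = text.replace('"', "&quot;")
    else:
        text = text.replace("<", "&lt;").replace(">", "&gt;")
    return text


print(escape('a < b & "c"'))                       # a &lt; b &amp; "c"
print(escape('a < b & "c"', attribute_mode=True))  # a < b &amp; &quot;c&quot;
```

In attribute mode the angle brackets are left alone, and outside attribute mode the quotation mark is left alone, which matches the rules as stated above.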
Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.
It is possible that the output of this algorithm, if
parsed with an HTML parser, will not return the
original tree structure. For instance, if a
textarea element to which a
Comment node has been
appended is serialized and the output is then reparsed, the comment
will end up being displayed in the text field. Similarly, if, as a
result of DOM manipulation, an element contains a comment that
contains the literal string "-->", then
when the result of serializing the element is parsed, the comment
will be truncated at that point and the rest of the comment will be
interpreted as markup. More examples would be making a
script element contain a text node with the text string
"</script>", or having a
p element that contains a
ul element (as the
ul element's start tag would imply the end
tag for the p).
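The truncated-comment case can be observed with a short Python experiment. This sketch uses html.parser.HTMLParser from the Python standard library as a stand-in for an HTML parser; it is not the parser defined by this specification, and the EventLog class and reparse function are names invented for the example.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Records comment and text events in the order they are reported."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_data(self, data):
        self.events.append(("text", data))


def reparse(markup: str):
    parser = EventLog()
    parser.feed(markup)
    parser.close()
    return parser.events


# A comment whose data contains "-->", as could arise from DOM manipulation.
comment_data = "a-->b"
serialized = "<!--" + comment_data + "-->"
events = reparse(serialized)
print(events)
```

The first event reported is a comment containing only "a". The remainder, "b-->", is no longer part of the comment and comes back as ordinary text, so the reparsed tree differs from the one that was serialized.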
The following steps form the HTML fragment parsing
algorithm. The algorithm optionally takes as input an
Element node, referred to as the context element, which gives the context for the
parser, as well as input, a string to parse, and
returns a list of zero or more nodes.
Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm (and with a context element). The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.
Create a new
Document node, and mark it as being
an HTML document.
Create a new HTML parser, and associate it with
the just created Document node.
If there is a context element, run these substeps:
Set the HTML parser's tokenization stage's content model flag according to the context element, as follows:
Let root be a new html element
with no attributes.
Append the element root to the
Document node created above.
Set up the parser's stack of open elements so that it contains just the single element root.
Reset the parser's insertion mode appropriately.
The parser will reference the context element as part of that algorithm.
Set the parser's
form element pointer
to the nearest node to the context element
that is a
form element (going straight up the
ancestor chain, and including the element itself, if it is a
form element), or, if there is no such
form element, to null.
Place into the input stream for the HTML parser just created the input. The encoding confidence is irrelevant.
Start the parser and let it run until it has consumed all the characters just inserted into the input stream.
If there is a context element, return the child nodes of root, in tree order.
Otherwise, return the children of the
Document object, in tree order.
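The substep that sets the form element pointer amounts to a walk up the ancestor chain, starting with the context element itself. A minimal Python sketch follows; the Element class here is a hypothetical stand-in with only a tag name and a parent reference, and nearest_form is a name chosen for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Element:
    """Hypothetical stand-in for a DOM element."""
    tag: str
    parent: Optional["Element"] = None


def nearest_form(context: Optional[Element]) -> Optional[Element]:
    # Start at the context element itself, then go straight up the ancestors.
    node = context
    while node is not None:
        if node.tag == "form":
            return node
        node = node.parent
    return None  # no form element found: the pointer is set to null


form = Element("form")
div = Element("div", parent=form)
field = Element("input", parent=div)

print(nearest_form(field) is form)        # True
print(nearest_form(form) is form)         # True
print(nearest_form(Element("p")) is None) # True
```

Siblings and descendants are never inspected; only the context element and its ancestors are candidates.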