The following steps form the HTML fragment serialization algorithm. The algorithm takes as input a DOM Element or Document, referred to as the node, and either returns a string or raises an exception.
This algorithm serializes the children of the node being serialized, not the node itself.
Let s be a string, and initialize it to the empty string.
For each child node of the node, in tree order, run the following steps:
Let current node be the child node being processed.
Append the appropriate string from the following list to s:
Element
Append a U+003C LESS-THAN SIGN (<) character, followed by the element's tag name. (For nodes created by the HTML parser, Document.createElement(), or Document.renameNode(), the tag name will be lowercase.)
For each attribute that the element has, append a U+0020 SPACE character, the attribute's name (which, for attributes set by the HTML parser or by Element.setAttributeNode() or Element.setAttribute(), will be lowercase), a U+003D EQUALS SIGN (=) character, a U+0022 QUOTATION MARK (") character, the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK (") character.
While the exact order of attributes is UA-defined, and may depend on factors such as the order in which the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.
Append a U+003E GREATER-THAN SIGN (>) character.
If current node is an area, base, basefont, bgsound, br, col, embed, frame, hr, img, input, link, meta, param, spacer, or wbr element, then continue on to the next child node at this point.
If current node is a pre, textarea, or listing element, append a U+000A LINE FEED (LF) character.
Append the value of running the HTML fragment serialization algorithm on current node (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN (<) character, a U+002F SOLIDUS (/) character, the element's tag name again, and finally a U+003E GREATER-THAN SIGN (>) character.
Text or CDATASection node
If one of the ancestors of current node is a style, script, xmp, iframe, noembed, noframes, noscript, or plaintext element, then append the value of current node's data DOM attribute literally.
Otherwise, append the value of current node's data DOM attribute, escaped as described below.
Comment
Append the literal string <!-- (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS), followed by the value of current node's data DOM attribute, followed by the literal string --> (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN SIGN).
ProcessingInstruction
Append the literal string <? (U+003C LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value of current node's target DOM attribute, followed by a single U+0020 SPACE character, followed by the value of current node's data DOM attribute, followed by a single U+003E GREATER-THAN SIGN character (>).
DocumentType
Append the literal string <!DOCTYPE (U+003C LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL LETTER E), followed by a space (U+0020 SPACE), followed by the value of current node's name DOM attribute, followed by the literal string > (U+003E GREATER-THAN SIGN).
Other node types (e.g. Attr) cannot occur as children of elements. If, despite this, they do occur, this algorithm must raise an INVALID_STATE_ERR exception.
The result of the algorithm is the string s.
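The serialization steps above can be sketched in Python as follows. This is a minimal, non-authoritative sketch: the node classes, `serialize_children`, `escape`, and `InvalidStateError` are hypothetical stand-ins for the DOM interfaces and the INVALID_STATE_ERR exception, `Text` is assumed to also cover CDATASection nodes, and because the sketch has no parent pointers, the caller must pass `literal_text=True` if an ancestor of the node itself is one of the literal-text elements.

```python
from dataclasses import dataclass, field
from typing import Union


# Hypothetical stand-ins for the DOM node interfaces (not a real DOM API).
@dataclass
class Text:  # also used for CDATASection nodes in this sketch
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class ProcessingInstruction:
    target: str
    data: str


@dataclass
class DocumentType:
    name: str


@dataclass
class Element:
    tag_name: str
    # A dict keeps insertion order, which gives a stable attribute order.
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class Document:
    children: list = field(default_factory=list)


class InvalidStateError(Exception):
    """Stands in for the INVALID_STATE_ERR exception."""


VOID_ELEMENTS = {
    "area", "base", "basefont", "bgsound", "br", "col", "embed", "frame",
    "hr", "img", "input", "link", "meta", "param", "spacer", "wbr",
}
LITERAL_TEXT_ELEMENTS = {
    "style", "script", "xmp", "iframe", "noembed", "noframes", "noscript",
    "plaintext",
}
NEWLINE_ELEMENTS = {"pre", "textarea", "listing"}


def escape(s: str, attribute_mode: bool = False) -> str:
    # Replace "&" first so the entities added afterwards are not escaped again.
    s = s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    s = s.replace("\u00a0", "&nbsp;")
    if attribute_mode:
        s = s.replace('"', "&quot;")
    return s


def serialize_children(node: Union[Element, Document], literal_text: bool = False) -> str:
    """Serialize the children of node (not node itself)."""
    # The node being serialized is an ancestor of each of its children.
    if isinstance(node, Element) and node.tag_name in LITERAL_TEXT_ELEMENTS:
        literal_text = True
    s = []  # the string s, accumulated as a list of parts
    for child in node.children:
        if isinstance(child, Element):
            s.append("<" + child.tag_name)
            for name, value in child.attributes.items():
                s.append(f' {name}="{escape(value, attribute_mode=True)}"')
            s.append(">")
            if child.tag_name in VOID_ELEMENTS:
                continue  # no contents and no end tag
            if child.tag_name in NEWLINE_ELEMENTS:
                s.append("\n")
            s.append(serialize_children(child, literal_text))  # recurse
            s.append(f"</{child.tag_name}>")
        elif isinstance(child, Text):
            s.append(child.data if literal_text else escape(child.data))
        elif isinstance(child, Comment):
            s.append(f"<!--{child.data}-->")
        elif isinstance(child, ProcessingInstruction):
            s.append(f"<?{child.target} {child.data}>")
        elif isinstance(child, DocumentType):
            s.append(f"<!DOCTYPE {child.name}>")
        else:
            raise InvalidStateError(f"unexpected child node: {child!r}")
    return "".join(s)
```

For example, serializing a div whose children are `Element("p", {"class": 'x"y'}, [Text("a & b")])`, `Element("br")`, and `Element("script", children=[Text("1 < 2")])` yields `<p class="x&quot;y">a &amp; b</p><br><script>1 < 2</script>`: the div itself is omitted, br gets no end tag, and the script text is copied literally.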
Escaping a string (for the purposes of the algorithm above) consists of replacing any occurrences of the "&" character by the string "&amp;", any occurrences of the "<" character by the string "&lt;", any occurrences of the ">" character by the string "&gt;", any occurrences of the U+00A0 NO-BREAK SPACE character by the string "&nbsp;", and, if the algorithm was invoked in attribute mode, any occurrences of the """ character by the string "&quot;".
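One way to implement the escaping rules is a single-pass character translation, which avoids any question of replacement order (a naive sequence of replacements that handled "&" last would turn "&lt;" into "&amp;lt;"). The sketch below is an illustration only; `escape_string` is a hypothetical name, and `str.maketrans`/`str.translate` are standard Python.

```python
# Each table maps a single character to its replacement string; translate()
# makes one pass over the input, so replacements are never re-escaped.
_TEXT_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00a0": "&nbsp;",  # U+00A0 NO-BREAK SPACE
})
_ATTRIBUTE_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "\u00a0": "&nbsp;",
    '"': "&quot;",  # only escaped in attribute mode
})


def escape_string(s: str, attribute_mode: bool = False) -> str:
    """Escape s as described above (hypothetical helper)."""
    return s.translate(_ATTRIBUTE_TABLE if attribute_mode else _TEXT_TABLE)


print(escape_string('say "hi" & <b>'))                       # say "hi" &amp; &lt;b&gt;
print(escape_string('say "hi"', attribute_mode=True))        # say &quot;hi&quot;
```

Quotation marks are left alone outside attribute mode because text content is never delimited by them.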
Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure. For instance, if a textarea element to which a Comment node has been appended is serialized and the output is then reparsed, the comment will end up being displayed in the text field. Similarly, if, as a result of DOM manipulation, an element contains a comment that contains the literal string "-->", then when the result of serializing the element is parsed, the comment will be truncated at that point and the rest of the comment will be interpreted as markup. Other examples include a script element containing a Text node whose data is the string "</script>", or a p element that contains a ul element (as the ul element's start tag would imply the end tag for the p).
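A serializer cannot fix these cases, but a caller could detect them before serializing. The following Python sketch is a hypothetical helper (`round_trip_hazards` and the node classes are assumptions, not part of any DOM API) that checks only for the four examples given in the paragraph above; it is not an exhaustive test of whether a tree will survive a round trip. The script check assumes a case-insensitive match on "</script", since HTML tag names are case-insensitive.

```python
from dataclasses import dataclass, field


# Hypothetical minimal node types for illustration.
@dataclass
class Text:
    data: str


@dataclass
class Comment:
    data: str


@dataclass
class Element:
    tag_name: str
    children: list = field(default_factory=list)


def round_trip_hazards(element: Element) -> list:
    """List the example hazards found among element's descendants."""
    hazards = []
    for child in element.children:
        if isinstance(child, Comment):
            if "-->" in child.data:
                hazards.append("comment contains '-->'")
            if element.tag_name == "textarea":
                hazards.append("comment inside textarea")
        elif isinstance(child, Text):
            if element.tag_name == "script" and "</script" in child.data.lower():
                hazards.append("script text contains '</script'")
        elif isinstance(child, Element):
            if element.tag_name == "p" and child.tag_name == "ul":
                hazards.append("ul inside p")
            hazards.extend(round_trip_hazards(child))  # check descendants
    return hazards
```

A tree built only by the HTML parser from well-formed markup normally yields an empty list; the hazards arise from direct DOM manipulation.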
The following steps form the HTML fragment parsing algorithm. The algorithm takes as input a DOM Element, referred to as the context element, which gives the context for the parser, as well as input, a string to parse, and returns a list of zero or more nodes.
Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm. The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.
Create a new Document node, and mark it as being an HTML document.
Create a new HTML parser, and associate it with the just-created Document node.
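Set the HTML parser's tokenisation stage's content model flag according to the context element, as follows:
If it is a title or textarea element: set the content model flag to the RCDATA state.
If it is a style, script, xmp, iframe, noembed, or noframes element: set the content model flag to the CDATA state.
If it is a noscript element: if scripting is enabled, set the content model flag to the CDATA state; otherwise, set it to the PCDATA state.
If it is a plaintext element: set the content model flag to the PLAINTEXT state.
Otherwise: set the content model flag to the PCDATA state.
Switch the HTML parser's tree construction stage to the main phase.
Let root be a new html element with no attributes.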
Append the element root to the Document node created above.
Set up the parser's stack of open elements so that it contains just the single element root.
Reset the parser's insertion mode appropriately.
The parser will reference the context element as part of that algorithm.
Set the parser's form element pointer to the nearest node to the context element that is a form element (going straight up the ancestor chain, and including the element itself, if it is a form element), or, if there is no such form element, to null.
Place the input into the input stream of the HTML parser just created.
Start the parser and let it run until it has consumed all the characters just inserted into the input stream.
Return all the child nodes of root, preserving the document order.
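The two setup steps that depend on the context element (choosing the initial content model flag and finding the form element pointer) can be sketched in Python. This is a hedged illustration: `Element`, `ContentModel`, `initial_content_model`, `initial_form_pointer`, and the `scripting_enabled` parameter are hypothetical names, the mapping assumes this draft's four tokeniser states (PCDATA, RCDATA, CDATA, PLAINTEXT), and the noscript rule assumes the state depends on whether scripting is enabled.

```python
from enum import Enum
from typing import Optional


class ContentModel(Enum):
    PCDATA = "PCDATA"
    RCDATA = "RCDATA"
    CDATA = "CDATA"
    PLAINTEXT = "PLAINTEXT"


class Element:
    """Hypothetical stand-in for a DOM Element with a parent pointer."""

    def __init__(self, tag_name: str, parent: Optional["Element"] = None):
        self.tag_name = tag_name
        self.parent = parent


def initial_content_model(context: Element, scripting_enabled: bool) -> ContentModel:
    """Pick the tokeniser's starting state from the context element (assumed mapping)."""
    name = context.tag_name
    if name in ("title", "textarea"):
        return ContentModel.RCDATA
    if name in ("style", "script", "xmp", "iframe", "noembed", "noframes"):
        return ContentModel.CDATA
    if name == "noscript":
        return ContentModel.CDATA if scripting_enabled else ContentModel.PCDATA
    if name == "plaintext":
        return ContentModel.PLAINTEXT
    return ContentModel.PCDATA


def initial_form_pointer(context: Element) -> Optional[Element]:
    """Walk straight up the ancestor chain, starting with context itself."""
    node: Optional[Element] = context
    while node is not None:
        if node.tag_name == "form":
            return node
        node = node.parent
    return None  # no form element: the pointer is null
```

Starting the walk at the context element itself, rather than its parent, is what makes a form context element point at itself.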