Once the user agent stops parsing the document, the user agent must run the following steps:
Set the current document readiness to "interactive" and the insertion point to undefined.
Pop all the nodes off the stack of open elements.
If the list of scripts that will execute when the document has finished parsing is not empty, run these substeps:
Spin the event loop until the first
     script in the list of scripts that will
     execute when the document has finished parsing has its
     "ready to be parser-executed" flag set and
     the parser's Document has no style sheet that
     is blocking scripts.
Execute the
     first script in the list of scripts that will
     execute when the document has finished parsing.
Remove the first script element from the
     list of scripts that will execute when the document has
     finished parsing (i.e. shift out the first entry in the
     list).
If the list of scripts that will execute when the document has finished parsing is still not empty, repeat these substeps again from substep 1.
Queue a task to fire a simple
   event that bubbles named DOMContentLoaded at the
   Document.
Spin the event loop until the set of scripts that will execute as soon as possible and the list of scripts that will execute in order as soon as possible are empty.
Spin the event loop until there is nothing that
   delays the load event in
   the Document.
Queue a task to set the current document readiness to "complete".
If the Document is in a browsing
   context, then queue a task to fire a
   simple event named load at
   the Document's Window object, but with
   its target set to the
   Document object (and the currentTarget set to the
   Window object).
If the Document is in a browsing
   context, then queue a task to fire a pageshow event at the
   Window object of the Document, but with
   its target set to the
   Document object (and the currentTarget set to the
   Window object), using the
   PageTransitionEvent interface, with the persisted
   attribute set to false. This event must not bubble, must not be
   cancelable, and has no default action.
If the Document has any pending
   application cache download process tasks, then queue each such task in the order they were added to
   the list of pending application cache download process
   tasks, and then empty the list of pending application
   cache download process tasks. The task source
   for these tasks is the
   networking task source.
The Document is now ready for post-load
   tasks.
Queue a task to mark the Document
   as completely loaded.
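The order in which the milestones above become observable can be modelled with a small synchronous sketch. It is illustrative only: the function name `stopParsingMilestones` and its string labels are hypothetical, and the model ignores the asynchronous parts (spinning the event loop, queued tasks, and application cache tasks).

```javascript
// Hypothetical model, not a browser API: returns the order in which the
// milestones of the "stop parsing" steps become observable.
function stopParsingMilestones(deferredScripts, inBrowsingContext) {
  const log = [];
  log.push("readyState = interactive");

  // Drain the list of scripts that will execute when the document has
  // finished parsing, always taking the first entry.
  const pending = deferredScripts.slice();
  while (pending.length > 0) {
    log.push("execute " + pending.shift());
  }

  log.push("DOMContentLoaded at Document (bubbles)");
  log.push("readyState = complete");

  // load and pageshow are only fired for a Document in a browsing context.
  if (inBrowsingContext) {
    log.push("load at Window (target: Document)");
    log.push("pageshow at Window (persisted = false)");
  }
  log.push("completely loaded");
  return log;
}
```

Calling `stopParsingMilestones(["a.js", "b.js"], true)` lists both script executions before the DOMContentLoaded entry, mirroring the substeps above.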
When the user agent is to abort a parser, it must run the following steps:
Throw away any pending content in the input stream, and discard any future content that would have been added to it.
Pop all the nodes off the stack of open elements.
Except where otherwise specified, the task source for the tasks mentioned in this section is the DOM manipulation task source.
When an application uses an HTML parser in
  conjunction with an XML pipeline, it is possible that the
  constructed DOM is not compatible with the XML tool chain in certain
  subtle ways. For example, an XML toolchain might not be able to
  represent attributes with the name xmlns,
  since they conflict with the Namespaces in XML syntax. There is also
  some data that the HTML parser generates that isn't
  included in the DOM itself. This section specifies some rules for
  handling these issues.
If the XML API being used doesn't support DOCTYPEs, the tool may drop DOCTYPEs altogether.
If the XML API doesn't support attributes in no namespace that
  are named "xmlns", attributes whose names
  start with "xmlns:", or attributes in the
  XMLNS namespace, then the tool may drop such
  attributes.
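As a minimal sketch of the rule above, the following filter drops the three kinds of attribute it names. The attribute representation (plain objects with `namespace`, `name` holding the qualified name, and `value`) and the function name are assumptions made for illustration, not part of any XML API.

```javascript
const XMLNS_NAMESPACE = "http://www.w3.org/2000/xmlns/";

// attrs: array of { namespace: string | null, name: string, value: string },
// a hypothetical stand-in for DOM attributes; `name` is the qualified name.
function dropXmlnsAttributes(attrs) {
  return attrs.filter((attr) => {
    const bareXmlns = attr.namespace === null && attr.name === "xmlns";
    const prefixed = attr.name.startsWith("xmlns:");
    const inXmlnsNamespace = attr.namespace === XMLNS_NAMESPACE;
    return !(bareXmlns || prefixed || inXmlnsNamespace);
  });
}
```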
The tool may annotate the output with any namespace declarations required for proper operation.
If the XML API being used restricts the allowable characters in the local names of elements and attributes, then the tool may map all element and attribute local names that the API wouldn't support to a set of names that are allowed, by replacing any character that isn't supported with the uppercase letter U and the six digits of the character's Unicode code point when expressed in hexadecimal, using digits 0-9 and capital letters A-F as the symbols, in increasing numeric order.
For example, the element name foo<bar, which can be output by the HTML
  parser, though it is neither a legal HTML element name nor a
  well-formed XML element name, would be converted into fooU00003Cbar, which is a well-formed XML
  element name (though it's still not legal in HTML by any means).
As another example, consider the attribute
  xlink:href. Used on a MathML element, it becomes, after
  being adjusted, an
  attribute with a prefix "xlink" and a local
  name "href". However, used on an HTML element,
  it becomes an attribute with no prefix and the local name "xlink:href", which is not a valid NCName, and thus
  might not be accepted by an XML API. It could thus get converted,
  becoming "xlinkU00003Ahref".
The resulting names from this conversion conveniently can't clash with any attribute generated by the HTML parser, since those are all either lowercase or those listed in the adjust foreign attributes algorithm's table.
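The mapping described above can be sketched as follows. The function name `coerceLocalName` is hypothetical, and the caller supplies the predicate that decides which characters the XML API supports. The sample predicate `asciiNameChar` is a deliberately narrow ASCII-only approximation that ignores the stricter rules for the first character of a name.

```javascript
// Replaces each unsupported character with "U" followed by its code point
// as six uppercase hexadecimal digits.
function coerceLocalName(name, isSupported) {
  let out = "";
  for (const ch of name) { // iterates by code point, not by UTF-16 unit
    if (isSupported(ch)) {
      out += ch;
    } else {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(6, "0");
      out += "U" + hex;
    }
  }
  return out;
}

// Assumption for illustration: only these ASCII characters are supported.
const asciiNameChar = (ch) => /^[A-Za-z0-9_.-]$/.test(ch);

console.log(coerceLocalName("foo<bar", asciiNameChar));    // fooU00003Cbar
console.log(coerceLocalName("xlink:href", asciiNameChar)); // xlinkU00003Ahref
```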
If the XML API restricts comments from having two consecutive U+002D HYPHEN-MINUS characters (--), the tool may insert a single U+0020 SPACE character between any such offending characters.
If the XML API restricts comments from ending in a U+002D HYPHEN-MINUS character (-), the tool may insert a single U+0020 SPACE character at the end of such comments.
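The two comment rules above could be applied together as in this sketch. The function name `makeCommentXmlSafe` is hypothetical, and the code assumes the XML API enforces both restrictions.

```javascript
// Makes comment data acceptable to an XML API that forbids "--" inside
// comments and "-" at the end of a comment.
function makeCommentXmlSafe(data) {
  // Insert a space after every hyphen that is directly followed by another
  // hyphen; the lookahead also handles longer runs such as "---".
  let out = data.replace(/-(?=-)/g, "- ");
  // A trailing hyphen would collide with the closing "-->" delimiter.
  if (out.endsWith("-")) {
    out += " ";
  }
  return out;
}

console.log(makeCommentXmlSafe("a--b")); // a- -b
```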
If the XML API restricts allowed characters in character data, attribute values, or comments, the tool may replace any U+000C FORM FEED (FF) character with a U+0020 SPACE character, and any other literal non-XML character with a U+FFFD REPLACEMENT CHARACTER.
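A sketch of the character replacement above follows. It assumes that "non-XML character" means anything outside the Char production of XML 1.0; the function names are hypothetical.

```javascript
// True if the code point matches the Char production of XML 1.0
// (assumption: the XML API in use follows XML 1.0).
function isXmlChar(cp) {
  return cp === 0x9 || cp === 0xA || cp === 0xD ||
    (cp >= 0x20 && cp <= 0xD7FF) ||
    (cp >= 0xE000 && cp <= 0xFFFD) ||
    (cp >= 0x10000 && cp <= 0x10FFFF);
}

function replaceNonXmlChars(text) {
  let out = "";
  for (const ch of text) { // iterates by code point
    const cp = ch.codePointAt(0);
    if (cp === 0xC) {
      out += " ";        // U+000C FORM FEED becomes U+0020 SPACE
    } else if (isXmlChar(cp)) {
      out += ch;
    } else {
      out += "\uFFFD";   // any other non-XML character
    }
  }
  return out;
}
```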
If the tool has no way to convey out-of-band information, then the tool may drop the following information:
The association between form controls and forms that aren't their nearest form element ancestor (use of the
   form element pointer in the parser)
The mutations allowed by this section apply
  after the HTML parser's rules have been
  applied. For example, a <a::> start tag
  will be closed by a </a::> end tag, and
  never by a </aU00003AU00003A> end tag, even
  if the user agent is using the rules above to then generate an
  actual element in the DOM with the name aU00003AU00003A for that start tag.
This section is non-normative.
This section examines some erroneous markup and discusses how the HTML parser handles these cases.
This section is non-normative.
The most-often discussed example of erroneous markup is as follows:
<p>1<b>2<i>3</b>4</i>5</p>
The parsing of this markup is straightforward up to the "3". At this point, the DOM looks like this:
Here, the stack of open elements has five elements
  on it: html, body, p,
  b, and i. The list of active
  formatting elements just has two: b and
  i. The insertion mode is "in body".
Upon receiving the end tag token with the tag name "b", the "adoption agency algorithm" is
  invoked. This is a simple case, in that the formatting
  element is the b element, and there is no
  furthest block. Thus, the stack of open
  elements ends up with just three elements: html,
  body, and p, while the list of
  active formatting elements has just one: i. The
  DOM tree is unmodified at this point.
The next token, a character ("4"), triggers the reconstruction of
  the active formatting elements, in this case just the
  i element. A new i element is thus created
  for the "4" text node. After the end tag token for the "i" is also
  received, and the "5" text node is inserted, the DOM looks as
  follows:
This section is non-normative.
A case similar to the previous one is the following:
<b>1<p>2</b>3</p>
Up to the "2" the parsing here is straightforward:
The interesting part is when the end tag token with the tag name "b" is parsed.
Before that token is seen, the stack of open
  elements has four elements on it: html,
  body, b, and p. The
  list of active formatting elements just has the one:
  b. The insertion mode is "in body".
Upon receiving the end tag token with the tag name "b", the "adoption agency algorithm" is invoked, as
  in the previous example. However, in this case, there is a
  furthest block, namely the p element. Thus,
  this time the adoption agency algorithm isn't skipped over.
The common ancestor is the body
  element. A conceptual "bookmark" marks the position of the
  b in the list of active formatting
  elements, but since that list has only one element in it,
  the bookmark won't have much effect.
As the algorithm progresses, node ends up set
  to the formatting element (b), and last
  node ends up set to the furthest block
  (p).
The last node gets appended (moved) to the common ancestor, so that the DOM looks like:
A new b element is created, and the children of the
  p element are moved to it:
b
  #text: 2
Finally, the new b element is appended to the
  p element, so that the DOM looks like:
The b element is removed from the list of
  active formatting elements and the stack of open
  elements, so that when the "3" is parsed, it is appended to
  the p element:
This section is non-normative.
Error handling in tables is, for historical reasons, especially strange. For example, consider the following markup:
<table><b><tr><td>aaa</td></tr>bbb</table>ccc
The b element start tag is not allowed
  directly inside a table like that, and the parser handles this case
  by placing the element before the table. (This is called foster parenting.) This can be seen by
  examining the DOM tree as it stands just after the
  table element's start tag has been seen:
...and then immediately after the b element start
  tag has been seen:
At this point, the stack of open elements has on it
  the elements html, body,
  table, and b (in that order, despite the
  resulting DOM tree); the list of active formatting
  elements just has the b element in it; and the
  insertion mode is "in table".
The tr start tag causes the b element
  to be popped off the stack and a tbody start tag to be
  implied; the tbody and tr elements are
  then handled in a rather straightforward manner, taking the parser
  through the "in table
  body" and "in
  row" insertion modes, after which the DOM looks as
  follows:
Here, the stack of open elements has on it the
  elements html, body, table,
  tbody, and tr; the list of active
  formatting elements still has the b element in
  it; and the insertion mode is "in row".
The td element start tag token, after putting a
  td element on the tree, puts a marker on the list
  of active formatting elements (it also switches to the "in cell" insertion
  mode).
The marker means that when the "aaa" character tokens are seen,
  no b element is created to hold the resulting text
  node:
The end tags are handled in a straightforward manner; after
  handling them, the stack of open elements has on it the
  elements html, body, table,
  and tbody; the list of active formatting
  elements still has the b element in it (the
  marker having been removed by the "td" end tag token); and the
  insertion mode is "in table body".
Thus it is that the "bbb" character tokens are found. These
  trigger the "in table
  text" insertion mode to be used (with the original
  insertion mode set to "in table body"). The character tokens are collected,
  and when the next token (the table element end tag) is
  seen, they are processed as a group. Since they are not all spaces,
  they are handled as per the "anything else" rules in the "in table" insertion mode,
  which defer to the "in
  body" insertion mode but with foster parenting.
When the
  active formatting elements are reconstructed, a
  b element is created and foster parented, and then the "bbb" text node is
  appended to it:
The stack of open elements has on it the elements
  html, body, table,
  tbody, and the new b (again, note that
  this doesn't match the resulting tree!); the list of active
  formatting elements has the new b element in it;
  and the insertion mode is still "in table body".
Had the character tokens been only space characters instead of "bbb", then those
  space characters would just be
  appended to the tbody element.
Finally, the table is closed by a "table" end
  tag. This pops all the nodes from the stack of open
  elements up to and including the table element,
  but it doesn't affect the list of active formatting
  elements, so the "ccc" character tokens after the table
  result in yet another b element being created, this
  time after the table:
This section is non-normative.
Consider the following markup, which for this example we will
  assume is the document with URL http://example.com/inner, being rendered as the
  content of an iframe in another document with the
  URL http://example.com/outer:
<div id=a>
 <script>
  var div = document.getElementById('a');
  parent.document.body.appendChild(div);
 </script>
 <script>
  alert(document.URL);
 </script>
</div>
<script>
 alert(document.URL);
</script>
  Up to the first "script" end tag, before the script is parsed, the result is relatively straightforward:
After the script is parsed, though, the div element
  and its child script element are gone:
They are, at this point, in the Document of the
  aforementioned outer browsing context. However, the
  stack of open elements still contains the
  div element.
Thus, when the second script element is parsed, it
  is inserted into the outer Document
  object.
This also means that the script's global object is
  the outer browsing context's Window
  object, not the Window object inside the
  iframe.
This isn't a security problem since the script that
  moves the div into the outer Document can
  only do so because the two Document objects have the
  same origin.
Thus, the first alert says "http://example.com/outer".
Once the div element's end tag is parsed, the
  div element is popped off the stack, and so the next
  script element is in the inner Document:
This second alert will say "http://example.com/inner".
This section is non-normative.
Elaborating on the example in the previous section, consider a
  case where a script element with a src attribute is parsed, but while
  the external script is being downloaded, the element is moved to
  another document.
In this case, the script's global object is that
  second document's browsing context's
  Window object, not the Window object of
  the document into which the element was parsed.
This section is non-normative.
The following markup shows how nested formatting elements (such
  as b) get collected and continue to be applied even as
  the elements they are contained in are closed, but that excessive
  duplicates are thrown away.
<!DOCTYPE html>
<p><b class=x><b class=x><b><b class=x><b class=x><b>X
<p>X
<p><b><b class=x><b>X
<p></b></b></b></b></b></b>X
The resulting DOM tree is as follows:
Note how the second p element in the markup has no
  explicit b elements, but in the resulting DOM, up to
  three of each kind of formatting element (in this case three
  b elements with the class attribute, and two unadorned
  b elements) get reconstructed before the element's
  "X".
Also note how this means that in the final paragraph only six
  b end tags are needed to completely clear the list of
  formatting elements, even though nine b start tags have
  been seen up to this point.
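The limit of three comes from the parser's rule for pushing onto the list of active formatting elements, which is defined elsewhere in the specification. The sketch below models that rule under stated assumptions: elements are plain objects with a `name` and an `attrs` map (namespaces are ignored), and `MARKER` and `pushFormattingElement` are hypothetical names.

```javascript
const MARKER = Symbol("marker");

function sameAttributes(a, b) {
  const keysA = Object.keys(a);
  const keysB = Object.keys(b);
  return keysA.length === keysB.length &&
    keysA.every((k) => Object.prototype.hasOwnProperty.call(b, k) && a[k] === b[k]);
}

// Pushes `element`, first discarding the earliest equivalent entry if three
// equivalent entries already exist after the last marker.
function pushFormattingElement(list, element) {
  const matches = [];
  for (let i = list.length - 1; i >= 0 && list[i] !== MARKER; i--) {
    const entry = list[i];
    if (entry.name === element.name && sameAttributes(entry.attrs, element.attrs)) {
      matches.push(i);
    }
  }
  if (matches.length >= 3) {
    // Indices were collected from the end, so the last one is the earliest entry.
    list.splice(matches[matches.length - 1], 1);
  }
  list.push(element);
  return list;
}
```

Pushing the six b elements of the first paragraph in document order leaves five entries (three with the class attribute, two without). Pushing the three from the third paragraph leaves six, which matches the six end tags noted above.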
The following steps form the HTML fragment serialization
  algorithm. The algorithm takes as input a DOM
  Element, Document, or
  DocumentFragment referred to as the
  node, and either returns a string or raises an exception.
This algorithm serializes the children of the node being serialized, not the node itself.
Let s be a string, and initialize it to the empty string.
For each child node of the node, in tree order, run the following steps:
Let current node be the child node being processed.
Append the appropriate string from the following list to s:
Element
If current node is an element in the HTML namespace, the MathML namespace, or the SVG namespace, then let tagname be current node's local name. Otherwise, let tagname be current node's qualified name.
Append a U+003C LESS-THAN SIGN character (<), followed by tagname.
For HTML elements created by the
        HTML parser or Document.createElement(), tagname will be lowercase.
For each attribute that the element has, append a U+0020 SPACE character, the attribute's serialized name as described below, a U+003D EQUALS SIGN character (=), a U+0022 QUOTATION MARK character ("), the attribute's value, escaped as described below in attribute mode, and a second U+0022 QUOTATION MARK character (").
An attribute's serialized name for the purposes of the previous paragraph must be determined as follows:
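If the attribute has no namespace:
The attribute's serialized name is the attribute's local name.
For attributes on HTML elements
          set by the HTML parser or by Element.setAttributeNode() or Element.setAttribute(), the local name will
          be lowercase.
If the attribute is in the XML namespace:
The attribute's serialized name is the string "xml:" followed by the attribute's local
         name.
If the attribute is in the XMLNS namespace and the attribute's local name is xmlns:
The attribute's serialized name is the string "xmlns".
If the attribute is in the XMLNS namespace and the attribute's local name is not xmlns:
The attribute's serialized name is the string "xmlns:" followed by the attribute's local
         name.
If the attribute is in the XLink namespace:
The attribute's serialized name is the string "xlink:" followed by the attribute's local
         name.
If the attribute is in some other namespace:
The attribute's serialized name is the attribute's qualified name.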
While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order.
Append a U+003E GREATER-THAN SIGN character (>).
If current node is an
        area, base, basefont,
        bgsound, br, col,
        command, embed, frame,
        hr, img, input,
        keygen, link, meta,
        param, source, track or
        wbr element, then continue on to the next child
        node at this point.
If current node is a pre,
        textarea, or listing element, append
        a U+000A LINE FEED (LF) character.
Append the value of running the HTML fragment serialization algorithm on the current node element (thus recursing into this algorithm for that element), followed by a U+003C LESS-THAN SIGN character (<), a U+002F SOLIDUS character (/), tagname again, and finally a U+003E GREATER-THAN SIGN character (>).
Text or CDATASection node
If the parent of current node is a
        style, script, xmp,
        iframe, noembed,
        noframes, or plaintext element, or
        if the parent of current node is a
        noscript element and scripting is enabled for the
        node, then append the value of current
        node's data IDL attribute
        literally.
Otherwise, append the value of current
        node's data IDL attribute, escaped as described
        below.
Comment
Append the literal string <!-- (U+003C
        LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+002D HYPHEN-MINUS,
        U+002D HYPHEN-MINUS), followed by the value of current node's data IDL
        attribute, followed by the literal string -->
        (U+002D HYPHEN-MINUS, U+002D HYPHEN-MINUS, U+003E GREATER-THAN
        SIGN).
ProcessingInstruction
Append the literal string <? (U+003C
        LESS-THAN SIGN, U+003F QUESTION MARK), followed by the value
        of current node's target IDL attribute, followed by a single
        U+0020 SPACE character, followed by the value of current node's data IDL
        attribute, followed by a single U+003E GREATER-THAN SIGN
        character (>).
DocumentType
Append the literal string <!DOCTYPE (U+003C
        LESS-THAN SIGN, U+0021 EXCLAMATION MARK, U+0044 LATIN CAPITAL
        LETTER D, U+004F LATIN CAPITAL LETTER O, U+0043 LATIN CAPITAL
        LETTER C, U+0054 LATIN CAPITAL LETTER T, U+0059 LATIN CAPITAL
        LETTER Y, U+0050 LATIN CAPITAL LETTER P, U+0045 LATIN CAPITAL
        LETTER E), followed by a space (U+0020 SPACE), followed by the
        value of current node's name IDL attribute, followed by the literal
        string > (U+003E GREATER-THAN SIGN).
Other node types (e.g. Attr) cannot
      occur as children of elements. If, despite this, they somehow do
      occur, this algorithm must raise an
      INVALID_STATE_ERR exception.
The result of the algorithm is the string s.
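The core of the algorithm above can be sketched over plain objects rather than real DOM nodes. This is a simplified illustration, not a complete implementation: the node shape (`type`, `name`, `attrs`, `children`, `data`) and function names are assumptions, `name` is taken to be the already-computed tagname, attribute names are taken to be already-serialized names, and ProcessingInstruction and DocumentType nodes are left out.

```javascript
const VOID_ELEMENTS = new Set(["area", "base", "basefont", "bgsound", "br",
  "col", "command", "embed", "frame", "hr", "img", "input", "keygen", "link",
  "meta", "param", "source", "track", "wbr"]);
const LITERAL_TEXT_PARENTS = new Set(["style", "script", "xmp", "iframe",
  "noembed", "noframes", "plaintext"]);

function escapeForSerialization(s, attributeMode) {
  s = s.replace(/&/g, "&amp;").replace(/\u00A0/g, "&nbsp;");
  return attributeMode
    ? s.replace(/"/g, "&quot;")
    : s.replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Serializes the children of `node`, not `node` itself.
function serializeChildren(node, scriptingEnabled = true) {
  let s = "";
  for (const child of node.children) {
    if (child.type === "element") {
      s += "<" + child.name;
      for (const [name, value] of child.attrs) {
        s += " " + name + '="' + escapeForSerialization(value, true) + '"';
      }
      s += ">";
      if (VOID_ELEMENTS.has(child.name)) continue; // no content, no end tag
      if (["pre", "textarea", "listing"].includes(child.name)) s += "\n";
      s += serializeChildren(child, scriptingEnabled) + "</" + child.name + ">";
    } else if (child.type === "text") {
      const literal = LITERAL_TEXT_PARENTS.has(node.name) ||
        (node.name === "noscript" && scriptingEnabled);
      s += literal ? child.data : escapeForSerialization(child.data, false);
    } else if (child.type === "comment") {
      s += "<!--" + child.data + "-->";
    } else {
      throw new Error("INVALID_STATE_ERR"); // unexpected node type
    }
  }
  return s;
}
```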
Entity reference nodes are assumed to be expanded by the user agent, and are therefore not covered in the algorithm above.
It is possible that the output of this algorithm, if parsed with an HTML parser, will not return the original tree structure.
For instance, if a textarea element to which a
   Comment node has been appended is serialized
   and the output is then reparsed, the comment will end up being
   displayed in the text field. Similarly, if, as a result of DOM
   manipulation, an element contains a comment that contains the
   literal string "-->", then when the result
   of serializing the element is parsed, the comment will be truncated
   at that point and the rest of the comment will be interpreted as
   markup. More examples would be making a script element
   contain a text node with the text string
   "</script>", or having a p element
   that contains a ul element (as the ul
   element's start tag would
   imply the end tag for the p).
This can enable cross-site scripting attacks. An example of this
   would be a page that lets the user enter some font names that are
   then inserted into a CSS style block via the DOM and
   which then uses the innerHTML
   IDL attribute to get the HTML serialization of that
   style element: if the user enters
   "</style><script>attack</script>" as a font
   name, innerHTML will return
   markup that, if parsed in a different context, would contain a
   script node, even though no script node
   existed in the original DOM.
Escaping a string (for the purposes of the algorithm above) consists of running the following steps:
Replace any occurrence of the "&"
   character by the string "&amp;".
Replace any occurrences of the U+00A0 NO-BREAK SPACE
   character by the string "&nbsp;".
If the algorithm was invoked in the attribute mode,
   replace any occurrences of the U+0022 QUOTATION MARK
   character (") by the string "&quot;".
If the algorithm was not invoked in the
   attribute mode, replace any occurrences of the "<" character by the string "&lt;", and any occurrences of the ">" character by the string "&gt;".
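The escaping steps can be sketched as a single function. The name `escapeString` and the boolean parameter are hypothetical. The ampersand replacement runs first, as in the steps, so that the ampersands introduced by the later replacements are not escaped a second time.

```javascript
function escapeString(value, attributeMode = false) {
  let out = value
    .replace(/&/g, "&amp;")        // first, so later output is not re-escaped
    .replace(/\u00A0/g, "&nbsp;"); // U+00A0 NO-BREAK SPACE
  if (attributeMode) {
    out = out.replace(/"/g, "&quot;");
  } else {
    out = out.replace(/</g, "&lt;").replace(/>/g, "&gt;");
  }
  return out;
}

console.log(escapeString("a < b & c"));         // a &lt; b &amp; c
console.log(escapeString('say "hi" <b>', true)); // say &quot;hi&quot; <b>
```

Note that "<" and ">" are left alone in attribute mode, and quotation marks are left alone outside it.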
The following steps form the HTML fragment parsing
  algorithm. The algorithm optionally takes as input an
  Element node, referred to as the context element, which gives the context for the
  parser, as well as input, a string to parse, and
  returns a list of zero or more nodes.
Parts marked fragment case in algorithms in the parser section are parts that only occur if the parser was created for the purposes of this algorithm (and with a context element). The algorithms have been annotated with such markings for informational purposes only; such markings have no normative weight. If it is possible for a condition described as a fragment case to occur even when the parser wasn't created for the purposes of handling this algorithm, then that is an error in the specification.
Create a new Document node, and mark it as being
    an HTML document.
If there is a context element, and the
    Document of the context element
    is in quirks mode, then let the Document
    be in quirks mode. Otherwise, if there is a context element, and the Document of
    the context element is in limited-quirks
    mode, then let the Document be in
    limited-quirks mode. Otherwise, leave the
    Document in no-quirks mode.
Create a new HTML parser, and associate it with
    the just created Document node.
If there is a context element, run these substeps:
Set the state of the HTML parser's tokenization stage as follows:
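A title or textarea element: Switch the tokenizer to the RCDATA state.
A style, xmp, iframe, noembed, or noframes element: Switch the tokenizer to the RAWTEXT state.
A script element: Switch the tokenizer to the script data state.
A noscript element: If the scripting flag is enabled, switch the tokenizer to the RAWTEXT state. Otherwise, leave the tokenizer in the data state.
A plaintext element: Switch the tokenizer to the PLAINTEXT state.
Otherwise: Leave the tokenizer in the data state.
For performance reasons, an implementation that does not report errors and that uses the actual state machine described in this specification directly could use the PLAINTEXT state instead of the RAWTEXT and script data states where those are mentioned in the list above. Except for rules regarding parse errors, they are equivalent, since there is no appropriate end tag token in the fragment case, yet they involve far fewer state transitions.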
Let root be a new html element
      with no attributes.
Append the element root to the
      Document node created above.
Set up the parser's stack of open elements so that it contains just the single element root.
Reset the parser's insertion mode appropriately.
The parser will reference the context element as part of that algorithm.
Set the parser's form element pointer
      to the nearest node to the context element
      that is a form element (going straight up the
      ancestor chain, and including the element itself, if it is a
      form element), or, if there is no such
      form element, to null.
Place into the input stream for the HTML parser just created the input. The encoding confidence is irrelevant.
Start the parser and let it run until it has consumed all the characters just inserted into the input stream.
If there is a context element, return the child nodes of root, in tree order.
Otherwise, return the children of the Document
    object, in tree order.
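The choice of initial tokenizer state for a given context element can be sketched as a lookup. The function name is hypothetical, and the state names are the ones used by the tokenization stage of the parser; treat the mapping as an illustration of the substep, not as normative text.

```javascript
// contextName: lowercase local name of the context element.
// scriptingEnabled: whether the parser's scripting flag is enabled.
function initialTokenizerState(contextName, scriptingEnabled) {
  switch (contextName) {
    case "title":
    case "textarea":
      return "RCDATA";
    case "style":
    case "xmp":
    case "iframe":
    case "noembed":
    case "noframes":
      return "RAWTEXT";
    case "script":
      return "script data";
    case "noscript":
      return scriptingEnabled ? "RAWTEXT" : "data";
    case "plaintext":
      return "PLAINTEXT";
    default:
      return "data"; // every other context element
  }
}
```

For example, parsing a fragment with a textarea as the context element starts in the RCDATA state, so markup-like text in the input becomes character data.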