This section describes the HTML syntax in detail. In places, it also notes differences between the HTML syntax and the XML syntax, but it does not describe the XML syntax in detail (the XML syntax is instead defined by rules in the XML specification [XML] and in the Namespaces in XML 1.0 specification [XMLNS]).
This section is divided into the following parts:
A doctype (sometimes capitalized as “DOCTYPE”) is a special instruction which, for legacy reasons that have to do with processing modes in browsers, is a required part of any document in the HTML syntax; it must match the characteristics of one of the following three formats: a normal doctype, a deprecated doctype, or a legacy-tool-compatible doctype.
A normal doctype consists of the following parts, in exactly the following order:
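1. Any case-insensitive match for the string "<!DOCTYPE".
2. One or more space characters.
3. Any case-insensitive match for the string "HTML".
4. Optionally, one or more space characters.
5. A ">" character.
The following is an example of a conformant normal doctype.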
<!DOCTYPE html>
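As a rough illustration only, the parts of a normal doctype (a case-insensitive "<!DOCTYPE", one or more space characters, a case-insensitive "HTML", optional space characters, and a ">" character) can be checked with a regular expression. The following Python sketch is not part of this reference: the function name is_normal_doctype is hypothetical, and the set of space characters (space, tab, line feed, form feed, carriage return) is an assumption based on the usual HTML definition.

```python
import re

# Assumed HTML space characters: space, tab, line feed, form feed, carriage return.
_SPACE = r"[ \t\n\f\r]"

# "<!DOCTYPE", one or more spaces, "HTML", optional spaces, ">" (case-insensitive).
_NORMAL_DOCTYPE = re.compile(rf"<!DOCTYPE{_SPACE}+HTML{_SPACE}*>", re.IGNORECASE)


def is_normal_doctype(text: str) -> bool:
    """Return True if text is exactly one normal doctype (hypothetical helper)."""
    return _NORMAL_DOCTYPE.fullmatch(text) is not None
```

Under these assumptions, is_normal_doctype("<!DOCTYPE html>") and is_normal_doctype("<!doctype HTML >") both hold, while a string with no space between the two keywords is rejected.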
A deprecated doctype consists of the following parts, in exactly the following order:
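1. Any case-insensitive match for the string "<!DOCTYPE".
2. One or more space characters.
3. Any case-insensitive match for the string "HTML".
4. One or more space characters.
5. Any case-insensitive match for the string "PUBLIC".
6. One or more space characters.
7. A quote mark (public ID), consisting of either a """ (U+0022 QUOTATION MARK) character or a "'" (U+0027 APOSTROPHE) character.
8. The public ID part of a permitted-public-ID-system-ID-combination.
9. A matching quote mark (public ID), identical to the quote mark (public ID) used earlier (either a """ character or a "'" character).
10. Optionally, a system ID part, which consists of the following parts, in exactly the following order:
    1. One or more space characters.
    2. A quote mark (system ID), consisting of either a """ character or a "'" character.
    3. The system ID part of a permitted-public-ID-system-ID-combination.
    4. A matching quote mark (system ID), identical to the quote mark (system ID) used earlier (either a """ character or a "'" character).
11. Optionally, one or more space characters.
12. A ">" character.
A permitted-public-ID-system-ID-combination is any combination of a public ID (the first quoted string in the doctype) and system ID (the second quoted string, if any, in the doctype) such that the combination corresponds to one of the six deprecated doctypes in the following list of deprecated doctypes: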
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
A legacy-tool-compatible doctype consists of the following parts, in exactly the following order:
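1. Any case-insensitive match for the string "<!DOCTYPE".
2. One or more space characters.
3. Any case-insensitive match for the string "HTML".
4. One or more space characters.
5. Any case-insensitive match for the string "SYSTEM".
6. One or more space characters.
7. A quote mark, consisting of either a """ (U+0022 QUOTATION MARK) character or a "'" (U+0027 APOSTROPHE) character.
8. The literal string "about:legacy-compat".
9. A matching quote mark, identical to the quote mark used earlier (either a """ character or a "'" character).
10. Optionally, one or more space characters.
11. A ">" character.
The following is an example of a conformant legacy-tool-compatible doctype.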
<!doctype HTML system "about:legacy-compat">
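The same idea can be sketched for the legacy-tool-compatible doctype. The Python pattern below is an illustration, not a normative definition: it assumes that the three keywords match case-insensitively, that the string about:legacy-compat is matched literally (case-sensitively), that the closing quote mark must be identical to the opening one, and that the space characters are space, tab, line feed, form feed, and carriage return. The name is_legacy_compat_doctype is hypothetical.

```python
import re

# Assumed HTML space characters: space, tab, line feed, form feed, carriage return.
_SPACE = r"[ \t\n\f\r]"
# Either quote mark; the named group lets the closing quote mirror the opening one.
_QUOTE = "[\"']"

_LEGACY_COMPAT = re.compile(
    rf"(?i:<!DOCTYPE){_SPACE}+(?i:HTML){_SPACE}+(?i:SYSTEM){_SPACE}+"
    rf"(?P<quote>{_QUOTE})about:legacy-compat(?P=quote)"
    rf"{_SPACE}*>"
)


def is_legacy_compat_doctype(text: str) -> bool:
    """Return True if text is exactly one legacy-tool-compatible doctype."""
    return _LEGACY_COMPAT.fullmatch(text) is not None
```

The scoped (?i:...) groups keep the keywords case-insensitive without relaxing the quoted string, and the (?P=quote) backreference rejects mismatched quote marks.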
A character encoding declaration is a mechanism for specifying the character encoding used to store or transmit a document.
The following restrictions apply to character encoding declarations:
If the document does not start with a U+FEFF BYTE ORDER MARK (BOM) character, and if its encoding is not explicitly given by a Content-Type HTTP header, then the character encoding used must be an ASCII-compatible character encoding, and, in addition, if that encoding isn't US-ASCII itself, then the encoding must be specified using a meta element with a charset attribute or a meta element in the encoding declaration state.
If the document contains a meta element with a charset attribute or a meta element in the encoding declaration state, then the character encoding used must be an ASCII-compatible character encoding.
An ASCII-compatible character encoding is one that is a superset of US-ASCII (specifically, ANSI_X3.4-1968) for bytes in the set 0x09, 0x0A, 0x0C, 0x0D, 0x20 - 0x22, 0x26, 0x27, 0x2C - 0x3F, 0x41 - 0x5A, and 0x61 - 0x7A.
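The byte set in this definition can be enumerated and used for a rough compatibility check. The Python sketch below is a heuristic under stated assumptions, not a conformance test: it decodes each listed byte on its own with a Python codec and compares the result with the corresponding US-ASCII character, which does not account for stateful or multi-byte behaviour. The function names are hypothetical.

```python
def ascii_compatible_bytes() -> list[int]:
    """Byte values named in the definition of an ASCII-compatible encoding."""
    single = [0x09, 0x0A, 0x0C, 0x0D, 0x26, 0x27]
    ranges = [(0x20, 0x22), (0x2C, 0x3F), (0x41, 0x5A), (0x61, 0x7A)]
    values = set(single)
    for low, high in ranges:
        values.update(range(low, high + 1))  # ranges are inclusive
    return sorted(values)


def looks_ascii_compatible(encoding: str) -> bool:
    """Heuristic: every listed byte, decoded alone, yields the same ASCII character."""
    for value in ascii_compatible_bytes():
        try:
            decoded = bytes([value]).decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Undecodable byte or unknown codec name: treat as not compatible.
            return False
        if decoded != chr(value):
            return False
    return True
```

With this check, codecs such as "utf-8" and "latin-1" pass, while "utf-16" and the EBCDIC codec "cp037" do not.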
Documents should not use UTF-32, JIS_C6226-1983, JIS_X0212-1990, HZ-GB-2312, JOHAB (Windows code page 1361), encodings based on ISO-2022, or encodings based on EBCDIC.
Documents must not use CESU-8, UTF-7, BOCU-1, or SCSU encodings.
In a document in the XML syntax, the XML declaration, as defined in the XML specification [XML], should be used to provide character-encoding information, if necessary.
An element’s content model defines the element’s structure: what contents (if any) the element can contain, as well as what attributes (if any) the element can have. The HTML elements section of this reference describes the content models for all of the elements that are part of the HTML language. An element must not contain contents or attributes that are not part of its content model.
The contents of an element are any elements, character data, and comments that it contains. Attributes and their values are not considered to be the “contents” of an element.
A void element is an element whose content model never allows it to have contents under any circumstances. Void elements can have attributes.
The following is a complete list of the void elements in HTML:
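The following list describes syntax rules for the HTML syntax. Rules for the XML syntax are defined in the XML specification [XML].
Tag names are used within element start tags and end tags to give the element's name. HTML elements all have names that use only characters in the ranges 0–9, a–z, and A–Z.
The start tag of an element consists of the following parts, in exactly the following order:
1. A "<" character.
2. The element's tag name.
3. Optionally, one or more attributes, each of which must be preceded by one or more space characters.
4. Optionally, one or more space characters.
5. Optionally, a "/" character, which may be present only if the element is a void element.
6. A ">" character.
The end tag of an element consists of the following parts, in exactly the following order:
1. A "<" character.
2. A "/" character.
3. The element's tag name.
4. Optionally, one or more space characters.
5. A ">" character.
If an element has both a start tag and an end tag, its end tag must be contained within the contents of the same element in which its start tag is contained. An end tag that is not contained within the same contents as its start tag is said to be a misnested tag.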
In the following example, the "</i>" end tag is a misnested tag, because it is not contained within the contents of the b element that contains its corresponding "<i>" start tag.
<b>foo <i>bar</b> baz</i>
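One way to detect misnested tags in the sense defined above is to track open elements on a stack. The Python sketch below is a simplified illustration, not a conforming parser: the tag-matching regular expression ignores comments, doctypes, and ">" characters inside attribute values, and the name find_misnested_end_tags is hypothetical. When an element is closed while elements opened inside it are still open, those inner elements are remembered, and a later end tag for one of them is reported as misnested.

```python
import re

# Simplified tag matcher: optional "/", an alphanumeric tag name, anything up to ">".
_TAG = re.compile(r"<(/?)([0-9A-Za-z]+)[^>]*>")


def find_misnested_end_tags(markup: str) -> list[str]:
    """Return end tags that appear outside the element containing their start tag."""
    open_elements: list[str] = []  # elements whose end tag has not been seen yet
    orphaned: list[str] = []       # elements whose parent was closed before them
    misnested: list[str] = []
    for match in _TAG.finditer(markup):
        is_end_tag = match.group(1) == "/"
        name = match.group(2).lower()
        if not is_end_tag:
            open_elements.append(name)
        elif name in open_elements:
            # Close this element; anything opened after it is still unclosed.
            while True:
                top = open_elements.pop()
                if top == name:
                    break
                orphaned.append(top)
        elif name in orphaned:
            # The start tag was inside an element that has already been closed.
            orphaned.remove(name)
            misnested.append(f"</{name}>")
    return misnested
```

Applied to the example above, the sketch reports "</i>"; applied to the properly nested <b>foo <i>bar</i> baz</b>, it reports nothing.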
Attributes for an element are expressed inside the element’s start tag. Attributes have a name and a value.
There must never be two or more attributes on the same start tag whose names are a case-insensitive match for each other.
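A minimal sketch of this check in Python, assuming the attribute names of one start tag have already been extracted into a list (the function name duplicate_attribute_names is hypothetical, and str.lower() is used as an approximation of the case-insensitive match):

```python
from collections import Counter


def duplicate_attribute_names(names: list[str]) -> list[str]:
    """Return the lowercased names that occur more than once, ignoring case."""
    counts = Counter(name.lower() for name in names)
    return sorted(name for name, count in counts.items() if count > 1)
```

For example, a start tag carrying both class and CLASS would yield ["class"], indicating a violation of the rule.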
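The following list describes syntax rules for attributes in documents in the HTML syntax. Syntax rules for attributes in documents in the XML syntax are defined in the XML specification [XML].
Attribute names must consist of one or more characters other than the space characters, U+0000 NULL, """, "'", ">", "/", "=", the control characters, and any characters that are not defined by Unicode.
XML-compatible attribute names are those that match the Name production defined in the XML specification [XML] and that contain no ":" characters, and whose first three characters are not a case-insensitive match for the string "xml".
In the HTML syntax, attributes can be specified in four different ways: with the empty attribute syntax, with the unquoted attribute value syntax, with the single-quoted attribute value syntax, or with the double-quoted attribute value syntax.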
Certain attributes may be specified by providing just the attribute name, with no value.
In the following example, the disabled attribute is given with the empty attribute syntax:
<input disabled>
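An unquoted attribute value is specified by providing the following parts in exactly the following order:
1. An attribute name.
2. Zero or more space characters.
3. A single "=" character.
4. Zero or more space characters.
5. An attribute value.
In addition to the general requirements given above for attribute values, an unquoted attribute value has the following restrictions:
Must not contain any literal space characters.
Must not contain any """, "'", ">", or "=" characters.
Must not be the empty string.
In the following example, the value attribute is given with the unquoted attribute value syntax: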
<input value=yes>
If the value of an attribute using the unquoted attribute syntax is followed by a "/" character, then there must be at least one space character after the value and before the "/" character.
A single-quoted attribute value is specified by providing the following parts in exactly the following order:
1. An attribute name.
2. Zero or more space characters.
3. A single "=" character.
4. Zero or more space characters.
5. A single "'" character.
6. An attribute value.
7. A "'" character.
In addition to the general requirements given above for attribute values, a single-quoted attribute value has the following restriction:
Must not contain any literal "'" characters.
In the following example, the type attribute is given with the single-quoted attribute value syntax:
<input type='checkbox'>
A double-quoted attribute value is specified by providing the following parts in exactly the following order:
1. An attribute name.
2. Zero or more space characters.
3. A single "=" character.
4. Zero or more space characters.
5. A single """ character.
6. An attribute value.
7. A """ character.
In addition to the general requirements given above for attribute values, a double-quoted attribute value has the following restriction:
Must not contain any literal """ characters.
In the following example, the title attribute is given with the double-quoted attribute value syntax:
<code title="U+003C LESS-THAN SIGN">&lt;</code>
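The four ways of specifying an attribute can be told apart mechanically. The following Python sketch classifies the source text of a single attribute; it is an illustration under simplifying assumptions (it does not check for control characters, U+0000, or undefined Unicode characters in names, and it does not validate character references in values), and the name attribute_syntax is hypothetical.

```python
import re

# Attribute name: one or more characters other than space characters, quotes, ">", "/", "=".
_NAME = r'''[^ \t\n\f\r"'>/=]+'''
# Zero or more (assumed) HTML space characters.
_WS = r"[ \t\n\f\r]*"
# Unquoted value: non-empty, no space characters, quotes, ">" or "=".
_UNQUOTED_VALUE = r'''[^ \t\n\f\r"'>=]+'''

_FORMS = [
    ("empty", re.compile(_NAME)),
    ("unquoted", re.compile(rf"{_NAME}{_WS}={_WS}{_UNQUOTED_VALUE}")),
    ("single-quoted", re.compile(rf"{_NAME}{_WS}={_WS}'[^']*'")),
    ("double-quoted", re.compile(rf'{_NAME}{_WS}={_WS}"[^"]*"')),
]


def attribute_syntax(source: str) -> str:
    """Return which attribute syntax the source text uses, or "invalid"."""
    for label, pattern in _FORMS:
        if pattern.fullmatch(source):
            return label
    return "invalid"
```

For the examples in this section, attribute_syntax("disabled") gives "empty", attribute_syntax("value=yes") gives "unquoted", and attribute_syntax("type='checkbox'") gives "single-quoted".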
Text in element contents (including in comments) and attribute values must consist of Unicode characters, with the following restrictions:
Character data contains text, in some cases in combination with character references, along with certain additional restrictions. There are three types of character data that can occur in documents: normal character data, replaceable character data, and non-replaceable character data.
Certain elements and strings in the values of particular attributes contain normal character data. Normal character data can contain the following:
Normal character data has the following restrictions:
Must not contain any "<" characters.
In documents in the HTML syntax, the title and textarea elements can contain replaceable character data. Replaceable character data can contain the following:
Text, optionally including "<" characters.
Character references.
Replaceable character data has the following restrictions:
Must not contain any occurrences of the string "</" followed by characters that are a case-insensitive match for the tag name of the element containing the replaceable character data (for example, "</title" or "</textarea"), followed by a space character, ">", or "/".
Replaceable character data, as described in this reference, is a feature of the HTML syntax that is not available in the XML syntax. Documents in the XML syntax must not contain replaceable character data as described in this reference; instead they must conform to all syntax constraints described in the XML specification [XML].
In documents in the HTML syntax, the script and style elements can contain non-replaceable character data. Non-replaceable character data can contain the following:
Text, optionally including "<" characters.
Non-replaceable character data has the following restrictions:
Must not contain any occurrences of the string "</" followed by characters that are a case-insensitive match for the tag name of the element containing the non-replaceable character data (for example, "</script" or "</style"), followed by a space character, ">", or "/".
Non-replaceable character data, as described in this reference, is a feature of the HTML syntax that is not available in the XML syntax. Documents in the XML syntax must not contain non-replaceable character data as described in this reference; instead they must conform to all syntax constraints defined in the XML specification [XML].
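Character references are a form of markup for representing individual characters. There are three types of character references: named character references, decimal numeric character references, and hexadecimal numeric character references.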
Named character references consist of the following parts in exactly the following order:
1. An "&" character.
2. One of the defined character reference names.
3. A ";" character.
The following is an example of a named character reference for the character "†" (U+2020 DAGGER).
&dagger;
Decimal numeric character references consist of the following parts, in exactly the following order:
1. An "&" character.
2. A "#" character.
3. One or more digits in the range 0–9, representing a base-ten integer that itself is a Unicode code point that is not U+0000, U+000D, in the range U+0080–U+009F, or in the range 0xD800–0xDFFF (surrogates).
4. A ";" character.
The following is an example of a decimal numeric character reference for the character "†" (U+2020 DAGGER).
&#8224;
Hexadecimal numeric character references consist of the following parts, in exactly the following order:
1. An "&" character.
2. A "#" character.
3. Either an "x" character or an "X" character.
4. One or more digits in the range 0–9, a–f, and A–F, representing a base-sixteen integer that itself is a Unicode code point that is not U+0000, U+000D, in the range U+0080–U+009F, or in the range 0xD800–0xDFFF (surrogates).
5. A ";" character.
The following is an example of a hexadecimal numeric character reference for the character "†" (U+2020 DAGGER).
&#x2020;
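The equivalence of the three forms can be checked with Python's standard html module, whose html.unescape function decodes named, decimal, and hexadecimal character references. This is only a convenience illustration using one existing library; it is not the parsing algorithm of any particular browser.

```python
import html

# Named, decimal, and hexadecimal references for U+2020 DAGGER.
references = ["&dagger;", "&#8224;", "&#x2020;", "&#X2020;"]

# Decode each reference to the character it represents.
decoded = [html.unescape(reference) for reference in references]

# The decimal and hexadecimal forms name the same code point.
same_code_point = int("2020", 16) == 8224
```

All four references decode to the same single character, U+2020, and the decimal value 8224 is the base-ten form of hexadecimal 2020.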
Character references are not themselves text, and no part of a character reference is text.
An ambiguous ampersand is an "&" character that is followed by some text other than a space character, a "<" character, or another "&" character.
SVG and MathML elements are elements from the SVG and MathML namespaces. SVG and MathML elements can be used both in documents in the HTML syntax and in documents in the XML syntax. Syntax rules for SVG and MathML elements in documents in the XML syntax are defined in the XML specification [XML]. The following list describes additional syntax rules that specifically apply to SVG and MathML elements in documents in the HTML syntax.
SVG and MathML elements whose start tags have a single "/" character before the closing ">" character are said to be marked as self-closing.
CDATA sections in SVG and MathML contents in documents in the HTML syntax consist of the following parts, in exactly the following order:
1. The string "<![CDATA[".
2. Optional text, which must not contain the string "]]>".
3. The string "]]>".
CDATA sections are allowed only in the contents of elements from the SVG and MathML namespaces.
The following shows an example of a CDATA section.
<annotation encoding="text/latex">
  <![CDATA[\documentclass{article}
  \begin{document}
  \title{E}
  \maketitle
  The base of the natural logarithms, approximately 2.71828.
  \end{document}]]>
</annotation>
   
4.07. Comments
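Comments consist of the following parts, in exactly the following order:
1. The string "<!--".
2. Optional text, subject to the restrictions listed below.
3. The string "-->".
The text part of comments has the following restrictions:
Must not start with a ">" character.
Must not start with the string "->".
Must not contain the string "--".
Must not end with a "-" character.
The following is an example of a comment.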