Table of contents
      1. 3.2.5 Content models
        1. 3.2.5.1 Kinds of content
          1. 3.2.5.1.1 Metadata content
          2. 3.2.5.1.2 Flow content
          3. 3.2.5.1.3 Sectioning content
          4. 3.2.5.1.4 Heading content
          5. 3.2.5.1.5 Phrasing content
          6. 3.2.5.1.6 Embedded content
          7. 3.2.5.1.7 Interactive content
        2. 3.2.5.2 Transparent content models
        3. 3.2.5.3 Paragraphs
      2. 3.2.6 Requirements relating to bidirectional-algorithm formatting characters
      3. 3.2.7 Annotations for assistive technology products (ARIA)

3.2.5 Content models

Each element defined in this specification has a content model: a description of the element's expected contents. An HTML element must have contents that match the requirements described in the element's content model.

As noted in the conformance and terminology sections, for the purposes of determining if an element matches its content model or not, CDATASection nodes in the DOM are treated as equivalent to Text nodes, and entity reference nodes are treated as if they were expanded in place.

The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as text nodes in the DOM. Empty text nodes and text nodes consisting of just sequences of those characters are considered inter-element whitespace.

Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element's contents match the element's content model or not, and must be ignored when following algorithms that define document and element semantics.

Thus, an element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or text nodes (other than inter-element whitespace) between them. Similarly, a node is the only child of an element if that element contains no nodes other than that node, inter-element whitespace, comment nodes, and processing instruction nodes.
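
The following non-normative sketch illustrates how inter-element whitespace, comments, and processing instructions might be skipped when deciding whether a node is the only child of an element. It does not use the DOM: nodes are modeled as plain objects with hypothetical type, data, and children fields, and the set of five space characters (space, tab, line feed, form feed, carriage return) is assumed from the specification's definition of space characters elsewhere.

```javascript
// Assumption: the five HTML space characters (space, tab, LF, FF, CR).
const SPACE_ONLY = /^[ \t\n\f\r]*$/;

// True for a text node that is empty or holds only space characters.
function isInterElementWhitespace(node) {
  return node.type === "text" && SPACE_ONLY.test(node.data);
}

// Nodes that are ignored when matching contents against a content model.
function isIgnorable(node) {
  return (
    isInterElementWhitespace(node) ||
    node.type === "comment" ||
    node.type === "processing-instruction"
  );
}

// Children that count toward the content model.
function significantChildren(parent) {
  return parent.children.filter((child) => !isIgnorable(child));
}

// A node is the only child if no other significant node shares its parent.
function isOnlyChild(node, parent) {
  const rest = significantChildren(parent);
  return rest.length === 1 && rest[0] === node;
}

const li = { type: "element", name: "li", children: [] };
const ul = {
  type: "element",
  name: "ul",
  children: [
    { type: "text", data: "\n  " },
    { type: "comment", data: " first item " },
    li,
    { type: "text", data: "\n" },
  ],
};

console.log(isOnlyChild(li, ul)); // true
```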

Authors must not use HTML elements anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.

For example, the Atom specification defines a content element. When its type attribute has the value xhtml, the Atom specification requires that it contain a single HTML div element. Thus, a div element is allowed in that context, even though this is not explicitly normatively stated by this specification. [ATOM]

In addition, HTML elements may be orphan nodes (i.e. without a parent node).

For example, creating a td element and storing it in a global variable in a script is conforming, even though td elements are otherwise only supposed to be used inside tr elements.

var data = {
  name: "Banana",
  cell: document.createElement('td'),
};

3.2.5.1 Kinds of content

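Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following broad categories are used in this specification: metadata content, flow content, sectioning content, heading content, phrasing content, embedded content, and interactive content.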

Some elements also fall into other categories, which are defined in other parts of this specification.

These categories are related as follows:

Sectioning content, heading content, phrasing content, and embedded content are all types of flow content. Embedded content is also a type of phrasing content.
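
As a non-normative illustration, the relationships just stated can be written down as a small lookup table. The table and the function name below are hypothetical and cover only the relationships named in this section.

```javascript
// Direct "is a type of" relationships stated in this section.
const PARENT_CATEGORIES = {
  sectioning: ["flow"],
  heading: ["flow"],
  phrasing: ["flow"],
  embedded: ["flow", "phrasing"],
};

// Every category implied by membership in the given category.
function impliedCategories(category) {
  const seen = new Set([category]);
  const queue = [category];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of PARENT_CATEGORIES[current] || []) {
      if (!seen.has(parent)) {
        seen.add(parent);
        queue.push(parent);
      }
    }
  }
  return [...seen];
}

console.log(impliedCategories("embedded").join(", ")); // embedded, flow, phrasing
```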

In addition, certain elements are categorized as form-associated elements and further subcategorized to define their role in various form-related processing models.

Some elements have unique requirements and do not fit into any particular category.

3.2.5.1.1 Metadata content

Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.

Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.

Thus, in the XML serialization, one can use RDF, like this:

<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <head>
  <title>Hedral's Home Page</title>
  <r:RDF>
   <Person xmlns="http://www.w3.org/2000/10/swap/pim/contact#"
           r:about="http://hedral.example.com/#">
    <fullName>Cat Hedral</fullName>
    <mailbox r:resource="mailto:hedral@damowmow.com"/>
    <personalTitle>Sir</personalTitle>
   </Person>
  </r:RDF>
 </head>
 <body>
  <h1>My home page</h1>
  <p>I like playing with string, I guess. Sister says squirrels are fun
  too so sometimes I follow her to play with them.</p>
 </body>
</html>

This isn't possible in the HTML serialization, however.

3.2.5.1.2 Flow content

Most elements that are used in the body of documents and applications are categorized as flow content.

As a general rule, elements whose content model allows any flow content should have either at least one descendant text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, del elements and their descendants must not be counted as contributing to the ancestors of the del element.

This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.
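
A conformance checker might approximate this guideline as in the following non-normative sketch. Nodes are plain objects rather than DOM nodes, and the list of embedded content element names is an assumption made for the example; it is not defined in this section.

```javascript
// Assumption: element names treated as embedded content in this sketch.
const EMBEDDED = new Set([
  "audio", "canvas", "embed", "iframe", "img", "math", "object", "svg", "video",
]);

// Matches any character that is not one of the five HTML space characters.
const NON_SPACE = /[^ \t\n\f\r]/;

// True if the node has a descendant text node that is not inter-element
// whitespace, or a descendant embedded content element. Subtrees rooted at
// del elements are skipped entirely.
function hasSignificantDescendant(node) {
  for (const child of node.children || []) {
    if (child.type === "text") {
      if (NON_SPACE.test(child.data)) return true;
    } else if (child.type === "element") {
      if (child.name === "del") continue;
      if (EMBEDDED.has(child.name)) return true;
      if (hasSignificantDescendant(child)) return true;
    }
  }
  return false;
}

// Hypothetical helpers for building test trees.
const text = (data) => ({ type: "text", data });
const el = (name, ...children) => ({ type: "element", name, children });

const placeholder = el("div", text("\n  "), el("del", text("old text")));
const withText = el("div", el("span", text("Hello")));
const withImage = el("div", el("p", el("img")));

console.log(hasSignificantDescendant(placeholder)); // false
console.log(hasSignificantDescendant(withText)); // true
console.log(hasSignificantDescendant(withImage)); // true
```

Because the requirement is only a general rule, a checker built along these lines would report a warning at most, not an error.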

3.2.5.1.3 Sectioning content

Sectioning content is content that defines the scope of headings and footers.

Each sectioning content element potentially has a heading and an outline. See the section on headings and sections for further details.

There are also certain elements that are sectioning roots. These are distinct from sectioning content, but they can also have an outline.

3.2.5.1.4 Heading content

Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).

3.2.5.1.5 Phrasing content

Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.

As a general rule, elements whose content model allows any phrasing content should have either at least one descendant text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, nodes that are descendants of del elements must not be counted as contributing to the ancestors of the del element.

Most elements that are categorized as phrasing content can only contain elements that are themselves categorized as phrasing content, not any flow content.

Text, in the context of content models, means text nodes. Text is sometimes used as a content model on its own, but is also phrasing content, and can be inter-element whitespace (if the text nodes are empty or contain just space characters).

3.2.5.1.6 Embedded content

ISSUE-80 (title-alternative) blocks progress to Last Call

Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.

Elements that are from namespaces other than the HTML namespace and that convey content but not metadata, are embedded content for the purposes of the content models defined in this specification. (For example, MathML, or SVG.)

Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.

3.2.5.1.7 Interactive content

Interactive content is content that is specifically intended for user interaction.

Certain elements in HTML have an activation behavior, which means that the user can activate them. This triggers a sequence of events dependent on the activation mechanism, and normally culminating in a click event, as described below.

The user agent should allow the user to manually trigger elements that have an activation behavior, for instance using keyboard or voice input, or through mouse clicks. When the user triggers an element with a defined activation behavior in a manner other than clicking it, the default action of the interaction event must be to run synthetic click activation steps on the element.

When a user agent is to run synthetic click activation steps on an element, the user agent must run pre-click activation steps on the element, then fire a click event at the element. The default action of this click event must be to run post-click activation steps on the element. If the event is canceled, the user agent must run canceled activation steps on the element instead.

When a pointing device is clicked, the user agent must run these steps:

  1. Let e be the nearest activatable element of the element designated by the user (defined below), if any.

  2. If there is an element e, run pre-click activation steps on it.

  3. Dispatch the required click event.

    If there is an element e, then the default action of the click event must be to run post-click activation steps on element e.

    If there is an element e but the event is canceled, the user agent must run canceled activation steps on element e.

The above doesn't happen for arbitrary synthetic events dispatched by author script. However, the click() method can be used to make it happen programmatically.

Given an element target, the nearest activatable element is the element returned by the following algorithm:

  1. If target has a defined activation behavior, then return target and abort these steps.

  2. If target has a parent element, then set target to that parent element and return to the first step.

  3. Otherwise, there is no nearest activatable element.
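
The three steps above can be sketched as a loop over ancestors. This is a non-normative illustration: elements are plain objects with a hypothetical parent field, and an element is taken to have a defined activation behavior when it carries a hypothetical activationBehavior function.

```javascript
// Walks from target up through its ancestors and returns the first element
// with a defined activation behavior, or null if there is none.
function nearestActivatableElement(target) {
  let current = target;
  while (current) {
    // Step 1: return the element if it has an activation behavior.
    if (typeof current.activationBehavior === "function") return current;
    // Step 2: otherwise continue with the parent element, if any.
    current = current.parent || null;
  }
  // Step 3: there is no nearest activatable element.
  return null;
}

const link = { name: "a", parent: null, activationBehavior() {} };
const em = { name: "em", parent: link };
const orphan = { name: "span", parent: null };

console.log(nearestActivatableElement(em) === link); // true
console.log(nearestActivatableElement(orphan)); // null
```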

When a user agent is to run pre-click activation steps on an element, it must run the pre-click activation steps defined for that element, if any.

When a user agent is to run canceled activation steps on an element, it must run the canceled activation steps defined for that element, if any.

When a user agent is to run post-click activation steps on an element, it must run the activation behavior defined for that element. Activation behaviors can refer to the click event that was fired by the steps above leading up to this point.
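
The ordering of these steps can be illustrated with a small simulation. This is a non-normative sketch, not a model of a real event loop: the event is a plain object, dispatchClick is a hypothetical callback standing in for event dispatch, and the checkbox-like object is invented for the example.

```javascript
// Runs pre-click steps, "fires" a click, then runs either the activation
// behavior or the canceled steps. Returns true if activation completed.
function runSyntheticClickActivationSteps(element, dispatchClick) {
  if (element.preClickActivationSteps) element.preClickActivationSteps();
  const event = { type: "click", canceled: false };
  dispatchClick(event); // listeners may set event.canceled
  if (event.canceled) {
    if (element.canceledActivationSteps) element.canceledActivationSteps();
    return false;
  }
  // Post-click activation steps: run the element's activation behavior.
  element.activationBehavior(event);
  return true;
}

const log = [];
const checkbox = {
  checked: false,
  preClickActivationSteps() {
    this.checked = !this.checked;
    log.push("pre-click");
  },
  canceledActivationSteps() {
    this.checked = !this.checked; // undo the pre-click change
    log.push("canceled");
  },
  activationBehavior(event) {
    log.push("activated by " + event.type);
  },
};

runSyntheticClickActivationSteps(checkbox, () => {});
runSyntheticClickActivationSteps(checkbox, (event) => {
  event.canceled = true;
});

console.log(log.join(", "));
// pre-click, activated by click, pre-click, canceled
console.log(checkbox.checked); // true
```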

3.2.5.2 Transparent content models

Some elements are described as transparent; they have "transparent" in the description of their content model. The content model of a transparent element is derived from the content model of its parent element: the elements required in the part of the content model that is "transparent" are the same elements as required in the part of the content model of the parent of the transparent element in which the transparent element finds itself.

For instance, an ins element inside a ruby element cannot contain an rt element, because the part of the ruby element's content model that allows ins elements is the part that allows phrasing content, and the rt element is not phrasing content.

In some cases, where transparent elements are nested in each other, the process has to be applied iteratively.

Consider the following markup fragment:

<p><object><param><ins><map><a href="/">Apples</a></map></ins></object></p>

To check whether "Apples" is allowed inside the a element, the content models are examined. The a element's content model is transparent, as is the map element's, as is the ins element's, as is the part of the object element's in which the ins element is found. The object element is found in the p element, whose content model is phrasing content. Thus, "Apples" is allowed, as text is phrasing content.

When a transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.
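
The iterative lookup described in this section can be sketched as follows. This is non-normative: the content model table is a hand-written assumption covering only the elements in the example above, elements are plain objects with a hypothetical parent field, and the object entry stands for the part of its content model that follows any param children.

```javascript
// Assumption: a tiny content model table for this example only.
const CONTENT_MODELS = {
  p: "phrasing",
  section: "flow",
  a: "transparent",
  ins: "transparent",
  map: "transparent",
  object: "transparent",
};

// Resolves "transparent" by walking up to the nearest ancestor whose content
// model is not transparent. A transparent element with no parent accepts
// any flow content.
function effectiveContentModel(element) {
  let current = element;
  while (CONTENT_MODELS[current.name] === "transparent") {
    if (!current.parent) return "flow";
    current = current.parent;
  }
  return CONTENT_MODELS[current.name];
}

// Hypothetical helper: builds a chain of nested elements, outermost first,
// and returns the innermost one.
function chain(...names) {
  let node = null;
  for (const name of names) {
    node = { name, parent: node };
  }
  return node;
}

const anchor = chain("p", "object", "ins", "map", "a");
console.log(effectiveContentModel(anchor)); // phrasing
console.log(effectiveContentModel({ name: "ins", parent: null })); // flow
```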

3.2.5.3 Paragraphs

The term paragraph as defined in this section is distinct from (though related to) the p element defined later. The paragraph concept defined here is used to describe how to interpret documents.

A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.

In the following example, there are two paragraphs in a section. There is also a heading, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.

<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>

Paragraphs in flow content are defined relative to what the document looks like without the a, ins, del, and map elements complicating matters, since those elements, with their hybrid content models, can straddle paragraph boundaries, as shown in the first two examples below.

Generally, having elements straddle paragraph boundaries is best avoided. Maintaining such markup can be difficult.

The following example takes the markup from the earlier example and puts ins and del elements around some of the markup to show that the text was changed (though in this case, the changes admittedly don't make much sense). Notice how this example has exactly the same paragraphs as the previous one, despite the ins and del elements — the ins element straddles the heading and the first paragraph, and the del element straddles the boundary between the two paragraphs.

<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>

Let view be a view of the DOM that replaces all a, ins, del, and map elements in the document with their contents. Then, in view, for each run of sibling phrasing content nodes uninterrupted by other types of content, in an element that accepts content other than phrasing content as well as phrasing content, let first be the first node of the run, and let last be the last node of the run. For each such run that consists of at least one node that is neither embedded content nor inter-element whitespace, a paragraph exists in the original DOM from immediately before first to immediately after last. (Paragraphs can thus span across a, ins, del, and map elements.)
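
The definition above can be illustrated with a simplified, non-normative sketch that looks only at the children of a single container and assumes that the container accepts both phrasing and other content. Nodes are plain objects, the lists of phrasing and embedded element names are partial assumptions, and paragraphs formed explicitly by p elements are not reported.

```javascript
// Elements replaced by their contents in the "view".
const HOISTED = new Set(["a", "ins", "del", "map"]);
// Assumption: partial lists, sufficient for this example only.
const EMBEDDED = new Set(["audio", "canvas", "embed", "iframe", "img", "object", "video"]);
const PHRASING = new Set(["em", "strong", "span", "cite", "code", ...EMBEDDED]);
const SPACE_ONLY = /^[ \t\n\f\r]*$/;

// Replaces a, ins, del, and map elements with their contents; comments are ignored.
function flattenView(children) {
  const view = [];
  for (const node of children) {
    if (node.type === "comment") continue;
    if (node.type === "element" && HOISTED.has(node.name)) {
      view.push(...flattenView(node.children));
    } else {
      view.push(node);
    }
  }
  return view;
}

function isPhrasing(node) {
  return node.type === "text" || PHRASING.has(node.name);
}

// A run forms a paragraph only if some node in it is neither embedded
// content nor inter-element whitespace.
function countsTowardParagraph(node) {
  if (node.type === "text") return !SPACE_ONLY.test(node.data);
  return !EMBEDDED.has(node.name);
}

function textOf(node) {
  if (node.type === "text") return node.data;
  return (node.children || []).map(textOf).join("");
}

// Returns the text of each implied paragraph among the container's children.
function impliedParagraphs(container) {
  const paragraphs = [];
  let run = [];
  const closeRun = () => {
    if (run.some(countsTowardParagraph)) {
      paragraphs.push(run.map(textOf).join("").replace(/\s+/g, " ").trim());
    }
    run = [];
  };
  for (const node of flattenView(container.children)) {
    if (isPhrasing(node)) run.push(node);
    else closeRun();
  }
  closeRun();
  return paragraphs;
}

const text = (data) => ({ type: "text", data });
const el = (name, ...children) => ({ type: "element", name, children });

// The ins/del example from this section.
const section = el(
  "section",
  text("\n  "),
  el("ins",
    el("h1", text("Example of paragraphs")),
    text("\n  This is the "),
    el("em", text("first")),
    text(" paragraph in")),
  text(" this example"),
  el("del", text(".\n  "), el("p", text("This is the second."))),
  text("\n  "),
  { type: "comment", data: " This is not a paragraph. " },
  text("\n")
);

console.log(impliedParagraphs(section));
// [ 'This is the first paragraph in this example.' ]
```

The single run reported here spans the end of the ins element and the start of the del element, which matches the observation that paragraphs can span across those elements.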

Conformance checkers may warn authors of cases where they have paragraphs that overlap each other (this can happen with object, video, audio, and canvas elements, and indirectly through elements in other namespaces that allow HTML to be further embedded therein, like svg or math).

A paragraph is also formed explicitly by p elements.

The p element can be used to wrap individual paragraphs when there would otherwise not be any content other than phrasing content to separate the paragraphs from each other.

In the following example, the link spans half of the first paragraph, all of the heading separating the two paragraphs, and half of the second paragraph. It straddles the paragraphs and the heading.

<aside>
 Welcome!
 <a href="about.html">
  This is home of...
  <h1>The Falcons!</h1>
  The Lockheed Martin multirole jet fighter aircraft!
 </a>
 This page discusses the F-16 Fighting Falcon's innermost secrets.
</aside>

Here is another way of marking this up, this time showing the paragraphs explicitly, and splitting the one link element into three:

<aside>
 <p>Welcome! <a href="about.html">This is home of...</a></p>
 <h1><a href="about.html">The Falcons!</a></h1>
 <p><a href="about.html">The Lockheed Martin multirole jet
 fighter aircraft!</a> This page discusses the F-16 Fighting
 Falcon's innermost secrets.</p>
</aside>

It is possible for paragraphs to overlap when using certain elements that define fallback content. For example, in the following section:

<section>
 <h1>My Cats</h1>
 You can play with my cat simulator.
 <object data="cats.sim">
  To see the cat simulator, use one of the following links:
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  Alternatively, upgrade to the Mellblom Browser.
 </object>
 I'm quite proud of it.
</section>

There are five paragraphs:

  1. The paragraph that says "You can play with my cat simulator. object I'm quite proud of it.", where object is the object element.
  2. The paragraph that says "To see the cat simulator, use one of the following links:".
  3. The paragraph that says "Download simulator file".
  4. The paragraph that says "Use online simulator".
  5. The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".

The first paragraph is overlapped by the other four. A user agent that supports the "cats.sim" resource will only show the first one, but a user agent that shows the fallback will confusingly show the first sentence of the first paragraph as if it were in the same paragraph as the second one, and will show the last paragraph as if it were at the start of the second sentence of the first paragraph.

To avoid this confusion, explicit p elements can be used.

3.2.6 Requirements relating to bidirectional-algorithm formatting characters

Text content in HTML elements with child text nodes, and text in attributes of HTML elements that allow free-form text, may contain characters in the range U+202A to U+202E (the bidirectional-algorithm formatting characters). However, the use of these characters is restricted so that any embedding or overrides generated by these characters do not start and end with different parent elements, and so that all such embeddings and overrides are explicitly terminated by a U+202C POP DIRECTIONAL FORMATTING character. This helps reduce incidences of text being reused in a manner that has unforeseen effects on the bidirectional algorithm.

The aforementioned restrictions are defined by specifying that certain parts of documents form bidirectional-algorithm formatting character ranges, and then imposing a requirement on such ranges.

The string resulting from the concatenation of the data of all of an HTML element's text nodes, if any, is a bidirectional-algorithm formatting character range.

The value of a namespace-less attribute of an HTML element is a bidirectional-algorithm formatting character range.

Any strings that, as described above, are bidirectional-algorithm formatting character ranges must match the string production in the following ABNF, the character set for which is Unicode. [ABNF]

string        = *( plaintext ( embedding / override ) ) plaintext
embedding     = ( lre / rle ) string pdf
override      = ( lro / rlo ) string pdf
lre           = %x202A ; U+202A LEFT-TO-RIGHT EMBEDDING
rle           = %x202B ; U+202B RIGHT-TO-LEFT EMBEDDING
lro           = %x202D ; U+202D LEFT-TO-RIGHT OVERRIDE
rlo           = %x202E ; U+202E RIGHT-TO-LEFT OVERRIDE
pdf           = %x202C ; U+202C POP DIRECTIONAL FORMATTING
plaintext     = *( %x0000-2029 / %x202F-10FFFF )
                ; any string with no bidirectional-algorithm formatting characters

For convenience, where possible authors will likely prefer to use the dir attribute, the bdo element, and the bdi element, rather than maintaining the bidirectional-algorithm formatting characters manually.
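
The ABNF above amounts to requiring that every embedding or override character is closed by a matching U+202C within the same range, and that no U+202C appears without an opener. A non-normative sketch of such a check follows; the function name is hypothetical, and the caller is assumed to pass one bidirectional-algorithm formatting character range at a time (the concatenated text node data of an element, or a single attribute value).

```javascript
// LRE, RLE, LRO, RLO: characters that open an embedding or override.
const OPENERS = new Set([0x202a, 0x202b, 0x202d, 0x202e]);
// POP DIRECTIONAL FORMATTING: closes the most recent opener.
const PDF = 0x202c;

// True if the string matches the "string" production in the ABNF above.
function isValidBidiRange(range) {
  let depth = 0;
  for (const ch of range) {
    const codePoint = ch.codePointAt(0);
    if (OPENERS.has(codePoint)) {
      depth += 1;
    } else if (codePoint === PDF) {
      if (depth === 0) return false; // PDF with nothing to close
      depth -= 1;
    }
  }
  return depth === 0; // every opener must have been closed
}

console.log(isValidBidiRange("plain text")); // true
console.log(isValidBidiRange("\u202Bnested \u202Dtext\u202C here\u202C")); // true
console.log(isValidBidiRange("\u202Aunterminated")); // false
console.log(isValidBidiRange("stray\u202C")); // false
```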

3.2.7 Annotations for assistive technology products (ARIA)

ISSUE-109 (aria-section-title) and ISSUE-129 (aria-mapping) block progress to Last Call

Authors may use the ARIA role and aria-* attributes on HTML elements, in accordance with the requirements described in the ARIA specifications, except where these conflict with the strong native semantics described below. These exceptions are intended to prevent authors from making assistive technology products report nonsensical states that do not represent the actual state of the document. [ARIA]

User agents are required to implement ARIA semantics on all HTML elements, as defined in the ARIA specifications. The implicit ARIA semantics defined below must be recognized by implementations. [ARIAIMPL]

The following table defines the strong native semantics and corresponding implicit ARIA semantics that apply to HTML elements. Each language feature (element or attribute) in a cell in the first column implies the ARIA semantics (role, states, and/or properties) given in the cell in the second column of the same row. Authors must not set the ARIA role and aria-* attributes in a manner that conflicts with the semantics described in the following table, except that the presentation role may always be used. When multiple rows apply to an element, the role from the last row to define a role must be applied, and the states and properties from all the rows must be combined.

Language feature | Strong native semantics and implied ARIA semantics
a element that creates a hyperlink | link role
area element that creates a hyperlink | link role
base element | No role
button element | button role
datalist element | listbox role, with the aria-multiselectable property set to "false"
details element | aria-expanded state set to "true" if the element's open attribute is present, and set to "false" otherwise
h1 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
h2 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
h3 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
h4 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
h5 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
h6 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth
head element | No role
hgroup element | heading role, with the aria-level property set to the element's outline depth
hr element | separator role
html element | No role
img element whose alt attribute's value is empty | presentation role
input element with a type attribute in the Button state | button role
input element with a type attribute in the Checkbox state | aria-checked state set to "mixed" if the element's indeterminate IDL attribute is true, or "true" if the element's checkedness is true, or "false" otherwise
input element with a type attribute in the Color state | No role
input element with a type attribute in the Date state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Date and Time state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Local Date and Time state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the E-mail state with no suggestions source element | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the File Upload state | No role
input element with a type attribute in the Hidden state | No role
input element with a type attribute in the Image Button state | button role
input element with a type attribute in the Month state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Number state | spinbutton role, with the aria-readonly state set to "true" if the element has a readonly attribute, the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and, if the result of applying the rules for parsing floating point number values to the element's value is a number, with the aria-valuenow property set to that number
input element with a type attribute in the Password state | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Radio Button state | aria-checked state set to "true" if the element's checkedness is true, or "false" otherwise
input element with a type attribute in the Range state | slider role, with the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and the aria-valuenow property set to the result of applying the rules for parsing floating point number values to the element's value, if that results in a number, or the default value otherwise
input element with a type attribute in the Reset Button state | button role
input element with a type attribute in the Search state with no suggestions source element | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Submit Button state | button role
input element with a type attribute in the Telephone state with no suggestions source element | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Text state with no suggestions source element | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Text, Search, Telephone, URL, or E-mail states with a suggestions source element | combobox role, with the aria-owns property set to the same value as the list attribute, and the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Time state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the URL state with no suggestions source element | textbox role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element with a type attribute in the Week state | No role, with the aria-readonly state set to "true" if the element has a readonly attribute
input element that is required | The aria-required state set to "true"
keygen element | No role
label element | No role
link element that creates a hyperlink | link role
menu element with a type attribute in the context menu state | No role
menu element with a type attribute in the list state | menu role
menu element with a type attribute in the toolbar state | toolbar role
meta element | No role
meter element | No role
nav element | navigation role
noscript element | No role
optgroup element | No role
option element that is in a list of options or that represents a suggestion in a datalist element | option role, with the aria-selected state set to "true" if the element's selectedness is true, or "false" otherwise
param element | No role
progress element | progressbar role, with, if the progress bar is determinate, the aria-valuemax property set to the maximum value of the progress bar, the aria-valuemin property set to zero, and the aria-valuenow property set to the current value of the progress bar
script element | No role
select element with a multiple attribute | listbox role, with the aria-multiselectable property set to "true"
select element with no multiple attribute | listbox role, with the aria-multiselectable property set to "false"
select element with a required attribute | The aria-required state set to "true"
source element | No role
style element | No role
summary element | heading role
textarea element | textbox role, with the aria-multiline property set to "true", and the aria-readonly state set to "true" if the element has a readonly attribute
textarea element with a required attribute | The aria-required state set to "true"
title element | No role
An element that defines a command, whose Type facet is "checkbox", and that is a descendant of a menu element whose type attribute is in the list state | menuitemcheckbox role, with the aria-checked state set to "true" if the command's Checked State facet is true, and "false" otherwise
An element that defines a command, whose Type facet is "command", and that is a descendant of a menu element whose type attribute is in the list state | menuitem role
An element that defines a command, whose Type facet is "radio", and that is a descendant of a menu element whose type attribute is in the list state | menuitemradio role, with the aria-checked state set to "true" if the command's Checked State facet is true, and "false" otherwise
Element that is disabled | The aria-disabled state set to "true"
Element with a hidden attribute | The aria-hidden state set to "true"
Element that is a candidate for constraint validation but that does not satisfy its constraints | The aria-invalid state set to "true"
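
The rule for elements matched by several rows (the role from the last row to define a role is applied, and states and properties from all rows are combined) might be sketched as follows. This is non-normative: rows are modeled as plain objects with hypothetical role and states fields, and a row that only sets states is treated as not defining a role.

```javascript
// Combines the table rows that apply to one element, in table order.
function combineImpliedSemantics(rows) {
  let role = null;
  const states = {};
  for (const row of rows) {
    // The last row that defines a role wins.
    if (row.role !== undefined) role = row.role;
    // States and properties from every row are merged.
    Object.assign(states, row.states || {});
  }
  return { role, states };
}

// Rows that would apply to a textarea element that has readonly and required
// attributes and is disabled.
const rowsForTextarea = [
  { role: "textbox", states: { "aria-multiline": "true", "aria-readonly": "true" } },
  { states: { "aria-required": "true" } },
  { states: { "aria-disabled": "true" } },
];

const combined = combineImpliedSemantics(rowsForTextarea);
console.log(combined.role); // textbox
console.log(Object.keys(combined.states).join(", "));
// aria-multiline, aria-readonly, aria-required, aria-disabled
```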

Some HTML elements have native semantics that can be overridden. The following table lists these elements and their implicit ARIA semantics, along with the restrictions that apply to those elements. Each language feature (element or attribute) in a cell in the first column implies, unless otherwise overridden, the ARIA semantic (role, state, or property) given in the cell in the second column of the same row, but this semantic may be overridden under the conditions listed in the cell in the third column of that row. In addition, any element may be given the presentation role, regardless of the restrictions below.

Language feature | Default implied ARIA semantic | Restrictions
address element | No role | If specified, role must be contentinfo
article element | article role | Role must be either article, document, application, or main
aside element | note role | Role must be either note, complementary, or search
audio element | No role | If specified, role must be application
details element | group role | Role must be a role that supports aria-expanded
embed element | No role | If specified, role must be either application, document, or img
footer element | No role | If specified, role must be contentinfo
header element | No role | If specified, role must be banner
iframe element | No role | If specified, role must be either application, document, or img
img element whose alt attribute's value is absent | img role | No restrictions
input element with a type attribute in the Checkbox state | checkbox role | Role must be either checkbox or menuitemcheckbox
input element with a type attribute in the Radio Button state | radio role | Role must be either radio or menuitemradio
li element whose parent is an ol or ul element | listitem role | Role must be either listitem, menuitemcheckbox, menuitemradio, option, tab, or treeitem
object element | No role | If specified, role must be either application, document, or img
ol element | list role | Role must be either directory, list, listbox, menu, menubar, tablist, toolbar, or tree
output element | status role | No restrictions
section element | region role | Role must be either alert, alertdialog, application, contentinfo, dialog, document, log, main, marquee, region, search, or status
ul element | list role | Role must be either directory, list, listbox, menu, menubar, tablist, toolbar, or tree
video element | No role | If specified, role must be application
The body element | document role | Role must be either document or application
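
A conformance checker could encode the restrictions column as a lookup, as in the following non-normative sketch. Only a few rows of the table are copied here, the names RESTRICTIONS and isRoleAllowed are hypothetical, and the function throws for element names that the sketch does not record.

```javascript
// A few rows copied from the table above; null stands for "No restrictions".
const RESTRICTIONS = {
  article: ["article", "document", "application", "main"],
  aside: ["note", "complementary", "search"],
  footer: ["contentinfo"],
  header: ["banner"],
  output: null,
};

// True if an author-specified role is permitted on the named element.
function isRoleAllowed(elementName, role) {
  // Any element may be given the presentation role.
  if (role === "presentation") return true;
  if (!Object.prototype.hasOwnProperty.call(RESTRICTIONS, elementName)) {
    throw new RangeError("No restriction row recorded for: " + elementName);
  }
  const allowed = RESTRICTIONS[elementName];
  return allowed === null || allowed.includes(role);
}

console.log(isRoleAllowed("aside", "search")); // true
console.log(isRoleAllowed("header", "navigation")); // false
console.log(isRoleAllowed("header", "presentation")); // true
console.log(isRoleAllowed("output", "log")); // true
```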

Conformance checkers are encouraged to phrase errors such that authors are encouraged to use more appropriate elements rather than remove accessibility annotations. For example, if an a element is marked as having the button role, a conformance checker could say "Use a more appropriate element to represent a button, for example a button element or an input element" rather than "The button role cannot be used with a elements".

These features can be used to make accessibility tools render content to their users in more useful ways. For example, ASCII art, which is really an image, appears to be text, and in the absence of appropriate annotations would end up being rendered by screen readers as a very painful reading of lots of punctuation. Using the features described in this section, one can instead make the ATs skip the ASCII art and just read the caption:

<figure role="img" aria-labelledby="fish-caption"> 
 <pre>
 o           .'`/
     '      /  (
   O    .-'` ` `'-._      .')
      _/ (o)        '.  .' /
      )       )))     ><  <
      `\  |_\      _.'  '. \
        '-._  _ .-'       '.)
    jgs     `\__\
 </pre>
 <figcaption id="fish-caption">
  Joan G. Stark, "<cite>fish</cite>".
  October 1997. ASCII on electrons. 28×8.
 </figcaption> 
</figure>