Each element defined in this specification has a content model: a description of the element's expected contents. An HTML element must have contents that match the requirements described in the element's content model.
The space characters are always allowed between elements. User agents represent these characters between elements in the source markup as Text nodes in the DOM. Empty Text nodes and Text nodes consisting of just sequences of those characters are considered inter-element whitespace.
Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element's contents match the element's content model or not, and must be ignored when following algorithms that define document and element semantics.
Thus, an element A is said to be preceded or followed by a second element B if A and B have the same parent node and there are no other element nodes or Text nodes (other than inter-element whitespace) between them. Similarly, a node is the only child of an element if that element contains no other nodes other than inter-element whitespace, comment nodes, and processing instruction nodes.
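The definitions above can be sketched in script. This is a minimal illustration, not a normative algorithm: it assumes a hypothetical plain-object node model (`type`, `data`, `children`) rather than real DOM nodes, and the function names are invented for this example. The space characters are taken to be U+0020, U+0009, U+000A, U+000C, and U+000D.

```javascript
// The HTML "space characters": space, tab, line feed, form feed, carriage return.
const ONLY_SPACE_CHARS = /^[ \t\n\f\r]*$/;

// Hypothetical node model (an assumption of this sketch):
//   { type: "element" | "text" | "comment" | "pi", data?, children? }
function isInterElementWhitespace(node) {
  // Empty Text nodes also match, because the pattern accepts zero characters.
  return node.type === "text" && ONLY_SPACE_CHARS.test(node.data);
}

// Nodes that are ignored when matching contents against a content model.
function isIgnorable(node) {
  return (
    node.type === "comment" ||
    node.type === "pi" ||
    isInterElementWhitespace(node)
  );
}

// True if `child` is the only child of `parent` in the sense defined above.
function isOnlyChild(parent, child) {
  return (
    parent.children.includes(child) &&
    parent.children.every((n) => n === child || isIgnorable(n))
  );
}

// True if element `a` is followed by element `b` (so `b` is preceded by `a`).
function isFollowedBy(parent, a, b) {
  if (a.type !== "element" || b.type !== "element") return false;
  const i = parent.children.indexOf(a);
  const j = parent.children.indexOf(b);
  if (i < 0 || j <= i) return false;
  // No element nodes or non-whitespace Text nodes may sit between the two.
  return parent.children.slice(i + 1, j).every(isIgnorable);
}
```

With this model, two td elements separated only by a line break and a comment count as adjacent, while any intervening text such as "x" breaks the relationship.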
Authors must not use HTML elements anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.
For example, the Atom specification defines a content element. When its type attribute has the value xhtml, the Atom specification requires that it contain a single HTML div element. Thus, a div element is allowed in that context, even though this is not explicitly normatively stated by this specification. [ATOM]
In addition, HTML elements may be orphan nodes (i.e. without a parent node).
For example, creating a td element and storing it in a global variable in a script is conforming, even though td elements are otherwise only supposed to be used inside tr elements.
var data = {
  name: "Banana",
  cell: document.createElement('td'),
};
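Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following broad categories are used in this specification:
Metadata content
Flow content
Sectioning content
Heading content
Phrasing content
Embedded content
Interactive content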
Some elements also fall into other categories, which are defined in other parts of this specification.
These categories are related as follows:
Sectioning content, heading content, phrasing content, embedded content, and interactive content are all types of flow content. Metadata is sometimes flow content. Metadata and interactive content are sometimes phrasing content. Embedded content is also a type of phrasing content, and sometimes is interactive content.
Other categories are also used for specific purposes, e.g. form controls are specified using a number of categories to define common requirements. Some elements have unique requirements and do not fit into any particular category.
Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.
Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.
Thus, in the XML serialization, one can use RDF, like this:
<html xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <head>
  <title>Hedral's Home Page</title>
  <r:RDF>
   <Person xmlns="http://www.w3.org/2000/10/swap/pim/contact#"
           r:about="http://hedral.example.com/#">
    <fullName>Cat Hedral</fullName>
    <mailbox r:resource="mailto:hedral@damowmow.com"/>
    <personalTitle>Sir</personalTitle>
   </Person>
  </r:RDF>
 </head>
 <body>
  <h1>My home page</h1>
  <p>I like playing with string, I guess. Sister says squirrels are fun
  too so sometimes I follow her to play with them.</p>
 </body>
</html>
This isn't possible in the HTML serialization, however.
Most elements that are used in the body of documents and applications are categorized as flow content.
a
abbr
address
area (if it is a descendant of a map element)
article
aside
audio
b
bdi
bdo
blockquote
br
button
canvas
cite
code
command
datalist
del
details
dfn
dialog
div
dl
em
embed
fieldset
figure
footer
form
h1
h2
h3
h4
h5
h6
header
hgroup
hr
i
iframe
img
input
ins
kbd
keygen
label
map
mark
math
menu
meter
nav
noscript
object
ol
output
p
pre
progress
q
ruby
s
samp
script
section
select
small
span
strong
style (if the scoped attribute is present)
sub
sup
svg
table
textarea
time
u
ul
var
video
wbr
Sectioning content is content that defines the scope of headings and footers.
Each sectioning content element potentially has a heading and an outline. See the section on headings and sections for further details.
There are also certain elements that are sectioning roots. These are distinct from sectioning content, but they can also have an outline.
Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).
Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.
a
abbr
area (if it is a descendant of a map element)
audio
b
bdi
bdo
br
button
canvas
cite
code
command
datalist
del
dfn
em
embed
i
iframe
img
input
ins
kbd
keygen
label
map
mark
math
meter
noscript
object
output
progress
q
ruby
s
samp
script
select
small
span
strong
sub
sup
svg
textarea
time
u
var
video
wbr
As a general rule, elements whose content model allows any phrasing content should have either at least one descendant Text node that is not inter-element whitespace, or at least one descendant element node that is embedded content. For the purposes of this requirement, nodes that are descendants of del elements must not be counted as contributing to the ancestors of the del element.
Most elements that are categorized as phrasing content can only contain elements that are themselves categorized as phrasing content, not any flow content.
Text, in the context of content models, means Text nodes. Text is sometimes used as a content model on its own, but is also phrasing content, and can be inter-element whitespace (if the Text nodes are empty or contain just space characters).
Text nodes and attribute values must consist of Unicode characters, must not contain U+0000 characters, must not contain permanently undefined Unicode characters (noncharacters), and must not contain control characters other than space characters. This specification includes extra constraints on the exact value of Text nodes and attribute values depending on their precise context.
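The character-level constraints above could be checked along the following lines. This is a sketch under stated assumptions: the function names are invented for this example, "control characters" is taken to mean Unicode general category Cc (U+0000 to U+001F and U+007F to U+009F), and an unpaired surrogate is treated as failing the "must consist of Unicode characters" requirement. Context-specific constraints are out of scope.

```javascript
// Space characters that are also control characters: tab, LF, FF, CR.
// (U+0020 SPACE is a space character too, but it is not a control character.)
const SPACE_CONTROLS = new Set([0x09, 0x0a, 0x0c, 0x0d]);

function isNoncharacter(cp) {
  // U+FDD0..U+FDEF, plus the last two code points of every plane.
  return (cp >= 0xfdd0 && cp <= 0xfdef) || (cp & 0xfffe) === 0xfffe;
}

function isControl(cp) {
  // C0 controls, DELETE, and C1 controls (assumed to mean category Cc).
  return cp <= 0x1f || (cp >= 0x7f && cp <= 0x9f);
}

// Returns true if `text` satisfies the generic constraints on Text node
// data and attribute values described above.
function isValidTextData(text) {
  for (const ch of text) { // string iteration proceeds by code point
    const cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) return false; // unpaired surrogate
    if (cp === 0x0000) return false;
    if (isNoncharacter(cp)) return false;
    if (isControl(cp) && !SPACE_CONTROLS.has(cp)) return false;
  }
  return true;
}
```

Under these assumptions, a string containing tabs and line feeds passes, while one containing U+0000, U+000B, or U+FFFE does not.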
Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.
Elements that are from namespaces other than the HTML namespace and that convey content but not metadata are embedded content for the purposes of the content models defined in this specification (for example, MathML or SVG).
Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.
Interactive content is content that is specifically intended for user interaction.
a
audio (if the controls attribute is present)
button
details
embed
iframe
img
img (if the usemap attribute is present)
input (if the type attribute is not in the Hidden state)
keygen
label
menu
menu (if the type attribute is in the toolbar state)
object (if the usemap attribute is present)
select
textarea
video
video (if the controls attribute is present)
Certain elements in HTML have an activation behavior, which means that the user can activate them. This triggers a sequence of events dependent on the activation mechanism, and normally culminating in a click event.
As a general rule, elements whose content model allows any flow content or phrasing content should have at least one child node that is palpable content and that does not have the hidden attribute specified.
This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.
Conformance checkers are encouraged to provide a mechanism for authors to find elements that fail to fulfill this requirement, as an authoring aid.
The following elements are palpable content:
a
abbr
address
article
aside
audio (if the controls attribute is present)
b
bdi
bdo
blockquote
button
canvas
cite
code
details
dfn
div
dl (if the element's children include at least one name-value group)
em
embed
fieldset
figure
footer
form
h1
h2
h3
h4
h5
h6
header
hgroup
i
iframe
img
input (if the type attribute is not in the Hidden state)
ins
kbd
keygen
label
map
mark
math
menu (if the type attribute is in the toolbar state or the list state)
meter
nav
object
ol (if the element's children include at least one li element)
output
p
pre
progress
q
ruby
s
samp
section
select
small
span
strong
sub
sup
svg
table
textarea
time
u
ul (if the element's children include at least one li element)
var
video
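An authoring aid of the kind suggested above for conformance checkers might be sketched as follows. Everything here is illustrative: the node model (`type`, `name`, `hidden`, `children`, `data`) is a hypothetical plain-object stand-in for the DOM, the two sets are small hand-picked subsets rather than the full lists, the conditional entries in the list are not modeled, and treating non-whitespace text as palpable is an assumption of this sketch.

```javascript
// Illustrative subset of the palpable content list (unconditional entries only).
const PALPABLE = new Set(["a", "b", "div", "em", "p", "section", "span", "strong"]);

// Illustrative subset of elements whose content model allows flow or
// phrasing content (an assumption; not taken from a list in the text).
const ALLOWS_FLOW_OR_PHRASING = new Set([
  "a", "b", "div", "em", "li", "p", "section", "span", "strong", "td",
]);

function isPalpableChild(node) {
  if (node.type === "text") {
    // Assumption: text counts unless it is inter-element whitespace.
    return /[^ \t\n\f\r]/.test(node.data);
  }
  // An element child counts only if it is palpable and not hidden.
  return node.type === "element" && PALPABLE.has(node.name) && !node.hidden;
}

// Collects, in document order, the elements that have no palpable,
// non-hidden child node. These are candidates for review, not errors.
function findElementsLackingPalpableContent(root, found = []) {
  if (root.type !== "element") return found;
  if (
    ALLOWS_FLOW_OR_PHRASING.has(root.name) &&
    !root.children.some(isPalpableChild)
  ) {
    found.push(root);
  }
  for (const child of root.children) {
    findElementsLackingPalpableContent(child, found);
  }
  return found;
}
```

Because the requirement is not a hard one, a tool built this way would report its findings as hints for the author rather than as conformance failures.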
Some elements are described as transparent; they have "transparent" in the description of their content model. The content model of a transparent element is derived from the content model of its parent element: the elements required in the part of the content model that is "transparent" are the same elements as required in the part of the content model of the parent of the transparent element in which the transparent element finds itself.
For instance, an ins element inside a ruby element cannot contain an rt element, because the part of the ruby element's content model that allows ins elements is the part that allows phrasing content, and the rt element is not phrasing content.
In some cases, where transparent elements are nested in each other, the process has to be applied iteratively.
Consider the following markup fragment:
<p><ins><map><a href="/">Apples</a></map></ins></p>
To check whether "Apples" is allowed inside the a element, the content models are examined. The a element's content model is transparent, as is the map element's, as is the ins element's. The ins element is found in the p element, whose content model is phrasing content. Thus, "Apples" is allowed, as text is phrasing content.
When a transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.
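The iterative resolution described above can be sketched as a walk up the ancestor chain. This is a simplified illustration only: the `CONTENT_MODEL` table and the `{ name, parent }` node shape are hypothetical, the table covers just the elements mentioned in the examples, and any additional restrictions that individual elements place on their contents are ignored.

```javascript
// Hypothetical content-model table covering only the elements used in
// the examples above.
const CONTENT_MODEL = {
  p: "phrasing",
  div: "flow",
  a: "transparent",
  ins: "transparent",
  del: "transparent",
  map: "transparent",
};

// Hypothetical node model: { name, parent }, with parent null for an orphan.
function effectiveContentModel(element) {
  let current = element;
  while (CONTENT_MODEL[current.name] === "transparent") {
    // A transparent element with no parent accepts any flow content.
    if (!current.parent) return "flow";
    current = current.parent;
  }
  return CONTENT_MODEL[current.name];
}
```

For the "Apples" fragment, starting at the a element the walk passes through map and ins before reaching p, so the effective content model is phrasing content.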
The term paragraph as defined in this section is used for more than just the definition of the p element. The paragraph concept defined here is used to describe how to interpret documents. The p element is merely one of several ways of marking up a paragraph.
A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.
In the following example, there are two paragraphs in a section. There is also a heading, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.
<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>
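How the two paragraphs in this example arise can be sketched in script. This is a deliberately simplified illustration, not the specification's algorithm: it assumes a hypothetical plain-object node model, uses a small illustrative subset of phrasing elements, skips comments without ending a run (an assumption), and does not handle elements that straddle paragraph boundaries.

```javascript
// Hypothetical node model:
//   { type: "element", name } | { type: "text", data } | { type: "comment" }
// Illustrative subset of phrasing content elements.
const PHRASING = new Set(["b", "em", "i", "span", "strong"]);

const isWhitespaceText = (n) =>
  n.type === "text" && /^[ \t\n\f\r]*$/.test(n.data);
const isPhrasing = (n) =>
  n.type === "text" || (n.type === "element" && PHRASING.has(n.name));

// Splits the children of a flow container into paragraphs. Each paragraph
// is returned as an array of the nodes that make it up.
function paragraphsOf(children) {
  const paragraphs = [];
  let run = [];
  const flush = () => {
    // A run made only of inter-element whitespace is not a paragraph.
    if (run.some((n) => !isWhitespaceText(n))) paragraphs.push(run);
    run = [];
  };
  for (const node of children) {
    if (node.type === "comment") continue; // assumption: comments are skipped
    if (isPhrasing(node)) {
      run.push(node);
      continue;
    }
    flush(); // any other element ends the current run
    if (node.name === "p") paragraphs.push([node]); // explicit paragraph
  }
  flush();
  return paragraphs;
}
```

Applied to the children of the section element above, the sketch yields one implicit paragraph (the text around the em element) and one explicit paragraph (the p element). The heading, the comment, and the whitespace contribute none.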
Paragraphs in flow content are defined relative to what the document looks like without the a, ins, del, and map elements complicating matters, since those elements, with their hybrid content models, can straddle paragraph boundaries, as shown in the first two examples below.
Generally, having elements straddle paragraph boundaries is best avoided. Maintaining such markup can be difficult.
The following example takes the markup from the earlier example and puts ins and del elements around some of the markup to show that the text was changed (though in this case, the changes admittedly don't make much sense). Notice how this example has exactly the same paragraphs as the previous one, despite the ins and del elements: the ins element straddles the heading and the first paragraph, and the del element straddles the boundary between the two paragraphs.
<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>
A paragraph is also formed explicitly by p elements.
The p element can be used to wrap individual paragraphs when there would otherwise not be any content other than phrasing content to separate the paragraphs from each other.
In the following example, the link spans half of the first paragraph, all of the heading separating the two paragraphs, and half of the second paragraph. It straddles the paragraphs and the heading.
<header>
 Welcome!
 <a href="about.html">
  This is home of...
  <h1>The Falcons!</h1>
  The Lockheed Martin multirole jet fighter aircraft!
 </a>
 This page discusses the F-16 Fighting Falcon's innermost secrets.
</header>
Here is another way of marking this up, this time showing the paragraphs explicitly, and splitting the one link element into three:
<header>
 <p>Welcome! <a href="about.html">This is home of...</a></p>
 <h1><a href="about.html">The Falcons!</a></h1>
 <p><a href="about.html">The Lockheed Martin multirole jet fighter aircraft!</a>
    This page discusses the F-16 Fighting Falcon's innermost secrets.</p>
</header>
It is possible for paragraphs to overlap when using certain elements that define fallback content. For example, in the following section:
<section>
 <h1>My Cats</h1>
 You can play with my cat simulator.
 <object data="cats.sim">
  To see the cat simulator, use one of the following links:
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  Alternatively, upgrade to the Mellblom Browser.
 </object>
 I'm quite proud of it.
</section>
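There are five paragraphs:
The paragraph that says "You can play with my cat simulator. object I'm quite proud of it.", where object is the object element.
The paragraph that says "To see the cat simulator, use one of the following links:".
The paragraph that says "Download simulator file".
The paragraph that says "Use online simulator".
The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".
The first paragraph is overlapped by the other four. A user agent that supports the "cats.sim" resource will only show the first one, but a user agent that shows the fallback will confusingly show the first sentence of the first paragraph as if it was in the same paragraph as the second one, and will show the last paragraph as if it was at the start of the second sentence of the first paragraph.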
To avoid this confusion, explicit p elements can be used. For example:
<section>
 <h1>My Fish</h1>
 You can play with my fish simulator.
 <object data="fish.sim">
  <p>To see the fish simulator, use one of the following links:</p>
  <ul>
   <li><a href="fish.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  <p>Alternatively, upgrade to the Mellblom Browser.</p>
 </object>
 I'm quite proud of it.
</section>