Every XML and HTML document in an HTML UA is represented by a Document object. [DOMCORE]
The document's address is an absolute URL that is initially set when the Document is created but that can change during the lifetime of the Document, for example when the user navigates to a fragment identifier on the page or when the pushState() method is called with a new URL.
Interactive user agents typically expose the document's address in their user interface. This is the primary mechanism by which a user can tell if a site is attempting to impersonate another.
When a Document is created by a script using the createDocument() or createHTMLDocument() APIs, the document's address is the same as the document's address of the script's document, and the Document is both ready for post-load tasks and completely loaded immediately.
Each Document object has a reload override flag that is originally unset. The flag is set by the document.open() and document.write() methods in certain situations. When the flag is set, the Document also has a reload override buffer, which is a Unicode string that is used as the source of the document when it is reloaded.
When the user agent is to perform an overridden reload, it must act as follows:
Let source be the value of the browsing context's active document's reload override buffer.
Navigate the browsing context to a resource whose source is source, with replacement enabled. When the navigate algorithm creates a Document object for this purpose, set that Document's reload override flag and set its reload override buffer to source.
The Document object
The DOM Core specification defines a Document interface, which this specification extends significantly:
[OverrideBuiltins]
partial interface Document {
  // resource metadata management
  [PutForwards=href] readonly attribute Location? location;
  attribute DOMString domain;
  readonly attribute DOMString referrer;
  attribute DOMString cookie;
  readonly attribute DOMString lastModified;
  readonly attribute DOMString readyState;

  // DOM tree accessors
  getter object (DOMString name);
  attribute DOMString title;
  attribute DOMString dir;
  attribute HTMLElement? body;
  readonly attribute HTMLHeadElement? head;
  readonly attribute HTMLCollection images;
  readonly attribute HTMLCollection embeds;
  readonly attribute HTMLCollection plugins;
  readonly attribute HTMLCollection links;
  readonly attribute HTMLCollection forms;
  readonly attribute HTMLCollection scripts;
  NodeList getElementsByName(DOMString elementName);

  // dynamic markup insertion
  Document open(optional DOMString type, optional DOMString replace);
  WindowProxy open(DOMString url, DOMString name, DOMString features, optional boolean replace);
  void close();
  void write(DOMString... text);
  void writeln(DOMString... text);

  // user interaction
  readonly attribute WindowProxy? defaultView;
  readonly attribute Element? activeElement;
  boolean hasFocus();
  attribute DOMString designMode;
  boolean execCommand(DOMString commandId);
  boolean execCommand(DOMString commandId, boolean showUI);
  boolean execCommand(DOMString commandId, boolean showUI, DOMString value);
  boolean queryCommandEnabled(DOMString commandId);
  boolean queryCommandIndeterm(DOMString commandId);
  boolean queryCommandState(DOMString commandId);
  boolean queryCommandSupported(DOMString commandId);
  DOMString queryCommandValue(DOMString commandId);
  readonly attribute HTMLCollection commands;

  // event handler IDL attributes
  attribute EventHandler onabort;
  attribute EventHandler onblur;
  attribute EventHandler oncancel;
  attribute EventHandler oncanplay;
  attribute EventHandler oncanplaythrough;
  attribute EventHandler onchange;
  attribute EventHandler onclick;
  attribute EventHandler onclose;
  attribute EventHandler oncontextmenu;
  attribute EventHandler oncuechange;
  attribute EventHandler ondblclick;
  attribute EventHandler ondrag;
  attribute EventHandler ondragend;
  attribute EventHandler ondragenter;
  attribute EventHandler ondragleave;
  attribute EventHandler ondragover;
  attribute EventHandler ondragstart;
  attribute EventHandler ondrop;
  attribute EventHandler ondurationchange;
  attribute EventHandler onemptied;
  attribute EventHandler onended;
  attribute OnErrorEventHandler onerror;
  attribute EventHandler onfocus;
  attribute EventHandler oninput;
  attribute EventHandler oninvalid;
  attribute EventHandler onkeydown;
  attribute EventHandler onkeypress;
  attribute EventHandler onkeyup;
  attribute EventHandler onload;
  attribute EventHandler onloadeddata;
  attribute EventHandler onloadedmetadata;
  attribute EventHandler onloadstart;
  attribute EventHandler onmousedown;
  attribute EventHandler onmousemove;
  attribute EventHandler onmouseout;
  attribute EventHandler onmouseover;
  attribute EventHandler onmouseup;
  attribute EventHandler onmousewheel;
  attribute EventHandler onpause;
  attribute EventHandler onplay;
  attribute EventHandler onplaying;
  attribute EventHandler onprogress;
  attribute EventHandler onratechange;
  attribute EventHandler onreset;
  attribute EventHandler onscroll;
  attribute EventHandler onseeked;
  attribute EventHandler onseeking;
  attribute EventHandler onselect;
  attribute EventHandler onshow;
  attribute EventHandler onstalled;
  attribute EventHandler onsubmit;
  attribute EventHandler onsuspend;
  attribute EventHandler ontimeupdate;
  attribute EventHandler onvolumechange;
  attribute EventHandler onwaiting;

  // special event handler IDL attributes that only apply to Document objects
  [LenientThis] attribute EventHandler onreadystatechange;
};
User agents must throw a SecurityError exception whenever any properties of a Document object are accessed by scripts whose effective script origin is not the same as the Document's effective script origin.
referrer
Returns the address of the Document from which the user navigated to this one, unless it was blocked or there was no such document, in which case it returns the empty string.
The noreferrer link type can be used to block the referrer.
The referrer attribute must return either the address of the active document of the source browsing context at the time the navigation was started (that is, the page which navigated the browsing context to the current document), with any <fragment> component removed; or the empty string if there is no such originating page, or if the UA has been configured not to report referrers in this case, or if the navigation was initiated for a hyperlink with a noreferrer keyword.
In the case of HTTP, the referrer IDL attribute will match the Referer (sic) header that was sent when fetching the current page.
Typically user agents are configured to not report referrers in the case where the referrer uses an encrypted protocol and the current page does not (e.g. when navigating from an https: page to an http: page).
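The rules for the referrer value can be modelled as a small pure function. The following is a rough Python sketch rather than user agent code: the function name, its parameters, and the `hide_downgrade` switch (standing in for the "typically configured" https: to http: policy) are all assumptions made for illustration.

```python
from urllib.parse import urldefrag, urlsplit


def document_referrer(source_address, current_address,
                      noreferrer=False, hide_downgrade=True):
    """Hypothetical model of the value returned by the referrer attribute."""
    # No originating page, or the hyperlink carried the noreferrer keyword.
    if source_address is None or noreferrer:
        return ""
    # Assumed (configurable) policy: say nothing when moving from an
    # encrypted page to an unencrypted one.
    source_scheme = urlsplit(source_address).scheme
    current_scheme = urlsplit(current_address).scheme
    if hide_downgrade and source_scheme == "https" and current_scheme == "http":
        return ""
    # Otherwise report the source address with any fragment removed.
    return urldefrag(source_address).url
```

Only the fragment is stripped; the rest of the source address, including any query string, is reported as-is.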
cookie [ = value ]
Returns the HTTP cookies that apply to the Document. If there are no cookies or cookies can't be applied to this resource, the empty string will be returned.
Can be set, to add a new cookie to the document's set of HTTP cookies.
If the contents are sandboxed into a unique origin (e.g. in an iframe with the sandbox attribute), a SecurityError exception will be thrown on getting and setting.
The cookie attribute represents the cookies of the resource identified by the document's address.
A Document object that falls into one of the following conditions is a cookie-averse Document object:
A Document that has no browsing context.
A Document whose address does not use a server-based naming authority.
On getting, if the document is a cookie-averse Document object, then the user agent must return the empty string. Otherwise, if the Document's origin is not a scheme/host/port tuple, the user agent must throw a SecurityError exception. Otherwise, the user agent must first obtain the storage mutex and then return the cookie-string for the document's address for a "non-HTTP" API, decoded as UTF-8, with error handling. [COOKIES]
On setting, if the document is a cookie-averse Document object, then the user agent must do nothing. Otherwise, if the Document's origin is not a scheme/host/port tuple, the user agent must throw a SecurityError exception. Otherwise, the user agent must obtain the storage mutex and then act as it would when receiving a set-cookie-string for the document's address via a "non-HTTP" API, consisting of the new value encoded as UTF-8. [COOKIES] [RFC3629]
Since the cookie attribute is accessible across frames, the path restrictions on cookies are only a tool to help manage which cookies are sent to which parts of the site, and are not in any way a security feature.
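The order of checks in the cookie getter and setter (cookie-averse first, then the origin test, then the actual cookie work) can be illustrated with a small model. This is a simplified Python sketch under stated assumptions: the class and attribute names are hypothetical, a dict stands in for the cookie store, the storage mutex is omitted, and cookie attributes such as path are ignored rather than processed as [COOKIES] requires.

```python
class SecurityError(Exception):
    """Stand-in for the DOM SecurityError exception."""


class CookieDocument:
    """Hypothetical, heavily simplified model of the cookie attribute."""

    def __init__(self, has_browsing_context=True, server_based_address=True,
                 origin_is_tuple=True):
        self.has_browsing_context = has_browsing_context
        self.server_based_address = server_based_address
        self.origin_is_tuple = origin_is_tuple
        self.jar = {}  # assumption: a dict stands in for the real cookie store

    def is_cookie_averse(self):
        # Either condition alone makes the Document cookie-averse.
        return not (self.has_browsing_context and self.server_based_address)

    def get_cookie(self):
        if self.is_cookie_averse():
            return ""
        if not self.origin_is_tuple:
            raise SecurityError("origin is not a scheme/host/port tuple")
        return "; ".join(f"{name}={value}" for name, value in self.jar.items())

    def set_cookie(self, new_value):
        if self.is_cookie_averse():
            return  # setting silently does nothing
        if not self.origin_is_tuple:
            raise SecurityError("origin is not a scheme/host/port tuple")
        # Simplification: keep only the leading name=value pair.
        pair = new_value.split(";", 1)[0]
        name, _, value = pair.partition("=")
        self.jar[name.strip()] = value.strip()
```

Note that a cookie-averse Document never throws: the cookie-averse check comes before the origin check in both directions.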
lastModified
Returns the date of the last modification to the document, as reported by the server, in the form "MM/DD/YYYY hh:mm:ss", in the user's local time zone.
If the last modification date is not known, the current time is returned instead.
The lastModified attribute, on getting, must return the date and time of the Document's source file's last modification, in the user's local time zone, in the following format: the month, a "/" (U+002F) character, the day, a "/" (U+002F) character, the year, a U+0020 SPACE character, the hours, a ":" (U+003A) character, the minutes, a ":" (U+003A) character, and the seconds.
All the numeric components above, other than the year, must be given as two ASCII digits representing the number in base ten, zero-padded if necessary. The year must be given as the shortest possible string of four or more ASCII digits representing the number in base ten, zero-padded if necessary.
The Document's source file's last modification date and time must be derived from relevant features of the networking protocols used, e.g. from the value of the HTTP Last-Modified header of the document, or from metadata in the file system for local files. If the last modification date and time are not known, the attribute must return the current date and time in the above format.
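The "MM/DD/YYYY hh:mm:ss" format and the fallback to the current time can be sketched as follows. This is an illustrative Python model, not user agent code: both function names are hypothetical, and the input is assumed to be a `datetime` already expressed in the user's local time zone.

```python
from datetime import datetime


def format_last_modified(moment):
    """Format a local datetime as MM/DD/YYYY hh:mm:ss (hypothetical helper)."""
    # Two zero-padded digits everywhere except the year, which gets at
    # least four digits (and more if the year needs them).
    return (f"{moment.month:02d}/{moment.day:02d}/{moment.year:04d} "
            f"{moment.hour:02d}:{moment.minute:02d}:{moment.second:02d}")


def last_modified(known_time=None, now=datetime.now):
    """Return the formatted modification time, or the current time if unknown."""
    moment = known_time if known_time is not None else now()
    return format_last_modified(moment)
```

Explicit format fields are used instead of `strftime`, because zero-padding of `%Y` for years below 1000 varies between platforms.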
readyState
Returns "loading" while the Document is loading, "interactive" once it is finished parsing but still loading sub-resources, and "complete" once it has loaded.
The readystatechange event fires on the Document object when this value changes.
Each document has a current document readiness. When a Document object is created, it must have its current document readiness set to the string "loading" if the document is associated with an HTML parser, an XML parser, or an XSLT processor, and to the string "complete" otherwise. Various algorithms during page loading affect this value. When the value is set, the user agent must fire a simple event named readystatechange at the Document object.
A Document is said to have an active parser if it is associated with an HTML parser or an XML parser that has not yet been stopped or aborted.
The readyState IDL attribute must, on getting, return the current document readiness.
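The current document readiness behaves like a small state holder that notifies listeners each time it is set. The sketch below is a Python model under stated assumptions: the class, its `listeners` list, and `set_readiness` are hypothetical names, and no event is fired for the initial value assigned at creation.

```python
class DocumentReadiness:
    """Hypothetical model of a document's current document readiness."""

    STATES = ("loading", "interactive", "complete")

    def __init__(self, has_parser):
        # "loading" when an HTML parser, XML parser, or XSLT processor is
        # associated with the document; "complete" otherwise.
        self._state = "loading" if has_parser else "complete"
        self.listeners = []  # callables standing in for event listeners

    @property
    def ready_state(self):
        return self._state

    def set_readiness(self, value):
        if value not in self.STATES:
            raise ValueError(f"unknown readiness: {value!r}")
        self._state = value
        # Setting the value fires readystatechange at the document.
        for listener in list(self.listeners):
            listener(value)
```

A listener appended to `listeners` is called once per call to `set_readiness`, which mirrors the rule that the event is fired whenever the value is set.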
The html element of a document is the document's root element, if there is one and it's an html element, or null otherwise.
head
Returns the head element.
The head element of a document is the first head element that is a child of the html element, if there is one, or null otherwise.
The head attribute, on getting, must return the head element of the document (a head element or null).
title [ = value ]
Returns the document's title, as given by the title element.
Can be set, to update the document's title. If there is no head element, the new value is ignored.
In SVG documents, the SVGDocument interface's title attribute takes precedence.
The title element of a document is the first title element in the document (in tree order), if there is one, or null otherwise.
The title attribute must, on getting, run the following algorithm:
If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then return the value that would have been returned by the IDL attribute of the same name on the SVGDocument interface. [SVG]
Otherwise, let value be a concatenation of the data of all the child Text nodes of the title element, in tree order, or the empty string if the title element is null.
Replace any sequence of one or more consecutive space characters in value with a single U+0020 SPACE character.
Strip leading and trailing whitespace from value.
Return value.
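The non-SVG branch of the getter above amounts to concatenate, collapse, and strip. The following Python sketch illustrates it; the function name and the list-of-strings input (standing in for the data of the child Text nodes) are assumptions, and the set of space characters is the one HTML defines elsewhere (space, tab, LF, FF, CR).

```python
import re

# HTML's space characters: U+0020, U+0009, U+000A, U+000C, U+000D.
SPACE_CHARACTERS = " \t\n\f\r"


def document_title(text_chunks):
    """Model of the title getter for a non-SVG document.

    text_chunks is the data of the title element's child Text nodes in
    tree order, or None when there is no title element (an assumption).
    """
    if text_chunks is None:
        return ""
    value = "".join(text_chunks)
    # Collapse each run of space characters to a single U+0020 SPACE.
    value = re.sub(r"[ \t\n\f\r]+", " ", value)
    # Strip leading and trailing whitespace.
    return value.strip(SPACE_CHARACTERS)
```

For instance, two Text nodes holding `"  Hello,\n\t"` and `"world  "` yield the title `"Hello, world"`.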
On setting, the following algorithm must be run. Mutation events must be fired as appropriate.
If the root element is an svg element in the "http://www.w3.org/2000/svg" namespace, and the user agent supports SVG, then the setter must act as if it was the setter for the IDL attribute of the same name on the Document interface defined by the SVG specification. Stop the algorithm here. [SVG]
If the title element is null and the head element is null, then the attribute must do nothing. Stop the algorithm here.
If the title element is null, then a new title element must be created and appended to the head element. Let element be that element. Otherwise, let element be the title element.
A Text node whose data is the new value being assigned must be appended to element.
The title IDL attribute defined above must replace the attribute of the same name on the Document interface defined by the SVG specification when the user agent supports both HTML and SVG. [SVG]
body [ = value ]
Returns the body element.
Can be set, to replace the body element.
If the new value is not a body or frameset element, this will throw a HierarchyRequestError exception.
The body element of a document is the first child of the html element that is either a body element or a frameset element. If there is no such element, it is null.
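The html, head, and body element definitions all follow the same "first matching child, or null" pattern. The sketch below models them in Python using the standard library's ElementTree; the function names are hypothetical, `None` stands in for null, and namespaces are ignored for brevity, so tags are compared by local name only.

```python
import xml.etree.ElementTree as ET


def html_element(root):
    """The root element if it is an html element, else None."""
    return root if root is not None and root.tag == "html" else None


def first_child(parent, tags):
    """First child of parent whose tag is in tags, else None."""
    if parent is None:
        return None
    for child in parent:
        if child.tag in tags:
            return child
    return None


def head_element(root):
    return first_child(html_element(root), ("head",))


def body_element(root):
    # A frameset counts as the body element if it comes first.
    return first_child(html_element(root), ("body", "frameset"))
```

Only direct children of the html element are considered, so a body nested deeper in the tree is not the body element.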
The body attribute, on getting, must return the body element of the document (either a body element, a frameset element, or null). On setting, the following algorithm must be run:
If the new value is not a body or frameset element, then throw a HierarchyRequestError exception and abort these steps.
Otherwise, replace the body element with the new value in the DOM, as if the replaceChild() method had been called with the new value and the incumbent body element as its two arguments respectively, then abort these steps.
images
Returns an HTMLCollection of the img elements in the Document.
embeds
plugins
Return an HTMLCollection of the embed elements in the Document.
links
Returns an HTMLCollection of the a and area elements in the Document that have href attributes.
forms
Returns an HTMLCollection of the form elements in the Document.
scripts
Returns an HTMLCollection of the script elements in the Document.
The images attribute must return an HTMLCollection rooted at the Document node, whose filter matches only img elements.
The embeds attribute must return an HTMLCollection rooted at the Document node, whose filter matches only embed elements.
The plugins attribute must return the same object as that returned by the embeds attribute.
The links attribute must return an HTMLCollection rooted at the Document node, whose filter matches only a elements with href attributes and area elements with href attributes.
The forms attribute must return an HTMLCollection rooted at the Document node, whose filter matches only form elements.
The scripts attribute must return an HTMLCollection rooted at the Document node, whose filter matches only script elements.
getElementsByName(name)
Returns a NodeList of elements in the Document that have a name attribute with the value name.
The getElementsByName(name) method takes a string name, and must return a live NodeList containing all the HTML elements in that document that have a name attribute whose value is equal to the name argument (in a case-sensitive manner), in tree order.
When the method is invoked on a Document object again with the same argument, the user agent may return the same object as the one returned by the earlier call. In other cases, a new NodeList object must be returned.
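The matching rule (exact, case-sensitive comparison of the name attribute, results in tree order) can be sketched in Python with ElementTree. The function name is an assumption, and unlike the NodeList described above, the list returned here is a static snapshot rather than a live collection.

```python
import xml.etree.ElementTree as ET


def get_elements_by_name(root, name):
    """Snapshot model of getElementsByName (not live, unlike a NodeList)."""
    # Element.iter() walks the tree in document (tree) order, and the
    # comparison with == is case-sensitive.
    return [element for element in root.iter() if element.get("name") == name]
```

Because the comparison is case-sensitive, an element with `name="Q"` is not returned for a lookup of `"q"`.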
The Document interface supports named properties. The supported property names at any moment consist of the values of the name content attributes of all the applet, exposed embed, form, iframe, img, and exposed object elements in the Document that have name content attributes, and the values of the id content attributes of all the applet and exposed object elements in the Document that have id content attributes, and the values of the id content attributes of all the img elements in the Document that have both name content attributes and id content attributes.
To determine the value of a named property name when the Document object is indexed for property retrieval, the user agent must return the value obtained using the following steps:
Let elements be the list of named elements with the name name in the Document.
There will be at least one such element, by definition.
If elements has only one element, and that element is an iframe element, then return the WindowProxy object of the nested browsing context represented by that iframe element, and abort these steps.
Otherwise, if elements has only one element, return that element and abort these steps.
Otherwise return an HTMLCollection rooted at the Document node, whose filter matches only named elements with the name name.
Named elements with the name name, for the purposes of the above algorithm, are those that are either:
applet, exposed embed, form, iframe, img, or exposed object elements that have a name content attribute whose value is name, or
applet or exposed object elements that have an id content attribute whose value is name, or
img elements that have an id content attribute whose value is name, and that have a name content attribute present also.
An embed or object element is said to be exposed if it has no exposed object ancestor, and, for object elements, is additionally either not showing its fallback content or has no object or embed descendants.
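The named property lookup and the definition of named elements can be sketched together. This is a Python model under stated assumptions: `El` is a hypothetical flat record rather than a DOM node, its `exposed` flag is supplied by the caller instead of being computed from the tree, a plain list stands in for the HTMLCollection, and the `window` field stands in for an iframe's WindowProxy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class El:
    """Hypothetical flat stand-in for an element."""
    tag: str
    name: Optional[str] = None
    id: Optional[str] = None
    exposed: bool = True          # only meaningful for embed and object
    window: Optional[str] = None  # stand-in for an iframe's WindowProxy


def is_named(el, name):
    """True if el is a named element with the given name."""
    if el.tag in ("embed", "object") and not el.exposed:
        return False
    if el.tag in ("applet", "embed", "form", "iframe", "img", "object") \
            and el.name == name:
        return True
    if el.tag in ("applet", "object") and el.id == name:
        return True
    # img elements match by id only when a name attribute is also present.
    return el.tag == "img" and el.id == name and el.name is not None


def named_property(elements, name):
    """Value of document[name]; assumes at least one named element exists."""
    matches = [el for el in elements if is_named(el, name)]
    if len(matches) == 1:
        only = matches[0]
        return only.window if only.tag == "iframe" else only
    return matches  # stands in for the HTMLCollection
```

The return type therefore depends on the number and kind of matches, which is why scripts relying on named properties need to handle a window, a single element, or a collection.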
The dir attribute on the Document interface is defined along with the dir content attribute.
partial interface XMLDocument {
  boolean load(DOMString url);
};
The load(url) method must run the following steps:
Let document be the XMLDocument object on which the method was invoked.
Resolve the method's first argument, relative to the entry script's base URL. If this is not successful, throw a SyntaxError exception and abort these steps. Otherwise, let url be the resulting absolute URL.
If the origin of url is not the same as the origin of document, throw a SecurityError exception and abort these steps.
Remove all child nodes of document, without firing any mutation events.
Set the current document readiness of document to "loading".
Run the remainder of these steps asynchronously, and return true from the method.
Let result be a Document object.
Let success be false.
Fetch url from the origin of document, using the entry script's referrer source, with the synchronous flag set and the force same-origin flag set.
If the fetch attempt was successful, and the resource's Content-Type metadata is an XML MIME type, then run these substeps:
Create a new XML parser associated with the result document.
Pass this parser the fetched document.
If there is an XML well-formedness or XML namespace well-formedness error, then remove all child nodes from result. Otherwise let success be true.
Queue a task to run the following steps.
Set the current document readiness of document to "complete".
Replace all the children of document by the children of result (even if it has no children), firing mutation events as if a DocumentFragment containing the new children had been inserted.
Fire a simple event named load at document.
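The parsing half of the steps above (from "Let result be a Document object" through the well-formedness check) can be modelled as a function that turns a fetch outcome into the new children and a success flag. This is a Python sketch with hypothetical names; the XML MIME type test follows the definition HTML gives elsewhere (text/xml, application/xml, or a subtype ending in "+xml"), and fetching, origin checks, and event dispatch are left out.

```python
import xml.etree.ElementTree as ET


def is_xml_mime_type(content_type):
    """True for text/xml, application/xml, and any type ending in +xml."""
    essence = content_type.split(";", 1)[0].strip().lower()
    return essence in ("text/xml", "application/xml") or essence.endswith("+xml")


def load_result(fetch_ok, content_type, body):
    """Return (children, success) for a fetched resource (hypothetical model)."""
    children, success = [], False
    if fetch_ok and is_xml_mime_type(content_type):
        try:
            children = [ET.fromstring(body)]
            success = True
        except ET.ParseError:
            # Well-formedness error: result ends up with no children.
            children = []
    return children, success
```

Whatever the outcome, the queued task still replaces the document's children with those of result and fires load, so a failed load leaves the document empty.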
Elements, attributes, and attribute values in HTML are defined (by this specification) to have certain meanings (semantics). For example, the ol element represents an ordered list, and the lang attribute represents the language of the content.
These definitions allow HTML processors, such as Web browsers or search engines, to present and use documents and applications in a wide variety of contexts that the author might not have considered.
As a simple example, consider a Web page written by an author who only considered desktop computer Web browsers. Because HTML conveys meaning, rather than presentation, the same page can also be used by a small browser on a mobile phone, without any change to the page. Instead of headings being in large letters as on the desktop, for example, the browser on the mobile phone might use the same size text for the whole page, but with the headings in bold.
But it goes further than just differences in screen size: the same page could equally be used by a blind user using a browser based around speech synthesis, which instead of displaying the page on a screen, reads the page to the user, e.g. using headphones. Instead of large text for the headings, the speech browser might use a different volume or a slower voice.
That's not all, either. Since the browsers know which parts of the page are the headings, they can create a document outline that the user can use to quickly navigate around the document, using keys for "jump to next heading" or "jump to previous heading". Such features are especially common with speech browsers, where users would otherwise find quickly navigating a page quite difficult.
Even beyond browsers, software can make use of this information. Search engines can use the headings to more effectively index a page, or to provide quick links to subsections of the page from their results. Tools can use the headings to create a table of contents (that is in fact how this very specification's table of contents is generated).
This example has focused on headings, but the same principle applies to all of the semantics in HTML.
Authors must not use elements, attributes, or attribute values for purposes other than their appropriate intended semantic purpose, as doing so prevents software from correctly processing the page.
For example, the following document is non-conforming, despite being syntactically correct:
<!DOCTYPE HTML>
<html lang="en-GB">
 <head> <title> Demonstration </title> </head>
 <body>
  <table>
   <tr> <td> My favourite animal is the cat. </td> </tr>
   <tr>
    <td>
     —<a href="http://example.org/~ernest/"><cite>Ernest</cite></a>,
     in an essay from 1992
    </td>
   </tr>
  </table>
 </body>
</html>
...because the data placed in the cells is clearly not tabular data (and the cite element is misused). This would make software that relies on these semantics fail: for example, a speech browser that allowed a blind user to navigate tables in the document would report the quote above as a table, confusing the user; similarly, a tool that extracted titles of works from pages would extract "Ernest" as the title of a work, even though it's actually a person's name, not a title.
A corrected version of this document might be:
<!DOCTYPE HTML>
<html lang="en-GB">
 <head> <title> Demonstration </title> </head>
 <body>
  <blockquote>
   <p> My favourite animal is the cat. </p>
  </blockquote>
  <p>
   —<a href="http://example.org/~ernest/">Ernest</a>,
   in an essay from 1992
  </p>
 </body>
</html>
This next document fragment, intended to represent the heading of a corporate site, is similarly non-conforming because the second line is not intended to be a heading of a subsection, but merely a subheading or subtitle (a subordinate heading for the same section).
<body>
 <h1>ABC Company</h1>
 <h2>Leading the way in widget design since 1432</h2>
 ...
The hgroup element is intended for these kinds of situations:
<body>
 <hgroup>
  <h1>ABC Company</h1>
  <h2>Leading the way in widget design since 1432</h2>
 </hgroup>
 ...
Authors must not use elements, attributes, or attribute values that are not permitted by this specification or other applicable specifications, as doing so makes it significantly harder for the language to be extended in the future.
In the next example, there is a non-conforming attribute value ("carpet") and a non-conforming attribute ("texture"), which is not permitted by this specification:
<label>Carpet: <input type="carpet" name="c" texture="deep pile"></label>
Here would be an alternative and correct way to mark this up:
<label>Carpet: <input type="text" class="carpet" name="c" data-texture="deep pile"></label>
Through scripting and using other mechanisms, the values of attributes, text, and indeed the entire structure of the document may change dynamically while a user agent is processing it. The semantics of a document at an instant in time are those represented by the state of the document at that instant in time, and the semantics of a document can therefore change over time. User agents must update their presentation of the document as this occurs.
HTML has a progress element that describes a progress bar. If its "value" attribute is dynamically updated by a script, the UA would update the rendering to show the progress changing.
The nodes representing HTML elements in the DOM must implement, and expose to scripts, the interfaces listed for them in the relevant sections of this specification. This includes HTML elements in XML documents, even when those documents are in another context (e.g. inside an XSLT transform).
Elements in the DOM represent things; that is, they have intrinsic meaning, also known as semantics.
For example, an ol element represents an ordered list.
The basic interface, from which all the HTML elements' interfaces inherit, and which must be used by elements that have no additional requirements, is the HTMLElement interface.
interface HTMLElement : Element {
  // metadata attributes
  attribute DOMString title;
  attribute DOMString lang;
  attribute boolean translate;
  attribute DOMString dir;
  readonly attribute DOMStringMap dataset;

  // user interaction
  attribute boolean hidden;
  void click();
  attribute long tabIndex;
  void focus();
  void blur();
  attribute DOMString accessKey;
  readonly attribute DOMString accessKeyLabel;
  attribute boolean draggable;
  [PutForwards=value] readonly attribute DOMSettableTokenList dropzone;
  attribute DOMString contentEditable;
  readonly attribute boolean isContentEditable;
  attribute HTMLMenuElement? contextMenu;
  attribute boolean spellcheck;

  // command API
  readonly attribute DOMString? commandType;
  readonly attribute DOMString? commandLabel;
  readonly attribute DOMString? commandIcon;
  readonly attribute boolean? commandHidden;
  readonly attribute boolean? commandDisabled;
  readonly attribute boolean? commandChecked;

  // styling
  readonly attribute CSSStyleDeclaration style;

  // event handler IDL attributes
  attribute EventHandler onabort;
  attribute EventHandler onblur;
  attribute EventHandler oncancel;
  attribute EventHandler oncanplay;
  attribute EventHandler oncanplaythrough;
  attribute EventHandler onchange;
  attribute EventHandler onclick;
  attribute EventHandler onclose;
  attribute EventHandler oncontextmenu;
  attribute EventHandler oncuechange;
  attribute EventHandler ondblclick;
  attribute EventHandler ondrag;
  attribute EventHandler ondragend;
  attribute EventHandler ondragenter;
  attribute EventHandler ondragleave;
  attribute EventHandler ondragover;
  attribute EventHandler ondragstart;
  attribute EventHandler ondrop;
  attribute EventHandler ondurationchange;
  attribute EventHandler onemptied;
  attribute EventHandler onended;
  attribute OnErrorEventHandler onerror;
  attribute EventHandler onfocus;
  attribute EventHandler oninput;
  attribute EventHandler oninvalid;
  attribute EventHandler onkeydown;
  attribute EventHandler onkeypress;
  attribute EventHandler onkeyup;
  attribute EventHandler onload;
  attribute EventHandler onloadeddata;
  attribute EventHandler onloadedmetadata;
  attribute EventHandler onloadstart;
  attribute EventHandler onmousedown;
  attribute EventHandler onmousemove;
  attribute EventHandler onmouseout;
  attribute EventHandler onmouseover;
  attribute EventHandler onmouseup;
  attribute EventHandler onmousewheel;
  attribute EventHandler onpause;
  attribute EventHandler onplay;
  attribute EventHandler onplaying;
  attribute EventHandler onprogress;
  attribute EventHandler onratechange;
  attribute EventHandler onreset;
  attribute EventHandler onscroll;
  attribute EventHandler onseeked;
  attribute EventHandler onseeking;
  attribute EventHandler onselect;
  attribute EventHandler onshow;
  attribute EventHandler onstalled;
  attribute EventHandler onsubmit;
  attribute EventHandler onsuspend;
  attribute EventHandler ontimeupdate;
  attribute EventHandler onvolumechange;
  attribute EventHandler onwaiting;
};

interface HTMLUnknownElement : HTMLElement { };
The HTMLElement interface holds methods and attributes related to a number of disparate features, and the members of this interface are therefore described in various different sections of this specification.
The HTMLUnknownElement interface must be used for HTML elements that are not defined by this specification (or other applicable specifications).
The following attributes are common to and may be specified on all HTML elements (even those not defined in this specification):
accesskey
class
contenteditable
contextmenu
dir
draggable
dropzone
hidden
id
lang
spellcheck
style
tabindex
title
translate
These attributes are only defined by this specification as attributes for HTML elements. When this specification refers to elements having these attributes, elements from namespaces that are not defined as having these attributes must not be considered as being elements with these attributes.
For example, in the following XML fragment, the "bogus" element does not have a dir attribute as defined in this specification, despite having an attribute with the literal name "dir". Thus, the directionality of the inner-most span element is 'rtl', inherited from the div element indirectly through the "bogus" element.
<div xmlns="http://www.w3.org/1999/xhtml" dir="rtl">
 <bogus xmlns="http://example.net/ns" dir="ltr">
  <span xmlns="http://www.w3.org/1999/xhtml">
  </span>
 </bogus>
</div>
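The effect described above, where a dir attribute on a non-HTML element is skipped when directionality is inherited, can be checked with a short Python sketch. It is a deliberately reduced model: the function name is hypothetical, the HTML namespace is taken to be "http://www.w3.org/1999/xhtml", only the "ltr" and "rtl" values are handled, and "ltr" is assumed as the default when no ancestor sets a direction.

```python
import xml.etree.ElementTree as ET

XHTML_PREFIX = "{http://www.w3.org/1999/xhtml}"


def directionality(element, parents):
    """Nearest dir value set on an HTML-namespace element (simplified model)."""
    node = element
    while node is not None:
        # Only elements in the HTML namespace have a dir attribute in the
        # sense used here; same-named attributes elsewhere are ignored.
        if node.tag.startswith(XHTML_PREFIX) and node.get("dir") in ("ltr", "rtl"):
            return node.get("dir")
        node = parents.get(node)
    return "ltr"  # assumed default


fragment = (
    '<div xmlns="http://www.w3.org/1999/xhtml" dir="rtl">'
    '<bogus xmlns="http://example.net/ns" dir="ltr">'
    '<span xmlns="http://www.w3.org/1999/xhtml"/>'
    '</bogus></div>'
)
root = ET.fromstring(fragment)
# ElementTree has no parent pointers, so build a child-to-parent map.
parents = {child: parent for parent in root.iter() for child in parent}
span = root.find(".//" + XHTML_PREFIX + "span")
```

With this setup, `directionality(span, parents)` walks past the "bogus" element and picks up the value from the div.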
The following event handler content attributes may be specified on any HTML element:
onabort
onblur*
oncancel
oncanplay
oncanplaythrough
onchange
onclick
onclose
oncontextmenu
oncuechange
ondblclick
ondrag
ondragend
ondragenter
ondragleave
ondragover
ondragstart
ondrop
ondurationchange
onemptied
onended
onerror*
onfocus*
oninput
oninvalid
onkeydown
onkeypress
onkeyup
onload*
onloadeddata
onloadedmetadata
onloadstart
onmousedown
onmousemove
onmouseout
onmouseover
onmouseup
onmousewheel
onpause
onplay
onplaying
onprogress
onratechange
onreset
onscroll*
onseeked
onseeking
onselect
onshow
onstalled
onsubmit
onsuspend
ontimeupdate
onvolumechange
onwaiting
The attributes marked with an asterisk have a different meaning when specified on body elements as those elements expose event handlers of the Window object with the same names.
While these attributes apply to all elements, they are not useful on all elements. For example, only media elements will ever receive a volumechange event fired by the user agent.
Custom data attributes (e.g. data-foldername or data-msgid) can be specified on any HTML element, to store custom data specific to the page.
In HTML documents, elements in the HTML namespace may have an xmlns attribute specified, if, and only if, it has the exact value "http://www.w3.org/1999/xhtml". This does not apply to XML documents.
In HTML, the xmlns attribute has absolutely no effect. It is basically a talisman. It is allowed merely to make migration to and from XHTML mildly easier. When parsed by an HTML parser, the attribute ends up in no namespace, not the "http://www.w3.org/2000/xmlns/" namespace like namespace declaration attributes in XML do.
In XML, an xmlns attribute is part of the namespace declaration mechanism, and an element cannot actually have an xmlns attribute in no namespace specified.
The XML specification also allows the use of the xml:space attribute in the XML namespace on any element in an XML document. This attribute has no effect on HTML elements, as the default behavior in HTML is to preserve whitespace. [XML]
There is no way to serialize the xml:space attribute on HTML elements in the text/html syntax.
To enable assistive technology products to expose a more fine-grained interface than is otherwise possible with HTML elements and attributes, a set of annotations for assistive technology products can be specified (the ARIA role and aria-* attributes).
The id attribute
The id attribute specifies its element's unique identifier (ID). [DOMCORE]
The value must be unique amongst all the IDs in the element's home subtree and must contain at least one character. The value must not contain any space characters.
An element's unique identifier can be used for a variety of purposes, most notably as a way to link to specific parts of a document using fragment identifiers, as a way to target an element when scripting, and as a way to style a specific element from CSS.
Identifiers are opaque strings. Particular meanings should not be
derived from the value of the id
attribute.
The title attribute
The title attribute
represents advisory information for the element, such
as would be appropriate for a tooltip. On a link, this could be the
title or a description of the target resource; on an image, it could
be the image credit or a description of the image; on a paragraph,
it could be a footnote or commentary on the text; on a citation, it
could be further information about the source; on interactive
content, it could be a label for, or instructions for, use of
the element; and so forth. The value is text.
Relying on the title
attribute is currently discouraged as many user agents do not expose
the attribute in an accessible manner as required by this
specification (e.g. requiring a pointing device such as a mouse to
cause a tooltip to appear, which excludes keyboard-only users and
touch-only users, such as anyone with a modern phone or tablet).
If this attribute is omitted from an element, then it implies
that the title
attribute of the
nearest ancestor HTML element
with a title
attribute set is also
relevant to this element. Setting the attribute overrides this,
explicitly stating that the advisory information of any ancestors is
not relevant to this element. Setting the attribute to the empty
string indicates that the element has no advisory information.
If the title
attribute's value
contains "LF" (U+000A) characters, the content is split into
multiple lines. Each "LF" (U+000A) character represents a
line break.
Caution is advised with respect to the use of newlines in title
attributes.
For instance, the following snippet actually defines an abbreviation's expansion with a line break in it:
<p>My logs show that there was some interest in <abbr title="Hypertext
Transport Protocol">HTTP</abbr> today.</p>
Some elements, such as link
, abbr
, and
input
, define additional semantics for the title
attribute beyond the semantics
described above.
The advisory information of an element is the value that the following algorithm returns, with the algorithm being aborted once a value is returned. When the algorithm returns the empty string, then there is no advisory information.
If the element is a link
, style
,
dfn
, abbr
, or title
element,
then: if the element has a title
attribute,
return the value of that attribute, otherwise, return the empty
string.
Otherwise, if the element has a title
attribute, then return its
value.
Otherwise, if the element has a parent element, then return the parent element's advisory information.
Otherwise, return the empty string.
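The steps above can be sketched in script as follows. This is only an illustration under stated assumptions: elements are modelled as hypothetical plain objects of the form { name, attrs, parent } rather than real DOM nodes, and the function name advisoryInformation is invented for this example.

```javascript
// Elements for which a missing title attribute means "no advisory
// information" instead of falling back to an ancestor.
const NO_INHERIT = new Set(["link", "style", "dfn", "abbr", "title"]);

// `el` is a hypothetical plain object: { name, attrs, parent }.
function advisoryInformation(el) {
  const hasTitle = Object.prototype.hasOwnProperty.call(el.attrs, "title");
  if (NO_INHERIT.has(el.name)) {
    return hasTitle ? el.attrs.title : "";
  }
  if (hasTitle) {
    return el.attrs.title; // may be "", which overrides any ancestor
  }
  if (el.parent) {
    return advisoryInformation(el.parent);
  }
  return "";
}

const section = { name: "section", attrs: { title: "Release notes" }, parent: null };
const para = { name: "p", attrs: {}, parent: section };
const abbr = { name: "abbr", attrs: {}, parent: para };
const quiet = { name: "span", attrs: { title: "" }, parent: para };

console.log(advisoryInformation(para));  // "Release notes" (inherited)
console.log(advisoryInformation(abbr));  // "" (abbr never inherits)
console.log(advisoryInformation(quiet)); // "" (explicitly no information)
```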
User agents should inform the user when elements have advisory information, otherwise the information would not be discoverable.
The title
IDL attribute
must reflect the title
content attribute.
The lang and xml:lang attributes
The lang
attribute (in
no namespace) specifies the primary language for the element's
contents and for any of the element's attributes that contain
text. Its value must be a valid BCP 47 language tag, or the empty
string. Setting the attribute to the empty string indicates that the
primary language is unknown. [BCP47]
The lang
attribute in the XML namespace is defined in XML. [XML]
If these attributes are omitted from an element, then the language of this element is the same as the language of its parent element, if any.
The lang
attribute in no namespace
may be used on any HTML
element.
The lang
attribute in the XML namespace may be used on
HTML elements in XML documents, as well as
elements in other namespaces if the relevant specifications allow it
(in particular, MathML and SVG allow lang
attributes in the
XML namespace to be specified on their
elements). If both the lang
attribute
in no namespace and the lang
attribute in the XML
namespace are specified on the same element, they must
have exactly the same value when compared in an ASCII
case-insensitive manner.
Authors must not use the lang
attribute in the XML
namespace on HTML elements in HTML
documents. To ease migration to and from XHTML, authors may
specify an attribute in no namespace with no prefix and with the
literal localname "xml:lang
" on HTML
elements in HTML documents, but such attributes
must only be specified if a lang
attribute in no namespace is also specified, and both attributes
must have the same value when compared in an ASCII
case-insensitive manner.
The attribute in no namespace with no prefix and
with the literal localname "xml:lang
" has no
effect on language processing.
To determine the language of a node, user agents must
look at the nearest ancestor element (including the element itself
if the node is an element) that has a lang
attribute in the
XML namespace set or is an HTML element and has a lang
attribute in no namespace set. That
attribute specifies the language of the node (regardless of its
value).
If both the lang
attribute in no
namespace and the lang
attribute in the XML
namespace are set on an element, user agents must use
the lang
attribute
in the XML namespace, and the lang
attribute in no namespace must be
ignored for the purposes of determining
the element's language.
If neither the node nor any of the node's ancestors, including the root element, have either attribute set, but there is a pragma-set default language set, then that is the language of the node. If there is no pragma-set default language set, then language information from a higher-level protocol (such as HTTP), if any, must be used as the final fallback language instead. In the absence of any such language information, and in cases where the higher-level protocol reports multiple languages, the language of the node is unknown, and the corresponding language tag is the empty string.
If the resulting value is not a recognized language tag, then it must be treated as an unknown language having the given language tag, distinct from all other languages. For the purposes of round-tripping or communicating with other services that expect language tags, user agents should pass unknown language tags through unmodified.
Thus, for instance, an element with lang="xyzzy"
would be matched by the selector :lang(xyzzy)
(e.g. in CSS), but it would not be
matched by :lang(abcde)
, even though both are
equally invalid. Similarly, if a Web browser and screen reader
working in unison communicated about the language of the element,
the browser would tell the screen reader that the language was
"xyzzy", even if it knew it was invalid, just in case the screen
reader actually supported a language with that tag after all.
If the resulting value is the empty string, then it must be interpreted as meaning that the language of the node is explicitly unknown.
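The lookup described in the preceding paragraphs can be sketched as follows. The node shape used here ({ isElement, isHTML, lang, xmlLang, parent }) and the fallbacks parameter are assumptions made for this illustration: xmlLang stands for the lang attribute in the XML namespace, lang for the attribute in no namespace, and an undefined property means the attribute is absent.

```javascript
// Returns the language tag of a node, or "" when the language is unknown.
// `fallbacks` is hypothetical: { pragmaDefault, protocolLanguages }.
function languageOf(node, fallbacks = {}) {
  // Start at the node itself if it is an element, otherwise at its parent.
  for (let el = node.isElement ? node : node.parent; el; el = el.parent) {
    if (el.xmlLang !== undefined) return el.xmlLang; // XML-namespace lang wins
    if (el.isHTML && el.lang !== undefined) return el.lang;
  }
  if (fallbacks.pragmaDefault !== undefined) return fallbacks.pragmaDefault;
  // Higher-level protocol information is the final fallback; if it reports
  // several languages (or none), the language is unknown.
  const fromProtocol = fallbacks.protocolLanguages || [];
  return fromProtocol.length === 1 ? fromProtocol[0] : "";
}

const html = { isElement: true, isHTML: true, lang: "en", parent: null };
const p = { isElement: true, isHTML: true, lang: "", parent: html };
const text = { isElement: false, parent: p };
const span = { isElement: true, isHTML: true, lang: "de", xmlLang: "fr", parent: html };

console.log(languageOf(html)); // "en"
console.log(languageOf(text)); // "" (explicitly unknown, set on the p element)
console.log(languageOf(span)); // "fr"
```

Note that the attribute found first decides the result regardless of its value, so lang="" stops the search and yields an explicitly unknown language.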
User agents may use the element's language to determine proper processing or rendering (e.g. in the selection of appropriate fonts or pronunciations, for dictionary selection, or for the user interfaces of form controls such as date pickers).
The lang
IDL attribute
must reflect the lang
content attribute in no namespace.
The translate attribute
The translate
attribute is an enumerated attribute that is used to
specify whether an element's attribute values and the values of its
Text
node children are to be translated when the page
is localized, or whether to leave them unchanged.
The attribute's keywords are the empty string, yes
, and no
. The empty string
and the yes
keyword map to the yes
state. The no
keyword maps to the no
state. In addition, there is a third state, the inherit
state, which is the missing value default (and the invalid
value default).
Each element has a translation mode, which is in
either the translate-enabled state or the
no-translate state. If the element's translate
attribute is in the
yes state, then the element's translation mode
is in the translate-enabled state. Otherwise, if the
element's translate
attribute is
in the no state, then the element's translation
mode is in the no-translate state. Otherwise,
the element's translate
attribute is in the inherit state; in that case, the
element's translation mode is in the same state as its
parent element, if any, or in the translate-enabled
state, if the element is a root element.
When an element is in the translate-enabled state, the
element's attribute values and the values of its Text
node children are to be translated when the page is localized.
When an element is in the no-translate state, the
element's attribute values and the values of its Text
node children are to be left as-is when the page is localized, e.g.
because the element contains a person's name or the name of a
computer program.
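As a rough illustration of how the translation mode is derived, the following sketch walks up a tree of hypothetical plain objects of the form { attrs, parent }. It assumes that keywords are matched ASCII case-insensitively, as is usual for enumerated attributes; that detail is not stated in this section.

```javascript
// Returns "translate-enabled" or "no-translate" for a hypothetical
// element object { attrs, parent }.
function translationMode(el) {
  const raw = el.attrs.translate;
  const keyword = raw === undefined ? undefined : raw.toLowerCase();
  if (keyword === "" || keyword === "yes") return "translate-enabled";
  if (keyword === "no") return "no-translate";
  // Missing or invalid value: the inherit state.
  return el.parent ? translationMode(el.parent) : "translate-enabled";
}

const root = { attrs: {}, parent: null };
const body = { attrs: {}, parent: root };
const kbd = { attrs: { translate: "no" }, parent: body };
const inner = { attrs: { translate: "maybe" }, parent: kbd }; // invalid keyword
const reset = { attrs: { translate: "" }, parent: kbd };

console.log(translationMode(body));  // "translate-enabled"
console.log(translationMode(kbd));   // "no-translate"
console.log(translationMode(inner)); // "no-translate" (inherited)
console.log(translationMode(reset)); // "translate-enabled"
```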
The translate
IDL
attribute must, on getting, return true if the element's
translation mode is translate-enabled, and
false otherwise. On setting, it must set the content attribute's
value to "yes
" if the new value is true, and
set the content attribute's value to "no
"
otherwise.
In this example, everything in the document is to be translated when the page is localized, except the sample keyboard input and sample program output:
<!DOCTYPE HTML>
<html> <!-- default on the root element is translate=yes -->
 <head>
  <title>The Bee Game</title> <!-- implied translate=yes inherited from ancestors -->
 </head>
 <body>
  <p>The Bee Game is a text adventure game in English.</p>
  <p>When the game launches, the first thing you should do is type
  <kbd translate=no>eat honey</kbd>. The game will respond with:</p>
  <pre><samp translate=no>Yum yum! That was some good honey!</samp></pre>
 </body>
</html>
The xml:base attribute (XML only)
The xml:base
attribute is
defined in XML Base. [XMLBASE]
The xml:base
attribute may be
used on HTML elements of XML documents.
Authors must not use the xml:base
attribute on HTML elements in HTML
documents.
The dir attribute
The dir
attribute specifies the
element's text directionality. The attribute is an enumerated
attribute with the following keywords and states:
The ltr keyword, which maps to the ltr state
Indicates that the contents of the element are explicitly directionally embedded left-to-right text.
The rtl keyword, which maps to the rtl state
Indicates that the contents of the element are explicitly directionally embedded right-to-left text.
The auto keyword, which maps to the auto state
Indicates that the contents of the element are explicitly embedded text, but that the direction is to be determined programmatically using the contents of the element (as described below).
The heuristic used by this state is very crude (it just looks at the first character with a strong directionality, in a manner analogous to the Paragraph Level determination in the bidirectional algorithm). Authors are urged to only use this value as a last resort when the direction of the text is truly unknown and no better server-side heuristic can be applied. [BIDI]
For textarea
and pre
elements, the heuristic is applied on a per-paragraph level.
The attribute has no invalid value default and no missing value default.
The directionality of an element is either 'ltr' or 'rtl', and is determined as per the first appropriate set of steps from the following list:
If the element's dir attribute is in the ltr state
The directionality of the element is 'ltr'.
If the element's dir attribute is in the rtl state
The directionality of the element is 'rtl'.
If the element is an input element whose type attribute is in the Text, Search, Telephone, URL, or E-mail state, and the dir attribute is in the auto state
If the element is a textarea element and the dir attribute is in the auto state
If the element's value contains a character of bidirectional character type AL or R, and there is no character of bidirectional character type L anywhere before it in the element's value, then the directionality of the element is 'rtl'. Otherwise, the directionality of the element is 'ltr'. [BIDI]
If the element's dir attribute is in the auto state
If the element is a bdi element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)
Find the first character in tree order that matches the following criteria:
The character is from a Text
node that is a
descendant of the element whose directionality is being
determined.
The character is of bidirectional character type L, AL, or R. [BIDI]
The character is not in a Text
node that has an
ancestor element that is a descendant of the element whose directionality is being
determined and that is either:
If such a character is found and it is of bidirectional character type AL or R, the directionality of the element is 'rtl'.
Otherwise, the directionality of the element is 'ltr'.
If the element is a root element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)
The directionality of the element is 'ltr'.
If the element has a parent element and the dir attribute is not in a defined state (i.e. it is not present or has an invalid value)
The directionality of the element is the same as the element's parent element's directionality.
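The "first strong character" heuristic used by the auto state can be approximated in a few lines of script. The sketch below is not a full implementation of the bidirectional algorithm's character types: as a loudly stated assumption, it treats letters in the Hebrew, Arabic, Syriac, and Thaana scripts as types R or AL, treats every other letter as type L, and skips everything else. The function name firstStrongDirection is invented for this example.

```javascript
// Approximation only: real bidi character types come from the Unicode
// Character Database, not from script and letter properties.
const LETTER = /\p{L}/u;
const RTL_SCRIPT = /[\p{Script=Hebrew}\p{Script=Arabic}\p{Script=Syriac}\p{Script=Thaana}]/u;

// Returns "rtl" if the first strong character found is (approximately)
// of type AL or R, and "ltr" otherwise, including when none is found.
function firstStrongDirection(text) {
  for (const ch of text) { // iterates by code point
    if (!LETTER.test(ch)) continue; // digits, punctuation, spaces: not strong here
    return RTL_SCRIPT.test(ch) ? "rtl" : "ltr";
  }
  return "ltr";
}

console.log(firstStrongDirection("How do you write it in Arabic?")); // "ltr"
console.log(firstStrongDirection("ما اسمك؟"));                        // "rtl"
console.log(firstStrongDirection('"من فضلك", right?'));               // "rtl"
```

The last call shows the crudeness of the heuristic: a mostly English sentence is classified as right-to-left because its first letter is Arabic.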
The effect of this attribute is primarily on the presentation layer. For example, the rendering section in this specification defines a mapping from this attribute to the CSS 'direction' and 'unicode-bidi' properties, and CSS defines rendering in terms of those properties.
document.dir [ = value ]
Returns the html element's dir attribute's value, if any.
Can be set, to either "ltr", "rtl", or "auto" to replace the html element's dir attribute's value.
If there is no html element, returns the empty string and ignores new values.
The dir
IDL attribute on
an element must reflect the dir
content attribute of that element,
limited to only known values.
The dir
IDL
attribute on Document
objects must
reflect the dir
content
attribute of the html
element, if any,
limited to only known values. If there is no such
element, then the attribute must return the empty string and do
nothing on setting.
Authors are strongly encouraged to use the dir
attribute to indicate text direction
rather than using CSS, since that way their documents will continue
to render correctly even in the absence of CSS (e.g. as interpreted
by search engines).
This markup fragment is of an IM conversation.
<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> How do you write "What's your name?" in Arabic?</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> ما اسمك؟</p>
<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> Thanks.</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> That's written "شكرًا".</p>
<p dir=auto class="u2"><b><bdi>Teacher</bdi>:</b> Do you know how to write "Please"?</p>
<p dir=auto class="u1"><b><bdi>Student</bdi>:</b> "من فضلك", right?</p>
Given a suitable style sheet and the default alignment styles
for the p
element, namely to align the text to the
start edge of the paragraph, the resulting rendering could
be as follows:
As noted earlier, the auto
value is not a panacea. The final paragraph in this example is
misinterpreted as being right-to-left text, since it begins with an
Arabic character, which causes the "right?" to be to the left of
the Arabic text.
The class attribute
Every HTML element may have a
class
attribute specified.
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various classes that the element belongs to.
The classes that an HTML element has assigned to it consist of all the classes
returned when the value of the class attribute is split on
spaces. (Duplicates are ignored.)
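A minimal sketch of this splitting step is shown below. It assumes the usual set of five space characters (space, tab, LF, FF, and CR), which is defined elsewhere in the specification, and the function name classesOf is invented for this example.

```javascript
// Splits a class attribute value on space characters and drops
// duplicates, keeping the first occurrence of each token.
function classesOf(classAttribute) {
  const tokens = classAttribute
    .split(/[ \t\n\f\r]+/)
    .filter((token) => token !== "");
  return [...new Set(tokens)];
}

console.log(classesOf("  fire  spaceship\tfire ")); // [ 'fire', 'spaceship' ]
console.log(classesOf(""));                          // []
```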
Assigning classes to an element affects class
matching in selectors in CSS, the getElementsByClassName()
method in the DOM, and other such features.
There are no additional restrictions on the tokens authors can
use in the class
attribute, but
authors are encouraged to use values that describe the nature of the
content, rather than values that describe the desired presentation
of the content.
The className
and classList
IDL attributes,
defined in the DOM Core specification, reflect the
class
content attribute. [DOMCORE]
The style attribute
All HTML elements may have the style
content attribute set. This is a
CSS styling attribute as defined by the CSS Styling
Attribute Syntax specification. [CSSATTR]
In user agents that support CSS, the attribute's value must be parsed when the attribute is added or has its value changed, according to the rules given for CSS styling attributes. [CSSATTR]
Documents that use style
attributes on any of their elements must still be comprehensible and
usable if those attributes were removed.
In particular, using the style
attribute to hide and show content,
or to convey meaning that is otherwise not included in the document,
is non-conforming. (To hide and show content, use the hidden
attribute.)
element.style
Returns a CSSStyleDeclaration object for the element's style attribute.
The style
IDL attribute
must return a CSSStyleDeclaration
whose value
represents the declarations specified in the attribute. (If the
attribute is absent, the object represents an empty declaration.)
Mutating the CSSStyleDeclaration
object must create a
style
attribute on the element (if
there isn't one already) and then change its value to be a value
representing the serialized form of the
CSSStyleDeclaration
object. The same object must be
returned each time. [CSSOM]
In the following example, the words that refer to colors are
marked up using the span
element and the style
attribute to make those words show
up in the relevant colors in visual media.
<p>My sweat suit is <span style="color: green; background: transparent">green</span> and my eyes are <span style="color: blue; background: transparent">blue</span>.</p>
The data-* attributes
A custom data attribute is an attribute in no
namespace whose name starts with the string "data-
", has at least one
character after the hyphen, is XML-compatible, and
contains no characters in the range U+0041 to U+005A (LATIN CAPITAL
LETTER A to LATIN CAPITAL LETTER Z).
All attribute names on HTML elements in HTML documents get ASCII-lowercased automatically, so the restriction on ASCII uppercase letters doesn't affect such documents.
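The name constraints can be checked with a small predicate such as the one below. It is a partial sketch: it covers the "data-" prefix, the requirement for at least one character after the hyphen, and the ban on ASCII uppercase letters, but it deliberately does not test whether the name is XML-compatible. The function name is invented for this example.

```javascript
// Partial check only: XML-compatibility of the name is NOT verified here.
function looksLikeCustomDataAttribute(name) {
  return name.startsWith("data-") && name.length > 5 && !/[A-Z]/.test(name);
}

console.log(looksLikeCustomDataAttribute("data-length"));  // true
console.log(looksLikeCustomDataAttribute("data-"));        // false
console.log(looksLikeCustomDataAttribute("data-shipId"));  // false
```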
Custom data attributes are intended to store custom data private to the page or application, for which there are no more appropriate attributes or elements.
These attributes are not intended for use by software that is independent of the site that uses the attributes.
For instance, a site about music could annotate list items representing tracks in an album with custom data attributes containing the length of each track. This information could then be used by the site itself to allow the user to sort the list by track length, or to filter the list for tracks of certain lengths.
<ol>
 <li data-length="2m11s">Beyond The Sea</li>
 ...
</ol>
It would be inappropriate, however, for the user to use generic software not associated with that music site to search for tracks of a certain length by looking at this data.
This is because these attributes are intended for use by the site's own scripts, and are not a generic extension mechanism for publicly-usable metadata.
Every HTML element may have any number of custom data attributes specified, with any value.
element.dataset
Returns a DOMStringMap object for the element's data-* attributes.
Hyphenated names become camel-cased. For example, data-foo-bar="" becomes element.dataset.fooBar.
The dataset
IDL
attribute provides convenient accessors for all the data-*
attributes on an element. On
getting, the dataset
IDL attribute
must return a DOMStringMap
object, associated with the
following algorithms, which expose these attributes on their
element:
data-" and whose remaining characters (if any) do not include any characters in the range U+0041 to U+005A (LATIN CAPITAL LETTER A to LATIN CAPITAL LETTER Z), add a name-value pair to list whose name is the attribute's name with the first five characters removed and whose value is the attribute's value.
SyntaxError exception and abort these steps.
data- at the front of name.
setAttribute() would have thrown an exception when setting an attribute with the name name, then this must throw the same exception.
data- at the front of name.
The same object must be returned each time.
If a Web page wanted an element to represent a space ship,
e.g. as part of a game, it would have to use the class
attribute along with data-*
attributes:
<div class="spaceship" data-ship-id="92432"
     data-weapons="laser 2" data-shields="50%"
     data-x="30" data-y="10" data-z="90">
 <button class="fire"
         onclick="spaceships[this.parentNode.dataset.shipId].fire()">
  Fire
 </button>
</div>
Notice how the hyphenated attribute name becomes camel-cased in the API.
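The name mapping between data-* attributes and dataset properties can be sketched as a pair of string conversions. The function names below are invented for this example, the code operates on names only rather than on a real DOMStringMap, and the built-in SyntaxError is used as a stand-in for the exception a user agent would throw.

```javascript
// "data-ship-id" -> "shipId": drop the prefix, then turn each hyphen
// followed by a lowercase ASCII letter into that letter in uppercase.
function dataAttributeToProperty(attributeName) {
  return attributeName
    .slice("data-".length)
    .replace(/-([a-z])/g, (match, letter) => letter.toUpperCase());
}

// "shipId" -> "data-ship-id": the reverse mapping. A hyphen followed by
// a lowercase ASCII letter could not round-trip, so it is rejected.
function propertyToDataAttribute(propertyName) {
  if (/-[a-z]/.test(propertyName)) {
    throw new SyntaxError("invalid dataset property name: " + propertyName);
  }
  return "data-" + propertyName.replace(/[A-Z]/g, (letter) => "-" + letter.toLowerCase());
}

console.log(dataAttributeToProperty("data-foo-bar")); // "fooBar"
console.log(dataAttributeToProperty("data-ship-id")); // "shipId"
console.log(propertyToDataAttribute("shipId"));       // "data-ship-id"
```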
Authors should carefully design such extensions so that when the attributes are ignored and any associated CSS dropped, the page is still usable.
User agents must not derive any implementation behavior from these attributes or values. Specifications intended for user agents must not define these attributes to have any meaningful values.
JavaScript libraries may use the custom data attributes, as they are considered to be part of the page on which they are used. Authors of libraries that are reused by many authors are encouraged to include their name in the attribute names, to reduce the risk of clashes. Where it makes sense, library authors are also encouraged to make the exact name used in the attribute names customizable, so that libraries whose authors unknowingly picked the same name can be used on the same page, and so that multiple versions of a particular library can be used on the same page even when those versions are not mutually compatible.
For example, a library called "DoQuery" could use attribute
names like data-doquery-range, and a library
called "jJo" could use attribute names like data-jjo-range. The jJo library could also provide
an API to set which prefix to use (e.g. J.setDataPrefix('j2'), making the attributes have
names like data-j2-range).
Each element in this specification has a definition that includes the following information:
A list of categories to which the element belongs. These are used when defining the content models for each element.
A non-normative description of where the element can be used. This information is redundant with the content models of elements that allow this one as a child, and is provided only as a convenience.
For simplicity, only the most specific expectations are listed. For example, an element that is both flow content and phrasing content can be used anywhere that either flow content or phrasing content is expected, but since anywhere that flow content is expected, phrasing content is also expected (since all phrasing content is flow content), only "where phrasing content is expected" will be listed.
A normative description of what content must be included as children and descendants of the element.
A normative list of attributes that may be specified on the element (except where otherwise disallowed).
A normative definition of a DOM interface that such elements must implement.
This is then followed by a description of what the element represents, along with any additional normative conformance criteria that may apply to authors and implementations. Examples are sometimes also included.
Except where otherwise specified, attributes on HTML elements may have any string value, including the empty string. Except where explicitly stated, there is no restriction on what text can be specified in such attributes.
Each element defined in this specification has a content model: a description of the element's expected contents. An HTML element must have contents that match the requirements described in the element's content model.
The space characters are
always allowed between elements. User agents represent these
characters between elements in the source markup as
Text
nodes in the DOM. Empty Text
nodes and
Text
nodes consisting of just sequences of those
characters are considered inter-element whitespace.
Inter-element whitespace, comment nodes, and processing instruction nodes must be ignored when establishing whether an element's contents match the element's content model or not, and must be ignored when following algorithms that define document and element semantics.
Thus, an element A is said to be
preceded or followed by a second element B if A and B
have the same parent node and there are no other element nodes or
Text
nodes (other than inter-element
whitespace) between them. Similarly, a node is the only
child of an element if that element contains no other nodes
other than inter-element whitespace, comment nodes, and
processing instruction nodes.
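These definitions translate fairly directly into predicates over a node tree. The sketch below uses hypothetical plain objects of the form { type, data, parent, children } instead of DOM nodes, and again assumes the five space characters (space, tab, LF, FF, CR) defined elsewhere in the specification.

```javascript
function isInterElementWhitespace(node) {
  return node.type === "text" && /^[ \t\n\f\r]*$/.test(node.data);
}

// Nodes that are ignored when matching content models.
function isIgnorable(node) {
  return isInterElementWhitespace(node) || node.type === "comment" || node.type === "pi";
}

// True if a and b share a parent and nothing but ignorable nodes
// (no elements, no non-whitespace text) lies between them.
function areAdjacent(a, b) {
  if (a === b || !a.parent || a.parent !== b.parent) return false;
  const kids = a.parent.children;
  const [low, high] = [kids.indexOf(a), kids.indexOf(b)].sort((x, y) => x - y);
  return kids.slice(low + 1, high).every(
    (n) => n.type !== "element" && (n.type !== "text" || isInterElementWhitespace(n))
  );
}

function isOnlyChild(node) {
  return node.parent.children.every((c) => c === node || isIgnorable(c));
}

// Helper for building the hypothetical tree used in this example.
function withChildren(children) {
  const parent = { type: "element", children };
  for (const child of children) child.parent = parent;
  return parent;
}

const dt = { type: "element" };
const dd = { type: "element" };
withChildren([dt, { type: "text", data: "\n  " }, { type: "comment" }, dd]);
console.log(areAdjacent(dt, dd)); // true

const em = { type: "element" };
withChildren([{ type: "text", data: " " }, em, { type: "comment" }]);
console.log(isOnlyChild(em)); // true
```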
Authors must not use HTML elements anywhere except where they are explicitly allowed, as defined for each element, or as explicitly required by other specifications. For XML compound documents, these contexts could be inside elements from other namespaces, if those elements are defined as providing the relevant contexts.
For example, the Atom specification defines a content
element. When its type
attribute has the value xhtml
, the Atom specification requires that it
contain a single HTML div
element. Thus, a
div
element is allowed in that context, even though
this is not explicitly normatively stated by this specification. [ATOM]
In addition, HTML elements may be orphan nodes (i.e. without a parent node).
For example, creating a td
element and storing it
in a global variable in a script is conforming, even though
td
elements are otherwise only supposed to be used
inside tr
elements.
var data = {
  name: "Banana",
  cell: document.createElement('td'),
};
Each element in HTML falls into zero or more categories that group elements with similar characteristics together. The following broad categories are used in this specification:
Some elements also fall into other categories, which are defined in other parts of this specification.
These categories are related as follows:
Sectioning content, heading content, phrasing content, embedded content, and interactive content are all types of flow content. Metadata is sometimes flow content. Metadata and interactive content are sometimes phrasing content. Embedded content is also a type of phrasing content, and sometimes is interactive content.
Other categories are also used for specific purposes, e.g. form controls are specified using a number of categories to define common requirements. Some elements have unique requirements and do not fit into any particular category.
Metadata content is content that sets up the presentation or behavior of the rest of the content, or that sets up the relationship of the document with other documents, or that conveys other "out of band" information.
Elements from other namespaces whose semantics are primarily metadata-related (e.g. RDF) are also metadata content.
Thus, in the XML serialization, one can use RDF, like this:
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
 <head>
  <title>Hedral's Home Page</title>
  <r:RDF>
   <Person xmlns="http://www.w3.org/2000/10/swap/pim/contact#"
           r:about="http://hedral.example.com/#">
    <fullName>Cat Hedral</fullName>
    <mailbox r:resource="mailto:hedral@damowmow.com"/>
    <personalTitle>Sir</personalTitle>
   </Person>
  </r:RDF>
 </head>
 <body>
  <h1>My home page</h1>
  <p>I like playing with string, I guess. Sister says squirrels are fun
  too so sometimes I follow her to play with them.</p>
 </body>
</html>
This isn't possible in the HTML serialization, however.
Most elements that are used in the body of documents and applications are categorized as flow content.
a
abbr
address
area (if it is a descendant of a map element)
article
aside
audio
b
bdi
bdo
blockquote
br
button
canvas
cite
code
command
datalist
del
details
dfn
dialog
div
dl
em
embed
fieldset
figure
footer
form
h1
h2
h3
h4
h5
h6
header
hgroup
hr
i
iframe
img
input
ins
kbd
keygen
label
map
mark
math
menu
meter
nav
noscript
object
ol
output
p
pre
progress
q
ruby
s
samp
script
section
select
small
span
strong
style (if the scoped attribute is present)
sub
sup
svg
table
textarea
time
u
ul
var
video
wbr
Sectioning content is content that defines the scope of headings and footers.
Each sectioning content element potentially has a heading and an outline. See the section on headings and sections for further details.
There are also certain elements that are sectioning roots. These are distinct from sectioning content, but they can also have an outline.
Heading content defines the header of a section (whether explicitly marked up using sectioning content elements, or implied by the heading content itself).
Phrasing content is the text of the document, as well as elements that mark up that text at the intra-paragraph level. Runs of phrasing content form paragraphs.
a
abbr
area (if it is a descendant of a map element)
audio
b
bdi
bdo
br
button
canvas
cite
code
command
datalist
del
dfn
em
embed
i
iframe
img
input
ins
kbd
keygen
label
map
mark
math
meter
noscript
object
output
progress
q
ruby
s
samp
script
select
small
span
strong
sub
sup
svg
textarea
time
u
var
video
wbr
As a general rule, elements whose content model allows any
phrasing content should have either at least one
descendant Text
node that is not inter-element
whitespace, or at least one descendant element node that is
embedded content. For the purposes of this requirement,
nodes that are descendants of del
elements must not be
counted as contributing to the ancestors of the del
element.
Most elements that are categorized as phrasing content can only contain elements that are themselves categorized as phrasing content, not any flow content.
Text, in the context of content
models, means Text
nodes. Text is sometimes used as a content model on its
own, but is also phrasing content, and can be
inter-element whitespace (if the Text
nodes are empty or contain just space
characters).
Text
nodes and attribute values must consist of
Unicode characters, must not
contain U+0000 characters, must not contain permanently undefined
Unicode characters (noncharacters), and must not contain control
characters other than space
characters.
This specification includes extra constraints on the exact value of
Text
nodes and attribute values depending on their
precise context.
Embedded content is content that imports another resource into the document, or content from another vocabulary that is inserted into the document.
Elements that are from namespaces other than the HTML namespace and that convey content but not metadata, are embedded content for the purposes of the content models defined in this specification. (For example, MathML, or SVG.)
Some embedded content elements can have fallback content: content that is to be used when the external resource cannot be used (e.g. because it is of an unsupported format). The element definitions state what the fallback is, if any.
Interactive content is content that is specifically intended for user interaction.
a
audio (if the controls attribute is present)
button
details
embed
iframe
img (if the usemap attribute is present)
input (if the type attribute is not in the Hidden state)
keygen
label
menu (if the type attribute is in the toolbar state)
object (if the usemap attribute is present)
select
textarea
video (if the controls attribute is present)
Certain elements in HTML have an activation
behavior, which means that the user can activate them. This
triggers a sequence of events dependent on the activation mechanism,
and normally culminating in a click
event, as described below.
The user agent should allow the user to manually trigger elements that have an activation behavior, for instance using keyboard or voice input, or through mouse clicks. When the user triggers an element with a defined activation behavior in a manner other than clicking it, the default action of the interaction event must be to run synthetic click activation steps on the element.
Each element has a click in progress flag, initially set to false.
When a user agent is to run synthetic click activation steps on an element, the user agent must run the following steps:
If the element's click in progress flag is set to true, then abort these steps.
Set the click in progress flag on the element to true.
Run pre-click activation steps on the element.
Fire a click
event at the element. If the
run synthetic click activation steps algorithm was invoked because the click()
method was invoked, then the isTrusted
attribute must be initialized to false.
If this click
event is not
canceled, run post-click activation steps on the
element.
If the event is canceled, the user agent must run canceled activation steps on the element instead.
Set the click in progress flag on the element to false.
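The following non-normative sketch walks through the synthetic click activation steps above. The Element class, its log list, and the on_click hook are hypothetical stand-ins for a real DOM and event dispatch; they are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Element:
    """Hypothetical stand-in for a DOM element; not a real DOM API."""
    name: str
    click_in_progress: bool = False
    log: List[str] = field(default_factory=list)
    # Hypothetical hook: returns True when a listener cancels the click event.
    on_click: Optional[Callable[["Element"], bool]] = None


def run_synthetic_click_activation_steps(el: Element) -> None:
    # Abort if a click is already being processed on this element.
    if el.click_in_progress:
        return
    el.click_in_progress = True
    el.log.append("pre-click activation steps")
    # Fire the click event and record whether it was canceled.
    el.log.append("click event fired")
    canceled = el.on_click(el) if el.on_click else False
    if canceled:
        el.log.append("canceled activation steps")
    else:
        el.log.append("post-click activation steps")
    el.click_in_progress = False
```

Because the flag is set before the event is fired, a listener that re-enters the algorithm on the same element is ignored rather than starting a second activation.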
When a pointing device is clicked, the user agent must run these steps:
If the element's click in progress flag is set to true, then abort these steps.
Set the click in progress flag on the element to true.
Let e be the nearest activatable element of the element designated by the user (defined below), if any.
If there is an element e, run pre-click activation steps on it.
Dispatch the required click
event.
If there is an element e and the click
event is not canceled, run
post-click activation steps on element e.
If there is an element e and the event is canceled, run canceled activation steps on element e.
Set the click in progress flag on the element to false.
The above doesn't happen for arbitrary synthetic
events dispatched by author script. However, the click()
method can be used to make it
happen programmatically.
Click-focusing behavior (e.g. the focusing of a text field when the user clicks in one) typically happens before the click, when the mouse button is first depressed, and is therefore not discussed here.
Given an element target, the nearest activatable element is the element returned by the following algorithm:
If target has a defined activation behavior, then return target and abort these steps.
If target has a parent element, then set target to that parent element and return to the first step.
Otherwise, there is no nearest activatable element.
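As a non-normative illustration, the nearest activatable element algorithm amounts to a walk up the parent chain. The Node class and its has_activation_behavior field are hypothetical names introduced for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical element record; only the fields the algorithm needs."""
    name: str
    has_activation_behavior: bool = False
    parent: Optional["Node"] = None


def nearest_activatable_element(target: Optional[Node]) -> Optional[Node]:
    # Walk from the target toward the root until an element with a
    # defined activation behavior is found.
    while target is not None:
        if target.has_activation_behavior:
            return target
        target = target.parent
    # No ancestor has an activation behavior.
    return None
```

For example, if an em element sits inside a span inside an a element with an activation behavior, a click designated at the em resolves to the a element.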
When a user agent is to run pre-click activation steps on an element, it must run the pre-click activation steps defined for that element, if any.
When a user agent is to run canceled activation steps on an element, it must run the canceled activation steps defined for that element, if any.
When a user agent is to run post-click activation
steps on an element, it must run the activation
behavior defined for that element, if any. Activation
behaviors can refer to the click
event that was fired by the steps above leading up to this
point.
As a general rule, elements whose content model allows any
flow content or phrasing content should
have at least one child node that is palpable content
and that does not have the hidden
attribute specified.
This requirement is not a hard requirement, however, as there are many cases where an element can be empty legitimately, for example when it is used as a placeholder which will later be filled in by a script, or when the element is part of a template and would on most pages be filled in but on some pages is not relevant.
Conformance checkers are encouraged to provide a mechanism for authors to find elements that fail to fulfill this requirement, as an authoring aid.
The following elements are palpable content:
a
abbr
address
article
aside
audio (if the controls attribute is present)
b
bdi
bdo
blockquote
button
canvas
cite
code
details
dfn
div
dl (if the element's children include at least one name-value group)
em
embed
fieldset
figure
footer
form
h1
h2
h3
h4
h5
h6
header
hgroup
i
iframe
img
input (if the type attribute is not in the Hidden state)
ins
kbd
keygen
label
map
mark
math
menu (if the type attribute is in the toolbar state or the list state)
meter
nav
object
ol (if the element's children include at least one li element)
output
p
pre
progress
q
ruby
s
samp
section
select
small
span
strong
sub
sup
svg
table
textarea
time
u
ul (if the element's children include at least one li element)
var
video
Some elements are described as transparent; they have "transparent" in the description of their content model. The content model of a transparent element is derived from the content model of its parent element: the elements required in the part of the content model that is "transparent" are the same elements as required in the part of the content model of the parent of the transparent element in which the transparent element finds itself.
For instance, an ins
element inside a
ruby
element cannot contain an rt
element, because the part of the ruby
element's
content model that allows ins
elements is the part
that allows phrasing content, and the rt
element is not phrasing content.
In some cases, where transparent elements are nested in each other, the process has to be applied iteratively.
Consider the following markup fragment:
<p><ins><map><a href="/">Apples</a></map></ins></p>
To check whether "Apples" is allowed inside the a
element, the content models are examined. The a
element's content model is transparent, as is the map
element's, as is the ins
element's. The ins
element is found in the
p
element, whose content model is phrasing
content. Thus, "Apples" is allowed, as text is phrasing
content.
When a transparent element has no parent, then the part of its content model that is "transparent" must instead be treated as accepting any flow content.
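The iterative lookup described above can be sketched as follows. This is non-normative and deliberately simplified: the CONTENT_MODEL table is an illustrative subset with hypothetical labels, and the sketch ignores the rule that only the relevant part of the parent's content model applies (the ruby case above).

```python
TRANSPARENT = "transparent"

# Illustrative subset only; the labels are assumptions made for this sketch.
CONTENT_MODEL = {
    "a": TRANSPARENT,
    "ins": TRANSPARENT,
    "del": TRANSPARENT,
    "map": TRANSPARENT,
    "p": "phrasing",
    "div": "flow",
    "section": "flow",
}


def effective_content_model(chain):
    """chain lists element names from the element itself outward to the root."""
    for name in chain:
        model = CONTENT_MODEL[name]
        if model != TRANSPARENT:
            # First non-transparent ancestor decides what is allowed.
            return model
    # Every element in the chain is transparent and the outermost one
    # has no parent, so the transparent part accepts any flow content.
    return "flow"
```

For the fragment above, the chain for the a element is a, map, ins, p, so the lookup stops at p and yields phrasing content.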
The term paragraph as defined in this
section is used for more than just the definition of the
p
element. The paragraph concept defined
here is used to describe how to interpret documents. The
p
element is merely one of several ways of marking up a
paragraph.
A paragraph is typically a run of phrasing content that forms a block of text with one or more sentences that discuss a particular topic, as in typography, but can also be used for more general thematic grouping. For instance, an address is also a paragraph, as is a part of a form, a byline, or a stanza in a poem.
In the following example, there are two paragraphs in a section. There is also a heading, which contains phrasing content that is not a paragraph. Note how the comments and inter-element whitespace do not form paragraphs.
<section>
  <h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in this example.
  <p>This is the second.</p>
  <!-- This is not a paragraph. -->
</section>
Paragraphs in flow content are defined relative to what the document looks like without the a, ins, del, and map elements complicating matters, since those elements, with their hybrid content models, can straddle paragraph boundaries, as shown in the first two examples below.
Generally, having elements straddle paragraph boundaries is best avoided. Maintaining such markup can be difficult.
The following example takes the markup from the earlier example
and puts ins
and del
elements around some
of the markup to show that the text was changed (though in this
case, the changes admittedly don't make much sense). Notice how
this example has exactly the same paragraphs as the previous one,
despite the ins
and del
elements —
the ins
element straddles the heading and the first
paragraph, and the del
element straddles the boundary
between the two paragraphs.
<section>
  <ins><h1>Example of paragraphs</h1>
  This is the <em>first</em> paragraph in</ins> this example<del>.
  <p>This is the second.</p></del>
  <!-- This is not a paragraph. -->
</section>
Let view be a view of the DOM that replaces all a, ins, del, and map elements in the document with their contents. Then, in view, for each run of sibling phrasing content nodes uninterrupted by other types of content, in an element that accepts content other than phrasing content as well as phrasing content, let first be the first node of the run, and let last be the last node of the run. For each such run that consists of at least one node that is neither embedded content nor inter-element whitespace, a paragraph exists in the original DOM from immediately before first to immediately after last. (Paragraphs can thus span across a, ins, del, and map elements.)
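The definition above can be approximated with a small non-normative sketch. It assumes a toy tree in which text nodes are plain strings and elements are Elem records; the PHRASING and EMBEDDED sets are illustrative subsets, comments are left out of the model, and only the direct children of one container are examined. Paragraphs formed explicitly by p elements are not reported.

```python
from dataclasses import dataclass, field
from typing import List, Union

HYBRID = {"a", "ins", "del", "map"}                    # replaced by their contents in the view
PHRASING = {"em", "strong", "span", "img", "object"}   # illustrative subset
EMBEDDED = {"img", "object"}                           # illustrative subset


@dataclass
class Elem:
    name: str
    children: List["Node"] = field(default_factory=list)


Node = Union[str, Elem]


def view(children):
    """Replace a, ins, del, and map elements with their contents."""
    flat = []
    for child in children:
        if isinstance(child, Elem) and child.name in HYBRID:
            flat.extend(view(child.children))
        else:
            flat.append(child)
    return flat


def is_phrasing(node):
    return isinstance(node, str) or node.name in PHRASING


def counts_toward_paragraph(node):
    """True for nodes that are neither embedded content nor whitespace-only text."""
    if isinstance(node, str):
        return node.strip(" \t\n\f\r") != ""
    return node.name not in EMBEDDED


def text_of(node):
    if isinstance(node, str):
        return node
    return "".join(text_of(c) for c in node.children)


def implied_paragraphs(children):
    paragraphs, run = [], []
    for node in view(children) + [None]:   # None is a sentinel that closes the last run
        if node is not None and is_phrasing(node):
            run.append(node)
            continue
        if any(counts_toward_paragraph(n) for n in run):
            paragraphs.append("".join(text_of(n) for n in run).strip())
        run = []
    return paragraphs
```

Applied to the children of the section element in the ins and del example, the sketch finds a single implied paragraph that spans both the ins and the del boundaries.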
Conformance checkers may warn authors of cases where they have paragraphs that overlap each other (this can happen with object, video, audio, and canvas elements, and indirectly through elements in other namespaces that allow HTML to be further embedded therein, like svg or math).
A paragraph is also formed explicitly by
p
elements.
The p
element can be used to wrap
individual paragraphs when there would otherwise not be any content
other than phrasing content to separate the paragraphs from each
other.
In the following example, the link spans half of the first paragraph, all of the heading separating the two paragraphs, and half of the second paragraph. It straddles the paragraphs and the heading.
<header>
 Welcome!
 <a href="about.html">
  This is home of...
  <h1>The Falcons!</h1>
  The Lockheed Martin multirole jet fighter aircraft!
 </a>
 This page discusses the F-16 Fighting Falcon's innermost secrets.
</header>
Here is another way of marking this up, this time showing the paragraphs explicitly, and splitting the one link element into three:
<header>
 <p>Welcome! <a href="about.html">This is home of...</a></p>
 <h1><a href="about.html">The Falcons!</a></h1>
 <p><a href="about.html">The Lockheed Martin multirole jet fighter aircraft!</a> This page discusses the F-16 Fighting Falcon's innermost secrets.</p>
</header>
It is possible for paragraphs to overlap when using certain elements that define fallback content. For example, in the following section:
<section>
 <h1>My Cats</h1>
 You can play with my cat simulator.
 <object data="cats.sim">
  To see the cat simulator, use one of the following links:
  <ul>
   <li><a href="cats.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  Alternatively, upgrade to the Mellblom Browser.
 </object>
 I'm quite proud of it.
</section>
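There are five paragraphs:
- The paragraph that says "You can play with my cat simulator. object I'm quite proud of it.", where object is the object element.
- The paragraph that says "To see the cat simulator, use one of the following links:".
- The paragraph that says "Download simulator file".
- The paragraph that says "Use online simulator".
- The paragraph that says "Alternatively, upgrade to the Mellblom Browser.".
The first paragraph is overlapped by the other four. A user agent that supports the "cats.sim" resource will only show the first one, but a user agent that shows the fallback will confusingly show the first sentence of the first paragraph as if it was in the same paragraph as the second one, and will show the last paragraph as if it was at the start of the second sentence of the first paragraph.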
To avoid this confusion, explicit p
elements can be
used. For example:
<section>
 <h1>My Fish</h1>
 You can play with my fish simulator.
 <object data="fish.sim">
  <p>To see the fish simulator, use one of the following links:</p>
  <ul>
   <li><a href="fish.sim">Download simulator file</a>
   <li><a href="http://sims.example.com/watch?v=LYds5xY4INU">Use online simulator</a>
  </ul>
  <p>Alternatively, upgrade to the Mellblom Browser.</p>
 </object>
 I'm quite proud of it.
</section>
Text content in HTML elements with
child Text
nodes, and text in attributes of HTML
elements that allow free-form text, may contain characters in
the range U+202A to U+202E (the bidirectional-algorithm formatting
characters). However, the use of these characters is restricted so
that any embedding or overrides generated by these characters do not
start and end with different parent elements, and so that all such
embeddings and overrides are explicitly terminated by a U+202C POP
DIRECTIONAL FORMATTING character. This helps reduce incidences of
text being reused in a manner that has unforeseen effects on the
bidirectional algorithm. [BIDI]
The aforementioned restrictions are defined by specifying that certain parts of documents form bidirectional-algorithm formatting character ranges, and then imposing a requirement on such ranges.
The strings resulting from applying the following algorithm to an HTML element element are bidirectional-algorithm formatting character ranges:
Let output be an empty list of strings.
Let string be an empty string.
Let node be the first child node of element, if any, or null otherwise.
Loop: If node is null, jump to the step labeled end.
Process node according to the first matching step from the following list:
If node is a Text node: Append the text data of node to string.
If node is a br element: If string is not the empty string, push string onto output, and let string be the empty string.
Let node be node's next sibling, if any, or null otherwise.
Jump to the step labeled loop.
End: If string is not the empty string, push string onto output.
Return output as the bidirectional-algorithm formatting character ranges.
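The algorithm above can be sketched non-normatively as follows. The child representation, a list of (kind, value) tuples, is an assumption made for illustration, and the sketch handles only the two cases spelled out in the list above (Text nodes and br elements); any other child is skipped.

```python
def formatting_character_ranges(children):
    """children: list of ("text", data) or ("element", name) tuples (hypothetical model)."""
    output = []
    string = ""
    for kind, value in children:
        if kind == "text":
            # Text node: append its data to the current string.
            string += value
        elif kind == "element" and value == "br":
            # br element: close the current range, if it is not empty.
            if string:
                output.append(string)
                string = ""
        # Other children are ignored in this sketch.
    if string:
        output.append(string)
    return output
```

Note that text on either side of an inline element such as em is concatenated into one range, while a br element splits the ranges.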
The value of a namespace-less attribute of an HTML element is a bidirectional-algorithm formatting character range.
Any strings that, as described above, are
bidirectional-algorithm formatting character ranges must
match the string
production in the following
ABNF, the character set for which is Unicode. [ABNF]
string        = *( plaintext ( embedding / override ) ) plaintext
embedding     = ( lre / rle ) string pdf
override      = ( lro / rlo ) string pdf
lre           = %x202A ; U+202A LEFT-TO-RIGHT EMBEDDING
rle           = %x202B ; U+202B RIGHT-TO-LEFT EMBEDDING
lro           = %x202D ; U+202D LEFT-TO-RIGHT OVERRIDE
rlo           = %x202E ; U+202E RIGHT-TO-LEFT OVERRIDE
pdf           = %x202C ; U+202C POP DIRECTIONAL FORMATTING
plaintext     = *( %x0000-2029 / %x202F-10FFFF )
                ; any string with no bidirectional-algorithm formatting characters
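Since the grammar only requires that every embedding or override be closed by a matching U+202C and that no U+202C appear without an opener, a depth counter is enough to check it. The following non-normative sketch uses a hypothetical function name and is one possible way to test a range against the string production.

```python
LRE, RLE, PDF, LRO, RLO = "\u202A", "\u202B", "\u202C", "\u202D", "\u202E"
OPENERS = {LRE, RLE, LRO, RLO}


def matches_string_production(s: str) -> bool:
    depth = 0
    for ch in s:
        if ch in OPENERS:
            # Start of an embedding or override.
            depth += 1
        elif ch == PDF:
            # A POP DIRECTIONAL FORMATTING with nothing to close is invalid.
            if depth == 0:
                return False
            depth -= 1
        # Every other character is plaintext.
    # All embeddings and overrides must be explicitly terminated.
    return depth == 0
```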
Authors are encouraged to use the dir
attribute, the bdo
element,
and the bdi
element, rather than maintaining the
bidirectional-algorithm formatting characters manually. The
bidirectional-algorithm formatting characters interact poorly with
CSS.
Authors may use the ARIA role
and aria-*
attributes on HTML
elements, in accordance with the requirements described in
the ARIA specifications, except where these conflict with the
strong native semantics
described below. These exceptions are intended to prevent authors
from making assistive technology products report nonsensical states
that do not represent the actual state of the document. [ARIA]
User agents are required to implement ARIA semantics on all HTML elements, as defined in the ARIA specifications. The implicit ARIA semantics defined below must be recognized by implementations for the purposes of ARIA processing. [ARIAIMPL]
The ARIA attributes defined in the ARIA specifications, and the strong native semantics and default implicit ARIA semantics defined below, do not have any effect on CSS pseudo-class matching, user interface modalities that don't use assistive technologies, or the default actions of user interaction events as described in this specification.
Every HTML element may have an ARIA role
attribute specified. This is an
ARIA Role attribute as defined by [ARIA] Section
5.4 Definition of Roles.
The attribute, if specified, must have a value that is a set of space-separated tokens representing the various WAI-ARIA roles that the element belongs to.
The WAI-ARIA role that an HTML element has assigned to it is the
first non-abstract role found in the list of values generated when the
role
attribute is split on
spaces.
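A non-normative sketch of this selection follows. The CONCRETE_ROLES set is only an illustrative subset of the non-abstract WAI-ARIA roles, and the function name is hypothetical; tokens that are not in that set (including the abstract roles listed for reference) are skipped.

```python
import re

# Abstract WAI-ARIA roles; listed for reference, never assignable.
ABSTRACT_ROLES = {
    "command", "composite", "input", "landmark", "range", "roletype",
    "section", "sectionhead", "select", "structure", "widget", "window",
}

# Illustrative subset of non-abstract roles; a real implementation needs the full list.
CONCRETE_ROLES = {
    "button", "link", "checkbox", "menuitem", "tab", "heading",
    "navigation", "presentation", "textbox", "listbox",
}


def assigned_role(role_attribute: str):
    # Split on the HTML space characters: space, tab, LF, FF, CR.
    tokens = [t for t in re.split(r"[ \t\n\f\r]+", role_attribute) if t]
    for token in tokens:
        if token in CONCRETE_ROLES:
            return token
    return None
```

For example, for the value "widget button link" the abstract role widget is passed over and button is the role assigned.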
Every HTML element may have ARIA state and property attributes specified. These attributes are defined by [ARIA] in Section 6.6, Definitions of States and Properties (all aria-* attributes).
These attributes, if specified, must have a value that is the ARIA value type in the "Value" field of the definition for the state or property, mapped to the appropriate HTML value type according to [ARIA] Section 10.2 Mapping WAI-ARIA Value types to languages using the HTML 5 mapping.
ARIA State and Property attributes can be used on any element. They are not always meaningful, however, and in such cases user agents might not perform any processing aside from including them in the DOM. State and property attributes are processed according to the requirements of the sections Strong Native Semantics and Implicit ARIA semantics, as well as [ARIA] and [ARIAIMPL].
The following table defines the strong native semantics and corresponding default implicit ARIA semantics that apply to HTML elements. Each language feature (element or attribute) in a cell in the first column implies the ARIA semantics (role, states, and/or properties) given in the cell in the second column of the same row. When multiple rows apply to an element, the role from the last row to define a role must be applied, and the states and properties from all the rows must be combined.
Language feature | Strong native semantics and default implied ARIA semantics |
---|---|
area element that creates a hyperlink | link role |
base element | No role |
datalist element | listbox role, with the aria-multiselectable property set to "false" |
details element | aria-expanded state set to "true" if the element's open attribute is present, and set to "false" otherwise |
dialog element without an open attribute | The aria-hidden state set to "true" |
head element | No role |
hgroup element | heading role, with the aria-level property set to the element's outline depth |
hr element | separator role |
html element | No role |
img element whose alt attribute's value is empty | presentation role |
input element with a type attribute in the Checkbox state | aria-checked state set to "mixed" if the element's indeterminate IDL attribute is true, or "true" if the element's checkedness is true, or "false" otherwise |
input element with a type attribute in the Color state | No role |
input element with a type attribute in the Date state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Date and Time state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Local Date and Time state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the E-mail state with no suggestions source element | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the File Upload state | No role |
input element with a type attribute in the Hidden state | No role |
input element with a type attribute in the Month state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Number state | spinbutton role, with the aria-readonly property set to "true" if the element has a readonly attribute, the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and, if the result of applying the rules for parsing floating-point number values to the element's value is a number, with the aria-valuenow property set to that number |
input element with a type attribute in the Password state | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Radio Button state | aria-checked state set to "true" if the element's checkedness is true, or "false" otherwise |
input element with a type attribute in the Range state | slider role, with the aria-valuemax property set to the element's maximum, the aria-valuemin property set to the element's minimum, and the aria-valuenow property set to the result of applying the rules for parsing floating-point number values to the element's value, if that results in a number, or the default value otherwise |
input element with a type attribute in the Reset Button state | button role |
input element with a type attribute in the Search state with no suggestions source element | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Submit Button state | button role |
input element with a type attribute in the Telephone state with no suggestions source element | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Text state with no suggestions source element | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Text, Search, Telephone, URL, or E-mail states with a suggestions source element | combobox role, with the aria-owns property set to the same value as the list attribute, and the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Time state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the URL state with no suggestions source element | textbox role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element with a type attribute in the Week state | No role, with the aria-readonly property set to "true" if the element has a readonly attribute |
input element that is required | The aria-required state set to "true" |
keygen element | No role |
label element | No role |
link element that creates a hyperlink | link role |
menu element with a type attribute in the context menu state | No role |
menu element with a type attribute in the list state | menu role |
menu element with a type attribute in the toolbar state | toolbar role |
meta element | No role |
meter element | No role |
nav element | navigation role |
noscript element | No role |
optgroup element | No role |
option element that is in a list of options or that represents a suggestion in a datalist element | option role, with the aria-selected state set to "true" if the element's selectedness is true, or "false" otherwise |
param element | No role |
progress element | progressbar role, with, if the progress bar is determinate, the aria-valuemax property set to the maximum value of the progress bar, the aria-valuemin property set to zero, and the aria-valuenow property set to the current value of the progress bar |
script element | No role |
select element with a multiple attribute | listbox role, with the aria-multiselectable property set to "true" |
select element with no multiple attribute | listbox role, with the aria-multiselectable property set to "false" |
select element with a required attribute | The aria-required state set to "true" |
source element | No role |
style element | No role |
summary element | No role |
textarea element | textbox role, with the aria-multiline property set to "true", and the aria-readonly property set to "true" if the element has a readonly attribute |
textarea element with a required attribute | The aria-required state set to "true" |
title element | No role |
An element that defines a command, whose Type facet is "checkbox", and that is a descendant of a menu element whose type attribute is in the list state | menuitemcheckbox role, with the aria-checked state set to "true" if the command's Checked State facet is true, and "false" otherwise |
An element that defines a command, whose Type facet is "command", and that is a descendant of a menu element whose type attribute is in the list state | menuitem role |
An element that defines a command, whose Type facet is "radio", and that is a descendant of a menu element whose type attribute is in the list state | menuitemradio role, with the aria-checked state set to "true" if the command's Checked State facet is true, and "false" otherwise |
Element that is disabled | The aria-disabled state set to "true" |
Element that is inert | The aria-disabled state set to "true" |
Element with a hidden attribute | The aria-hidden state set to "true" |
Element that is a candidate for constraint validation but that does not satisfy its constraints | The aria-invalid state set to "true" |
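The rule for combining multiple applicable rows (the role comes from the last row that defines one, and states and properties from all rows are merged) can be sketched non-normatively as below. The row representation, a (role, states) pair with None meaning "this row defines no role", and the NO_ROLE marker for an explicit "No role" entry are assumptions made for this illustration.

```python
NO_ROLE = "no role"   # hypothetical marker for an explicit "No role" entry


def combine_rows(rows):
    """rows: applicable table rows, in table order, as (role_or_None, states_dict)."""
    role = None
    states = {}
    for row_role, row_states in rows:
        if row_role is not None:
            # A later row that defines a role replaces an earlier one.
            role = row_role
        # States and properties from every applicable row are combined.
        states.update(row_states)
    return role, states
```

For a textarea element that has both readonly and required attributes, the textarea row contributes the textbox role with aria-multiline and aria-readonly, and the "textarea element with a required attribute" row adds aria-required without changing the role.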
Some HTML elements have native semantics that can be
overridden. The following table lists these elements and their
default implicit ARIA semantics, along with the
restrictions that apply to those elements. Each language feature
(element or attribute) in a cell in the first column implies, unless
otherwise overridden, the ARIA semantic (role, state, or property)
given in the cell in the second column of the same row, but this
semantic may be overridden under the conditions listed in the cell
in the third column of that row. In addition, any element may be
given the presentation
role,
regardless of the restrictions below.
Language feature | Default implied ARIA semantic | Restrictions |
---|---|---|
a element that creates a hyperlink | link role | Role must be either link, button, checkbox, menuitem, menuitemcheckbox, menuitemradio, tab, or treeitem |
address element | No role | If specified, role must be contentinfo |
article element | article role | Role must be either article, document, application, or main |
aside element | complementary role | Role must be either complementary, note, or search |
audio element | No role | If specified, role must be application |
button element | button role | Role must be either button, link, menuitem, menuitemcheckbox, menuitemradio, or radio |
details element | group role | Role must be a role that supports aria-expanded |
dialog element | dialog role | Role must be either alert, alertdialog, application, contentinfo, dialog, document, log, main, marquee, region, search, or status |
embed element | No role | If specified, role must be either application, document, or img |
footer element | No role | If specified, role must be contentinfo |
h1 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
h2 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
h3 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
h4 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
h5 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
h6 element that does not have an hgroup ancestor | heading role, with the aria-level property set to the element's outline depth | Role must be either heading or tab |
header element | No role | If specified, role must be banner |
iframe element | No role | If specified, role must be either application, document, or img |
img element whose alt attribute's value is absent | img role | No restrictions |
img element whose alt attribute's value is present and not empty | img role | No restrictions |
input element with a type attribute in the Button state | button role | Role must be either button, link, menuitem, menuitemcheckbox, menuitemradio, or radio |
input element with a type attribute in the Checkbox state | checkbox role | Role must be either checkbox or menuitemcheckbox |
input element with a type attribute in the Image Button state | button role | Role must be either button, link, menuitem, menuitemcheckbox, menuitemradio, or radio |
input element with a type attribute in the Radio Button state | radio role | Role must be either radio or menuitemradio |
li element whose parent is an ol or ul element | listitem role | Role must be either listitem, menuitemcheckbox, menuitemradio, option, tab, or treeitem |
object element | No role | If specified, role must be either application, document, or img |
ol element | list role | Role must be either directory, list, listbox, menu, menubar, tablist, toolbar, or tree |
output element | status role | No restrictions |
section element | region role | Role must be either alert, alertdialog, application, contentinfo, dialog, document, log, main, marquee, region, search, or status |
ul element | list role | Role must be either directory, list, listbox, menu, menubar, tablist, toolbar, or tree |
video element | No role | If specified, role must be application |
The body element | document role | Role must be either document or application |
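A conformance checker could apply the restrictions column with a simple lookup. The following non-normative sketch covers only four rows of the table; the dictionary keys are hypothetical labels for the language features, and None stands for "No restrictions". The presentation role is always accepted, as stated above.

```python
# Illustrative subset of the table; keys are hypothetical feature labels.
ALLOWED_ROLES = {
    "a (hyperlink)": {"link", "button", "checkbox", "menuitem",
                      "menuitemcheckbox", "menuitemradio", "tab", "treeitem"},
    "aside": {"complementary", "note", "search"},
    "header": {"banner"},
    "output": None,   # No restrictions
}


def role_is_permitted(feature: str, role: str) -> bool:
    # Any element may be given the presentation role.
    if role == "presentation":
        return True
    allowed = ALLOWED_ROLES[feature]
    return allowed is None or role in allowed
```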
The entry "no role", when
used as a strong native
semantic, means that no role other than presentation
can be used.
When used as a default
implied ARIA semantic, it means the user agent has no default
mapping to ARIA roles. (However, it probably will have its own
mappings to the accessibility layer.)
The WAI-ARIA specification neither requires nor forbids user agents from enhancing native presentation and interaction behaviors on the basis of WAI-ARIA markup. Even mainstream user agents might choose to expose metadata or navigational features directly or via user-installed extensions; for example, exposing required form fields or landmark navigation. User agents are encouraged to maximize their usefulness to users, including users without disabilities.
Conformance checkers are encouraged to phrase errors such that
authors are encouraged to use more appropriate elements rather than
remove accessibility annotations. For example, if an a
element is marked as having the button
role, a conformance
checker could say "Use a more appropriate element to represent a
button, for example a button
element or an
input
element" rather than "The button
role cannot be used with
a
elements".
These features can be used to make accessibility tools render content to their users in more useful ways. For example, ASCII art, which is really an image, appears to be text, and in the absence of appropriate annotations would end up being rendered by screen readers as a very painful reading of lots of punctuation. Using the features described in this section, one can instead make the ATs skip the ASCII art and just read the caption:
<figure role="img" aria-labelledby="fish-caption"> <pre> o .'`/ ' / ( O .-'` ` `'-._ .') _/ (o) '. .' / ) ))) >< < `\ |_\ _.' '. \ '-._ _ .-' '.) jgs `\__\ </pre> <figcaption id="fish-caption"> Joan G. Stark, "<cite>fish</cite>". October 1997. ASCII on electrons. 28×8. </figcaption> </figure>
Implementations of XPath 1.0 that
operate on HTML documents parsed or created in the
manners described in this specification (e.g. as part of the document.evaluate()
API) must act as if the
following edit was applied to the XPath 1.0 specification.
First, remove this paragraph:
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. This is the same way expansion is done for element type names in start and end-tags except that the default namespace declared with
xmlns
is not used: if the QName does not have a prefix, then the namespace URI is null (this is the same way attribute names are expanded). It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
Then, insert in its place the following:
A QName in the node test is expanded into an expanded-name using the namespace declarations from the expression context. If the QName has a prefix, then there must be a namespace declaration for this prefix in the expression context, and the corresponding namespace URI is the one that is associated with this prefix. It is an error if the QName has a prefix for which there is no namespace declaration in the expression context.
If the QName has no prefix and the principal node type of the axis is element, then the default element namespace is used. Otherwise if the QName has no prefix, the namespace URI is null. The default element namespace is a member of the context for the XPath expression. The value of the default element namespace when executing an XPath expression through the DOM3 XPath API is determined in the following way:
- If the context node is from an HTML DOM, the default element namespace is "http://www.w3.org/1999/xhtml".
- Otherwise, the default element namespace URI is null.
This is equivalent to adding the default element namespace feature of XPath 2.0 to XPath 1.0, and using the HTML namespace as the default element namespace for HTML documents. It is motivated by the desire to have implementations be compatible with legacy HTML content while still supporting the changes that this specification introduces to HTML regarding the namespace used for HTML elements, and by the desire to use XPath 1.0 rather than XPath 2.0.
This change is a willful violation of the XPath 1.0 specification, motivated by desire to have implementations be compatible with legacy content while still supporting the changes that this specification introduces to HTML regarding which namespace is used for HTML elements. [XPATH10]
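The replacement paragraphs above can be illustrated with a short non-normative sketch. The function name `expandQName` and the shape of its `context` argument are assumptions made for this illustration; they are not part of XPath 1.0 or of the DOM3 XPath API.

```javascript
// Non-normative sketch. expandQName and the context object are hypothetical.
const HTML_NAMESPACE = "http://www.w3.org/1999/xhtml";

// context: {
//   namespaces: { prefix: namespaceURI, ... },   // in-scope declarations
//   principalNodeType: "element" | "attribute" | "namespace",
//   contextNodeIsFromHtmlDom: boolean
// }
function expandQName(qname, context) {
  const colon = qname.indexOf(":");
  if (colon !== -1) {
    // Prefixed QName: the prefix must be declared in the expression context.
    const prefix = qname.slice(0, colon);
    if (!Object.prototype.hasOwnProperty.call(context.namespaces, prefix)) {
      throw new Error('No namespace declaration for prefix "' + prefix + '"');
    }
    return {
      namespaceURI: context.namespaces[prefix],
      localName: qname.slice(colon + 1),
    };
  }
  if (context.principalNodeType === "element") {
    // Unprefixed name on an element axis: use the default element namespace.
    return {
      namespaceURI: context.contextNodeIsFromHtmlDom ? HTML_NAMESPACE : null,
      localName: qname,
    };
  }
  // Unprefixed name on any other axis (e.g. attributes): null namespace.
  return { namespaceURI: null, localName: qname };
}
```

Under this sketch, an unprefixed node test such as `p` evaluated against an HTML DOM expands to the HTML namespace, while an unprefixed attribute test such as `@class` keeps a null namespace URI.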
XSLT 1.0 processors outputting to a DOM when the output method is "html" (either explicitly or via the defaulting rule in XSLT 1.0) are affected as follows:
If the transformation program outputs an element in no namespace, the processor must, prior to constructing the corresponding DOM element node, change the namespace of the element to the HTML namespace, ASCII-lowercase the element's local name, and ASCII-lowercase the names of any non-namespaced attributes on the element.
This requirement is a willful violation of the XSLT 1.0 specification, required because this specification changes the namespaces and case-sensitivity rules of HTML in a manner that would otherwise be incompatible with DOM-based XSLT transformations. (Processors that serialize the output are unaffected.) [XSLT10]
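As a non-normative illustration of the requirement above, the following sketch normalizes a plain-object description of an output element before a DOM node would be constructed from it. The object shape and the names `asciiLowercase` and `normalizeXsltOutputElement` are assumptions made for this example, not an API of any XSLT processor.

```javascript
// Non-normative sketch. The element description format is hypothetical.
const XHTML_NS = "http://www.w3.org/1999/xhtml";

// Lowercases only A-Z; unlike String.prototype.toLowerCase(), non-ASCII
// characters are left untouched.
function asciiLowercase(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

// element: { namespaceURI, localName, attributes: [{ namespaceURI, name, value }] }
function normalizeXsltOutputElement(element) {
  if (element.namespaceURI !== null) {
    return element; // Elements already in a namespace are left alone.
  }
  return {
    namespaceURI: XHTML_NS,
    localName: asciiLowercase(element.localName),
    attributes: element.attributes.map((attr) =>
      attr.namespaceURI === null
        ? { ...attr, name: asciiLowercase(attr.name) }
        : attr // Namespaced attributes keep their names.
    ),
  };
}
```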
This specification does not specify precisely how XSLT processing
interacts with the HTML parser infrastructure (for
example, whether an XSLT processor acts as if it puts any elements
into a stack of open elements). However, XSLT
processors must stop parsing if they successfully
complete, and must set the current document readiness
first to "interactive
" and then to "complete
" if they are aborted.
This specification does not specify how XSLT interacts with the navigation algorithm, how it fits in with the event loop, nor how error pages are to be handled (e.g. whether XSLT errors are to replace an incremental XSLT output, or are rendered inline, etc).
There are also additional non-normative comments
regarding the interaction of XSLT and HTML in the script
element
section.
APIs for dynamically inserting markup into the document interact with the parser, and thus their behavior varies depending on whether they are used with HTML documents (and the HTML parser) or XHTML in XML documents (and the XML parser).
The open()
method comes in several variants with different numbers of
arguments.
open
( [ type [, replace ] ] )
Causes the Document
to be replaced in-place, as if
it was a new Document
object, but reusing the
previous object, which is then returned.
If the type argument is omitted or has the
value "text/html
", then the resulting
Document
has an HTML parser associated with it, which
can be given data to parse using document.write()
. Otherwise, all
content passed to document.write()
will be parsed
as plain text.
If the replace argument is present and has
the value "replace
", the existing entries in
the session history for the Document
object are
removed.
The method has no effect if the Document
is still
being parsed.
Throws an InvalidStateError
exception if the
Document
is an XML
document.
open
( url, name, features [, replace ] )
Works like the window.open()
method.
Document
objects have an
ignore-opens-during-unload counter, which is used to
prevent scripts from invoking the document.open()
method (directly or
indirectly) while the document is being unloaded. Initially, the counter must be set
to zero.
When called with two or fewer arguments, the document.open()
method must act as
follows:
If the Document
object is not flagged as an HTML document, throw an
InvalidStateError
exception and abort these
steps.
Let type be the value of the first
argument, if there is one, or "text/html
"
otherwise.
Let replace be true if there is a second argument and it is an ASCII case-insensitive match for the value "replace", and false otherwise.
If the Document
has an active parser
that isn't a script-created parser, and the
insertion point associated with that parser's
input stream is not undefined (that is, it
does point to somewhere in the input stream), then the
method does nothing. Abort these steps and return the
Document
object on which the method was invoked.
This basically causes document.open()
to be ignored
when it's called in an inline script found during the parsing of
data sent over the network, while still letting it have an effect
when called asynchronously or on a document that is itself being
spoon-fed using these APIs.
Similarly, if the Document
's
ignore-opens-during-unload counter is greater than
zero, then the method does nothing. Abort these steps and return
the Document
object on which the method was
invoked.
This basically causes document.open()
to be ignored
when it's called from a beforeunload,
pagehide
, or unload
event handler while the
Document
is being unloaded.
Release the storage mutex.
Set the Document
's salvageable state to
false.
Prompt to
unload the Document
object. If the user
refused to allow the document to be unloaded, then
abort these steps and return the Document
object on
which the method was invoked.
Unload the
Document
object, with the recycle
parameter set to true.
Unregister all event listeners registered on the
Document
node and its descendants.
Remove any tasks
associated with the Document
in any task
source.
Remove all child nodes of the document, without firing any mutation events.
Replace the Document
's singleton objects with
new instances of those objects. (This includes in particular the
Window
, Location
, History
,
ApplicationCache
, and Navigator
objects,
the various BarProp
objects, the two
Storage
objects, the various
HTMLCollection
objects, and objects defined by other
specifications, like Selection
and the document's
UndoManager
. It also includes all the Web IDL
prototypes in the JavaScript binding, including the
Document
object's prototype.)
Change the document's character encoding to UTF-8.
Set the Document
object's reload override
flag and set the Document
's reload
override buffer to the empty string.
Set the Document
's salvageable state back
to true.
Change the document's address to the entry script's document's address.
Create a new HTML parser and associate it with
the document. This is a script-created parser (meaning
that it can be closed by the document.open()
and document.close()
methods, and
that the tokenizer will wait for an explicit call to document.close()
before emitting
an end-of-file token). The encoding confidence is
irrelevant.
Set the current document readiness of the
document to "loading
".
If the type string contains a ";" (U+003B) character, remove the first such character and all characters from it up to the end of the string.
Strip leading and trailing whitespace from type.
If type is not now an ASCII
case-insensitive match for the string
"text/html
", then act as if the tokenizer had emitted
a start tag token with the tag name "pre" followed by a single
"LF" (U+000A) character, then
switch the HTML parser's tokenizer to the
PLAINTEXT state.
Remove all the entries in the browsing context's session history after the current entry. If the current entry is the last entry in the session history, then no entries are removed.
This doesn't necessarily have to affect the user agent's user interface.
Remove any tasks queued by
the history traversal task source that are associated
with any Document
objects in the top-level
browsing context's document family.
Remove any earlier entries that share the same
Document
.
If replace is false, then add a new
entry, just before the last entry, and associate with the new entry
the text that was parsed by the previous parser associated with the
Document
object, as well as the state of the document
at the start of these steps. This allows the user to step backwards
in the session history to see the page before it was blown away by
the document.open()
call.
This new entry does not have a Document
object, so a
new one will be created if the session history is traversed to that
entry.
Finally, set the insertion point to point at just before the end of the input stream (which at this point will be empty).
Return the Document
on which the method was
invoked.
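The handling of the two string arguments in the steps above can be sketched non-normatively as follows. The function name `normalizeOpenArguments` is hypothetical, and the sketch assumes that "whitespace" means the space characters U+0009, U+000A, U+000C, U+000D and U+0020. It covers only the string processing, not the unloading, history or parser steps.

```javascript
// Non-normative sketch. normalizeOpenArguments is a hypothetical helper.
function asciiLower(s) {
  return s.replace(/[A-Z]/g, (c) => String.fromCharCode(c.charCodeAt(0) + 0x20));
}

function normalizeOpenArguments(type, replace) {
  // The type defaults to "text/html" when the first argument is absent.
  let t = type === undefined ? "text/html" : String(type);

  // replace is true only for an ASCII case-insensitive match for "replace".
  const isReplace =
    replace !== undefined && asciiLower(String(replace)) === "replace";

  // Drop the first ";" and everything after it (e.g. MIME type parameters).
  const semicolon = t.indexOf(";");
  if (semicolon !== -1) {
    t = t.slice(0, semicolon);
  }

  // Strip leading and trailing space characters (assumed set, see lead-in).
  t = t.replace(/^[\t\n\f\r ]+|[\t\n\f\r ]+$/g, "");

  // Anything other than text/html is parsed as plain text.
  const parseAsHtml = asciiLower(t) === "text/html";
  return { replace: isReplace, parseAsHtml };
}
```

For instance, under these assumptions a type of `" Text/HTML ; charset=utf-8"` still selects the HTML parser, whereas `"text/plain"` leads to the PLAINTEXT tokenizer state.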
The document.open()
method does not
affect whether a Document
is ready for post-load
tasks or completely loaded.
When called with three or more arguments, the open()
method on the
Document
object must call the open()
method on the Window
object of the Document
object, with the same
arguments as the original call to the open()
method, and return whatever
that method returned. If the Document
object has no
Window
object, then the method must throw an
InvalidAccessError
exception.
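The dispatch on argument count can be pictured with the following non-normative sketch. The `doc` object, its `window` property and the `openInPlace` method are stand-ins invented for this illustration; a plain `Error` with its `name` set is used in place of a real DOM exception.

```javascript
// Non-normative sketch. doc, doc.window and doc.openInPlace are hypothetical
// stand-ins for the Document, its Window, and the two-or-fewer-argument steps.
function dispatchDocumentOpen(doc, args) {
  if (args.length >= 3) {
    if (!doc.window) {
      const error = new Error("The Document has no Window object.");
      error.name = "InvalidAccessError";
      throw error;
    }
    // Forward the same arguments and return whatever Window's open() returns.
    return doc.window.open(...args);
  }
  // Two or fewer arguments: replace the document in place.
  return doc.openInPlace(...args);
}
```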
close
()
Closes the input stream that was opened by the document.open()
method.
Throws an InvalidStateError
exception if the
Document
is an XML
document.
The close()
method must run the following steps:
If the Document
object is not flagged as an
HTML document, throw an
InvalidStateError
exception and abort these
steps.
If there is no script-created parser associated with the document, then abort these steps.
Insert an explicit "EOF" character at the end of the parser's input stream.
If there is a pending parsing-blocking script, then abort these steps.
Run the tokenizer, processing resulting tokens as they are emitted, and stopping when the tokenizer reaches the explicit "EOF" character or spins the event loop.
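The steps above can be modeled non-normatively as shown below. The `doc` object, its fields, the `EXPLICIT_EOF` marker and the returned strings are all assumptions made for this sketch; a real user agent tracks this state internally and the method returns nothing.

```javascript
// Non-normative sketch. All names here are hypothetical stand-ins.
const EXPLICIT_EOF = Symbol("explicit EOF");

// doc: {
//   isHtmlDocument: boolean,
//   scriptCreatedParser: null | { inputStream: Array, runTokenizer: Function },
//   pendingParsingBlockingScript: boolean
// }
function modelClose(doc) {
  if (!doc.isHtmlDocument) {
    const error = new Error("close() is not supported on XML documents.");
    error.name = "InvalidStateError";
    throw error;
  }
  if (!doc.scriptCreatedParser) {
    return "no script-created parser"; // Nothing to close.
  }
  // Mark the end of the input stream explicitly.
  doc.scriptCreatedParser.inputStream.push(EXPLICIT_EOF);
  if (doc.pendingParsingBlockingScript) {
    return "deferred"; // The tokenizer is not run now.
  }
  doc.scriptCreatedParser.runTokenizer();
  return "tokenizer run";
}
```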
document.write()
write
(text...)
In general, adds the given string(s) to the
Document
's input stream.
This method has very idiosyncratic behavior. In
some cases, this method can affect the state of the HTML
parser while the parser is running, resulting in a DOM that
does not correspond to the source of the document (e.g. if the
string written is the string "<plaintext>
" or "<!--
"). In other cases, the call can clear the
current page first, as if document.open()
had been called.
In yet more cases, the method is simply ignored, or throws an
exception. To make matters worse, the exact behavior of this
method can in some cases be dependent on network latency, which can lead to failures that are very hard to debug.
For all these reasons, use of this method is strongly
discouraged.
This method throws an InvalidStateError
exception
when invoked on XML documents.
Document
objects have an
ignore-destructive-writes counter, which is used in
conjunction with the processing of script
elements to
prevent external scripts from being able to use document.write()
to blow away the
document by implicitly calling document.open()
. Initially, the
counter must be set to zero.
The document.write(...)
method must act as follows:
If the method was invoked on an XML
document, throw an InvalidStateError
exception and abort these steps.
If the insertion point is undefined and either the
Document
's ignore-opens-during-unload
counter is greater than zero or the Document
's
ignore-destructive-writes counter is greater than
zero, abort these steps.
If the insertion point is undefined, call the
open()
method on the document
object (with no arguments). If
the user refused to allow the document to be
unloaded, then abort these steps. Otherwise, the
insertion point will point at just before the end of
the (empty) input stream.
Insert the string consisting of the concatenation of all the arguments to the method into the input stream just before the insertion point.
If the Document
object's reload override
flag is set, then append the string consisting of the
concatenation of all the arguments to the method to the
Document
's reload override buffer.
If there is no pending parsing-blocking script,
have the HTML parser process the characters that were
inserted, one at a time, processing resulting tokens as they are
emitted, and stopping when the tokenizer reaches the insertion
point or when the processing of the tokenizer is aborted by the
tree construction stage (this can happen if a script
end tag token is emitted by the tokenizer).
If the document.write()
method was
called from script executing inline (i.e. executing because the
parser parsed a set of script
tags), then this is a
reentrant invocation of the
parser.
Finally, return from the method.
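A simplified, non-normative model of these steps follows. The `doc` object and every field on it are assumptions made for this sketch: the input stream is modeled as a string, the insertion point as an index into it, and the implicit call to open() as a reset of that state. Tokenization and reentrancy are not modeled, and the returned strings exist only to make the branches visible.

```javascript
// Non-normative sketch. The doc model and its field names are hypothetical.
function modelWrite(doc, ...text) {
  if (doc.isXmlDocument) {
    const error = new Error("write() is not supported on XML documents.");
    error.name = "InvalidStateError";
    throw error;
  }
  if (
    doc.insertionPoint === undefined &&
    (doc.ignoreOpensDuringUnload > 0 || doc.ignoreDestructiveWrites > 0)
  ) {
    return "ignored";
  }
  if (doc.insertionPoint === undefined) {
    // Implicit open(): the user may refuse to let the document be unloaded.
    if (doc.userRefusesUnload) {
      return "aborted";
    }
    doc.inputStream = "";
    doc.insertionPoint = 0; // Just before the end of the empty stream.
    doc.reloadOverrideFlag = true;
    doc.reloadOverrideBuffer = "";
  }
  // Each argument is converted to a string, then all are concatenated.
  const data = text.map(String).join("");
  // Insert just before the insertion point, which stays after the new text.
  doc.inputStream =
    doc.inputStream.slice(0, doc.insertionPoint) +
    data +
    doc.inputStream.slice(doc.insertionPoint);
  doc.insertionPoint += data.length;
  if (doc.reloadOverrideFlag) {
    doc.reloadOverrideBuffer += data;
  }
  return "written";
}
```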
document.writeln()
writeln
(text...)
Adds the given string(s) to the Document
's input
stream, followed by a newline character. If necessary, calls the
open()
method implicitly
first.
This method throws an InvalidStateError
exception
when invoked on XML documents.
The document.writeln(...)
method, when invoked, must act as if the document.write()
method had been
invoked with the same argument(s), plus an extra argument consisting
of a string containing a single line feed character (U+000A).
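Expressed as a non-normative sketch, this is a thin wrapper around the write operation. The name `modelWriteln` and the `write` callback parameter are assumptions made for illustration; the callback stands in for the document.write() steps.

```javascript
// Non-normative sketch. modelWriteln is a hypothetical helper; `write` stands
// in for the document.write() steps and receives the same arguments plus "\n".
function modelWriteln(write, ...text) {
  return write(...text, "\n"); // "\n" is a single line feed (U+000A).
}
```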