This is a work in progress! For the latest updates from the HTML WG, possibly including important bug fixes, please look at the editor's draft instead.
ISSUE-56 (urls-webarch) blocks progress to Last Call
A URL is a string used to identify a resource.
A URL is a valid URL if at least one of the following conditions holds:
A string is a valid non-empty URL if it is a valid URL but it is not the empty string.
A string is a valid URL potentially surrounded by spaces if, after stripping leading and trailing whitespace from it, it is a valid URL.
A string is a valid non-empty URL potentially surrounded by spaces if, after stripping leading and trailing whitespace from it, it is a valid non-empty URL.
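The two "potentially surrounded by spaces" definitions can be sketched as a thin wrapper around whatever validity check is in use. This is a minimal sketch, not a normative algorithm: the `is_valid_url` predicate is a hypothetical parameter (the validity conditions themselves are not reproduced here), and "whitespace" is assumed to mean the HTML space characters U+0020, U+0009, U+000A, U+000C and U+000D.

```python
# Assumption: "whitespace" means the HTML space characters.
HTML_SPACE_CHARACTERS = " \t\n\f\r"


def is_valid_url_potentially_surrounded_by_spaces(s, is_valid_url):
    """Strip leading/trailing whitespace, then apply the validity check.

    `is_valid_url` is a hypothetical caller-supplied predicate.
    """
    return is_valid_url(s.strip(HTML_SPACE_CHARACTERS))


def is_valid_non_empty_url_potentially_surrounded_by_spaces(s, is_valid_url):
    """As above, but the stripped string must also be non-empty."""
    stripped = s.strip(HTML_SPACE_CHARACTERS)
    return stripped != "" and is_valid_url(stripped)


# Toy stand-in predicate for demonstration only; it is NOT the real
# definition of a valid URL.
def toy_is_valid_url(s):
    return " " not in s
```

With the toy predicate, a string consisting only of spaces passes the first check (it strips to the empty string) but fails the non-empty variant.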
To parse a URL url into its component parts, the user agent must use the parse an address algorithm defined by the IRI specification. [RFC3987]
Parsing a URL can fail. If it does not, then it results in the following components, again as defined by the IRI specification:
To resolve a URL to an absolute URL relative to either another absolute URL or an element, the user agent must use the following steps. Resolving a URL can result in an error, in which case the URL is not resolvable.
Let url be the URL being resolved.
Let encoding be determined as follows:
Document, and the URL character encoding is the document's character encoding.
If encoding is a UTF-16 encoding, then change the value of encoding to UTF-8.
Otherwise, let base be the base URI of the element, as defined by the XML Base specification, with the base URI of the document entity being defined as the document base URL of the Document that owns the element. [XMLBASE]
For the purposes of the XML Base specification, user agents must act as if all Document objects represented XML documents.
It is possible for xml:base attributes to be present even in HTML fragments, as such attributes can be added dynamically using script. (Such scripts would not be conforming, however, as xml:base attributes are not allowed in HTML documents.)
Let fallback base url be the document's address.
The document base URL is the result of the previous step if it was successful; otherwise it is fallback base url.
Return the result of applying the resolve an address algorithm defined by the IRI specification to resolve url relative to base using encoding encoding. [RFC3987]
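Two parts of the steps above can be sketched in a few lines: the UTF-16 adjustment of encoding, and the final resolution against base. This is only an illustration under stated assumptions. Python's `urllib.parse.urljoin` is swapped in for the IRI specification's resolve an address algorithm, and it ignores the encoding argument entirely; the function names `effective_encoding` and `resolve` are hypothetical.

```python
import codecs
from urllib.parse import urljoin


def effective_encoding(encoding):
    """Map any UTF-16 variant to UTF-8; leave other encodings alone.

    codecs.lookup() normalizes aliases such as "UTF-16LE" to Python's
    canonical codec names, which all start with "utf-16" for UTF-16.
    """
    name = codecs.lookup(encoding).name
    return "utf-8" if name.startswith("utf-16") else name


def resolve(url, base, encoding="utf-8"):
    """Resolve `url` relative to the absolute URL `base`.

    Stand-in only: urljoin implements RFC 3986 reference resolution,
    not the IRI-based algorithm the text requires, and the computed
    encoding is not used by it.
    """
    _ = effective_encoding(encoding)  # computed for illustration only
    return urljoin(base, url)
```

For instance, `resolve("../c", "http://example.com/a/b")` gives `http://example.com/c` with this stand-in.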
A URL is an absolute URL if resolving it results in the same output regardless of what it is resolved relative to, and that output is not a failure.
An absolute URL is a hierarchical URL if, when resolved and then parsed, there is a character immediately after the <scheme> component and it is a U+002F SOLIDUS character (/).
An absolute URL is an authority-based URL if, when resolved and then parsed, there are two characters immediately after the <scheme> component and they are both U+002F SOLIDUS characters (//).
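The hierarchical and authority-based definitions amount to looking at the characters that follow the scheme. A minimal sketch, assuming the input is already an absolute, resolved URL, and assuming the ":" delimiter after the <scheme> component is skipped before the U+002F SOLIDUS check; the helper names are hypothetical.

```python
import re

# Scheme syntax as in RFC 3986: a letter, then letters, digits, "+", "-", ".".
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def _after_scheme(absolute_url):
    """Return the text following "scheme:", or None if there is no scheme."""
    m = _SCHEME.match(absolute_url)
    return absolute_url[m.end():] if m else None


def is_hierarchical(absolute_url):
    """True if a single "/" immediately follows the scheme delimiter."""
    rest = _after_scheme(absolute_url)
    return rest is not None and rest.startswith("/")


def is_authority_based(absolute_url):
    """True if "//" immediately follows the scheme delimiter."""
    rest = _after_scheme(absolute_url)
    return rest is not None and rest.startswith("//")
```

Every authority-based URL is also hierarchical under these definitions, while something like `mailto:user@example.com` is neither.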
This specification defines the URL about:legacy-compat as a reserved, though unresolvable, about: URI, for use in DOCTYPEs in HTML documents when needed for compatibility with XML tools. [ABOUT]
This specification defines the URL about:srcdoc as a reserved, though unresolvable, about: URI, that is used as the document's address of srcdoc documents. [ABOUT]
The term "URL" in this specification is used in a manner distinct from the precise technical meaning it is given in RFC 3986. Readers familiar with that RFC will find it easier to read this specification if they pretend the term "URL" as used herein is really called something else altogether. This is a willful violation of RFC 3986. [RFC3986]
When an element is affected by a base URL change, it must act as described in the following list:
If the absolute URL identified by the hyperlink is being shown to the user, or if any data derived from that URL is affecting the display, then the href attribute should be re-resolved relative to the element and the UI updated appropriately.
del element with a cite attribute
If the absolute URL identified by the cite attribute is being shown to the user, or if any data derived from that URL is affecting the display, then the URL should be re-resolved relative to the element and the UI updated appropriately.
The element is not directly affected.
Changing the base URL doesn't affect the image displayed by img elements, although subsequent accesses of the src IDL attribute from script will return a new absolute URL that might no longer correspond to the image being shown.
An interface that has a complement of URL decomposition IDL attributes will have seven attributes with the following definitions:
attribute DOMString protocol;
attribute DOMString host;
attribute DOMString hostname;
attribute DOMString port;
attribute DOMString pathname;
attribute DOMString search;
attribute DOMString hash;
protocol[ = value ]
Returns the current scheme of the underlying URL.
Can be set, to change the underlying URL's scheme.
host[ = value ]
Returns the current host and port (if it's not the default port) in the underlying URL.
Can be set, to change the underlying URL's host and port.
The host and the port are separated by a colon. The port part, if omitted, will be assumed to be the current scheme's default port.
hostname[ = value ]
Returns the current host in the underlying URL.
Can be set, to change the underlying URL's host.
port[ = value ]
Returns the current port in the underlying URL.
Can be set, to change the underlying URL's port.
pathname[ = value ]
Returns the current path in the underlying URL.
Can be set, to change the underlying URL's path.
search[ = value ]
Returns the current query component in the underlying URL.
Can be set, to change the underlying URL's query component.
hash[ = value ]
Returns the current fragment identifier in the underlying URL.
Can be set, to change the underlying URL's fragment identifier.
The attributes defined to be URL decomposition IDL attributes must act as described for the attributes with the same corresponding names in this section.
In addition, an interface with a complement of URL decomposition IDL attributes will define an input, which is a URL that the attributes act on, and a common setter action, which is a set of steps invoked when any of the attributes' setters are invoked.
The seven URL decomposition IDL attributes have similar requirements.
On getting, if the input is an absolute URL that fulfills the condition given in the "getter condition" column corresponding to the attribute in the table below, the user agent must return the part of the input URL given in the "component" column, with any prefixes specified in the "prefix" column appropriately added to the start of the string and any suffixes specified in the "suffix" column appropriately added to the end of the string. Otherwise, the attribute must return the empty string.
On setting, the new value must first be mutated as described by the "setter preprocessor" column, then mutated by %-escaping any characters in the new value that are not valid in the relevant component as given by the "component" column. Then, if the input is an absolute URL and the resulting new value fulfills the condition given in the "setter condition" column, the user agent must make a new string output by replacing the component of the URL given by the "component" column in the input URL with the new value; otherwise, the user agent must let output be equal to the input. Finally, the user agent must invoke the common setter action with the value of output.
When replacing a component in the URL, if the component is part of an optional group in the URL syntax consisting of a character followed by the component, the component (including its prefix character) must be included even if the new value is the empty string.
The previous paragraph applies in particular to the ":" before a <port> component, the "?" before a <query> component, and the "#" before a <fragment> component.
For the purposes of the above definitions, URLs must be parsed using the URL parsing rules defined in this specification.
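The rule about keeping a component's prefix character can be illustrated with a sketch of the search setter. Several assumptions apply: the URL is split with the generic regular expression from RFC 3986 Appendix B rather than the URL parsing rules of this specification, the %-escaping step is omitted, and `set_search` is a hypothetical name. The returned string stands for output, which would then be handed to the common setter action.

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
_URI = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def set_search(input_url, new_value):
    """Return `output` for a search setter (sketch; no %-escaping)."""
    m = _URI.match(input_url)
    scheme, authority, path, fragment = m.group(2), m.group(4), m.group(5), m.group(9)

    # Setter condition: input must be a hierarchical URL.
    hierarchical = scheme is not None and (
        authority is not None or path.startswith("/")
    )
    if not hierarchical:
        return input_url  # output is equal to the input

    # Setter preprocessor: remove one leading "?", if any.
    if new_value.startswith("?"):
        new_value = new_value[1:]

    output = scheme + ":"
    if authority is not None:
        output += "//" + authority
    # The "?" prefix is kept even when the new value is the empty string.
    output += path + "?" + new_value
    if fragment is not None:
        output += "#" + fragment
    return output
```

Setting the empty string on `http://example.com/path?old` therefore yields `http://example.com/path?`, with the question mark retained.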
|Attribute||Component||Getter Condition||Prefix||Suffix||Setter Preprocessor||Setter Condition|
|protocol||<scheme>||—||—||U+003A COLON (:)||Remove all trailing U+003A COLON characters (:)||The new value is not the empty string|
|host||<hostport>||input is an authority-based URL||—||—||—||The new value is not the empty string and input is an authority-based URL|
|hostname||<host>||input is an authority-based URL||—||—||Remove all leading U+002F SOLIDUS characters (/)||The new value is not the empty string and input is an authority-based URL|
|port||<port>||input is an authority-based URL, and contained a <port> component (possibly an empty one)||—||—||Remove all characters in the new value from the first that is not in the range U+0030 DIGIT ZERO (0) to U+0039 DIGIT NINE (9), if any. Remove any leading U+0030 DIGIT ZERO characters (0) in the new value. If the resulting string is empty, set it to a single U+0030 DIGIT ZERO character (0).||input is an authority-based URL, and the new value, when interpreted as a base-ten integer, is less than or equal to 65535|
|pathname||<path>||input is a hierarchical URL||—||—||If it has no leading U+002F SOLIDUS character (/), prepend a U+002F SOLIDUS character (/) to the new value||input is hierarchical|
|search||<query>||input is a hierarchical URL, and contained a <query> component (possibly an empty one)||U+003F QUESTION MARK (?)||—||Remove one leading U+003F QUESTION MARK character (?), if any||input is a hierarchical URL|
|hash||<fragment>||input contained a non-empty <fragment> component||U+0023 NUMBER SIGN (#)||—||Remove one leading U+0023 NUMBER SIGN character (#), if any||—|
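The "setter preprocessor" column is pure string manipulation, so it can be sketched directly. The function names below are hypothetical, the host attribute is left out because it has no preprocessor, and the later %-escaping and setter-condition checks (other than the port range check) are not shown.

```python
import re


def preprocess_protocol(value):
    # Remove all trailing U+003A COLON characters.
    return value.rstrip(":")


def preprocess_hostname(value):
    # Remove all leading U+002F SOLIDUS characters.
    return value.lstrip("/")


def preprocess_port(value):
    # Drop everything from the first non-digit onward, then strip
    # leading zeros; an empty result becomes a single "0".
    digits = re.match(r"[0-9]*", value).group(0)
    return digits.lstrip("0") or "0"


def port_setter_value_ok(value):
    # Value part of the port setter condition: at most 65535.
    return int(value) <= 65535


def preprocess_pathname(value):
    # Prepend "/" unless the value already starts with one.
    return value if value.startswith("/") else "/" + value


def preprocess_search(value):
    # Remove one leading U+003F QUESTION MARK, if any.
    return value[1:] if value.startswith("?") else value


def preprocess_hash(value):
    # Remove one leading U+0023 NUMBER SIGN, if any.
    return value[1:] if value.startswith("#") else value
```

Note that the port preprocessor never produces an empty string, so a value such as `abc` is treated as port 0 rather than rejected.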
The table below demonstrates how the getter condition for search yields different results depending on the exact original syntax of the URL:
|Input URL||search value||Explanation|
| ||empty string||No <query> component in input URL.|
| || ||There is a <query> component, but it is empty. The question mark in the resulting value is the prefix.|
| || || The <query> component has the value "|
| || ||The (empty) <fragment> component is not part of the <query> component.|
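The getter conditions for search and hash can be sketched as follows. This is a hedged illustration: it splits the URL with the generic regular expression from RFC 3986 Appendix B instead of the URL parsing rules of this specification, treats any input with a scheme as an absolute URL, and uses hypothetical function names and hypothetical `example.com` inputs.

```python
import re

# Generic URI-splitting regular expression from RFC 3986, Appendix B.
_URI = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def _split(url):
    """Return (scheme, authority, path, query, fragment); absent parts are None."""
    m = _URI.match(url)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def get_search(input_url):
    scheme, authority, path, query, _ = _split(input_url)
    hierarchical = scheme is not None and (
        authority is not None or path.startswith("/")
    )
    # Getter condition: hierarchical, with a <query> component present
    # (an empty one still counts). The "?" is the prefix.
    if hierarchical and query is not None:
        return "?" + query
    return ""


def get_hash(input_url):
    scheme, _, _, _, fragment = _split(input_url)
    # Getter condition: a non-empty <fragment> component. The "#" is the prefix.
    if scheme is not None and fragment:
        return "#" + fragment
    return ""
```

The difference between the two conditions shows up on an input such as `http://example.com/?#`: `get_search` returns `?` because an empty query still counts, while `get_hash` returns the empty string because the fragment must be non-empty.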
The following table is similar; it provides a list of what each of the URL decomposition IDL attributes returns for a given input URL.
|(empty string)||(empty string)|