This specification defines the term URL, and defines various algorithms for dealing with URLs, because for historical reasons the rules defined by the URI and IRI specifications are not a complete description of what HTML user agents need to implement to be compatible with Web content.
The term "URL" in this specification is used in a manner distinct from the precise technical meaning it is given in RFC 3986. Readers familiar with that RFC will find it easier to read this specification if they pretend the term "URL" as used herein is really called something else altogether. This is a willful violation of RFC 3986. [RFC3986]
A URL is a string used to identify a resource.
A URL is a valid URL if at least one of the following conditions holds:
The URL is a valid IRI reference and it has no query component. [RFC3987]
The URL is a valid IRI reference and its query component contains no unescaped non-ASCII characters. [RFC3987]
The URL is a valid IRI reference and the character encoding of the URL's Document is UTF-8 or UTF-16. [RFC3987]
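The three conditions above can be sketched as follows. This is a minimal illustration rather than a normative algorithm: it assumes the string has already been checked to be a valid IRI reference by some other means, it splits components with the regular expression from RFC 3986 Appendix B, and it compares encoding names as plain strings. The function name `query_conditions_hold` and its `document_encoding` parameter are hypothetical.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
_COMPONENTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def query_conditions_hold(iri_reference: str, document_encoding: str) -> bool:
    """Return True if at least one of the three conditions holds.

    Assumption: the caller has already verified that `iri_reference`
    is a valid IRI reference; this sketch does not check that.
    """
    match = _COMPONENTS.match(iri_reference)
    has_query = match.group(6) is not None
    query = match.group(7) or ""

    if not has_query:
        return True  # first condition: no query component
    if query.isascii():
        return True  # second condition: no unescaped non-ASCII characters
    # Third condition: the Document's character encoding is UTF-8 or UTF-16.
    # Assumption: the encoding is given by name; label matching is simplified.
    return document_encoding.lower() in ("utf-8", "utf-16")
```

Under these assumptions, a query containing a literal non-ASCII character is acceptable only when the document is encoded as UTF-8 or UTF-16, while the percent-escaped form is acceptable in any encoding.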
A string is a valid non-empty URL if it is a valid URL but it is not the empty string.
A string is a valid URL potentially surrounded by spaces if, after stripping leading and trailing whitespace from it, it is a valid URL.
A string is a valid non-empty URL potentially surrounded by spaces if, after stripping leading and trailing whitespace from it, it is a valid non-empty URL.
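The "potentially surrounded by spaces" variants can be sketched as a thin wrapper around a validity check. In this illustration the validity check itself is passed in as a hypothetical `is_valid_url` callable, and "whitespace" is assumed to mean the space, tab, line feed, form feed, and carriage return characters; neither detail is fixed by the definitions above.

```python
from typing import Callable

# Assumption: "whitespace" means space, tab, line feed, form feed,
# and carriage return.
SPACE_CHARACTERS = " \t\n\f\r"


def is_valid_url_potentially_surrounded_by_spaces(
    value: str,
    is_valid_url: Callable[[str], bool],
    allow_empty: bool = True,
) -> bool:
    """Strip leading and trailing whitespace, then test the remainder.

    `is_valid_url` is a hypothetical predicate supplied by the caller.
    Pass allow_empty=False for the "valid non-empty URL" variant.
    """
    stripped = value.strip(SPACE_CHARACTERS)
    if not allow_empty and stripped == "":
        return False
    return is_valid_url(stripped)
```

A caller would supply its own predicate, for example `is_valid_url_potentially_surrounded_by_spaces(" http://example.com/ ", my_check)`, where `my_check` stands in for a real validity test.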
This specification defines the URL about:legacy-compat as a reserved, though unresolvable, about: URI, for use in DOCTYPEs in HTML documents when needed for compatibility with XML tools. [ABOUT]
This specification defines the URL about:srcdoc as a reserved, though unresolvable, about: URI, that is used as the document's address of iframe srcdoc documents. [ABOUT]
Resolving a URL is the process of taking a relative URL and obtaining the absolute URL that it implies.
A URL is an absolute URL if resolving it results in the same output regardless of what it is resolved relative to, and that output is not a failure.
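As a rough illustration of resolving and of the "same output regardless of base" property, the sketch below uses Python's `urllib.parse.urljoin`. That function follows RFC 3986 style resolution and is not claimed to match the algorithm of this specification; the base URLs are arbitrary examples, the helper `resolves_identically` is hypothetical, and it only approximates the definition by trying two bases and does not model failure.

```python
from urllib.parse import urljoin

# Two arbitrary example bases with different schemes, hosts, and paths.
EXAMPLE_BASES = (
    "http://example.com/a/b",
    "https://other.example/x/y?z",
)


def resolves_identically(url: str, bases=EXAMPLE_BASES) -> bool:
    """Approximate the absolute-URL test: resolve against each base
    and check that every result is the same string."""
    results = {urljoin(base, url) for base in bases}
    return len(results) == 1


# Resolving a relative URL against a base yields the absolute URL it implies.
print(urljoin("http://example.com/a/b", "../c"))  # http://example.com/c
```

With these example bases, `resolves_identically("https://example.org/path")` is true, while `resolves_identically("../c")` is false because the result depends on the base.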
An absolute URL is a hierarchical URL if, when resolved and then parsed, there is a character immediately after the <scheme> component and it is a U+002F SOLIDUS character (/).
An absolute URL is an authority-based URL if, when resolved and then parsed, there are two characters immediately after the <scheme> component and they are both U+002F SOLIDUS characters (//).
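The two definitions above amount to inspecting the characters that follow the scheme. The sketch below assumes its input is already a resolved absolute URL, uses the scheme syntax from RFC 3986, and interprets "immediately after the <scheme> component" as immediately after the colon that terminates the scheme; that interpretation and the function names are assumptions.

```python
import re

# Scheme syntax from RFC 3986: a letter followed by letters, digits,
# "+", "-", or ".", terminated here by the ":" delimiter.
_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*:")


def is_hierarchical(absolute_url: str) -> bool:
    """True if a "/" immediately follows the scheme and its colon."""
    match = _SCHEME.match(absolute_url)
    return match is not None and absolute_url[match.end():].startswith("/")


def is_authority_based(absolute_url: str) -> bool:
    """True if "//" immediately follows the scheme and its colon."""
    match = _SCHEME.match(absolute_url)
    return match is not None and absolute_url[match.end():].startswith("//")
```

Under this reading every authority-based URL is also hierarchical, whereas a URL such as `mailto:user@example.com` is neither.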
An interface that has a complement of URL decomposition IDL attributes has seven attributes with the following definitions:
    attribute DOMString protocol;
    attribute DOMString host;
    attribute DOMString hostname;
    attribute DOMString port;
    attribute DOMString pathname;
    attribute DOMString search;
    attribute DOMString hash;
protocol [ = value ]
Returns the current scheme of the underlying URL.
Can be set, to change the underlying URL's scheme.
host [ = value ]
Returns the current host and port (if it's not the default port) in the underlying URL.
Can be set, to change the underlying URL's host and port.
The host and the port are separated by a colon. The port part, if omitted, will be assumed to be the current scheme's default port.
hostname [ = value ]
Returns the current host in the underlying URL.
Can be set, to change the underlying URL's host.
port [ = value ]
Returns the current port in the underlying URL.
Can be set, to change the underlying URL's port.
pathname [ = value ]
Returns the current path in the underlying URL.
Can be set, to change the underlying URL's path.
search [ = value ]
Returns the current query component in the underlying URL.
Can be set, to change the underlying URL's query component.
hash [ = value ]
Returns the current fragment identifier in the underlying URL.
Can be set, to change the underlying URL's fragment identifier.
The table below demonstrates how the getter for search returns different values depending on the exact original syntax of the URL:
Input URL | search value | Explanation |
---|---|---|
http://example.com/ | empty string | No <query> component in input URL. |
http://example.com/? | ? | There is a <query> component, but it is empty. |
http://example.com/?test | ?test | The <query> component has the value "test". |
http://example.com/?test# | ?test | The (empty) <fragment> component is not part of the <query> component. |
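The behaviour in the table above can be reproduced with a few lines of string handling. This is only a sketch of the getter's observable results for these inputs, not the parsing algorithm of this specification; the function name `search_value` is hypothetical.

```python
def search_value(url: str) -> str:
    """Return the search value for a URL string, as in the table above."""
    # The fragment starts at the first "#" and is never part of the query.
    without_fragment = url.split("#", 1)[0]
    if "?" not in without_fragment:
        return ""  # no query component at all
    # A query component exists, possibly empty; keep the leading "?".
    return "?" + without_fragment.split("?", 1)[1]
```

The distinction between an absent query and an empty one is preserved: `search_value("http://example.com/")` gives the empty string, while `search_value("http://example.com/?")` gives `"?"`.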
The following table is similar; it provides a list of what each of the URL decomposition IDL attributes returns for a given input URL.
Input | protocol | host | hostname | port | pathname | search | hash |
---|---|---|---|---|---|---|---|
http://example.com/carrot#question%3f | http: | example.com | example.com | (empty string) | /carrot | (empty string) | #question%3f |
https://www.example.com:4443? | https: | www.example.com:4443 | www.example.com | 4443 | / | ? | (empty string) |
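A sketch that reproduces both rows of the table above is given below. It splits the input with the regular expression from RFC 3986 Appendix B and is not the parsing algorithm of this specification. The function name `decompose`, the small default-port table, the choice to report an empty path as "/", and the choice to report an empty fragment as the empty string are all assumptions made for illustration.

```python
import re

# Component-splitting regular expression from RFC 3986, Appendix B.
_COMPONENTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)

# Assumption: a minimal default-port table, for illustration only.
_DEFAULT_PORTS = {"http": "80", "https": "443"}


def decompose(url: str) -> dict:
    """Return the seven decomposition values for an absolute URL string."""
    match = _COMPONENTS.match(url)
    scheme = match.group(2) or ""
    authority = match.group(4) or ""
    path = match.group(5)
    query = match.group(7)     # None when the URL has no "?" at all
    fragment = match.group(9)  # None when the URL has no "#" at all

    # Drop any userinfo, then split the host from an explicit port.
    hostport = authority.rsplit("@", 1)[-1]
    hostname, port = hostport, ""
    head, sep, tail = hostport.rpartition(":")
    if sep and (tail == "" or tail.isdigit()):
        hostname, port = head, tail

    # host includes the port only when it is not the scheme's default.
    show_port = port != "" and port != _DEFAULT_PORTS.get(scheme.lower())
    return {
        "protocol": scheme + ":",
        "host": hostname + ":" + port if show_port else hostname,
        "hostname": hostname,
        "port": port,
        "pathname": path or "/",  # assumption: empty path reads as "/"
        "search": "" if query is None else "?" + query,
        "hash": "#" + fragment if fragment else "",  # assumption for empty
    }
```

For example, `decompose("https://www.example.com:4443?")["search"]` is `"?"` and its `"pathname"` is `"/"`, matching the second row.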