A vocabulary and associated APIs for HTML and XHTML
text/html
This registration has been filed successfully with IANA.
charset
The charset parameter may be provided to definitively specify the document's character encoding, overriding any character encoding declarations in the document. The parameter's value must be one of the labels of the character encoding used to serialize the file. [ENCODING]
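As an illustration of the override described above, the following is a minimal sketch that assumes a consumer has both the Content-Type header value and an encoding label declared inside the document. The function names charset_from_content_type and effective_encoding are hypothetical, and the sketch does not validate labels against [ENCODING] or model any other step of encoding detection.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = header_value
    charset = msg.get_param("charset")
    # get_param returns None when the parameter is absent, and a tuple for
    # RFC 2231 encoded values, which this sketch deliberately ignores.
    return charset.lower() if isinstance(charset, str) else None


def effective_encoding(content_type: str, declared_in_document: Optional[str]) -> Optional[str]:
    """Prefer the charset parameter over an in-document declaration."""
    return charset_from_content_type(content_type) or declared_in_document


print(effective_encoding("text/html; charset=windows-1252", "utf-8"))
```

In this sketch the in-document declaration is only consulted when the header carries no charset parameter.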
Entire novels have been written about the security considerations that apply to HTML documents. Many are listed in this document, to which the reader is referred for more details. Some general concerns bear mentioning here, however:
HTML is a scripted language, and has a large number of APIs (some of which are described in this document). Script can expose the user to potential risks of information leakage, credential leakage, cross-site scripting attacks, cross-site request forgeries, and a host of other problems. While the designs in this specification are intended to be safe if implemented correctly, a full implementation is a massive undertaking and, as with any software, user agents are likely to have security bugs.
Even without scripting, there are specific features in HTML which, for historical reasons, are required for broad compatibility with legacy content but that expose the user to unfortunate security problems. In particular, the img element can be used in conjunction with some other features as a way to effect a port scan from the user's location on the Internet. This can expose local network topologies that the attacker would otherwise not be able to determine.
HTML relies on a compartmentalization scheme sometimes known as the same-origin policy. An origin in most cases consists of all the pages served from the same host, on the same port, using the same protocol.
It is critical, therefore, to ensure that any untrusted content that forms part of a site be hosted on a different origin than any sensitive content on that site. Untrusted content can easily spoof any other page on the same origin, read data from that origin, cause scripts in that origin to execute, submit forms to and from that origin even if they are protected from cross-site request forgery attacks by unique tokens, and make use of any third-party resources exposed to or rights granted to that origin.
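The common case of the origin comparison described above can be sketched as a comparison of (protocol, host, port) tuples. This is a simplified illustration, not the full origin algorithm: the helper names origin and same_origin, the DEFAULT_PORTS table, and the example host names are assumptions made for the sketch.

```python
from urllib.parse import urlsplit

# Assumed default ports for the two schemes this sketch handles.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return a (scheme, host, port) tuple approximating the URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # urlsplit already lowercases the host name
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, host, port)


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share protocol, host, and port."""
    return origin(a) == origin(b)


# Untrusted content served from a separate host falls in a different origin.
print(same_origin("https://example.com/app", "https://usercontent.example.com/page"))
```

Hosting untrusted content on a separate host name, as in the last line, is one way to keep it out of the origin that holds sensitive content.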
multipart/x-mixed-replace
This registration has been filed successfully with IANA.
boundary (defined in RFC2046) [RFC2046]
Subresources of a multipart/x-mixed-replace resource can be of any type, including types with non-trivial security implications such as text/html.
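To make the role of the boundary parameter concrete, the sketch below splits a multipart body on its boundary delimiter and reports the declared type of each part. It is a naive illustration under stated assumptions: the function name part_types and the boundary value "frame" are hypothetical, parts are assumed to use CRLF line endings and to carry at least one header, and a real parser would have to follow RFC 2046 in full.

```python
from typing import List, Optional, Tuple


def part_types(body: bytes, boundary: str) -> List[Tuple[Optional[str], bytes]]:
    """Return (media type, payload) for each part of a multipart body."""
    delimiter = b"--" + boundary.encode("ascii")
    found = []
    # Everything before the first delimiter is preamble and is skipped.
    for chunk in body.split(delimiter)[1:]:
        if chunk.startswith(b"--"):
            break  # closing delimiter: no further parts
        if chunk.startswith(b"\r\n"):
            chunk = chunk[2:]  # drop the line break that ends the delimiter line
        head, _, payload = chunk.partition(b"\r\n\r\n")
        media_type = None
        for line in head.split(b"\r\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-type":
                media_type = value.decode("ascii").split(";")[0].strip().lower()
        if payload.endswith(b"\r\n"):
            payload = payload[:-2]  # this line break belongs to the next delimiter
        found.append((media_type, payload))
    return found


body = (
    b"--frame\r\nContent-Type: text/plain\r\n\r\nhello\r\n"
    b"--frame\r\nContent-Type: text/html\r\n\r\n<p>hi</p>\r\n"
    b"--frame--\r\n"
)
print(part_types(body, "frame"))
```

Because any part may declare text/html, a consumer has to apply to each part the same care it would apply to a standalone resource of that type.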
application/xhtml+xml
This registration has been filed with IANA and is currently in expert review.
"xhtml" and "xht" are sometimes used as extensions for XML resources that have a root element from the HTML namespace.
application/x-www-form-urlencoded
This registration has been filed successfully with IANA.
In isolation, an application/x-www-form-urlencoded payload poses no security risks. However, as this type is usually used as part of a form submission, all the risks that apply to HTML forms need to be considered in the context of this type.
These risks fall into multiple categories that pertain to the same-origin policy and cross-origin reach of HTML forms (http://www.w3.org/TR/cors/#security), general HTML application threats (http://www.w3.org/TR/html/single-page.html#writing-secure-applications-with-html), and not relying on client-side validation for anything other than user feedback (http://www.w3.org/TR/html/single-page.html#security-forms).
application/x-www-form-urlencoded payloads are defined in this specification.
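The point about client-side validation can be illustrated with a short sketch: a server decodes the payload itself and re-checks every field, since nothing guarantees the payload came from the form that was served. The field names "name" and "age", the accepted age range, and the helper names parse_form and validate_age are all hypothetical.

```python
from urllib.parse import parse_qs, urlencode


def parse_form(payload: str) -> dict:
    """Decode an application/x-www-form-urlencoded payload into lists of values."""
    return parse_qs(payload, keep_blank_values=True)


def validate_age(fields: dict) -> int:
    """Re-validate a field on the server, whatever the client claims to have checked."""
    values = fields.get("age", [])
    if len(values) != 1 or not (values[0].isascii() and values[0].isdigit()):
        raise ValueError("age must be a single non-negative integer")
    age = int(values[0])
    if age > 150:  # assumed upper bound, for illustration only
        raise ValueError("age out of range")
    return age


payload = urlencode({"name": "Ada Lovelace", "age": "36"})
print(payload)
print(validate_age(parse_form(payload)))
```

A payload such as "age=-1" or one that repeats the age field is rejected here even if the originating form would never have produced it.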
text/cache-manifest
This registration has been filed successfully with IANA.
charset
The charset parameter may be provided. The parameter's value must be "utf-8". This parameter serves no purpose; it is only allowed for compatibility with legacy servers.
Cache manifests themselves contain no executable content and pose no immediate risk unless sensitive information is included within the manifest.
Implementations, however, are required to follow specific rules when populating a cache based on a cache manifest, to ensure that certain origin-based restrictions are honoured. Failure to correctly implement these rules can result in information leakage, cross-site scripting attacks, and the like.
Caching mechanisms are typically subject to poisoning attacks, and the one that this file type supports is no exception. The published specification includes steps intended to mitigate such issues (notably non-malicious cache poisoning from captive portals), but implementers are advised to exercise caution in caching.
Additionally, the permanence of this caching mechanism requires care to be taken with respect to users' privacy (http://www.w3.org/TR/html/single-page.html#expiring-application-caches) and storage resources (http://www.w3.org/TR/html/single-page.html#disk-space).
web+ scheme prefix
This section describes a convention for use with the IANA URI scheme registry. It does not itself register a specific scheme. [RFC4395]
"web+" followed by one or more letters in the range a-z.
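The naming rule above maps directly onto a regular expression. The sketch below is a minimal check, assuming the scheme name has already been separated from the rest of the URL; the function name is_web_plus_scheme is hypothetical, and treating uppercase input as non-matching is an assumption of this sketch rather than something stated here.

```python
import re

# "web+" followed by one or more letters in the range a-z.
WEB_PLUS_SCHEME = re.compile(r"web\+[a-z]+")


def is_web_plus_scheme(scheme: str) -> bool:
    """True when the whole scheme name follows the web+ convention."""
    return WEB_PLUS_SCHEME.fullmatch(scheme) is not None


for candidate in ("web+example", "web+", "web+abc1", "mailto"):
    print(candidate, is_web_plus_scheme(candidate))
```

Using fullmatch rather than match ensures that trailing characters such as digits cause the name to be rejected.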
"web+" schemes should use UTF-8 encodings where relevant.
"web+" schemes. As such, these schemes must not be used for features intended to be core platform features (e.g. network transfer protocols like HTTP or FTP). Similarly, such schemes must not store confidential information in their URLs, such as usernames, passwords, personal information, or confidential project names.