Abstract

This specification defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

A list of changes to this document may be found at https://github.com/w3c/webappsec.

This document was published by the Web Application Security Working Group as a First Public Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-webappsec@w3.org (subscribe, archives) with [Integrity] at the start of your email's subject. All comments are welcome.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

1. Introduction

This section is non-normative.

Sites and applications on the web are rarely composed of resources from only a single origin. Authors pull scripts, images, fonts, etc. from a wide variety of services and content delivery networks, and must trust that the delivered representation is, in fact, what they expected to load. If an attacker can trick a user into downloading content from a hostile server (via DNS poisoning, or other such means), the author has no recourse. Likewise, an attacker who can replace the file on the CDN server has the ability to inject arbitrary content.

Delivering resources over a secure channel mitigates some of this risk: with TLS, HSTS, and pinned public keys, a user agent can be fairly certain that it is indeed speaking with the server it believes it’s talking to. These mechanisms, however, authenticate only the server, not the content. An attacker (or admin!) with access to the server can manipulate content with impunity. Ideally, authors would not only be able to pin the keys of a server, but also pin the content, ensuring that an exact representation of a resource, and only that representation, loads and executes.

This document specifies such a validation scheme, extending several HTML elements with an integrity attribute that contains a cryptographic hash of the representation of the resource the author expects to load. For instance, an author may wish to load jQuery from a shared server rather than hosting it on their own origin. Specifying that the expected SHA-256 hash of https://code.jquery.com/jquery-1.10.2.min.js is C6CB9UYIS9UJeqinPHWTHVqh_E1uhG5Twh-Y5qFQmYg means that the user agent can verify that the data it loads from that URL matches that expected hash before executing the JavaScript it contains. This integrity verification significantly reduces the risk that an attacker can substitute malicious content.

This example can be communicated to a user agent by adding the hash to a script element, like so:

Example 1
<script src="https://code.jquery.com/jquery-1.10.2.min.js"
        integrity="ni:///sha-256;C6CB9UYIS9UJeqinPHWTHVqh_E1uhG5Twh-Y5qFQmYg?ct=application/javascript"></script>

Scripts, of course, are not the only resource type which would benefit from integrity validation. The scheme specified here applies to all HTML elements which trigger fetches, as well as to fetches triggered from CSS and JavaScript.

Moreover, integrity metadata may also be useful for purposes other than validation. User agents may decide to use the integrity metadata as an identifier in a local cache, for instance, meaning that common resources (for example, JavaScript libraries) could be cached and retrieved once, regardless of the URL from which they are loaded.

1.1 Goals

  1. Compromise of the third-party service should not automatically mean compromise of every site which includes its scripts. Content authors will have a mechanism by which they can specify expectations for content they load, meaning for example that they could load a specific script, and not any script that happens to have a particular URL.

  2. The verification mechanism should extend to all resource types that a page may fetch in the course of its execution and rendering. Active content (scripts, style, iframe contents, etc.) is, of course, critical, but inactive content such as images and fonts will also be covered.

  3. The verification mechanism should have reporting functionality which would inform the author that an invalid resource was downloaded. Further, it should be possible for an author to choose to run only the reporting functionality, allowing potentially corrupt resources to run on her site, but flagging violations for manual review.

  4. The metadata provided for verification may enable improvements to user agents’ caching schemes: common resources such as JavaScript libraries can be downloaded once, and only once, even if multiple instances with distinct URLs are requested.

  5. (potentially) Relax mixed-content warnings for resources whose integrity is verified. If the integrity metadata for a resource is delivered over a secure channel, the user agent might choose to allow loading the resource over an insecure channel.

  6. (potentially) Allow resources to be downloaded from non-canonical sources (for instance, over an insecure channel) for performance, but fall back to a canonical source if the non-canonical source fails an integrity check.

Issue 1

I’m not sure about #5 and #6. Get more detail from the WG about the benefits that such a fallback system would enable. (mkwst)

1.2 Use Cases/Examples

1.2.1 Resource Integrity

  • An author wants to include JavaScript provided by a third-party analytics service on her site. She wants, however, to ensure that only the code she’s carefully reviewed is executed. She can do so by generating integrity metadata for the script she’s planning on including, and adding it to the script element she includes on her page:

    Example 2
    <script src="https://analytics-r-us.com/v1.0/include.js"
            integrity="ni:///sha-256;SDfwewFAE...wefjijfE?ct=application/javascript"></script>
  • An advertising network wishes to ensure that advertisements delivered via third-party servers match the code which they reviewed in order to reduce the risk of accidental or malicious substitution of unreviewed content. By adding integrity metadata to the iframe element wrapping the advertisement, they can ensure that the third-party server delivers only the agreed-upon content.

    Example 3
    <iframe src="https://awesome-ads.com/advertisement1.html"
            integrity="ni:///sha-256;kasfdsaffs...eoirW-e?ct=text/html"></iframe>
  • A user agent wishes to ensure that pieces of its UI which are rendered via HTML (for example, Chrome’s New Tab Page) aren’t manipulated before display. Integrity metadata mitigates the risk that altered JavaScript will run in these pages’ high-privilege context.

  • The author of a mash-up wants to make sure her creation remains in a working state. Adding integrity metadata to external subresources defines an expected revision of the included files. The author can then use the reporting functionality to be notified of changes to the included resources.

1.2.2 Downloads

  • A software distribution service wants to ensure that files are correctly downloaded. It can do so by adding integrity metadata to the a elements which users click on to trigger a download:

    Example 4
    <a href="https://software-is-nice.com/awesome.exe"
       integrity="ni:///sha-256;fkfrewFRFEFHJR...wfjfrErw?ct=application/octet-stream"
       download>...</a>

1.2.3 Fallback

  • An author wishes to load a resource over an insecure channel for performance reasons, but fall back to a secure channel if the insecurely-loaded resource is manipulated. She can do this by adding integrity metadata and a non-canonical source to the script element:

    Example 5
    <script src="https://rockin-resources.com/script.js"
            noncanonical-src="http://insecurity-is-inherent.net/script.js"
            integrity="ni:///sha-256;asijfiqu4t12...woeji3W?ct=application/javascript"></script>

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MUST, MUST NOT, REQUIRED, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this specification are to be interpreted as described in [RFC2119].

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

2.1 Key Concepts and Terminology

This section defines several terms used throughout the document.

The term digest refers to the base64url-encoded result of executing a cryptographic hash function on an arbitrary block of data.

A secure channel is any communication mechanism that the user agent has defined as “secure” (typically limited to HTTP over Transport Layer Security (TLS) [RFC2818]).

An insecure channel is any communication mechanism other than those the user agent has defined as “secure”.

The term origin is defined in the Origin specification. [RFC6454]

The MIME type of a resource is a technical hint about the use and format of that resource. [MIMETYPE]

The entity body, transfer encoding, content encoding and message body of a resource are defined by the HTTP 1.1 specification, section 7.2. [HTTP11]

A base64url encoding is defined in RFC 4648, section 5. In a nutshell, it replaces the U+002B PLUS SIGN (+) and U+002F SOLIDUS (/) characters in normal base64 encoding with the U+002D HYPHEN-MINUS (-) and U+005F LOW LINE (_) characters, respectively. [RFC4648]
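
As a brief illustration of the substitution described above, the following sketch uses Python's standard library to encode the same data with both alphabets. The two input bytes are an arbitrary choice, picked only so that both substituted characters appear in the output.

```python
import base64

# Two bytes chosen so that the standard encoding contains both '+' and '/'.
data = bytes([0xFB, 0xFF])

standard = base64.b64encode(data).decode("ascii")
url_safe = base64.urlsafe_b64encode(data).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```

Note that both encodings still carry the trailing "=" padding; this specification strips it separately when computing a digest.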

The Augmented Backus-Naur Form (ABNF) notation used in this document is specified in RFC 5234. [ABNF]

SHA-256, SHA-384, and SHA-512 are part of the SHA-2 set of cryptographic hash functions defined by the NIST in “Descriptions of SHA-256, SHA-384, and SHA-512”.

3. Framework

The integrity verification mechanism specified here boils down to the process of generating a sufficiently strong cryptographic digest for a resource, and transmitting that digest to a user agent so that it may be used when fetching the resource.

3.1 Integrity metadata

To verify the integrity of a resource, a user agent requires integrity metadata, which consists of the following pieces of information:

  • a cryptographic hash function
  • a digest
  • the resource’s MIME type

The hash function and digest MUST be provided in order to validate a resource’s integrity. The MIME type SHOULD be provided, as it mitigates the risk of certain attack vectors (see MIME Type confusion in this document’s Security Considerations section).

This metadata is generally encoded as a “named information” (ni) URI, as defined in RFC6920. [RFC6920]

For example, given a resource containing only the string “Hello, world.”, an author might choose SHA-256 as a hash function. -MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8 is the base64url-encoded digest that results. This can be encoded as an ni URI as follows:

Example 6
ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8

Or, if the author further wishes to specify the content type (text/plain):

Example 7
ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8?ct=text/plain
Note

Digests may be generated using any number of utilities. OpenSSL, for example, is quite commonly available. The example in this section is the result of the following command line:

echo -n "Hello, world." | openssl dgst -sha256 -binary | openssl enc -base64 | sed -e 's/+/-/g' -e 's/\//_/g' -e 's/=*$//'
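
The same steps can be sketched without a shell. The following Python fragment is a minimal illustration, not part of this specification: the helper name ni_uri and its parameters are hypothetical, and it always uses SHA-256. Run against the string from the command line above, it is expected to produce the digest shown in Example 6.

```python
import base64
import hashlib


def ni_uri(data: bytes, content_type: str = "") -> str:
    """Build an ni URI carrying a SHA-256 digest of data (hypothetical helper)."""
    raw = hashlib.sha256(data).digest()
    # base64url-encode the raw digest, then drop the trailing '=' padding.
    digest = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    uri = "ni:///sha-256;" + digest
    if content_type:
        # The optional MIME type travels in the ct query string parameter.
        uri += "?ct=" + content_type
    return uri


print(ni_uri(b"Hello, world."))
print(ni_uri(b"Hello, world.", "text/plain"))
```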

3.2 Cryptographic hash functions

Conformant user agents MUST support the SHA-256 and SHA-512 cryptographic hash functions for use as part of a resource’s integrity metadata.

3.3 Resource verification algorithms

3.3.1 Apply algorithm to resource

  1. If algorithm is not a hash function recognized and supported by the user agent, return null.
  2. Let result be the result of applying algorithm to the content of the entity body of resource, including any content coding that has been applied, but not including any transfer encoding applied to the message body.
  3. Let encodedResult be the result of base64url-encoding result.
  4. Strip any trailing U+003D EQUALS SIGN (=) characters from encodedResult.
  5. Return encodedResult.
Issue 2

Step 2 is pulled from the content-md5 definition in [HTTP11]. It’s unclear that it’s what we want. See bzbarsky’s WG post on this topic.

3.3.2 Is resource eligible for integrity validation

In order to mitigate an attacker’s ability to read data cross-origin by brute-forcing values via integrity checks, resources are only eligible for such checks if they are same-origin, publicly cacheable, or are the result of explicit access granted to the loading origin via CORS. [CORS]

Certain HTTP headers can also change the way the resource behaves in ways which integrity checking cannot account for. If the resource contains these headers, it is ineligible for integrity validation:

  • WWW-Authenticate hides resources behind a login; such non-public resources are excluded from integrity checks.
  • Refresh can cause iframe contents to transparently redirect to an unintended target, bypassing the integrity check.
Issue 3

Consider the impact of other headers: Content-Length, Content-Range, etc. Is there danger there?

The following algorithm details these restrictions:

  1. Let request be the request that fetched resource.
  2. If resource contains any of the following HTTP headers, return false:
    • WWW-Authenticate
    • Refresh
  3. If the mode of request is CORS, return true.
  4. If the origin of request is resource’s origin, return true.
  5. If resource is cacheable by a shared cache, as defined in [HTTP11], return true.
  6. Return false.
Note

Step 3 returns true if the resource was a CORS-enabled request. If the resource failed the CORS checks, it won’t be available to us for integrity checking because it won’t have loaded successfully.
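
The eligibility algorithm above can be sketched as follows. This is an illustration under stated assumptions, not a normative implementation: the Request and Resource classes, their field names, and the precomputed shared_cacheable flag are hypothetical stand-ins for state that a user agent would derive from the fetch itself.

```python
from dataclasses import dataclass, field


@dataclass
class Request:
    origin: str
    mode: str  # assumed to be "CORS" for CORS-enabled requests


@dataclass
class Resource:
    origin: str
    headers: dict = field(default_factory=dict)
    # Assumption: cacheability by a shared cache has already been decided.
    shared_cacheable: bool = False


# Headers that make a resource ineligible (compared case-insensitively).
BLOCKING_HEADERS = {"www-authenticate", "refresh"}


def is_eligible(request: Request, resource: Resource) -> bool:
    names = {name.lower() for name in resource.headers}
    if names & BLOCKING_HEADERS:             # step 2
        return False
    if request.mode == "CORS":               # step 3
        return True
    if request.origin == resource.origin:    # step 4
        return True
    if resource.shared_cacheable:            # step 5
        return True
    return False                             # step 6
```

Because the header check runs first, a same-origin or CORS-enabled resource that carries a Refresh header is still rejected.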

3.3.3 Does resource match metadata?

  1. If metadata is the empty string, return true.
  2. If resource’s URL’s scheme is about, return true.
  3. If metadata is not a valid “named information” (ni) URI, return false.
  4. If resource is not eligible for integrity validation, return false.
  5. Let algorithm be the alg component of metadata.
  6. Let expectedValue be the val component of metadata.
  7. Let expectedType be the value of metadata’s ct query string parameter.
  8. If expectedType is not the empty string, and is not a case-insensitive match for resource’s MIME type, return false.
  9. Let actualValue be the result of applying algorithm to resource.
  10. If actualValue is null, return false.
  11. If actualValue is a case-sensitive match for expectedValue, return true. Otherwise, return false.
Note

If expectedType is the empty string in #7, it would be reasonable for the user agent to warn the page’s author about the dangers of MIME type confusion attacks via its developer console.
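
The matching steps above, together with the “apply algorithm to resource” steps, can be sketched as below. This is a simplified illustration: parse_ni is a hypothetical parser that accepts only the empty-authority ni:///alg;val form used in this document's examples rather than the full RFC 6920 grammar, and the url_scheme and eligible parameters are assumed to be supplied by the caller.

```python
import base64
import hashlib

# Hash functions this sketch supports, keyed by their ni algorithm names.
SUPPORTED = {"sha-256": hashlib.sha256, "sha-512": hashlib.sha512}


def apply_algorithm(algorithm: str, body: bytes):
    """Return the unpadded base64url digest, or None if unsupported."""
    hash_fn = SUPPORTED.get(algorithm)
    if hash_fn is None:
        return None
    encoded = base64.urlsafe_b64encode(hash_fn(body).digest()).decode("ascii")
    return encoded.rstrip("=")


def parse_ni(metadata: str):
    """Split 'ni:///alg;val?ct=type' into (alg, val, ct), or return None."""
    prefix = "ni:///"
    if not metadata.startswith(prefix):
        return None
    rest, _, query = metadata[len(prefix):].partition("?")
    alg, sep, val = rest.partition(";")
    if not (sep and alg and val):
        return None
    ct = ""
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        if key == "ct":
            ct = value
    return alg, val, ct


def matches(metadata: str, body: bytes, mime_type: str,
            url_scheme: str = "https", eligible: bool = True) -> bool:
    if metadata == "":                        # step 1
        return True
    if url_scheme == "about":                 # step 2
        return True
    parsed = parse_ni(metadata)
    if parsed is None:                        # step 3
        return False
    if not eligible:                          # step 4
        return False
    algorithm, expected_value, expected_type = parsed   # steps 5-7
    if expected_type and expected_type.lower() != mime_type.lower():
        return False                          # step 8
    actual_value = apply_algorithm(algorithm, body)     # step 9
    if actual_value is None:                  # step 10
        return False
    return actual_value == expected_value     # step 11 (case-sensitive)
```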

3.4 Modifications to Fetch

The Fetch specification should contain the following modifications in order to enable the rest of this specification’s work:

  1. The following text should be added to section 2.2: “A request has an associated integrity metadata. Unless stated otherwise, a request’s integrity metadata is the empty string.”

  2. The following text should be added to section 2.3: “A response has an associated integrity state, which is one of indeterminate, pending, corrupt, and intact. Unless stated otherwise, it is indeterminate.”

  3. Perform the following steps before executing both the “basic fetch” and “CORS fetch with preflight” algorithms:

    1. If request’s integrity metadata is the empty string, set response’s integrity state to indeterminate. Otherwise:

      1. Set response’s integrity state to pending.
      2. Include a Cache-Control header whose value is “no-transform”.
      3. If request’s integrity metadata contains a content type:
        1. Set request’s Accept header value to the value of request’s integrity metadata’s content type.
  4. Add the following step before step #1 of the handling of 401 status codes for both “basic fetch” and “CORS fetch with preflight” algorithms:

    1. If response’s integrity state is pending, set response’s integrity state to corrupt and return response.
  5. Before firing the process request end-of-file event for any request:

    1. If the request’s integrity metadata is the empty string, set the response’s integrity state to indeterminate and skip directly to firing the event.

    2. If response matches the request’s integrity metadata, set the response’s integrity state to intact and skip directly to firing the event.

    3. Set the response’s integrity state to corrupt and skip directly to firing the event.
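
The integrity state transitions described by these modifications can be summarised in a small sketch. The function names are hypothetical, and the response_matches flag is assumed to be the outcome of the “does resource match metadata?” algorithm.

```python
from enum import Enum


class IntegrityState(Enum):
    INDETERMINATE = "indeterminate"
    PENDING = "pending"
    CORRUPT = "corrupt"
    INTACT = "intact"


def state_before_fetch(metadata: str) -> IntegrityState:
    # No metadata means nothing to verify; otherwise verification is pending.
    if metadata == "":
        return IntegrityState.INDETERMINATE
    return IntegrityState.PENDING


def state_on_401(state: IntegrityState) -> IntegrityState:
    # A pending check is marked corrupt rather than prompting for credentials.
    if state is IntegrityState.PENDING:
        return IntegrityState.CORRUPT
    return state


def state_at_end_of_file(metadata: str, response_matches: bool) -> IntegrityState:
    if metadata == "":
        return IntegrityState.INDETERMINATE
    if response_matches:
        return IntegrityState.INTACT
    return IntegrityState.CORRUPT
```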

3.5 Verification of HTML document subresources

A variety of HTML elements result in requests for resources that are to be embedded into the document, or executed in its context. To support integrity metadata for each of these, and new elements that are added in the future, a new integrity attribute is added to the list of content attributes for the a, audio, embed, iframe, img, link, object, script, source, track, and video elements.

A corresponding integrity IDL attribute, which reflects the value of each element’s integrity content attribute, is added to the HTMLAnchorElement, HTMLMediaElement, HTMLEmbedElement, HTMLIFrameElement, HTMLImageElement, HTMLLinkElement, HTMLObjectElement, HTMLScriptElement, HTMLSourceElement, and HTMLTrackElement interfaces.

3.5.1 The integrity attribute

The integrity attribute represents integrity metadata for an element. The value of the attribute MUST be either the empty string, or one valid “named information” (ni) URI [RFC6920], as described by the following ABNF grammar:

integrity-metadata = "" / ( 1#( *WSP NI-URL ) *WSP )

The NI-URL rule is defined in RFC6920, section 3, figure 4.

The integrity IDL attribute MUST reflect the integrity content attribute.

Issue 4

We should consider supporting multiple ni URLs, which could allow migration between algorithms.

3.5.2 The noncanonical-src attribute (TODO)

Authors MAY opt-in to a fallback mechanism whereby user agents would initially attempt to load resources from a non-canonical source (perhaps over HTTP, for performance and caching reasons). If that fetch failed an integrity check, the user agent would report a violation, and retry the fetch using a canonical URL (perhaps over HTTPS).

The non-canonical URL is specified via a noncanonical-src attribute. For example:

Example 8
<script src="https://example.com/script.js"
        noncanonical-src="http://cdn.example.com/script.js"
        integrity="ni:///sha-256;jsdfhiuwergn...vaaetgoifq?ct=application/javascript"></script>

The noncanonicalSrc IDL attribute MUST reflect the noncanonical-src content attribute.

The noncanonical resource MUST be fetched with its omit credentials mode set to always, to prevent leakage of cookies across insecure channels.

Issue 5

This attribute (and fallback in general) only makes sense if we care about allowing cache-friendly (read “HTTP”) URLs to load in an HTTPS context without warnings. I’m not sure we do, so I’m not going to put too much thought into the details here before we discuss things a bit more. (mkwst)

3.5.3 Element interface extensions

3.5.3.1 HTMLAnchorElement
partial interface HTMLAnchorElement {
                attribute DOMString integrity;
};
3.5.3.1.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.2 HTMLEmbedElement
partial interface HTMLEmbedElement {
                attribute DOMString integrity;
};
3.5.3.2.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.3 HTMLIFrameElement
partial interface HTMLIFrameElement {
                attribute DOMString integrity;
};
3.5.3.3.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.4 HTMLImageElement
partial interface HTMLImageElement {
                attribute DOMString integrity;
};
3.5.3.4.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.5 HTMLLinkElement
partial interface HTMLLinkElement {
                attribute DOMString integrity;
};
3.5.3.5.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.6 HTMLMediaElement
partial interface HTMLMediaElement {
                attribute DOMString integrity;
};
3.5.3.6.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.7 HTMLObjectElement
partial interface HTMLObjectElement {
                attribute DOMString integrity;
};
3.5.3.7.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.8 HTMLScriptElement
partial interface HTMLScriptElement {
                attribute DOMString integrity;
};
3.5.3.8.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute
3.5.3.9 HTMLTrackElement
partial interface HTMLTrackElement {
                attribute DOMString integrity;
};
3.5.3.9.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute

3.5.4 Handling integrity violations

Documents may specify the behavior of a failed integrity check by delivering a Content Security Policy which contains an integrity-policy directive, defined by the following ABNF grammar:

directive-name  = "integrity-policy"
directive-value = 1#failure-mode [ "require-for-all" ]
failure-mode    = ( "block" / "report" / "fallback" )

A document’s integrity policy is the value of the integrity-policy directive, if explicitly provided as part of the document’s Content Security Policy, or block otherwise.

If the document’s integrity policy contains block, the user agent MUST refuse to render or execute resources that fail an integrity check, and MUST report a violation.

If the document’s integrity policy contains report, the user agent MAY render or execute resources that fail an integrity check, but MUST report a violation.

Issue 6

If the document’s integrity policy contains fallback, the user agent MUST refuse to render or execute resources that fail an integrity check, and MUST report a violation. The user agent MAY additionally choose to load a fallback resource as specified for each relevant element. If the fallback resource fails an integrity check, the user agent MUST refuse to render or execute the resource, and MUST report a(nother) violation. (See the noncanonical-src attribute for a strawman of how that might look).

Issue 7

If the document’s integrity policy contains require-for-all, the user agent MUST treat the lack of integrity metadata for a resource as automatic failure, refuse to fetch the resource, and report a violation.
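
A possible reading of the directive handling above is sketched below. Several points are assumptions rather than requirements of this specification: the function names are hypothetical, tokens are split on both commas and whitespace, unrecognised tokens are ignored, and a value with no recognised failure mode is treated like the default of block.

```python
import re

FAILURE_MODES = {"block", "report", "fallback"}


def parse_integrity_policy(value):
    """Return (failure_modes, require_for_all) for a directive value.

    value is None when the document delivers no integrity-policy directive.
    """
    if value is None:
        return {"block"}, False
    tokens = [t for t in re.split(r"[,\s]+", value.strip()) if t]
    modes = {t for t in tokens if t in FAILURE_MODES}
    if not modes:
        # Assumption: fall back to the default when nothing is recognised.
        modes = {"block"}
    return modes, "require-for-all" in tokens


def may_render_failed_resource(modes) -> bool:
    # block and fallback both require refusal; only report alone permits it.
    return "block" not in modes and "fallback" not in modes


modes, require_for_all = parse_integrity_policy("report require-for-all")
print(sorted(modes), require_for_all, may_render_failed_resource(modes))
```

In every mode a violation is still reported; the sketch only captures whether the failed resource may be rendered or executed.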

3.5.5 Elements

3.5.5.1 The a element

If an a element has both integrity and download attributes, the user agent has all the data it needs in order to verify the integrity of the downloaded resource. This is the only type of download we can safely make promises about, so it is the only type of download that we support. If integrity metadata is added to any a element that does not explicitly request that the resource it points to be downloaded, user agents MUST treat the link as broken.

Before following a hyperlink, the user agent MUST run the following steps:

  1. If subject has an integrity attribute whose value is not the empty string, then:
    1. The user agent MAY report an error to the user in a user-agent-specific manner.
    2. Abort the following a hyperlink algorithm.

Replace step 6 of the downloads a hyperlink algorithm with the following:

  1. If the integrity attribute of that element is not the empty string, and the element does not have a download attribute, abort these steps.
  2. Fetch URL with integrity metadata set to the value of the integrity attribute of that element, and handle the resulting resource as a download.

When handling a resource as a download, perform the following step before providing a user with a way to save the resource for later use:

Note

Note that this will cover only downloads triggered explicitly by adding a download attribute to an a element. Such a link might look like the following:

Example 9
<a href="https://example.com/file.zip"
   integrity="ni:///sha256;skjdsfkafinqfb...ihja_gqg?ct=application/octet-stream"
   download>Download!</a>
3.5.5.2 The embed element

When fetching a URL via step 2 of the embed element setup steps algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Before running the task queued by the networking task source once the URL has been fetched, first perform the following steps:

  1. If the response’s integrity state is corrupt:
    1. If the document’s integrity policy is block:
      1. Set the element’s type attribute to the empty string.
      2. Skip to step 4 of the algorithm.
    2. Report a violation.
3.5.5.3 The iframe element

When content is to be loaded into the child browsing context created by an iframe, perform fetches with the integrity metadata set to the value of the iframe element’s integrity attribute. Moreover:

Note

Note that this will only check the integrity of the iframe’s document source. No subsequent verification for the document’s subresources is performed. If integrity checks for the document’s subresources are desirable, the document loaded into the iframe needs to include integrity metadata for its subresources.

Issue 8

How does this affect things like the preload scanner? How much work is it going to be for vendors to change the “display whatever we’ve got, ASAP!” behavior that makes things fast for users? How much impact will there be on user experience, especially for things like ads, where this kind of validation has the most value?

Issue 9

How do we deal with navigations in the child browsing context? Are they simply disallowed? If so, does that make sense? It might for ads, but what about other use-cases?

3.5.5.4 The img element

When fetching an image via step 12 of the update the image data algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Before jumping to one of the entries from the list in step 14 of the update the image data algorithm, first perform the following steps:

  1. If the response’s integrity state is corrupt:
    1. If the document’s integrity policy is block:
      1. Abort the jump in progress.
      2. Perform the steps in the entry labeled “Otherwise” under step 14.
    2. Report a violation.
3.5.5.6 The object element

When fetching the resource via step 4 of step 4 of the “determine what the object element represents” algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Before step 7 of the “determine what the object element represents” algorithm, first perform the following steps:

  1. If the response’s integrity state is corrupt:
    1. If the document’s integrity policy is block:
      1. Fire a simple event named error at the element.
      2. Jump to the step labeled fallback.
    2. Report a violation.
3.5.5.7 The script element

When executing step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Insert the following steps after step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Once the fetching algorithm has completed:
    1. If the response’s integrity state is corrupt:
      1. If the document’s integrity policy is block:
        1. If resource is same origin with the script element’s Document’s origin, then queue a task to fire a simple event named error at the element, and abort these steps.
      2. Report a violation.
3.5.5.8 The track element

When fetching the track URL in step 10 of the start the track processing model algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Additionally, perform the following steps before performing the steps specified for a successful track fetch:

  1. If the response’s integrity state is corrupt:
    1. If the document’s integrity policy is block:
      1. Perform the steps specified for a failed track fetch.
      2. Abort the steps specified for a successful track fetch.
    2. Report a violation.
3.5.5.9 The audio element (TODO)
Issue 10

TODO: Write this section? Might want to delay media elements until we have a solution to streaming.

3.5.5.10 The source element (TODO)
Issue 11

TODO: Write this section? Might want to delay media elements until we have a solution to streaming.

3.5.5.11 The video element (TODO)
Issue 12

TODO: Write this section? Might want to delay media elements until we have a solution to streaming.

3.6 Verification of CSS-loaded subresources

Issue 13

Tab and Anne are poking at adding fetch() to some spec somewhere which would allow CSS files to specify various arguments to the fetch algorithm while requesting resources. Detail on the proposal is at http://lists.w3.org/Archives/Public/public-webappsec/2014Jan/0129.html. Once that is specified, we can proceed defining an integrity argument that would allow integrity checks in CSS.

3.7 Verification of JS-loaded subresources

Issue 14

These sections are less fleshed out and debated than the HTML sections, where the WG has concentrated most of its time thus far.

3.7.1 Workers

To validate the integrity of scripts which are to be run as workers, a new constructor is added for Worker and SharedWorker which accepts a second argument containing integrity metadata. This information is used when running a worker to perform validation, as outlined in the following sections: [WEBWORKERS]

3.7.1.1 Worker extension
[Constructor (DOMString scriptURL, DOMString integrityMetadata)]
partial interface Worker : EventTarget {
                attribute DOMString integrity;
};
3.7.1.1.1 Attributes
integrity of type DOMString,
The value of the Worker’s integrity attribute. Defaults to the empty string.

When the Worker(scriptURL, integrityMetadata) constructor is invoked:

  1. If integrityMetadata is not a valid “named information” (ni) URL, throw a SyntaxError exception and abort these steps.
  2. Execute the Worker(scriptURL) constructor, and set the newly created Worker object’s integrity attribute to integrityMetadata.
3.7.1.2 SharedWorker extension
[Constructor (DOMString scriptURL, DOMString name, DOMString integrityMetadata)]
partial interface SharedWorker : EventTarget {
                attribute DOMString integrity;
};
3.7.1.2.1 Attributes
integrity of type DOMString,
The value of the SharedWorker’s integrity attribute. Defaults to the empty string.

When the SharedWorker(scriptURL, name, integrityMetadata) constructor is invoked:

  1. If integrityMetadata is not a valid “named information” (ni) URL, throw a SyntaxError exception and abort these steps.
  2. Execute the SharedWorker(scriptURL, name) constructor, and set the newly created SharedWorker object’s integrity attribute to integrityMetadata.
3.7.1.3 Validation

Add the following step directly after step 4 of the run a worker algorithm:

  1. If the script resource fetched in step 4 has an integrity status of corrupt, then for each Worker or SharedWorker object associated with worker global scope, queue a task to fire a simple event named error at that object. Abort these steps.

3.7.2 XMLHttpRequest

To validate the integrity of resources loaded via XMLHttpRequest, a new integrity attribute is added to the XMLHttpRequest object. If set, the integrity metadata in this attribute is used to validate the resource before triggering the load event. [XMLHTTPREQUEST]

3.7.2.1 The integrity attribute

The integrity attribute MUST return its value. Initially its value MUST be the empty string.

Setting the integrity attribute MUST run these steps:

  1. If the state is not UNSENT or OPENED, throw an InvalidStateError exception and abort these steps.
  2. If the value provided is not a valid “named information” (ni) URL, throw a SyntaxError exception and abort these steps.
  3. Set the integrity attribute’s value to the value provided.
3.7.2.2 Progress events

Validation only takes place when the entire resource body has been downloaded. Data processed before the resource has completely loaded (or failed to load) is unvalidated, and potentially corrupt. For that reason, if the document’s integrity policy is block, progress events will not fire until the fetch has completed, one way or another.

If the document’s integrity policy is not block, developers who care about integrity validation SHOULD still ignore progress events fired while the resource is downloading, and instead listen only for the load, abort, and error events.

If the document’s integrity policy is block, then:

  • Before executing step 3.2 of the “process response” algorithm in step 13 of XMLHttpRequest’s send(data) method:
    1. If the object’s integrity attribute is not the empty string, the user agent MUST abort the “process response” algorithm, and MUST NOT fire the readystatechange event.
  • Before executing step 2.2 of the “process response body” algorithm in step 13 of XMLHttpRequest’s send(data) method:
    1. If the object’s integrity attribute is not the empty string, the user agent MUST abort the “process response body” algorithm, and MUST NOT fire the readystatechange event.
  • Before executing step 4 of the “process response body” algorithm in step 13 of XMLHttpRequest’s send(data) method:
    1. If the object’s integrity attribute is not the empty string, the user agent MUST abort the “process response body” algorithm, and MUST NOT fire the progress event.
3.7.2.3 Validation

Whenever the user agent would switch an XMLHttpRequest object to the DONE state, then perform the following steps before switching state:

  1. If the response’s integrity state is intact or indeterminate, then abort these steps, and continue to switch to the DONE state.
  2. Otherwise, report a violation, and run the following steps if the document’s integrity policy is block:
    1. Set the response entity body to null.
    2. Run the request error steps for exception NetworkError and event error.
    3. Do not continue to switch to the DONE state.
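
This example is non-normative. The validation steps above can be modelled as a single decision function. ModelRequest and finish are hypothetical names, the ready-state strings are placeholders, and the behaviour under a policy other than block (the request still completes after the violation is reported) is an assumption that the steps above do not state explicitly.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ModelRequest:
    """Hypothetical stand-in for an XMLHttpRequest object."""
    body: Optional[bytes]
    ready_state: str = "LOADING"
    events: List[str] = field(default_factory=list)


def finish(request, integrity_state, policy, violations):
    """Apply the validation steps before the switch to DONE."""
    # Step 1: intact or indeterminate responses complete normally.
    if integrity_state in ("intact", "indeterminate"):
        request.ready_state = "DONE"
        return
    # Step 2: anything else is reported as a violation.
    violations.append("integrity violation")
    if policy == "block":
        request.body = None             # 2.1: drop the response entity body
        request.events.append("error")  # 2.2: request error steps (NetworkError)
        return                          # 2.3: do not switch to DONE
    # Assumption: under any other policy the request still completes.
    request.ready_state = "DONE"
```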

4. Caching (Optional)

The caching mechanism described in this section is OPTIONAL.

JavaScript libraries are a good example of resources that are often loaded and reloaded from different locations as users browse the web: http://cdnjs.cloudflare.com/ajax/libs/jquery/1.10.2/jquery.min.js is exactly the same file as https://ajax.googleapis.com/ajax/libs/jquery/1.10.2/jquery.min.js. Both files are identifiable via the ni URL ni:///sha-256;iaFenEC8axSAnyNu6M0-0epCOTwfbKVceFXNd5s_ki4?ct=application/javascript.

To reduce the performance impact of reloading the same data, user agents MAY use integrity metadata as a new index to a local cache, meaning that a user who had already loaded a version of the file from ajax.googleapis.com wouldn’t have to touch the network to load the cdnjs.cloudflare.com version. The user agent knows that the content is the same, and would be free to treat the latter as a cache hit, regardless of the location mismatch.
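
This example is non-normative. As a minimal sketch of the idea, the following Python code derives an ni URL from a resource body and uses it as a cache index. The names ni_url and IntegrityCache are hypothetical. The sketch assumes SHA-256 and an empty authority, and it deliberately ignores the risks described in section 4.1.

```python
import base64
import hashlib
from typing import Dict, Optional


def ni_url(body: bytes, content_type: Optional[str] = None) -> str:
    """Build an ni URL (empty authority, SHA-256) for a resource body."""
    digest = hashlib.sha256(body).digest()
    # RFC 6920 uses base64url without "=" padding.
    val = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    url = "ni:///sha-256;" + val
    if content_type:
        url += "?ct=" + content_type
    return url


class IntegrityCache:
    """Hypothetical cache indexed by integrity metadata instead of location."""

    def __init__(self) -> None:
        self._entries: Dict[str, bytes] = {}

    def store(self, body: bytes, content_type: str) -> str:
        """Store a body under its ni URL and return that URL."""
        key = ni_url(body, content_type)
        self._entries[key] = body
        return key

    def lookup(self, integrity_metadata: str) -> Optional[bytes]:
        """Return the cached body for this metadata, or None on a miss."""
        return self._entries.get(integrity_metadata)
```

A body stored after a fetch from one location is returned by lookup() for any later request carrying the same integrity metadata, whatever URL that request names.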

4.1 Risks

This approach is good for performance, but can have security implications. See the origin confusion and MIME type confusion sections below for some details.

4.1.1 Origin confusion

User agents which set up a caching mechanism that uses only the integrity metadata to identify a resource are vulnerable to attacks which bypass same-origin restrictions unless they are very careful when choosing whether or not to read data straight from the cache.

For instance:

  • Runtime script errors are sanitized for resources that are CORS-cross-origin to the page into which they are loaded. [HTML5]

  • XMLHttpRequest may only load data from same-origin resources, or from resources delivered with proper CORS headers. [XMLHTTPREQUEST]

  • Content Security Policy performs origin-based security checks. [CSP]

Issue 15

More?

Note

The simple cache-poisoning version of this attack can be mitigated by requiring strong hash functions for cacheable resources. More complex variants are more difficult to mitigate. Consider the following:

  1. An attacker lures Alice to a page containing the following code:

    Example 10
    <script src="http://evil.com/evil.js" digest="ni:///sha-256;123...789"></script>
  2. Alice’s user agent loads evil.js, and stores it in her cache.

  3. Though bank.com is protected by a CSP which allows only script from bank.com, the attacker may still be able to exploit an XSS vulnerability in bank.com which allows the injection of:

    Example 11
    <script src="http://bank.com/awesome.js" digest="ni:///sha-256;123...789"></script>

    Since the script appears to come from bank.com, CSP allows it, even though it doesn’t actually exist on that server.

4.1.2 MIME type confusion

User agents which set up a caching mechanism that uses only the integrity metadata to identify a resource are vulnerable to attacks which create resources that behave differently based on the context in which they are loaded. GIFAR is the canonical example of such an attack.

Authors SHOULD mitigate this risk by specifying the expected content type along with the digest, as specified in RFC 6920, section 3.1. This means that the content type will be verified along with the digest when determining whether a resource matches certain integrity metadata.
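
This example is non-normative. One way to picture the combined check is the sketch below, which compares both the digest and the ct parameter of an ni URL against a response. The function names are hypothetical, only SHA-256 with an empty authority is handled, and the choice to ignore media type parameters such as charset is an assumption.

```python
import base64
import hashlib
import hmac
from urllib.parse import unquote


def sha256_b64url(body):
    """Return the unpadded base64url SHA-256 digest of body."""
    return base64.urlsafe_b64encode(hashlib.sha256(body).digest()).rstrip(b"=").decode("ascii")


def parse_ni(ni_url):
    """Split an ni URL with an empty authority into (alg, val, ct or None)."""
    if not ni_url.startswith("ni:///"):
        raise ValueError("expected an ni URL with an empty authority")
    alg_val, _, query = ni_url[len("ni:///"):].partition("?")
    alg, _, val = alg_val.partition(";")
    ct = None
    for pair in query.split("&"):
        name, _, value = pair.partition("=")
        if name == "ct":
            ct = unquote(value)
    return alg, val, ct


def matches(body, response_content_type, ni_url):
    """Check the digest and, if present, the expected content type."""
    alg, val, ct = parse_ni(ni_url)
    if alg != "sha-256":
        return False
    if not hmac.compare_digest(sha256_b64url(body).encode("ascii"), val.encode("utf-8")):
        return False
    if ct is None:
        return True  # no content type was specified, so only the digest counts
    # Assumption: parameters such as charset are ignored in the comparison.
    media_type = response_content_type.split(";", 1)[0].strip().lower()
    return media_type == ct.lower()
```

With a ct parameter present, a body whose digest matches is still rejected if it is served with a different media type.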

4.2 Recommendations

To mitigate the risk of cross-origin data leakage or type-sniffing exploitation, user agents that take this approach to caching MUST NOT use integrity metadata as a cache identifier unless the following are all true:

Issue 16

More ideas? Limiting to resources with wide-open CORS headers and strong hash functions seems like a reasonable start…

5. Proxies

Optimizing proxies and other intermediate servers which modify the content of fetched resources MUST ensure that the digest associated with those resources stays in sync with the new content. One option is to ensure that the integrity metadata associated with resources is updated along with the resource itself. Another would be simply to deliver only the canonical version of resources for which a page author has requested integrity verification. To support this latter option, user agents MAY send a Cache-Control header with a value of no-transform.

Issue 17

Think about how integrity checks would affect Vary headers in general.

6. Security Considerations

6.1 Insecure channels remain insecure

Integrity metadata delivered over an insecure channel provides no security benefit. Attackers can alter the digest in flight, remove it entirely, or make any other change to the document, just as they could alter the resource the hash is meant to validate. Authors who desire any sort of security whatsoever SHOULD deliver resources containing digests over secure channels.

6.2 Hash collision attacks

Digests are only as strong as the hash function used to generate them. User agents SHOULD refuse to support known-weak hashing functions like MD5, and SHOULD restrict supported hashing functions to those known to be collision-resistant. At the time of writing, SHA-256 is a good baseline. Moreover, user agents SHOULD reevaluate their supported hashing functions on a regular basis, and deprecate support for those functions shown to be insecure.
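
This example is non-normative. A user agent might express this restriction as an explicit allowlist, as in the sketch below. The table contents are an assumption: the text above names only SHA-256 as a baseline and MD5 as known-weak, and SHA-384 and SHA-512 are included purely for illustration.

```python
import hashlib

# Hypothetical allowlist of supported hash functions. Deprecating a
# function amounts to removing its entry from this table.
SUPPORTED_HASHES = {
    "sha-256": hashlib.sha256,
    "sha-384": hashlib.sha384,
    "sha-512": hashlib.sha512,
}


def hash_function_for(alg):
    """Return the hash constructor for alg, refusing anything not listed."""
    try:
        return SUPPORTED_HASHES[alg.lower()]
    except KeyError:
        raise ValueError("unsupported or weak hash algorithm: " + alg) from None
```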

6.3 Cross-origin data leakage

Attackers can determine whether some cross-origin resource has certain content by attempting to load it with a known digest, and watching for load failure. If the load fails, the attacker can surmise that the resource didn’t match the hash, and thereby gain some insight into its contents. This might reveal, for example, whether or not a user is logged into a particular service.

Moreover, attackers can brute-force specific values in an otherwise static resource. Consider a document that looks like this:

Example 12
<html>{static content}<h1>Hello, $username!</h1>{static content}</html>

An attacker can precompute hashes for the page with a variety of common usernames, and specify those hashes while repeatedly attempting to load the document. By examining the reported violations, the attacker can obtain a user’s username.
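
This example is non-normative. The precomputation step can be sketched as follows, using the document template from the example above. The function names are hypothetical, and the load_succeeds callback stands in for whatever signal (a load event, or the absence of a reported violation) tells the attacker that a digest matched.

```python
import base64
import hashlib
from string import Template

# The template from the example above; "{static content}" is kept literally.
PAGE = Template("<html>{static content}<h1>Hello, $username!</h1>{static content}</html>")


def page_digest(username):
    """Digest of the page as it would be rendered for one username."""
    body = PAGE.substitute(username=username).encode("utf-8")
    return base64.urlsafe_b64encode(hashlib.sha256(body).digest()).rstrip(b"=").decode("ascii")


def guess_username(candidates, load_succeeds):
    """Try each candidate digest until one load passes validation."""
    for name in candidates:
        if load_succeeds(page_digest(name)):
            return name
    return None
```

Each candidate costs the attacker one load attempt, so the attack is practical only when the set of plausible values is small.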

User agents SHOULD mitigate the risk by refusing to fire error events on elements which loaded cross-origin resources, but some side-channels will likely be difficult to avoid (image’s naturalHeight and naturalWidth for instance).

7. Acknowledgements

None of this is new. Much of the content here is inspired heavily by Gervase Markham’s Link Fingerprints concept, as well as WHATWG’s Link Hashes.

A. References

A.1 Normative references

[ABNF]
D. Crocker; P. Overell. Augmented BNF for Syntax Specifications: ABNF. January 2008. STD. URL: http://www.ietf.org/rfc/rfc5234.txt
[CORS]
Anne van Kesteren. Cross-Origin Resource Sharing. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/cors/
[CSP]
Adam Barth; Dan Veditz; Mike West. Content Security Policy 1.1. Working Draft. URL: http://w3.org/TR/CSP11
[HTML5]
Robin Berjon; Steve Faulkner; Travis Leithead; Erika Doyle Navara; Edward O'Connor; Silvia Pfeiffer. HTML5. 4 February 2014. W3C Candidate Recommendation. URL: http://www.w3.org/TR/html5/
[HTTP11]
R. Fielding et al. Hypertext Transfer Protocol - HTTP/1.1. June 1999. RFC. URL: http://www.ietf.org/rfc/rfc2616.txt
[MIMETYPE]
Ned Freed; Nathaniel S. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. Draft Standard. URL: http://tools.ietf.org/html/rfc2046
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Internet RFC 2119. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC2818]
E. Rescorla. HTTP Over TLS. May 2000. RFC. URL: http://www.ietf.org/rfc/rfc2818.txt
[RFC4648]
Simon Josefsson. The Base16, Base32, and Base64 Data Encodings. Proposed Standard. URL: http://tools.ietf.org/html/rfc4648
[RFC6454]
A. Barth. The Web Origin Concept. December 2011. RFC. URL: http://www.ietf.org/rfc/rfc6454.txt
[RFC6920]
Stephen Farrell; Dirk Kutscher; Christian Dannewitz; Borje Ohlman; Ari Keranen; Phillip Hallam-Baker. Naming Things with Hashes. Proposed Standard. URL: http://tools.ietf.org/html/rfc6920
[WEBWORKERS]
Ian Hickson. Web Workers. 1 May 2012. W3C Candidate Recommendation. URL: http://www.w3.org/TR/workers/
[XMLHTTPREQUEST]
Anne van Kesteren; Julian Aubourg; Jungkee Song; Hallvord Steen et al. XMLHttpRequest Level 1. 30 January 2014. W3C Working Draft. URL: http://www.w3.org/TR/XMLHttpRequest/