Subresource Integrity

Abstract

This specification defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.

1. Introduction

This section is non-normative.

Sites and applications on the web are rarely composed of resources from only a single origin. For example, authors pull scripts and styles from a wide variety of services and content delivery networks, and must trust that the delivered representation is, in fact, what they expected to load. If an attacker can trick a user into downloading content from a hostile server (via DNS poisoning, or other such means), the author has no recourse. Likewise, an attacker who can replace the file on the CDN server has the ability to inject arbitrary content.

Delivering resources over a secure channel mitigates some of this risk: with TLS, HSTS, and pinned public keys, a user agent can be fairly certain that it is indeed speaking with the server it believes it’s talking to. These mechanisms, however, authenticate only the server, not the content. An attacker (or admin!) with access to the server can manipulate content with impunity. Ideally, authors would not only be able to pin the keys of a server, but also pin the content, ensuring that an exact representation of a resource, and only that representation, loads and executes.

This document specifies such a validation scheme, extending several HTML elements with an integrity attribute that contains a cryptographic hash of the representation of the resource the author expects to load. For instance, an author may wish to load jQuery from a shared server rather than hosting it on their own origin. Specifying that the expected SHA-256 hash of https://code.jquery.com/jquery-1.10.2.min.js is C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg= means that the user agent can verify that the data it loads from that URL matches that expected hash before executing the JavaScript it contains. This integrity verification significantly reduces the risk that an attacker can substitute malicious content.

This example can be communicated to a user agent by adding the hash to a script element, like so:

<script src="https://code.jquery.com/jquery-1.10.2.min.js"
        integrity="sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=">

Scripts, of course, are not the only resource type which would benefit from integrity validation. The scheme specified here applies to all HTML elements which trigger fetches, as well as to fetches triggered from CSS and JavaScript.

1.1 Goals

Compromise of the third-party service should not automatically mean compromise of every site which includes its scripts. Content authors will have a mechanism by which they can specify expectations for content they load, meaning for example that they could load a specific script, and not any script that happens to have a particular URL.
The verification mechanism should have reporting functionality which would inform the author that an invalid resource was downloaded. Further it should be possible for an author to choose to run only the reporting functionality, allowing potentially corrupt resources to run on her site, but flagging violations for manual review.

1.2 Use Cases/Examples

1.2.1 Resource Integrity

An author wishes to use a content delivery network to improve performance for her globally-distributed users. She wishes to ensure, however, that the CDN’s servers deliver only the code she expects them to deliver. She can mitigate the risk that CDN compromise (or unexpectedly malicious behavior) would change her code in unfortunate ways by adding integrity metadata to the link element included on her page:
Example 1
```
<link rel="stylesheet" href="https://site53.cdn.net/style.css"
      integrity="sha256-SDfwewFAE...wefjijfE">
```
An author wants to include JavaScript provided by a third-party analytics service on her site. She wants, however, to ensure that only the code she’s carefully reviewed is executed. She can do so by generating integrity metadata for the script she’s planning on including, and adding it to the script element she includes on her page:
Example 2
```
<script src="https://analytics-r-us.com/v1.0/include.js"
        integrity="sha256-SDfwewFAE...wefjijfE"></script>
```
A user agent wishes to ensure that pieces of its UI which are rendered via HTML (for example, Chrome’s New Tab Page) aren’t manipulated before display. Integrity metadata mitigates the risk that altered JavaScript will run in these page’s high-privilege context.
The author of a mash-up wants to make sure her creation remains in a working state. Adding integrity metadata to external subresources defines an expected revision of the included files. The author can then use the reporting functionality to be notified of changes to the included resources.

3. Framework

The integrity verification mechanism specified here boils down to the process of generating a sufficiently strong cryptographic digest for a resource, and transmitting that digest to a user agent so that it may be used when fetching the resource.

3.1 Integrity metadata

To verify the integrity of a resource, a user agent requires integrity metadata, which consists of the following pieces of information:

cryptographic hash function (“alg”)
digest (“val”)

The hash function and digest MUST be provided in order to validate a resource’s integrity.

This metadata MUST be encoded in the same format as the hash-source in section 4.2 of the Content Security Policy Level 2 specification.

For example, given a script resource containing only the string “alert(‘Hello, world.’);”, an author might choose SHA-256 as a hash function. qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng= is the base64-encoded digest that results. This can be encoded as follows:

Example 3

sha256-qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng=

Note

Digests may be generated using any number of utilities. OpenSSL, for example, is quite commonly available. The example in this section is the result of the following command line:

echo -n "alert('Hello, world.');" | openssl dgst -sha256 -binary | openssl enc -base64 -A

3.2 Cryptographic hash functions

Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a resource’s integrity metadata, and MAY support additional hash functions.

3.2.1 Agility

Multiple sets of integrity metadata may be associated with a single resource in order to provide agility in the face of future discoveries. For example, the “Hello, world.” resource described above may be described either of the following hash expressions:

Example 4

sha256-+MO/YqmqPm/BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8=
sha512-rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg==

Authors may choose to specify both, for example:

<script src="hello_world.js"
   integrity="sha256-+MO/YqmqPm/BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8=
              sha512-rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg==
    "></script>

In this case, the user agent will choose the strongest hash function in the list, and use that metadata to validate the resource (as described below in the “parse metadata” and “get the strongest metadata from set” algorithms).

When a hash function is determined to be insecure, user agents MUST deprecate and eventually remove support for integrity validation using that hash function.

To allow authors to switch to stronger hash functions without being held back by older user agents, validation using unsupported hash functions acts like no integrity value was provided (see the “Does resource match metadataList” algorithm below). Authors are encouraged to use strong hash functions, and to begin migrating to stronger hash functions as they become available.

3.2.2 Priority

User agents MUST provide a mechanism of determining the relative priority of two hash functions and return the empty string if the priority is equal. That is, if a user agent implemented a function like getPrioritizedHashFunction(a, b) it would return the hash function the user agent considers the most collision-resistant. For example, getPrioritizedHashFunction('SHA-256', 'SHA-512') would return SHA-512 and getPrioritizedHashFunction('SHA-256', 'SHA-256') would return the empty string.

3.3 Resource verification algorithms

3.3.1 Apply `algorithm` to `resource`

Let result be the result of applying algorithm to the representation data without any content-codings applied, except when the user agent intends to consumes the content with content-encodings applied (e.g., saving a gzip’d file to disk). In the latter case, let result be the result of applying algorithm to the representation data.
Let encodedResult be result of base64-encoding result.
Return encodedResult.

3.3.2 Is `resource` eligible for integrity validation

In order to mitigate an attacker’s ability to read data cross-origin by brute-forcing values via integrity checks, resources are only eligible for such checks if they are same-origin, publicly cachable, or are the result of explicit access granted to the loading origin via CORS. [CORS]

Note

As noted in RFC6454, section 4, some user agents use globally unique identifiers for each file URI. This means that resources accessed over a file scheme URL are unlikely to be eligible for integrity checks.

One should note that being a privileged document (e.g. a document delivered over HTTPS) is not necessary for the use of integrity validation. Because resource integrity is only an application level security tool, and it does not change the security state of the user agent, a privileged document is unnecessary. However, if integrity is used in an unprivileged document (e.g. a document delivered over HTTP), authors should be aware that the integrity provides no security guarantees at all. For this reason, authors should only deliver integrity metadata on a potentially secure origin. See Unprivileged contexts remain unprivileged for more discussion.

Certain HTTP headers can also change the way the resource behaves in ways which integrity checking cannot account for. If the resource contains these headers, it is ineligible for integrity validation:

Authorization or WWW-Authenticate hide resources behind a login; such non-public resources are excluded from integrity checks.
Refresh can cause IFrame contents to transparently redirect to an unintended target, bypassing the integrity check.

Issue 3

Consider the impact of other headers: Content-Length, Content-Range, etc. Is there danger there?

The following algorithm details these restrictions:

Let request be the request that fetched resource.
If resource contains any of the following HTTP headers, return false:
- Authorization
- WWW-Authenticate
- Refresh
If the mode of request is CORS, return true.
If the origin of request is resource’s origin, return true.
If resource is cachable by a shared cache, as defined in [RFC7234], return true.
Return false.

Note

Step 3 returns true if the resource was a CORS-enabled request. If the resource failed the CORS checks, it won’t be available to us for integrity checking because it won’t have loaded successfully.

3.3.3 Parse `metadata`.

This algorithm accepts a string, and returns either no metadata, or a set of valid hash expressions whose hash functions are understood by the user agent.

If metadata is the empty string, return no metadata.
Let result be the empty set.
For each token returned by splitting metadata on spaces:
1. If token is not a valid metadata, skip the remaining steps, and proceed to the next token.
2. Let algorithm be the alg component of token.
3. If algorithm is a hash function recognized by the user agent, add token to result.
Return no metadata if result is empty, otherwise return result.

3.3.4 Get the strongest metadata from `set`.

Let result be the empty set and strongest be the empty string.
For each item in set:
1. If result is the empty set, add item to result and set strongest to item, skip to the next item.
2. Let currentAlgorithm be the alg component of strongest.
3. Let newAlgorithm be the alg component of item.
4. If the result of getPrioritizedHashFunction(currentAlgorithm, newAlgorithm) is the empty string, add item to result. If the result is newAlgorithm, set strongest to item, set result to the empty set, and add item to result.
Return result.

3.3.5 Does `resource` match `metadataList`?

If resource’s URL’s scheme is about, return true.
If resource is not eligible for integrity validation, return false.
Let parsedMetadata be the result of parsing metadataList.
If parsedMetadata is no metadata, return true.
Let metadata be the result of getting the strongest metadata from parsedMetadata.
For each item in metadata:
1. Let algorithm be the alg component of metadata.
2. Let expectedValue be the val component of metadata.
3. Let actualValue be the result of applying algorithm to resource.
4. If actualValue is a case-sensitive match for expectedValue, return true.
Return false.

This algorithm allows the user agent to accept multiple, valid strong hash functions. For example, a developer might write a script element such as:

<script src="https://foobar.com/content-changes.js"
        integrity="sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=
                   sha256-qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng=">

which would allow the user agent to accept two different content payloads, one of which matches the first SHA256 hash value and the other matches the second SHA256 hash value.

Note

User agents may allow users to modify the result of this algorithm via user preferences, bookmarklets, third-party additions to the user agent, and other such mechanisms. For example, redirects generated by an extension like HTTPSEverywhere could load and execute correctly, even if the HTTPS version of a resource differs from the HTTP version.

3.4 Modifications to Fetch

The Fetch specification should contain the following modifications in order to enable the rest of this specification’s work [FETCH]:

The following text should be added to section 2.1.4: “A request has an associated integrity metadata. Unless stated otherwise, a request’s integrity metadata is the empty string.”
The following text should be added to section 2.1.5: “A response has an associated integrity state, which is one of indeterminate, pending, corrupt, and intact. Unless stated otherwise, it is indeterminate.
Perform the following steps before executing both the “basic fetch” and “CORS fetch with preflight” algorithms:
1. If request’s integrity metadata is the empty string, set response’s integrity state to indeterminate. Otherwise:
  1. Set response’s integrity state to pending.
  2. Include a Cache-Control header whose value is “no-transform”.
Add the following step before step #1 of the handling of 401 status codes in the HTTP fetch algorithm:
1. If request’s integrity state is pending, set response’s integrity state to corrupt and return response.
Before firing the process request end-of-file event for any request:
1. If the request’s integrity metadata is the empty string, set the response’s integrity state to indeterminate and skip directly to firing the event.
2. If response matches the request’s integrity metadata, set the response’s integrity state to intact and skip directly to firing the event.
3. Set the response’s integrity state to corrupt and skip directly to firing the event.

3.5 Verification of HTML document subresources

A variety of HTML elements result in requests for resources that are to be embedded into the document, or executed in its context. To support integrity metadata for each of these, and new elements that are added in the future, a new integrity attribute is added to the list of content attributes for the link and script elements.

A corresponding integrity IDL attribute which reflects the value each element’s integrity content attribute is added to the HTMLLinkElement and HTMLScriptElement interfaces.

Note

A future revision of this specification is likely to include SRI support for all possible subresources, i.e., a, audio, embed, iframe, img, link, object, script, source, -track, and video elements.

3.6 The `integrity` attribute

The integrity attribute represents integrity metadata for an element. The value of the attribute MUST be either the empty string, or at least one valid metadata as described by the following ABNF grammar:

integrity-metadata = *WSP hash-with-options *( 1*WSP hash-with-options ) *WSP / *WSP
hash-with-options  = hash-expression *("?" option-expression)
option-expression  = option-name "=" option-value
option-name        = 1*option-name-char
option-name-char   = ALPHA / DIGIT / "-"
option-value       = *option-value-char
option-value-char  = ALPHA / DIGIT / "-" / "+" / "." / "/"
hash-algo          = <hash-algo production from [Content Security Policy Level 2, section 4.2]>
base64-value       = <base64-value production from [Content Security Policy Level 2, section 4.2]>
hash-expression    = hash-algo "-" base64-value

The integrity IDL attribute must reflect the integrity content attribute.

option-expressions are associated on a per hash-expression basis and are applied only to the hash-expression that immediately precedes it.

Note

At the moment, no option-expressions are defined. However, future versions of the spec make define options, such as MIME types [MIMETYPE].

3.7 Element interface extensions

3.7.1 HTMLLinkElement

partial interface HTMLLinkElement {
                attribute DOMString integrity;
};

3.7.1.1 Attributes

integrity of type DOMString,: The value of this element’s integrity attribute

3.7.2 HTMLScriptElement

partial interface HTMLScriptElement {
                attribute DOMString integrity;
};

3.7.2.1 Attributes

integrity of type DOMString,: The value of this element’s integrity attribute

3.8 Handling integrity violations

Documents may specify the behavior of a failed integrity check by delivering a Content Security Policy which contains an integrity-policy directive, defined by the following ABNF grammar:

directive-name  = "integrity-policy"
directive-value = 1#failure-mode
failure-mode    = ( "block" / "report" )

A document’s integrity policy is the value of the integrity-policy directive, if explicitly provided as part of the document’s Content Security Policy, or block otherwise.

If the document’s integrity policy contains block, the user agent MUST refuse to render or execute resources that fail an integrity check, and MUST report a violation.

If the document’s integrity policy contains report, the user agent MAY render or execute resources that fail an integrity check, but MUST report a violation.

3.9 Elements

3.9.1 The `link` element for stylesheets

Whenever a user agent attempts to obtain a resource pointed to by a link element that has a rel attribute with the value of stylesheet:

Set the integrity metadata of the request to the value of the element’s integrity attribute.

Additionally, perform the following steps before firing a load event at the element:

If the response’s integrity state is corrupt:
1. If the document’s integrity policy is block:
  1. Abort the load event, and treat the resource as having failed to load.
  2. If resource is same origin with the origin of the link element’s Document, then queue a task to fire a simple event named error at the link element.
2. Report a violation.

3.9.2 The `script` element

When executing step 5 of step 14 of HTML5’s “prepare a script” algorithm:

Set the integrity metadata of the request to the value of the element’s integrity attribute.

Insert the following steps after step 5 of step 14 of HTML5’s “prepare a script” algorithm:

Once the fetching algorithm has completed:
1. If the response’s integrity state is corrupt:
  1. If the document’s integrity policy is block:
    1. If resource is same origin with the script element’s Document’s origin, then queue a task to fire a simple event named error at the element, and abort these steps.
  2. Report a violation.

3.10 Verification of CSS-loaded subresources

Issue 13

Tab and Anne are poking at adding fetch() to some spec somewhere which would allow CSS files to specify various arguments to the fetch algorithm while requesting resources. Detail on the proposal is at https://lists.w3.org/Archives/Public/public-webappsec/2014Jan/0129.html. Once that is specified, we can proceed defining an integrity argument that would allow integrity checks in CSS.

Abstract

Status of This Document

Table of Contents

1. Introduction

1.1 Goals

1.2 Use Cases/Examples

1.2.1 Resource Integrity

2. Conformance

2.1 Key Concepts and Terminology

3. Framework

3.1 Integrity metadata

3.2 Cryptographic hash functions

3.2.1 Agility

3.2.2 Priority

3.3 Resource verification algorithms

3.3.1 Apply algorithm to resource

3.3.2 Is resource eligible for integrity validation

3.3.3 Parse metadata.

3.3.4 Get the strongest metadata from set.

3.3.5 Does resource match metadataList?

3.4 Modifications to Fetch

3.5 Verification of HTML document subresources

3.6 The integrity attribute

3.7 Element interface extensions

3.7.1 HTMLLinkElement

3.7.1.1 Attributes

3.7.2 HTMLScriptElement

3.7.2.1 Attributes

3.8 Handling integrity violations

3.9 Elements

3.9.1 The link element for stylesheets

3.9.2 The script element

3.10 Verification of CSS-loaded subresources

4. Proxies

5. Security Considerations

5.1 Unprivileged contexts remain unprivileged

5.2 Hash collision attacks

5.3 Cross-origin data leakage

6. Acknowledgements

A. References

A.1 Normative references

3.3.1 Apply `algorithm` to `resource`

3.3.2 Is `resource` eligible for integrity validation

3.3.3 Parse `metadata`.

3.3.4 Get the strongest metadata from `set`.

3.3.5 Does `resource` match `metadataList`?

3.6 The `integrity` attribute

3.9.1 The `link` element for stylesheets

3.9.2 The `script` element