Abstract

This specification defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

A list of changes to this document may be found at https://github.com/w3c/webappsec.

This document was published by the Web Application Security Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-webappsec@w3.org (subscribe, archives) with [Integrity] at the start of your email's subject. All comments are welcome.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

Table of Contents

1. Introduction

This section is non-normative.

Sites and applications on the web are rarely composed of resources from only a single origin. For example, authors pull scripts and styles from a wide variety of services and content delivery networks, and must trust that the delivered representation is, in fact, what they expected to load. If an attacker can trick a user into downloading content from a hostile server (via DNS poisoning, or other such means), the author has no recourse. Likewise, an attacker who can replace the file on the CDN server has the ability to inject arbitrary content.

Delivering resources over a secure channel mitigates some of this risk: with TLS, HSTS, and pinned public keys, a user agent can be fairly certain that it is indeed speaking with the server it believes it’s talking to. These mechanisms, however, authenticate only the server, not the content. An attacker (or admin!) with access to the server can manipulate content with impunity. Ideally, authors would not only be able to pin the keys of a server, but also pin the content, ensuring that an exact representation of a resource, and only that representation, loads and executes.

This document specifies such a validation scheme, extending several HTML elements with an integrity attribute that contains a cryptographic hash of the representation of the resource the author expects to load. For instance, an author may wish to load jQuery from a shared server rather than hosting it on their own origin. Specifying that the expected SHA-256 hash of https://code.jquery.com/jquery-1.10.2.min.js is C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg= means that the user agent can verify that the data it loads from that URL matches that expected hash before executing the JavaScript it contains. This integrity verification significantly reduces the risk that an attacker can substitute malicious content.

This example can be communicated to a user agent by adding the hash to a script element, like so:

<script src="https://code.jquery.com/jquery-1.10.2.min.js"
        integrity="sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=">

Scripts, of course, are not the only resource type which would benefit from integrity validation. The scheme specified here applies to all HTML elements which trigger fetches, as well as to fetches triggered from CSS and JavaScript.

1.1 Goals

  1. Compromise of the third-party service should not automatically mean compromise of every site which includes its scripts. Content authors will have a mechanism by which they can specify expectations for content they load, meaning for example that they could load a specific script, and not any script that happens to have a particular URL.

  2. The verification mechanism should have reporting functionality which would inform the author that an invalid resource was downloaded. Further it should be possible for an author to choose to run only the reporting functionality, allowing potentially corrupt resources to run on her site, but flagging violations for manual review.

1.2 Use Cases/Examples

1.2.1 Resource Integrity

  • An author wishes to use a content delivery network to improve performance for her globally-distributed users. She wishes to ensure, however, that the CDN’s servers deliver only the code she expects them to deliver. She can mitigate the risk that CDN compromise (or unexpectedly malicious behavior) would change her code in unfortunate ways by adding integrity metadata to the link element included on her page:

    Example 1
    <link rel="stylesheet" href="https://site53.cdn.net/style.css"
          integrity="sha256-SDfwewFAE...wefjijfE">
  • An author wants to include JavaScript provided by a third-party analytics service on her site. She wants, however, to ensure that only the code she’s carefully reviewed is executed. She can do so by generating integrity metadata for the script she’s planning on including, and adding it to the script element she includes on her page:

    Example 2
    <script src="https://analytics-r-us.com/v1.0/include.js"
            integrity="sha256-SDfwewFAE...wefjijfE"></script>
  • A user agent wishes to ensure that pieces of its UI which are rendered via HTML (for example, Chrome’s New Tab Page) aren’t manipulated before display. Integrity metadata mitigates the risk that altered JavaScript will run in these page’s high-privilege context.

  • The author of a mash-up wants to make sure her creation remains in a working state. Adding integrity metadata to external subresources defines an expected revision of the included files. The author can then use the reporting functionality to be notified of changes to the included resources.

2. Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY, MUST, and SHOULD are to be interpreted as described in [RFC2119].

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

2.1 Key Concepts and Terminology

This section defines several terms used throughout the document.

The term digest refers to the base64-encoded result of executing a cryptographic hash function on an arbitrary block of data.

The term origin is defined in the Origin specification. [RFC6454]

The terms privileged document, unprivileged document, and privileged context are defined in section 2 of the Privileged Contexts specification. An example of a privileged document is a document loaded over HTTPS. An example of an unprivileged document and an unprivileged context are a document loaded over HTTP.

A potentially secure origin is defined in section 2 of the Mixed Content specification. An example of a potentially secure origin is an origin whose scheme component is HTTPS.

The message body and the transfer encoding of a resource are defined by RFC7230, section 3. [RFC7230]

The representation data and content encoding of a resource are defined by RFC7231, section 3. [RFC7231]

A base64 encoding is defined in RFC 4648, section 4. [RFC4648]

The Augmented Backus-Naur Form (ABNF) notation used in this document is specified in RFC 5234. [ABNF]

The SHA-256, SHA-384, and SHA-512 are part of the SHA-2 set of cryptographic hash functions defined by the NIST in “Descriptions of SHA-256, SHA-384, and SHA-512”.

3. Framework

The integrity verification mechanism specified here boils down to the process of generating a sufficiently strong cryptographic digest for a resource, and transmitting that digest to a user agent so that it may be used when fetching the resource.

3.1 Integrity metadata

To verify the integrity of a resource, a user agent requires integrity metadata, which consists of the following pieces of information:

The hash function and digest MUST be provided in order to validate a resource’s integrity.

This metadata MUST be encoded in the same format as the hash-source in section 4.2 of the Content Security Policy Level 2 specification.

For example, given a script resource containing only the string “alert(‘Hello, world.’);”, an author might choose SHA-256 as a hash function. qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng= is the base64-encoded digest that results. This can be encoded as follows:

Example 3
sha256-qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng=
Note

Digests may be generated using any number of utilities. OpenSSL, for example, is quite commonly available. The example in this section is the result of the following command line:

echo -n "alert('Hello, world.');" | openssl dgst -sha256 -binary | openssl enc -base64 -A

3.2 Cryptographic hash functions

Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a resource’s integrity metadata, and MAY support additional hash functions.

3.2.1 Agility

Multiple sets of integrity metadata may be associated with a single resource in order to provide agility in the face of future discoveries. For example, the “Hello, world.” resource described above may be described either of the following hash expressions:

Example 4
sha256-+MO/YqmqPm/BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8=
sha512-rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg==

Authors may choose to specify both, for example:

<script src="hello_world.js"
   integrity="sha256-+MO/YqmqPm/BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8=
              sha512-rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg==
    "></script>

In this case, the user agent will choose the strongest hash function in the list, and use that metadata to validate the resource (as described below in the “parse metadata” and “get the strongest metadata from set” algorithms).

When a hash function is determined to be insecure, user agents MUST deprecate and eventually remove support for integrity validation using that hash function.

To allow authors to switch to stronger hash functions without being held back by older user agents, validation using unsupported hash functions acts like no integrity value was provided (see the “Does resource match metadataList” algorithm below). Authors are encouraged to use strong hash functions, and to begin migrating to stronger hash functions as they become available.

3.2.2 Priority

User agents MUST provide a mechanism of determining the relative priority of two hash functions and return the empty string if the priority is equal. That is, if a user agent implemented a function like getPrioritizedHashFunction(a, b) it would return the hash function the user agent considers the most collision-resistant. For example, getPrioritizedHashFunction('SHA-256', 'SHA-512') would return SHA-512 and getPrioritizedHashFunction('SHA-256', 'SHA-256') would return the empty string.

3.3 Resource verification algorithms

3.3.1 Apply algorithm to resource

  1. Let result be the result of applying algorithm to the representation data without any content-codings applied, except when the user agent intends to consumes the content with content-encodings applied (e.g., saving a gzip’d file to disk). In the latter case, let result be the result of applying algorithm to the representation data.
  2. Let encodedResult be result of base64-encoding result.
  3. Return encodedResult.

3.3.2 Is resource eligible for integrity validation

In order to mitigate an attacker’s ability to read data cross-origin by brute-forcing values via integrity checks, resources are only eligible for such checks if they are same-origin, publicly cachable, or are the result of explicit access granted to the loading origin via CORS. [CORS]

Note

As noted in RFC6454, section 4, some user agents use globally unique identifiers for each file URI. This means that resources accessed over a file scheme URL are unlikely to be eligible for integrity checks.

One should note that being a privileged document (e.g. a document delivered over HTTPS) is not necessary for the use of integrity validation. Because resource integrity is only an application level security tool, and it does not change the security state of the user agent, a privileged document is unnecessary. However, if integrity is used in an unprivileged document (e.g. a document delivered over HTTP), authors should be aware that the integrity provides no security guarantees at all. For this reason, authors should only deliver integrity metadata on a potentially secure origin. See Unprivileged contexts remain unprivileged for more discussion.

Certain HTTP headers can also change the way the resource behaves in ways which integrity checking cannot account for. If the resource contains these headers, it is ineligible for integrity validation:

  • Authorization or WWW-Authenticate hide resources behind a login; such non-public resources are excluded from integrity checks.
  • Refresh can cause IFrame contents to transparently redirect to an unintended target, bypassing the integrity check.
Issue 3

Consider the impact of other headers: Content-Length, Content-Range, etc. Is there danger there?

The following algorithm details these restrictions:

  1. Let request be the request that fetched resource.
  2. If resource contains any of the following HTTP headers, return false:
    • Authorization
    • WWW-Authenticate
    • Refresh
  3. If the mode of request is CORS, return true.
  4. If the origin of request is resource’s origin, return true.
  5. If resource is cachable by a shared cache, as defined in [RFC7234], return true.
  6. Return false.
Note

Step 3 returns true if the resource was a CORS-enabled request. If the resource failed the CORS checks, it won’t be available to us for integrity checking because it won’t have loaded successfully.

3.3.3 Parse metadata.

This algorithm accepts a string, and returns either no metadata, or a set of valid hash expressions whose hash functions are understood by the user agent.

  1. If metadata is the empty string, return no metadata.
  2. Let result be the empty set.
  3. For each token returned by splitting metadata on spaces:
    1. If token is not a valid metadata, skip the remaining steps, and proceed to the next token.
    2. Let algorithm be the alg component of token.
    3. If algorithm is a hash function recognized by the user agent, add token to result.
  4. Return no metadata if result is empty, otherwise return result.

3.3.4 Get the strongest metadata from set.

  1. Let result be the empty set and strongest be the empty string.
  2. For each item in set:
    1. If result is the empty set, add item to result and set strongest to item, skip to the next item.
    2. Let currentAlgorithm be the alg component of strongest.
    3. Let newAlgorithm be the alg component of item.
    4. If the result of getPrioritizedHashFunction(currentAlgorithm, newAlgorithm) is the empty string, add item to result. If the result is newAlgorithm, set strongest to item, set result to the empty set, and add item to result.
  3. Return result.

3.3.5 Does resource match metadataList?

  1. If resource’s URL’s scheme is about, return true.
  2. If resource is not eligible for integrity validation, return false.
  3. Let parsedMetadata be the result of parsing metadataList.
  4. If parsedMetadata is no metadata, return true.
  5. Let metadata be the result of getting the strongest metadata from parsedMetadata.
  6. For each item in metadata:
    1. Let algorithm be the alg component of metadata.
    2. Let expectedValue be the val component of metadata.
    3. Let actualValue be the result of applying algorithm to resource.
    4. If actualValue is a case-sensitive match for expectedValue, return true.
  7. Return false.

This algorithm allows the user agent to accept multiple, valid strong hash functions. For example, a developer might write a script element such as:

<script src="https://foobar.com/content-changes.js"
        integrity="sha256-C6CB9UYIS9UJeqinPHWTHVqh/E1uhG5Twh+Y5qFQmYg=
                   sha256-qznLcsROx4GACP2dm0UCKCzCG+HiZ1guq6ZZDob/Tng=">

which would allow the user agent to accept two different content payloads, one of which matches the first SHA256 hash value and the other matches the second SHA256 hash value.

Note

User agents may allow users to modify the result of this algorithm via user preferences, bookmarklets, third-party additions to the user agent, and other such mechanisms. For example, redirects generated by an extension like HTTPSEverywhere could load and execute correctly, even if the HTTPS version of a resource differs from the HTTP version.

3.4 Modifications to Fetch

The Fetch specification should contain the following modifications in order to enable the rest of this specification’s work [FETCH]:

  1. The following text should be added to section 2.1.4: “A request has an associated integrity metadata. Unless stated otherwise, a request’s integrity metadata is the empty string.”

  2. The following text should be added to section 2.1.5: “A response has an associated integrity state, which is one of indeterminate, pending, corrupt, and intact. Unless stated otherwise, it is indeterminate.

  3. Perform the following steps before executing both the “basic fetch” and “CORS fetch with preflight” algorithms:

    1. If request’s integrity metadata is the empty string, set response’s integrity state to indeterminate. Otherwise:

      1. Set response’s integrity state to pending.
      2. Include a Cache-Control header whose value is “no-transform”.
  4. Add the following step before step #1 of the handling of 401 status codes in the HTTP fetch algorithm:

    1. If request’s integrity state is pending, set response’s integrity state to corrupt and return response.
  5. Before firing the process request end-of-file event for any request:

    1. If the request’s integrity metadata is the empty string, set the response’s integrity state to indeterminate and skip directly to firing the event.

    2. If response matches the request’s integrity metadata, set the response’s integrity state to intact and skip directly to firing the event.

    3. Set the response’s integrity state to corrupt and skip directly to firing the event.

3.5 Verification of HTML document subresources

A variety of HTML elements result in requests for resources that are to be embedded into the document, or executed in its context. To support integrity metadata for each of these, and new elements that are added in the future, a new integrity attribute is added to the list of content attributes for the link and script elements.

A corresponding integrity IDL attribute which reflects the value each element’s integrity content attribute is added to the HTMLLinkElement and HTMLScriptElement interfaces.

Note

A future revision of this specification is likely to include SRI support for all possible subresources, i.e., a, audio, embed, iframe, img, link, object, script, source, -track, and video elements.

3.6 The integrity attribute

The integrity attribute represents integrity metadata for an element. The value of the attribute MUST be either the empty string, or at least one valid metadata as described by the following ABNF grammar:

integrity-metadata = *WSP hash-with-options *( 1*WSP hash-with-options ) *WSP / *WSP
hash-with-options  = hash-expression *("?" option-expression)
option-expression  = option-name "=" option-value
option-name        = 1*option-name-char
option-name-char   = ALPHA / DIGIT / "-"
option-value       = *option-value-char
option-value-char  = ALPHA / DIGIT / "-" / "+" / "." / "/"
hash-algo          = <hash-algo production from [Content Security Policy Level 2, section 4.2]>
base64-value       = <base64-value production from [Content Security Policy Level 2, section 4.2]>
hash-expression    = hash-algo "-" base64-value

The integrity IDL attribute must reflect the integrity content attribute.

option-expressions are associated on a per hash-expression basis and are applied only to the hash-expression that immediately precedes it.

Note

At the moment, no option-expressions are defined. However, future versions of the spec make define options, such as MIME types [MIMETYPE].

3.7 Element interface extensions

3.7.1 HTMLLinkElement

partial interface HTMLLinkElement {
                attribute DOMString integrity;
};
3.7.1.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute

3.7.2 HTMLScriptElement

partial interface HTMLScriptElement {
                attribute DOMString integrity;
};
3.7.2.1 Attributes
integrity of type DOMString,
The value of this element’s integrity attribute

3.8 Handling integrity violations

Documents may specify the behavior of a failed integrity check by delivering a Content Security Policy which contains an integrity-policy directive, defined by the following ABNF grammar:

directive-name  = "integrity-policy"
directive-value = 1#failure-mode
failure-mode    = ( "block" / "report" )

A document’s integrity policy is the value of the integrity-policy directive, if explicitly provided as part of the document’s Content Security Policy, or block otherwise.

If the document’s integrity policy contains block, the user agent MUST refuse to render or execute resources that fail an integrity check, and MUST report a violation.

If the document’s integrity policy contains report, the user agent MAY render or execute resources that fail an integrity check, but MUST report a violation.

3.9 Elements

3.9.2 The script element

When executing step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Insert the following steps after step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Once the fetching algorithm has completed:
    1. If the response’s integrity state is corrupt:
      1. If the document’s integrity policy is block:
        1. If resource is same origin with the script element’s Document’s origin, then queue a task to fire a simple event named error at the element, and abort these steps.
      2. Report a violation.

3.10 Verification of CSS-loaded subresources

Issue 13

Tab and Anne are poking at adding fetch() to some spec somewhere which would allow CSS files to specify various arguments to the fetch algorithm while requesting resources. Detail on the proposal is at https://lists.w3.org/Archives/Public/public-webappsec/2014Jan/0129.html. Once that is specified, we can proceed defining an integrity argument that would allow integrity checks in CSS.

4. Proxies

Optimizing proxies and other intermediate servers which modify the content of fetched resources MUST ensure that the digest associated with those resources stays in sync with the new content. One option is to ensure that the integrity metadata associated with resources is updated along with the resource itself. Another would be simply to deliver only the canonical version of resources for which a page author has requested integrity verification. To support this latter option, user agents MUST send a Cache-Control header with a value of no-transform when requesting a resource with associated integrity metadata (see item 3 in the “Modifications to Fetch” section).

Issue 17

Think about how integrity checks would effect vary headers in general.

5. Security Considerations

5.1 Unprivileged contexts remain unprivileged

Integrity metadata delivered to an unprivileged context, such as an unprivileged document, only protects an origin against a compromise of the server where an external resources is hosted. Network attackers can alter the digest in-flight (or remove it entirely, or do absolutely anything else to the document), just as they could alter the resource the hash is meant to validate. Thus, authors SHOULD deliver integrity metadata only to a privileged document. See also securing the web.

5.2 Hash collision attacks

Digests are only as strong as the hash function used to generate them. User agents SHOULD refuse to support known-weak hashing functions like MD5 or SHA-1, and SHOULD restrict supported hashing functions to those known to be collision-resistant. At the time of writing, SHA-256 is a good baseline. Moreover, user agents SHOULD reevaluate their supported hashing functions on a regular basis, and deprecate support for those functions shown to be insecure.

5.3 Cross-origin data leakage

Attackers can determine whether some cross-origin resource has certain content by attempting to load it with a known digest, and watching for load failure. If the load fails, the attacker can surmise that the resource didn’t match the hash, and thereby gain some insight into its contents. This might reveal, for example, whether or not a user is logged into a particular service.

Moreover, attackers can brute-force specific values in an otherwise static resource: consider a JSON response that looks like this:

Example 5
{'status': 'authenticated', 'username': 'Stephan Falken'}

An attacker can precompute hashes for the response with a variety of common usernames, and specify those hashes while repeatedly attempting to load the document. By examining the reported violations, the attacker can obtain a user’s username.

User agents SHOULD mitigate the risk by refusing to fire error events on elements which loaded cross-origin resources, but some side-channels will likely be difficult to avoid (image’s naturalHeight and naturalWidth for instance).

6. Acknowledgements

None of this is new. Much of the content here is inspired heavily by Gervase Markham’s Link Fingerprints concept, as well as WHATWG’s Link Hashes.

A special thanks to Mike West of Google, Inc. for his invaluable contributions to the initial version of this spec.

A. References

A.1 Normative references

[ABNF]
D. Crocker, Ed.; P. Overell. Augmented BNF for Syntax Specifications: ABNF. January 2008. Internet Standard. URL: https://tools.ietf.org/html/rfc5234
[CORS]
Anne van Kesteren. Cross-Origin Resource Sharing. 16 January 2014. W3C Recommendation. URL: http://www.w3.org/TR/cors/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[MIMETYPE]
Ned Freed; Nathaniel S. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part Two: Media Types. Draft Standard. URL: https://tools.ietf.org/html/rfc2046
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[RFC4648]
Simon Josefsson. The Base16, Base32, and Base64 Data Encodings. Proposed Standard. URL: https://tools.ietf.org/html/rfc4648
[RFC6454]
A. Barth. The Web Origin Concept. December 2011. Proposed Standard. URL: https://tools.ietf.org/html/rfc6454
[RFC7230]
R. Fielding, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing. June 2014. Proposed Standard. URL: https://tools.ietf.org/html/rfc7230
[RFC7231]
R. Fielding, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content. June 2014. Proposed Standard. URL: https://tools.ietf.org/html/rfc7231
[RFC7234]
R. Fielding, Ed.; M. Nottingham, Ed.; J. Reschke, Ed.. Hypertext Transfer Protocol (HTTP/1.1): Caching. June 2014. Proposed Standard. URL: https://tools.ietf.org/html/rfc7234