26838 – Normatively address vulnerabilities related to initData contained in media data

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	Encrypted Media Extensions
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	David Dorwin
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:	Security
Keywords:

Depends on:	26332 27093
Blocks:

Reported:	2014-09-17 19:50 UTC by David Dorwin
Modified:	2015-04-21 23:52 UTC
CC List:	8 users

Description David Dorwin 2014-09-17 19:50:22 UTC

Although it is not required, the EME model assumes that the initData passed to generateRequest() is the initData from the "encrypted" event, which the UA parsed from the media data. This presents an attack vector that is not addressed by other mitigations, including CORS and secure origins (bug 26332).

Specifically, CDMs parse the initData and many CDM implementations are likely to be a) closed source and b) running with more privileges than the user agent's renderer. (See also [1].) In addition, most existing DRM implementations use proprietary, closed, and/or opaque formats (mainly PSSH boxes), which may not be publicly inspected. They may also use formats, such as XML, that may require complex parsers in the CDM.


The EME requirement that the "media data [must be] CORS-same-origin" [2] for the parsed initData to be provided to the application only protects the media data. It does not protect the user or CDM in any way, because an attacker's server would simply allow all origins. As a result, malicious media data may end up being passed to a "trusted" application that will then provide it to the CDM. (This is especially a concern in the “federated” case.) This is an even more serious problem if one were to rely on secure origins to safeguard the user/CDM.
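To make the attack path concrete, here is a minimal sketch of what a typical application does with the "encrypted" event. The interface and function names (`EncryptedEventLike`, `KeySessionLike`, `forwardInitData`) are hypothetical; they only mirror the `initDataType`/`initData` fields of the event and the `generateRequest()` method discussed above, so the sketch does not depend on DOM typings.

```typescript
// Hypothetical structural types mirroring the parts of the "encrypted" event
// and the key session that matter for this discussion.
interface EncryptedEventLike {
  initDataType: string;
  initData: ArrayBuffer | null;
}

interface KeySessionLike {
  generateRequest(initDataType: string, initData: ArrayBuffer): Promise<void>;
}

// Typical application behavior: forward whatever the user agent parsed from
// the media data to the CDM, without inspecting it.
function forwardInitData(
  event: EncryptedEventLike,
  session: KeySessionLike,
): Promise<void> {
  if (event.initData === null) {
    // The UA withheld the initData (e.g. the media data was not
    // CORS-same-origin), so there is nothing to forward.
    return Promise.resolve();
  }
  // These bytes originate from the media server, not from the application,
  // yet they reach the CDM through a "trusted" application.
  return session.generateRequest(event.initDataType, event.initData);
}
```

Nothing in this flow lets the application or the CORS check vouch for the contents of the bytes, which is the gap this bug is about.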

The concern can be broken into the following categories:
1) Non-secure application origin
 a) Network attacker (attack #13 in [3])
 b) User navigates (attack #9 in [3])
2) Application has a secure origin, but:
 a) Not all media data is blocked in the mixed content case (see below), reducing this case to #1
 b) Media data is retrieved from a third-party server (i.e. the federated case)

1a) This can be prevented by enforcing secure origins (bug 26332).

1b) This may be mitigated by enforcing secure origins (bug 26332), especially if the user agent requires permissions, but it may also need additional mitigations (as with 2b).

2a)
Per Mixed Content [4], for applications using HTTPS:
* The .src= case ("Optionally-blockable Content") would allow initData from anywhere to be provided to the CDM.
* The MSE case would allow initData from any HTTPS origin to be provided to the CDM.

We could explicitly close the .src mixed content loophole by preventing initData from being populated in the mixed content case in the same way we enforce CORS. That would at least address the network attacker case. (In addition to #13 in [3], it also addresses #14 and #15.)
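A minimal sketch of that gating, assuming the user agent already knows whether the media data is CORS-same-origin and whether it is mixed content. The type and function names are hypothetical, and returning an empty type with null data when the bytes are withheld is an assumption of this sketch about how "not populated" would look to the application.

```typescript
// Hypothetical summary of what the UA knows about the media resource.
interface MediaDataState {
  corsSameOrigin: boolean; // outcome of the existing CORS check
  mixedContent: boolean;   // e.g. HTTP media loaded via .src= on an HTTPS page
}

// Hypothetical shape of the fields exposed on the "encrypted" event.
interface EncryptedEventFields {
  initDataType: string;
  initData: Uint8Array | null;
}

// Decide what the "encrypted" event exposes to the application.
function encryptedEventFields(
  state: MediaDataState,
  parsedType: string,
  parsedData: Uint8Array,
): EncryptedEventFields {
  if (state.corsSameOrigin && !state.mixedContent) {
    return { initDataType: parsedType, initData: parsedData };
  }
  // Withhold the bytes: mixed content is treated like a failed CORS check.
  return { initDataType: "", initData: null };
}
```

With this check, initData injected by a network attacker into an HTTP media response never reaches the application, and so cannot be handed to the CDM through generateRequest().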

2b)
That leaves the case where an attacker has an HTTPS server and somehow convinces the application or user to load media data from that server. This is probably unlikely for a case like Netflix, but it could happen in a federated scenario. Also, the lure of elevated access and a vulnerable CDM could be enough to motivate someone to find an exploit in a content provider's site.

[1] The paragraph starting with "Note: Unsandboxed CDMs..." at https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/encrypted-media.html#security
[2] Step 3 of https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/encrypted-media.html#algorithms-initdata-encountered
[3] https://www.w3.org/Bugs/Public/show_bug.cgi?id=26332#c70
[4] https://w3c.github.io/webappsec/specs/mixedcontent/

Comment 1 David Dorwin 2014-09-17 19:51:37 UTC

Potential mitigations:
1) Treat Optionally-blockable [mixed] Content media data as not CORS-same-origin for the purposes of determining ([2] above) whether to provide initData in the "encrypted" event.
2) Update the generateRequest() algorithm to have the user agent validate and/or sanitize (possibly by pre-parsing and sanitizing) the |initData| and pass a verified/sanitized version to the CDM.

I think #1 is reasonable (regardless of the outcome of bug 26332). This simply brings .src= media data to the same level as MSE media data. The Optionally-blockable Content category only exists to avoid breaking existing web pages, which is not a concern for EME. As noted above, this addresses (network-based) attacks #13, #14, and #15 in [3] above.

#2 is consistent with the security considerations in [1] above and good practices for passing "user data" across security boundaries. As noted in [3] above, this is "[analogous] to browsers validating WebGL shaders before passing them to a shader compiler whose bugs aren't under the control of the browser vendor."
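As an illustration of the kind of validation meant in #2, here is a sketch of a structural check on a single 'pssh' box before it is handed to the CDM. It assumes the ISO Common Encryption box layout (32-bit size, 'pssh' type, version and flags, 16-byte SystemID, optional key ID list in version 1, then a length-prefixed Data field). The function name, the result type, and the 64 KiB limit are hypothetical choices for this sketch; real initData may contain several concatenated boxes, and the proprietary Data payload is not inspected here.

```typescript
interface PsshCheck {
  ok: boolean;
  reason?: string;
}

// Assumed upper bound for this sketch; not taken from any specification.
const MAX_INIT_DATA_BYTES = 64 * 1024;

function validatePsshBox(bytes: Uint8Array): PsshCheck {
  const fail = (reason: string): PsshCheck => ({ ok: false, reason });

  if (bytes.length > MAX_INIT_DATA_BYTES) return fail("too large");
  // size(4) + type(4) + version/flags(4) + SystemID(16) + DataSize(4)
  if (bytes.length < 32) return fail("too short");

  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // The declared box size must match the buffer exactly. This also rejects
  // the special size values 0 and 1, which this sketch does not support.
  if (view.getUint32(0) !== bytes.length) return fail("size mismatch");

  const type = String.fromCharCode(bytes[4], bytes[5], bytes[6], bytes[7]);
  if (type !== "pssh") return fail("not a pssh box");

  const version = bytes[8];
  if (version > 1) return fail("unknown version");

  let offset = 12 + 16; // skip the header and the 16-byte SystemID

  if (version === 1) {
    const kidCount = view.getUint32(offset);
    offset += 4;
    // Each key ID is 16 bytes; the list must fit in the remaining bytes.
    if (kidCount * 16 > bytes.length - offset) return fail("KID list overruns box");
    offset += kidCount * 16;
  }

  if (offset + 4 > bytes.length) return fail("missing DataSize");
  const dataSize = view.getUint32(offset);
  offset += 4;
  if (offset + dataSize !== bytes.length) return fail("DataSize mismatch");

  return { ok: true };
}
```

The check only accepts or rejects; it never rewrites the bytes, so the user agent would pass the original initData to the CDM when `ok` is true and fail generateRequest() otherwise.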

Comment 2 David Dorwin 2014-09-19 23:31:29 UTC

https://dvcs.w3.org/hg/html-media/rev/f18f378041a2 implements #1 in comment #1.
https://dvcs.w3.org/hg/html-media/rev/4642f0f6d841 implements #2 in comment #1.
https://dvcs.w3.org/hg/html-media/rev/c64c7311ade3 adds checks similar to the other methods that pass data from the application to the CDM - load() and update().

Comment 3 Joe Steele 2014-09-23 14:22:57 UTC

(In reply to David Dorwin from comment #1)
> Potential mitigations:
> 1) Treat Optionally-blockable [mixed] Content media data as not
> CORS-same-origin for the purposes of determining ([2] above) whether to
> provide initData in the "encrypted" event.
> 2) Update the generateRequest() algorithm to have the user agent validate
> and/or sanitize (possibly by pre-parsing and sanitizing) the |initData| and
> pass a verified/sanitized version to the CDM.
> 
> I think #1 is reasonable (regardless of the outcome of bug 26332). This
> simply brings .src= media data to the same level as MSE media data. The
> Optionally-blockable Content category only exists to avoid breaking existing
> web pages, which is not a concern for EME. As noted above, this addresses
> (network-based) attacks #13, #14, and #15 in [3] above.
> 
> #2 is consistent with the security considerations in [1] above and good
> practices for passing "user data" across security boundaries. As noted in
> [3] above, this is "[analogous] to browsers validating WebGL shaders before
> passing them to a shader compiler whose bugs aren't under the control of the
> browser vendor."

Can you detail more how you would see this working? 

Here are some roadblocks I see to this:

1) The PSSH formats are often proprietary and confidential. I can imagine the documentation being made available to the UA implementer but I am not sure how validation would be possible for UAs which are open source (or mostly open source) without revealing those formats.

2) The initData is often encrypted. It seems unlikely to me that those keys would be provided to the UA other than in the context of the CDM itself. Passing the encrypted data to the CDM for decryption prior to "cleaning" seems like it defeats the purpose of the cleaning. 

3) The initData is often signed. It is likely that modifications to the initData (e.g. removing data perceived as "bad") would invalidate signatures on the initData.

Comment 4 David Dorwin 2014-09-24 00:08:08 UTC

(In reply to Joe Steele from comment #3)
> (In reply to David Dorwin from comment #1)
> > Potential mitigations:
> > 1) Treat Optionally-blockable [mixed] Content media data as not
> > CORS-same-origin for the purposes of determining ([2] above) whether to
> > provide initData in the "encrypted" event.
> > 2) Update the generateRequest() algorithm to have the user agent validate
> > and/or sanitize (possibly by pre-parsing and sanitizing) the |initData| and
> > pass a verified/sanitized version to the CDM.
> > 
> > I think #1 is reasonable (regardless of the outcome of bug 26332). This
> > simply brings .src= media data to the same level as MSE media data. The
> > Optionally-blockable Content category only exists to avoid breaking existing
> > web pages, which is not a concern for EME. As noted above, this addresses
> > (network-based) attacks #13, #14, and #15 in [3] above.
> > 
> > #2 is consistent with the security considerations in [1] above and good
> > practices for passing "user data" across security boundaries. As noted in
> > [3] above, this is "[analogous] to browsers validating WebGL shaders before
> > passing them to a shader compiler whose bugs aren't under the control of the
> > browser vendor."
> 
> Can you detail more how you would see this working? 

As discussed in the telecon today, there is no normative text about *how* to validate and/or sanitize, but there are non-normative guidelines about possible ways to accomplish this.
> 
> Here are some roadblocks I see to this:
> 
> 1) The PSSH formats are often proprietary and confidential. I can imagine
> the documentation being made available to the UA implementer but I am not
> sure how validation would be possible for UAs which are open source (or
> mostly open source) without revealing those formats.

This is not ideal and also makes it difficult to package content. User agents should take this into account when deciding whether to support a CDM.

> 2) The initData is often encrypted. It seems unlikely to me that those keys
> would be provided to the UA other than in the context of the CDM itself.
> Passing the encrypted data to the CDM for decryption prior to "cleaning"
> seems like it defeats the purpose of the cleaning. 

It did not sound like there was a concrete scenario of the entire PSSH Data box being encrypted. (The hypothetical scenarios discussed would be better solved in other ways, such as encryption in transit.) Thus, the structure and many of the fields can still be validated.

> 3) The initData is often signed. It is likely that modifications to the
> initData (e.g. removing data perceived as "bad") would invalidate signatures
> on the initData.

This prevents sanitizing or parsing and regenerating, but many other types of validation can be performed.

Comment 5 David Dorwin 2014-10-17 18:17:59 UTC

Of the concerns identified in comment #0:
* #1 depends on bug 26332.
* #2a is normatively addressed by the changes to the Initialization Data Encountered algorithm in https://dvcs.w3.org/hg/html-media/rev/f18f378041a2.
* #2b is somewhat addressed by the new normative step in generateRequest(), but the actual validation and sanitization are not normatively specified. Fixing this depends on bug 27093.

Comment 6 David Dorwin 2014-10-17 19:25:02 UTC

https://dvcs.w3.org/hg/html-media/rev/7f45ba26e755 makes initData validation and sanitization normative, which should address #2b.

The remaining issue is #1, which requires implementation of bug 26332.

Comment 7 Joe Steele 2015-01-12 21:26:33 UTC

(In reply to David Dorwin from comment #6)
> https://dvcs.w3.org/hg/html-media/rev/7f45ba26e755 makes initData validation
> and sanitization normative, which should address #2b.
> 
> The remaining issue is #1, which requires implementation of bug 26332.

I think this may need to be revisited after resolving bug 27093.

Comment 8 David Dorwin 2015-04-21 23:52:28 UTC

(In reply to David Dorwin from comment #6)
> https://dvcs.w3.org/hg/html-media/rev/7f45ba26e755 makes initData validation
> and sanitization normative, which should address #2b.
This was related to bug 27093.
> 
> The remaining issue is #1, which requires implementation of bug 26332.

Bug 27093 and bug 26332 are now fixed, so I think we can close this bug.