Bugzilla – Bug 19788
What, if any, event should be fired when no key is available to decrypt the block?
Last modified: 2014-01-18 01:48:03 UTC
In the current draft, the Encrypted Block Encountered algorithm may fire a needkey event if the required key is not available. For ISO BMFF/CENC^ and WebM, the needed keys can always be identified based on headers, so no new information should be available during the decrypt/decode phase that wasn't available when the user agent determined that the stream may be encrypted (Potentially Encrypted Stream Encountered algorithm) and sent a needkey event. Thus, multiple identical events may be sent in certain valid scenarios. This issue is tracking whether we should have a different behavior in the case that a required key is unavailable. (Note that this is not a MediaError because there are legitimate use cases where a key may have been requested as a result of the Potentially Encrypted Stream Encountered event but has not yet been received by the time the key is needed.)
* Report a "needkey" event on the media element.
* Report a different event on the media element.
* Report an event on the MediaKeySession object if one has been associated with the media element.
* Report no event and assume the event from the Potentially Encrypted Stream Encountered algorithm is sufficient.
- This works for ISO BMFF/CENC and WebM and other container formats with “headers” but would not work for a container where key IDs can appear in blocks but cannot be identified based on “headers”. One option in those cases would be to consider this the “may contain” case, which should then be renamed to reference the first time a key reference is encountered. (More on this below.)
If we do send events in this case, how many events should be sent for the same key ID? Since audio and video (or multiple streams of each) may be processed at different times or on different threads, it's possible that an event could be fired for each even if they use the same key ID. Should we allow, prevent, or discourage this? (There is a similar issue for the Potentially Encrypted Stream Encountered algorithm when multiple files/streams/tracks contain the same Initialization Data.)
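If duplicate events are allowed, an application can filter them itself. The following is a minimal sketch of that idea, assuming the event carries its data as a byte array. `NeedKeyLike`, `toHex`, and `oncePerInitData` are hypothetical names used only for illustration and are not part of the draft.

```typescript
// Hypothetical payload shape: only the initData bytes matter for this sketch.
interface NeedKeyLike {
  initData: Uint8Array;
}

// Hex-encode bytes so that equal initData values map to equal string keys.
function toHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    out += (bytes[i] < 16 ? "0" : "") + bytes[i].toString(16);
  }
  return out;
}

// Wrap a handler so that it runs at most once per distinct initData value.
function oncePerInitData(
  handler: (e: NeedKeyLike) => void
): (e: NeedKeyLike) => void {
  const seen: { [hex: string]: boolean } = {};
  return (e: NeedKeyLike): void => {
    const id = toHex(e.initData);
    if (seen[id]) {
      return; // same data already reported, e.g. by another track
    }
    seen[id] = true;
    handler(e);
  };
}
```

The wrapped handler could then be registered for the event on the media element, so that audio and video tracks sharing the same data trigger only one key request.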
One option might be to change how we think about the two scenarios. This might make implementations more difficult but would resolve some of these issues.
* The first algorithm would be “First Time a Key Reference is Encountered.” Each container would need to specify what this means. For example, ISO BMFF/CENC might define this as encountering a PSSH even if the PSSH does not explicitly reference a key. For WebM, this might be when ContentEncryption/ContentEncKeyID is parsed. For a container without such headers, it might be the first time each key ID is encountered (i.e. in a block).
* The second algorithm would continue to be "Encrypted Block Encountered" with the change that the Key Presence step does not fire an event or an error in the case where the needed key is not available. Note that this may or may not occur at the same time as the first reference to a specific key (the first algorithm).
The first algorithm would be the only one that sends an event, and the second one would describe the behavior of playback (see bug 18515). Applications would not be informed that a key is needed for decrypting a current block. They shouldn’t really need to know for key-related reasons, but are there other reasons? Would/should existing events (e.g., stalled?) cover any such needs?
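The split between the two algorithms can be modeled roughly as follows. This is a sketch of the proposal as described in this comment, not spec text. `KeyReferenceTracker` and its members are hypothetical, and key IDs are represented as plain strings for simplicity.

```typescript
type KeyId = string; // assumption: key IDs rendered as strings

class KeyReferenceTracker {
  private referenced: { [id: string]: boolean } = {};
  private available: { [id: string]: boolean } = {};
  // Records each event that would be fired, so the behavior can be inspected.
  public firedFor: KeyId[] = [];

  // First algorithm: "First Time a Key Reference is Encountered".
  // The only place an event is fired, and only once per key ID.
  keyReferenceEncountered(id: KeyId): void {
    if (this.referenced[id]) {
      return;
    }
    this.referenced[id] = true;
    this.firedFor.push(id);
  }

  // Called when a key for this ID has been provided.
  keyAdded(id: KeyId): void {
    this.available[id] = true;
  }

  // Second algorithm: "Encrypted Block Encountered".
  // A missing key produces neither an event nor an error; playback waits.
  encryptedBlockEncountered(id: KeyId): "decrypt" | "wait" {
    return this.available[id] ? "decrypt" : "wait";
  }
}
```

For a container without headers, the user agent would run `keyReferenceEncountered` when it first sees a key ID in a block, then `encryptedBlockEncountered` for that same block.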
Follow-up: Should the event for the Encrypted Block Encountered algorithm contain Initialization Data or the key ID?
If we choose to report an event, we need to decide what data to report in the event.
Step 7 of the Encrypted Block Encountered algorithm says to fire a needkey event where "initData = block initData". "block initData" was set in step 4, which says, "If the block (or its parent entity) has Initialization Data, let block initData be that initialization data."
The problem is that Initialization Data may not be readily available when decrypting. Instead, the key ID is generally what is known for a given block.
Which of the following should we specify?
1) The needkey event contains the Initialization Data, which can be sent to the server just like it can for the Potentially Encrypted Stream Encountered algorithm. This has implementation overhead.
2) The key ID of the current block. This is easier to implement but inconsistent with the Potentially Encrypted Stream Encountered algorithm and may not be useful for obtaining a key. This option is probably better if we have a separate event name and/or fire it at different objects.
[These footnotes apply to all three updates through this one.]
^ Is this true for CENC, even in use cases that involve key rotation?
(In reply to comment #0)
> For ISO BMFF/CENC^ and
> WebM, the needed keys can always be identified based on headers,
For CENC 2012, there's no specified coherence between PSSH boxes and tenc/senc KIDs. It's certainly expected that the PSSH boxes contain the information necessary to obtain the keys, but there is no guidance or guarantee offered anywhere to this effect.
(The 2011 drafts of CENC extend 14496-12:2008, so this problem was much less severe. There's still quite a bit of straw-man mischief one could get up to, but there weren't that many legitimate reasons to do something twisted. 14496-12:2012 allows sample groups in fragments, which blows the barn doors off.)
> 2) The key ID of the current block. This is easier to implement but
> inconsistent with the Potentially Encrypted Stream Encountered algorithm
> and may not be useful for obtaining a key. This option is probably
> better if we have a separate event name and/or fire it at different objects.
With my author hat on, I favor this. It solves at least one problem we have already had to hack around (keeping content permissions and PSSH atoms in sync, with code to defend against drift present at every level) by allowing the client or server to synthesize missing boxes on demand.
I also favor it from a spec point of view. Because of CENC's choice in making PSSHs completely devoid of specified spatial/temporal correspondence with KIDs, there really are two separate kinds of data at the format level. IMO, either we should marshal these two and essentially amend the CENC spec via the format-specific guidelines to require such a correspondence, or we should be honest about the underlying discord and inform the client explicitly.
There are two subcases:
a) The player encounters some new information in the stream that indicates that a previously unseen KeyId is needed
b) The player encounters some media encrypted using a key it does not have, but for which it already has initData
For (b), I think the CDM should just send a keymessage.
For (a), we need to understand how we want to handle this 'new information', which we could call new initData. I see two options:
(a)(i) assume the CDM just handles it internally, putting it into case (b)
(a)(ii) require it to be sent up to application, like the original initData
This 'subsequent initData' differs from the initial initData because the keysystem has already been selected and it's possibly more embedded in the stream (rather than being in some kind of initialization segment). So (a)(i) should be possible and makes things rather simple.
On the other hand, for initial initData we have the possibility for the application to process or even construct this. Do we want that possibility for subsequent initData as well?
If yes, then the next question is whether this should be dealt with inside the existing MediaKeySession or whether another one should be constructed, or whether this should be up to the CDM.
(a)(ii)(1) [same session] We need to fire a needkey-like event on the same session and have a new method on the session to add initData
(a)(ii)(2) [different session] We fire a needkey event and the app creates a new session
(a)(ii)(3) [CDM decides] Both of the above are supported
I think I have largely just enumerated the options in the comments above. My preference for (a) is to support (a)(i) and (a)(ii)(2).
(In reply to comment #4)
> There are two subcases:
> a) The player encounters some new information in the stream that indicates
> that a previously unseen KeyId is needed
> b) The player encounters some media encrypted using a key it does not have,
> but for which it already has initData
I think (b) is difficult to determine. For example, nothing guarantees that all key IDs are specified in the PSSH. I also think it is an (admittedly minor) implementation burden to have to check whether you have seen a PSSH (or equivalent structure) for a key ID. Theoretically, CDMs may not keep the PSSH around or even know how to parse all of it.
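The check that distinguishes the two subcases can be sketched as below. It assumes the user agent can extract a list of key IDs from the Initialization Data it has seen, which is exactly the part that CENC does not guarantee. `Subcase` and `classifyBlock` are hypothetical names, and key IDs are plain strings for simplicity.

```typescript
type Subcase = "have-key" | "a-new-key-id" | "b-known-init-data";

// keyIdsFromInitData is the set of key IDs the user agent managed to derive
// from previously seen Initialization Data. Producing it requires keeping
// and parsing that data, which is the implementation burden in question.
function classifyBlock(
  blockKeyId: string,
  availableKeys: string[],
  keyIdsFromInitData: string[]
): Subcase {
  if (availableKeys.indexOf(blockKeyId) !== -1) {
    return "have-key"; // the block can be decrypted
  }
  if (keyIdsFromInitData.indexOf(blockKeyId) !== -1) {
    return "b-known-init-data"; // subcase (b): key missing, initData already seen
  }
  return "a-new-key-id"; // subcase (a): previously unseen key ID
}
```

If the Initialization Data does not list every key ID, a block that really belongs to subcase (b) would be reported as subcase (a) by this check.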
> For (b), I think the CDM should just send a keymessage.
What type of keymessage? Would it be key system-specific?
> For (a), we need to understand how we want to handle this 'new information',
> which we could call new initData. I see two options:
Is the 'new information' format a key ID or the same Initialization Data (initData) format used elsewhere? I think Initialization Data should always be the same format for a given container. Thus, if it's a key ID, we should call it something else.
> (a)(i) assume the CDM just handles it internally, putting it into case (b)
What does it mean to be put into case (b)?
> (a)(ii) require it to be sent up to application, like the original initData
> This 'subsequent initData' differs from the initial initData because the
> keysystem has already been selected and it's possibly more embedded in the
> stream (rather than being in some kind of initialization segment). So (a)(i)
> should be possible and makes things rather simple.
> On the other hand, for initial initData we have the possibility for the
> application to process or even construct this. Do we want that possibility
> for subsequent initData as well ?
If this data is defined as a key ID, then an application can construct it, though I'm not sure what the use case is since createSession() takes a different type of data. I think it's more likely that it would be sent to the server to get a new key for the ID.
(In reply to comment #2)
> Step 7 of the Encrypted Block Encountered algorithm says to fire a needkey event where "initData = block
> initData". "block initData" was set in step 4, which says, "If the block (or
> its parent entity) has Initialization Data, let block initData be that
> initialization data."
> The problem is that Initialization Data may not be readily available when
> decrypting. Instead, the key ID is generally what is known for a given block.
Bug 20552 has been filed to fix this text.
We should definitely not send different types of data to the same event. That means we need a new event if we are going to send the key ID.
Unless we guarantee that all keys can be determined from the Initialization Data (CENC does not), we can't guarantee that the user agent knows which, if any, MediaKeySession to fire the event at. The event would either need to be fired at the HTMLMediaElement or the MediaKeys object.
Since createSession() takes a specific Initialization Data format, any other data format (e.g., a key ID) provided in the new event could not be used to create a new session. It could only be used to tell the application/server that the key has not been provided. The reply would need to be a new license (for an existing session) containing that key.
Unless there is a good use case, I propose that we go with no event. An event can always be added later if we find that it would be useful (it is easier to add something than remove it), but currently we don't know what the event should include or what it should be fired at.
Assuming this decision, the change in Comment 1 probably also makes sense.
Note also that this is an abnormal condition, not something an application should expect during normal playback. In most instances, the user will experience some type of pause or skip in playback if this occurs.
Notes from the March 12th telecon (http://www.w3.org/2013/03/12-html-media-minutes.html#item05):
Currently, there is a needkey event for hitting initdata and another needkey if you need a key to decrypt the current frame. For the second one, there is not much the app can do - it is really an error condition.
We assume that if the app gets to this point with no key, the app has received an earlier event telling it that a key is needed, and we assume the app is already working on acquiring it.
We decided to delete the firing of a needkey event on encrypted block encountered with no key and merge in comment 1.
I am having a little trouble following the recommendation at this point. Please bear with me.
If the app determines that another key will be needed at some point in the future (for example from information in the manifest) is it allowed to start a second session to kick off the acquisition of that key? Or is it required to wait for the "Encrypted Block Encountered" algorithm to kick in?
If the latter -- that is a problem when key acquisition takes any amount of time.
(In reply to comment #11)
> If the app determines that another key will be needed at some point in the
> future (for example from information in the manifest) is it allowed to start
> a second session to kick off the acquisition of that key? Or is it required
> to wait for the "Encrypted Block Encountered" algorithm to kick in?
Yes, we discussed the necessity for multiple sessions. If you have initdata you should be able to call createSession with it. You definitely don't want to wait for playback to stall before performing the license acquisition.
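As a rough illustration of starting license acquisition early, the sketch below creates one session per Initialization Data entry found in a manifest. `SessionFactoryLike`, `ManifestEntry`, and `prefetchSessions` are hypothetical, and the two-argument `createSession(type, initData)` shape is an assumption modeled loosely on the draft, not a statement of its exact signature.

```typescript
// Assumed minimal shape of the object that creates sessions.
interface SessionFactoryLike {
  createSession(type: string, initData: Uint8Array): void;
}

// Hypothetical manifest entry: a content type plus the initData to request.
interface ManifestEntry {
  type: string;
  initData: Uint8Array;
}

// Start one session per manifest entry before playback reaches that content.
// Returns the number of sessions started.
function prefetchSessions(
  keys: SessionFactoryLike,
  entries: ManifestEntry[]
): number {
  let started = 0;
  for (let i = 0; i < entries.length; i++) {
    const entry = entries[i];
    if (entry.initData.length === 0) {
      continue; // nothing to request for this entry
    }
    keys.createSession(entry.type, entry.initData);
    started++;
  }
  return started;
}
```

An application would call this once after parsing the manifest, so that key requests are in flight well before the corresponding blocks are decoded.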
I updated the text per comment 1 and comment 10 in https://dvcs.w3.org/hg/html-media/rev/9dedfcd2e3a3
* Removed the second needkey event.
* Changed the name of 5.2
* Did not remove the reporting of MEDIA_ERR_ENCRYPTED. This will be addressed in bug 16857.
* Updated the 7.1 WebM section to specify when to call “First Time a Key Reference is Encountered.”
* 7.2 ISO Base Media File Format needs to be updated in bug 17673.