Both v0.1 and the current version of the OO API allow the key system to be specified at any point after loading has started. They even require that the key system not be chosen (v0.1) or associated (OO API) until after loading has started. As a result, it is very likely that the key system will be set AFTER some frames have been decoded. For key systems/CDMs that include a decoder (or entire video pipeline), this means that the decoder may change when the key system is selected. This is primarily (only?) a problem if some clear content is decoded/played before the encrypted content, so that decoding could start before a key system is specified.
Supporting decoder changes may be very difficult to implement, especially without restrictions on the content. For example, the switch might occur in the middle of a group of frames that depend on previous frames to decode subsequent frames. While this could be worked around, it might be better to avoid this situation completely. Some options:
1) Require the key system to be specified before loading and/or decoding starts. If it is not specified by this time, it cannot be set later, meaning decryption would not be possible. This would likely reduce the utility of the needkey event.
2) Allow switching decoders whenever Media Source Extensions allow changing codecs/decoders.
3) Switch decoders when the first encrypted frame is encountered, possibly requiring the restriction in option 4.
4) Establish limitations on which frames can be encrypted. For example, P- and B-frames may not be encrypted if the I-frame is not. Due to seeking, this would apply throughout a stream and not just to the beginning.
5) Switch immediately and drop frames if necessary.
6) Suggest the above to applications and make it a quality of implementation issue for applications.
7) Leave the behavior undefined, making it a quality of implementation issue for user agents.
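To make options 3 and 4 concrete, here is a minimal sketch in TypeScript. Everything in it is hypothetical and invented for illustration: the `Frame` type, `firstOption4Violation`, and `cleanSwitchIndex` are not part of any spec or implementation. It assumes frames are listed in decode order and that each P- or B-frame depends only on frames since the most recent I-frame.

```typescript
// Hypothetical model of a stream in decode order; not a real API.
type FrameKind = "I" | "P" | "B";

interface Frame {
  kind: FrameKind;
  encrypted: boolean;
}

// Option 4: a P- or B-frame may not be encrypted unless the I-frame that
// starts its group is encrypted. Returns the index of the first violating
// frame, or -1 if the whole stream satisfies the restriction.
function firstOption4Violation(frames: Frame[]): number {
  let groupEncrypted = false; // no I-frame seen yet
  for (let i = 0; i < frames.length; i++) {
    const frame = frames[i];
    if (frame.kind === "I") {
      groupEncrypted = frame.encrypted;
    } else if (frame.encrypted && !groupEncrypted) {
      return i;
    }
  }
  return -1;
}

// Option 3: switch decoders at the first encrypted frame. In this model the
// switch is only clean if that frame is an I-frame; otherwise the new decoder
// would lack the reference frames it needs. Returns the index to switch at,
// or -1 if there is no encrypted frame or the switch would land mid-group.
function cleanSwitchIndex(frames: Frame[]): number {
  for (let i = 0; i < frames.length; i++) {
    if (frames[i].encrypted) {
      return frames[i].kind === "I" ? i : -1;
    }
  }
  return -1;
}
```

In this model, a stream that starts with a clear group followed by an encrypted group has no option 4 violation, and `cleanSwitchIndex` returns the index of the encrypted I-frame. A stream whose first encrypted frame is a P-frame fails both checks, which is the mid-group case described above.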
There are some implicit assumptions above.
The first is that once you select a key system (and thus decoder), you cannot change it. In other words, you cannot use multiple key systems during the life (between loads) of a media element.
Also, to be clear, unencrypted content is supported even after selecting a key system. This means there is a requirement that CDM decoders can also decode clear/unencrypted content.
We think we should add informative text that suggests applications should create a MediaKeys object and call setMediaKeys before setting the media source if they want to signal to the media engine that a protected pipeline is desired. If the application does not do this then this should be left as a quality of implementation issue.
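The suggested ordering can be sketched as follows. This is a minimal TypeScript model, not the real DOM API: `ModelMediaKeys`, `ModelMediaElement`, `setSource`, and the `pipeline` and `decoderSwitchNeeded` fields are hypothetical stand-ins. It assumes, purely for illustration, that the media engine picks its pipeline at the moment the source is set.

```typescript
// Hypothetical stand-ins for MediaKeys and the media element; not the DOM API.
class ModelMediaKeys {
  keySystem: string;
  constructor(keySystem: string) {
    this.keySystem = keySystem;
  }
}

type Pipeline = "none" | "clear" | "protected";

class ModelMediaElement {
  mediaKeys: ModelMediaKeys | null = null;
  src = "";
  pipeline: Pipeline = "none";
  decoderSwitchNeeded = false;

  setMediaKeys(keys: ModelMediaKeys): void {
    this.mediaKeys = keys;
    // Keys arriving after a clear pipeline was built would force a decoder
    // change, the case left as a quality of implementation issue.
    if (this.pipeline === "clear") {
      this.decoderSwitchNeeded = true;
    }
  }

  setSource(url: string): void {
    this.src = url;
    // Assumption: the engine chooses its pipeline when the source is set.
    this.pipeline = this.mediaKeys !== null ? "protected" : "clear";
  }
}

// Suggested order: attach the keys first, then set the media source.
function loadProtected(el: ModelMediaElement, keySystem: string, url: string): void {
  el.setMediaKeys(new ModelMediaKeys(keySystem));
  el.setSource(url);
}
```

In this model, calling `loadProtected` yields a protected pipeline with no decoder switch, while setting the source first and the keys afterwards leaves a clear pipeline with `decoderSwitchNeeded` set.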
Resolution of bug 19009 may address this. It should at least consider it.
For the record, this was discussed on the email list last fall: http://lists.w3.org/Archives/Public/public-html-media/2012Oct/0001.html
Discussed on the telcon: the current spec addresses this issue.
I don't believe the current spec text addresses this issue.
Adrian proposed the following in comment #2. Is this no longer the intended solution?
> We think we should add informative text that suggests applications should
> create a MediaKeys object and call setMediaKeys before setting the media
> source if they want to signal to the media engine that a protected pipeline
> is desired. If the application does not do this then this should be left as
> a quality of implementation issue.
The fix for bug 19009 notes that events may not be fired at MediaKeySession objects until setMediaKeys() is called, but this is not the same as Adrian's suggestion.
The current setMediaKeys() algorithm text contains the following statement, which is incompatible with this solution:
"In general, applications should wait for an event named needkey or loadstart (per the resource fetch algorithm) before calling this method."
This text is left over from generateKeyRequest(). I think we can eliminate it since HTMLMediaElement implementations should be able to store a reference to the MediaKeys even if the MediaPlayer is not initialized.
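As a rough illustration of storing the reference before the player exists, here is a minimal TypeScript sketch. It is a hypothetical model, not how any user agent is known to implement this: `DeferredKeysElement`, `startLoading`, and `activeKeySystem` are invented names, and a plain string stands in for the MediaKeys object.

```typescript
// Hypothetical model of an element that accepts keys before its player exists.
class DeferredKeysElement {
  // Keys handed over before the player was initialized (a string stands in
  // for the MediaKeys reference).
  private pendingKeySystem: string | null = null;
  // Stand-in for the internal media player; null until loading starts.
  private player: { keySystem: string | null } | null = null;

  setMediaKeys(keySystem: string): void {
    if (this.player === null) {
      // Player not initialized yet: remember the keys instead of throwing.
      this.pendingKeySystem = keySystem;
      return;
    }
    this.player.keySystem = keySystem;
  }

  startLoading(): void {
    // The player is created with whatever keys were stored earlier.
    this.player = { keySystem: this.pendingKeySystem };
    this.pendingKeySystem = null;
  }

  activeKeySystem(): string | null {
    return this.player === null ? null : this.player.keySystem;
  }
}
```

In this sketch a call to `setMediaKeys` made before `startLoading` does not fail; the stored value becomes active once the player is created.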
1) Should we add text similar to Adrian's proposal?
2) Any reason not to remove the text left over from generateKeyRequest()?
In addition to the non-normative text mentioned above, the setMediaKeys() algorithm contains the following normative step, which we should also remove:
"If loading has not started, throw an INVALID_STATE_ERR exception and abort these steps."
If we do recommend that setMediaKeys() be called before setting the source, we may want to reconsider whether to add the 'keySystem' attribute to HTMLSourceElement (issue 20336). Using <source> with 'keySystem' would contradict this recommendation, so I don't think it makes sense to make user agents implement this unlikely-to-be-used feature.
Updated the spec with changes from comments 6 and 7:
David will file a new bug for comment 8.
The new bug to address the contradiction is bug 23828.