This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25218 - Allow license management directly via MediaKeySession
Summary: Allow license management directly via MediaKeySession
Status: RESOLVED WORKSFORME
Alias: None
Product: HTML WG
Classification: Unclassified
Component: Encrypted Media Extensions
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Adrian Bateman [MSFT]
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-03-31 20:48 UTC by Joe Steele
Modified: 2014-05-08 01:13 UTC
CC List: 4 users

See Also:


Attachments

Description Joe Steele 2014-03-31 20:48:30 UTC
We have been having a lot of discussion about how key lifecycle should be managed. I will throw another proposal out there which can meet all the needs I have heard expressed so far. 

Here are the problems I see:
1) A session may contain more than just a single license.  
2) A license may be shared between content streams (e.g. domain licenses) 
3) A license may or may NOT be allowed to be persisted on the device
4) The application may want to manage licenses directly to the extent possible
5) The application may require a license release notification

Items 1 & 2 mean that managing the session itself is not good enough. The application cannot know what is in the "session" without key system specific logic that we are trying to avoid. 
Item 3 means that we must throw an error when the CDM does not support the behavior - either because it is incapable or because policy does not allow it.
Item 4 means we must have a way of identifying individual licenses in the session. 
Item 5 means we would like a way of identifying licenses as they are released.

I propose we add a license-specific set of methods to the MediaKeySession:

void MediaKeySession::removeKey(DOMString keyID);
This would remove the key identified by the keyID if present and send any required key notifications. It would return an error if the key is not present.

void MediaKeySession::loadKey(DOMString keyID);
This would remove the key identified by the keyID if present into the current MediaKeySession. It would return an error if the key is not present.

void MediaKeySession::queryKey(DOMString keyID);
This would create an opaque blob containing the key identified by the keyID if present. The blob would be returned as a message or an error would be returned if the key was not present. 

This would remove the need for the MediaKeySession.release() method. It would not remove the need for the "persistent/ephemeral" license type flag to be passed to createSession() since that would provide a useful error message.

I also think we should change Section 1.1.3. Key Session to reflect that license/key lifetime can now be managed directly.
Comment 1 Joe Steele 2014-03-31 20:59:35 UTC
Some cut/paste errors in my earlier submission. Let me try that again.

void MediaKeySession::loadKey(DOMString keyID);
This would LOAD the key identified by the keyID if present into the current MediaKeySession. It would return an error if the key is not present.

void MediaKeySession::queryKey(DOMString keyID);
This would create an opaque blob containing the key identified by the keyID if present. The blob would be returned ATTACHED to a message or an error would be returned if the key was not present.
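To make the corrected semantics concrete, here is a minimal TypeScript sketch of the three proposed methods as an in-memory toy model. It is not an implementation and not part of the EME API as later standardized: the class name `ProposedSession`, the `cache` map standing in for the CDM's stored keys, the `KeyMessage` shape, and the reading of "not present" in `removeKey()` as "not loaded in this session" are all assumptions made for illustration.

```typescript
// Hypothetical model of the per-key methods proposed in this bug.
// None of these names exist in EME; a Map stands in for the CDM key store.

type KeyId = string;

interface KeyMessage {
  type: "released" | "query";
  keyId: KeyId;
  blob?: Uint8Array; // opaque, key-system-specific payload
}

class ProposedSession {
  private loaded = new Map<KeyId, Uint8Array>();
  readonly messages: KeyMessage[] = [];

  // `cache` stands in for keys the CDM has already stored (assumption).
  constructor(private cache: Map<KeyId, Uint8Array>) {}

  // LOAD a cached key into this session, or fail if it is not present.
  loadKey(keyId: KeyId): void {
    const key = this.cache.get(keyId);
    if (key === undefined) throw new Error(`key ${keyId} not present`);
    this.loaded.set(keyId, key);
  }

  // Remove a loaded key and queue the release notification.
  removeKey(keyId: KeyId): void {
    if (!this.loaded.delete(keyId)) throw new Error(`key ${keyId} not present`);
    this.messages.push({ type: "released", keyId });
  }

  // Return an opaque copy of the key ATTACHED to a message.
  queryKey(keyId: KeyId): void {
    const key = this.loaded.get(keyId) ?? this.cache.get(keyId);
    if (key === undefined) throw new Error(`key ${keyId} not present`);
    this.messages.push({ type: "query", keyId, blob: key.slice() });
  }

  hasKey(keyId: KeyId): boolean {
    return this.loaded.has(keyId);
  }
}
```

In this model the application addresses keys only by key ID, which is the point of the proposal; whatever else the CDM holds never surfaces.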
Comment 2 Joe Steele 2014-03-31 21:01:05 UTC
This would also eliminate the need for loadSession I believe.
Comment 3 Joe Steele 2014-04-04 00:00:42 UTC
Some background information on this thread - http://lists.w3.org/Archives/Public/public-html-media/2014Apr/0019.html
Comment 4 Joe Steele 2014-04-04 16:24:42 UTC
I believe the text for Section 1.1.7 (https://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#license) also needs to be updated to reflect the fact that non-content decryption keys may be contained in the license and may NOT have associated key IDs.
Comment 5 David Dorwin 2014-04-07 21:58:57 UTC
(In reply to Joe Steele from comment #0)
> We have been having a lot of discussion about how key lifecycle should be
> managed. I will throw another proposal out there which can meet all the
> needs I have heard expressed so far. 

Can you be more specific about the requirements this addresses? I'm assuming these are related to the key chaining and domain key scenarios.


EME is currently designed around key sessions, which represent "the lifetime of the license(s)/key(s) [they] contains and associates all messages related to them." This proposal seems to be a departure from that. An API could be designed around keys, licenses, or something else, but that is not currently the case with EME.

Some potential issues:
* Are loaded/queried keys really part of the session?
* What if they are associated with a persisted session ID?
* What if that key is contained in a license that contains other keys? Are they loaded too? What if only one of the keys is released?

Does this proposal really solve the underlying requirements or use cases? Are there ways to achieve the same goals without effectively adding a second set of APIs? If the application needs to know about the different keys anyway, why not put them in different sessions and figure out a way for them to interact?
Comment 6 Joe Steele 2014-04-07 22:58:11 UTC
(In reply to David Dorwin from comment #5)
> (In reply to Joe Steele from comment #0)
> > We have been having a lot of discussion about how key lifecycle should be
> > managed. I will throw another proposal out there which can meet all the
> > needs I have heard expressed so far. 
> 
> Can you be more specific about the requirements this addresses? I'm assuming
> these are related to the key chaining and domain key scenarios.

The current loadSession() and remove() APIs do not discriminate between content keys and non-content keys. This proposal should allow applications which need to load keys or remove keys to do so without impacting other keys that may be present as a side effect of requesting the content keys. 

> 
> EME is currently designed around key sessions, which represent "the lifetime
> of the license(s)/key(s) [they] contains and associates all messages related
> to them." This proposal seems to be a departure from that. An API could be
> designed around keys, licenses, or something else, but that is not currently
> the case with EME.

Yes. This proposal is attempting to rectify an issue arising from the current definition of key session, namely that content keys and non-content keys are lumped together. 

> 
> Some potential issues:
> * Are loaded/queried keys really part of the session?

Loaded keys would become part of the session they are loaded in. Queried keys may be part of the session, or may be available for loading into the session. 

> * What if they are associated with a persisted session ID?

I don't think the idea of a persisted session id is useful, so this is independent of that concept. 

> * What if that key is contained in a license that contains other keys? 

I would expect that a license provider which needed to release keys would not bundle keys corresponding to unrelated key IDs in the same license. 

> Are they loaded too?

If the license contains keys corresponding to multiple key IDs I would expect only the key corresponding to the key ID being loaded to become available for decryption. 

> What if only one of the keys is released?

I would expect releasing a key ID to be equivalent to releasing a license, which would free any keys in that license. 
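The load and release behavior described in the last two answers can be modeled in a few lines. This is a hypothetical TypeScript sketch: `License`, `LicenseStore` and `releaseKey()` are illustrative names, and a real CDM would keep this state internally. It also makes visible the side effect David raises later in the thread, that releasing one key ID frees every key sharing its license.

```typescript
// Hypothetical model: a license may carry several keys, loading by key ID
// exposes only that key, and releasing a key ID releases its whole license.

type KeyId = string;

interface License {
  id: string;
  keys: Map<KeyId, Uint8Array>;
}

class LicenseStore {
  private licenses: License[] = [];
  readonly usable = new Set<KeyId>(); // keys currently available for decryption

  add(license: License): void {
    this.licenses.push(license);
  }

  private find(keyId: KeyId): License | undefined {
    return this.licenses.find((l) => l.keys.has(keyId));
  }

  loadKey(keyId: KeyId): void {
    if (!this.find(keyId)) throw new Error(`key ${keyId} not present`);
    this.usable.add(keyId); // only the requested key becomes usable
  }

  // Returns every key ID freed, including siblings in the same license.
  releaseKey(keyId: KeyId): KeyId[] {
    const license = this.find(keyId);
    if (!license) throw new Error(`key ${keyId} not present`);
    this.licenses = this.licenses.filter((l) => l !== license);
    const freed = Array.from(license.keys.keys());
    freed.forEach((k) => this.usable.delete(k));
    return freed;
  }
}
```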

> 
> Does this proposal really solve the underlying requirements or use cases?

I believe so. This allows for applications to release individual keys based on key ID and get appropriate notifications. This allows for non-content keys acquired during a session to be ignored by the application. 

> Are there ways to achieve the same goals without effectively adding a second
> set of APIs? 

This proposal replaces loadSession() with loadKey() and release() with releaseKey(). 

The only additional API would be queryKey(), which would allow applications to manage key/license storage rather than delegating it to the CDM. It would be useful for the Ultraviolet use case, but could be handled separately if we decide not to support that use case.

I am open to other suggestions. 

> If the application needs to know about the different keys anyway, why not 
> put them in different sessions and figure out a way for them to interact?

The problem is that the session can also contain keys acquired in order to unwrap the content keys. Since the application does not know about these keys (and should not) the application has no way of knowing to acquire them in a separate session. I will send out some sample key flows as I see them which should explain the issue better.
Comment 7 Mark Watson 2014-04-28 23:05:33 UTC
I confess to not having followed the threads associated with this issue, but I'm going to give my opinion anyway ;-)

I believe that as far as the EME API is concerned, a MediaKeySession contains a content key or keys, each associated with a key id.

Concepts such as licenses and domains are not modeled in our API. I think the question here is: do they need to be? If they are not, this does not mean such concepts do not exist in any given CDM, it's just that they are not visible on the API and the CDM is left to do whatever it chooses with them.

Specifically, I would expect that if a license is a vehicle for communicating keys to the CDM, then any given license remains in the CDM until all the keys that it contains are released, unless we explicitly expect the keys to be persisted, in which case the license may be persisted. Likewise, if a DRM has a concept of 'domain' and licenses are associated with domains then the state associated with a domain needs to live for at least as long as the licenses associated with it, but again could be persisted for longer than that if the CDM chooses.

This is not to say a CDM MUST operate in the way described above: all we say in EME is that there are keys and key ids and they live in sessions. Everything else is up to the CDM.
Comment 8 Joe Steele 2014-04-29 17:03:50 UTC
(In reply to Mark Watson from comment #7)
> I believe that as far as the EME API is concerned, a MediaKeySession
> contains a content key or keys, each associated with a key id.
> 
> Concepts such as licenses and domains are not modeled in our API. I think
> the question here is do they need to be ? If they are not, this does not
> mean such concepts do not exist in any given CDM, it's just that they are not
> visible on the API and the CDM is left to do whatever it chooses with them.

The question of whether to model licenses and domains in our API is mostly irrelevant to this issue. The issue I am raising can happen in the most basic use case - a single encrypted stream being played back across multiple browsers with license request every time. 

> 
> Specifically, I would expect that if a license is a vehicle for
> communicating keys to the CDM, then any given license remains in the CDM
> until all the keys that it contains are released, unless we explicitly
> expect the keys to be persisted, in which case the license may be persisted.
> Likewise, if a DRM has a concept of 'domain' and licenses are associated
> with domains then the state associated with a domain needs to live for at
> least as long as the licenses associated with it, but again could be persisted
> for longer than that if the CDM chooses.
> 
> This is not to say a CDM MUST operate in the way described above: all we say
> in EME is that there are keys and key ids and they live in sessions.
> Everything else is up to the CDM.

By assuming that we can manage content keys effectively just by managing the session (which is what loadSession() and release() appear to do), we are assuming that the session contains only content keys. If a CDM requires non-content keys delivered within the session to function in even the most basic capacity, then a CDM is forced to break compliance to function. 

My basic premise here is that there are REQUIRED keys other than content keys that get delivered to the CDM via the update() mechanism. 

I see four options:

1) We can allow for managing keys directly rather than sessions. That is my proposal above, which I do not believe introduces a lot of additional complexity. This would also allow support for domains and key chaining in CDMs that use them. 

2) We provide a separate set of APIs for downloading non-content keys. This data would not be part of the media playback session. This would not support key chaining or domains, since those would usually be delivered during a media playback session, but it would support bootstrapping keys delivered at first playback. I think this is a more complex mechanism. 

3) We allow the CDM to download keys either via a direct network connection or via some browser-specific mechanism. In either case the key acquisition is not under control of the web application. This seems to be against the spirit of EME. It would also introduce a non-deterministic element into the playback from the application's point of view.

4) We can force some CDMs to behave in non-standard ways. This is the fallback position CDMs will use if nothing else can be agreed on. I am trying to avoid this as this will make it even harder to build cross-browser implementations.
Comment 9 David Dorwin 2014-04-29 21:37:05 UTC
(In reply to Joe Steele from comment #8)
> The question of whether to model licenses and domains in our API is mostly
> irrelevant to this issue. The issue I am raising can happen in the most
> basic use case - a single encrypted stream being played back across multiple
> browsers with license request every time. 

Can you give an example of how this can happen in the basic use case? What other type of key is there?

When you say "across multiple browsers with license request every time", do you mean different browsers, different browsing sessions, different browsing contexts, or something else?
Comment 10 Mark Watson 2014-04-29 22:11:32 UTC
Joe: there is one more option, that sessions contain content keys but they are not constrained only to contain content keys. It is just that content keys are the only kind that are visible in a keysystem-independent way on our API.

So there can be other state in the session besides content keys. There can also be state that CDM maintains which is not part of any session.

A message back from a license server can contain whatever the CDM needs to derive the content keys. This may include chains of keys or information that populates state outside the session (though, preferably, scoped within the origin).

Persisting the session would persist any other information the CDM has in that session. Releasing the session would release it. But releasing the session need not release all information which came from the license server, if some of that relates to some keysystem-specific scope which is outside any session, for example a domain.
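The two scopes Mark describes can be sketched as a toy CDM in TypeScript. Everything here is hypothetical: the `Cdm` class, the `domainToken` parameter, and the choice to key out-of-session state by origin (following the "preferably, scoped within the origin" remark) are illustrative assumptions, not EME behavior. The sketch shows only that releasing a session discards session-scoped state while origin-scoped state such as a domain survives.

```typescript
// Hypothetical toy CDM with session-scoped and origin-scoped state.

type Origin = string;

interface SessionState {
  contentKeys: Map<string, Uint8Array>; // the part visible via key IDs
  other: Map<string, unknown>; // CDM-specific session data, opaque to the app
}

class Cdm {
  private sessions = new Map<string, SessionState>();
  private originState = new Map<Origin, Map<string, unknown>>();
  private nextId = 0;

  createSession(): string {
    const id = `s${this.nextId++}`;
    this.sessions.set(id, { contentKeys: new Map(), other: new Map() });
    return id;
  }

  // One license-server response may populate both scopes.
  update(origin: Origin, sessionId: string, keys: Map<string, Uint8Array>, domainToken?: string): void {
    const s = this.sessions.get(sessionId);
    if (!s) throw new Error("unknown session");
    keys.forEach((v, k) => s.contentKeys.set(k, v));
    if (domainToken !== undefined) {
      if (!this.originState.has(origin)) this.originState.set(origin, new Map());
      this.originState.get(origin)!.set("domain", domainToken);
    }
  }

  // Releasing a session drops only the session-scoped state.
  release(sessionId: string): void {
    this.sessions.delete(sessionId);
  }

  hasSession(id: string): boolean {
    return this.sessions.has(id);
  }

  domainFor(origin: Origin): unknown {
    return this.originState.get(origin)?.get("domain");
  }
}
```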
Comment 11 David Dorwin 2014-04-29 22:18:41 UTC
(In reply to Mark Watson from comment #10)
That really eliminates any hopes of interoperability or expected behavior for the application. Also, if some of the state is not visible to the application, how can the application (or anyone) manage it?
Comment 12 Mark Watson 2014-04-29 22:25:48 UTC
(In reply to David Dorwin from comment #11)
> (In reply to Mark Watson from comment #10)
> That really eliminates any hopes of interoperability or expected behavior
> for the application. Also, if some of the state is not visible to the
> application, how can the application (or anyone) manage it?

The application doesn't need to manage it. The keysystem is managing it between client and server component.

I should say that my understanding is that any additional state is really just an optimization. We should assume that a CDM in a 'cold state' can always perform the necessary message exchanges with its server peer to get the content keys needed to decrypt the content. What I understood Joe was asking for was to avoid repeating exchanges every time when they can be done once and the resulting state persisted. The application doesn't even need to know this is happening.
Comment 13 Joe Steele 2014-04-29 22:59:43 UTC
(In reply to David Dorwin from comment #9)
> (In reply to Joe Steele from comment #8)
> > The question of whether to model licenses and domains in our API is mostly
> > irrelevant to this issue. The issue I am raising can happen in the most
> > basic use case - a single encrypted stream being played back across multiple
> > browsers with license request every time. 
> 
> Can you give an example of how this can happen in the basic use case? What
> other type of key is there?

I gave a pretty comprehensive list of key types here:
http://lists.w3.org/Archives/Public/public-html-media/2014Apr/0004.html

In this case the key types I am concerned about are app/device keys. 


> 
> When you say "across multiple browsers with license request every time", do
> you mean different browsers, different browsing sessions, different browsing
> contexts, or something else?

I am talking about the basic use case described on the wiki.
Comment 14 Joe Steele 2014-04-30 18:01:52 UTC
(In reply to Mark Watson from comment #12)
> (In reply to David Dorwin from comment #11)
> > (In reply to Mark Watson from comment #10)
> I should say that my understanding is that any additional state is really
> just an optimization. We should assume that a CDM in a 'cold state' can
> always perform the necessary message exchanges with its server peer to get
> the content keys needed to decrypt the content. What I understood Joe was
> asking for was to avoid repeating exchanges every time when they can be done
> once and the resulting state persisted. The application doesn't even need to
> know this is happening.

To say this is just an "optimization" is understating it quite a bit. In the same sense, taking a car on a 100-mile trip is an optimization over walking. Both methods will get you there, but there is a huge difference in experience.

In the case where a web application is using a CDM for the first time, there may be bootstrapping keys unique to that origin that need to be downloaded. Why unique? Because in our earlier discussions on key sharing it was determined that keys should not be shared across origins. 

If the CDM is using software-based key hiding mechanisms, the bootstrapping process is slow. On the order of seconds. This is not a problem when it happens once, and can be managed to happen while the application is occupied with other things (like displaying video thumbnails for selection) but a HUGE problem if it happens on every download. 

This is the key exchange I do not want to repeat. This key exchange should be subject to the same-origin constraints all other CDM communication is. To me this implies it must go through the keyrequest/update() mechanism. Or we must introduce an alternate mechanism which looks essentially the same, but is specific to this purpose.
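The requirement that the slow, origin-specific bootstrap run once rather than on every license request amounts to memoizing it per origin. The TypeScript sketch below is hypothetical: `BootstrapCache` and the injected `runBootstrap` function (standing in for the message/update() exchange with the server) are illustrative names, not EME APIs, and the cache is in-memory only, so it would not survive a restart the way persisted CDM state would.

```typescript
// Hypothetical sketch: run the origin-specific bootstrap once per origin
// and share the in-flight exchange among concurrent callers.

type Origin = string;
type BootstrapKeys = { origin: Origin; material: string };

class BootstrapCache {
  private pending = new Map<Origin, Promise<BootstrapKeys>>();
  runs = 0; // number of real exchanges started

  constructor(private runBootstrap: (origin: Origin) => Promise<BootstrapKeys>) {}

  get(origin: Origin): Promise<BootstrapKeys> {
    let p = this.pending.get(origin);
    if (!p) {
      this.runs++;
      p = this.runBootstrap(origin);
      this.pending.set(origin, p);
      // Forget failed attempts so a later call can retry.
      p.catch(() => this.pending.delete(origin));
    }
    return p;
  }
}
```

Keying the cache by origin keeps bootstrap keys unshared across origins, matching the earlier key-sharing discussion referenced above.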
Comment 15 Mark Watson 2014-04-30 18:06:25 UTC
(In reply to Joe Steele from comment #14)
> (In reply to Mark Watson from comment #12)
> > (In reply to David Dorwin from comment #11)
> > > (In reply to Mark Watson from comment #10)
> > I should say that my understanding is that any additional state is really
> > just an optimization. We should assume that a CDM in a 'cold state' can
> > always perform the necessary message exchanges with its server peer to get
> > the content keys needed to decrypt the content. What I understood Joe was
> > asking for was to avoid repeating exchanges every time when they can be done
> > once and the resulting state persisted. The application doesn't even need to
> > know this is happening.
> 
> To say this is just an "optimization" is understating it quite a bit. In the
> same sense, taking a car on a 100-mile trip is an optimization over walking.
> Both methods will get you there, but there is a huge difference in
> experience.
> 
> In the case where a web application is using a CDM for the first time, there
> may be bootstrapping keys unique to that origin that need to be downloaded.
> Why unique? Because in our earlier discussions on key sharing it was
> determined that keys should not be shared across origins. 
> 
> If the CDM is using software-based key hiding mechanisms, the bootstrapping
> process is slow. On the order of seconds. This is not a problem when it
> happens once, and can be managed to happen while the application is occupied
> with other things (like displaying video thumbnails for selection) but a
> HUGE problem if it happens on every download. 
> 
> This is the key exchange I do not want to repeat. This key exchange should
> be subject to the same-origin constraints all other CDM communication is. To
> me this implies it must go through the keyrequest/update() mechanism. Or we
> must introduce an alternate mechanism which looks essentially the same, but
> is specific to this purpose.

I certainly agree with not repeating such bootstrap functions unnecessarily. And I agree it could use the keyrequest / update mechanism to complete it. What is it in the existing specification that means you have to repeat this part ?
Comment 16 Joe Steele 2014-04-30 21:00:46 UTC
(In reply to Mark Watson from comment #15)
> (In reply to Joe Steele from comment #14)
> > (In reply to Mark Watson from comment #12)
> > > (In reply to David Dorwin from comment #11)
> > > > (In reply to Mark Watson from comment #10)
<snip>
> > This is the key exchange I do not want to repeat. This key exchange should
> > be subject to the same-origin constraints all other CDM communication is. To
> > me this implies it must go through the keyrequest/update() mechanism. Or we
> > must introduce an alternate mechanism which looks essentially the same, but
> > is specific to this purpose.
> 
> I certainly agree with not repeating such bootstrap functions unnecessarily.
> And I agree it could use the keyrequest / update mechanism to complete it.
> What is it in the existing specification that means you have to repeat this
> part ?

One of my concerns is that there seems to be some disagreement about what is considered in the session. The spec (https://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#dom-release) seems clear that the CDM has flexibility in deciding what is considered part of the session and what is not. Your comment 10 makes sense to me, but comment 11 indicates that David disagrees. If this is not resolved, it could lead to problems. 

My second concern is that even if we get agreement that the CDM can make the call on what is considered in the session and thus affected by loadSession and release(), the session is the wrong level of granularity. 

The session contents are CDM-specific by design. The KID is a generic well-known entity specified by the media format. It seems clear to me that loading and releasing keys should rely on the generic well-known entity rather than the more nebulous CDM-specific one. This would avoid the whole question of what is in the session.
Comment 17 David Dorwin 2014-04-30 21:28:33 UTC
(In reply to Joe Steele from comment #16)
> One of my concerns is that there seems to be some disagreement about what is
> considered in the session. The spec
> (https://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#dom-release)
> seems clear that the CDM has flexibility in deciding
> what is considered part of the session and what is not.

What specific text are you referring to? I don't read this algorithm that way.

<snip>
> My second concern is that even if we get agreement that the CDM can make the
> call on what is considered in the session and thus affected by loadSession
> and release(), the session is the wrong level of granularity. 
> 
> The session contents are CDM-specific by design. The KID is a generic
> well-known entity specified by the media format. It seems clear to me that
> loading and releasing keys should rely on the generic well-known entity
> rather than the more nebulous CDM-specific one. This would avoid the whole
> question of what is in the session.

Does it help to think of a session as a "license"? That seems like a well-known concept. Or a "set of keys"? Since sessions are created from one set of initData, each can really only have one "license" (at a time), though it can be updated/renewed/replaced over time.

The problems with providing access by key ID include:
* It is not necessarily unique like the session ID. (How does the CDM know which session to load when provided a key ID?)
* The application may not know all key IDs contained in the session/license.
* If a session is loaded by a single key ID then destroyed, this may (unexpectedly) destroy the license for other key IDs.
* Modifying individual keys may not be supported by CDMs.
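The first objection can be illustrated with a small index from key IDs to stored sessions. This TypeScript sketch is hypothetical (the `PersistedSessions` class and its methods are not part of EME): a session ID maps to exactly one session, but the same key ID may be held by several, so a lookup by key ID can have no single answer.

```typescript
// Hypothetical index showing why a key ID, unlike a session ID,
// may not identify a unique stored session.

class PersistedSessions {
  private byKeyId = new Map<string, Set<string>>(); // key ID -> session IDs

  store(sessionId: string, keyIds: string[]): void {
    for (const kid of keyIds) {
      if (!this.byKeyId.has(kid)) this.byKeyId.set(kid, new Set());
      this.byKeyId.get(kid)!.add(sessionId);
    }
  }

  // Returns the unique session holding a key ID, or throws if the
  // lookup is missing or ambiguous.
  sessionFor(keyId: string): string {
    const set = this.byKeyId.get(keyId);
    const ids: string[] = set ? Array.from(set) : [];
    if (ids.length === 0) throw new Error(`no session holds ${keyId}`);
    if (ids.length > 1) throw new Error(`${keyId} is held by ${ids.length} sessions`);
    return ids[0];
  }
}
```

For example, after `store("s1", ["kidA", "kidB"])` and `store("s2", ["kidB"])`, `sessionFor("kidA")` returns `"s1"` while `sessionFor("kidB")` throws.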


I don't think we've thoroughly explored the use cases and how these might be resolved in ways that don't involve multiple types of keys/licenses in the same session. Having multiple licenses in a session and addressing each one individually seems to defeat the purpose of having objects. I think we can provide a better model for applications.

As an example, I don't think it makes sense for the result of the one-time bootstrapping in comment #14 to be considered part of the session. Presumably, the application never needs to manage such a key. However, something like a domain license, which the application can join and leave, should probably be exposed as its own session.
Comment 18 Joe Steele 2014-04-30 21:52:32 UTC
(In reply to David Dorwin from comment #17)
> (In reply to Joe Steele from comment #16)
> > One of my concerns is that there seems to be some disagreement about what is
> > considered in the session. The spec
> > (https://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#dom-release)
> > seems clear that the CDM has flexibility in deciding
> > what is considered part of the session and what is not.
> 
> What specific text are you referring to? I don't read this algorithm that
> way.

From algorithm step 1.2.1:
Note: the release() method is intended to act as a hint to the user agent that the application believes the session is no longer needed. ** However, the CDM determines whether resources can now be released. ** 

> 
> <snip>
> > My second concern is that even if we get agreement that the CDM can make the
> > call on what is considered in the session and thus affected by loadSession
> > and release(), the session is the wrong level of granularity. 
> > 
> > The session contents are CDM-specific by design. The KID is a generic
> > well-known entity specified by the media format. It seems clear to me that
> > loading and releasing keys should rely on the generic well-known entity
> > rather than the more nebulous CDM-specific one. This would avoid the whole
> > question of what is in the session.
> 
> Does it help to think of a session as a "license"? That seems like a
> well-known concept. Or a "set of keys"? Since sessions are created from one
> set of initData, each can really only have one "license" (at a time), though
> it can be updated/renewed/replaced over time.
> 
> The problems with providing access by key ID include:
> * It is not necessarily unique like the session ID. (How does the CDM know which
> session to load when provided a key ID?)

A CDM would not load a session when provided a key ID. The CDM would load a cached key based on the key ID into an empty session created by the application. 

> * The application may not know all key IDs contained in the session/license.

This is why I like your proposal in bug 25409. This would provide exactly the information the application needs. 

> * If a session is loaded by a single key ID then destroyed, this may
> (unexpectedly) destroy the license for other key IDs

I would not expect the CDM to load a "session" which could contain multiple keys, but instead load a single key into an empty session.

> * Modifying individual keys may not be supported by CDMs.

I am not sure what this means. If you mean that a CDM may not support splitting out keys into separately loadable chunks, that could be a problem. 

> 
> 
> I don't think we've thoroughly explored the use cases and how these might be
> resolved in ways that don't involve multiple types of keys/licenses in the
> same session. Having multiple licenses in a session and addressing each
> one individually seems to defeat the purpose of having objects. I think we
> can provide a better model for applications.

Even in the absence of multiple keys or licenses in a session, I don't think that the session is the right way to model this problem. I think if we say session==license we will paint ourselves into a corner. 

> 
> As an example, I don't think it makes sense for the result of the one-time
> bootstrapping in comment #14 to be considered part of the session.
> Presumably, the application never needs to manage such a key. However,
> something like a domain license, which the application can join and leave,
> should probably be exposed as its own session.

The application may want to manage it in the scenario I described above. If the application wants to hide the cost of acquiring such keys while the user is doing media selection for example, it would start up a session using some generic initData and close it once those keys are acquired. However, that session does not contain any content keys and would therefore not be in compliance with the spec's definition of a "session". That session should also never be released, or at least the application should not expect that releasing it will have any impact.
Comment 19 Mark Watson 2014-04-30 22:11:05 UTC
> > 
> > As an example, I don't think it makes sense for the result of the one-time
> > bootstrapping in comment #14 to be considered part of the session.
> > Presumably, the application never needs to manage such a key. However,
> > something like a domain license, which the application can join and leave,
> > should probably be exposed as its own session.
> 
> The application may want to manage it in the scenario I described above. If
> the application wants to hide the cost of acquiring such keys while the user
> is doing media selection for example, it would start up a session using some
> generic initData and close it once those keys are acquired. However that
> session does not contain any content keys and would therefore not be in
> compliance with the spec's definition of a "session". That session should
> also never be released or at least the application should not expect that
> releasing it will have any impact.

Ah. I had imagined that such a bootstrap step would take place the first time you tried to play back content. The session would be created using the initData and the bootstrap would take place together with or before the license exchange.

My assumption here is that such bootstrapping is needed only once when the CDM is first used and perhaps once each time any persistent store is cleared.

If you have a bootstrap step which is needed for every browsing session, this is a different matter.

Still, it is not clear to me that because a keymessage response does not contain any content keys this is not compliant to the specification. There are any number of error cases where the keymessage response does not contain any content keys. In your example, the keymessage contains all the content keys identified in the initData (i.e. none).
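Mark's reading can be stated as a simple predicate. The TypeScript sketch below assumes, purely for illustration, that a response "contains all the content keys identified in the initData" when it supplies every key ID the initData names; the function name `coversInitData` is hypothetical and nothing in the thread or the spec defines such a check.

```typescript
// Hypothetical predicate: does a license response supply every key ID
// named in the initData? Extra keys in the response are allowed.
function coversInitData(initDataKeyIds: string[], responseKeyIds: string[]): boolean {
  const got = new Set(responseKeyIds);
  return initDataKeyIds.every((kid) => got.has(kid));
}
```

With generic bootstrap initData that names no key IDs, `coversInitData([], [])` is `true` (the condition holds vacuously), while `coversInitData(["kid1"], [])` is `false`.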

Also, it's not clear that releasing this 'bootstrap' session would release the bootstrapped state. In fact I would assume it would not. Same as if the bootstrap was done at the beginning of a 'real' session, that state that it establishes is still available to future sessions.
Comment 20 David Dorwin 2014-04-30 22:20:35 UTC
(In reply to Mark Watson from comment #19)
> > > 
> > > As an example, I don't think it makes sense for the result of the one-time
> > > bootstrapping in comment #14 to be considered part of the session.
> > > Presumably, the application never needs to manage such a key. However,
> > > something like a domain license, which the application can join and leave,
> > > should probably be exposed as its own session.
> > 
> > The application may want to manage it in the scenario I described above. If
> > the application wants to hide the cost of acquiring such keys while the user
> > is doing media selection for example, it would start up a session using some
> > generic initData and close it once those keys are acquired. However that
> > session does not contain any content keys and would therefore be not in
> > compliance with the spec's definition of a "session". That session should
> > also never be released or at least the application should not expect that
> > releasing it will have any impact.
> 
> Ah. I had imagined that such a bootstrap step would take place the first
> time you tried to play back content. The session would be created using the
> initData and the bootstrap would take place together with or before the
> license exchange.
> 
> My assumption here is that such bootstrapping is needed only once when the
> CDM is first used and perhaps once each time any persistent store is cleared.
> 
> If you have a bootstrap step which is needed for every browsing session,
> this is a different matter.

I agree with what Mark said above.

> Still, it is not clear to me that because a keymessage response does not
> contain any content keys this is not compliant to the specification. There
> are any number of error cases where the keymessage response does not contain
> any content keys. In your example, the keymessage contains all the content
> keys identified in the initData (i.e. none).
> 
> Also, it's not clear that releasing this 'bootstrap' session would release
> the bootstrapped state. In fact I would assume it would not. Same as if the
> bootstrap was done at the beginning of a 'real' session, that state that it
> establishes is still available to future sessions.

Agreed - the bootstrapped key would *not* be considered part of the session and you would have a compliant real session containing the content key(s).

This also happens in the background without the application needing to know the details of a specific implementation of a specific key system.


If an application wanted to implement a key-system specific optimization for the very first playback on a subset of platforms, it seems what you really need is one of the following:
1) MediaKeys.bootstrap() to do the one-time non-session-related work.

2) To know whether this process has already occurred.
If not, you could initiate a dummy license request to cause the bootstrapping to occur. The resulting session would be a compliant session but with a useless license based on the initData provided.
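On the application side, option 2 might look roughly like the sketch below. It assumes a hypothetical `hasBootstrapped()` query and a hypothetical `requestLicense()` that performs a license exchange for arbitrary initData. Neither exists in the EME draft. The sketch only shows that the dummy request produces an ordinary session whose license is useless.

```typescript
// Hypothetical key-system hooks; neither exists in the EME draft.
interface BootstrapAwareKeys {
  hasBootstrapped(): boolean;
  requestLicense(initData: Uint8Array): string; // returns a session ID
}

// Issues a dummy license request only when bootstrapping has not happened.
// Returns the ID of the throwaway session, or null if nothing was needed.
function warmUpIfNeeded(keys: BootstrapAwareKeys, dummyInitData: Uint8Array): string | null {
  if (keys.hasBootstrapped()) return null;
  // The resulting session is compliant, but its license is useless; the
  // application can release it once the exchange completes.
  return keys.requestLicense(dummyInitData);
}

// Stub implementation for illustration only.
class StubKeys implements BootstrapAwareKeys {
  private bootstrapped = false;
  private nextId = 0;
  hasBootstrapped(): boolean {
    return this.bootstrapped;
  }
  requestLicense(_initData: Uint8Array): string {
    this.bootstrapped = true;
    return `session-${this.nextId++}`;
  }
}

const keys = new StubKeys();
console.log(warmUpIfNeeded(keys, new Uint8Array([0]))); // session-0
console.log(warmUpIfNeeded(keys, new Uint8Array([0]))); // null
```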
Comment 21 David Dorwin 2014-04-30 22:41:03 UTC
(In reply to Joe Steele from comment #18)
> (In reply to David Dorwin from comment #17)
> > (In reply to Joe Steele from comment #16)
> > > One of my concerns is that there seems to be some disagreement about what is
> > > considered in the session. The spec
> > > (https://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-
> > > media.html#dom-release) seems clear that the CDM has flexibility in deciding
> > > what is considered part of the session and what is not.
> > 
> > What specific text are you referring to? I don't read this algorithm that
> > way.
> 
> From algorithm step 1.2.1:
> Note: the release() method is intended to act as a hint to the user agent
> that the application believes the session is no longer needed. ** However,
> the CDM determines whether resources can now be released. ** 

Oh, that simply means that release() is not close(). This idea of release() being a "hint" started in https://www.w3.org/Bugs/Public/show_bug.cgi?id=17750#c5, but I'm not sure of the use case. Maybe the point was that the CDM may need to send messages before closing. The release flow needs to be improved. I'll add this to the list.


> A CDM would not load a session when provided a key ID. The CDM would load a
> cached key based on the key ID into an empty session created by the
> application. 

> I would not expect the CDM to load a "session" which could contain multiple
> keys, but instead load a single key into an empty session.

> I am not sure what this means. If you mean that a CDM may not support
> splitting out keys into separately loadable chunks, that could be a problem. 

All of these statements seem to assume that CDMs support loading, managing, and destroying the individual keys within a license separately. It seems unlikely that such support is common.
Comment 22 Joe Steele 2014-04-30 23:55:06 UTC
(In reply to David Dorwin from comment #20)
> (In reply to Mark Watson from comment #19)
> > > > 
> > > > As an example, I don't think it makes sense for the result of the one-time
> > > > bootstrapping in comment #14 to be considered part of the session.
> > > > Presumably, the application never needs to manage such a key. However,
> > > > something like a domain license, which the application can join and leave,
> > > > should probably be exposed as its own session.
> > > 
> > > The application may want to manage it in the scenario I described above. If
> > > the application wants to hide the cost of acquiring such keys while the user
> > > is doing media selection for example, it would start up a session using some
> > > generic initData and close it once those keys are acquired. However that
> > > session does not contain any content keys and would therefore be not in
> > > compliance with the spec's definition of a "session". That session should
> > > also never be released or at least the application should not expect that
> > > releasing it will have any impact.
> > 
> > Ah. I had imagined that such a bootstrap step would take place the first
> > time you tried to play back content. The session would be created using the
> > initData and the bootstrap would take place together with or before the
> > license exchange.

It certainly could take place that way. The issue is that the user experience might not be optimal. We direct our developers to pre-load key acquisition as much as possible to improve the user experience.

In the simplest scenario, the user selects a video for first time playback and hits "play". Since the player has never been used, an initial bootstrap happens (taking multiple seconds). Now the license request happens (taking multiple seconds). The user could end up wondering when their video is going to start playing and hitting refresh on the browser. Painful. :-)
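The two costs described above add up on the critical path after "play". The back-of-the-envelope model below makes that concrete. The durations are made up purely for illustration, because the comment only says "multiple seconds" for each step. The function name is hypothetical.

```typescript
// Illustrative numbers only; real costs depend on the key system and network.
const BOOTSTRAP_MS = 3000; // hypothetical one-time bootstrap
const LICENSE_MS = 2000; // hypothetical license exchange

// Key-acquisition delay the user sees after pressing "play". If the bootstrap
// was pre-loaded (e.g. during media selection), it is off the critical path.
function delayAfterPlay(needsBootstrap: boolean, preloaded: boolean): number {
  if (!needsBootstrap || preloaded) return LICENSE_MS;
  return BOOTSTRAP_MS + LICENSE_MS;
}

console.log(delayAfterPlay(true, false)); // 5000
console.log(delayAfterPlay(true, true)); // 2000
```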

> > 
> > My assumption here is that such bootstrapping is needed only once when the
> > CDM is first used and perhaps once each time any persistent store is cleared.

This is correct, but I would add "when the CDM is first used for an origin".

> > 
> > If you have a bootstrap step which is needed for every browsing session,
> > this is a different matter.
> 
> I agree with what Mark said above.

The bootstrapping use case I am describing is a once-per-origin thing. However, chained licenses could require much more of this, for example requesting a new root license every 24 hours.
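For chained licenses, the renewal decision could be as simple as the check below. It assumes a root license that the key system treats as valid for 24 hours, which is the interval from the example above. `RootLicense` and `needsNewRoot` are hypothetical names used only for this sketch.

```typescript
const ROOT_LIFETIME_MS = 24 * 60 * 60 * 1000; // 24 hours, per the example above

interface RootLicense {
  acquiredAtMs: number; // when the root license was obtained
}

// True if a fresh root license must be fetched from the server before
// (or during) playback, in this sketch's simplified lifetime model.
function needsNewRoot(root: RootLicense | undefined, nowMs: number): boolean {
  return root === undefined || nowMs - root.acquiredAtMs >= ROOT_LIFETIME_MS;
}

const t0 = Date.UTC(2014, 3, 30);
console.log(needsNewRoot(undefined, t0)); // true
console.log(needsNewRoot({ acquiredAtMs: t0 }, t0 + 60_000)); // false
console.log(needsNewRoot({ acquiredAtMs: t0 }, t0 + ROOT_LIFETIME_MS)); // true
```

Each time this returns true, the application has another server round trip to mediate. That is why such licenses do not fit a model where the session is assumed to contain only the content keys.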

> 
> > Still, it is not clear to me that because a keymessage response does not
> > contain any content keys this is not compliant to the specification. There
> > are any number of error cases where the keymessage response does not contain
> > any content keys. In your example, the keymessage contains all the content
> > keys identified in the initData (i.e. none).
> > 
> > Also, it's not clear that releasing this 'bootstrap' session would release
> > the bootstrapped state. In fact I would assume it would not. Same as if the
> > bootstrap was done at the beginning of a 'real' session, that state that it
> > establishes is still available to future sessions.
> 
> Agreed - the bootstrapped key would *not* be considered part of the session
> and you would have a compliant real session containing the content key(s).
> 
> This also happens in the background without the application needing to know
> the details of a specific implementation of a specific key system.

Agreed. Ideally the application can force it to happen early, but does not need to know whether it happened or not. If it did not happen, or did not work, then the key request will fail.

> 
> 
> If an application wanted to implement a key-system specific optimization for
> the very first playback on a subset of platforms, it seems what you really
> need is one of the following:
> 1) MediaKeys.bootstrap() to do the one-time non-session-related work.

What would this bootstrap call do? Does it generate a keyrequest and expect an update()? If so, how is it different from the current model? How would the web app act as intermediary for the bootstrap network requests?

> 
> 2) To know whether this process has already occurred.
> If not, you could initiate a dummy license request to cause the
> bootstrapping to occur. The resulting session would be a compliant session
> but with a useless license based on the initData provided.

The application doesn't need to know whether it has occurred, but if it has occurred then the call should be zero-cost. 
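The requirement above has two parts: the application can ask for the bootstrap early without knowing whether it already happened, and a repeat call costs nothing. A per-origin memoized promise satisfies both. In the sketch below, `doBootstrap` stands in for whatever key-system-specific exchange is needed. All names are hypothetical.

```typescript
type BootstrapFn = (origin: string) => Promise<void>;

// Wraps a bootstrap routine so each origin runs it at most once; later calls
// return the cached promise and do no additional work.
function makeEnsureBootstrapped(doBootstrap: BootstrapFn): (origin: string) => Promise<void> {
  const inFlight = new Map<string, Promise<void>>();
  return (origin: string): Promise<void> => {
    let p = inFlight.get(origin);
    if (p === undefined) {
      // Forget failures so that a later call can retry.
      p = doBootstrap(origin).catch((err: unknown) => {
        inFlight.delete(origin);
        throw err;
      });
      inFlight.set(origin, p);
    }
    return p;
  };
}

let runs = 0;
const ensureBootstrapped = makeEnsureBootstrapped(async () => {
  runs++;
});
const first = ensureBootstrapped("https://example.com");
const second = ensureBootstrapped("https://example.com");
console.log(first === second, runs); // true 1
```

If the bootstrap never happened or failed, nothing here blocks playback. As noted above, the subsequent key request would simply fail.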

This discussion seems to be heading towards a proposal to solve the simple bootstrapping problem and not the larger issue. This is getting away from my initial proposal to replace loadSession() and release(). I proposed this because I think the conceptual model that says session==contentkeys is wrong. I think that the application should treat the session as an opaque communication channel without assuming it knows the relationship between the content playing and the contents of the session. If the application needs to manage keys directly, it can use the key ID to refer to them. That adds a minimal amount of complexity while enabling a lot of flexibility later.

I am concerned we will paint ourselves into a corner if we don't change our idea of what the session is. Multiple use cases (embedded keys, chained licenses, multiple license sessions, Ultraviolet playback) will require communication with a server prior to or during playback. If the application can make assumptions about what is in the session, this is going to put up a barrier against implementing these use cases in the future because the existing loadSession/release behavior may not make sense in the context of those features.
Comment 23 Joe Steele 2014-05-01 23:56:32 UTC
I just had a long discussion on this topic with the other engineer at Adobe who will be implementing this. He agrees with me that sessions are not the right level of granularity, but has convinced me that keys are not either, since a license can contain multiple keys used for a stream. I am thinking this bug should be RESOLVED LATER, unless we can agree on a definition of license to use as a reference point.

The problem of loadSession/release remains.

Here is how our implementation will behave for loadSession():
Our CDM can support it - but it is completely irrelevant. If licenses are cacheable, the CDM will cache them and retrieve them based on the initData. So this would either behave the same as createSession or just reject the Promise. Haven't decided yet. 

Here is how our implementation will behave for release():
Our CDM will release only content keys, and ignore any non-content keys. Those are not releasable. 

It sounds like both of these behaviors are acceptable based on your comments above. 
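The two behaviors described above can be modeled as follows. This is a toy model under stated assumptions, not Adobe's implementation. Cached licenses are looked up by initData. `loadSession()` takes the createSession()-like branch (the comment leaves rejecting the Promise open as the alternative). `release()` frees only content keys. `ToyCachingCdm`, `loadOrCreate` and the key kinds are hypothetical names.

```typescript
type KeyKind = "content" | "nonContent"; // e.g. a bootstrap/root key is non-content

interface HeldKey {
  id: string;
  kind: KeyKind;
}

class ToyCachingCdm {
  // Cacheable licenses, looked up by initData (serialized here as a string).
  private readonly licenseCache = new Map<string, HeldKey[]>();
  readonly heldKeys: HeldKey[] = [];

  cacheLicense(initData: string, keys: HeldKey[]): void {
    this.licenseCache.set(initData, keys);
  }

  // loadSession() behaves like createSession(): reuse a cached license for
  // this initData if there is one; otherwise a license request is needed.
  loadOrCreate(initData: string): "fromCache" | "needsLicenseRequest" {
    const cached = this.licenseCache.get(initData);
    if (cached === undefined) return "needsLicenseRequest";
    this.heldKeys.push(...cached);
    return "fromCache";
  }

  // release() drops content keys and silently keeps non-content keys,
  // which are not releasable in this model.
  release(): void {
    const kept = this.heldKeys.filter((k) => k.kind === "nonContent");
    this.heldKeys.length = 0;
    this.heldKeys.push(...kept);
  }
}

const cachingCdm = new ToyCachingCdm();
cachingCdm.cacheLicense("initData-A", [
  { id: "root-key", kind: "nonContent" },
  { id: "kid-1", kind: "content" },
]);
console.log(cachingCdm.loadOrCreate("initData-A")); // fromCache
cachingCdm.release();
console.log(cachingCdm.heldKeys.map((k) => k.id).join(",")); // root-key
```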

I will probably open another bug to remove loadSession() since (a) it is not clear to me now why this is needed to support the basic use case and (b) it muddies the waters for supporting future use cases like offline playback.
Comment 24 Joe Steele 2014-05-08 01:13:17 UTC
There has been some offline discussion around this. I have submitted another bug 25595 to cover what I think are ambiguities in the text. I also realized after some media and license auditing that key IDs are not unique between media streams and therefore are not enough in many cases to characterize the license to manage. *sigh* 
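Key IDs are not enough when two unrelated streams reuse the same key ID. A registry keyed only by key ID conflates their licenses, whereas a composite key keeps them apart. The sketch below shows this. The `streamId` discriminator is hypothetical and used only for illustration. The comment does not propose one.

```typescript
interface LicenseRef {
  streamId: string;
  keyId: string;
  license: string;
}

// Registry keyed by key ID alone: a later license silently replaces an
// earlier one that shares the same key ID.
function indexByKeyId(refs: LicenseRef[]): Map<string, string> {
  const m = new Map<string, string>();
  for (const r of refs) m.set(r.keyId, r.license);
  return m;
}

// Registry keyed by (stream, key ID): both licenses survive.
function indexByStreamAndKeyId(refs: LicenseRef[]): Map<string, string> {
  const m = new Map<string, string>();
  for (const r of refs) m.set(`${r.streamId}\u0000${r.keyId}`, r.license);
  return m;
}

const refs: LicenseRef[] = [
  { streamId: "movie-A", keyId: "kid-01", license: "license-A" },
  { streamId: "movie-B", keyId: "kid-01", license: "license-B" },
];
console.log(indexByKeyId(refs).size); // 1
console.log(indexByStreamAndKeyId(refs).size); // 2
```

A `removeKey(keyID)` call, as proposed at the top of this bug, would face the same ambiguity as the first registry.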

So I will withdraw this bug until I can come up with a better solution.