This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Our current spec uses sessionType to control storage of session data, and load() and remove() to support secure release. These assume the app is in control of the secure release process, and in many respects, the CDM need not be aware. Unfortunately, these features don’t work well with implementations that have the license server and CDM control secure release using terms set in the license itself. This license server model has many positives. It can securely manage license persistence, expirations and secure release features, directly and with little app complexity. We believe supporting it will be important for EME; however, there are issues between it and the current spec: 1. At most, sessionType could only be a hint to the process in the license controlled model, as the license terms would determine whether the license itself is stored, and whether it is subject to secure release requirements. 2. Load() is problematic, since licenses under secure release may still be temporary (in fact this may be desirable) and may be released even though secure release messages are still pending. 3. Status information on secure release is maintained in the CDM and currently would be inaccessible to applications. We could make this information accessible, but the utility of any function we provide for this might vary by CDM. Some of these issues result from overloading the original intents for MediaKeySession and sessionId. These originally managed the short term exchange of messages and have grown to now manage the lifetime of media content playback on a given device. The original short term intention was a much better fit with license server controlled CDM implementations. We believe the license server controlled model must be supported by EME. I’d like to get agreement on this and discuss whether it can be done with a common EME abstraction or not.
We should consider what a superset of the different models (if they are really that different) would look like. On the one hand, the interaction between CDM and license server _must_ be able to control: - whether licenses are persisted across browser sessions - whether key release messages are generated On the other hand, there is a desire for the client side application to also play a role in enabling persistence of any state (both licenses and key release state), for the CDM state in these respects to be visible to the application and for the CDM / application interaction model to be consistent across CDMs. It is also necessary that we can associate key release messages with the original session at the client, since server state needed to process those messages may be stored on the client. I think it's clear that the license server can require key release messages and can prohibit license persistence. I do not see any harm in the client application being able to prohibit license persistence either, though perhaps it should not be able to prohibit persistence of key release information ? As I argued in another post, we do need some concept on the API for a group of keys that are related - specifically those keys that will be persisted or released as a group. In the key release context, this has always been the session, identified by the sessionId, since the very earliest key release proposals. We can certainly decouple the existing session from this grouping for release / persistence purposes, making the existing session completely ephemeral, but some form of identifier needs to be attached both to the original process that delivered a license and the key release process. Finally, a superset would provide visibility to the client application of what the CDM is doing. 
If it is persisting the license, then the application should be told of this and told how it can obtain the license again later (whether this is by calling create with the same initData or load with some provided identifier seems like a detail here). Equally, if key release information is being persisted, the application should be made aware of this and given instructions for how to obtain it later.
Here's a proposal for discussion:

Background:
==========
We believe the license server model is not just valid, but likely the superior approach to managing persistent data. It doesn’t fit well with our current notions of sessionType, load session or remove. sessionId has become overloaded, since sessions themselves should be able to come and go as needed to support messaging. It’s possible to have multiple sessions and sessionIds for a single keyId. In hindsight, we think we may have gotten it wrong when we decided to make sessionId mandatory in https://www.w3.org/Bugs/Public/show_bug.cgi?id=17203.

Proposal:
========
At some point, we do need to address Mark’s comments about Ids, and the ability to group Ids under a single identifier for convenience. For now, let’s start at a higher level.

1. We should delete sessionId and related operations from the spec.
2. Sessions themselves should be limited to coordinating messages, nothing more.
3. The license server model gives apps plenty of control over persistence and secure release. The first will be based on a transaction that the app will often initiate, and the latter will likely be baked into the general operation of some websites. In both cases, the app doesn’t need to enable data storage. The CDM should accommodate the storage requirements of the license and use persisted licenses automatically for playback.
4. Apps need a way to identify content that has not satisfied secure release requirements and complete the process.
5. Apps may need the ability to identify and delete persistent licenses.

IDL:
===
- Add a method to enumerate specific types of Ids from the CDM. This could be keyIds or something else. The app should know the type of Id it is retrieving.

enum IdType (type1, type2, …);

interface MediaKeys {
    Promise<sequence<DOMString>> getStoredIds(IdType);
}

- Add a method to prepare the session to operate on a returned Id. This would not imply that the session has been fully prepared for playback, but that it is positioned to handle messages related to a specific Id. The primary action that would be taken on returned Ids may be remove. Retain the current remove for that purpose.

interface MediaKeySession : EventTarget {
    Promise<bool> prepare(DOMString storedId);
    Promise<void> remove();
}

Usage:
=====
- Secure release – general:
  o License issued by server is tagged for secure release
  o One or more secure release licenses are issued during playback of specific content
  o At playback completion, the app calls getStoredIds with a type associated with secure release
  o One or more Ids are returned
  o App calls prepare() and remove() to delete each Id
- Secure release – resumption:
  o Playback session is abruptly interrupted, leaving secure release requirements pending
  o On the next browser session to the same media site, app calls getStoredIds with a type associated with secure release
  o One or more Ids are returned
  o App calls prepare() and remove() to delete each Id
- Secure release – variation:
  o App calls getStoredIds after the start of playback and stores returned Ids for later use
- Manage persistent licenses:
  o App calls getStoredIds with a type associated with licenses
  o App calls prepare() and remove() to delete each specific persistent Id that should be deleted

Pending based on discussion:
===========================
- IdType details
- Secure release app tracking and grouping of Ids
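To make the "Secure release – resumption" usage concrete, here is a minimal in-memory sketch in TypeScript. Everything in it is an assumption for illustration: SketchCdmStore, SketchMessageSession, completePendingReleases and the two Id type strings are hypothetical names, not part of EME or of any CDM, and the model is synchronous even though the proposed methods return Promises.

```typescript
// Hypothetical sketch: none of these names exist in EME or in any browser.
// The proposed methods return Promises; this model is synchronous for brevity.

type SketchIdType = "secure-release" | "persistent-license";

interface SketchStoredRecord {
  id: string;
  type: SketchIdType;
}

class SketchCdmStore {
  private records: SketchStoredRecord[] = [];

  add(record: SketchStoredRecord): void {
    this.records.push(record);
  }

  // Stands in for the proposed MediaKeys.getStoredIds(IdType).
  getStoredIds(type: SketchIdType): string[] {
    return this.records
      .filter((record) => record.type === type)
      .map((record) => record.id);
  }

  has(id: string): boolean {
    return this.records.some((record) => record.id === id);
  }

  erase(id: string): void {
    this.records = this.records.filter((record) => record.id !== id);
  }
}

class SketchMessageSession {
  private store: SketchCdmStore;
  private preparedId: string | null = null;

  constructor(store: SketchCdmStore) {
    this.store = store;
  }

  // Stands in for the proposed MediaKeySession.prepare(storedId).
  prepare(storedId: string): boolean {
    if (!this.store.has(storedId)) {
      return false;
    }
    this.preparedId = storedId;
    return true;
  }

  // Stands in for MediaKeySession.remove(): deletes the prepared Id.
  remove(): void {
    if (this.preparedId === null) {
      throw new Error("prepare() must succeed before remove()");
    }
    this.store.erase(this.preparedId);
    this.preparedId = null;
  }
}

// Resumption flow: on the next visit, finish every pending secure release.
function completePendingReleases(store: SketchCdmStore): number {
  const pendingIds = store.getStoredIds("secure-release");
  pendingIds.forEach((id) => {
    const session = new SketchMessageSession(store);
    if (session.prepare(id)) {
      session.remove();
    }
  });
  return pendingIds.length;
}
```

In this sketch each stored Id gets its own short-lived session, which matches the idea that sessions only coordinate messages; the real release message exchange with the server is left out.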
(In reply to Jerry Smith from comment #2) > We believe the license server model is not just valid, but likely the > superior approach to managing persistent data. How is it superior? I expected this claim to be supported below. As described below, I believe this model actually seems to create more problems than it addresses. Regardless of the methods and identifiers used: * The application should control whether data is persisted because it is responsible for managing it once persisted. * The UA should be able to disallow persistence (i.e. in private browsing modes). * Behavior should be deterministic and consistent. Separately, I've been thinking about private browsing modes and how the UA can support "stateless" licenses while rejecting other types of requests. (In some cases, pretending to be stateful is bad for the user because, for example, it can use up device counts or license counts (key release case).) It's not possible for UAs to address such scenarios if they have no control or insight into whether a session will use storage. |sessionType| is also useful because it allows CDM implementations to generate different requests if necessary. > At some point, we do need to address Mark’s comments about Ids, and the > ability to group Ids under a single identifier for convenience. What type of IDs would be grouped? Key IDs? Stored IDs? I think Mark's use case is 1:1 for license:storage. The need to group (key?) IDs in the proposed case seems to be a symptom of a problem. > 1. We should delete sessionId and related operations from the spec. > 2. Sessions themselves should be limited to coordinating messages, nothing > more. Does this mean there could be multiple licenses per session? Could generateRequest() be called multiple times per session? If so, does the session concept have any value in this model? > 3. The license server model gives apps plenty of control over persistence > and secure release. 
The first will be based on a transaction that the app > will often initiate, and the latter will likely be baked into the general > operation of some websites. In both cases, the app doesn’t need to enable > data storage. The CDM should accommodate the storage requirements of the > license and use persisted licenses automatically for playback. My point is that the app SHOULD control data storage just as it does for all other web APIs. "[Using] persisted licenses automatically for playback" essentially means that persistent licenses are added to a key pool, which will be used for all subsequent playbacks. Issues: * That should be clear from the API. * This takes control away from the application, which may actually wish to fetch a new license. * How is the application supposed to know what to do when it gets an "encrypted" event? (This event is generated by the UA and has no information about the available keys. Also, initData does not necessarily identify all the key IDs the application would fetch.) Automatic use also creates API problems (as long as we have the concept of a session). The CDM could use keys without a session being created. If use of a key/license triggers a message, there would be no session to send it to. If there are multiple sessions, how would the CDM decide which to send the message to? > - Add a method to enumerate specific types of Ids from the CDM. This could > be keyIds or something else. The app should know the type of Id it is > retrieving. It sounds like you're talking about license management based on key ID. This and the issues it presents have previously been discussed in https://www.w3.org/Bugs/Public/show_bug.cgi?id=25218#c17 Another alternative, retrieval via Initialization Data, is also problematic as previously discussed in http://lists.w3.org/Archives/Public/public-html-media/2014Jul/0030.html > - Add a method to prepare the session to operate on a returned Id.
This > would not imply that the session has been fully prepared for playback, but > that it is positioned to handle messages related to a specific Id. The > primary action that would be taken on returned Ids may be remove. Retain > the current remove for that purpose. I would assume that all IDs retrieved above can be used to "prepare". That means other keys could be affected, there could be multiple licenses with the same ID, etc. > interface MediaKeySession : EventTarget { > > Promise<bool> prepare(DOMString storedId); > Promise<void> remove(); > } Is prepare() the equivalent of the current load() method except that it doesn't load keys for playback? Would prepare() work the same as the current generateRequest() and load() where the result is that the MediaKeySession now represents the |storedId|? If not, remove() would not be sufficient as defined.
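The private browsing argument in this comment can be illustrated with a small TypeScript sketch. It compares when a user agent could refuse persistence under the two models. The type names, the "persistent" value and both functions are hypothetical and only model the reasoning above; they are not spec behavior.

```typescript
// Hypothetical sketch of when a user agent can refuse persistence.
// "persistent" is a placeholder for any non-temporary session type.

type SketchSessionType = "temporary" | "persistent";

type SketchRefusalStage = "none" | "before-request" | "after-license-issued";

// App-declared model: the type is known when the session is created,
// so the user agent can refuse before any license request is generated.
function refusalStageWithDeclaredType(
  requested: SketchSessionType,
  privateBrowsing: boolean
): SketchRefusalStage {
  if (privateBrowsing && requested !== "temporary") {
    return "before-request";
  }
  return "none";
}

// License-controlled model: persistence is only visible in the license terms,
// so a refusal can only happen after the server has already issued a license.
function refusalStageWithLicenseTerms(
  licenseWantsPersistence: boolean,
  privateBrowsing: boolean
): SketchRefusalStage {
  if (privateBrowsing && licenseWantsPersistence) {
    return "after-license-issued";
  }
  return "none";
}
```

The late refusal in the second function is the case the comment warns about, since a license issued and then rejected may already have used up a device count or license count.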
(In reply to David Dorwin from comment #3) > (In reply to Jerry Smith from comment #2) > > What type of IDs would be grouped? Key IDs? Stored IDs? I think Mark's use > case is 1:1 for license:storage. The need to group (key?) IDs in the > proposed case seems to be a symptom of a problem. > Let's say the only IDs we exposed on the API were key IDs and there was no concept or context in which they were grouped together in any way. So, the available key IDs would be exposed on *MediaKeys*. MediaKeySession would be purely ephemeral for the purpose of message processing state. My point was that, starting here, we would then need to introduce some kind of grouping of key IDs because some operations only operate at the level of a group of key IDs, not at an individual key ID level. For example, release and persistence operate on licenses - which we do not expose explicitly, but which manifest on the API in the fact that keys are released and persisted as a group. Such a grouping could be introduced explicitly, perhaps we would even call it 'licenses'. Or we could note that the grouping aligns with the sessions and use the MediaKeySession for this grouping.
Thinking about this some more. I think one aspect of our current design that makes a lot of sense is the idea of a session as a container for keys and associated state as well as a context for message exchange. We expose the available keys on each session, not on MediaKeys. remove() removes the keys that were created as a result of the messages of the session. Key release messages for the set of keys are fired on the associated session. Script control of persistence is done at the session level. If sessions were to be changed to be purely ephemeral context for message exchange, all of the above would need to change too. However, I do see we have a problem in unifying different underlying DRM implementations. Basically, we have a difference of opinion as to whether some application decisions (persist or not, reload existing or create new) should be communicated to the CDM though the message exchange, directly through the API or both. If we're thinking of monolithic applications, where the same party develops the entire client and server side, I'm not sure we're going to be able to derive any rationale one way or another. If we're thinking of a general-purpose player that might work with multiple services, we have some way to think about who should do what. It seems clear in such a model that persistence requires permission from both the license server and the client application. It's also clear that the client application should be informed when existing state is used instead of a message exchange, but we have this as the session state jumps to ready without any messaging. Some questions: 1) Should the client application be able to force a new message exchange even when there exists state (a license) that could be used for the content ? 2) Should the client be required to know, a priori, that there exists suitable state in order to make use of it ? 
3) Could there be multiple distinct licenses created from the same initData and if yes, what are the requirements for the application to control which is used ?
Reply to David: It should be sufficient to establish that the license server model is valid. That it’s already used by more than one established DRM system argues that is the case. I am asserting that it should be allowed. We believe a license with a persistent attribute should always be stored (outside of inprivate), and that the license server model can fully allow the app to participate in the decisions that lead up to that happening. There is no loss of app control. Apps can be given the ability to identify and remove persistent licenses from the CDM license store, which would give them further control over what has been stored and whether it should be retained and reused. If the license server and app work together on determining whether a subscription, rental or purchase license should be issued, it’s hard to see further utility of the app also declaring a sessionType, particularly if the session itself is restricted to managing messages as we believe it should be. For automatic re-use, the CDMs could go from generateRequest (or perhaps a renamed method) to fire keychange (or a new variant or renamed version of this event), bypassing the messages. This doesn’t seem a great departure from the current model, and would allow apps to be aware that a persisted license was reused. The previous discussion on keyIds concluded that a given license may contain multiple keys used for a stream. That suggests that some other license or objectId may be needed. That doesn’t invalidate the suggestion that Ids be retrievable from the CDM, and that those support use cases for both managing persistent licenses and securely releasing others. Reply to Mark: There need not be a message exchange when a persistent license is reused, but its use should probably be detectable by the app. Also, it makes sense to me that apps should be able to query stored licenses and potentially delete ones for specific content.
I don’t see great advantage though in having the app declare that licenses may or may not be stored. The app and service have full control over this without requiring a redundant API control that broadly controls storage.
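The automatic re-use path described here (generateRequest going straight to keychange when a persisted license exists) might look like the following TypeScript sketch. The event shapes, SketchPersistedLicense and sketchGenerateRequest are hypothetical, initData is modeled as a plain string, and picking the first stored match is an assumption of the sketch rather than something the comment specifies.

```typescript
// Hypothetical sketch: the result of generateRequest is modeled as one event.

type SketchCdmEvent =
  | { kind: "message"; payload: string }
  | { kind: "keychange"; keyIds: string[] };

interface SketchPersistedLicense {
  initData: string;
  keyIds: string[];
}

function sketchGenerateRequest(
  initData: string,
  stored: SketchPersistedLicense[]
): SketchCdmEvent {
  const matches = stored.filter((license) => license.initData === initData);
  const first = matches[0];
  if (first !== undefined) {
    // Re-use: no message exchange; the app learns of it through keychange.
    return { kind: "keychange", keyIds: first.keyIds.slice() };
  }
  // Nothing stored for this initData: fall back to a normal license request.
  return { kind: "message", payload: "license-request:" + initData };
}
```

An app that sees keychange without a preceding message can infer that a stored license was used, which is the detectability asked for in the reply to Mark.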
Ok, so the only kind of ID we have on the API (that we all agree on) are the key IDs. And we know that keys may be acted on by the CDM in groups. So, for the sake of argument, suppose we have a concept of a 'group of keys', identified by an Array of Key IDs. On a MediaKeySession, we can get an Array<KeyId> for the keys managed in that session. On MediaKeys, we can get an Array< Array<KeyId> > for all the groups of key ids that have some persisted state (and are not managed by a current session?). The persisted state could be licenses for the keys, or secure release information: we might expose a type, that is, what we get is Array <Pair< Type, Array<KeyId> > >. And then if we want to use / interact with one of these persisted groups, we can create a MediaKeySession initialized with the Array<KeyId>, rather than initData (though if we use initData we can still get back one of the existing groups - we will see that keychange is fired and the Array of key ids appears on the session without any messaging). One further thing might be to expose on the MediaKeySession a boolean that tells the application whether the keys will be persisted when the session is closed. I am not necessarily proposing the above, but I think it meets (most of) the requirements from both Jerry and David. - the application can deny persistence, by calling remove() on the MediaKeySession. - the application has full visibility of the keys and key release information which is persisted - the application can see when existing keys are re-used and can explicitly request that existing keys be re-used - the license server can direct that keys are persisted - sessions are entirely ephemeral, used for interaction with groups of keys Here are the remaining problems: - what if there are multiple licenses for the same set of key ids ?
In our existing specification this can be managed by the application, which knows what kind of licenses it is requesting and can persist the sessionIds for the different sessions. - what if there are multiple secure release records for the same set of key ids ? Again, in the existing specification the application can track which is which based on the sessionId.
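The Array <Pair< Type, Array<KeyId> > > idea and its remaining problem can be sketched in TypeScript as follows. SketchKeyGroup, the two type strings and the helper functions are hypothetical, key IDs are modeled as strings, and the comparison assumes a group never lists the same key ID twice.

```typescript
// Hypothetical sketch of persisted state exposed as (type, key IDs) pairs.

type SketchStateType = "license" | "secure-release";

interface SketchKeyGroup {
  type: SketchStateType;
  keyIds: string[];
}

// Two groups match when they hold the same key IDs, in any order.
function sameKeySet(a: string[], b: string[]): boolean {
  if (a.length !== b.length) {
    return false;
  }
  const sortedA = a.slice().sort();
  const sortedB = b.slice().sort();
  return sortedA.every((keyId, index) => keyId === sortedB[index]);
}

// What a session initialized with an Array<KeyId> would have to resolve.
function findPersistedGroups(
  persisted: SketchKeyGroup[],
  type: SketchStateType,
  keyIds: string[]
): SketchKeyGroup[] {
  return persisted.filter(
    (group) => group.type === type && sameKeySet(group.keyIds, keyIds)
  );
}

// More than one match is the "remaining problem": key IDs plus a type
// do not say which record the application means.
function isAmbiguous(matches: SketchKeyGroup[]): boolean {
  return matches.length > 1;
}
```

With two secure release records for the same key set, findPersistedGroups returns both and the sketch has no way to tell them apart, which is the gap that sessionId fills in the existing specification.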
Reply to Mark: Are keys used in a session the grouping you really want? I would think you’d want to group key data by a given show or movie that was watched. Secure release would then need to associate some collection of licenses and data with that show or movie, and confirm that it was securely deleted, and the pipeline that used it securely terminated. I don’t think MediaKeySessions map very cleanly to this. Do they? My thought on reusing a persistent license was that the session would be initialized with initData, but that the persistent license would be found and no messages exchanged. A keychange (or keyfound) event would be fired after the stored keys were loaded. This wouldn’t require initializing the session with IDs from the persisted license itself. If multiple valid licenses are stored for a given keyId, then the first found might be used. This assumes that we don’t have a specific scenario that cares which license is used. I’m not sure exactly how this case could be created. If there are multiple secure release records, would the app not want to complete them all before proceeding? That too assumes there is not a need to be specific on completing some secure releases, but not others. Would it be valid for an app to first start playback of secure release content, and then go looking for old sessions not released? That could mix the current session in with the old ones.
(In reply to Jerry Smith from comment #8) > Reply to Mark: > > Are keys used in a session the grouping you really want? I would think > you’d want to group key data by a given show or movie that was watched. > Secure release would then need to associate some collection of licenses and > data with that show or movie, and confirm that it was securely deleted, and > the pipeline that used it securely terminated. I don’t think > MediaKeySessions map very cleanly to this. Do they? > The lowest level grouping is really a license and this is the granularity we expect for secure proof of key release (unless we had to extend the secure proof concept to multi-use licenses, which is complex). I generally assume one session <-> one content item. > My thought on reusing a persistent license was that the session would be > initialized with initData, but that the persistent license would be found > and no messages exchanged. A keychange (or keyfound) event would be fired > after the stored keys were loaded. This wouldn’t require initializing the > session with IDs from the persisted license itself. Yes. What I meant above was that both this model and the app explicitly providing the list of key ids would be supported. > > If multiple valid licenses are stored for a given keyId, then the first > found might be used. This assumes that we don’t have a specific scenario > that cares which license is used. I’m not sure exactly how this case could > be created. It is easy to think of examples where multiple licenses are present for the same keys with different properties (e.g. limited viewing 'preview' licenses and normal ones), but I agree it is not obvious why you might want to store more than one kind across browser sessions. > > If there are multiple secure release records, would the app not want to > complete them all before proceeding? That too assumes there is not a need > to be specific on completing some secure releases, but not others.
Would it > be valid for an app to first start playback of secure release content, and > then go looking for old sessions not released? That could mix the current > session in with the old ones. We only get persisted secure stop information if the regular secure stop process doesn't complete before browser shutdown. So, to get more than one set of secure stop information for the same key you would need multiple playbacks in progress at the same time with the same key. For our part, we do have use-cases for picture-in-picture with multiple simultaneous playbacks, but this is always of different content items and different content items always have different keys in our system. It's not hard to imagine someone using the same key for multiple content items, though: for example the same key for some group of related short clips (e.g. news). We have these corner cases because key id is not really a unique identifier for the thing we want to refer to, which is a license (in the persistent case) or a use of a license (playback session) for secure key release. SessionId stands in for both of these in the existing specification.
I mostly agree with what Mark says in Comment #1 and Comment #5. I also believe that the current spec addresses most of the issues and use cases he discusses or can be easily extended to address specific needs. For example, I don't necessarily object to a mechanism to retrieve stored session IDs from the CDM or even searching for sessions in other ways. Although all such proposals have been associated with eliminating sessionId, I think that's achievable in the current model if desired. (In reply to Mark Watson from comment #1) > I think it's clear that the license server can require key release messages > and can prohibit license persistence. I do not see any harm in the client > application being able to prohibit license persistence either, though > perhaps it should not be able to prohibit persistence of key release > information ? An important clarification on the second sentence: In the current model, the application doesn't really prevent persistence of a license that requests it. Instead, the application tells the user agent and CDM what it expects and prevents incompatible licenses from being requested (and issued). The normative, transparent, and interoperable |sessionType| value allows the user agent to make decisions and provide proper responses early. Among other things, this prevents the server from issuing an unusable license. (If an unusable offline or key release license is issued, the CDM would need to immediately reject it and make sure a "release" receipt reaches the server or the user might lose some rights.) sessionType also serves as a check that the application and server are in agreement. While one might assume this to be true, I have seen such misconfiguration (of a PlayReady server) in production. This may be increasingly likely when multiple different license servers are being used. An application's use of a persistent session type is also an acceptance of responsibility that does not come with use of temporary session(s). 
For example, managing key release messages or calling remove(). This is another reason that simply giving applications "the ability to identify and remove persistent licenses from the CDM license store" after the fact is insufficient - applications that don't care about or expect licenses to be persisted won't be designed to clean up persistent licenses.
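The role of sessionType as a check between application and server can be sketched in TypeScript as below. The type names, the "persistent" placeholder value and the decision strings are hypothetical. Accepting a non-persistent license in a persistent session is an assumption of the sketch; the comment only describes the opposite mismatch.

```typescript
// Hypothetical sketch: compare what the app declared with what the license asks for.

type SketchDeclaredType = "temporary" | "persistent";

interface SketchLicenseTerms {
  // True when the license itself asks to be stored or tracked for release.
  persistent: boolean;
}

type SketchLicenseDecision = "accept" | "reject-and-send-release";

function checkLicenseAgainstDeclaredType(
  declared: SketchDeclaredType,
  terms: SketchLicenseTerms
): SketchLicenseDecision {
  if (terms.persistent && declared === "temporary") {
    // Misconfiguration: the app did not accept responsibility for stored state,
    // so the license is unusable and the server has to be told it was released.
    return "reject-and-send-release";
  }
  // Assumption of this sketch: every other combination is usable as issued.
  return "accept";
}
```

In the current model the declared type also travels with the request, so a correctly configured server would not issue the mismatched license in the first place; the check above only covers the case where it does.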
(In reply to Jerry Smith from comment #6) > Reply to David: > > It should be sufficient to establish that the license server model is valid. I don't think anyone is arguing whether it is valid. I'm arguing that it is not appropriate as-is for the web platform. The current spec basically adds a normative framework around this opaque model. There are likely many valid models, but that doesn't mean EME is designed around all of them. In fact, we have decided that some models are out of scope for similar reasons. > That it’s already used by more than one established DRM system argues that > is the case. These DRM systems were developed in the context of generally DRM-specific applications, often built on associated vertical application stacks. I believe it is misguided to project this onto the web platform, user agents, interoperable applications, and all clients, DRM systems, and use cases. > I am asserting that it should be allowed. I disagree with this characterization. The proposal appears to force this model on all implementations and fundamentally changes the long-standing meaning of MediaKeySession. That's much different than _allowing_ something. > We believe a license with a persistent attribute always be stored (outside > of inprivate), and that the license server model can fully allow the app to > participate in the decisions that lead up to that happening. There is no > loss of app control. How do the app *and user agent*, which the app relies on, participate in these decisions? I don't see any proposal for a normative and interoperable mechanism. (See comment #10 for why this is important.) You mention inprivate as an exception, but there is no clear mechanism to enable a user agent to implement such an exception. Implementing InPrivate/Incognito/private browsing in a way that makes sense to applications is one of the reasons I want the application to specify what it needs. 
Why do you and/or Microsoft feel so strongly that involving the application and user agent is a problem? Is that the real issue or are your concerns more about related issues, such as how licenses are loaded/prepared for use?
(In reply to Mark Watson from comment #9) > > It is easy to think of examples where multiple license are present for the > same keys with different properties (e.g. limited viewing 'preview' licenses > and normal ones), but I agree it is not obvious why you might want to store > more than one kind across browser sessions. > Actually, I take back that last bit. I may have both a 'preview' license with various limitations and a 'real' license and they might both be persistable. I might intend to have the app remove the preview one when it gets the real one, but the browser might shut down before I have a chance and both may be present in persistent store. In a future session, if I want to retrieve one of these, the app should be in control of which one. Using the 'real' license may have consequences (it may be single-use, may start expiration timers etc.), so that must be an explicit request from the app, not a choice of the CDM. I think we can say that Key Ids (or equivalently initData) alone are not sufficient for selecting which of several persisted data sets the application wants to access. I can see two options: - we have an id that identifies the state created in a given session, and allows it to be retrieved later (this is what sessionId is now, though we could rename it to identify the persistent state, rather than the session, if you like) - some additional indication is needed together with the key ids or initData to identify which bit of state is being retrieved. This could be a 'type' field, but that still assumes you will never have multiple things of the same type, which is not the case for secure key release. To support various DRMs' approaches and capabilities this type would either need to be key-system specific or a DOMString with the opportunity for people to register values with definitions.
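The first option (an id that identifies the state created in a given session) can be contrasted with lookup by key ID in a short TypeScript sketch. SketchPersistedState, its kind field and both lookup functions are hypothetical names used only to illustrate the 'preview' versus 'real' license example.

```typescript
// Hypothetical sketch: persisted state that carries the id of the session that created it.

interface SketchPersistedState {
  sessionId: string;
  keyIds: string[];
  kind: string; // for example "preview" or "real"; known to the app, not to the API
}

// Lookup by key ID alone can return several records.
function findByKeyId(
  states: SketchPersistedState[],
  keyId: string
): SketchPersistedState[] {
  return states.filter((state) => state.keyIds.indexOf(keyId) >= 0);
}

// Lookup by sessionId returns at most one record, so the app stays in control.
function findBySessionId(
  states: SketchPersistedState[],
  sessionId: string
): SketchPersistedState | null {
  const matches = states.filter((state) => state.sessionId === sessionId);
  const first = matches[0];
  return first === undefined ? null : first;
}
```

The app would have to remember which sessionId belongs to which license, which is exactly the bookkeeping the existing specification leaves to the application.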
We have a revised proposal that drops querying the CDM for IDs and accommodates specific requests we received when this bug was last discussed. That feedback included these specific requests: 1. Apps must have control over data storage 2. Apps should be able to control whether an existing license is reused or a new one obtained 3. SessionId should be retained 4. If multiple persistent licenses were available, the app should be able to select which it uses Our revised proposal would: 1. Give apps control over reusing persistent keys by calling a load method using the same initData as generateRequest. Apps then have complete control over re-use, and it’s natural for them to choose between re-use or requesting a new license. 2. Separate secure release session retrieval from re-using persisted licenses. We’ve continued that in the model of the current spec with a session retrieval and remove methods. 3. Retain sessionId (or whatever it becomes) and return errors if license terms don’t conform. We’ve not resolved the issue about apps selecting from specific persisted licenses. We’d not believed this was an issue previously in our DRM, and think it deserves further discussion.
An updated WebIDL would be:

interface MediaKeys {
    MediaKeySession createSession (optional SessionType sessionType = "temporary");
    Promise<void> setServerCertificate (BufferSource serverCertificate);
    Promise<void> removeKey (sequence<ArrayBuffer>);
        // Removes keys based on keyId, including persisted copies
        // (affects all relevant existing MediaKeySessions)
    Promise<void> removeAllKeys ();
        // Removes all keys currently loaded, including persisted copies
        // (affects all relevant existing MediaKeySessions)
};

interface MediaKeySession : EventTarget {
    readonly attribute DOMString sessionId;
    readonly attribute unrestricted double expiration;
    readonly attribute Promise<void> closed;
    Promise<void> request (DOMString initDataType, BufferSource initData);
    Promise<void> update (BufferSource response);
    Promise<boolean> retrieve (DOMString sessionId);
        // Loads session data based on sessionId, except for keys
    Promise<boolean> load (DOMString initDataType, BufferSource initData);
        // Loads stored keys based on initData
    Promise<void> remove ();
    Promise<void> close ();
    Promise<sequence<ArrayBuffer>> getUsableKeyIds ();
};

This does some renaming that I hope doesn't confuse the discussion:

1. The previous generateRequest becomes request
2. The previous load becomes retrieve (for session data retrieval)
3. A new load is used to attempt loading persistent licenses
4. New removeKey and removeAllKeys are intended to remove specific keys based on individual keyId or remove all keys currently loaded
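A rough TypeScript model of how an app might choose between the new load and request is sketched below. SketchRevisedSession, SketchStoredSession and obtainKeys are hypothetical names; initData is modeled as a string, the methods are synchronous rather than Promise-based, and using the first stored match in load() is an assumption, since the proposal leaves selection among several licenses unresolved.

```typescript
// Hypothetical in-memory model of the revised load / retrieve / request split.

interface SketchStoredSession {
  sessionId: string;
  initData: string;
  keyIds: string[];
}

class SketchRevisedSession {
  sessionId = "";
  usableKeyIds: string[] = [];
  private stored: SketchStoredSession[];

  constructor(stored: SketchStoredSession[]) {
    this.stored = stored;
  }

  // Models load(initDataType, initData): loads stored keys based on initData.
  load(initData: string): boolean {
    const matches = this.stored.filter((entry) => entry.initData === initData);
    const first = matches[0];
    if (first === undefined) {
      return false;
    }
    this.sessionId = first.sessionId;
    this.usableKeyIds = first.keyIds.slice();
    return true;
  }

  // Models retrieve(sessionId): loads session data, except for keys.
  retrieve(sessionId: string): boolean {
    const matches = this.stored.filter((entry) => entry.sessionId === sessionId);
    const first = matches[0];
    if (first === undefined) {
      return false;
    }
    this.sessionId = first.sessionId;
    this.usableKeyIds = []; // keys are deliberately not loaded
    return true;
  }

  // Models request(initDataType, initData): produces a license request message.
  request(initData: string): string {
    return "license-request:" + initData;
  }
}

// The app, not the CDM, decides between re-use and a new license.
function obtainKeys(
  session: SketchRevisedSession,
  initData: string,
  preferStored: boolean
): "reused" | "requested" {
  if (preferStored && session.load(initData)) {
    return "reused";
  }
  session.request(initData);
  return "requested";
}
```

Passing preferStored as false forces a new message exchange even when a stored license exists, which corresponds to request 2 in the feedback list.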
(In reply to Jerry Smith from comment #13) Some initial feedback. > That feedback included these specific requests: ... > 4. If multiple persistent licenses were available, the app should be able to > select which it uses How is this accomplished? > Our revised proposal would: ... > 2. Separate secure release session retrieval from re-using persisted > licenses. We’ve continued that in the model of the current spec with a > session retrieval an remove methods. Does this mean it is not possible to load a session for use in playback via sessionId? Is there a reason for this limitation? Some apps might wish to load by sessionId. Why wouldn't remove() > We’ve not resolved the issue about apps selecting from specific persisted > licenses. We’d not believed this was an issue previously in our DRM, and > think it deserves further discussion. I guess this answers my first question above. > interface MediaKeys { ... > Promise<void> removeKey (sequence<ArrayBuffer>); > Removes keys based on keyId, including persisted copies (affects all > relevant existing MediaKeySessions) This seems really odd and has all the issues with removing parts of a license (session) and multiple licenses having the key ID (you say to remove it from all). I really think this should be addressed by operating on a MediaKeySession. Why is this necessary? > Promise<void> removeAllKey (); Removes all keys > currently loaded, including persisted copies (affects all relevant existing > MediaKeySessions) > }; Why is this necessary? The application could just as easily call remove on each session and use .all(). > interface MediaKeySession : EventTarget { ... > Promise<void> request (DOMString initDataType, > BufferSource initData); This doesn't actually request anything - it generates a request to send (be used in the request). > Promise<boolean> retrieve (DOMString sessionId); > Loads session data based on sessonId, except for keys See above - I'm not sure why this is different from load(). 
> This does some renaming that I hope doesn't confuse the discussion: > > 1. The previous generateRequest becomes request See above. > 2. The previous load becomes retrieve (for session data retrieval) See above. > 3. A new load is used to attempt loading persistent licenses > 4. New removeKey and removeAllKeys are intended to remove specific keys > based on individual keyId or remove all keys currently loaded See above - can you explain the reason for these?
If I understand Jerry's comment correctly, it is problematic for them to retrieve licenses by sessionId, so they would like to restrict the sessionId concept to be used only for key release data. If you want to retrieve license / keys, you use initData and load(). If you want to retrieve secure stop information, you use sessionId and retrieve(). Because you can't retrieve a session with keys using sessionId then removing persisted keys needs these new MediaKeys methods. I think you could almost polyfill the existing spec API on top of this one, by keeping (in IndexedDB, say) mappings from sessionIds to initData and sessionId to keyIds. The only difference is that you would need to specify when using the (old, polyfilled) loadSession() whether you wanted a session for use with playback or were expecting only key release data, which the application should know. Equally, one could polyfill this proposed API on top of the existing one in the specification, by keeping a mapping from initData to sessionId and from keyIds to sessionId.
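The bookkeeping such a polyfill would need can be sketched roughly as follows. This is only an illustration: SessionIndex and SessionRecord are hypothetical names, a plain object stands in for IndexedDB, and initData is represented as a string for brevity.

```typescript
// Hypothetical bookkeeping an application-level polyfill might keep.
// A plain object stands in for IndexedDB; initData is a string here for brevity.
interface SessionRecord {
  initData: string;
  keyIds: string[];
}

class SessionIndex {
  private bySessionId: { [sessionId: string]: SessionRecord } = {};

  // Remember what a session was created from and which key IDs it received.
  record(sessionId: string, initData: string, keyIds: string[]): void {
    this.bySessionId[sessionId] = { initData: initData, keyIds: keyIds.slice() };
  }

  // Old API on top of the proposal: sessionId -> initData (for load) and key IDs (for removeKey).
  lookupBySessionId(sessionId: string): SessionRecord | null {
    if (!Object.prototype.hasOwnProperty.call(this.bySessionId, sessionId)) return null;
    return this.bySessionId[sessionId];
  }

  // Proposal on top of the old API: initData -> sessionIds.
  sessionIdsForInitData(initData: string): string[] {
    return Object.keys(this.bySessionId).filter(
      (id) => this.bySessionId[id].initData === initData
    );
  }

  // Proposal on top of the old API: keyId -> sessionIds.
  sessionIdsForKeyId(keyId: string): string[] {
    return Object.keys(this.bySessionId).filter(
      (id) => this.bySessionId[id].keyIds.indexOf(keyId) !== -1
    );
  }
}
```

Note that this index cannot tell the application whether a stored session still holds keys or only key release data; as the comment says, that is something the application would have to know and specify itself.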
(In reply to Mark Watson from comment #15) > If I understand Jerry's comment correctly, it is problematic for them to > retrieve licenses by sessionId, so they would like to restrict the sessionId > concept to be used only for key release data. > > If you want to retrieve license / keys, you use initData and load(). > > If you want to retrieve secure stop information, you use sessionId and > retrieve(). > > Because you can't retrieve a session with keys using sessionId then removing > persisted keys needs these new MediaKeys methods. > > I think you could almost polyfill the existing spec API on top of this one, > by keeping (in IndexedDB, say) mappings from sessionIds to initData and > sessionId to keyIds. The only difference is that you would need to specify > when using the (old, polyfilled) loadSession() whether you wanted a session > for use with playback or were expecting only key release data, which the > application should know. > > Equally, one could polyfill this proposed API on top of the existing one in > the specification, by keeping a mapping from initData to sessionId and from > keyIds to sessionId. This would work in many cases, but in some cases two different initData may cause overlapping key requests in ways that the application cannot easily detect. Here is the example I tried to give in the TPAC -- The application tries to play stream1 and provides initData1. The CDM makes a key request based on initData1. The license server returns a set of keys that includes keys needed for stream1 AND stream2. Later on the application tries to play stream2 and provides initData2. Without parsing the PSSH boxes, the application has no way of knowing that it already has the keys available. It could try to load all the previous sessions. Or it could call generateKeyRequest(initData2) and make the unnecessary license request. 
However if an API was available to load keys based on initData alone, the CDM could make that determination and not require a license request.
(In reply to Joe Steele from comment #16) > > Here is the example I tried to give in the TPAC -- > > The application tries to play stream1 and provides initData1. The CDM makes > a key request based on initData1. The license server returns a set of keys > that includes keys needed for stream1 AND stream2. Later on the application > tries to play stream2 and provides initData2. Without parsing the PSSH > boxes, the application has no way of knowing that it already has the keys > available. It could try to load all the previous sessions. Or it could call > generateKeyRequest(initData2) and make the unnecessary license request. > However if an API was available to load keys based on initData alone, the > CDM could make that determination and not require a license request.

Ok, this is a good example, but is it different in character from the case where the initData in a video file also contains the key ids for the associated audio? We have said that the application is supposed to know what to expect in this respect, because it has metadata about the content that tells it what to expect.

I think the model we have is as follows:
1) initData maps to a set of key ids, but detecting at the application whether two initData map to the same or different sets of key ids is key-system specific / impossible
2) at a given time the session can contain either
(a) the key ids obtained from an initData (it is waiting for a license)
(b) the key ids, associated keys and associated policy (it got a license)
(c) the key ids and proof of key release

It certainly seems convenient for the application if the CDM can automatically retrieve any session it already has for given initData.
Retrieval of sessions by sessionId would then make sense only for: (i) there are several sessions with the same key ids but different license policy (ii) to retrieve the key release information, where again there may be several for the same set of keyids Would it make sense to make session retrieval *always* driven primarily by the initData / set of key ids, supplemented, when necessary by: - which case you want (a), (b), (c) or "(a) or (b)" - something to disambiguate for cases (b) and (c) between the multiple possible sessions The disambiguator here is like sessionId but scoped to sessions with the same set of key ids.
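One way to picture the retrieval model suggested in this comment is sketched below. It is a hypothetical model only: the type and function names are invented here, key IDs are plain strings, and each key ID set is assumed to contain no duplicates.

```typescript
// Hypothetical model of the three session states (a), (b), (c) listed above.
type SessionState = "awaiting-license" | "licensed" | "released";

interface StoredSession {
  keyIds: string[];       // key IDs derived from the initData
  state: SessionState;
  disambiguator: string;  // like sessionId, but scoped to sessions with the same key IDs
}

// Two key ID sets match if they contain the same IDs, in any order.
function sameKeySet(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const sortedA = a.slice().sort();
  const sortedB = b.slice().sort();
  return sortedA.every((id, i) => id === sortedB[i]);
}

// Retrieval is driven by the key ID set; the wanted states and the optional
// disambiguator only narrow the result.
function findSessions(
  stored: StoredSession[],
  keyIds: string[],
  wanted: SessionState[],
  disambiguator?: string
): StoredSession[] {
  return stored.filter(
    (s) =>
      sameKeySet(s.keyIds, keyIds) &&
      wanted.indexOf(s.state) !== -1 &&
      (disambiguator === undefined || s.disambiguator === disambiguator)
  );
}
```

Asking for "(a) or (b)" corresponds to passing both "awaiting-license" and "licensed" as the wanted states; the disambiguator is only needed when more than one stored session survives the first two filters.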
(In reply to Joe Steele from comment #16) > Here is the example I tried to give in the TPAC -- > > The application tries to play stream1 and provides initData1. The CDM makes > a key request based on initData1. The license server returns a set of keys > that includes keys needed for stream1 AND stream2. Later on the application > tries to play stream2 and provides initData2. Without parsing the PSSH > boxes, the application has no way of knowing that it already has the keys > available. It could try to load all the previous sessions. Or it could call > generateKeyRequest(initData2) and make the unnecessary license request. > However if an API was available to load keys based on initData alone, the > CDM could make that determination and not require a license request.

It's unclear whether your stream1 and stream2 are for the same title or different titles. We have given up on gracefully handling the former for the basic streaming case (bug 25268) and settled on requiring application or server logic. It's reasonable to expect that persistent licenses will require application logic, especially for such complex scenarios as different titles.

In the different titles scenario, the application needs to do more of the work. This is already a complex application involving multiple (related?) titles and persistent licenses. The CDM should not be a replacement for application logic.

In the current spec, the application would need to know that session "6789" contains the keys for stream1 and stream2. If we allowed initData to be an index, the application would need to know that it can load the keys for either stream using initData1. For this scenario, there isn't much difference.

However, "generateKeyRequest(initData2)" seems to imply that the CDM would crack open the initData to get the key IDs then look for these key IDs. I do not think we should add this complexity to the clients when applications can handle this themselves in an interoperable way.
(In reply to Mark Watson from comment #17) > (In reply to Joe Steele from comment #16) > > I think the model we have is as follows: > 1) initData maps to a set of key ids, but detecting at the application > whether two initData map to the same of different sets of key ids is > key-system specific / impossible > 2) at a given time the session can contain either > (a) the key ids obtained from an initData (it is waiting for a license) > (b) the key ids, associated keys and associated policy (it got a license) > (c) the key ids and proof of key release > > It certainly seems convenient for the application if the CDM can > automatically retrieve any session it already has for given initData. > Retrieval of sessions by sessionId would then make sense only for: > (i) there are several sessions with the same key ids but different license > policy > (ii) to retrieve the key release information, where again there may be > several for the same set of keyids > > Would it make sense to make session retrieval *always* driven primarily by > the initData / set of key ids, supplemented, when necessary by: > - which case you want (a), (b), (c) or "(a) or (b)" > - something to disambiguate for cases (b) and (c) between the multiple > possible sessions > > The disambiguator here is like sessionId but scoped to sessions with the > same set of key ids. I think we have different mental models here. By saying "retrieve the session" you are implying that the CDM is doing a lookup which results in a previously persisted session being loaded. That is not what I am saying. Assume that I have a database of persistent licenses. When update() is called on a persistent session, I add to that database. When remove() is called on a session, I remove from that database. When loadKeys(initData) is called on a new session, I look in that database to see if I have matching licenses. If I do, I add them to the session. If I do not, I fail. 
The session in this case becomes a convenience for holding transient state only. (In reply to David Dorwin from comment #18) > (In reply to Joe Steele from comment #16) > > > Here is the example I tried to give in the TPAC -- > > > > The application tries to play stream1 and provides initData1. The CDM makes > > a key request based on initData1. The license server returns a set of keys > > that includes keys needed for stream1 AND stream2. Later on the application > > tries to play stream2 and provides initData2. Without parsing the PSSH > > boxes, the application has no way of knowing that it already has the keys > > available. It could try to load all the previous sessions. Or it could call > > generateKeyRequest(initData2) and make the unnecessary license request. > > However if an API was available to load keys based on initData alone, the > > CDM could make that determination and not require a license request. > > It's unclear whether your stream1 and stream2 are for the same title or > different title. We have given up on gracefully handling the former for the > basic streaming case (bug 25268) and settled on requiring application or > server logic. It's reasonable to expect that persistent licenses will > require application logic, especially for such complex scenarios as > different titles. stream1 and stream2 are different titles. E.g. different episodes of a TV series. > > In the different titles scenario, the application needs to do more of the > work. This is already a complex application involving multiple (related?) > titles and persistent licenses. The CDM should not be a replacement for > application logic. In general I agree here. Except that as I was attempting to point out, this is logic that the CDM can potentially perform orders of magnitude more efficiently than the application. > > In the current spec, the application would need to know that session "6789" > contains the keys for stream1 and stream2. 
If we allowed initData to be an > index, the application would need to know that it can load the keys for > either stream using initData1. For this scenario, there isn't much > difference. I don't understand what "allowing initData to be an index" means. An index to what? > > However, "generateKeyRequest(initData2)" seems to imply that the CDM would > crack open the initData to get the key IDs then look for these key IDs. I do > not think we should add this complexity to the clients when applications can > handle this themselves in an interoperable way. The problem with generateRequest() is that it assumes a key request will be generated. Step 7 of the algorithm does not state what to do if the CDM determines that a key request is not needed because the available keys are already present. In that case, step 11 should not be called. If we were to fix that (e.g. skip step 11 if step 7 does not result in a message being produced) then the name of the method seems odd. It should be something more like generateRequestIfNeeded. That would be a return to something more like createSession() was originally (which I preferred).
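The persistent-license database model and the generateRequestIfNeeded idea from this comment can be sketched together as below. This is a minimal model under stated assumptions, not how any CDM is known to work: StoredLicense, LicenseDatabase and ModelSession are hypothetical names, and initData is represented directly by the key IDs it references, standing in for whatever parsing a real CDM would do.

```typescript
// Hypothetical CDM-side model of the database described above.
interface StoredLicense {
  licenseId: string;
  keyIds: string[];
}

class LicenseDatabase {
  private licenses: StoredLicense[] = [];

  add(license: StoredLicense): void {
    this.licenses.push(license);
  }

  removeById(licenseId: string): void {
    this.licenses = this.licenses.filter((l) => l.licenseId !== licenseId);
  }

  // Those of the needed key IDs that some stored license can supply.
  availableKeys(neededKeyIds: string[]): string[] {
    return neededKeyIds.filter((id) =>
      this.licenses.some((l) => l.keyIds.indexOf(id) !== -1)
    );
  }
}

class ModelSession {
  keyIds: string[] = [];
  private licenseIds: string[] = [];

  constructor(private db: LicenseDatabase) {}

  // update(): a license response arrived; persist it and expose its keys.
  update(license: StoredLicense): void {
    this.db.add(license);
    this.licenseIds.push(license.licenseId);
    this.keyIds = this.keyIds.concat(license.keyIds);
  }

  // remove(): drop the licenses this session added from the database.
  remove(): void {
    this.licenseIds.forEach((id) => this.db.removeById(id));
    this.licenseIds = [];
    this.keyIds = [];
  }

  // loadKeys(initData): succeed only if every needed key is already stored,
  // and add only those keys to the session.
  loadKeys(neededKeyIds: string[]): boolean {
    const found = this.db.availableKeys(neededKeyIds);
    if (found.length !== neededKeyIds.length) return false;
    this.keyIds = found;
    return true;
  }

  // generateRequestIfNeeded: a request message, or null when stored keys suffice.
  generateRequestIfNeeded(neededKeyIds: string[]): string | null {
    if (this.loadKeys(neededKeyIds)) return null;
    return "license-request:" + neededKeyIds.join(",");
  }
}
```

In this model the session holds only transient state: the keys live in the database, and a new session created for stream2 picks up the stream2 key that arrived in the license originally requested for stream1.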
(In reply to Joe Steele from comment #19) > (In reply to Mark Watson from comment #17) > > (In reply to Joe Steele from comment #16) > > > > I think the model we have is as follows: > > 1) initData maps to a set of key ids, but detecting at the application > > whether two initData map to the same of different sets of key ids is > > key-system specific / impossible > > 2) at a given time the session can contain either > > (a) the key ids obtained from an initData (it is waiting for a license) > > (b) the key ids, associated keys and associated policy (it got a license) > > (c) the key ids and proof of key release > > > > It certainly seems convenient for the application if the CDM can > > automatically retrieve any session it already has for given initData. > > Retrieval of sessions by sessionId would then make sense only for: > > (i) there are several sessions with the same key ids but different license > > policy > > (ii) to retrieve the key release information, where again there may be > > several for the same set of keyids > > > > Would it make sense to make session retrieval *always* driven primarily by > > the initData / set of key ids, supplemented, when necessary by: > > - which case you want (a), (b), (c) or "(a) or (b)" > > - something to disambiguate for cases (b) and (c) between the multiple > > possible sessions > > > > The disambiguator here is like sessionId but scoped to sessions with the > > same set of key ids. > > I think we have different mental models here. By saying "retrieve the > session" you are implying that the CDM is doing a lookup which results in a > previously persisted session being loaded. That is not what I am saying. > > Assume that I have a database of persistent licenses. When update() is > called on a persistent session, I add to that database. When remove() is > called on a session, I remove from that database. When loadKeys(initData) is > called on a new session, I look in that database to see if I have matching > licenses. 
If I do, I add them to the session. If I do not, I fail. The > session in this case becomes a convenience for holding transient state only. I think this is a distinction without a difference. Whether you think of it as "retrieving a previously persisted session" or "creating a new session and loading in some persisted state" doesn't matter if the end result, a session containing the keys identified by the initData, is the same.
(In reply to Mark Watson from comment #20) > (In reply to Joe Steele from comment #19) > I think this is a distinction without a difference. Whether you think of it > as "retrieving a previously persisted session" or "creating a new session > and loading in some persisted state" doesn't matter if the end result, a > session containing the keys identified by the initData, is the same. But they are not the same. In the example of stream1 and stream2 that I gave, passing stream2's initData would only cause the keys for stream2 to be loaded. Not the keys for stream1, even though those keys were delivered in the same session. The same is true if stream1's initData was used to load a session later. This would not be useful in combination with key release, but would be very useful on its own.
(In reply to Joe Steele from comment #21) > (In reply to Mark Watson from comment #20) > > (In reply to Joe Steele from comment #19) > > I think this is a distinction without a difference. Whether you think of it > > as "retrieving a previously persisted session" or "creating a new session > > and loading in some persisted state" doesn't matter if the end result, a > > session containing the keys identified by the initData, is the same. > > But they are not the same. In the example of stream1 and stream2 that I > gave, passing stream2's initData would only cause the keys for stream2 to be > loaded. Not the keys for stream1, even though those keys were delivered in > the same session. The same is true if stream1's initData was used to load a > session later. This would not be useful in combination with key release, but > would be very useful on its own.

In this case, there is a concept of key grouping that doesn't (yet) have a name.

Let's assume the initData for stream1 contains key id X1 and the initData for stream2 contains key id X2. To play stream1, the server knows that keys X1 and Y1 are needed. To play stream2, the server knows that keys X2 and Y2 are needed.

So, what the server has returned are keys X1, Y1, X2 and Y2.

To satisfy your usecase, the client also needs to learn that these are grouped: X1 with Y1 and X2 with Y2. So, then, later if an initData indicating X1 is provided we retrieve from store both X1 and Y1, as the server would have provided.

Likewise, if an initData indicating Y1 is provided we retrieve from store both X1 and Y1.

We need a name for this grouping, otherwise we are not going to be able to specify interoperable behaviour. One is tempted to say "license" ...

Unless, that is, we require that the initData identify all the keys needed. i.e. the SD stream contains the key ids for both SD and HD streams etc.
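The grouping described in this comment could be modelled roughly as follows. GroupedKeyStore is a hypothetical name, and how a client would actually learn the groups from the server is left open, as it is in the discussion.

```typescript
// Hypothetical store that remembers which keys belong together.
class GroupedKeyStore {
  private groups: string[][] = [];

  // Record that these key IDs form one group (for example, X1 with Y1).
  addGroup(keyIds: string[]): void {
    this.groups.push(keyIds.slice());
  }

  // Return every key in any group that contains one of the key IDs
  // named by the initData, without duplicates.
  lookup(initDataKeyIds: string[]): string[] {
    const result: string[] = [];
    this.groups.forEach((group) => {
      const matches = initDataKeyIds.some((id) => group.indexOf(id) !== -1);
      if (!matches) return;
      group.forEach((id) => {
        if (result.indexOf(id) === -1) result.push(id);
      });
    });
    return result;
  }
}
```

With groups [X1, Y1] and [X2, Y2] stored, an initData naming either X1 or Y1 retrieves the first group only, even though all four keys arrived in the same license response.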
(In reply to Mark Watson from comment #22) > (In reply to Joe Steele from comment #21) > > (In reply to Mark Watson from comment #20) > > > (In reply to Joe Steele from comment #19) > > > I think this is a distinction without a difference. Whether you think of it > > > as "retrieving a previously persisted session" or "creating a new session > > > and loading in some persisted state" doesn't matter if the end result, a > > > session containing the keys identified by the initData, is the same. > > > > But they are not the same. In the example of stream1 and stream2 that I > > gave, passing stream2's initData would only cause the keys for stream2 to be > > loaded. Not the keys for stream1, even though those keys were delivered in > > the same session. The same is true if stream1's initData was used to load a > > session later. This would not be useful in combination with key release, but > > would be very useful on its own. > > In this case, there is a concept of key grouping that doesn't (yet) have a > name. > > Let's assume the initData for stream1 contains key id X1 and the initData > for stream2 contains key id X2. To play stream1, the server knows that keys > X1 and Y1 and needed. To play stream2, the server knows that keys X2 and Y2 > are needed. > > So, what the server has returned are keys X1, Y1, X2 and Y2. > > To satisfy your usecase, the client also needs to learn that these are > grouped: X1 with Y1 and X2 with Y2. So, then, later if an initData > indicating X1 is provided we retrieve from store both X1 and Y1, as the > server would have provided. > > Likewise, if an initData indicating Y1 is provided we retrieve from store > both X1 and Y1. Because the initData indicates what keys are needed to "satisfy" the request, the client can just use the initData for the content it is trying to play without knowing anything about previous sessions. 
This will not negatively affect interop because the client can always try "loading" a session using the initData rather than creating a new session and forcing request/response network traffic if persistent keys may be in use. The CDM can succeed in loading the session with keys that match the initData it was passed, or it can fail to load implying that not enough keys were available to satisfy the initData requirements. In that case, the client just uses a newly created session instead. > > We need a name for this grouping, otherwise we are not going to be to > specify interoperable behaviour. One is tempted to say "license" ... I am also tempted. Key and policy bundle does not drop from the tongue quite as easily. > > Unless, that is, we require that the initData identify all the keys needed. > i.e. the SD streams contains the key ids for both SD and HD streams etc. It seems implicit to me that the initData refers to the keys needed for the content it is included in. But I don't see the need for it to require keys for other content.
(In reply to Joe Steele from comment #23) > (In reply to Mark Watson from comment #22) > > (In reply to Joe Steele from comment #21) > > > (In reply to Mark Watson from comment #20) > > > > (In reply to Joe Steele from comment #19) > > > > I think this is a distinction without a difference. Whether you think of it > > > > as "retrieving a previously persisted session" or "creating a new session > > > > and loading in some persisted state" doesn't matter if the end result, a > > > > session containing the keys identified by the initData, is the same. > > > > > > But they are not the same. In the example of stream1 and stream2 that I > > > gave, passing stream2's initData would only cause the keys for stream2 to be > > > loaded. Not the keys for stream1, even though those keys were delivered in > > > the same session. The same is true if stream1's initData was used to load a > > > session later. This would not be useful in combination with key release, but > > > would be very useful on its own. > > > > In this case, there is a concept of key grouping that doesn't (yet) have a > > name. > > > > Let's assume the initData for stream1 contains key id X1 and the initData > > for stream2 contains key id X2. To play stream1, the server knows that keys > > X1 and Y1 and needed. To play stream2, the server knows that keys X2 and Y2 > > are needed. > > > > So, what the server has returned are keys X1, Y1, X2 and Y2. > > > > To satisfy your usecase, the client also needs to learn that these are > > grouped: X1 with Y1 and X2 with Y2. So, then, later if an initData > > indicating X1 is provided we retrieve from store both X1 and Y1, as the > > server would have provided. > > > > Likewise, if an initData indicating Y1 is provided we retrieve from store > > both X1 and Y1. Sorry, there was a typo there, if the initData indicates X2 then we should find X2 and Y2. 
> > Because the initData indicates what keys are needed to "satisfy" the > request, the client can just use the initData for the content it is trying > to play without knowing anything about previous sessions. Not quite. More concretely, let's say stream1 is episode1 and stream2 is episode2. Each episode has an SD version with key X(1|2) and an HD stream with key Y(1|2). So there are four files, four keys and four initData. The intention is that given any one of the initData, you can get the keys for both the HD and SD streams. Additionally, the server helpfully chooses to provide the keys for both episodes. To be able to look up the right thing from a single initData, the client needs to know that X1 goes with Y1 and X2 goes with Y2. > This will not > negatively affect interop because the client can always try "loading" a > session using the initData rather than creating a new session and forcing > request/response network traffic if persistent keys may be in use. > > The CDM can succeed in loading the session with keys that match the initData > it was passed, or it can fail to load implying that not enough keys were > available to satisfy the initData requirements. In that case, the client > just uses a newly created session instead. > > > > > We need a name for this grouping, otherwise we are not going to be to > > specify interoperable behaviour. One is tempted to say "license" ... > > I am also tempted. Key and policy bundle does not drop from the tongue quite > as easily. > > > > > Unless, that is, we require that the initData identify all the keys needed. > > i.e. the SD streams contains the key ids for both SD and HD streams etc. > > It seems implicit to me that the initData refers to the keys needed for the > content it is included in. But I don't see the need for it to require keys > for other content.
It's always been my assumption that the initData refers only to the keys needed for the individual file in which it is embedded (anything else complicates packaging). But for adaptive streaming you need the keys for all the bitrates of the same content and you want to get those in one server round trip.
(In reply to Mark Watson from comment #24) > (In reply to Joe Steele from comment #23) > > (In reply to Mark Watson from comment #22) > > > (In reply to Joe Steele from comment #21) > > > > (In reply to Mark Watson from comment #20) > > > > > (In reply to Joe Steele from comment #19) > > > > > I think this is a distinction without a difference. Whether you think of it > > > > > as "retrieving a previously persisted session" or "creating a new session > > > > > and loading in some persisted state" doesn't matter if the end result, a > > > > > session containing the keys identified by the initData, is the same. > > > > > > > > But they are not the same. In the example of stream1 and stream2 that I > > > > gave, passing stream2's initData would only cause the keys for stream2 to be > > > > loaded. Not the keys for stream1, even though those keys were delivered in > > > > the same session. The same is true if stream1's initData was used to load a > > > > session later. This would not be useful in combination with key release, but > > > > would be very useful on its own. > > > > > > In this case, there is a concept of key grouping that doesn't (yet) have a > > > name. > > > > > > Let's assume the initData for stream1 contains key id X1 and the initData > > > for stream2 contains key id X2. To play stream1, the server knows that keys > > > X1 and Y1 and needed. To play stream2, the server knows that keys X2 and Y2 > > > are needed. > > > > > > So, what the server has returned are keys X1, Y1, X2 and Y2. > > > > > > To satisfy your usecase, the client also needs to learn that these are > > > grouped: X1 with Y1 and X2 with Y2. So, then, later if an initData > > > indicating X1 is provided we retrieve from store both X1 and Y1, as the > > > server would have provided. > > > > > > Likewise, if an initData indicating Y1 is provided we retrieve from store > > > both X1 and Y1. > > Sorry, there was a typo there, if the initData indicates X2 then we should > find X2 and Y2. 
> > > > > Because the initData indicates what keys are needed to "satisfy" the > > request, the client can just use the initData for the content it is trying > > to play without knowing anything about previous sessions. > > Not quite. More concretely, let's say stream1 is episode1 and stream2 is > episode2. Each episode has a SD version with key X(1|2) and an HD stream > with key Y(1|2). > > So there are four files, four keys and four initData. The intention is that > given any one of the initData, you can get the keys for both the HD and SD > streams. Additionally, the server helpfully chooses to provide the keys for > both episodes. > > To be able to look up the right thing from a single initData, the client > needs to know that X1 goes with Y1 and X2 foes with Y2. I am not clear why the initData would not reference both keys? In a case where separate keys are used for separate bitrates, I expect both keys will be referenced in the initData and both would be found via the local lookup. Otherwise I would expect that different bit rates would have separate initData. > > > This will not > > negatively affect interop because the client can always try "loading" a > > session using the initData rather than creating a new session and forcing > > request/response network traffic if persistent keys may be in use. > > > > The CDM can succeed in loading the session with keys that match the initData > > it was passed, or it can fail to load implying that not enough keys were > > available to satisfy the initData requirements. In that case, the client > > just uses a newly created session instead. > > > > > > > > We need a name for this grouping, otherwise we are not going to be to > > > specify interoperable behaviour. One is tempted to say "license" ... > > > > I am also tempted. Key and policy bundle does not drop from the tongue quite > > as easily. > > > > > > > > Unless, that is, we require that the initData identify all the keys needed. > > > i.e. 
the SD streams contains the key ids for both SD and HD streams etc. > > > > It seems implicit to me that the initData refers to the keys needed for the > > content it is included in. But I don't see the need for it to require keys > > for other content. > > It's always been my assumption that the initData refers only to the keys > needed for the individual file in which it is embedded (anything else > complicates packaging). But for adaptive streaming you need the keys for all > the bitrates of the same content and you want to get those in one server > round trip. That is an optimization we should allow for, but I don't see why this needs to be explicit in the spec. I don't believe everyone manages adaptive content keys in this manner. If other retailers want to comment on this, it would be useful.
> > > > It's always been my assumption that the initData refers only to the keys > > needed for the individual file in which it is embedded (anything else > > complicates packaging). But for adaptive streaming you need the keys for all > > the bitrates of the same content and you want to get those in one server > > round trip. > > That is an optimization we should allow for, but I don't see why this needs > to be explicit in the spec. I don't believe everyone manages adaptive > content keys in this manner. If other retailers want to comment on this, it > would be useful.

Ok, so it's good that we've identified a place where we had different assumptions.

As a general rule, it's important that as much of the file packaging process as possible can be done independently for each of the bitrates. Individual bitrates are re-encoded, re-packaged, added, removed etc. at different times, so if all the files need information about all the other associated files this is a lot of complexity.

It needs to be handled in the specification if we need to define what loading by initData really means: which keys are loaded? If it is only the keys referred to in the initData, the above is not supported. If it is all keys that were provided last time this initData was used, then this breaks your multi-episode use-case.
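The two candidate meanings of "loading by initData" can be contrasted with a small sketch. This reflects one reading of the two options only; LicenseResponse, loadNamedKeys and loadLastDelivered are hypothetical names, and initData is reduced to an opaque string plus the key IDs it names.

```typescript
// Hypothetical record of one license response.
interface LicenseResponse {
  initData: string;          // opaque identifier of the initData used in the request
  deliveredKeyIds: string[]; // every key the server chose to return
}

// Option 1: load only the keys the initData itself names, if they are stored.
function loadNamedKeys(responses: LicenseResponse[], initDataKeyIds: string[]): string[] {
  return initDataKeyIds.filter((id) =>
    responses.some((r) => r.deliveredKeyIds.indexOf(id) !== -1)
  );
}

// Option 2: load everything delivered the last time this exact initData was used.
function loadLastDelivered(responses: LicenseResponse[], initData: string): string[] {
  for (let i = responses.length - 1; i >= 0; i--) {
    if (responses[i].initData === initData) return responses[i].deliveredKeyIds.slice();
  }
  return [];
}
```

Suppose a single response, requested with the episode 1 SD initData, delivered X1, Y1, X2 and Y2. Under option 1, the episode 2 SD initData (which names only X2) loads X2 but not the HD key Y2. Under option 2, the episode 2 SD initData loads nothing, because that initData has never been used in a request.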
Jerry Smith and Mark Watson have agreed that this bug is obsolete and thus it can be closed. /paulc HME WG Chair