Bug 17673 - Define Initialization Data for implementations that choose to support the ISO Base Media File Format
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: Encrypted Media Extensions
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: FPWD
Assignee: Jerry Smith
QA Contact: HTML WG Bugzilla archive list
Depends on: 24951
Blocks: 17682 24027
Reported: 2012-07-02 23:39 UTC by David Dorwin
Modified: 2014-08-05 21:15 UTC
Description David Dorwin 2012-07-02 23:39:41 UTC
v0.1 of the proposal says "Initialization Data is... container-specific data"  [1]. To help ensure implementations that choose to support a given format are interoperable and provide examples for the reader, we should define Initialization Data along with its use and behavior for common formats. This is similar to how the Media Source proposal has provided guidance for containers [2].

This bug tracks this task for ISO/IEC 14496-12, the ISO Base Media File Format (ISO BMFF) [3].

Below is a non-exhaustive list of items to address:
* Files should use ISO/IEC 23001-7, Common Encryption (CENC) [4].
* What are the contents of initData? Likely one of the following:
  - All Protection System Specific Header (PSSH) boxes in the file
  - A single Protection System Specific Header (PSSH) box
    * Note: If a key system has not been specified, the user agent won't know which to select.
  - Key ID(s)
* Does the first initData result in a license containing all the keys that might be necessary for the entire file/stream?
* If there are multiple keys used, should a needkey event be fired for each?
* What are the contents of the needkey event when an encrypted block is encountered and no key is available [5]? Is it the same initData or key-specific information?
* How does Clear Key work?
  - Must there be a PSSH for Clear Key?
  - How are keys and key IDs correlated?


[1] http://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#initialization-data
[2] http://dvcs.w3.org/hg/html-media/raw-file/tip/media-source/media-source.html#byte-stream-formats
[3] http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=51533
[4] http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=60397
[5] http://dvcs.w3.org/hg/html-media/raw-file/tip/encrypted-media/encrypted-media.html#algorithms-enrypted-block
Comment 1 Yang Sun 2012-07-03 08:59:19 UTC
So we need to define a format specification for every media format, h.264?
Comment 2 David Dorwin 2012-07-03 17:09:43 UTC
(In reply to comment #1)
> So we need to define a format specification for every media format, h.264?

H.264 is a codec, which is commonly used in an ISO BMFF/MP4 container. Thus, this bug should cover its use.

> Should we define a format specification for every container?
> Or we can define a common requirement for all format?

These bugs are related to clearly defining how EME will make use of existing container formats, which already have specifications for providing encryption information. We can - and should - provide basic guidelines for use of other/future formats, but we should also have specifics to help with interoperability and to serve as concrete examples.
Comment 3 Mark Watson 2012-07-24 20:27:42 UTC
I propose the following:

"The format of the initData for the ISO Base Media File Format is dependent on the Scheme Type indicated in the Protection Scheme Information Box.

When the Scheme Type indicates ISO Common Encryption (ISO/IEC23001-7), "cenc", the initData shall consist of one or more Protection System Specific Header ("pssh") boxes."

Regarding multiple keys. I think this is up to the CDM and its server-side peer: if the license comes back without all the keys and the Media Element actually encounters samples encrypted with a key not in the license then this would cause a new needKey event, with the same initData again.

Regarding ClearKey: to have this work with CENC indeed requires us to define a SystemID and PSSH contents for ClearKey. This should contain the key ids of the required keys. We may wish to say that for the ClearKey/CENC combination the key request message contains the key ids of the requested keys and the key message contains the keys. If the key id and key size are known (for example from the PSSH), then these can both simply be the concatenation of the ids or keys.
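
As an illustration of the concatenation idea above, here is a minimal sketch. It assumes 16-byte key ids (the KID size used by Common Encryption); the function names are hypothetical and this is not a defined Clear Key message format.

```python
KID_SIZE = 16  # assumption: 16-byte key ids, as used by Common Encryption


def pack_key_ids(key_ids):
    """Concatenate fixed-size key ids into a single message body."""
    for kid in key_ids:
        if len(kid) != KID_SIZE:
            raise ValueError("each key id must be %d bytes" % KID_SIZE)
    return b"".join(key_ids)


def unpack_key_ids(message):
    """Split a concatenated message back into individual key ids."""
    if len(message) % KID_SIZE != 0:
        raise ValueError("message length is not a multiple of the key id size")
    return [message[i:i + KID_SIZE] for i in range(0, len(message), KID_SIZE)]
```

Because every id has the same known size, no length prefixes or separators are needed; the same split would work for a concatenation of 16-byte keys.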
Comment 4 johnsim 2012-10-02 14:22:55 UTC
HERE ARE THE PROPOSED CONTAINER GUIDELINES FOR ISOBMFF 
------
7.2 ISOBMFF Media using the “cenc” protection scheme
This section defines the stream format and initialization data for implementations that choose to support Common Encryption (“cenc”) protected ISO Base Media File Format (ISOBMFF) content.

7.2.1 Stream format
Under the “cenc” protection scheme, ISOBMFF content is encrypted at the sample level with AES-128 CTR encryption, according to ISO/IEC 23001-7:2012, “Information technology - MPEG system technologies - Part 7: Common encryption in ISO base media file format files”. This protection method enables multiple Key Systems to decrypt the same media content.

7.2.2 Detecting Encryption
Protection scheme signaling conforms with ISO/IEC 14496-12. When protection has been applied, the stream type will be transformed to ‘encv’ for video or ‘enca’ for audio, with a protection scheme information box (‘sinf’) added to the sample entry in the sample description box (‘stsd’). The protection scheme information box (‘sinf’) will contain a scheme type box (‘schm’) with a scheme_type field set to a value of “cenc” (Common Encryption).

With ISOBMFF common encryption, the “encrypted block” is a sample. Determining whether a sample is encrypted depends on the corresponding track encryption box (‘tenc’) and the sample group associated with the sample.
The default encryption state of a sample is defined by the IsEncrypted flag in the associated track encryption box (‘tenc’). This default state may be modified by the IsEncrypted flag in the Sample Group Description Box (‘sgpd’), pointed to by an index in the Sample to Group Box (‘sbgp’).
  
For complete information see ISO/IEC 23001-7:2012. 
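
The default-and-override rule for a sample's encryption state can be sketched as follows. This is only an illustration under assumptions: `EncryptionParams` and `effective_params` are hypothetical names, and the box parsing that would produce these values from 'tenc', 'sgpd' and 'sbgp' is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EncryptionParams:
    is_encrypted: bool  # models the IsEncrypted flag
    kid: bytes          # models the 16-byte Key ID


def effective_params(track_default: EncryptionParams,
                     group_entry: Optional[EncryptionParams] = None) -> EncryptionParams:
    """Return the parameters that apply to one sample.

    track_default models the values from the track encryption box ('tenc').
    group_entry models the 'sgpd' entry selected through 'sbgp', or None
    when the sample belongs to no sample group.
    """
    return group_entry if group_entry is not None else track_default
```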

7.2.3 Initialization Data and Events
Under the “cenc” protection scheme, ISOBMFF media content can be decrypted by multiple Key Systems. The file may contain protection system specific header ('pssh') boxes for one or more Key Systems, each containing a SystemID identifying the Key System. These multiple ‘pssh’ boxes are concatenated into a single Initialization Data and returned with the needkey event.

In a file encrypted with Common Encryption, each key is identified by a Key ID and each encrypted sample is associated with the Key ID of the key needed to decrypt it. This association is signaled either through the specification of a default Key ID in the Track Encryption Box ('tenc') or by assigning the sample to a Sample Group, the definition of which specifies a Key ID. Common Encryption files may contain a mixture of encrypted and unencrypted samples. Playback of unencrypted samples should not be impeded by unavailability of the keys needed to decrypt other samples in the same file or track.

Note that if there is already an active Key System CDM and the key storage for that Key System already contains the key associated with the Key ID, there is no need to generate a needkey event.
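
To make the concatenated-'pssh' form concrete, the following sketch walks such an Initialization Data and lists the SystemID of each box. It assumes 32-bit box sizes and the 'pssh' layout of a full box header (version and flags) followed by a 16-byte SystemID; `iter_boxes`, `pssh_system_ids` and `make_pssh` are hypothetical helper names.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each box in a run of concatenated boxes."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported size or truncated box")
        yield box_type, data[offset + 8:offset + size]
        offset += size


def pssh_system_ids(init_data):
    """Return the 16-byte SystemID of every 'pssh' box in init_data."""
    ids = []
    for box_type, payload in iter_boxes(init_data):
        if box_type == b"pssh":
            if len(payload) < 20:
                raise ValueError("truncated 'pssh' box")
            ids.append(payload[4:20])  # skip version (1 byte) and flags (3 bytes)
    return ids


def make_pssh(system_id, data=b""):
    """Build a version 0 'pssh' box, for demonstration only."""
    body = b"\x00\x00\x00\x00" + system_id + struct.pack(">I", len(data)) + data
    return struct.pack(">I4s", 8 + len(body), b"pssh") + body
```

An application that knows which Key System it will use could pick the matching box out of the list by comparing SystemID values.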
Comment 5 Adrian Bateman [MSFT] 2012-12-11 16:48:35 UTC
Need to add this text while johnsim discusses more with dsinger.
Comment 6 David Dorwin 2012-12-11 16:55:09 UTC
At TPAC 2012, David Singer requested more generic text to cover other encryption types. David and John will talk offline to come up with common text.
Comment 7 Adrian Bateman [MSFT] 2012-12-18 06:35:14 UTC
Added John's initial proposed text while he follows up with David.
Changeset http://dvcs.w3.org/hg/html-media/rev/5d8755a5d2f1
Comment 8 johnsim 2013-01-15 00:44:02 UTC
As per discussion with David Singer on more general treatment of ISOBMFF. 

Instead of ISOBMFF InitData being the concatenation of 'pssh' boxes, it will be the Protection Scheme Information Box ('sinf'). The 'sinf' includes the scheme type box ('schm'), giving the scheme_type, and the scheme information box ('schi').

If this scheme_type is "cenc", the 'schi' box will also contain the track encryption box ('tenc'), containing the defaults for IsEncrypted, IV_size and KID for that track.

Also, if the scheme_type is "cenc", one or more 'pssh' boxes will be concatenated after the 'sinf' box.



-------------------------------------------------------------------
7.2 ISO Base Media File Format
This section defines the stream format and initialization data for ISO Base Media File Format (ISOBMFF) content.

7.2.1 Stream format
The stream format is dependent upon the protection scheme, as defined in the scheme type box ('schm'). 

For example, under the common encryption ("cenc") protection scheme, ISOBMFF content is encrypted at the sample level with AES-128 CTR encryption, according to ISO/IEC 23001-7:2012, "Information technology - MPEG system technologies - Part 7: Common encryption in ISO base media file format files". This protection method enables multiple Key Systems to decrypt the same media content.

7.2.2 Detecting Encryption
Protection scheme signaling conforms with ISO/IEC 14496-12. When protection has been applied, the stream type will be transformed to ‘encv’ for video or ‘enca’ for audio, with a protection scheme information box (‘sinf’) added to the sample entry in the sample description box (‘stsd’). The protection scheme information box (‘sinf’) will contain a scheme type box (‘schm’) with a scheme_type field set to the 4CC value of the protection scheme.

Additionally, if the protection scheme is common encryption ("cenc"), the "encrypted block" is a sample. Determining whether a sample is encrypted depends on the corresponding track encryption box (‘tenc’) and the sample group associated with the sample. In this case the default encryption state of a sample is defined by the IsEncrypted flag in the associated track encryption box (‘tenc’). This default state may be modified by the IsEncrypted flag in the Sample Group Description Box (‘sgpd’), pointed to by an index in the Sample to Group Box (‘sbgp’). 

For complete information about "cenc" see ISO/IEC 23001-7:2012.
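
Reading the scheme_type out of a 'sinf' box could look like the sketch below. It assumes 32-bit box sizes and that 'schm' is a full box (one version byte and three flag bytes before the 4CC); `iter_boxes` and `scheme_type` are hypothetical helper names.

```python
import struct


def iter_boxes(data):
    """Yield (type, payload) for each box in a run of concatenated boxes."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", data, offset)
        if size < 8 or offset + size > len(data):
            raise ValueError("unsupported size or truncated box")
        yield box_type, data[offset + 8:offset + size]
        offset += size


def scheme_type(sinf_box):
    """Return the 4CC from the 'schm' child of a 'sinf' box, or None."""
    top = list(iter_boxes(sinf_box))
    if len(top) != 1 or top[0][0] != b"sinf":
        raise ValueError("expected exactly one 'sinf' box")
    for box_type, payload in iter_boxes(top[0][1]):
        if box_type == b"schm":
            if len(payload) < 8:
                raise ValueError("truncated 'schm' box")
            return payload[4:8]  # skip version (1 byte) and flags (3 bytes)
    return None
```

A caller would treat a result of `b"cenc"` as the signal that the common encryption rules in this section apply.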

7.2.3 Initialization Data and Events

For ISOBMFF the InitData begins with the protection scheme information box ('sinf'). The 'sinf' includes the scheme type box ('schm'), giving the scheme_type, and the scheme information box ('schi').

If this scheme_type is common encryption ("cenc"), the scheme information box will also contain the track encryption box ('tenc'), giving the defaults for IsEncrypted, IV_size and KID for that track. In addition, one or more protection system specific header boxes ('pssh') will be concatenated after the 'sinf' box.

In a file encrypted with common encryption, each key is identified by a Key ID and each encrypted sample is associated with the Key ID of the key needed to decrypt it. This association is signaled either through the specification of a default Key ID in the track encryption box ('tenc') or by assigning the sample to a Sample Group, the definition of which specifies a Key ID. Common encryption files may contain a mixture of encrypted and unencrypted samples. Playback of unencrypted samples should not be impeded by unavailability of the keys needed to decrypt other samples in the same file or track.

Note that if there is already an active Key System CDM and the key storage for that Key System already contains the key associated with the Key ID, there is no need to generate a needkey event.
Comment 9 Adrian Bateman [MSFT] 2013-01-15 00:51:48 UTC
Updated in changeset https://dvcs.w3.org/hg/html-media/rev/122330340bf1.
Comment 10 Steven Robertson 2013-01-16 02:43:59 UTC
Straw-man comment, but one that could be relevant for DASH (or other BMFF) live streams:

The final text does not seem to address the possibility of multiple 'sinf' boxes, which may arise through either or both of multiple tracks and multiple SampleDescription elements for a single track.

It also seems to preclude key rotation in e.g. a long-running DASH live stream by means of including a 'pssh' and corresponding updated SampleGroupDescription element in one or more 'moof' elements. If text addressing this is incorporated in the EME spec, we should consider updating references of '14496-12' to '14496-12:2012', since 'sgpd'-under-'moof' was new in the :2012 version.
Comment 11 Steven Robertson 2013-01-27 21:58:39 UTC
As discussed, here is the proposed modified text for 7.2.3 Initialization Data and Events. This text is intended to replace the first two paragraphs of that section, and leave the remaining paragraphs intact.

For simplicity, it includes an explicit callout against 64-bit or "until-EOF" size fields in initData boxes; the former is exceedingly unlikely, and the latter would lead to undefined behavior. This change basically just removes ambiguity, as does the language specifying how to go from 'moov' to 'initData'.

A more important change is the one indicating that the value of 'block initData' in 5.1 is allowed to be defined by the CDM once it has been selected, but the format must be the same as for normal start-of-stream 'initData'. This enables CDMs to take advantage of the full, maddening open-endedness of CENC should they so choose, without requiring any extra effort on the part of authors, since all 'needkey' events will be handled in the same way. It also solves a few particular issues around adaptive streaming and key rotation in live streams that we've encountered.

--- Proposed text to replace the first two paragraphs of 7.2.3: ---

For ISO BMFF, the Initialization Data is in the form of concatenated BMFF boxes. Such boxes (and any descendants) _must_ be correct according to the specification which defines their structure in the stream, including the box header defined in ISO/IEC 14496-12 section 4.2. Additionally, boxes with a 'size' value of 0 or 1 are forbidden.

The initData _shall_ begin with zero or more protection scheme information boxes ('sinf'). Each 'sinf' includes zero or one scheme type box ('schm'), which defines 'scheme_type', and zero or one scheme information box ('schi').

If a 'sinf' box contains a 'schm' box with a 'scheme_type' value of "cenc", the 'sinf' box will also contain one track encryption box ('tenc'), which provides default values for 'IsEncrypted', 'IV_size', and 'KID' for that track.

Following the 'sinf' boxes, the initData _shall_ contain zero or more protection system specific header boxes ('pssh'). If any 'sinf' box indicates a 'scheme_type' of "cenc", at least one 'pssh' box _should_ be present.

For the purposes of step 3 of section 5.2 of this specification, when no CDM is selected, the User Agent _shall_ derive 'initData' from the 'moov' box of an ISO BMFF stream as follows: for each 'moov' encountered, it _shall_ include, in its entirety, every 'sinf' box that is an (indirect) descendant of a 'moov', followed by every 'pssh' box that is a direct child of that 'moov'.

For the purposes of step 4 of section 5.1 of this specification, when no CDM is selected, the User Agent _shall not_ consider a block to have 'block initData'; that value will remain null.

When a CDM has been selected, it _may_ provide its own mechanism for providing a value for 'initData' in 5.2 step 3 and for 'block initData' in 5.1 step 4. Such initData _must_ conform to the format described above ('sinf' boxes, then 'pssh' boxes).

---
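
The format rules in the proposed text (concatenated boxes, no 'size' of 0 or 1, 'sinf' boxes first and 'pssh' boxes after them) can be checked with a sketch like the one below. The function name `check_init_data` is hypothetical, and the check covers only the top-level sequence, not the contents of each box.

```python
import struct


def check_init_data(init_data):
    """Validate the top-level box sequence and return the list of box types.

    Raises ValueError if a box uses size 0 or 1, is truncated, is neither
    'sinf' nor 'pssh', or if a 'sinf' box follows a 'pssh' box.
    """
    kinds = []
    seen_pssh = False
    offset = 0
    while offset < len(init_data):
        if len(init_data) - offset < 8:
            raise ValueError("truncated box header")
        size, box_type = struct.unpack_from(">I4s", init_data, offset)
        if size in (0, 1):
            raise ValueError("'size' values 0 and 1 are forbidden")
        if size < 8 or offset + size > len(init_data):
            raise ValueError("invalid or truncated box")
        if box_type == b"sinf":
            if seen_pssh:
                raise ValueError("'sinf' box found after a 'pssh' box")
        elif box_type == b"pssh":
            seen_pssh = True
        else:
            raise ValueError("unexpected box type %r" % box_type)
        kinds.append(box_type.decode("ascii"))
        offset += size
    return kinds
```

Empty input passes, since the text allows zero 'sinf' boxes and zero 'pssh' boxes.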
Comment 12 David Dorwin 2013-04-30 01:37:39 UTC
Per https://www.w3.org/Bugs/Public/show_bug.cgi?id=19788#c13, section 7.2 also needs to be updated to specify when to call the “First Time a Key Reference is Encountered” algorithm for ISO BMFF.
Comment 13 johnsim 2013-05-08 17:58:31 UTC
I am fine with these changes
Comment 14 David Dorwin 2013-07-25 18:47:06 UTC
In no particular order, below are some of the issues with the current proposal that we need to accept and/or address.

---------

Is there a subset of CENC possibilities that would make more sense and significantly reduce complexity at all levels while still satisfying most/all actual use cases?

If we did limit the possibilities (related to initData), applications could still synthesize initData to do what they need to do.

---------

It appears that trying to support any scheme type (not just CENC) has led us to include the entire |sinf| box along with |pssh| boxes. This may be adding complexity and overhead that are otherwise unnecessary.

I don't understand the details, but it sounds like using |sinf| might also be incompatible with SampleGroups. This means that supporting other scheme types limits our ability to support features of specific scheme types.

Can we just say treat "cenc" files this way, treat "foob" files that way, etc. and not rely on an overlap/common root? Since the container indicates the scheme type, the UA should be able to provide the correct behavior for a given stream. I guess the problem would be with createSession() where the scheme type is not specified in |initData| or |type|. What other ISO BMFF scheme types exist, and is it likely that a UA or key system would support multiple of them? (Can multiple scheme types be supported in the same file? I'm not sure that would be compatible with any of the current proposals.)

If we returned to just reporting |pssh| boxes, we could always fire a needkey event with the |pssh| and rely on the de-duplication of sessions currently under discussion to reduce network traffic. Depending on the stream layout, that might still result in a lot of events and sessions. (Maybe this is expected to be unlikely in practice.)

Alternatively, we could focus on the |sinf| box(es) and not send |pssh| boxes. This would be much closer to the behavior for WebM, which is to only send generic key IDs in needkey events. If services were able to only use the information in the |sinf| box, content providers could support any key system without needing to add a new PSSH to all existing files. This could also be done in the |sinf| + |pssh| case, but including |pssh| boxes adds a lot of (potentially) unnecessary complexity. This solution is still incompatible with SampleGroups, though.

Going even further, we could pare down needkey/initData to just emit key IDs (the first time they are encountered). This would be super simple, consistent with WebM, and support any key system regardless of whether the stream has a PSSH.
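
The "emit key IDs the first time they are encountered" behavior amounts to simple de-duplication. A minimal sketch, with the hypothetical name `first_encounters` and key IDs modeled as byte strings:

```python
def first_encounters(key_ids):
    """Yield each key ID only the first time it appears in the stream order."""
    seen = set()
    for kid in key_ids:
        if kid not in seen:
            seen.add(kid)
            yield kid
```

Each yielded value would correspond to one needkey event; repeated key IDs later in the stream would produce none.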

---------

The |tenc| box is sent (as part of the |sinf| box) and described, but most of that information is irrelevant to the license server.

The only part of the |tenc| that might be useful is the default |KID|, which could be used for protection systems (aka key systems) that are not explicitly supported in the media stream. Perhaps most importantly, this includes Clear Key. (If we don't come up with a generic solution, we'll need to define a system ID and PSSH format for Clear Key and (in some scenarios) it would need to be added to media streams.)

Even so, this is only one of the key IDs that might be needed for the stream. Thus, a server would need to know all the KIDs associated with this default KID. This is already necessary for PSSH formats that only include one key ID, which I think might be common.

---------

The current proposal says, "for each 'moov' encountered, it _shall_ include, in its entirety, every 'sinf' box that is an (indirect) descendant of a 'moov', followed by every 'pssh' box that is a direct child of that 'moov'." This sounds like a lot of parsing in the UA (or communication of parsing that is already occurring to the EME implementation) and potentially a lot of |pssh| boxes. (Can people with experience with a diverse set of CENC files comment on the likelihood of this?)

Also, the |pssh| boxes can appear separate from the |sinf| box(es) throughout the file (in |moof| boxes?). Does this mean we might have to search the entire file? Is this really what we want?

Note that the current proposal excludes by omission |pssh| boxes in |moof| boxes. Such |pssh| boxes are only useful with SampleGroups, which aren't supported as mentioned above, anyway. If we simplify to either only emitting key IDs or only emitting all |pssh| boxes throughout the file, then |moof| becomes relevant again.

---------

Specific comments on the text in the current proposal (not current spec draft):
 * The "when no CDM is selected" text is not necessary (in the object-oriented version of the API). CDM selection is irrelevant to needkey events.
 * The "block initData" text can be removed since that no longer exists in the algorithms.
 * Steve's text split definition of format and definition of derivation, but (as discussed with him) that doesn't seem to be necessary anymore, and removing this separation would make things simpler.
 * I wonder if there is anything generic we can say that would cover Initialization Data for all BMFF protection schemes. Regardless, I think we should put the CENC portions in section(s) that are explicitly CENC-only or otherwise clearly identify CENC-specific text.
Comment 15 Henri Sivonen 2013-08-13 13:20:05 UTC
(In reply to comment #14)
> It appears that trying to support any scheme type (not just CENC) has led us
> to include the entire |sinf| box along with |pssh| boxes. This may be adding
> complexity and overhead that are otherwise unnecessary.

What's the use case for non-CENC schemes? Since the interoperability story of EME hinges on CENC, supporting non-CENC schemes looks like an avenue for vendor lock-in.

> If we returned to just reporting |pssh| boxes, we could always fire a
> needkey event with the |pssh|

What's the use case for exposing even pssh? It, too, looks like an avenue for defeating the interop story of EME by introducing vendor lock-in.
Comment 16 Mark Watson 2013-08-14 17:02:14 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > It appears that trying to support any scheme type (not just CENC) has led us
> > to include the entire |sinf| box along with |pssh| boxes. This may be adding
> > complexity and overhead that are otherwise unnecessary.
> 
> What's the use case for non-CENC schemes? Since the interoperability story
> of EME hinges on CENC, supporting non-CENC schemes looks like an avenue for
> vendor lock-in.

I would not be opposed to including only CENC schemes.

> 
> > If we returned to just reporting |pssh| boxes, we could always fire a
> > needkey event with the |pssh|
> 
> What's the use case for exposing even pssh? It, too, looks like an avenue
> for defeating the interop story of EME by introducing vendor lock-in.

Common encryption arose originally from work by Microsoft in DECE in which context several DRM vendors agreed to a common encryption format. This was then brought, with minor adjustments, to MPEG to define CENC, but there was no discussion of a common header format.

A common header format would clearly be desirable, but its absence is not the same problem that a lack of a common encryption format would be. The headers are small and it is not difficult to include headers for multiple keysystems. Since the creator of the file needs to support the keysystems on their servers anyway, it's not a huge deal to add the additional headers to the files [It is something of a pain, though, since new files must be created and distributed.].

As I understand it, the various DRMs include different things in their headers, not just the key id in different format. I'm told their choices of what to include are tightly tied to their design and specifically their use or not of certain IPR.

So, the prospects of a common header format look weak.
Comment 17 Henri Sivonen 2013-08-15 08:48:32 UTC
(In reply to comment #16)
> As I understand it, the various DRMs include different things in their
> headers, not just the key id in different format. I'm told their choices of
> what to include are tightly tied to their design and specifically their use
> or not of certain IPR.

EME+PlayReady and EME+Widevine are already out there. What do they *need* to include in pssh? Looking at the YouTube EME demo files that have pssh boxes for PlayReady and another key system (maybe Widevine, but I'm not sure), all the data in the pssh boxes seems redundant to me: http://lists.w3.org/Archives/Public/public-html-media/2013May/0025.html

(Redundant assuming that the CENC layer knows about key ids, CENC fixes the key length & the algorithm and the Web app knows its own license server URLs.)
Comment 18 johnsim 2013-08-15 20:33:17 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > As I understand it, the various DRMs include different things in their
> > headers, not just the key id in different format. I'm told their choices of
> > what to include are tightly tied to their design and specifically their use
> > or not of certain IPR.
> 
> EME+PlayReady and EME+Widevine are already out there. What do they *need* to
> include in pssh? Looking at the YouTube EME demo files that have pssh boxes
> for PlayReady and another key system (maybe Widevine, but I'm not sure), all
> the data in the pssh boxes seems redundant to me:
> http://lists.w3.org/Archives/Public/public-html-media/2013May/0025.html
> 
> (Redundant assuming that the CENC layer knows about key ids, CENC fixes the
> key length & the algorithm and the Web app knows its own license server
> URLs.)

The contents of the protection system specific header ('pssh') box are by design protection system specific. It is true that some of the parameters are the same between DRMs, but when we envisioned DRM-interoperability, we determined that standardizing metadata specific to key acquisition was a lot of negotiations for very little gain, so we focused on DRM-interoperable encoding. I would argue that the adoption of CENC shows we struck the right balance.

In the case of PlayReady, the information you ask about is publicly documented at this website: http://www.microsoft.com/playready/documents, In the document "PlayReady Header Object". 

This is the same information which can be conveyed in a DASH MPD PlayReady ContentProtection element. Arguably it is better to encode this information in the MPD so that the media itself is service provider agnostic - and therefore more interoperable. 

The rules for using this ContentProtection element with PlayReady can be found at the same website, in the document "DASH Content Protection using Microsoft PlayReady".
Comment 19 Henri Sivonen 2013-08-16 08:58:58 UTC
(In reply to comment #18)
> In the case of PlayReady, the information you ask about is publicly
> documented at this website: http://www.microsoft.com/playready/documents, In
> the document "PlayReady Header Object". 

Sorry for being dense, but I still don't see how any of that information is needed in the file itself when 1) the key length and the encryption algorithm are fixed by CENC (128 bits and AES-CTR) and 2) the JS program knows the URL of its license server.
Comment 20 Steven Robertson 2013-08-17 04:01:19 UTC
A device manufacturer can put a newer browser on a device with an older implementation of a protection system, and that older protection system might require something beyond key ID and URL. For instance, older versions of PlayReady require a checksum of the content key to be present in the initialization data in order to generate a challenge.

It may be argued that every protection scheme going forward should restrict itself to not requiring any kind of extra initialization data. Such an argument, if followed, would exclude many devices, leaving only new, high-end devices with platform support for newer CDM implementations. Due to long CE manufacturing timelines, devices with older CDMs will continue to be produced for some time, and so such a position could leave many users out in the cold.
Comment 21 David Dorwin 2013-08-20 01:11:08 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > If we returned to just reporting |pssh| boxes, we could always fire a
> > needkey event with the |pssh|
> 
> What's the use case for exposing even pssh? It, too, looks like an avenue
> for defeating the interop story of EME by introducing vendor lock-in.

If we decide to surface PSSHes, we could improve interop while also solving the Clear Key issue by defining a default/EME system ID and PSSH format. We could recommend including it in all files, and Clear Key as well as any other key system could use it. Even key systems that support their own system ID could support it, especially if that ID is not present.

This would allow legacy devices to be supported while moving towards a more interoperable future.

(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > It appears that trying to support any scheme type (not just CENC) has led us
> > > to include the entire |sinf| box along with |pssh| boxes. This may be adding
> > > complexity and overhead that are otherwise unnecessary.
> > 
> > What's the use case for non-CENC schemes? Since the interoperability story
> > of EME hinges on CENC, supporting non-CENC schemes looks like an avenue for
> > vendor lock-in.
> 
> I would not be opposed to including only CENC schemes.

I think we should at least start here and try to solve the CENC case, which is already very complex.
Comment 22 David Dorwin 2013-08-20 20:23:56 UTC
In the telecon today, there was support for focusing on CENC but a desire to not exclude the possibility of other schemes.

One possibility is to state that the first part of Initialization Data for BMFF must be a box. Assuming we use PSSHes for CENC, CENC Initialization Data could be identified from the first 8 bytes: |size=4|"pssh"|. As long as no other protection scheme uses 'pssh', there would be no ambiguity.
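
The 8-byte check described above is small enough to sketch directly. The function name is hypothetical; the only assumption is the standard box header of a 4-byte size followed by a 4-byte type.

```python
def looks_like_cenc_init_data(init_data):
    """Return True if the data starts with a box header whose type is 'pssh'."""
    return len(init_data) >= 8 and init_data[4:8] == b"pssh"
```

This only identifies the format; it does not validate the box size or the rest of the data.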

It was also noted that defining a common CENC system ID and PSSH format is a separate issue. We can file a separate bug for that once we've decided how to resolve this bug.
Comment 23 Mark Watson 2013-08-20 20:35:58 UTC
If in our "ISO BMFF with Common Encryption" we simply say the initData is a sequence of PSSH boxes then that is sufficient, IMO.

The problem of how to add other schemes doesn't need to be addressed until sections for other schemes are proposed. We know that there are several ways another scheme could be added in a backwards compatible fashion:
- put some different boxes in the initData
- put something which is not a box in the initData, provided bytes 4-7 don't read p.s.s.h.
- use PSSH boxes but define separate system ids
Comment 24 Henri Sivonen 2013-08-23 11:22:01 UTC
(In reply to comment #20)
> A device manufacturer can put a newer browser on a device with an older
> implementation of a protection system, and that older protection system
> might require something beyond key ID and URL. For instance, older versions
> of PlayReady require a checksum of the content key to be present in the
> initialization data in order to generate a challenge.

Are there legacy DRM implementations out there whose key exchange messaging maps to EME other than PlayReady? That is, are we talking about something that's truly a broader issue or something that's PlayReady-specific in practice? For example, on the mailing list, it was established that existing versions of Marlin don't match the EME architecture, so it seems that  in the Marlin case, old implementations out there couldn't be leveraged anyway.

Even in the PlayReady case, it seems that in order for the scenario you mentioned to actually matter all the following would need to be true:
 * The old DRM system has API surface for emitting key requests and accepting responses in discrete messages that map to the EME API.
 * Someone who cares can and will actually deliver a newer browser for the device.
 * The DRM part of the device doesn't support enough software renewability to remove the pssh requirement from the DRM initialization as a software update delivered together with the newer browser.
 * The old DRM system has suitable output capabilities for integrating into the composition pipeline of the newer browser.

Especially considering how CE vendors tend to treat their products as ship-and-forget software-wise, it seems rather incredible that a situation where all the above points are true could arise at all, let alone be common.

Are there concrete cases where the above points could be true and where someone is actually seriously contemplating shipping a newer browser for a device that has an older DRM system that can be made to work with EME on the device, but the DRM system can't be renewed to waive the pssh requirement?
Comment 25 Henri Sivonen 2013-08-23 11:32:58 UTC
(In reply to comment #22)
> In the telecon today, there was support for focusing on CENC but a desire to
> not exclude the possibility of other schemes.

What's the rationale for not excluding other schemes? What other multi-vendor schemes are there? Why shouldn't a standard exclude single-vendor schemes? After all, multi-vendor interoperability is the whole point of standards.
Comment 26 Steven Robertson 2013-08-24 01:47:38 UTC
(In reply to comment #24)
> (In reply to comment #20)
> > A device manufacturer can put a newer browser on a device with an older
> > implementation of a protection system, and that older protection system
> > might require something beyond key ID and URL. For instance, older versions
> > of PlayReady require a checksum of the content key to be present in the
> > initialization data in order to generate a challenge.
> 
> Are there legacy DRM implementations out there whose key exchange messaging
> maps to EME other than PlayReady?

I'm uncertain, I've only had detailed interaction with Widevine and PlayReady CDMs.

> Even in the PlayReady case, it seems that in order for the scenario you
> mentioned to actually matter all the following would need to be true:
>  * The old DRM system has API surface for emitting key requests and
> accepting responses in discrete messages that map to the EME API.
>  * Someone who cares can and will actually deliver a newer browser for the
> device.
>  * The DRM part of the device doesn't support enough software renewability
> to remove the pssh requirement from the DRM initialization as a software
> update delivered together with the newer browser.
>  * The old DRM system has suitable output capabilities for integrating
> into the composition pipeline of the newer browser.

Apologies for being vague, but check the shelves of your local electronics store and you will find many devices which implement (older drafts of) the EME spec in the manner you describe here.
Comment 27 Joe Steele 2013-09-09 21:31:09 UTC
(In reply to Mark Watson from comment #23)
> If in our "ISO BMFF with Common Encryption" we simply say the initData is a
> sequence of PSSH boxes then that is sufficient, IMO.

I agree with Mark. This should be sufficient for any CENC compatible CDM and additional variations can wait until a future version of EME if ever.
Comment 28 David Singer 2013-09-10 22:04:22 UTC
The problem is that encryption support in the ISO base media file format was built around using the SINF as *the* parameterization place for the encryption.  That's what OMA DRM uses, ISMACryp, Fairplay, and so on.

pssh boxes are outside the sinf.  they are associated with the transport, which is a layering problem (if you re-fragment for different fragment sizes, and so on).  Far from cenc being the normal design and common case, it was cenc that stepped outside the design guidelines.

That's why we say 'in addition to the sinf, you need the pssh boxes for the cenc special case'.

pssh isn't even specified in the iso base media file format spec. it's only in the one encryption spec (common encryption) that has this problem.
Comment 29 Henri Sivonen 2013-09-11 11:10:17 UTC
(In reply to David Singer from comment #28)
> That's what OMA DRM uses, ISMACryp,

Are these relevant to EME? Is anyone realistically going to deploy these with EME?

> Fairplay, and so on.

Is Fairplay going to become a DRM scheme that service operators other than Apple can deploy on the server side? In other words, will it be usable in a way that makes it useful to have aspects related to its behavior be part of a standard?

> pssh isn't even specified in the iso base media file format spec. it's only
> in the one encryption spec (common encryption) that has this problem.

That's an interesting way of putting it. Are there any known EME implementations other than (presumably) Apple's that aren't CENC-based in the MP4 case?
Comment 30 johnsim 2013-10-01 14:45:07 UTC
It has been proposed that section 7.2 - as originally proposed (https://www.w3.org/Bugs/Public/show_bug.cgi?id=17673#c4) - only specify initialization data for ISOBMFF Media using the “cenc” protection scheme.

This does not address the issue first brought up by David Singer at TPAC 2012 - that other protection schemes are possible and should be supported. The proposal at that time was to go beyond concatenated 'pssh' boxes to include the protection scheme information box 'sinf', which includes the scheme type box 'schm' and the scheme information box 'schi'. Doing this, however, has led to numerous problems, as documented in this bug - including complexity.

Two proposals: 1) retain the more general formulation in the specification to allow flexibility for any future protection schemes. 2) as suggested by David Dorwin, https://www.w3.org/Bugs/Public/show_bug.cgi?id=17673#c14 - avoid trying to address the general case and include specific instances for each protection scheme proposed.

I prefer the former, since I am aware of no protection schemes other than 'cenc' that are being proposed to be used with EME, at least in our conference calls. Providing a general solution as well as the "cenc" specific solution would address all known issues.

In either case, for "cenc", there would be the simple, concatenated 'pssh' box formulation of InitData, and a separate ISOBMFF section for the general case.

I would recommend removing any reference to the 'tenc' box in the 'cenc' section, since as David points out - this information is irrelevant to the license server - and by design the contents of the 'pssh' box are intended to convey all information needed for key acquisition.

------------

This does, I believe, address the key rotation issues raised by Steven Robertson https://www.w3.org/Bugs/Public/show_bug.cgi?id=17673#c10, because key rotation in practice for "cenc" is conveyed through 'pssh' boxes embedded in the track fragments.
Comment 31 David Singer 2013-11-12 01:13:51 UTC
I think that EME can and should be independent of the encryption system used.  Part 12 is very clear: the parameters for the encryption system are all in the scheme information atom (box).  That's the minimum.

Common encryption also added the pssh box, and I have a harder time with that; it's specific to that basis (as 'common' as it is, it's not universal, and it's tied to the transport - fragment - structure).
Comment 32 David Singer 2013-11-14 06:51:29 UTC
Suggestion: given a movie atom, give me the sinf boxes from each track (possibly optimized to say if it has a non-empty scheme information box), plus the pssh boxes (if any); for a movie fragment, give me pssh boxes (if any).
Comment 33 Mark Watson 2013-11-14 07:14:21 UTC
Returning to some of the problems raised above related to sample groups and related structures in the Common Encryption case, the assumption with Common Encryption is that the pssh boxes contain sufficient information for the CDM to identify and obtain the keys needed to decrypt the file.

The other structures (Track Encryption Box, Sample Group Descriptions etc.), whether in moov or moof boxes are used during stream parsing to identify the key needed for each sample, but are not needed in the InitData to drive the acquisition of the keys.

A solution which ensures that all the pssh boxes encountered in the file make it into initData structures - as they are encountered - should work for common encryption.
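The approach described above, where 'pssh' boxes are surfaced as they are encountered, could be sketched roughly as follows in Python. The sketch assumes the whole file is in memory and that 'pssh' boxes sit directly inside 'moov' and 'moof' boxes; all function names are hypothetical, and a real media stack would do this incrementally during parsing.

```python
import struct
from typing import Iterator, List, Tuple


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a box with a 32-bit size header (helper for trying the sketch)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


def iter_boxes(data: bytes, start: int, end: int) -> Iterator[Tuple[bytes, int, int, int]]:
    """Yield (type, box_start, payload_start, box_end) for boxes in data[start:end]."""
    offset = start
    while offset + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:  # a 64-bit largesize follows the type
            if offset + 16 > end:
                return
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:  # box runs to the end of the enclosing span
            size = end - offset
        if size < header or offset + size > end:
            return  # malformed or truncated: stop rather than guess
        yield box_type, offset, offset + header, offset + size
        offset += size


def collect_init_data(file_bytes: bytes) -> List[bytes]:
    """Return one initData blob per 'moov'/'moof' that contains 'pssh' boxes."""
    blobs = []
    for box_type, _, payload, box_end in iter_boxes(file_bytes, 0, len(file_bytes)):
        if box_type not in (b"moov", b"moof"):
            continue
        # Concatenate the complete 'pssh' child boxes, headers included.
        psshs = [
            file_bytes[s:e]
            for t, s, _, e in iter_boxes(file_bytes, payload, box_end)
            if t == b"pssh"
        ]
        if psshs:
            blobs.append(b"".join(psshs))
    return blobs
```

Each returned blob corresponds to one point in the stream where initData would be surfaced, so a 'pssh' carried in a later 'moof' (for example for key rotation) produces its own entry.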
Comment 34 David Dorwin 2013-12-07 01:22:08 UTC
(In reply to David Dorwin from comment #21)
> If we decide to surface PSSHes, we could improve interop while also solving
> the Clear Key issue by defining a default/EME system ID and PSSH format. We
> could recommend including it in all files, and Clear Key as well as any
> other key system could use it. Even key systems that support their own
> system ID could support it, especially if that ID is not present.
> 
> This would allow legacy devices to be supported while moving towards a more
> interoperable future.

I filed bug 24027 to track a solution for Clear Key and/or a generic CENC solution any key system can choose to use.
Comment 35 Adrian Bateman [MSFT] 2013-12-12 22:33:37 UTC
This was discussed at TPAC in the F2F:
http://www.w3.org/2013/11/14-html-wg-minutes.html#item09

The conclusion of the discussion was that the initdata for ISO BMFF should contain the SINF box if it has non-empty scheme information and any PSSH boxes.

Adrian will work with David Singer to determine the precise language that needs to be added to the spec.
Comment 36 David Dorwin 2014-01-18 01:48:53 UTC
As noted in bug 24323, we also need to remove the last paragraph of this section: "Note that if there is already an active Key System CDM and the key storage for that Key System already contains the key associated with the Key ID, there is no need to generate a needkey event." We may want to add an explicit sentence about when needkey will be fired similar to the proposed text for WebM in that bug.
Comment 37 David Dorwin 2014-01-18 07:28:41 UTC
In http://lists.w3.org/Archives/Public/public-html-media/2014Jan/0037.html, I suggested that we might use a simple identifier that is *not* a MIME type (e.g. "video/mp4") to identify the format of the initData parameter and attribute.

*If* we used an identifier that is *not* a MIME type, we could specifically identify the protection scheme in ISO BMFF (i.e. "cenc") and avoid the ambiguity or generic solution that has been discussed in this bug.

If we did this, we may need some way to detect what format(s) are supported since isTypeSupported() currently takes a MIME type. However, we already have this issue with ISO BMFF - applications can query whether "video/mp4" is supported, but that does not indicate whether "cenc" or some other protection scheme is supported.

Even if we move forward with the generic solution of including the "sinf" box, I don't think applications will be able to detect whether a user agent supports a specific protection scheme.
Comment 38 Adrian Bateman [MSFT] 2014-01-28 06:38:07 UTC
(In reply to David Dorwin from comment #37)
> In http://lists.w3.org/Archives/Public/public-html-media/2014Jan/0037.html,
> I suggested that we might use a simple identifier that is *not* a MIME type
> (e.g. "video/mp4") to identify the format of the initData parameter and
> attribute.
> 
> *If* we used an identifier that is *not* a MIME type, we could specifically
> identify the protection scheme in ISO BMFF (i.e. "cenc") and avoid the
> ambiguity or generic solution that has been discussed in this bug.

We've discussed this at Microsoft and we agree that it would be beneficial to identify the format of the initdata with a string other than the simple MIME type and doing so would simplify both the spec and implementations by allowing for "CENC" to be supported without having to handle generic BMFF data even if the engine will never support anything but CENC content.

> If we did this, we may need some way to detect what format(s) are supported
> since isTypeSupported() currently takes a MIME type. However, we already
> have this issue with ISO BMFF - applications can query whether "video/mp4"
> is supported, but that does not indicate whether "cenc" or some other
> protection scheme is supported.

Separately we agree that isTypeSupported doesn't sufficiently indicate potential playback support of content. There are a number of arguments to this determination including key system, container, encryption scheme (e.g. CENC), encoding format, and probably others. For example, a media engine may support a particular key system and it may support a particular container but it might not have a binding for the key system to a particular encoding.
Comment 39 Mark Watson 2014-01-28 16:47:30 UTC
This comment compares the idea of comprehensive additional MIME type parameters to the alternative of a registry for "short names" for the container/codec/keysystem/protection scheme.

Additional MIME type parameters would give us something like

video/mp4;codec=avc1.xx.xx;keysystems=<ks1>,<ks2>;protection=cenc

This could be used with isTypeSupported, or even with canPlayType, to determine support for a specific container/codec/keysystem/protection combination and in the needkey to indicate the combinations supported by both file and browser (this implies the browser doing the PSSH <uuid> -> EME keysystem name mapping).

Alternatively, we could have - as pal suggested - a registry which defines short names for the above discussions. e.g.

mp4-pr-avc-cenc = mp4 file, PlayReady, AVC, CENC
webm-wv-vp8 = WebM files, Widevine, VP8

etc.

In this case also the UA needs to parse the PSSH boxes to identify which keysystems are supported by the file.

The proposals are in some sense equivalent, except that with the former UAs can introduce support for a new combination and this will 'just work' with existing apps. For the latter the apps would need to be updated to support the new registered string.
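For readers who want to see what consuming the first proposal might look like, here is a small Python sketch that splits such a string into its base type and parameters. The `keysystems` and `protection` parameters are only the ones floated in this comment and are not standardized; the function name is made up, and the parser deliberately ignores quoted-string edge cases that a full MIME parser would handle.

```python
from typing import Dict, List, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split 'type/subtype;name=a,b;...' into (type, {name: [values]})."""
    parts = [p.strip() for p in value.split(";")]
    base = parts[0].lower()
    params: Dict[str, List[str]] = {}
    for part in parts[1:]:
        if not part:
            continue
        name, _, raw = part.partition("=")
        # Comma-separated values become a list; surrounding quotes are dropped.
        values = [v.strip() for v in raw.strip().strip('"').split(",")]
        params[name.strip().lower()] = [v for v in values if v]
    return base, params
```

An application could then compare the parsed `keysystems` list against the key systems it has license servers for, which is the kind of matching the comment describes.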
Comment 40 Pierre Lemieux 2014-01-28 16:53:52 UTC
(In reply to Mark Watson from comment #39)
> This comment compares the idea of comprehensive additional MIME type
> parameters to the alternative of a registry for "short names" for the
> container/codec/keysystem/protection scheme.
> 

Can you indicate whether the proposals were for initData format identification, isTypeSupported, or both? Aren't these two things somewhat independent: the first is the format of the initData, while the second is the format of the entire container + video + audio + content protection?

> 
> Alternatively, we could have - as pal suggested - a registry which defines
> short names for the above discussions. e.g.
> 
> mp4-pr-avc-cenc = mp4 file, PlayReady, AVC, CENC
> webm-wv-vp8 = WebM files, Widevine, VP8
> 
> etc.

Note that some format specifications will have already defined such a string, e.g. video/vnd.dece.mp4 for CFF.
Comment 41 David Dorwin 2014-02-18 16:22:23 UTC
There are two questions (as noted by Adrian in the telecon today):
1) Do we have consensus to move to a different type identifier? (See http://lists.w3.org/Archives/Public/public-html-media/2014Jan/0037.html)
2) If so, what identifier should we use (and how does that affect the Initialization Data format in this bug)?
Comment 42 David Singer 2014-02-18 18:23:48 UTC
Two comments in one.

a) we get requests to expose in the MIME type string all sorts of information from the file.  Does it have closed-captions in SEI format in the video stream?  Are fragments limited to a maximum size?  Are certain boxes present or absent?  Does it use 64-bit variants of some features?  Does it require support for some encryption formats?  The 'codecs' string was introduced to answer questions about the codecs specifically, but can't answer all these others.  However, it is possible to write specifications that set these requirements, and they can define a brand that goes in the ftyp/styp boxes, to indicate that the file claims compatibility, and those compatible brands can be listed in the MIME type (the 'profiles' parameter). We could keep loading specific features into the mime string for years, but I think it may be better to write these 'portmanteau' specs that roll up a whole bunch of stuff, and then label the file as conforming to those.  We could even define such a profile and brand in the MSE spec itself.

b) but, as noted in email today, the spec. rather bizarrely expects (requires?) that any ftyp/styp in the initialization segment be ignored, a statement that is currently causing some implementations to strip it, which defeats the purpose of these compatibility claims. 

<http://www.w3.org/mid/45F3418A-EBB1-461E-8580-4B9354E40BEC@apple.com>
Comment 43 Adrian Bateman [MSFT] 2014-02-21 21:09:45 UTC
(In reply to David Singer from comment #42)
> b) but, as noted in email today, the spec. rather bizarrely expects
> (requires?) that any ftyp/styp in the initialization segment be ignored, a
> statement that is currently causing some implementations to strip it, which
> defeats the purpose of these compatibility claims. 
> 
> <http://www.w3.org/mid/45F3418A-EBB1-461E-8580-4B9354E40BEC@apple.com>

This part of the comment seems related only to MSE and not EME?
Comment 44 David Singer 2014-02-21 23:36:34 UTC
answering Adrian -- I can't tell;  is the referenced document 'under' MSE, EME, or both?  It doesn't say

By the way, what is its status and trajectory?  The status section doesn't say (it's on rec track, a note, or what?  and where on the track?)
Comment 45 Adrian Bateman [MSFT] 2014-03-04 15:48:45 UTC
(In reply to David Singer from comment #44)
> answering Adrian -- I can't tell;  is the referenced document 'under' MSE,
> EME, or both?  It doesn't say
> 
> By the way, what is its status and trajectory?  The status section doesn't
> say (it's on rec track, a note, or what?  and where on the track?)

The document says "This specification defines a Media Source Extensions byte stream format specification based on the ISO Base Media File Format." It is provided as information linked from the registry (http://www.w3.org/2013/12/byte-stream-format-registry/).

This bug is about how to provide initData for EME with ISO BMFF files.
Comment 46 David Dorwin 2014-03-05 23:47:53 UTC
The new identifier for Initialization Data format is now tracked in bug 24951.
Comment 47 David Dorwin 2014-04-03 01:10:17 UTC
Bug 24951 is fixed, replacing contentType with initDataType. We should be able to move forward with specifying the behavior for "cenc". If there is interest, someone could also specify a generic behavior for "bmff" that involves the 'sinf' box, etc.
Comment 48 David Dorwin 2014-04-09 16:48:10 UTC
The text under discussion has been moved to updating https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/cenc-format.html. Although it is supposed to be specific to CENC, it currently has the generic text for BMFF that was developed earlier in this bug.

For starters, should we just revert the CL that changed the original text from CENC-specific to generic BMFF?
https://dvcs.w3.org/hg/html-media/rev/122330340bf1
Comment 49 Shinya Maruyama 2014-04-22 02:45:57 UTC
The last paragraph 'Note that if there is already an active Key System CDM and the key storage for that Key System already contains the key associated with the Key ID, there is no need to generate a needkey event.' is not correct anymore.
At least it should be removed.
Comment 50 David Dorwin 2014-05-02 21:05:56 UTC
(In reply to David Dorwin from comment #48)
> The text under discussion has been moved to updating
> https://dvcs.w3.org/hg/html-media/raw-file/default/encrypted-media/cenc-
> format.html. Although it is supposed to be specific to CENC, it currently
> has the generic text for BMFF that was developed earlier in this bug.
> 
> For starters, should we just revert the CL that changed the original text
> from CENC-specific to generic BMFF?
> https://dvcs.w3.org/hg/html-media/rev/122330340bf1

https://dvcs.w3.org/hg/html-media/rev/25b50dae8e58 restores the text to the CENC-specific text before that change. Please review!

(In reply to Shinya Maruyama from comment #49)
> The last paragraph 'Note that if there is already an active Key System CDM
> and the key storage for that Key System already contains the key associated
> with the Key ID, there is no need to generate a needkey event.' is not
> correct anymore.
> At least it should be removed.

I also removed this paragraph.
Comment 51 David Dorwin 2014-05-06 15:24:02 UTC
I added minor updates based on feedback in https://dvcs.w3.org/hg/html-media/rev/702cf19177e0
Comment 52 David Dorwin 2014-05-06 21:27:15 UTC
Minor text corrections in https://dvcs.w3.org/hg/html-media/rev/bf4b3f85e6ba
Comment 53 David Dorwin 2014-05-07 20:15:08 UTC
Jerry and Mark took an action to review the text.
Comment 54 David Dorwin 2014-05-13 00:27:00 UTC
Addressing a few earlier comments:

(In reply to Steven Robertson from comment #10)
...
> It also seems to preclude key rotation in e.g. a long-running DASH live
> stream by means of including a 'pssh' and corresponding updated
> SampleGroupDescription element in one or more 'moof' elements. If text
> addressing this is incorporated in the EME spec, we should consider updating
> references of '14496-12' to '14496-12:2012', since 'sgpd'-under-'moof' was
> new in the :2012 version.

Added ":2012" in https://dvcs.w3.org/hg/html-media/rev/67e151641669.

(In reply to David Dorwin from comment #12)
> Per https://www.w3.org/Bugs/Public/show_bug.cgi?id=19788#c13, section 7.2
> also needs to be updated to specify when to call the “First Time a Key
> Reference is Encountered” algorithm for ISO BMFF.

Done in https://dvcs.w3.org/hg/html-media/rev/67e151641669.

(In reply to David Dorwin from comment #36)
> As noted in bug 24323, we also need to remove the last paragraph of this
> section: "Note that if there is already an active Key System CDM and the key
> storage for that Key System already contains the key associated with the Key
> ID, there is no need to generate a needkey event." We may want to add an
> explicit sentence about when needkey will be fired similar to the proposed
> text for WebM in that bug.

This text (also mentioned in comment #49) was removed in https://dvcs.w3.org/hg/html-media/rev/25b50dae8e58
Comment 55 Mark Watson 2014-05-13 14:38:19 UTC
Looks good to me.
Comment 56 Jerry Smith 2014-05-13 15:29:21 UTC
I plan to also add two comments to close this bug:

-  Allow ‘pssh’ boxes in movie fragments
-  All keys matching KIDS for a specific Segment should be included in the ‘pssh’ for that Segment

We discussed whether to encourage in-band leaf licenses using pssh contents, and agreed this would best be handled in a separate bug.
Comment 57 Jerry Smith 2014-05-28 17:53:38 UTC
Changes made in https://dvcs.w3.org/hg/html-media/rev/5e4403d01787.

Allowed 'pssh' boxes in movie fragments and added CENC 2 XML form for 'pssh' in manifests.
Comment 58 David Dorwin 2014-05-28 21:05:03 UTC
(In reply to Jerry Smith from comment #57)
> Changes made in https://dvcs.w3.org/hg/html-media/rev/5e4403d01787.

Comments on the changes:

1) "Common Encryption files may contain one or more protection system specific header ('pssh') boxes, each for a different SystemID."
  - This makes it sound like a PSSH box can only appear once per SystemID.
  - I believe you mean that they are unique within a series at each location where a PSSH is necessary.
  - For interoperability and consistency, we may want to note that a PSSH box for all SystemIDs should be present wherever there are any PSSH boxes.

2) The second paragraph of section 3 seems to specifically address embedded keys. Is that right? Is this the *only* use of PSSH boxes in a 'moof'? Embedded keys are not currently covered by the main EME spec text.
2a) "Each ‘moof’/’pssh’ must protect the contained keys with a SystemID specific method." How can a 'moof' protect the keys?
2b) The last sentence explains why one might use sample groups. I believe this is the only such text in the spec and should be removed. (This would be valuable in a "Using EME with Common Encryption" primer, but I don't think it belongs in the spec/registry.)

3) The third paragraph seems orthogonal to EME and like a DASH-specific usage of a generic EME capability (see also below). I think it should be removed. (Is it really CENC that specifies this or is it DASH?)

4) "The application may parse out 'pssh' boxes which do not correspond to the selected key system, and may not use the InitData from the file at all and instead use initData from another source (e.g. the XML element described above)."
  - The first part is fine, but I'm not sure we need to specify it.
  - The second part is (or should be) covered by the spec and applies to all init data types. I don't think we need to specify this here.
  - As above, I think we should remove the XML reference.

5) Is there a formal reference for CENC 2nd edition? Is it just "ISO/IEC DIS 23001-7 2nd Edition" for now?

6) nit: s/boxes(s)/box(es)/
Comment 59 Joe Steele 2014-06-10 09:47:48 UTC
(In reply to David Dorwin from comment #58)
> (In reply to Jerry Smith from comment #57)
> > Changes made in https://dvcs.w3.org/hg/html-media/rev/5e4403d01787.
> 
> Comments on the changes:
> 
> 1) "Common Encryption files may contain one or more protection system
> specific header ('pssh') boxes, each for a different SystemID."
>   - This makes it sound like a PSSH box can only appear once per SystemID.
>   - I believe you mean that they are unique within a series at each location
> where a PSSH is necessary.
>   - For interoperability and consistency, we may want to note that a PSSH
> box for all SystemIDs should be present wherever there are any PSSH boxes.
> 
> 2) The second paragraph of section 3 seems to specifically address embedded
> keys. Is that right? Is this the *only* use of PSSH boxes in a 'moof'?
> Embedded keys are not currently covered by the main EME spec text.

This is correct, but neither are they excluded by the current text. Given that there is a desire to support existing DRMs, this should be supported as it is required for some DRMs. See my examples below.

> 2a) "Each ‘moof’/’pssh’ must protect the contained keys with a SystemID
> specific method." How can a 'moof' protect the keys?

The DRM referenced via the SystemID is responsible for protecting the keys. The content keys may be protected in various ways. For example - in the Access case the content keys may be encrypted using a domain key, a "root" license key, a license server key or a common player key. The mechanism used is chosen at publishing time based on scalability and robustness requirements.

> 2b) The last sentence explains why one might use sample groups. I believe
> this is the only such text in the spec and should be removed. (This would be
> valuable in a "Using EME with Common Encryption" primer, but I don't think
> it belongs in the spec/registry.)
> 
> 3) The third paragraph seems orthogonal to EME and like a DASH-specific
> usage of a generic EME capability (see also below). I think it should be
> removed. (Is it really CENC that specifies this or is it DASH?)
> 
> 4) "The application may parse out 'pssh' boxes which do not correspond to
> the selected key system, and may not use the InitData from the file at all
> and instead use initData from another source (e.g. the XML element described
> above)."
>   - The first part is fine, but I'm not sure we need to specify it.
>   - The second part is (or should be) covered by the spec and applies to all
> init data types. I don't think we need to specify this here.
>   - As above, I think we should remove the XML reference.
> 
> 5) Is there a formal reference for CENC 2nd edition? Is it just "ISO/IEC DIS
> 23001-7 2nd Edition" for now?

I would be interested in this as well. And if not -- do we need a different way to refer to proposed amendments to the spec? Or should we assume all of those are out of scope?
> 
> 6) nit: s/boxes(s)/box(es)/
Comment 61 Jerry Smith 2014-06-16 19:26:50 UTC
Regarding multiple SystemIDs in ‘pssh’ box:  That is not specified and there is no discussion of that possibility in CENC 2nd Edition.  There may be multiple ‘pssh’ boxes in a file, but each has only one SystemID to indicate one DRM system.
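To show the one-SystemID-per-box layout in practice, here is a hedged Python sketch that decodes a single 'pssh' box. It follows the field order as I understand it from Common Encryption (version and flags, a 16-byte SystemID, an optional KID list when version is greater than 0, then a length-prefixed data field); the class and function names are invented for this example.

```python
import struct
import uuid
from dataclasses import dataclass
from typing import List


@dataclass
class Pssh:
    version: int
    system_id: uuid.UUID
    key_ids: List[bytes]
    data: bytes


def parse_pssh(box: bytes) -> Pssh:
    """Decode one complete 'pssh' box that uses a 32-bit size header."""
    if len(box) < 32:
        raise ValueError("too short for a 'pssh' box")
    size, box_type = struct.unpack_from(">I4s", box, 0)
    if box_type != b"pssh" or size != len(box):
        raise ValueError("not a single complete 'pssh' box")
    version = box[8]  # FullBox header: 1 byte version, 3 bytes flags
    system_id = uuid.UUID(bytes=box[12:28])  # exactly one SystemID per box
    offset = 28
    key_ids: List[bytes] = []
    if version > 0:
        # Version 1 and later list the key IDs before the data field.
        (kid_count,) = struct.unpack_from(">I", box, offset)
        offset += 4
        if offset + 16 * kid_count + 4 > len(box):
            raise ValueError("KID list overruns the box")
        for _ in range(kid_count):
            key_ids.append(box[offset:offset + 16])
            offset += 16
    (data_size,) = struct.unpack_from(">I", box, offset)
    offset += 4
    if offset + data_size > len(box):
        raise ValueError("data overruns the box")
    return Pssh(version, system_id, key_ids, box[offset:offset + data_size])
```

A file that targets several DRM systems would therefore carry several such boxes, one per SystemID, rather than one box naming several systems.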

I added the other instances for 'pssh' (in the moof and in XML) as further reference information on where initData of that type may be encountered.  I do believe the moof example may be for key rotation, but it's not specifically an endorsement of that approach.

I don't fully understand the concern about 4):
> 4) "The application may parse out 'pssh' boxes which do not correspond to
> the selected key system, and may not use the InitData from the file at all
> and instead use initData from another source (e.g. the XML element described
> above).
The second segment is intended also as reference information to facilitate use of CENC content.

CENC 2nd Edition is still under ballot in ISO/MPEG, so not publicly available.  The Draft International Standard (DIS) ballot has been completed and comments resolved in a Study document.  Final and formal resolution of comments will happen at the next MPEG meeting in July, and any edits to the Study document sent for Final Draft International Standard (FDIS) two month ISO approval ballot.
Comment 62 Jerry Smith 2014-07-01 01:02:13 UTC
I felt that at least two of David's comments should be directly addressed via an edit:

-  Paragraph 1 should include each location where a pssh is necessary.
-  Paragraph 4 should not include the section about running the Initialization Data Encountered algorithm, since that is not specific to the CENC format.

I've made changes to address these in https://dvcs.w3.org/hg/html-media/rev/ad22aff73407.
Comment 63 David Dorwin 2014-07-01 19:43:16 UTC
(In reply to Jerry Smith from comment #62)
> I felt that at least two of David's comments should be directly addressed
> via an edit:
> 
> -  Paragraph 1 should include each location where a pssh is necessary.
> -  Paragraph 4 should not include the section about running the
> Initialization Data Encountered algorithm, since that is not specific to the
> CENC format.
> 
> I've made changes to address these in
> https://dvcs.w3.org/hg/html-media/rev/ad22aff73407.

This change actually removed paragraph 5 ("Each time one or more 'pssh' boxes are encountered..."), which I think is appropriate and is consistent with similar text in the WebM format page.

My request was to remove the following sentence from paragraph 4 because it describes application behavior and other things that are out of scope for this text:
"The application may parse out 'pssh' boxes which do not correspond to the selected key system, and may not use the InitData from the file at all and instead use initData from another source (e.g. the XML element described above)."


Returning to my comment #58, I do not think paragraphs 2, 3, and most of 4 should be part of this specification, especially as normative text. They repeat portions of a referenced spec and/or provide usage information without enhancing interoperability.

In addition, keys stored in Initialization Data (paragraph 2) are not currently supported by the main EME spec - that would be a separate feature request bug. If such a feature is added, we would probably add such text to a new section that specifically defines expectations, how it is handled, etc.
Comment 64 Jerry Smith 2014-07-03 20:29:41 UTC
The complete paragraph 5 text removed was:

"Each time one or more 'pssh' boxes are encountered, the Initialization Data Encountered algorithm shall be invoked with initDataType = "cenc" and initData = the 'pssh' box(es). Multiple 'pssh' boxes must be provided together if and only if they appear directly next to each other in the file."

Most of this is redundant with the first paragraph in EME under the Initialization Data Encountered section, but it does clarify the attribute strings and is certainly consistent with the EME spec. I will restore it.
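
To illustrate the "together if and only if they appear directly next to each other" rule in the paragraph quoted above, here is a rough sketch that groups runs of adjacent 'pssh' boxes into separate initData blobs. It assumes `buf` holds sibling boxes laid out back to back (for example the payload of one container box) and that every box uses a plain 32-bit size; the 64-bit `largesize` and "size 0 means to end of file" forms are not handled. The names `iter_boxes` and `init_data_runs` are made up for this example.

```python
import struct
from typing import Iterator, List, Tuple


def iter_boxes(buf: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, raw_bytes) for each sibling box in buf."""
    offset = 0
    while offset + 8 <= len(buf):
        size, box_type = struct.unpack_from(">I4s", buf, offset)
        if size < 8 or offset + size > len(buf):
            raise ValueError("unsupported or truncated box")
        yield box_type, buf[offset:offset + size]
        offset += size


def init_data_runs(buf: bytes) -> List[bytes]:
    """Return one initData blob per run of directly adjacent 'pssh' boxes."""
    runs: List[bytes] = []
    current = b""
    for box_type, raw in iter_boxes(buf):
        if box_type == b"pssh":
            current += raw            # adjacent boxes are provided together
        elif current:
            runs.append(current)      # any other box ends the current run
            current = b""
    if current:
        runs.append(current)
    return runs
```

Under this reading, each returned blob would correspond to one invocation of the Initialization Data Encountered algorithm with initDataType = "cenc".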

I believe paragraphs 2, 3 and 4 are relevant to the use of CENC content with EME, since they elaborate on CENC features that affect when init data may be encountered, and acknowledge that a compatible source with EME is in XML form.  Specific replies to David's comments:

>2) The second paragraph of section 3 seems to specifically address embedded keys. Is that right? Is this the *only* use of PSSH boxes in a 'moof'? Embedded keys are not currently covered by the main EME spec text.

Embedded keys are a likely example, but a protection system may use any type of authentication, authorization, and usage rules in the protection system specific information.  The intent is to not be restrictive.  The connection to Key ID (KID) is the only normative link between sample information and entitlement control.  

>2a) "Each ‘moof’/’pssh’ must protect the contained keys with a SystemID specific method." How can a 'moof' protect the keys?

The Protection System Specific data must use encryption or some other means to protect the media keys.

>2b) The last sentence explains why one might use sample groups. I believe this is the only such text in the spec and should be removed. (This would be valuable in a "Using EME with Common Encryption" primer, but I don't think it belongs in the spec/registry.)

It is important to clarify that media keys and KID can change over time (per sample), and KID signaling in CENC provides a standard index that can be used by protection system specific information to control entitlement by controlling access to the new KID/key.  

>3) The third paragraph seems orthogonal to EME and like a DASH-specific usage of a generic EME capability (see also below). I think it should be removed. (Is it really CENC that specifies this or is it DASH?)

It is specified in CENC 2nd Edition with the intent of being useful in any XML manifest, and could be applied to other text format tags.  The important distinction is that identical ‘pssh’ box data (including the box header) can be stored and delivered via a manifest or other application specific method, and processed the same way as a ‘pssh’ box stored in a file header.
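
As a small illustration of that last point, the sketch below recovers 'pssh' box bytes from manifest text so they can be handled exactly like a box read from a file header. It assumes the manifest carries the complete box (header included) as base64 text; the helper name `pssh_from_manifest_text` is hypothetical.

```python
import base64


def pssh_from_manifest_text(text: str) -> bytes:
    """Decode base64 manifest text into raw 'pssh' box bytes (illustrative)."""
    # Strip whitespace that XML formatting may have introduced.
    raw = base64.b64decode("".join(text.split()), validate=True)
    if len(raw) < 8 or raw[4:8] != b"pssh":
        raise ValueError("decoded data is not a 'pssh' box")
    if int.from_bytes(raw[:4], "big") != len(raw):
        raise ValueError("box size field does not match decoded length")
    return raw
```

The returned bytes are identical to what would be read from the file, so an application could pass them on as "cenc" initData without any further conversion.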
Comment 65 Jerry Smith 2014-07-03 20:43:14 UTC
Paragraph restored in https://dvcs.w3.org/hg/html-media/rev/da89ce88814a
Comment 66 David Dorwin 2014-07-07 21:01:24 UTC
(In reply to Jerry Smith from comment #64)
Thank you for restoring paragraph 5.

It seems that the rest of the information being discussed should a) already be known to someone working with CENC and/or b) be in a primer, not in normative spec text. I don't think it further enhances interoperability of implementations or applications supporting CENC (at least in its current form). Also, some of it (e.g. init data coming from other sources, KID as an "index") applies to all types of initialization data and is not specific to CENC.
Comment 67 Jerry Smith 2014-07-25 22:41:32 UTC
I agree that much of the content we are debating should be non-normative. I'd like to retain it for reference though, since I believe it assists in understanding use of CENC initData by apps. I propose moving this content into a non-normative section, which will retain it for reference and improve focus on the remaining normative information.
Comment 68 Jerry Smith 2014-07-28 23:50:40 UTC
Moved non-normative content into Notes sections.

https://dvcs.w3.org/hg/html-media/rev/47e373be9efd
Comment 69 David Dorwin 2014-08-05 20:14:22 UTC
Thanks. I'm still concerned about the "contained keys" text, but it's non-normative and we can clean that up later if necessary.

I'll remove the Issue block and close this bug.
Comment 70 David Dorwin 2014-08-05 21:15:33 UTC
https://dvcs.w3.org/hg/html-media/rev/a62d8cd65da3 removes the Issue box and contains minor editorial fixes.