26776 – Diagnosing and resolving CDM errors needs a numeric systemCode (deleted with MediaKeyError)

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED MOVED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	Encrypted Media Extensions
Version:	unspecified
Hardware:	PC Windows NT

Importance:	P2 normal
Target Milestone:	---
Assignee:	Jerry Smith
QA Contact:	HTML WG Bugzilla archive list

URL:
Whiteboard:	API_Compatibility, to_be_implemented
Keywords:

Depends on:
Blocks:

Reported:	2014-09-10 18:43 UTC by Jerry Smith
Modified:	2015-10-30 02:56 UTC
CC List:	12 users

See Also:

Attachments

Description Jerry Smith 2014-09-10 18:43:37 UTC

MediaKeyError was removed from EME to resolve Bug 25896 - "Why is EME creating new DOMException subclasses?"  This reduced error reporting to standard DOMException types with an optional string message, and removed the previous (and valuable) numeric systemCode.  We need to restore some way for numeric error codes to be collected.

Comment 1 Jerry Smith 2014-09-16 00:51:28 UTC

There has been some previous concern raised about leaking keySystem specific information into the EME APIs.  The information isn't requested for interop concerns, but to help identify root causes for playback issues.  DRM implementations would be difficult to diagnose if the CDMs in EME were completely opaque.  KeySystem specific diagnostics have been important historically and should be provided somehow in EME.

Comment 2 Anne 2014-09-16 07:37:15 UTC

a) Why is only Microsoft interested in this? b) Can this be something that just goes to the developer console?

I made a ton of comments against having proprietary exception codes in bug 25896.

Comment 3 Mark Watson 2014-09-16 15:08:45 UTC

(In reply to Anne from comment #2)
> a) Why is only Microsoft interested in this? b) Can this be something that
> just goes to the developer console?

Not just Microsoft. We think it is essential to have access to the system error code. I believe Adobe does also.

In practice rolling out EME, we initially see an unacceptable level of errors which we had not previously seen during testing. Often this is due to platform diversity. It's essential to have the low-level cause reported to our servers for detecting and debugging these.

> 
> I made a ton of comments against having proprietary exception codes in bug
> 25896.

The situation with EME is somewhat special because we are providing an API to what is likely a component with proprietary functionality. The merits of that have been extensively debated on the public-restricted-media list, but given that fact and the practical deployment experience, I think exposing a system code is justified here.

Having said that, it's not clear to me why we could not simply define a recommended convention for the message field in the EME case - for example that it begin with a numeric code followed by whitespace.

Comment 4 Joe Steele 2014-09-16 20:44:05 UTC

(In reply to Anne from comment #2)
> a) Why is only Microsoft interested in this? b) Can this be something that
> just goes to the developer console?

This is something Adobe is very interested in as well. As a practical matter, having exact error codes available to the application makes debugging problems much simpler and faster for developers who end up using our CDM.

(In reply to Mark Watson from comment #3)
> Having said that, it's not clear to me why we could not simply define a
> recommended convention for the message field in the EME case - for example
> that it begin with a numeric code followed by whitespace.

If we think this is important enough to have a convention, it would seem important enough to have a separate numeric field. I think we lost some of the context from bug 25896, specifically this comment [1]. I am just not sure where to put it. 

Again, as a practical matter, we can expose the numeric codes that way as a last resort. But it will require us to define a convention for those codes (e.g. base-10, unsigned, range 0 to 4 billion, whitespace termination) and will require additional parsing code for any client that wants to make use of it. And this will impose a requirement on CDMs that do not expose such a value today to conform to the convention.

[1] https://www.w3.org/Bugs/Public/show_bug.cgi?id=25896#c10
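
As an illustration of the parsing burden mentioned above, a client-side helper for such a message convention might look like the following TypeScript sketch. It assumes the convention details floated in this comment (base-10, unsigned, at most 2^32 - 1, terminated by whitespace or end of string); `parseSystemCode` and `MAX_SYSTEM_CODE` are hypothetical names, and no such convention is defined by EME.

```typescript
// Hypothetical helper: extracts a leading numeric system code from an
// error message. The convention itself is an assumption taken from the
// discussion, not from any specification.
const MAX_SYSTEM_CODE = 0xffffffff; // largest unsigned 32-bit value

function parseSystemCode(message: string): number | undefined {
  // Up to ten decimal digits at the start, followed by whitespace or the end.
  const match = /^(\d{1,10})(?:\s|$)/.exec(message);
  if (match === null) {
    return undefined; // message does not follow the convention
  }
  const code = Number(match[1]);
  // Ten digits can still exceed 32 bits, so check the range explicitly.
  return code <= MAX_SYSTEM_CODE ? code : undefined;
}
```

The result would only be forwarded to a logging endpoint; a message that does not follow the convention simply yields `undefined`.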

Comment 5 Mark Watson 2014-09-16 21:32:01 UTC

I do not see any obvious solution that provides an explicit numeric code except subclassing DOMException. Could Anne or David explain in more detail why that is bad?

Comment 6 Anne 2014-09-17 07:37:26 UTC

(In reply to Mark Watson from comment #5)
> I do not see any obvious solution that provides an explicit numeric code
> except subclassing DOMException. Could Anne or David explain in more detail
> why that is bad?

1) There's no consensus within the standards community as to whether DOMException in its current form is a good pattern (it does not follow native JavaScript exceptions very well).

2) DOMException is the only non-native JavaScript exception class we have.

3) There's no support for subclassing exception classes (as you can see in IDL, it uses "exception", not "interface").

Comment 7 Mark Watson 2014-09-23 15:06:32 UTC

OK, so this is to say that the direction is towards JavaScript exceptions, which cannot be subclassed. Makes sense.

Given this, it seems that exceptions (DOMException or JS ones) are not a good match for our requirements. Perhaps we should go back to using an object of our own?

Comment 8 Jan-Ivar Bruaroey [:jib] 2014-10-05 17:14:28 UTC

FWIW, mediaCapture has a similar need. See post [1]. TL;DR: Like your MediaKeyError.systemCode, we have MediaStreamError.constraintName (a string) [2]. Since mediaCapture is near last-call, I suggest we combine forces and ask for a (long or DOMString) secondaryArgument on DOMException at least as a stopgap solution (deferring arguments over future extensions of this type as needed). Thoughts?

[1] http://lists.w3.org/Archives/Public/public-media-capture/2014Oct/0054.html
[2] http://w3c.github.io/mediacapture-main/getusermedia.html#mediastreamerror

Comment 9 Jerry Smith 2014-10-31 15:18:18 UTC

Error options currently implemented in EME or being discussed are:

- DOMExceptions using standard error names for specific violations of method requirements

- Key Status events (https://www.w3.org/Bugs/Public/show_bug.cgi?id=26372) with standard types for error states encountered outside of executing a specific EME method.  I believe these types are still proposed:

  - "acquired"
  - "expired"
  - "notyetvalid"
  - "renewalfailed"
  - "playbacksexceeded"
  - "authorizationfailed"
  - "outputnotallowed"
  - "downscaling"
  - "released"

Of these two mechanisms, the key status message most closely aligns with the requested delivery of systemCode error details.  The primary goal for systemCode support is to allow CDM specific error states to be reported, tracked and resolved.  Standard classifications are very useful, but so is the ability to isolate CDM specific problems.  I’d like to see two specific changes to key status events to resolve this bug:

1. Include a classification for other CDM specific errors

2. Include an optional systemCode to provide more information on the CDM specific error

An alternative to this would be to revisit using DOMExceptions more broadly for CDM specific errors, and encode the numeric codes as strings in the optional message attribute.  We have concluded this could be workable, but I am not sure it is appropriate to fire DOMExceptions for CDM execution errors not associated with a specific API call.
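
The two key status changes requested above could take roughly the following shape, shown here as a TypeScript sketch. The status names are the ones listed in this comment; the `"other"` classification, the `KeyStatusReport` interface and the `describeKeyStatus` helper are hypothetical names chosen for illustration and are not part of any EME draft.

```typescript
// Status names as listed in the comment above, plus a hypothetical catch-all.
type ProposedKeyStatus =
  | "acquired"
  | "expired"
  | "notyetvalid"
  | "renewalfailed"
  | "playbacksexceeded"
  | "authorizationfailed"
  | "outputnotallowed"
  | "downscaling"
  | "released"
  | "other"; // hypothetical classification for CDM specific errors

interface KeyStatusReport {
  status: ProposedKeyStatus;
  systemCode?: number; // optional, opaque, key-system specific
}

// Formats a report for a diagnostics log. It only checks whether a
// systemCode is present and never branches on the code's value.
function describeKeyStatus(report: KeyStatusReport): string {
  return report.systemCode === undefined
    ? report.status
    : report.status + " (systemCode=" + report.systemCode + ")";
}
```

Keeping the standard classification and the opaque code in separate fields lets application logic depend on the former while the latter is only logged.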

Comment 10 David Dorwin 2014-10-31 21:57:41 UTC

As discussed at TPAC, there are two issues:
1) Should we expose key-system specific values in a way that applications could switch on? (And how can we prevent/discourage that?)
2) If so, how should we expose them? "Errors" related to EME can include HTMLMediaElement decode errors (MEDIA_ERR_DECODE), rejected promises, and unusable keys (bug 26372). If system codes apply to all, how should we report them? One argument is that any EME-related error that might cause playback to stop probably results in a key not being usable. Thus, the solution for the first category might be reported via the last one.

Jerry to drive resolution for #1 then propose something for #2.

Comment 11 Mark Watson 2014-11-03 17:25:37 UTC

(In reply to David Dorwin from comment #10)
> As discussed at TPAC, there are two issues:
> 1) Should we expose key-system specific values in a way that applications
> could switch on? (And how can we prevent/discourage that?)

No, client applications should not switch on key-system specific values - they should be for reporting to the server for offline error analysis only.

Not sure what we can do except say that applications should not switch on these, they might change etc.

> 2) If so, how should we expose them?

And if not, how should we expose them ;-)

> "Errors" related to EME can include
> HTMLMediaElement decode errors (MEDIA_ERR_DECODE), rejected promises, and
> unusable keys (bug 26372). If system codes apply to all, how should we
> report them? One argument is that any EME-related error that might cause
> playback to stop probably results in a key not being usable. Thus, the
> solution for the first category might be reported via the last one.
> 
> Jerry to drive resolution for #1 then propose something for #2.

Comment 12 David Dorwin 2014-12-12 23:44:34 UTC

(In reply to Mark Watson from comment #11)
> (In reply to David Dorwin from comment #10)
> > As discussed at TPAC, there are two issues:
> > 1) Should we expose key-system specific values in a way that applications
> > could switch on? (And how can we prevent/discourage that?)
> 
> No, client applications should not switch on key-system specific values -
> they should be for reporting to the server for offline error analysis only.

If this is really just for offline error analysis, maybe it should be exposed similar to performance metrics and other things we do not expect to affect the behavior of the application.
> 
> Not sure what we can do except say that applications should not switch on
> these, they might change etc.

If we report "metrics" rather than reporting the codes with specific events, it would be difficult, hacky, imprecise, and/or inadvisable to base application logic on it without affecting aggregate data collection.

Comment 13 Mark Watson 2014-12-13 00:42:15 UTC

(In reply to David Dorwin from comment #12)
> (In reply to Mark Watson from comment #11)
> > (In reply to David Dorwin from comment #10)
> > > As discussed at TPAC, there are two issues:
> > > 1) Should we expose key-system specific values in a way that applications
> > > could switch on? (And how can we prevent/discourage that?)
> > 
> > No, client applications should not switch on key-system specific values -
> > they should be for reporting to the server for offline error analysis only.
> 
> If this is really just for offline error analysis, maybe it should be
> exposed similar to performance metrics and other things we do not expect to
> affect the behavior of the application.
> > 
> > Not sure what we can do except say that applications should not switch on
> > these, they might change etc.
> 
> If we report "metrics" rather than reporting the codes with specific events,
> it would be difficult, hacky, imprecise, and/or inadvisable to base
> application logic on it without affecting aggregate data collection.

This is all true, but I think in practice we want to know both the error code and exactly what we were doing at the time. Also it would be important to be able to clearly associate the error with a session for the case of debugging individual customer problems (when looking at logs from an individual customer or if the customer is online with Customer Services).
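
One way an application might keep that association is sketched below in TypeScript: each reported code is stored together with the session ID and the operation in progress. `SystemCodeReport` and `SystemCodeLog` are hypothetical names, and how the code is obtained from the CDM is deliberately left open, since that is the question this bug is about.

```typescript
// All names here are hypothetical; nothing below is part of EME.
interface SystemCodeReport {
  sessionId: string; // session the code belongs to, "" if none exists yet
  operation: string; // what the application was doing, e.g. "update"
  systemCode: number; // opaque key-system value: logged, never interpreted
  timestampMs: number;
}

class SystemCodeLog {
  private readonly reports: SystemCodeReport[] = [];

  record(
    sessionId: string,
    operation: string,
    systemCode: number,
    timestampMs: number = Date.now()
  ): void {
    this.reports.push({ sessionId, operation, systemCode, timestampMs });
  }

  // Returns the reports for one session, e.g. for a customer-support case.
  forSession(sessionId: string): SystemCodeReport[] {
    return this.reports.filter((report) => report.sessionId === sessionId);
  }

  // Serializes everything for upload and clears the buffer.
  drain(): string {
    const payload = JSON.stringify(this.reports);
    this.reports.length = 0;
    return payload;
  }
}
```

A call such as `log.record(session.sessionId, "update", code)` would be made wherever a code becomes available, and `drain()` would feed the upload to the server.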

Comment 14 David Dorwin 2014-12-13 00:54:36 UTC

(In reply to Mark Watson from comment #13)
> (In reply to David Dorwin from comment #12)
> > (In reply to Mark Watson from comment #11)
> > > (In reply to David Dorwin from comment #10)
> > > > As discussed at TPAC, there are two issues:
> > > > 1) Should we expose key-system specific values in a way that applications
> > > > could switch on? (And how can we prevent/discourage that?)
> > > 
> > > No, client applications should not switch on key-system specific values -
> > > they should be for reporting to the server for offline error analysis only.
> > 
> > If this is really just for offline error analysis, maybe it should be
> > exposed similar to performance metrics and other things we do not expect to
> > affect the behavior of the application.
> > > 
> > > Not sure what we can do except say that applications should not switch on
> > > these, they might change etc.
> > 
> > If we report "metrics" rather than reporting the codes with specific events,
> > it would be difficult, hacky, imprecise, and/or inadvisable to base
> > application logic on it without affecting aggregate data collection.
> 
> This is all true, but I think in practice we want to know both the error
> code and exactly what we were doing at the time. Also it would be important
> to be able to clearly associate the error with a session for the case of
> debugging individual customer problems (when looking at logs from an
> individual customer or if the customer is online with Customer Services).

An application could check the reported error(s) in response to other failures. It seems unlikely that there would be more than one at a time.

We could report the codes via MediaKeySession or perhaps along with their session ID via MediaKeys. The latter would allow us to use the same mechanism for reporting playback or other session-independent errors.

Comment 15 Jerry Smith 2014-12-16 15:28:41 UTC

I think it is correct to view system codes as performance metrics for the key system.  They should not trigger a client response, but be returned as telemetry to track and resolve broader issues affecting the user experience.

We've previously talked about returning these in string form with any DOMExceptions we've defined.  That has some limitations:

1.  It's only a partial solution, since key system errors may occur at times not associated with any defined EME use of DOMExceptions.
2.  EME DOMExceptions fire generally based on well defined operating errors and likely aren't a complete solution for obtaining key system metrics.

If we can agree on returning this data under the MediaKeySession, I think that would provide a more complete mechanism, and would satisfy Mark's request for services being able to associate codes with specific session activity.

Comment 16 David Dorwin 2015-01-09 01:04:42 UTC

At the last meeting, Jerry said he would add a proposal (for an attribute) to this bug [1].
[1] http://www.w3.org/2014/12/16-html-media-minutes.html#item07

Comment 17 Chris Pearce 2015-01-13 21:51:24 UTC

(In reply to David Dorwin from comment #16)
> At the last meeting, Jerry said he would add a proposal (for an attribute)
> to this bug [1].
> [1] http://www.w3.org/2014/12/16-html-media-minutes.html#item07

A proposal for an attribute seems sub-optimal: if there are multiple errors to be reported by the CDM, only the last can be reported, so un-observed error codes could be overwritten by subsequent errors.

Why can't we just subclass something other than DOMException with a systemCode attribute? Like Error. How about:

interface MediaKeyError : Error {
  readonly attribute unsigned long systemCode;
  readonly attribute DOMString message;
};

If we can't subclass Error, then how about Event?

Comment 18 Boris Zbarsky 2015-01-14 02:12:14 UTC

That's more a question for Anne than me, I think...

In terms of current IDL syntax, "interface MediaKeyError : Error" doesn't work because Error is not an IDL interface.  But you could presumably hand-write the ES bits for adding an Error subclass if that's a reasonable thing to do.

Comment 19 Domenic Denicola 2015-01-14 02:17:32 UTC

I would just subclass ES's Error, instead of DOMException. You'll need to do the same amount of work either way (manually specifying inheritance that falls outside WebIDL's capabilities), and DOMException doesn't buy you anything besides a probably-confusing code chosen from a restrictive list.
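
For illustration, subclassing the ECMAScript Error directly might look like this in TypeScript. This is only a sketch of the idea discussed in the last three comments: the class name `KeySystemError` is hypothetical, the `systemCode` member mirrors the attribute from the IDL in comment 17, and the code assumes an ES2015 or later compile target, where subclassing Error keeps the prototype chain intact.

```typescript
// Hypothetical class: an ES Error subclass carrying a numeric system code.
// Assumes an ES2015+ target so that `extends Error` behaves as expected.
class KeySystemError extends Error {
  readonly systemCode: number;

  constructor(message: string, systemCode: number) {
    super(message);
    this.name = "KeySystemError";
    // Wrap to an unsigned 32-bit value, mirroring WebIDL's `unsigned long`.
    this.systemCode = systemCode >>> 0;
  }
}
```

A promise rejection could then carry `new KeySystemError("license request failed", code)`, with the code read from `error.systemCode` for logging.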

Comment 20 David Dorwin 2015-01-16 21:40:04 UTC

(In reply to Chris Pearce from comment #17)
> (In reply to David Dorwin from comment #16)
> > At the last meeting, Jerry said he would add a proposal (for an attribute)
> > to this bug [1].
> > [1] http://www.w3.org/2014/12/16-html-media-minutes.html#item07
> 
> A proposal for an attribute seems sub-optimal, as if there are multiple
> errors to be reported by the CDM, only the last can be reported, which could
> result in un-observed error codes being overwritten by subsequent errors.
Implementations could ensure that the relevant value is recorded. Applications are likely to check the value after some other event, so it's unlikely that system codes from unrelated events will be overwritten.

If we are concerned about this, we could use a sequence of system codes instead.
> 
> Why can't we just subclass something other than DOMException with a
> systemCode attribute?
Where/how would you use this new object?
See #2 in comment 10 for all the places where a system code _might_ be useful.

> If we can't subclass Error, then how about Event?
One of the concerns is exposing implementation-specific values to the application (see, for example, Anne's comments in bug 25896), especially in a way that might affect application logic. The primary use case is reporting to the server for offline error analysis (comment 11). Thus, it would be better if the design encouraged reporting and discouraged switching. Providing the code in an event would do the opposite (comment 12).

Comment 21 Jerry Smith 2015-02-17 02:50:05 UTC

We believe an event is likely sufficient, something like:

X.X  KeySystemEvent

The KeySystemEvent is used to return Key System specific status messages that provide human readable information on events or errors outside those defined in this specification.

Events are constructed as defined in Constructing events [DOM].

[Constructor(DOMString type, KeySystemEventInit EventInit)]
interface KeySystemEvent : Event {
  readonly attribute DOMString message;
};

Comment 22 Jerry Smith 2015-02-17 02:54:55 UTC

We believe an event is likely sufficient, something like:

X.X  KeySystemEvent

The KeySystemEvent is used to return Key System specific messages that provide human readable information on events or errors outside those defined in this specification.

Events are constructed as defined in Constructing events [DOM].

[Constructor(DOMString type, KeySystemEventInit EventInit)]
interface KeySystemEvent : Event {
  readonly attribute DOMString message;
};

I've drafted this as returning a message since that seems more clearly intended to provide information.
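
How an application might consume such an event for logging is sketched below in TypeScript. To stay runnable outside a browser, the sketch uses a minimal stand-in dispatcher rather than the real DOM Event and EventTarget; `FakeKeySession`, `KeySystemEventLike`, the event type `"keysystemmessage"` and the message text are all hypothetical, and only the `message` attribute comes from the IDL above.

```typescript
// Hypothetical shapes; only `message` comes from the proposed IDL.
interface KeySystemEventLike {
  readonly type: string;
  readonly message: string;
}

type KeySystemListener = (event: KeySystemEventLike) => void;

// Minimal stand-in for the object the event would be fired at.
class FakeKeySession {
  private listeners: { type: string; listener: KeySystemListener }[] = [];

  addEventListener(type: string, listener: KeySystemListener): void {
    this.listeners.push({ type, listener });
  }

  // Simulates the user agent firing the event at the session.
  fire(type: string, message: string): void {
    const event: KeySystemEventLike = { type, message };
    this.listeners
      .filter((entry) => entry.type === type)
      .forEach((entry) => entry.listener(event));
  }
}

const diagnostics: string[] = [];
const fakeSession = new FakeKeySession();
fakeSession.addEventListener("keysystemmessage", (event) => {
  // Queue the text for upload; application logic does not inspect it.
  diagnostics.push(event.type + ": " + event.message);
});
fakeSession.fire("keysystemmessage", "output protection level changed");
```

Because the payload is free-form text, the listener has nothing stable to branch on and can only pass the message along.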

Comment 23 Joe Steele 2015-02-18 03:24:12 UTC

(In reply to Jerry Smith from comment #22)
> We believe an event is likely sufficient, something like:
> 
> X.X  KeySystemEvent
> 
> The KeySystemEvent is used to return Key System specific messages that
> provide human readable information on events or errors outside those defined
> in this specification.
> 
> Events are constructed as defined in Constructing events [DOM].
> 
> [Constructor(DOMString type, KeySystemEventInit EventInit)]
> interface KeySystemEvent : Event {
>   readonly attribute DOMString message;
> };
> 
> I've drafted this as returning a message since that seems more clearly
> intended to provide information.

This would satisfy Adobe's requirements.

Comment 24 Joe Steele 2015-02-18 05:25:25 UTC

(In reply to Joe Steele from comment #23)
> (In reply to Jerry Smith from comment #22)
> > We believe an event is likely sufficient, something like:
> > 
> > X.X  KeySystemEvent
> > 
> > The KeySystemEvent is used to return Key System specific messages that
> > provide human readable information on events or errors outside those defined
> > in this specification.
> > 
> > Events are constructed as defined in Constructing events [DOM].
> > 
> > [Constructor(DOMString type, KeySystemEventInit EventInit)]
> > interface KeySystemEvent : Event {
> >   readonly attribute DOMString message;
> > };
> > 
> > I've drafted this as returning a message since that seems more clearly
> > intended to provide information.
> 
> This would satisfy Adobe's requirements.

It would be good to have clarification on what object this event is fired at though. My assumption for now is that it is fired at the MediaKeySession object. Per the issue Chris raised in bug 27067, it is not clear whether this error would need to be used when a MediaKeySession is not around.

Comment 25 David Dorwin 2015-02-25 22:06:55 UTC

Jerry has an action to provide more information about the use cases he is targeting with this event and which object it should be fired at.

Comments in the meantime:
* It's unclear if the latest proposal solves the logging use case.
 * Many of the use cases for system code logging are related to keystatuses, which already has a separate event.
 * Others result in rejected promises.
  - Note: Chrome is currently appending the system code to the DOMException's message.
* Some of the use cases may be covered by and/or better discussed in bug 27067.
* There are reasons to avoid an event (see comment 20). While a string provides some level of disincentive to make application decisions, it is still possible and less useful than a numeric value.

I believe we agreed not to subclass Error. Among other things, we might need to define our own script bindings.

Comment 26 Jerry Smith 2015-03-17 01:31:42 UTC

It was my intent to target the KeySystemStatus events at MediaKeySession.  Examples of when it might be used are:

-  Keys used to start playback are no longer usable.
-  Pipeline conditions required for playback change after start.

Some of these might be reported with the key status event, though it would need to be modified to return the systemCode value.  I proposed this previously in comment 9 on this bug.  I switched to proposing a keysystemstatus event to provide a broad means for general errors in the key system to be reported to the website.

Comment 27 David Dorwin 2015-03-17 15:31:53 UTC

I think we are in agreement that these examples should be reported via key status. The remaining question is how to report the system codes in a way that discourages applications from depending on them and/or changing behavior based on them. See also comment #20.

One proposed solution is to put text in the spec telling applications that the values could change over time and not to depend on them.

Another possible solution would be to provide a getter that returns a sequence of system codes (or add an attribute that is a sequence of system codes). In this case, the application could check the getter (or attribute) at any time, including on a keystatuseschange event, after a playback error, or before disposing of the session object.
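
The getter idea could behave roughly as in this TypeScript sketch, which also contrasts it with a single last-value attribute. `SystemCodeBuffer`, `lastSystemCode` and `takeSystemCodes` are hypothetical names standing in for whatever the session object would expose, and clearing the sequence on read is an assumption made for the sketch; none of this exists in EME.

```typescript
// Hypothetical stand-in for the part of a session that holds system codes.
class SystemCodeBuffer {
  private codes: number[] = [];

  // Called by imaginary CDM glue whenever the key system reports a code.
  push(code: number): void {
    this.codes.push(code);
  }

  // Single-attribute design: only the most recent code is observable.
  get lastSystemCode(): number | undefined {
    return this.codes.length > 0
      ? this.codes[this.codes.length - 1]
      : undefined;
  }

  // Sequence design: returns every code since the last read, then resets.
  takeSystemCodes(): number[] {
    const taken = this.codes;
    this.codes = [];
    return taken;
  }
}
```

An application would call `takeSystemCodes()` at the points named above (on a keystatuseschange event, after a playback error, or before disposing of the session) and forward the result to its telemetry.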

Comment 28 Jerry Smith 2015-03-31 01:11:46 UTC

It seems like it would be sufficient to have a systemCode attribute attached to either MediaKeySession or MediaKeys.  I recommend using MediaKeySession since it has other playback attributes related to the current playback activity (e.g. expiration and MediaKeyStatusMap).

It is true that multiple systemCodes might occur, suggesting a Sequence of values might be most flexible; however, for telemetry uses, a single value seems like it would meet a large majority of all needs.

A numeric code still is our preference, suggesting we add this to MediaKeySession:

  readonly attribute unsigned long systemCode;

I agree that some language could be added that discouraged using this attribute to trigger site behavior changes.  Given that it is not directly supported by an event and is limited to one value at a time, the risk of misusing it seems like it would be very low anyway.

Comment 29 Mark Watson 2015-10-30 02:56:34 UTC

Moved to https://github.com/w3c/encrypted-media/issues/120