Bug 20760 - <video> Expose statistics for tracking playback quality
Status: RESOLVED FIXED
Product: HTML WG
Classification: Unclassified
Component: Media Source Extensions
Version: unspecified
Hardware: All  OS: All
Importance: P2 normal
Target Milestone: ---
Assigned To: Aaron Colwell
QA Contact: HTML WG Bugzilla archive list
Depends on:
Blocks:
Reported: 2013-01-24 16:27 UTC by Adrian Bateman [MSFT]
Modified: 2013-04-24 19:12 UTC
CC: 12 users

See Also:


Description Adrian Bateman [MSFT] 2013-01-24 16:27:44 UTC
Problem description: (as mostly described in bug 14970 and in the existing proposal at http://wiki.whatwg.org/wiki/Video_Metrics#Proposal)
 
For adaptive video streaming using the video element with the Media Source Extensions (MSE), it is important that the user agent expose video quality metrics so that applications can make decisions about media stream quality, e.g. bitrate, resolution, etc. This allows for the scenario where the network is capable of transferring higher quality video than the device is capable of playing, and the application needs to step down to a lower quality.

Proposed changes: A subset of the existing proposal (http://wiki.whatwg.org/wiki/Video_Metrics#Proposal). The attributes presentedFrames and playbackJitter should be able to cover the typical scenarios used with MSE.
Here is the proposed text for the <video> tag. We'd like to have this added to MSE for now since it's important for this scenario, but in the future it might be part of HTML 5.1.

partial interface HTMLVideoElement : HTMLMediaElement {
  // new method
  MediaPlaybackQuality getPlaybackQuality();
};

interface MediaPlaybackQuality {
  readonly attribute double currentvideoframerate;
  readonly attribute Date systemTime;
  readonly attribute unsigned long presentedFrames;
  readonly attribute double playbackJitter;
};

The "currentvideoframerate" attribute represents the current frame rate of the active video track. The UA should provide the frame rate value based on metadata in the video stream. The UA can expose the value as NaN when it is not deterministic. In that case, the app can use presentedFrames to estimate the actual video frame rate.

The "systemTime" attribute represents the current system time (in UTC) at the point when the statistics were acquired.  This can be compared between calls to precisely determine how playback quality is changing over time.

The "presentedFrames" attribute represents the number of frames of video that have been presented for compositing into the page. The initial value of the presentedFrames attribute should be 0 for this element. The value of the presentedFrames attribute should be reset to 0 after running each instance of the media element load algorithm for this element.

Note: The presentedFrames could be used to calculate frames per second. It could also be used to measure the display quality that the user perceives, and to assess the performance of the rendering engine given the performance of the network and decoding pipeline. If, for example, the system receives sufficient data from the network but the rate of presented frames per second is below 15, we can assume the user gets a poor presentation because the rendering engine was too slow: the machine is likely overloaded or not capable of rendering this video at this quality. In this case, the application should probably switch to a lower bitrate (resolution or frame rate) resource for the video.
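As a minimal sketch of the frames-per-second calculation mentioned in the note, assuming two samples of the proposed MediaPlaybackQuality shape (plain millisecond numbers stand in for systemTime here; JavaScript Date objects would subtract the same way):

```javascript
// Estimate the effective frame rate from two successive samples of
// the proposed getPlaybackQuality() result. The sample objects below
// are hand-made stand-ins for illustration, not from a real element.
function estimateFps(prev, curr) {
  const seconds = (curr.systemTime - prev.systemTime) / 1000;
  if (seconds <= 0) return NaN; // no time elapsed between samples
  return (curr.presentedFrames - prev.presentedFrames) / seconds;
}

// Two samples taken two seconds apart, 50 frames presented in between.
const first  = { systemTime: 0,    presentedFrames: 0 };
const second = { systemTime: 2000, presentedFrames: 50 };
console.log(estimateFps(first, second)); // 25
```

Polling at a fixed interval and comparing the estimate against the track's nominal frame rate would give the app the "poor presentation" signal described above.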

The "playbackJitter" attribute represents the sum of all duration errors for frames intended to be presented to the user, where:
Ei = desired duration of frame i on the screen (to the nearest microsecond)
Ai = actual duration frame i spent on the screen (if the frame is never presented to the user, then Ai == 0)
then:
playbackJitter = sum(abs(Ei - Ai))
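The definition above can be sketched directly; the frame data below is hypothetical (durations in microseconds, as the definition specifies):

```javascript
// playbackJitter as defined above: the sum of absolute duration
// errors over all frames intended for presentation. A frame that is
// never shown has an actual duration of 0.
function playbackJitter(frames) {
  return frames.reduce((sum, f) => sum + Math.abs(f.expected - f.actual), 0);
}

// Three 40000 µs (25 fps) frames: one shown on time, one held an
// extra frame period, one dropped entirely.
const frames = [
  { expected: 40000, actual: 40000 },
  { expected: 40000, actual: 80000 },
  { expected: 40000, actual: 0 },
];
console.log(playbackJitter(frames)); // 80000
```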
Comment 1 Aaron Colwell (c) 2013-01-29 19:31:47 UTC
As mentioned on the call today, I have a few concerns about adding this to the MSE spec.

- This feels like scope creep. I don't want MSE to start including a variety of media related things that didn't make it into HTML 5.

- Implementing MSE does not require these metrics. Some web applications that use MSE may benefit from them, but MSE is still useful without them. I'd prefer to keep the MSE spec focused on the minimal set of extensions needed to implement the media buffering & splicing functionality.

- It seems like this should be pursued in the broader metrics proposal & discussion that Adrian linked to. Perhaps a metrics extension spec could be created or this could be added to HTML 5.1. In either case it seems like the old discussions should be revived.

- There may be stakeholders that care about these metrics and others, but have no interest in implementing MSE. What is the spec compliance story for them? Why is it better to do this in the MSE/Media TF context instead of the standard HTML-WG one?

I do think that metrics like these are useful. I just don't think the MSE spec is the right home for their definition.
Comment 2 Simon Waller 2013-03-18 10:48:14 UTC
These metrics look like they have been designed to report on a software decoder. Most embedded devices (TVs, STBs) use a hardware-based decoder, which will be able to decode every frame it is asked to. It may be difficult for some implementations to know how many frames have been decoded, since that is done in low-level hardware and the appropriate hooks to get the information probably don't exist.

For hardware decoders, the jitter is not measurable but would anyway be zero (or near enough not to be a valid quality metric).

Another point to bear in mind is that the frame rate is not always constant. In the UK, the HD broadcasts on DTT dynamically switch between 50i and 25p (this is a valid thing to do and is within the video coding specifications - AVC in this case).

When using hardware decoders, the quality of the video is not determined by the speed of the decoder (which is always fast enough for the video codec profile being used) but by the bandwidth of the IP connection, i.e. it is determined by the quality of the video encoding at the maximum bitrate that the IP connection can support.

BTW: Since this is my first post, a few words of introduction. My background is in STB and digital TV development but I now mainly cover standardisation activities such as DVB, OIPF and HbbTV. I will be joining the joint conf call tomorrow.
Comment 3 Jerry Smith 2013-03-18 21:44:33 UTC
1. The goal of the proposed metrics is to measure the playback quality of the entire media playback stack, including decoding as well as data appending/parsing and rendering. Although video decoding is already offloaded to hardware on most modern platforms (incl. Windows PCs), video rendering is still part of the rendering process of the web page by the user agent, which can control the overall composition process either explicitly or implicitly. So, the user agent can collect the statistics and report accordingly on how many frames are actually being presented as well as how much jitter there is. Jitter will have to be measured based on the refresh rate of the display device, so it might not be zero when the refresh rate of the display doesn't perfectly match the actual video frame rate.

2. We agree frame rate can change dynamically, so based on the proposal, the "currentvideoframerate" property reflects the frame rate at the time the quality metric is queried, rather than being a static property on the media element itself.

3. Based on the MSE design, it is largely the app's responsibility to track the network bandwidth and data download rate.  We agree the video decoder is not a bottleneck for most systems with hardware decoders, however video rendering can be affected by other operations as part of the overall web page rendering process.
Comment 4 Mark Watson 2013-04-09 15:27:04 UTC
My suggestion would be to look for something 'as simple as possible' to address the problem of poor playback quality on low-power devices. MSE newly provides the ability for the page to work around this by switching to a lower complexity stream.

Practically we could just add a single "droppedFrames" attribute to the media element. The script can query this periodically to get a sense of how bad frame dropping is.
Comment 5 Aaron Colwell 2013-04-18 18:49:27 UTC
I too would like to see something a little simpler. How about this?

partial interface HTMLMediaElement {
  readonly attribute MediaPlaybackQuality playbackQuality;
};

interface MediaPlaybackQuality {
  readonly attribute unsigned long totalVideoFrames;
  readonly attribute unsigned long droppedVideoFrames;
};

- Each time playbackQuality is fetched a new object is created.
- Put this on the HTMLMediaElement instead so this could potentially be used in the future for audio metrics?
- Perhaps s/Quality/Metrics in the naming above?
- totalVideoFrames represents the number of frames that could have been decoded and displayed. This is the count at the entrance to the decoder (i.e. before predecode B-frame or P-frame dropping)
- droppedVideoFrames represents the number of frames dropped either predecode or dropped because they were late coming out of the decoder.

This seems like a very simple first step that should be pretty easy to support, and I think it would address everyone's concerns by at least providing something.
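As a sketch of how a page might use this simplified shape for the step-down scenario discussed earlier (the 10% threshold and the sample objects are illustrative assumptions, not part of the proposal):

```javascript
// Decide whether to switch to a lower-complexity stream based on the
// dropped-frame ratio between two polls of the proposed
// totalVideoFrames/droppedVideoFrames counters. The threshold is an
// arbitrary illustrative choice.
function shouldStepDown(prev, curr, threshold = 0.1) {
  const total = curr.totalVideoFrames - prev.totalVideoFrames;
  const dropped = curr.droppedVideoFrames - prev.droppedVideoFrames;
  if (total <= 0) return false; // nothing played in this interval
  return dropped / total > threshold;
}

// Hypothetical polls: 250 frames in the interval, 50 of them dropped.
const before = { totalVideoFrames: 1000, droppedVideoFrames: 10 };
const after  = { totalVideoFrames: 1250, droppedVideoFrames: 60 };
console.log(shouldStepDown(before, after)); // true (20% dropped)
```

Differencing the counters between polls matters because both values are cumulative over the element's lifetime.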
Comment 6 Adrian Bateman [MSFT] 2013-04-18 21:15:24 UTC
(In reply to comment #5)
> I too would like to see something a little simpler. How about this?
> 
> partial interface HTMLMediaElement {
>   readonly attribute MediaPlaybackQuality playbackQuality;
> };

We made this a method because we don't think the return object should be live. It's tough to use the values if they change while you're using them.

> interface MediaPlaybackQuality {
>   readonly attribute unsigned long totalVideoFrames;
>   readonly attribute unsigned long droppedVideoFrames;
> };

Since we didn't want the return value to be live, we added a timestamp so that you could use that to compare against a previous time.
Comment 7 Aaron Colwell 2013-04-18 22:56:36 UTC
(In reply to comment #6)
> (In reply to comment #5)
> > I too would like to see something a little simpler. How about this?
> > 
> > partial interface HTMLMediaElement {
> >   readonly attribute MediaPlaybackQuality playbackQuality;
> > };
> 
> We made this a method because we don't think the return object should be
> live. It's tough to use the values if they change while you're using them.

I agree with that idea. That is why I suggested that a new object be created on each read. The idea is to make it like the buffered attribute, where the spec indicates that a new static object is created on each read.

> 
> > interface MediaPlaybackQuality {
> >   readonly attribute unsigned long totalVideoFrames;
> >   readonly attribute unsigned long droppedVideoFrames;
> > };
> 
> Since we didn't want the return value to be live, we added a timestamp so
> that you could use that to compare against a previous time.

I figured the application could just fetch whatever timestamp it needed right before reading the attribute. This could be a Date or say a requestAnimationFrame() timestamp or something else. I don't think the latency between the two calls would be significant enough to throw off accuracy in any major way.
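A sketch of that usage, with a hand-made stand-in for the element, since the playbackQuality attribute from comment 5 is hypothetical:

```javascript
// The application takes its own timestamp immediately before reading
// the (non-live) quality object, instead of the API bundling one in.
function sampleQuality(video) {
  const when = Date.now();          // caller-chosen clock, read just beforehand
  const q = video.playbackQuality;  // a fresh static object on each read
  return {
    when,
    totalVideoFrames: q.totalVideoFrames,
    droppedVideoFrames: q.droppedVideoFrames,
  };
}

// Illustrative stand-in for a real <video> element.
const fakeVideo = {
  get playbackQuality() {
    return { totalVideoFrames: 100, droppedVideoFrames: 2 };
  },
};

const s = sampleQuality(fakeVideo);
console.log(s.totalVideoFrames, s.droppedVideoFrames); // 100 2
```

A requestAnimationFrame() callback timestamp would slot into the same pattern in place of Date.now().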
Comment 8 Adrian Bateman [MSFT] 2013-04-23 20:22:04 UTC
Discussed in the face-to-face meeting.
Next action: Aaron to add this proposal to the spec:

Solution includes total frames, dropped frames, and timestamp. We will not include jitter at this point pending implementation experience.
Comment 9 Aaron Colwell 2013-04-24 19:12:51 UTC
Changes committed.
https://dvcs.w3.org/hg/html-media/rev/237133bfdd57