24382 – <track> missed cues fail to reach the application & show difficulty of identifying which cues triggered the cuechange event

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	HTML
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Ian 'Hixie' Hickson
QA Contact:	contributor

Depends on:	24161

Reported:	2014-01-24 11:06 UTC by Silvia Pfeiffer
Modified:	2014-09-15 22:02 UTC
CC List:	13 users

Description Silvia Pfeiffer 2014-01-24 11:06:40 UTC

The "time marches on" algorithm for media playback will trigger the oncuechange event for cues that are entered or exited. After that it is possible to identify which cues are active among all the cues in a TextTrack by looking at .activeCues and .cues in the TextTrack object.

Short cues (less than 250ms long) may be missed by the "time marches on" algorithm, yet still cause an oncuechange event (since they have been entered and exited). It is, however, not possible to identify these cues by looking at .activeCues and .cues.

Other than registering an onenter and onexit event on every cue, determining those missed cues is hard. Going through the list of all cues and determining which cues have just been passed and are short is more of a heuristic than a proper approach.

There is more discussion of this in Bug #24161 with a suggestion to introduce a missedCues attribute on the TextTrack interface.

Another option would be to make the cuechange event a CustomEvent with a TextTrackCueList that contains the list of cues that triggered the cuechange event.

Comment 1 Philip Jägenstedt 2014-01-24 16:59:02 UTC

If all that is required is to do something with all cues which are passed in the normal course of playback, even if they're skipped, it seems you can already do this.

If you remember video.currentTime from the previous cuechange event, you can find the cues which should have become active since then and do your thing with them. Listen for the seeking event to reset that state, so that it is not thrown off when the user seeks with the native controls.

Did I misunderstand the scenario?
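
The bookkeeping suggested in this comment could look roughly like the sketch below. It is only an illustration: the cues are plain objects with startTime/endTime standing in for real TextTrackCue instances, and the helper names (cuesStartedBetween, createCueTracker) are made up for this example.

```javascript
// Return every cue that started after `previousTime` and no later than
// `currentTime`, whether or not it is still active.
function cuesStartedBetween(cues, previousTime, currentTime) {
  return Array.from(cues).filter(
    (cue) => cue.startTime > previousTime && cue.startTime <= currentTime
  );
}

// Hypothetical helper that remembers the position seen at the last cuechange.
function createCueTracker(cues) {
  let lastTime = -Infinity; // nothing seen yet, so cues at time 0 are reported
  return {
    // Call from the cuechange handler, passing video.currentTime.
    onCueChange(currentTime) {
      const passed = cuesStartedBetween(cues, lastTime, currentTime);
      lastTime = currentTime;
      return passed;
    },
    // Call when the user seeks, so cues that were jumped over are not reported.
    onSeek(newTime) {
      lastTime = newTime;
    },
  };
}
```

In a page, onCueChange would be called from the track's cuechange handler and onSeek from the video's seeking handler. Whether cues skipped by a seek should be reported at all is an application decision; this sketch assumes they should not.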

Comment 2 Ian 'Hixie' Hickson 2014-01-24 23:25:27 UTC

What's the use case here?

Comment 3 Silvia Pfeiffer 2014-01-25 05:41:27 UTC

(In reply to Ian 'Hixie' Hickson from comment #2)
> What's the use case here?

FWICT the idea is to run a function on the data of every cue that has triggered a cuechange event - something like displaying a second copy of the cue text elsewhere on the Webpage, or working with the cue text e.g. for speech synthesis or something similar.

What Philip says is of course possible: you can track the video.currentTime from the previous cuechange event, go through the list of cues in .cues and find those that should have been activated between then and "now".

I'm just thinking that it should be easier to find the cues that have triggered the cuechange event, that it should be part of the data provided to the callback, and that it should not be necessary to keep track of history to determine which cues caused the event.

Comment 4 Ian 'Hixie' Hickson 2014-01-27 20:03:10 UTC

Do you have a more concrete use case?

The reason I ask for a concrete use case is I'm trying to understand what would happen in those use cases in response to the user seeking.

Comment 5 Silvia Pfeiffer 2014-01-27 21:52:29 UTC

David Lewis should be able to provide more information.

The request originated from http://lists.w3.org/Archives/Public/public-inbandtracks/2013Dec/0004.html , which mentions "a notification that a goal has been scored during a live Football match" or "a boundary between two sections of a live programme". These would be delivered as in-band @kind="metadata" tracks via HTTP adaptive streaming (MPD files through MSE in the example case).

FWICT, the important issues are that the cues are small, and arrive successively and in chunks, which makes it difficult to attach event handlers to the cues since they change all the time. Also, it's important that the cues are not missed and they must trigger some JavaScript function (e.g. update an overlay with the current score, or with the new programme title). HTH.

Comment 6 Philip Jägenstedt 2014-01-28 08:51:45 UTC

That sounds like metadata cues that won't be rendered and where the end time doesn't matter, so why not just make those cues a full second long or something?

Comment 7 David Lewis 2014-01-28 10:29:14 UTC

Comment 5 sums up the use case, but please do see the list post and related bug which have more details. 

We don't expect this type of cue to be directly rendered; it will be metadata for the application, so our current work-around is to force cues to use a duration greater than 250ms. However, this seems like an arbitrary minimum which doesn't match underlying data sources: e.g. DASH events may have duration=0, and VTT doesn't seem to specify a minimum. It also doesn't match real-world events. For example, a goal being scored is an instantaneous event: it may have happened at a certain time, but it didn't take 250ms to cross the line.

Ian, I can see your point that if we update a score overlay with e.g. JSON.parse(cue.text).score when an event came in, seeking past that event would leave an incorrect number displayed. So a seek event might prompt the application to look for the last cue containing the score or make an XHR request for that data.
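
A rough sketch of the "look for the last cue containing the score" step after a seek is shown below. It assumes every metadata cue carries JSON text with a score field, as in the JSON.parse(cue.text).score example above; the function name scoreAt and the exact payload shape are assumptions made for illustration.

```javascript
// Find the score from the most recent cue that starts at or before `time`.
// Cues are plain objects standing in for metadata cues; `text` is assumed
// to hold JSON such as {"score": "1-0"}.
function scoreAt(cues, time) {
  let latest = null;
  for (const cue of Array.from(cues)) {
    const isCandidate = cue.startTime <= time;
    if (isCandidate && (latest === null || cue.startTime >= latest.startTime)) {
      latest = cue;
    }
  }
  // No cue before this position means no score is known yet.
  return latest === null ? null : JSON.parse(latest.text).score;
}
```

This only works for cues that have already been delivered to the track, which is why the XHR fallback mentioned above may still be needed for a live stream.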

Comment 8 Ian 'Hixie' Hickson 2014-01-28 20:17:33 UTC

The right way to do this kind of thing is to have a cue that is the length of time that the score should be shown. That way, when you seek into the cue, you get the right score immediately. This makes seeking backwards show the right score also, with minimal effort. Similarly, the current TV programme should just have one cue that covers the entire programme, so that any time you seek you get the right cue.
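
As a sketch of this modelling, instantaneous events can be turned into cues that each last until the next event, so that seeking anywhere lands inside the cue holding the right value. The event shape ({ time, text }) and the names toSpanningCues and activeCueAt are assumptions for this example, and plain objects stand in for real cues.

```javascript
// Turn instantaneous events into cues that last until the next event
// (or until `mediaEnd` for the last one).
function toSpanningCues(events, mediaEnd) {
  const sorted = [...events].sort((a, b) => a.time - b.time);
  return sorted.map((event, i) => ({
    startTime: event.time,
    endTime: i + 1 < sorted.length ? sorted[i + 1].time : mediaEnd,
    text: event.text,
  }));
}

// The cue covering `time`: start-inclusive and end-exclusive, which is the
// rule used for deciding whether a cue is active.
function activeCueAt(cues, time) {
  return cues.find((cue) => cue.startTime <= time && time < cue.endTime) || null;
}
```

For a live stream the end time of the newest cue is not known in advance, so an application following this approach would have to extend or replace that cue as new events arrive.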

I wouldn't recommend using short cues for catching notifications like this. It just won't work right.

Also, I don't understand the relevance of the 250ms number above. There's nothing about the "time marches on" algorithm that does anything with cues based on some time cut-off like that. The only thing that's rate-limited in the spec is the "timeupdate" event, but you'd never use that for doing anything with cues, that's just for updating the seek bar or time display.

You might miss cues however long they are. If the CPU is under heavy load, "time marches on" might not run for minutes, and there'd be a whole batch of missed cues. Alternatively, if the CPU is fast, then it will be able to run so fast that no cues are missed, even 1ms-long cues, with them all having their time in "activeCues". Relying on noticing missed cues is a losing proposition.
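
The point can be illustrated with a simplified model of a single run of the algorithm. This is a sketch only, not the spec text: it takes the position at the previous run and the position now, and splits plain cue objects into those that are active and those that started and ended entirely in between. The function name classifyCues is made up for this example.

```javascript
// Simplified model of one "time marches on" step during normal playback.
// `lastTime` is the position at the previous run, `currentTime` the position now.
function classifyCues(cues, lastTime, currentTime) {
  // Active: the current position lies inside the cue.
  const active = cues.filter(
    (cue) => cue.startTime <= currentTime && cue.endTime > currentTime
  );
  // Missed: the cue began and ended between the two runs.
  const missed = cues.filter(
    (cue) =>
      !active.includes(cue) &&
      cue.startTime >= lastTime &&
      cue.endTime <= currentTime
  );
  return { active, missed };
}
```

With a 1ms cue at 0.5s and a one-minute cue at 10s, a step from 0 to 0.5005 finds the short cue active, a step from 0 to 1 misses it, and a stalled step from 0 to 120 misses both. Which outcome occurs depends on how often the algorithm gets to run, not on any fixed duration threshold.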

I've added some prose to this effect in the spec.

Comment 9 contributor 2014-01-28 20:17:42 UTC

Checked in as WHATWG revision r8428.
Check-in comment: Add some best practices notes regarding how to use metadata cues.
http://html5.org/tools/web-apps-tracker?from=8427&to=8428

Comment 10 David Lewis 2014-01-29 09:27:12 UTC

My understanding was that "time marches on" is usually started by timeupdate which in current browsers is run every 250ms. I'm aware the spec suggests this should happen more or less frequently subject to load but it doesn't seem to be the case at the moment.

Thanks for the clarification to the spec, I will pass this info on.

Comment 11 Ian 'Hixie' Hickson 2014-01-29 17:54:57 UTC

"time marches on" happens any time the current position changes, which is to say, continually while the video is playing. The "timeupdate" _event_ is rate-limited within the "time marches on" algorithm.

The "time marches on" algorithm is what changes which cues are visible. If it only ran every 250ms, then the visible cues would have pretty horrible precision. :-)

Comment 12 Ian 'Hixie' Hickson 2014-01-30 20:12:27 UTC

Given the new section, are the proposed APIs still needed, or is it sufficient?

Comment 13 Silvia Pfeiffer 2014-01-30 20:53:51 UTC

(In reply to Ian 'Hixie' Hickson from comment #12)
> Given the new section, are the proposed APIs still needed, or is it
> sufficient?

I guess we can leave it for now. What is suggested in #c1 should work for scripts for now.

Comment 14 Brendan Long 2014-03-24 21:14:58 UTC

I'm not seeing why this bug and 24161 are marked "fixed". Forcing JavaScript to keep track of all previous cues just so we can tell which cues were affected by a cuechange event isn't a solution, it's a workaround.

There are several use-cases for this, and they all boil down to "short cues that we don't want to miss". The best-practices change doesn't help because it assumes a static file, but the problems are mainly an issue with live content (which doesn't work properly in out-of-band WebVTT at the moment, but works fine for in-band, except for all of the static-file assumptions being made in W3C and WHATWG).

Just at an "ease of use" level, I find it baffling that we have an "*change" event that doesn't tell you what changed. Is there any case where someone would listen for this event, and then not immediately want to know which cues changed?

The current solution is non-ideal even in situations where it works: You get a cuechange event, then look in TextTrack.activeCues, then filter out cues that you already dealt with because all you care about is new cues (unless you're just throwing away whatever you did to these cues and redoing it every time).

(Of course, if we want to go all the way towards ease of use, it would be better to have three events, for cues added, changed, or removed.)
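
The filtering step described here might be sketched as follows. The helper name createNewCueFilter is hypothetical, and arrays of plain objects stand in for TextTrack.activeCues; the sketch treats a cue as "new" when it is active now but was not active at the previous cuechange.

```javascript
// Returns a function that, given the current active cues, reports only the
// ones that were not active the last time it was called.
function createNewCueFilter() {
  let previouslyActive = new Set();
  return function takeNew(activeCues) {
    const current = Array.from(activeCues); // also accepts array-like cue lists
    const fresh = current.filter((cue) => !previouslyActive.has(cue));
    previouslyActive = new Set(current);
    return fresh;
  };
}
```

Note that this still cannot report a cue that was entered and exited between two events, since such a cue never appears in the active list, which is the gap this bug is about.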

Comment 15 Brendan Long 2014-03-24 21:55:04 UTC

We talked about this a little bit more internally, it sounds like the problem might be that cuechange is designed for cases where you want to keep track of the active cues, but it only sort-of works for live or in-band cues.

Maybe a better solution is to just add one new "cuesadded" event, with an attribute listing cues that have been added to the list of cues (because we've found more in-band cues, or once live out-of-band cues work, because there's new live cues). These very short cues are generally metadata, so all we want is the ability to intercept every cue that gets added and add onenter and onexit events (for displaying ads for example).

Comment 16 Ian 'Hixie' Hickson 2014-03-31 22:11:54 UTC

(In reply to Brendan Long from comment #14)
> There are several use-cases for this, and they all boil down to "short cues
> that we don't want to miss".

Can you elaborate on these use cases? So far, all the use cases that people have put forward are better done as long-lived cues and thus not affected by this.

Comment 17 Bob Lund 2014-04-01 15:40:43 UTC

(In reply to Ian 'Hixie' Hickson from comment #16)
> (In reply to Brendan Long from comment #14)
> > There are several use-cases for this, and they all boil down to "short cues
> > that we don't want to miss".
> 
> Can you elaborate on these use cases? So far, all the use cases that people
> have put forward are better done as long-lived cues and thus not affected by
> this.

SCTE-35 [1] and ETV [2] signaling are non-video/audio elementary streams
in MPEG-2 TS that will be exposed as metadata text tracks. The user agent
needs to pass the private section data to the web app as closely in time
to when the UA received it as possible. The web app uses information in
the private data to synchronize its operation with media stream.

An instantaneous cue is a better model than a long lived cue.

[1] http://www.scte.org/documents/pdf/Standards/ANSI_SCTE%2035%202013a.pdf
[2] http://www.cablelabs.com/wp-content/uploads/specdocs/OC-SP-ETV-BIF1.0.1-120614.pdf

Comment 18 Ian 'Hixie' Hickson 2014-04-01 16:38:48 UTC

> An instantaneous cue is a better model than a long lived cue.

Not for the Web.

I don't think it makes sense to directly port legacy standards from a different industry into the Web. You should design things specifically to work for the Web. The Web has very different characteristics, such as the user having control of the playback device, the playback device needing to share CPU with other tasks, the connection being unreliable, etc.

Supporting legacy formats on the Web isn't a use case. Use cases are things that describe what the user sees. To put it another way: supporting a legacy format is a solution, not a problem. What's the problem?

Comment 19 Brendan Long 2014-04-01 17:06:55 UTC

(In reply to Ian 'Hixie' Hickson from comment #18)
> Supporting legacy formats on the Web isn't a use case. Use cases are things
> that describe what the user sees. To put it another way: supporting a legacy
> format is a solution, not a problem. What's the problem?

Ad insertion is the simplest thing I can think of off the top of my head:

Say I have a video and I want to display ads every 5 minutes. I would create a metadata track with a cue with startTime = endTime = 00:05:00, then 00:10:00, etc. If this was a static file, I could easily look in that metadata track before the video starts and set 'onenter' to a function which pauses the video, displays another on top, then when that ends, removes the overlay and unpauses the video. If the file is live, then the situation becomes annoying:

  * We can't set onenter beforehand because the cues don't necessarily exist yet.
  * We can't necessarily catch new cues using oncuechange, because they're almost guaranteed to be "missed" (because their duration = 0).
  * Even if we set the duration to be some arbitrary value, there's still the chance that something would keep the processor busy long enough that the cue would still be missed.

As far as I can tell, the only reliable way of handling this right now is to keep track of every cue you've ever seen, then when you see "cuechange", you compare your list to the track.cues list to find out which cues have been added.
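
A minimal sketch of that comparison is shown below, with plain objects standing in for cues and a hypothetical helper name (createAddedCueDetector). It remembers every cue it has been shown and reports the ones that are new in the list.

```javascript
// Returns a function that compares a cue list against everything seen so far
// and reports the cues that were not there before.
function createAddedCueDetector() {
  const known = new Set(); // every cue object seen so far
  return function findAdded(cues) {
    const added = Array.from(cues).filter((cue) => !known.has(cue));
    added.forEach((cue) => known.add(cue));
    return added;
  };
}
```

The set grows for as long as the stream runs unless entries are dropped when cues leave the track, which is part of why this reads as a workaround rather than a solution.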

I think this situation would occur in any situation where you have live cues that can't be skipped (for subtitles, missing a cue is fine, but skipping an ad insertion cue is *bad*). Right now the only time you can have live cues is in-band, but presumably this will be a problem for out-of-band too once we figure out how to make that work with streaming.

Comment 20 Jon Piesing (HbbTV) 2014-04-03 09:24:41 UTC

(In reply to Ian 'Hixie' Hickson from comment #18)
> > An instantaneous cue is a better model than a long lived cue.
> 
> Not for the Web.
> 
> I don't think it makes sense to directly port legacy standards from a
> different industry into the Web. You should design things specifically to
> work for the Web. The Web has very different characteristics, such as the
> user having control of the playback device, the playback device needing to
> share CPU with other tasks, the connection being unreliable, etc.
> 
> Supporting legacy formats on the Web isn't a use case. Use cases are things
> that describe what the user sees. To put it another way: supporting a legacy
> format is a solution, not a problem. What's the problem?

The problem is presenting live TV from an HTML5 page.

Live TV includes in-band triggers, events (etc).

The standards SCTE-35 [1] and ETV [2] mentioned by Bob are US-centric solutions but there are similar standards used in Europe and probably in the Japanese derived TV systems.

Comment 21 Ian 'Hixie' Hickson 2014-05-07 18:36:11 UTC

Why would you want to present live TV on an HTML page? How would that even work?

Comment 22 Jon Piesing (HbbTV) 2014-05-08 09:36:27 UTC

(In reply to Ian 'Hixie' Hickson from comment #21)
> Why would you want to present live TV on an HTML page?

What follows is from a European perspective and I know the US is different ....

Most TV sets other than the really low-end models now include a browser and either WiFi or ethernet. These browsers are used for a variety of services but the thing most people want to do with a TV set is watch TV on it. Many cable / satellite / IPTV network operators include a browser in their set-top boxes.

So one of the main things these browsers are used for is finding and presenting TV content. Many TV broadcasters offer their content direct to consumers without (or as well as) going through someone like Netflix or a cable operator. As an example, the BBC have their iPlayer service and there's an HTML version of that which enables users to find and play TV content from the BBC. The German broadcasters ARD and ZDF have their Mediathek and there's an HTML version of this.

These HTML services all offer "catch-up" TV but many of them also offer (or want to offer) live as well.

> How would that even work?

These HTML services can do live TV in several ways:
- Simple HTTP streaming (a file of logically infinite size although only a sliding window from it will really be available in the network at any time). This is basically taking the content that would go into the broadcast and also encoding & packaging it for HTTP.
- MPEG DASH with the MPD dynamically updating as time goes on. This is basically taking the content that would go into the broadcast and also encoding & packaging it for MPEG-DASH.
- Some TV sets or STBs support presenting broadcast (cable/satellite/terrestrial) video via the object element. If enough of the TVs/STBs in a market support this then this can be relevant to content providers/distributors. In the future, something like the work just starting in the W3C-ish "TV Control API Community Group" might enable this to be done using the HTML5 video element rather than the object element.
- Some TV sets or STBs may support presenting broadcast video using something like the 'tv:' URL with the HTML5 video element. If enough of the TVs/STBs in a market support this then this can be relevant to content providers/distributors.

Obviously the last 2 of these would only work on browsers in a device that has a TV tuner and where the browser and the TV tuner are connected but this is the case in many TV sets now.

Comment 23 Ian 'Hixie' Hickson 2014-05-08 17:51:41 UTC

That's not "presenting live TV on a Web page", that's some internal feature of the TV. We don't need HTML to support that. You can just make your browser do whatever it is you need as a custom extension.

Comment 24 Brendan Long 2014-05-09 16:31:23 UTC

(In reply to Ian 'Hixie' Hickson from comment #23)
> That's not "presenting live TV on a Web page", that's some internal feature
> of the TV. We don't need HTML to support that. You can just make your
> browser do whatever it is you need as a custom extension.

What we're asking for in this bug report isn't a TV-specific feature. We just want to be able to live stream videos and handle metadata properly. Some of these UIs will be available on the network, and I'd personally like for them to be usable on normal browsers where it's reasonable to do so.

Consider this use-case:

My TV company provides a box which has an HTTP server and a transcoder, and it presents a standard HTML5 UI with a streaming WebM video with embedded WebVTT metadata (could be used for various things like ad-insertion, displaying overlays, etc.). It would be really nice if I could just point the browser on my laptop at that server and watch TV. The interface wouldn't be doing anything special, but it wouldn't work right because there's no reasonable way to handle the metadata cues, even if we use the HTML5-standard-ish formats.

Comment 25 Ian 'Hixie' Hickson 2014-05-09 17:14:14 UTC

I don't understand the problem here.

If you want to pause the video occasionally to insert some ad, then just have a cue that represents the "chapter" which is uninterrupted by ads, with pauseOnExit, and then at the end of that chapter, the UA pauses, and you can play the ad, and then resume the original stream. You clearly have to know where you're going to insert the ad ahead of time, otherwise how would you precache the ad? I really don't understand what you're trying to do here.

The descriptions of the problems from Jon and Brendan seem different, too.

It may be that there are multiple issues here and that we are getting confused by cross-talk.

I would recommend filing new bugs that succinctly describe the exact user-facing use case without referencing solutions like WebM or HTML5 or WebVTT or MPEG2, etc. For example, "I want to stream content from one server, and occasionally splice ads from another server", or something, and then, once the description like that is written, separately add a description of why that is not currently possible.

Comment 26 Brendan Long 2014-05-12 21:35:47 UTC

(In reply to Ian 'Hixie' Hickson from comment #25)
> If you want to pause the video occasionally to insert some ad, then just
> have a cue that represents the "chapter" which is uninterrupted by ads, with
> pauseOnExit, and then at the end of that chapter, the UA pauses, and you can
> play the ad, and then resume the original stream. You clearly have to know
> where you're going to insert the ad ahead of time, otherwise how would you
> precache the ad? I really don't understand what you're trying to do here.

Ok, just one more step: How do you do this in MPEG-TS? Or if you'd prefer, how do we transcode from MPEG-TS to this format with a live stream? Or is the "solution" that we convince all of the content creators to start using this new proposed format? That might work eventually, but it's not going to happen quickly.

I'm confused about this whole thread. What's so controversial about wanting a way to not miss cues? It's not like we're asking for some feature that could only be used for obscure TV services, we just want to know every time we pass a cue, even if it won't be displayed.

As for pre-caching ads, it would also be nice if there was an event when cues are added to a cue list (in a live stream, for example).

Comment 27 Ian 'Hixie' Hickson 2014-05-13 18:39:19 UTC

(In reply to Brendan Long from comment #26)
> Ok, just one more step: How do you do this in MPEG-TS?

I don't understand why the format of the video would matter here. Can you elaborate? What about MPEG-TS makes it hard to know when you'll insert the ad?


> I'm confused about this whole thread. What's so controversial about wanting a
> way to not miss cues?

All new features are expensive. Unless there's a strong reason to add a new feature, we shouldn't add it. So all feature requests face this same need to justify their cost. It's nothing particular to this feature request.


> It's not like we're asking for some feature that could
> only be used for obscure TV services, we just want to know every time we
> pass a cue, even if it won't be displayed.

That's a solution, not a problem. What one presumably really wants is to display an ad, or a subtitle, or whatever. How we do this, whether it's by detecting a missed zero-width cue, or detecting when we're inside a cue, or whatnot, is the solution.


> As for pre-caching ads, it would also be nice if there was an event when
> cues are added to a cue list (in a live stream, for example).

That makes sense. Can you file a bug for that? http://whatwg.org/newbug

Comment 28 Brendan Long 2014-05-13 21:12:27 UTC

> > As for pre-caching ads, it would also be nice if there was an event when
> > cues are added to a cue list (in a live stream, for example).
> 
> That makes sense. Can you file a bug for that? http://whatwg.org/newbug

https://www.w3.org/Bugs/Public/show_bug.cgi?id=25693

I think this alone would be enough for our use-case, since we could use onenter or onexit events.

  1. "cueadded" event fires.
  2. Start pre-loading the ad.
  3. Add "cueenter" event handler to switch to the ad.

It's a bit more complicated than just checking the contents of a list in "cuechange" (if it existed), but in this case we would need the cueadded callback anyway. I'll let you know if there are any cases where this would be particularly annoying or complex.
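
The three steps above could be wired together roughly as in the sketch below. Everything about the track here is a stand-in: the "cueadded" notification is hypothetical (it models the event requested in bug 25693, not an existing API), and createFakeTrack, armAdInsertion, preloadAd and switchToAd are names invented for this example. Only the onenter handler on the cue corresponds to something cues already have.

```javascript
// Stand-in for a text track that notifies listeners when a cue is added.
// The notification mechanism is hypothetical; see bug 25693.
function createFakeTrack() {
  const listeners = [];
  return {
    cues: [],
    onCueAdded(listener) {
      listeners.push(listener);
    },
    addCue(cue) {
      this.cues.push(cue);
      listeners.forEach((listener) => listener(cue));
    },
  };
}

// Wire up the three steps for every cue that arrives on the track.
function armAdInsertion(track, preloadAd, switchToAd) {
  track.onCueAdded((cue) => {            // 1. "cueadded" fires
    const ad = preloadAd(cue);           // 2. start pre-loading the ad
    cue.onenter = () => switchToAd(ad);  // 3. enter handler switches to the ad
  });
}
```

A zero-duration cue can still be skipped by the active-cue list, but its enter handler is attached as soon as the cue exists, which is what makes this flow workable for live content.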

Comment 29 Ian 'Hixie' Hickson 2014-07-31 22:35:22 UTC

Can you elaborate on the first part of comment 26?

> Ok, just one more step: How do you do this in MPEG-TS?

I'm not following how the format affects the suggestion in comment 25 ("just have a cue that represents the "chapter" which is uninterrupted by ads, with pauseOnExit, and then at the end of that chapter, the UA pauses, and you can play the ad, and then resume the original stream").

Comment 30 Brendan Long 2014-08-06 20:16:44 UTC

(In reply to Ian 'Hixie' Hickson from comment #29)
> Can you elaborate on the first part of comment 26?
> 
> > Ok, just one more step: How do you do this in MPEG-TS?
> 
> I'm not following how the format affects the suggestion in comment 25 ("just
> have a cue that represents the "chapter" which is uninterrupted by ads, with
> pauseOnExit, and then at the end of that chapter, the UA pauses, and you can
> play the ad, and then resume the original stream").

I think I misunderstood your suggestion. With the change in bug #25693, we should be able to handle ad-insertion by adding handlers and pause-on-exit as soon as the cue is added to the track.

I think this change may still be worthwhile because it simplifies things (you just add one event handler to the track instead of a handler on every cue), but it's not strictly necessary.

Comment 31 Ian 'Hixie' Hickson 2014-09-12 17:14:19 UTC

I just noticed I never replied to comment 19. My apologies.

(In reply to Brendan Long from comment #19)
> 
> Ad insertion is the simplest thing I can think of off the top of my head:
> 
> Say I have a video and I want to display ads every 5 minutes. I would create
> a metadata track with a cue with startTime = endTime = 00:05:00, then
> 00:10:00, etc.

The way to do this (if you really want to use text tracks for this, which I find a bit weird to be honest) is to instead create a metadata cue with startTime = 0, endTime = 5min, and another with startTime = 5min, endTime = 10min, etc. Then each one would have pauseOnExit set, and the track would have a listener for 'cuechange' events (or you could put a listener on each cue, that's as easy).
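
A sketch of generating such segment cues follows. Plain objects are used so the example runs on its own; in a page each entry would presumably become a VTTCue with pauseOnExit set, added to a metadata track with addCue. The function name buildSegmentCues is made up for illustration.

```javascript
// Build back-to-back cues covering [0, totalDuration), each `segmentLength`
// seconds long (the last one may be shorter), all marked pause-on-exit.
function buildSegmentCues(totalDuration, segmentLength) {
  if (!(segmentLength > 0)) {
    throw new RangeError("segmentLength must be a positive number");
  }
  const cues = [];
  for (let start = 0; start < totalDuration; start += segmentLength) {
    cues.push({
      startTime: start,
      endTime: Math.min(start + segmentLength, totalDuration),
      pauseOnExit: true, // playback pauses when the segment ends
    });
  }
  return cues;
}
```

For example, buildSegmentCues(720, 300) yields three cues covering 0 to 300, 300 to 600 and 600 to 720 seconds. Because each cue spans its whole segment, seeking anywhere lands inside exactly one of them.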


>   * We can't set onenter beforehand because the cues don't necessarily exist
> yet.

This is why I wouldn't use cues for this, for what it's worth. If you really just want an ad every five minutes (five minutes! that's obscene, from a user perspective) then just set a timer or something like that. No need for a metadata track and all that.


>   * We can't necessarily catch new cues using oncuechange, because they're
> almost guaranteed to be "missed" (because their duration = 0).

This is a non-issue if you make them last the entire segment. If you miss them, then the user probably didn't get a user experience worth giving an ad for in the first place! (Or maybe she seeked past the entire segment, or some such, in any case, showing an ad is pretty lame in that situation.)

In any case, 'cuechange' does fire for missed cues.


>   * Even if we set the duration to be some arbitrary value, there's still
> the chance that something would keep the processor busy long enough that the
> cue would still be missed.

I'm not suggesting an arbitrary value. I'm suggesting making it the length of the segment. See above for a discussion about still missing cues even in that case.


> I think this situation would occur in any situation where you have live cues
> that can't be skipped (for subtitles, missing a cue is fine, but skipping an
> ad insertion cue is *bad*).

Um, no. Missing an ad insertion cue is a trivial concern. In fact, it improves the user experience. The only impact is marginal lost revenue. Missing a subtitle cue, on the other hand, means the user is missing content. That's as critical as it gets.

Comment 32 Brendan Long 2014-09-15 16:19:47 UTC

I can see your point, and bug #25693 fixes the biggest problem. I can just close this if you want?

Comment 33 Ian 'Hixie' Hickson 2014-09-15 22:01:27 UTC

If you are happy to close the bug, that works for me!