14260 – <track> "text tracks ready" and HTMLMediaElement.readyState

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	RESOLVED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML5 spec
Version:	unspecified
Hardware:	Other other

Importance:	P3 normal
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	HTML WG Bugzilla archive list

URL:	http://www.whatwg.org/specs/web-apps/...

Reported:	2011-09-23 13:32 UTC by contributor
Modified:	2012-10-12 09:33 UTC
CC List:	7 users

Description contributor 2011-09-23 13:32:57 UTC

Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html
Multipage: http://www.whatwg.org/C#the-text-tracks-are-ready
Complete: http://www.whatwg.org/c#the-text-tracks-are-ready

Comment:
<track> "text tracks ready" and HTMLMediaElement.readyState

Posted from: 83.218.67.122 by philipj@opera.com
User agent: Opera/9.80 (X11; Linux x86_64; U; Edition Next; en) Presto/2.9.205 Version/12.00

Comment 1 Philip Jägenstedt 2011-09-23 13:44:52 UTC

"The text tracks of a media element are ready if all the text tracks whose mode was not in the disabled state when the element's resource selection algorithm last started now have a text track readiness state of loaded or failed to load."

When the parser inserts a <video> element into a document this starts the resource selection algorithm, but at that time there will be no child <track> elements at all. By this definition, the text tracks will be immediately ready in the typical case, which clearly is not the intention.
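
A minimal sketch of the problem, assuming a toy model in which each track has a mode and a readiness state. The type and function names are hypothetical and not spec or browser API. The quoted definition only looks at tracks that existed when resource selection last started, so an empty snapshot is vacuously "ready":

```typescript
// Toy model of the quoted definition; names are illustrative, not a real API.
type TrackMode = "disabled" | "hidden" | "showing";
type Readiness = "not loaded" | "loading" | "loaded" | "failed to load";

interface ToyTrack {
  mode: TrackMode;
  readiness: Readiness;
}

// Snapshot taken when the resource selection algorithm starts:
// only tracks that are not disabled at that moment are remembered.
function snapshotAtSelectionStart(tracks: ToyTrack[]): ToyTrack[] {
  return tracks.filter((t) => t.mode !== "disabled");
}

// "Ready" per the quoted definition: every remembered track has
// either loaded or failed to load.
function textTracksReady(remembered: ToyTrack[]): boolean {
  return remembered.every(
    (t) => t.readiness === "loaded" || t.readiness === "failed to load"
  );
}

// The parser inserts <video> before any child <track> exists,
// so the snapshot is empty.
const parsedChildren: ToyTrack[] = [];
const startSnapshot = snapshotAtSelectionStart(parsedChildren);

// A <track> parsed later is still loading but is not in the snapshot.
parsedChildren.push({ mode: "showing", readiness: "loading" });

// True even though a showing track is still loading.
const readyTooEarly = textTracksReady(startSnapshot);
```

Evaluating readiness against the current children instead (`textTracksReady(parsedChildren)`) gives false here, which is closer to the intent.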

Keeping track of what the state was at some point in the past is a bit annoying; if at all possible, it would be nice if readiness depended only on the current state.

My assumption is that this will be fixed so that adding tracks after HAVE_METADATA does not drop back to HAVE_NOTHING; instead it just tries to catch up with the new track ASAP.

Comment 2 Philip Jägenstedt 2011-09-23 14:00:14 UTC

Tied into this problem is really how to avoid waiting for </video> before going to HAVE_METADATA, which we've avoided so far with <source>. However, it seems unavoidable that if we don't wait for </video>, there's no way to guarantee that all tracks sourced from <track> elements will actually be loaded by HAVE_METADATA. Having a single event that guarantees that is kind of useful; I used it in my demos at least.

Comment 3 Simon Pieters 2011-09-23 14:21:17 UTC

I think text tracks shouldn't block HAVE_METADATA. If the text tracks are slow to load, this would mean the video is stuck at HAVE_NOTHING even if the first frame is available. The user might not even want to see the beginning of the video but wishes to seek to the middle of the video ASAP. If the video is stuck at HAVE_NOTHING while the text tracks are loading, the user can't start the seek.

Instead I suggest that we don't transition to HAVE_ENOUGH_DATA until the </video> end tag has been parsed and all text tracks are ready. This prevents autoplay from starting before the tracks are ready. If an author wants to know when a track is loaded, they can listen for 'load' on the track itself.

Comment 4 Philip Jägenstedt 2011-09-23 14:53:03 UTC

Do you also think that adding a new track when readyState==HAVE_ENOUGH_DATA should revert the state to HAVE_FUTURE_DATA or HAVE_CURRENT_DATA?

Comment 5 Simon Pieters 2011-09-24 12:58:04 UTC

The only reason to block HAVE_ENOUGH_DATA is to prevent autoplay from starting until the tracks are ready. If autoplay has (or would have) already started, there's no use in flipping back readyState, I think.

Comment 6 Philip Jägenstedt 2011-09-26 08:21:21 UTC

Given that both HTMLMediaElement and TextTrack have a readyState, perhaps the cleanest thing to do is to not entangle these states in one another. If all we want is to delay autoplay, perhaps we could fiddle with the autoplaying flag and check it again once all tracks are loaded?

Comment 7 Simon Pieters 2011-09-26 08:35:09 UTC

Yes.

Comment 8 Ian 'Hixie' Hickson 2011-10-02 18:14:27 UTC

The idea here is that it would be as bad to start playback without the enabled tracks loaded as it would be to start playback without the audio or video track loaded. With audio and video, we can do incremental loading, which is good. With the text tracks, we can't, but they're small, so it's not a big deal.

Maybe the solution is to prevent <video> from getting to HAVE_CURRENT_DATA at any time there are outstanding loads? (But then we'd need a way to load metadata them in the background without blocking.) There'd still be a race if the parser stalled after <video> and before <track> for long enough for the video itself to load, but then as soon as a subtitle track was enabled the video would pause again while it loaded.

I'd like to avoid depending on </video>, but we do have precedent for that (with </object>). So it's not the end of the world if we do it.

Comment 9 Simon Pieters 2011-10-03 08:05:16 UTC

(In reply to comment #8)
> The idea here is that it would be as bad to start playback without the enabled
> tracks loaded as it would be to start playback without the audio or video track
> loaded. With audio and video, we can do incremental loading, which is good.
> With the text tracks, we can't, but they're small, so it's not a big deal.

Right.

> Maybe the solution is to prevent <video> from getting to HAVE_CURRENT_DATA at
> any time there are outstanding loads? (But then we'd need a way to load
> metadata them in the background without blocking.)

That's not much better than the status quo (you wouldn't render the first frame).

> There'd still be a race if
> the parser stalled after <video> and before <track> for long enough for the
> video itself to load, but then as soon as a subtitle track was enabled the
> video would pause again while it loaded.
> 
> I'd like to avoid depending on </video>, but we do have precedent for that
> (with </object>). So it's not the end of the world if we do it.

Indeed, I think we need to wait for </video>. I also prefer Philip's suggestion to let the "got video end tag" flag and the tracks' readyState influence the autoplaying flag instead of influencing the video's readyState.

Comment 10 Ian 'Hixie' Hickson 2011-10-04 00:42:15 UTC

Well we still need to block just a regular .play(), as far as I can tell. So it can't just be about autoplay.

Comment 11 Philip Jägenstedt 2011-10-04 09:29:31 UTC

Is the intention to block only the initial playback, or to pause playback while switching tracks mid-stream as well?

Comment 12 Ian 'Hixie' Hickson 2011-10-20 23:36:36 UTC

The idea is to treat active text tracks the same way as active audio or video tracks — just like we don't ever play anywhere when we're buffering audio or video, we shouldn't ever play anywhere while we're still loading an active text track.

I don't really mind how we fix this so long as we keep that invariant.

Comment 13 Silvia Pfeiffer 2011-10-21 00:03:12 UTC

We could instead treat text tracks as part of the audio or video data. This would mean that until the header part (of the WebVTT file) is downloaded, the video element is prevented from going to HAVE_METADATA, and until the cue for the playback time (or the next cue after that or the file end) is downloaded, it is prevented from going to HAVE_ENOUGH_DATA. This would allow text tracks to be streamable, too, and support live scenarios.

Comment 14 Philip Jägenstedt 2011-10-21 08:00:31 UTC

If we really treat them as we do audio and video tracks and have them influence HTMLMediaElement.readyState, then switching text tracks while playing would need to bring readyState back to HAVE_NOTHING while the new track is loading and playback would have to pause. That's not the experience you get in most good desktop media players, where the text simply appears when it's ready. Also, if you drop readyState back to HAVE_NOTHING, then videoWidth and videoHeight would start returning 0, setting currentTime would be a no-op and you'll eventually fire loadedmetadata again.

Comment 15 Silvia Pfeiffer 2011-10-21 09:51:50 UTC

In essence, I am trying to make it such that the behaviour is the same when the text track comes from an in-band track as from an external one.

Comment 16 Philip Jägenstedt 2011-10-21 10:57:46 UTC

(In reply to comment #15)
> In essence, I am trying to make it such that the behaviour is the same when the
> text track comes from an in-band track as from an external one.

AFAICT, we can't let out-of-band tracks both affect readyState as if they were part of the media resource and not stall playback when switching, so out-of-band and in-band can't really have the same behavior.

API- and implementation-wise, it seems sanest to not conflate the readyState of the media resource with that of the text track. UX-wise, it seems best to not cause playback to stall when switching tracks.

Comment 17 Ian 'Hixie' Hickson 2011-10-25 22:40:06 UTC

I don't mind skipping tracks when switching. The user can always pause before doing that. It's the ones at the start of playback that matter more.

How about blocking going to HAVE_FUTURE_DATA until both:

a) <video> is out of the stack of open elements in the parser, and
b) any active <track>s at that time are ready
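
The two conditions above can be read as a cap on readyState. The following is only a sketch under that reading: the numeric values match the HTMLMediaElement readyState constants, while the helper name and its parameters are hypothetical.

```typescript
// Numeric values of the HTMLMediaElement readyState constants.
const ReadyState = {
  HAVE_NOTHING: 0,
  HAVE_METADATA: 1,
  HAVE_CURRENT_DATA: 2,
  HAVE_FUTURE_DATA: 3,
  HAVE_ENOUGH_DATA: 4,
} as const;

// Hypothetical helper: takes the state the media data alone would justify
// and holds it at HAVE_CURRENT_DATA until both conditions (a) and (b) hold.
function cappedReadyState(
  mediaDataState: number,
  videoPoppedFromParserStack: boolean, // condition (a)
  activeTracksReady: boolean // condition (b)
): number {
  const blocked = !(videoPoppedFromParserStack && activeTracksReady);
  return blocked
    ? Math.min(mediaDataState, ReadyState.HAVE_CURRENT_DATA)
    : mediaDataState;
}
```

With this cap the first frame can still be shown (HAVE_CURRENT_DATA), but playback cannot start until the tracks are in.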

Comment 18 Ian 'Hixie' Hickson 2011-10-25 22:45:14 UTC

(This would require changing the definition of "the text tracks are ready".)

Comment 19 Simon Pieters 2011-10-26 08:45:00 UTC

Sounds reasonable.

Comment 20 Philip Jägenstedt 2011-10-26 09:23:56 UTC

/me agrees, if it can only block on first load and the resource selection algorithm resets that state, all should be well.

Comment 21 Ian 'Hixie' Hickson 2011-10-26 20:41:30 UTC

Ok, I'll do that.

Comment 22 Ian 'Hixie' Hickson 2011-10-26 22:29:29 UTC

Hmm... I started speccing this, and I'm not sure I'm 100% comfortable with some of the implications here.

What exactly are we blocking on? Consider what happens if a text track is removed from a media element, is disabled, or is enabled, between the <track> element being parsed and the </video> tag being parsed. Which ones should block? What about tracks that are dynamically added? Consider a setTimeout() script racing the parser to add or remove something from the list, or an onerror handler on a <track> that updates the src="", racing with the parser:

   <video>
     <track ... onerror="update track src">
     ...
   </video>

Exactly which tracks should we wait for?

Comment 23 contributor 2011-10-26 22:36:01 UTC

Checked in as WHATWG revision r6767.
Check-in comment: Move the requirement that tracks be loaded to blocking HAVE_FUTURE_DATA rather than HAVE_CURRENT_DATA. (WIP)
http://html5.org/tools/web-apps-tracker?from=6766&to=6767

Comment 24 Philip Jägenstedt 2011-10-28 12:26:35 UTC

(In reply to comment #22)
> Hmm... I started speccing this, and I'm not sure I'm 100% comfortable with some
> of the implications here.
> 
> What exactly are we blocking on? Consider what happens if a text track is
> removed from a media element, is disabled, or is enabled, between the <track>
> element being parsed and the </video> tag being parsed. Which ones should
> block? What about tracks that are dynamically added? Consider a setTimeout()
> script racing the parser to add or remove something from the list, or an
> onerror handler on a <track> that updates the src="", racing with the parser:
> 
>    <video>
>      <track ... onerror="update track src">
>      ...
>    </video>
> 
> Exactly which tracks should we wait for?

Hmm, exactly what problem are we trying to solve? If it is only to block first play (auto or manual) when there are tracks in the markup, then a working but kludgy solution would be to consider only the <track> elements that are children of <video> at the time that </video> is parsed.

As for an error event handler changing src, I suggest that a track be considered "ready" the first time it either loads or fails.

However, if the entire point is to not play while the tracks are loading, perhaps we shouldn't fiddle with readyState but instead cause the transition from paused to playing to block on track readiness? That way changing the active tracks on a playing resource doesn't stall, but if it was paused it won't play until the tracks are ready. Would that work?
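
A rough sketch of that last alternative, assuming a toy player object (all names are hypothetical, not HTMLMediaElement API): only the paused-to-playing transition waits on track readiness, and readyState is left alone.

```typescript
// Toy model: playback start waits on track readiness; readyState is untouched.
class ToyPlayer {
  private paused = true;
  private playRequested = false;
  private tracksReady: boolean;

  constructor(tracksReady: boolean) {
    this.tracksReady = tracksReady;
  }

  isPaused(): boolean {
    return this.paused;
  }

  // play() (or autoplay) records the request and starts only if tracks are ready.
  play(): void {
    this.playRequested = true;
    this.tryStart();
  }

  // Called when track readiness changes, e.g. a track finishes loading
  // or a newly enabled track starts loading.
  setTracksReady(ready: boolean): void {
    this.tracksReady = ready;
    this.tryStart();
  }

  // Only the paused-to-playing transition is gated; once playing,
  // a later change in track readiness does not pause playback.
  private tryStart(): void {
    if (this.paused && this.playRequested && this.tracksReady) {
      this.paused = false;
      this.playRequested = false;
    }
  }
}
```

In this model, enabling a new track mid-playback calls `setTracksReady(false)` without any effect on a player that is already playing.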

Comment 25 Ian 'Hixie' Hickson 2011-11-03 16:11:17 UTC

I think we established that once the video has started playing, we don't want to block, since that would make the user experience worse.

So basically, your proposal is to have each <track> element have a "did load once" state, which is set to true whenever a track loads or fails to load, and is only ever set to false at element creation time? And when the <video> element is popped off the stack of open elements, or when it is created by any means other than a parser, the <video> element takes a snapshot of its list of child <track> elements, and delays transitioning to HAVE_FUTURE_DATA until that list's tracks have all set "did load once"?

We still have the problem of a timeout racing the parser to insert a track, but I guess we can live with that.

Comment 26 Philip Jägenstedt 2011-11-03 17:11:08 UTC

As Simon pointed out in IRC, it's a bit odd that <track> inserted by script behaves completely differently from <track> inserted by the parser. Perhaps another way is to define that </video> invokes the resource selection algorithm and to instead consider all child <track> elements the last time the resource selection algorithm was invoked to block the transition to HAVE_CURRENT_DATA?

To avoid taking an actual snapshot of the list of <track> elements, I'd probably implement it as a "blocking" flag that is set to true for child <track> elements when the resource selection algorithm runs.

P.S. I suggest blocking HAVE_CURRENT_DATA rather than HAVE_FUTURE_DATA because a cue might begin at 0.

Comment 27 Simon Pieters 2011-11-04 06:13:35 UTC

(In reply to comment #26)
> As Simon pointed out in IRC, it's a bit odd that <track> inserted by script
> behaves completely differently from <track> inserted by the parser. Perhaps
> another way is to define that </video> invokes the resource selection algorithm
> and to instead consider all child <track> elements the last time the resource
> selection algorithm was invoked to block the transition to HAVE_CURRENT_DATA?
> 
> To avoid taking an actual snapshot of the list of <track> elements, I'd
> probably implement it as a "blocking" flag that is set to true for child
> <track> elements when the resource selection algorithm runs.

This sounds good to me.

> P.S. I suggest blocking HAVE_CURRENT_DATA rather than HAVE_FUTURE_DATA because
> a cue might begin at 0.

Blocking HAVE_CURRENT_DATA means you can't show the first frame, which is bad. I think blocking HAVE_FUTURE_DATA is right, since it means you can show a frame but you can't start playback. The track doesn't need to appear at the same time as the first frame.

Comment 28 Philip Jägenstedt 2011-11-04 09:44:34 UTC

(In reply to comment #27)
> (In reply to comment #26)
> > P.S. I suggest blocking HAVE_CURRENT_DATA rather than HAVE_FUTURE_DATA because
> > a cue might begin at 0.
> 
> Blocking HAVE_CURRENT_DATA means you can't show the first frame, which is bad.
> I think blocking HAVE_FUTURE_DATA is right, since it means you can show a frame
> but you can't start playback. The track doesn't need to appear at the same time
> as the first frame.

Oh, Opera always has a frame in HAVE_METADATA, but I guess that's a "bug" then.

Comment 29 Ian 'Hixie' Hickson 2011-11-11 22:16:11 UTC

So:

- if the media element is still on a stack of open elements, then abort the resource selection algorithm at the top.
- when a media element is popped off a stack of open elements, run the resource selection algorithm.
- when the resource selection algorithm is started, flag any <track> elements in the media element as being load-blocking text tracks.
- when a text track loads or fails to load, clear its load-blockingness.
- prevent transition to HAVE_FUTURE_DATA when there are any tracks in the media element's list of text tracks that are load-blocking.

Sound right?
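
For illustration, the five steps above could be modelled roughly as follows. This is a sketch with hypothetical names, not spec text; the `selectionStarted` check is an added assumption reflecting that nothing is fetched before resource selection runs.

```typescript
// Toy model of the five steps; all names are hypothetical.
interface ModelTrack {
  loadBlocking: boolean;
}

class ModelMediaElement {
  private onStackOfOpenElements = true;
  private selectionStarted = false;
  private tracks: ModelTrack[] = [];

  addTrack(): ModelTrack {
    const track: ModelTrack = { loadBlocking: false };
    this.tracks.push(track);
    return track;
  }

  // Steps 1 and 3: abort while the element is still open in the parser;
  // otherwise flag every current <track> child as load-blocking.
  startResourceSelection(): void {
    if (this.onStackOfOpenElements) return;
    this.selectionStarted = true;
    for (const t of this.tracks) t.loadBlocking = true;
  }

  // Step 2: popping the element off the stack runs resource selection.
  popFromStackOfOpenElements(): void {
    this.onStackOfOpenElements = false;
    this.startResourceSelection();
  }

  // Step 4: a track that loads or fails to load stops blocking.
  trackLoadedOrFailed(track: ModelTrack): void {
    track.loadBlocking = false;
  }

  // Step 5: HAVE_FUTURE_DATA is reachable only with no load-blocking tracks.
  mayReachHaveFutureData(): boolean {
    return this.selectionStarted && !this.tracks.some((t) => t.loadBlocking);
  }
}
```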

Comment 30 Silvia Pfeiffer 2011-11-12 21:45:32 UTC

(In reply to comment #29)
> So:
> 
> - if the media element is still on a stack of open elements, then abort the
> resource selection algorithm at the top.
> - when a media element is popped off a stack of open elements, run the resource
> selection algorithm.
> - when the resource selection algorithm is started, flag any <track> elements
> in the media element as being load-blocking text tracks.
> - when a text track loads or fails to load, clear its load-blockingness.
> - prevent transition to HAVE_FUTURE_DATA when there are any tracks in the media
> element's list of text tracks that are load-blocking.
> 
> Sound right?

Sounds good to me.

Can we additionally have a timeout on the load-blockingness of tracks? Maybe "fails to load" already has a timeout, then that's fine. I just want to make sure the video doesn't get starved because a connection waits indefinitely on a text track.

Comment 31 Philip Jägenstedt 2011-11-14 09:34:23 UTC

(In reply to comment #29)
> So:
> 
> - if the media element is still on a stack of open elements, then abort the
> resource selection algorithm at the top.
> - when a media element is popped off a stack of open elements, run the resource
> selection algorithm.
> - when the resource selection algorithm is started, flag any <track> elements
> in the media element as being load-blocking text tracks.
> - when a text track loads or fails to load, clear its load-blockingness.
> - prevent transition to HAVE_FUTURE_DATA when there are any tracks in the media
> element's list of text tracks that are load-blocking.
> 
> Sound right?

Yes, sounds good to me.

(In reply to comment #30)
> Can we additionally have a timeout on the load-blockingness of tracks? Maybe
> "fails to load" already has a timeout, then that's fine. I just want to make
> sure the video doesn't get starved because a connection waits indefinitely on a
> text track.

That will eventually cause a network timeout error, so it won't block indefinitely.

Comment 32 Simon Pieters 2011-11-14 10:14:07 UTC

(In reply to comment #29)
> So:
> 
> - if the media element is still on a stack of open elements, then abort the
> resource selection algorithm at the top.

I'm not so happy with this. This sounds like something browsers will want to work around by speculatively running the resource selection algorithm to preload the right <source>. (Consider the case where there are external scripts inside the video which block the parser from seeing the </video> end tag.) If we can avoid blocking loading of <source>, I think it would be good.

> - when a media element is popped off a stack of open elements, run the resource
> selection algorithm.
> - when the resource selection algorithm is started, flag any <track> elements
> in the media element as being load-blocking text tracks.
> - when a text track loads or fails to load, clear its load-blockingness.
> - prevent transition to HAVE_FUTURE_DATA when there are any tracks in the media
> element's list of text tracks that are load-blocking.
> 
> Sound right?

How about:

- When resource selection starts, if the media element is on a stack of open elements, set wait_for_pop to true.
- When a media element is popped off the stack of open elements, set wait_for_pop to false and set load-blocking on any <track> children (that are not already loaded or have failed to load).
- When a text track loads or fails to load, clear its load-blockingness.
- Prevent transition to HAVE_FUTURE_DATA when wait_for_pop is true or there are any tracks in the media element's list of text tracks that are load-blocking.
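
A sketch of this variant, again as a toy model with hypothetical names. The difference from the earlier list is that resource selection is not aborted, so <source> loading can proceed while the element is still open; only the transition to HAVE_FUTURE_DATA is held back.

```typescript
// Toy model of the wait_for_pop variant; all names are hypothetical.
interface SketchTrack {
  settled: boolean; // already loaded or failed to load
  loadBlocking: boolean;
}

class SketchVideo {
  private onStack: boolean;
  private waitForPop = false;
  private tracks: SketchTrack[] = [];

  constructor(parserCreated: boolean) {
    this.onStack = parserCreated;
  }

  addTrack(): SketchTrack {
    const track: SketchTrack = { settled: false, loadBlocking: false };
    this.tracks.push(track);
    return track;
  }

  // Resource selection runs regardless; it only notes that the element
  // is still on the stack of open elements.
  startResourceSelection(): void {
    if (this.onStack) this.waitForPop = true;
  }

  // On pop: stop waiting, and let tracks that have not settled yet block.
  poppedOffStack(): void {
    this.onStack = false;
    this.waitForPop = false;
    for (const t of this.tracks) {
      if (!t.settled) t.loadBlocking = true;
    }
  }

  trackLoadedOrFailed(track: SketchTrack): void {
    track.settled = true;
    track.loadBlocking = false;
  }

  blocksHaveFutureData(): boolean {
    return this.waitForPop || this.tracks.some((t) => t.loadBlocking);
  }
}
```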

Comment 33 Ian 'Hixie' Hickson 2012-01-26 21:38:35 UTC

Simon's idea seems reasonable to me.

To recast it the way I would spec it:

"The text tracks of a media element created by a parser are ready if the media element has been popped off the stack of open elements and all its text tracks whose mode was not in the disabled state when the element was popped off the stack of open elements now have a text track readiness state of loaded or failed to load.

The text tracks of a media element not created by a parser are ready if the media element's text tracks whose mode was not in the disabled state when the element's resource selection algorithm last started now have a text track readiness state of loaded or failed to load."

How does that sound?

Comment 34 Simon Pieters 2012-01-27 09:49:39 UTC

(In reply to comment #33)
> Simon's idea seems reasonable to me.
> 
> To recast it the way I would spec it:
> 
> "The text tracks of a media element created by a parser are ready if the media
> element has been popped off the stack of open elements and all its text tracks
> whose mode was not in the disabled state when the element was popped off the
> stack of open elements now have a text track readiness state of loaded or
> failed to load.
> 
> The text tracks of a media element not created by a parser are ready if the
> media element's text tracks whose mode was not in the disabled state when the
> element's resource selection algorithm
> last started now have a text track readiness state of loaded or failed to
> load."
> 
> How does that sound?

Ah, my suggestion didn't cover the non-parser case.

For the non-parser case, I think "when the resource selection algorithm last started" should be after step 2 in the resource selection algorithm, since the <track> children might not have been inserted yet at the time the algorithm *started*.

I'm having slight trouble determining whether your text is equivalent to mine (ignoring the non-parser case). Therefore I think I prefer if this was specced by using flags on the elements and checks in the resource selection algorithm. It also seems more likely to be implemented correctly if it's specified this way.

Comment 35 Ian 'Hixie' Hickson 2012-04-25 18:10:24 UTC

So we define a list of pending text tracks, which is initially empty. When a parser-created media element is popped off the stack of open elements, and when a non-parser-created media element's resource selection algorithm reaches a stable state in step 2, the list of pending text tracks is populated with all the text tracks whose mode is not in the disabled state. Whenever a track element's parent node changes, it is removed from any media element's list of pending text tracks. Whenever a new media resource is selected, any in-band text tracks from the previous media resource are removed from the element's list of pending text tracks. Whenever a text track in a media element's list of pending text tracks changes its text track readiness state to loaded or failed to load, it is removed from the media element's list of pending text tracks.

The text tracks of a media element are ready when both the media element's list of pending text tracks is empty and either the element is not parser-created or the media element has been popped off the stack of open elements.
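
As a sketch only, the pending-list definition could be modelled like this. The class and method names are hypothetical, and the in-band track case is left out for brevity.

```typescript
// Toy model of the "list of pending text tracks"; names are hypothetical.
interface PendingModelTrack {
  mode: "disabled" | "hidden" | "showing";
}

class PendingModelElement {
  private parserCreated: boolean;
  private popped = false;
  private pending: PendingModelTrack[] = [];

  constructor(parserCreated: boolean) {
    this.parserCreated = parserCreated;
  }

  // Populate the list with every text track that is not disabled.
  private populatePending(tracks: PendingModelTrack[]): void {
    for (const t of tracks) {
      if (t.mode !== "disabled" && this.pending.indexOf(t) === -1) {
        this.pending.push(t);
      }
    }
  }

  // Parser-created element: populated when popped off the stack.
  poppedOffStack(tracks: PendingModelTrack[]): void {
    this.popped = true;
    this.populatePending(tracks);
  }

  // Non-parser-created element: populated when resource selection
  // reaches its stable state in step 2.
  selectionReachedStableState(tracks: PendingModelTrack[]): void {
    if (!this.parserCreated) this.populatePending(tracks);
  }

  // Called when a track loads, fails to load, or changes parent.
  removeFromPending(track: PendingModelTrack): void {
    this.pending = this.pending.filter((t) => t !== track);
  }

  textTracksReady(): boolean {
    return this.pending.length === 0 && (!this.parserCreated || this.popped);
  }
}
```

A disabled track never enters the list, so it never delays readiness in this model.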

And these steps are put into the relevant algorithms, not defined in one paragraph like above.

That sound ok?

Comment 36 Simon Pieters 2012-04-25 19:02:56 UTC

That sounds OK to me!

Comment 37 contributor 2012-06-26 22:12:36 UTC

Checked in as WHATWG revision r7148.
Check-in comment: Try to make the text track pending thing work better.
http://html5.org/tools/web-apps-tracker?from=7147&to=7148

Comment 38 Ian 'Hixie' Hickson 2012-06-26 22:12:51 UTC

Please review the change! :-)

Comment 39 Ian 'Hixie' Hickson 2012-06-27 21:49:34 UTC

(reopening this since the HTMLWG spec still needs this fix)

Comment 40 contributor 2012-07-18 07:30:36 UTC

This bug was cloned to create bug 17992 as part of operation convergence.

Comment 41 Silvia Pfeiffer 2012-09-07 19:21:53 UTC

> (reopening this since the HTMLWG spec still needs this fix)

Merged into HTMLWG spec: https://github.com/w3c/html/commit/108b001874d9c4bf7f50b1412f18fe8e1aa5af6a

Comment 42 Philip Jägenstedt 2012-10-12 09:33:51 UTC

Sorry for being slow, but I've now reviewed the spec change, and it looks good! I haven't tried implementing it yet, but if I did I would probably just implement it as a flag on <track> elements and not a list, since the definitions seem to exclude DOM-created and in-band tracks from ever being added to the list of pending text tracks.