This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The WebVTT syntax specification says about the start time of a cue: "The time represented by this WebVTT timestamp must be greater than or equal to the start time offsets of all previous cues in the file." This is an authoring requirement. However, the parsing section of the spec has no step that makes the parser ignore cues that are out of time order. This means that a parser implemented to the spec will pick up such cues and add them to the list of cues, to be used when they become relevant. Instead, I propose that we add a check to the parser and drop late cues on the floor. I believe this is the correct thing to do because the cue is out of order. I am particularly concerned about consistency with two situations: encapsulation into media files and live streaming of text. In the encapsulation case, if a cue is encapsulated into a media file and comes too late, the demuxer and decoder will only come across it at a time when it is too late to present it, and will therefore have to drop it on the floor. Similar reasoning applies to the live streaming case. I therefore suggest including such a requirement before step 38 of the parser at http://dev.w3.org/html5/webvtt/#parsing .
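For illustration, the proposed check could look roughly like the following. This is only a sketch of the idea, not spec text or any implementation's code: the `Cue` class and the `drop_out_of_order` function are hypothetical names, and times are assumed to be plain seconds.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical minimal cue: times are offsets in seconds."""
    start: float
    end: float
    text: str


def drop_out_of_order(cues):
    """Keep only cues whose start time is >= the start time of every earlier cue."""
    kept = []
    latest_start = float("-inf")
    for cue in cues:
        if cue.start < latest_start:
            # Proposed behaviour: a late cue is dropped on the floor.
            continue
        latest_start = cue.start
        kept.append(cue)
    return kept


cues = [
    Cue(0.0, 2.0, "first"),
    Cue(5.0, 7.0, "second"),
    Cue(3.0, 4.0, "late"),                   # starts before the previous cue
    Cue(5.0, 8.0, "same start is allowed"),  # equal start times are conforming
]
kept_texts = [cue.text for cue in drop_out_of_order(cues)]
```

Because a dropped cue always starts earlier than the latest start time seen so far, tracking only the kept cues gives the same result as tracking all previous cues in the file.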
I disagree. SRT parsers don't ignore out of order cues. SRT content has out of order cues (I don't have numbers readily available but I could check for it). It seems plausible that WebVTT files will have out of order cues, too. That it doesn't work for encapsulation and streaming is a good reason to make it non-conforming, but it's not a reason to drop cues for static files.
(In reply to comment #1)
> I disagree. SRT parsers don't ignore out of order cues.
Some do and some don't. It's not specified well enough what the right approach is.
> SRT content has out of order cues (I don't have numbers readily available but I could check for it).
> It seems plausible that WebVTT files will have out of order cues, too.
Since it's an authoring requirement in the WebVTT spec, authors should expect their cues to be dropped on the floor if they come out of order. Why have that requirement in the first place if we ignore it later on during actual processing? Also, what about the processing of chapters? We introduced a means to create chapter trees by nesting cues, see http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#text-tracks-describing-chapters . If we allow cues to be out of order, that makes it hard to create the chapter tree incrementally.
> That it doesn't work for encapsulation and streaming is a good reason to make
> it non-conforming, but it's not a reason to drop cues for static files.
How about consistency? A file that is embedded in a video would have different cues presented than the same file referenced statically. That's a situation that we should really avoid. As an author, I'd much rather find out that I've misplaced a cue while authoring the static file than notice that my cues go missing when the file is used elsewhere.
I agree with Simon here: dropping cues in the parser is more work for a worse result. Validity and parsing don't need to be in perfect sync, and I can't really see any downsides to handling this case silently. A parser could warn in the error console, though; I'll have a look at doing that in Opera.
Off topic: I guess I should stop putting <track> in the title now that we have the "TextTracks CG" Product?
(In reply to comment #4)
> Off topic: I guess I should stop putting <track> in the title now that we have
> the "TextTracks CG" Product?
Well, where it applies to both the WebVTT file format and the TextTrack API in HTML, we likely need both.
I've now added a console error to our implementation (no public build yet) to warn about out-of-order cues. It would be trivial to also drop cues in this situation, but I really don't think that's nice when handling them comes for free (since scripts can insert cues out of order).
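As a rough sketch of the warn-but-keep behaviour described here (not Opera's actual code; the `Cue` class, the `collect_with_warnings` function and the message wording are all made up for illustration), a parser could keep every cue, record a warning for each late one, and insert it into a list that is kept sorted by start time, which it needs to support anyway if scripts can add cues in any order:

```python
import bisect
from dataclasses import dataclass


@dataclass
class Cue:
    """Hypothetical minimal cue: times are offsets in seconds."""
    start: float
    end: float
    text: str


def collect_with_warnings(cues):
    """Keep all cues, sorted by start time, and report the out-of-order ones."""
    ordered = []   # cues sorted by start time
    starts = []    # start times of `ordered`, used as the bisect key
    warnings = []  # stand-in for messages sent to an error console
    latest_start = float("-inf")
    for index, cue in enumerate(cues):
        if cue.start < latest_start:
            warnings.append(
                f"cue {index} starts at {cue.start}s, "
                f"before an earlier cue at {latest_start}s"
            )
        else:
            latest_start = cue.start
        # bisect_right keeps file order among cues with equal start times.
        pos = bisect.bisect_right(starts, cue.start)
        starts.insert(pos, cue.start)
        ordered.insert(pos, cue)
    return ordered, warnings


ordered, warnings = collect_with_warnings([
    Cue(0.0, 2.0, "first"),
    Cue(5.0, 7.0, "second"),
    Cue(3.0, 4.0, "late"),  # out of order in the file, but still kept
])
ordered_texts = [cue.text for cue in ordered]
```

Here the late cue still ends up between the other two in the sorted list, so a static file plays as the author presumably intended, while the warning points the author at the non-conforming cue.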
I don't think the reasons in comment 0 are compelling. Out-of-order cues are non-conforming because there are situations where they can cause problems (e.g. streaming, editing), but I don't see why we'd want to start dropping the cues. It would just hurt users.