This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019.

Bug 16706 - WebVTT v2: rendering of overlapping caption cues
Summary: WebVTT v2: rendering of overlapping caption cues
Status: RESOLVED WONTFIX
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC (OS: All)
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: This bug has no owner yet - up for the taking
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-12 04:02 UTC by Silvia Pfeiffer
Modified: 2012-12-05 07:13 UTC
CC List: 4 users

See Also:


Attachments

Description Silvia Pfeiffer 2012-04-12 04:02:48 UTC
The WebVTT layout algorithm tries very hard to not move cues around once they've been displayed and to never obscure other cues. This means that for cues that overlap in time, the rendering will be in a place where there is space, sometimes with the earliest cue at the bottom. This is quite contrary to common reading order (top-to-bottom) and conflicts with the (mainly US?) style of scrolling captions up to make space for new caption text.
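To make the described behaviour concrete, here is a minimal Python sketch of it. It is a simplified model, not the spec's actual layout algorithm: it assumes single-line cues stacked in rows from the bottom of the video, where a displayed cue never moves and a new cue takes the lowest free row. The names Cue and stay_put_layout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    start: float  # seconds
    end: float    # seconds


def stay_put_layout(cues, times):
    """Simplified model of the 'never move a displayed cue' behaviour.

    Assumes single-line cues stacked in rows, row 0 being the bottom of the
    video. A cue keeps its row while it is showing; a newly shown cue takes
    the lowest free row. For each sample time, returns the texts of the
    showing cues ordered from the top row to the bottom row.
    """
    rows = {}  # index into cues -> assigned row
    frames = []
    for t in times:
        showing = {i for i, c in enumerate(cues) if c.start <= t < c.end}
        # Cues that are no longer showing give up their rows.
        rows = {i: r for i, r in rows.items() if i in showing}
        # Newly showing cues, earliest start first, take the lowest free row.
        for i in sorted(showing - set(rows), key=lambda j: cues[j].start):
            used = set(rows.values())
            row = 0
            while row in used:
                row += 1
            rows[i] = row
        # The highest row is rendered at the top of the screen.
        order = sorted(rows, key=lambda j: rows[j], reverse=True)
        frames.append([cues[j].text for j in order])
    return frames


cues = [Cue("A", 0, 4), Cue("B", 1, 5), Cue("C", 2, 6), Cue("D", 4.2, 7)]
for frame in stay_put_layout(cues, [0.5, 1.5, 2.5, 4.5]):
    print(frame)
# ['A']
# ['B', 'A']
# ['C', 'B', 'A']
# ['C', 'B', 'D']
```

In this model the earliest cue ends up at the bottom while later cues stack above it, and once A ends, D fills the freed bottom row below B and C, so top-to-bottom order no longer matches the order in which the cues started.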

Users have expressed preferences for both rendering approaches: for rolling previous captions out of the way and for adding new ones above existing ones.

Authors also want to have a means of controlling the display of time-overlapping cues.

An analysis of the use cases for "rollup captions", including several different suggestions for how explicit support for this rendering style could be added to the spec, can be found here: http://www.w3.org/community/texttracks/wiki/RollupCaptions
Comment 1 Ian 'Hixie' Hickson 2012-04-25 22:56:51 UTC
Other than in live captions as on US news programs, where can I see captions being scrolled out of the way to make room for overlapping captions? I've never seen this in DVDs, in the movie theatre, or anywhere that I can recall, either in the US or in Europe.

(I don't think US news live captioning is really the same as this case; that's more a matter of live captions in general, not scrolling overlapping captions. That is, the captions don't overlap, they're just incremental.)
Comment 2 Silvia Pfeiffer 2012-04-26 05:29:36 UTC
(In reply to comment #1)
> Other than in live captions as on US news programs, where can I see captions
> being scrolled out of the way to make room for overlapping captions? I've never
> seen this in DVDs, in the movie theatre, or anywhere that I can recall, either
> in the US or in Europe.

Other than for live captions on TV, you can find it on YouTube and other places where that same content is re-published. 

QuickTime supports it (see e.g. http://www.zeitanker.com/content/services/integration_services/pedestrian_closed_captions_roll_up). There are Flash plugins that support it, too (e.g. http://ncam.wgbh.org/webaccess/ccforflash/).

So there must be a market; otherwise the vendors would not implement it for their Web players. Apple have asked for this feature for WebKit, too.

In addition, YouTube want to use it to display captions created by speech recognition.

These are sufficient use cases IMO.


> (I don't think US news live captioning is really the same as this case; that's
> more a matter of live captions in general, not scrolling overlapping captions.
> That is, the captions don't overlap, they're just incremental.)

The way in which I think we have to deal with roll-up captions is to regard each line as its own WebVTT cue that is on screen for a certain duration and then disappears. During that time, new, time-overlapping cues are added to the roll-up and push the previous line out of the way (e.g. http://www.youtube.com/watch?v=H9Xv4DH6oI8). AFAIK there is no better way to model roll-up captions - or do you have a proposal?
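A minimal Python sketch of this per-line model, under stated assumptions: Cue, roll_up_layout and the depth parameter are hypothetical names rather than any WebVTT API, and the renderer simply keeps the newest showing lines at the bottom, pushing older ones up and dropping the oldest once the window is full.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    text: str
    start: float  # seconds
    end: float    # seconds


def roll_up_layout(cues, t, depth=3):
    """Texts of the cues showing at time t, ordered top to bottom.

    Each caption line is its own cue. Newer lines appear at the bottom and
    push older ones upward; at most `depth` rows are kept, so the oldest
    showing line is dropped first when the window is full.
    """
    showing = sorted((c for c in cues if c.start <= t < c.end), key=lambda c: c.start)
    return [c.text for c in showing[-depth:]]


lines = [Cue("line 1", 0, 6), Cue("line 2", 2, 8), Cue("line 3", 4, 10), Cue("line 4", 6, 12)]
print(roll_up_layout(lines, 5))  # ['line 1', 'line 2', 'line 3']
print(roll_up_layout(lines, 7))  # ['line 2', 'line 3', 'line 4']
```

The default depth of 3 is an arbitrary choice here; CEA-608 roll-up modes on TV use windows of 2, 3 or 4 rows.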
Comment 3 Ian 'Hixie' Hickson 2012-04-26 22:25:29 UTC
> These are sufficient use cases IMO.

They're not use cases at all, as far as I can tell.


> > (I don't think US news live captioning is really the same as this case; that's
> > more a matter of live captions in general, not scrolling overlapping captions.
> > That is, the captions don't overlap, they're just incremental.)
> 
> The way in which I think we have to deal with roll-up captions is to regard
> each line as its own WebVTT cue that is on screen for a certain duration and
> then disappears. During that time, new, time-overlapping cues are added to the
> roll-up and push the previous line out of the way (e.g.
> http://www.youtube.com/watch?v=H9Xv4DH6oI8). AFAIK there is no better way to
> model roll-up captions - or do you have a proposal?

I haven't seen any evidence that we need to model roll-up captions at all. They're less usable, and rarely used. As I mentioned above, the only time I've personally seen them used is in real-time transcription. If that's the use case, then let's examine it, and if that use case turns out to need scrolling, then fine; but the point is that we should start from that use case, not from the assumption that we need to do scrolling. Scrolling is a solution, not a use case.

See also e.g.:
   http://lists.w3.org/Archives/Public/public-texttracks/2011Dec/0010.html
Comment 4 Silvia Pfeiffer 2012-04-27 07:51:31 UTC
Did you see the report that I posted as an attachment to the wiki page at http://www.w3.org/community/texttracks/wiki/images/9/96/Live_captioning_report.PDF ?

I agree with everything you are saying (and also the analysis of PLY media) and still: half the users in that analysis said that they prefer the roll-up display style over the pop-on display style.

Also, Glenn posted this example: https://zewt.org/~glenn/overlapped-caption-example.mpg of time-overlapping captions. Even if you and Glenn can deal with the way that captions are displayed in this video, I do not find it natural at all, and neither did David. Natural *for us* would be if every line that was added moved the previous one up.

I am not saying that we have to remove the way in which you and Glenn want to have time-overlapping cues rendered. But I am saying that we absolutely have to add a means to display time-overlapping captions differently, too. Further I am saying that such an introduction will also solve the roll-up need for "live" captions as we know them from TV, which is a positive side effect.
Comment 5 Ian 'Hixie' Hickson 2012-10-01 23:20:38 UTC
That's a fascinating study. I misread it at first and was about to say that we should clearly support scrolling or word-by-word captions, but actually, the report supports the opposite position. They don't give their error margins, but section 3.1 details how in their study, deaf viewers spent about 25% more time reading instead of looking at the images when the subtitles were scrolling than when they were shown as blocks. (When I first read it I misunderstood the percentages they give in Table 2, thinking they were time spent reading, not time spent on images, but the text below the table is clear about this.)

IMHO this means we should actively _avoid_ supporting scrolling captions.
Comment 6 Ian 'Hixie' Hickson 2012-10-01 23:28:43 UTC
(Regarding the survey in the study: first, surveys about what users want are notoriously misleading, so I don't think we should base our decisions on those. But even if we did, fewer than half of the respondents wanted scrolled captions, slightly fewer than the number who preferred blocks. It's also possible that the reason they want scrolling is the perceived delay problem discussed in the earlier questions, which doesn't apply on the Web. So even if we do look at this study, and even if we ignore that it wasn't a plurality of respondents who preferred scrolling, we should still be careful, because the situation on the Web may be different enough that the results would be materially different if Web users were surveyed.)
Comment 7 Ian 'Hixie' Hickson 2012-12-04 22:20:07 UTC
This bug doesn't cite a use case.

Roll-up captions, which seem to be what this bug was meant to be about (?), are a misfeature, as discussed above.
Comment 8 Silvia Pfeiffer 2012-12-05 07:13:16 UTC
Sorry that your reading of the report came to a different conclusion to mine. When half the users prefer a specific format over another, there are sufficient grounds for that feature IMO.

Also, just to be clear: the current approach that is specified in WebVTT is not what the report says is the better approach. Instead, what we have is a completely non-intuitive way of dealing with cues that overlap in time: they are rendered where there is space on screen, i.e. either above a cue that is still on screen, or potentially also below one if there is space. I bet that if a study was made along the lines of the cited study, it would come out worse than the rollup approach.