This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
http://dev.w3.org/html5/webvtt/#webvtt-cue-text-rendering-rules
In step 10.13.15 of the rendering algorithm the cue boxes are moved back to the default position before trying the other direction, even the second time that point in the algorithm is reached. The net result is that if the cue did not fit at all, it's moved back to its default position where it will overlap some other cue.
The first part of the fix is to switch the order of steps 10.13.15 and 10.13.16 so that when "giving up" the cue boxes are left wherever they ended up. However, this is not enough to avoid overlap in all cases, since the condition in 10.13.12 is to check if any part of the cue box is outside the video rendering area, which means that the cues may still overlap.
There are two alternatives that we can see:
1. In step 10.13.16, if the cue is still overlapping any other cue, remove it from the output.
2. Modify 10.13.12 to have a different condition depending on the switched flag. If switched is false, step until any part of the cue box is outside the video rendering area. If switched is true, step until all of the cue box is outside the video rendering area.
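For illustration, the condition in alternative 2 could look roughly like this. It is only a sketch, assuming a cue box reduced to a vertical interval (top, bottom) and a rendering area spanning 0 to video_height; the function name and the 1-D model are made up here and are not spec text.

```python
from typing import Tuple

Box = Tuple[float, float]  # (top, bottom): hypothetical 1-D stand-in for a cue box


def should_stop_stepping(box: Box, video_height: float, switched: bool) -> bool:
    """Alternative 2: the stop condition of step 10.13.12 depends on `switched`."""
    top, bottom = box
    # Some part of the box sticks out of the rendering area.
    any_part_outside = top < 0 or bottom > video_height
    # No part of the box is left inside the rendering area.
    all_outside = bottom <= 0 or top >= video_height
    # First direction: stop as soon as anything is outside.
    # Second direction: keep going until the whole box is outside.
    return all_outside if switched else any_part_outside
```

With this condition the second pass keeps stepping through partially visible positions instead of giving up at the first one.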
After some experimentation with our implementation I think I've arrived at something sane that works well and is not too different from the current spec. The strategy is to accept the first non-overlapping and completely contained position before switching and to accept the first non-overlapping position after switching. In the terms of the spec: 11. Step loop. If switched is false, run the following substeps: 11.1. The current step 11 (possible jump to done positioning). 11.2. The current step 12 (possible jump to switch direction). 12. If switched is true and none of the boxes in boxes would overlap any of the boxes in output, then jump to the step labeled done positioning below. Steps 13-19 would remain as they are.
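A rough sketch of that strategy, assuming the same kind of 1-D model (a cue box is a vertical interval, the rendering area is 0 to video_height). The names `place`, `overlaps` and `contained`, and the `max_steps` safety bound, are made up for illustration and do not come from the spec.

```python
from typing import List, Tuple

Box = Tuple[float, float]  # (top, bottom): hypothetical 1-D stand-in for a cue box


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contained(box: Box, video_height: float) -> bool:
    return box[0] >= 0 and box[1] <= video_height


def place(box: Box, output: List[Box], video_height: float,
          step: float, max_steps: int = 1000) -> Box:
    """Before switching: first non-overlapping, fully contained position.
    After switching: first non-overlapping position, contained or not."""
    cur = box
    # switched is false: stop stepping once any part leaves the rendering area.
    while contained(cur, video_height):
        if not any(overlaps(cur, o) for o in output):
            return cur
        cur = (cur[0] + step, cur[1] + step)
    # Switch direction: back to the default position, step the other way.
    cur = box
    for _ in range(max_steps):  # safety bound, not part of the proposal
        if not any(overlaps(cur, o) for o in output):
            return cur
        cur = (cur[0] - step, cur[1] - step)
    return cur
```

For example, `place((90, 100), [(90, 100)], 100, -10)` gives `(80, 90)`. When the area is already full, as in `place((90, 100), [(0, 100)], 100, -10)`, the result is `(100, 110)`, which lies entirely outside the rendering area.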
Backing up a bit, the goals of this algorithm should be, in order of priority: 1. Never overlap cues, it's better to not show a cue at all than to have it obscured by another cue. 2. Preserve the intended visual order of cues. 3. If a cue cannot fit within the (remaining) video rendering area, show as much of it as possible. The algorithm I suggested in Comment 1 fails on point 3, since any cue that won't fit is moved out of the rendering area completely. A cue that barely fits will be rendered, but when another line is added it will disappear, which doesn't seem great if some people will see the cue and some will not due to font size.
Instead, I suggest the current spec algorithm with the following modifications:
Introduce a "fallback position" for boxes which is initially null.
11. Step loop: If none of the boxes in boxes would overlap any of the boxes in output, run the following substeps:
11.1. If all the boxes in output are within the video's rendering area, then jump to the step labeled done positioning below.
11.2. Otherwise, if the boxes' fallback position is null or the area of the intersection of boxes and the video rendering area is bigger than the area of the intersection of boxes at their fallback position and the video rendering area, then remember the position of all the boxes in boxes as their fallback position. (Remember the non-overlapping but non-contained position that had the most text within the video rendering area.)
12. If the intersection of boxes and the video rendering area is empty, jump to the step labeled switch direction. (Push the boxes all the way outside before switching directions to ensure that all fallback positions are considered. This can be optimized to stop sooner, but that only clutters the algorithm.)
...
15. Switch direction: If switched is true, run the following substeps:
15.1. If the boxes' fallback position is not null, move boxes to their fallback position.
15.2. Jump to the step labeled done positioning below.
16. Move all the boxes in boxes back to their default position as determined in the step above labeled default.
(Reverse the order of step 15 and 16 and check the fallback position before finally giving up.)
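The modified steps can be sketched as follows. This is a simplification under stated assumptions: a cue box is a vertical interval, the rendering area is 0 to video_height, "area of the intersection" becomes the visible extent of that interval, and all function names are made up for illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float]  # (top, bottom): hypothetical 1-D stand-in for a cue box


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contained(box: Box, video_height: float) -> bool:
    return box[0] >= 0 and box[1] <= video_height


def visible(box: Box, video_height: float) -> float:
    """Extent of the box that intersects the rendering area."""
    return max(0, min(box[1], video_height) - max(box[0], 0))


def place_with_fallback(box: Box, output: List[Box],
                        video_height: float, step: float) -> Box:
    fallback: Optional[Box] = None  # initially null
    cur = box
    for direction in (step, -step):
        cur = box  # each pass starts from the default position
        while True:
            # Step 11: only non-overlapping positions are candidates.
            if not any(overlaps(cur, o) for o in output):
                if contained(cur, video_height):
                    return cur  # 11.1: done positioning
                # 11.2: keep the candidate with the most visible extent.
                if fallback is None or (visible(cur, video_height)
                                        > visible(fallback, video_height)):
                    fallback = cur
            # Step 12: switch direction once nothing is left inside the area.
            if visible(cur, video_height) <= 0:
                break
            cur = (cur[0] + direction, cur[1] + direction)
    # Step 15: giving up; use the fallback position if there is one.
    return fallback if fallback is not None else cur
```

Because the comparison in 11.2 is strict, the first candidate with the largest visible extent wins, so ties go to the position tested earliest.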
(In reply to comment #2) > Backing up a bit, the goals of this algorithm should be, in order of priority: > > 1. Never overlap cues, it's better to not show a cue at all than to have it > obscured by another cue. I'm not sure I agree with this goal. If you can see that something overlaps, you may be able to make the region larger (e.g. go fullscreen in the video). You have no such choice if you can't even see that there is another cue. > 2. Preserve the intended visual order of cues. > > 3. If a cue cannot fit within the (remaining) video rendering area, show as > much of it as possible. > > The algorithm I suggested in Comment 1 fails on point 3, since any cue that > won't fit is moved out of the rendering area completely. A cue that barely fits > will be rendered, but when another line is added it will disappear, which > doesn't seem great if some people will see the cue and some will not due to > font size. IMO all we should be doing with the overlap avoidance is best effort. If we've tried and it doesn't work, we should just render it where it originally was.
(In reply to comment #4) > (In reply to comment #2) > > Backing up a bit, the goals of this algorithm should be, in order of priority: > > > > 1. Never overlap cues, it's better to not show a cue at all than to have it > > obscured by another cue. > > I'm not sure I agree with this goal. If you can see that something overlaps, > you may be able to make the region larger (e.g. go fullscreen in the video). > You have no such choice if you can't even see that there is another cue. If the font size is given in vh/vw units, as it is by default, then the text scales with the video and fullscreen won't help. However, I agree that there is a goal for web authors here: 4. Make it obvious that there's not enough room to render all the cues and that they should reduce the amount of text. > > 2. Preserve the intended visual order of cues. > > > > 3. If a cue cannot fit within the (remaining) video rendering area, show as > > much of it as possible. > > > > The algorithm I suggested in Comment 1 fails on point 3, since any cue that > > won't fit is moved out of the rendering area completely. A cue that barely fits > > will be rendered, but when another line is added it will disappear, which > > doesn't seem great if some people will see the cue and some will not due to > > font size. > > IMO all we should be doing with the overlap avoidance is best effort. If we've > tried and it doesn't work, we should just render it where it originally was. I agree that we should make a best effort, better than the current spec, but the default position isn't actually a very good fallback. 
Consider a video which can fit 15 lines of text and this cue:
00:00:00.000 --> 00:00:10.000 Line 1 Line 2 Line 3 Line 4 Line 5 Line 5 Line 5 Line 6 Line 7 Line 8 Line 9 Line 10 Line 11 Line 12 Line 13 Line 14 Line 15
In both algorithms, this will render 15 lines:
http://people.opera.com/philipj/2012/06/webvtt/15-lines.png
If "Line 16" is added, the current spec will fail to find a non-overlapping and enclosed position and instead use the default position, showing only one line:
http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png
Using a fallback position, however, as much text as possible is shown:
http://people.opera.com/philipj/2012/06/webvtt/16-lines-opera.png
In this case, the spec'd algorithm doesn't help authors detect the problem, while my suggested algorithm does.
One case where my algorithm is not awesome is if a big cue is sandwiched between two small cues:
00:00:00.000 --> 00:00:10.000 line:0 Top
00:00:00.000 --> 00:00:10.000 Bottom
00:00:00.000 --> 00:00:10.000 Line 1 Line 2 Line 3 Line 4 Line 5 Line 5 Line 5 Line 6 Line 7 Line 8 Line 9 Line 10 Line 11 Line 12 Line 13 Line 14
http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png
http://people.opera.com/philipj/2012/06/webvtt/sandwich-opera.png
In this case, neither algorithm is great; it's basically weighing readability (users) vs error discovery (authors). In general, filling up as much of the viewport as possible is a pretty strong hint to the author and not overlapping is better for users, and that's what I've tried to achieve.
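The arithmetic behind the 16-line case can be checked with a few lines of Python. This sketch assumes the rendering area is a grid of 15 rows and that the default position for a line:-1 cue puts its first line on the last row; the function name and the row model are made up for illustration.

```python
def visible_lines(first_row: int, cue_lines: int, area_rows: int) -> int:
    """Number of cue lines that land on rows 0 .. area_rows - 1."""
    start = max(first_row, 0)
    end = min(first_row + cue_lines, area_rows)
    return max(0, end - start)


AREA_ROWS = 15                 # the video fits 15 lines of text
DEFAULT_ROW = AREA_ROWS - 1    # assumed default position: first line on the last row

# 16-line cue left at the default position: only the first line is inside.
at_default = visible_lines(DEFAULT_ROW, 16, AREA_ROWS)
# 16-line cue at the fallback position (first line on row 0): 15 lines inside.
at_fallback = visible_lines(0, 16, AREA_ROWS)
print(at_default, at_fallback)
```

Adding a 17th line changes neither number, since the extra line also falls outside the rendering area.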
Note also that overflow clipping (https://www.w3.org/Bugs/Public/show_bug.cgi?id=17473) is a prerequisite for my suggestions.
(In reply to comment #5) > If the font size is given in vh/vw units, as it is by default, then the text > scales with the video and fullscreen won't help. True. But a user can always change the font size in the browser. > However, I agree that there is > a goal for web authors here: > > 4. Make it obvious that there's not enough room to render all the cues and that > they should reduce the amount of text. I'm not even so worried about authors, who should actually know better and look at the outcome of what they are producing. What I am more worried about are users that change their font sizes because they need them bigger to be able to read the text. There's nothing an author can do to prevent overlap in that situation. > I agree that we should make a best effort, better than the current spec, but > the default position isn't actually a very good fallback. > > Consider a video which can fit 15 lines of text and this cue: [snip] > In both algorithms, this will render 15 lines: > > http://people.opera.com/philipj/2012/06/webvtt/15-lines.png Nice to see you're doing "rollup" rendering. ;-) > If "Line 16" is added, the current spec will fail to find a non-overlapping and > enclosed position and instead use the default position, showing only one line: > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png I would have thought that the other lines continue to exist, too, so there would be overlap. > Using a fallback position, however, as much text as possible is shown: > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-opera.png I do like this solution - it delays the problem further. But what happens if you add another one? You're still going to have to create overlap now, don't you? > http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png > http://people.opera.com/philipj/2012/06/webvtt/sandwich-opera.png > > In this case, neither algorithm is great, it's basically weighing readability > (users) vs error discovery (authors). 
> > In general, filling up as much of the viewport is possible is a pretty strong > hint to the author and not overlapping is better for users, and that's what > I've tried to achieve. Hmm, I would have expected the 14 lines to be rendered on top of the bottom one overlapping the existing explicitly positioned top line.
It seems that we agree that the spec isn't perfect when not all cues can be fit non-overlapping within the video rendering area. Now, let's find the best solution! (In reply to comment #7) > (In reply to comment #5) > > If "Line 16" is added, the current spec will fail to find a non-overlapping and > > enclosed position and instead use the default position, showing only one line: > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png > > I would have thought that the other lines continue to exist, too, so there > would be overlap. No, the lines of a single cue are not rearranged, only the cues themselves. > > Using a fallback position, however, as much text as possible is shown: > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-opera.png > > I do like this solution - it delays the problem further. But what happens if > you add another one? You're still going to have to create overlap now, don't > you? Per the changes I suggested, the rendering wouldn't change, since line 17 would end up outside the rendering area. This could be changed by changing the fallback position to pick the latest candidate of the same size instead of the first, but it's probably better to show the beginning of the cue than the end of it. > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-opera.png > > > > In this case, neither algorithm is great, it's basically weighing readability > > (users) vs error discovery (authors). > > > > In general, filling up as much of the viewport is possible is a pretty strong > > hint to the author and not overlapping is better for users, and that's what > > I've tried to achieve. > > Hmm, I would have expected the 14 lines to be rendered on top of the bottom one > overlapping the existing explicitly positioned top line. Not according to the spec, no. 
The default position is simply the first position tested, which for multi-line cues at line:-1 shows only the first line.
(In reply to comment #8) > It seems that we agree that the spec isn't perfect when not all cues can be fit > non-overlapping within the video rendering area. Now, let's find the best > solution! Agree. > (In reply to comment #7) > > (In reply to comment #5) > > > If "Line 16" is added, the current spec will fail to find a non-overlapping and > > > enclosed position and instead use the default position, showing only one line: > > > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png > > > > I would have thought that the other lines continue to exist, too, so there > > would be overlap. > > No, the lines of a single cue are not rearranged, only the cues themselves. So a one-line overlapping cue suppresses a cue of 16 lines out of which 15 are perfectly fine showing? It seems that not trying to avoid overlap would be better than this? > > > Using a fallback position, however, as much text as possible is shown: > > > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-opera.png > > > > I do like this solution - it delays the problem further. But what happens if > > you add another one? You're still going to have to create overlap now, don't > > you? > > Per the changes I suggested, the rendering wouldn't change, since line 17 would > end up outside the rendering area. This could be changed by changing the > fallback position to pick the latest candidate of the same size instead of the > first, but it's probably better to show the beginning of the cue than the end > of it. The beginning being the first line? If so, it's probably better to show the last line than the first, because the first line may already have been read by the user before the overlap happens. > > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png > > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-opera.png > > > > > > In this case, neither algorithm is great, it's basically weighing readability > > > (users) vs error discovery (authors). 
> > > > > > In general, filling up as much of the viewport is possible is a pretty strong > > > hint to the author and not overlapping is better for users, and that's what > > > I've tried to achieve. > > > > Hmm, I would have expected the 14 lines to be rendered on top of the bottom one > > overlapping the existing explicitly positioned top line. > > Not according to the spec, no. The default position is simply the first > position tested, which for multi-line cues at line:-1 shows only the first > line. That sounds like a terrible consequence. I'm increasingly wondering if we are trying to be too clever with avoiding overlap.
> > (In reply to comment #7) > > > (In reply to comment #5) > > > > If "Line 16" is added, the current spec will fail to find a non-overlapping and > > > > enclosed position and instead use the default position, showing only one line: > > > > > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png > > > > > > I would have thought that the other lines continue to exist, too, so there > > > would be overlap. > > > > No, the lines of a single cue are not rearranged, only the cues themselves. > > So a one-line overlapping cue suppresses a cue of 16 lines out of which 15 are > perfectly fine showing? It seems that not trying to avoid overlap would be > better than this? The problem is returning to the "default position", which isn't great at all. > > > > Using a fallback position, however, as much text as possible is shown: > > > > > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-opera.png > > > > > > I do like this solution - it delays the problem further. But what happens if > > > you add another one? You're still going to have to create overlap now, don't > > > you? > > > > Per the changes I suggested, the rendering wouldn't change, since line 17 would > > end up outside the rendering area. This could be changed by changing the > > fallback position to pick the latest candidate of the same size instead of the > > first, but it's probably better to show the beginning of the cue than the end > > of it. > > The beginning being the first line? If so, it's probably better to show the > last line than the first, because the first line may already have been read by > the user before the overlap happens. Since all these lines (Line 1 - Line 17) are all in the same cue, I believe the first line couldn't have been read before (unless it appears in a different cue). So the clipped part would never be visible in this situation (be it the beginning or the end of the cue). 
The only way to display both the start and the end of the cue would be to actually split the cue, either into multiple cues or by artificially inserting inner timestamps. This would basically mean that for badly authored cues there's still the possibility to see everything, but gradually over the initial interval of the cue. This is rather complicated, though, and we might encourage authors to rely on such a feature rather than properly author their captions. > > > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png > > > > http://people.opera.com/philipj/2012/06/webvtt/sandwich-opera.png > > > > > > > > In this case, neither algorithm is great, it's basically weighing readability > > > > (users) vs error discovery (authors). > > > > > > > > In general, filling up as much of the viewport is possible is a pretty strong > > > > hint to the author and not overlapping is better for users, and that's what > > > > I've tried to achieve. > > > > > > Hmm, I would have expected the 14 lines to be rendered on top of the bottom one > > > overlapping the existing explicitly positioned top line. > > > > Not according to the spec, no. The default position is simply the first > > position tested, which for multi-line cues at line:-1 shows only the first > > line. > > That sounds like a terrible consequence. Yes, I agree that this default position should be changed. > I'm increasingly wondering if we are trying to be too clever with avoiding > overlap. Since we are talking about possible issues, here's something else that appeared while I played with the implementation. Consider the following cues, in text track cue order: Vertical Growing Left and Horizontal at line 0 (and start alignment). 
Both cues: http://swarm.cs.pub.ro/~victor/track/vertical_horizontal_1.png Just the horizontal cue: http://swarm.cs.pub.ro/~victor/track/horizontal_1.png Ideally, to preserve the author's intention it would be better to just have the horizontal cue move actually by the left, rather than moving it down (having a longer vertical cue would basically make any such horizontal cue disappear).
> Ideally, to preserve author's intention it would be better to just have > the horizontal cue move actually by the left, rather than moving it down > (having a longer vertical cue would basically make any such horizontal > cue disappear). I wanted to say "move actually by the right of the vertical cue". One other thing I wanted to add is that I keep wondering why "absolutely positioned" (snap-to-lines not set) cues are in the same flow (in terms of collision detection) as the ones positioned by default or line related (snap-to-lines set). I think of cues positioned at (x%, y%) of the video similar to absolute positioned elements in a CSS layout - which are always out of the flow. If the author intended that cue to be at (x%, y%) then it should just be there, even if it, unfortunately, overlaps something.
(In reply to comment #11) > One other thing I wanted to add is that I keep wondering why "absolutely > positioned" (snap-to-lines not set) cues are in the same flow (in terms of > collision detection) as the ones positioned by default or line related > (snap-to-lines set). > > I think of cues positioned at (x%, y%) of the video similar to absolute > positioned elements in a CSS layout - which are always out of the flow. If the > author intended that cue to be at (x%, y%) then it should just be there, even > if it, unfortunately, overlaps something. I agree. We should exclude absolutely/explicitly positioned elements from the collision avoidance algorithm.
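A sketch of what excluding such cues from collision avoidance might look like, treating them like out-of-flow elements. Everything here is an assumption for illustration: the `Cue` class, the 1-D boxes, and the simple one-direction stepping loop standing in for the real positioning steps. It also assumes that out-of-flow cues neither move nor act as obstacles for other cues.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int]  # (top, bottom): hypothetical 1-D stand-in for a cue box


@dataclass
class Cue:
    text: str
    box: Box             # position requested by the author
    snap_to_lines: bool  # False means positioned at (x%, y%)


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def layout(cues: List[Cue], step: int = -10,
           max_steps: int = 100) -> List[Tuple[str, Box]]:
    obstacles: List[Box] = []  # only snap-to-lines cues take part in the flow
    placed: List[Tuple[str, Box]] = []
    for cue in cues:
        box = cue.box
        if cue.snap_to_lines:
            # Stand-in for overlap avoidance: step until free of obstacles.
            for _ in range(max_steps):
                if not any(overlaps(box, o) for o in obstacles):
                    break
                box = (box[0] + step, box[1] + step)
            obstacles.append(box)
        # Cues without snap-to-lines stay exactly where the author put them.
        placed.append((cue.text, box))
    return placed
```

In this model a cue at (x%, y%) can end up underneath or on top of a flowed cue, which is the trade-off being proposed.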
Cues for which snap-to-lines flag is not set are already handled in a separate branch of step 10.13, what would you change and why?
(In reply to comment #9) > > (In reply to comment #7) > > > (In reply to comment #5) > > > > If "Line 16" is added, the current spec will fail to find a non-overlapping and > > > > enclosed position and instead use the default position, showing only one line: > > > > > > > > http://people.opera.com/philipj/2012/06/webvtt/16-lines-spec.png > > > > > > I would have thought that the other lines continue to exist, too, so there > > > would be overlap. > > > > No, the lines of a single cue are not rearranged, only the cues themselves. > > So a one-line overlapping cue suppresses a cue of 16 lines out of which 15 are > perfectly fine showing? It seems that not trying to avoid overlap would be > better than this? The example wasn't a 1-line cue plus a 16-line cue, but the spec'd algorithm wouldn't handle that very well, no. My suggestion would show the 1-line cue and as much as possible of the 16-line cue, but not overlap them. Note that the order matters, if the 16-line cue is positioned first the 1-line cue will not be shown at all.
(In reply to comment #13) > Cues for which snap-to-lines flag is not set are already handled in a separate > branch of step 10.13, what would you change and why? Indeed, when it comes to positioning, they are handled in a different branch. But in terms of collision detection, the type of cue doesn't matter. So if a cue with snap-to-lines not set overlaps a cue with snap-to-lines set, it will try to move from the (x%, y%) position.
That seems like a feature to me, why overlap cues when we can avoid it?
(In reply to comment #16) > That seems like a feature to me, why overlap cues when we can avoid it? What if the cue doesn't make sense anymore if moved from that (x%, y%) position? It would be nice to have at least an option to fix the position; with the current spec, there's no way for an author to be guaranteed that the cue will be displayed at a certain position, regardless of what else is displayed.
Certainly it's better for the cues to be readable than to be at the exact position requested? They'll only be moved as far as necessary, which is desirable in the normal case where the cues happened to overlap slightly due to font size differences. You can construct cases where the cues will be moved further by filling the viewport with cues, but certainly we shouldn't make the algorithm worse in the common case to fix this?
I can definitely look into this algorithm again and see if there's something that would make non-fitting cues be handled better. To be honest I didn't spend much time on that case since it didn't seem likely to be common. I kind of think that it's better if a cue can't be fit without any overlap for it to be positioned where the author said to position it and not try to second-guess the author any further... at least that way the author can control where it ends up in cases where overlap is definitely going to happen.
I could live with an algorithm that falls back to a default position even if it's overlapping, but at the very least the default position would need to be redefined so that we don't get the problem in <http://people.opera.com/philipj/2012/06/webvtt/sandwich-spec.png>. A better default position should at least have as much of the cue within the rendering area as possible. That being said, I of course prefer my suggested algorithm, since I've already implemented it :)
Can you elaborate on the difference between this bug and bug 17473?
Bug 17473 is a suggestion to use CSS (overflow:hidden) to hide any (parts of) cues that aren't completely contained within the rendering area, with no changes to the layout algorithm. The primary justification is that it's easier to implement and the only effect is that some lines will be partially visible instead of removed. This bug is a suggestion to change what the algorithm does when no completely contained position was found. The justification is that the default position in the algorithm is very bad and that having overlapping cues is pointless since you can't actually read them. http://my.opera.com/desktopteam/blog/2012/08/03/summer-core-update implements the algorithm from Comment 3 if you want to try it.
Did comment 3.
What the spec now says differs from comment 3 in one way. What I implemented in Opera would only consider a non-overlapping position as a candidate for "fallback position", while the spec uses all tested positions as candidates for "best position". In other words, the cue can still end up overlapping other cues as opposed to being pushed off screen. Was this deliberate? The fallback position is better now than it was, so I could probably live with it if there was some rationale.
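The difference can be shown with a small sketch of the give-up path only (the early exit for a fully contained, non-overlapping position is left out). It assumes 1-D boxes and scores a position by its visible extent; that scoring is borrowed from the fallback proposal and is not claimed to be the spec's exact "best position" score. All names are made up for illustration.

```python
from typing import Iterator, List, Optional, Tuple

Box = Tuple[float, float]  # (top, bottom): hypothetical 1-D stand-in for a cue box


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def visible(box: Box, video_height: float) -> float:
    return max(0, min(box[1], video_height) - max(box[0], 0))


def tested_positions(box: Box, video_height: float, step: float) -> Iterator[Box]:
    """Every position tried, in both directions, until fully outside the area."""
    for direction in (step, -step):
        cur = box
        while True:
            yield cur
            if visible(cur, video_height) <= 0:
                break
            cur = (cur[0] + direction, cur[1] + direction)


def pick(box: Box, output: List[Box], video_height: float,
         step: float, require_free: bool) -> Optional[Box]:
    """require_free=True: only non-overlapping candidates (comment 3).
    require_free=False: every tested position is a candidate."""
    best: Optional[Box] = None
    for pos in tested_positions(box, video_height, step):
        if require_free and any(overlaps(pos, o) for o in output):
            continue
        if best is None or visible(pos, video_height) > visible(best, video_height):
            best = pos
    return best
```

With an existing cue at `(0, 60)` in a 100-unit area and a new cue whose default box is `(50, 100)`, `require_free=True` yields `(60, 110)` (partly off screen, no overlap), while `require_free=False` yields `(50, 100)` (fully on screen, overlapping).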
I'm reopening this as we're now looking at implementing the "best position" bits in Blink: https://codereview.chromium.org/880873002/ We weren't too happy with the additional complexity and it doesn't seem warranted just to pick the best of several suboptimal positions when the title area is full of cues, a scenario with no really good outcome possible. After considering various options, we'd like to simply discard cues that don't fit: https://github.com/w3c/webvtt/pull/171 This ignores point 3 from comment #2 for the sake of simplicity in implementation and testing. It shouldn't change behavior for any case with a sane amount of cue text.
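In sketch form, "discard cues that don't fit" amounts to something like the following. This is not the text of the pull request, only an illustration with 1-D boxes and made-up names, and it assumes the default position is simply the first candidate tested.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float]  # (top, bottom): hypothetical 1-D stand-in for a cue box


def overlaps(a: Box, b: Box) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def contained(box: Box, video_height: float) -> bool:
    return box[0] >= 0 and box[1] <= video_height


def place_or_discard(box: Box, output: List[Box],
                     video_height: float, step: float) -> Optional[Box]:
    """Return the first non-overlapping, fully contained position, else None."""
    for direction in (step, -step):
        cur = box
        while contained(cur, video_height):
            if not any(overlaps(cur, o) for o in output):
                return cur
            cur = (cur[0] + direction, cur[1] + direction)
    return None  # no position fits: the cue is not rendered
```

There is no fallback or best position to track, so the only outcomes are a clean position or no rendering at all.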
Rick, Ralph, this should interest you as well, as I see that vtt.js implements the best position steps which we want to remove.
While I disagree with discarding overlapping cues, this is hopefully an extremely rare case and the current rendering approach is worse. So, in the interest of making progress, I think the patch in https://github.com/w3c/webvtt/pull/171 is good to be applied. I'll give other browsers a week to comment before accepting the pull request.
Eric, Jer, are you OK with the proposed change in comment #25?
(In reply to Philip Jägenstedt from comment #28) > Eric, Jer, are you OK with the proposed change in comment #25? Yes, it seems like the most sensible solution.
https://github.com/w3c/webvtt/pull/171 merged