This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 22029 - <track> Overlapping avoidance for non-snapToLines cue is underspecced
Summary: <track> Overlapping avoidance for non-snapToLines cue is underspecced
Status: NEW
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: This bug has no owner yet - up for the taking
URL:
Whiteboard: v1
Keywords:
Depends on:
Blocks:
 
Reported: 2013-05-14 16:05 UTC by Jer Noble
Modified: 2013-12-09 07:14 UTC
CC List: 10 users

See Also:



Description Jer Noble 2013-05-14 16:05:13 UTC
5.2.1.10.14.<non snapToLines condition>.4 states:

"If there is a position to which the boxes in boxes can be moved while maintaining the relative positions of the boxes in boxes to each other such that none of the boxes in boxes would overlap any of the boxes in output, and all the boxes in output would be within the video's rendering area,..."

The "if there is" clause does a lot of heavy lifting here. This implies the UA has to solve a difficult fitting problem without providing an algorithm to do so.

"then move the boxes in boxes to the closest such position to their current position, and then jump to the step labeled done positioning below. If there are multiple such positions that are equidistant from their current position, use the highest one amongst them; if there are several at that height, then use the leftmost one amongst them."

This clause implies that, not only must the UA find one solution to the fitting problem, but many-or-all possible solutions, and score them.

Without an actual algorithm describing this behavior, this section is basically unimplementable.
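To make the cost concrete, here is a minimal brute-force sketch of what the quoted step literally asks for. It assumes cue boxes are axis-aligned rectangles on an integer pixel grid, that "within the video's rendering area" applies to the moved boxes (the reading proposed in comment 2), and that "closest" means the Euclidean length of the offset. The names `overlaps` and `closest_free_offset` are hypothetical and do not come from the spec.

```python
from typing import List, Optional, Tuple

Rect = Tuple[int, int, int, int]  # (x, y, width, height); y grows downward


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share some interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def closest_free_offset(boxes: List[Rect], output: List[Rect],
                        video_w: int, video_h: int) -> Optional[Tuple[int, int]]:
    """Exhaustively try every integer offset (dx, dy) for the whole group.

    The group keeps its internal layout, must stay inside the video and must
    not overlap anything in `output`. The offset closest to (0, 0) wins; ties
    go to the highest position (smallest dy), then the leftmost (smallest dx).
    Returns None if no offset works.
    """
    # The group's bounding box limits which offsets keep it inside the video.
    left = min(x for x, _, _, _ in boxes)
    top = min(y for _, y, _, _ in boxes)
    right = max(x + w for x, _, w, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)

    best: Optional[Tuple[int, int]] = None
    best_key: Optional[Tuple[int, int, int]] = None
    for dy in range(-top, video_h - bottom + 1):
        for dx in range(-left, video_w - right + 1):
            moved = [(x + dx, y + dy, w, h) for x, y, w, h in boxes]
            if any(overlaps(m, o) for m in moved for o in output):
                continue
            # Squared distance first, then height, then horizontal position.
            key = (dx * dx + dy * dy, dy, dx)
            if best_key is None or key < best_key:
                best_key, best = key, (dx, dy)
    return best
```

Every candidate offset is checked against every box in output, so the work grows with the video area times the number of boxes. That is roughly why an explicit, cheaper algorithm is being asked for here.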
Comment 1 Jer Noble 2013-05-14 16:06:34 UTC
Additionally, the section is confusingly written, with repeated references to "boxes in boxes" (as opposed to "each box in boxes").
Comment 2 Jer Noble 2013-05-14 16:07:51 UTC
Furthermore, the clause "and all the boxes in output would be within the video's rendering area" is meaningless.  This probably should read "each box in boxes", rather than "all the boxes in output", since it is the boxes in boxes (argh!) which are being moved, not those in output.
Comment 3 Edward O'Connor 2013-05-14 17:42:12 UTC
Fixing component.
Comment 4 Jer Noble 2013-05-14 19:22:52 UTC
In addition, <http://cdn.memegenerator.net/instances/400x/37819581.jpg>.
Comment 5 Silvia Pfeiffer 2013-05-15 02:45:15 UTC
(In reply to comment #4)
> In addition, <http://cdn.memegenerator.net/instances/400x/37819581.jpg>.

It's in my court now - I'll unpack boxes in time ;-) (will be interested to look at existing implementations)
Comment 6 Silvia Pfeiffer 2013-07-13 07:41:20 UTC
None of the other subtitle formats that I'm aware of have overlap avoidance as a feature. CEA-708 deals with overlap via a "priority" feature, which is similar to the CSS z-index.

Should we just drop the algorithm for overlap avoidance?
Comment 7 Simon Pieters 2013-08-07 12:43:59 UTC
Isn't overlap avoidance nice to have for end users? In particular if two tracks are enabled at the same time.

I think it doesn't have to be perfect, but it should be there and we should just spec a simple algorithm for it.
Comment 8 vcarbune 2013-08-07 12:53:44 UTC
(In reply to comment #7)
> Isn't overlap avoidance nice to have for end users? In particular if two
> tracks are enabled at the same time.
> 
> I think it doesn't have to be perfect, but it should be there and we should
> just spec a simple algorithm for it.

I agree that it's nice to have for end users, but isn't it enough to have this feature for the snap-to-lines case? From the author standpoint, I think there should be at least one way to specify caption positioning exactly as it ends up on the video, without any other algorithm interfering with it (the non-snap-to-lines case seems suitable for this, IMHO).
Comment 9 Silvia Pfeiffer 2013-08-07 13:08:40 UTC
I guess the new region feature allows exact positioning and ignores overlap, so we could keep an overlap-avoidance for the non-region positioning. But we need to clarify it so that it will have the same effect in all browsers, is efficient, works even when the font size is changed, and doesn't move text off the video viewport.
Comment 10 Simon Pieters 2013-08-08 08:32:53 UTC
Ah, yeah I guess it's sufficient if only snap-to-lines does overlap avoidance.
Comment 11 Simon Pieters 2013-08-08 08:35:56 UTC
But what should happen in the following case

00:00:00.000 --> 00:00:10.000
First

00:00:01.000 --> 00:00:10.000 line:100%
Second

Should they overlap or should the first cue get out of the way?
Comment 12 Jer Noble 2013-08-11 16:32:17 UTC
(In reply to comment #8)
> I agree that it's nice to have for end users, but isn't it enough to have
> this feature for the snap-to-lines case? From the author standpoint, I think
> there should be at least one way to specify caption positioning exactly as
> it ends up on the video, without any other algorithm interfering with it
> (the non-snap-to-lines case seems suitable for this, IMHO).

Given the FCC requirement that end users can easily manipulate the style of text track cues, it will be nearly impossible for authors to "specify caption positioning exactly as it ends up on the video", and if they try, user-selected styles will definitely cause overlapping.
Comment 13 vcarbune 2013-11-01 11:09:03 UTC
(In reply to Simon Pieters from comment #11)
> But what should happen in the following case
> 
> 00:00:00.000 --> 00:00:10.000
> First
> 
> 00:00:01.000 --> 00:00:10.000 line:100%
> Second
> 
> Should they overlap or should the first cue get out of the way?

Some aspects of choosing one versus the other:
*) A cue that comes later in the text track cue order might be more relevant (e.g. it describes what's currently happening in the video).
*) Re-positioning of existing cues sounds like chaos for the viewer.
*) An algorithm that triggers re-positioning of existing cues certainly has higher complexity than one that searches for a position for only the last one.

For simplicity, I suggest that the text track cue order dictates which one will get out of the way. In this case, First is positioned, Second comes afterwards and therefore needs to have a different position.

The algorithm should just solve the problem of positioning the (K+1)th box such that it doesn't overlap any of the K existing, fixed boxes and its distance from the desired position is minimal.
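For this single-new-box case, an exact answer does not need a pixel-by-pixel scan. Assuming axis-aligned rectangles and Euclidean distance between the desired and final top-left corners, the closest free position has an x coordinate that is either the desired x, a video edge, or a value that makes the new box touch the left or right side of a fixed box (and likewise for y), so testing those candidates is enough. A sketch under those assumptions, with hypothetical names; it returns None when nothing fits and leaves the fallback question below to the caller:

```python
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share some interior area."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def place_new_box(new: Rect, fixed: List[Rect],
                  video_w: float, video_h: float) -> Optional[Rect]:
    """Move only `new` to the closest spot that overlaps none of `fixed`.

    Ties prefer the highest, then the leftmost, spot. Returns None when no
    spot inside the video is free; what to do then is a separate policy.
    """
    x0, y0, w, h = new
    # Candidate coordinates: the desired one, the video edges, and every
    # value that makes `new` touch one side of a fixed box.
    xs = {x0, 0, video_w - w}
    ys = {y0, 0, video_h - h}
    for fx, fy, fw, fh in fixed:
        xs.update((fx - w, fx + fw))
        ys.update((fy - h, fy + fh))

    best: Optional[Rect] = None
    best_key = None
    for y in ys:
        for x in xs:
            if not (0 <= x <= video_w - w and 0 <= y <= video_h - h):
                continue  # would leave the video's rendering area
            candidate = (x, y, w, h)
            if any(overlaps(candidate, f) for f in fixed):
                continue
            key = ((x - x0) ** 2 + (y - y0) ** 2, y, x)
            if best_key is None or key < best_key:
                best_key, best = key, candidate
    return best
```

With n fixed boxes this tests O(n^2) candidates against n boxes each, independent of the video's pixel size, which may matter for the "is efficient" requirement in comment 9.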

Problems with any algorithm:
*) If a cue can't be positioned at all without overlap, should it be thrown away, or positioned where the overlap is minimal?

I would like to see it positioned where the overlap is minimal, or at least under a certain threshold value, but I can see how this will increase the algorithm complexity. 

Just as a side note: I don't think it should matter whether one has the snap-to-lines box and the other one doesn't, they should all be treated as boxes (... in boxes).
Comment 14 vcarbune 2013-11-01 11:21:19 UTC
(In reply to vcarbune from comment #13)
> Just as a side note: I don't think it should matter whether one has the
> snap-to-lines box and the other one doesn't, they should all be treated as
> boxes (... in boxes).

Maybe a minor (and obvious) clarification that could also be added later in the authoring guidelines in the spec: they are all fixed boxes, but the way the new cue is re-positioned is dictated by the snap-to-lines flag:
*) for the snap-to-lines cues, the search strategy (the existing algorithm in the spec) is a line scan from bottom to top that fills the next free line
*) for the non-snap-to-lines cues the search strategy is the algorithm that we are trying to define in this bug.
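The snap-to-lines strategy in the first bullet can be reduced, as a simplification that ignores the step-and-reverse details of the spec's actual algorithm, to finding the lowest run of free line slots. A sketch with a hypothetical helper, numbering slots from 0 at the bottom of the video:

```python
from typing import Optional, Set


def lowest_free_run(occupied: Set[int], total_lines: int,
                    cue_lines: int) -> Optional[int]:
    """Scan line slots from the bottom (index 0) upward and return the first
    index where `cue_lines` consecutive slots are all free, or None."""
    for start in range(total_lines - cue_lines + 1):
        run = range(start, start + cue_lines)
        if not any(line in occupied for line in run):
            return start
    return None  # no free run of that height anywhere in the video
```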
Comment 15 Silvia Pfeiffer 2013-11-06 08:47:17 UTC
(In reply to vcarbune from comment #13)
> 
> Problems with any algorithm:
> *) If a cue can't be positioned at all without overlap, should it be thrown
> away, or positioned where the overlap is minimal?

I would propose that if it can't be positioned without overlap, it just be positioned where it would have been positioned had there been no other cues. This is for algorithmic simplicity and determinism - it would be terrible if the biggest space without overlap changed randomly across the video viewport with cues of differing line lengths and numbers of newlines.

That raises a question:
If we go with position seeking just for snap-to-lines cues, do we simply go through all of the available lines, check whether they are occupied and pick a free one, or do we try to find the largest available space even if it's half of the screen and multi-line?
Comment 16 Philip Jägenstedt 2013-11-06 09:46:45 UTC
(In reply to Silvia Pfeiffer from comment #15)
> (In reply to vcarbune from comment #13)
> > 
> > Problems with any algorithm:
> > *) If a cue can't be positioned at all without overlap, should it be thrown
> > away, or positioned where the overlap is minimal?
> 
> I would propose that if it can't be positioned without overlap, it just be
> positioned where it would have been positioned had there been no other cues.
> This is for algorithmic simplicity and determinism - it would be terrible if
> the biggest space without overlap changed randomly across the video viewport
> with cues of differing line lengths and numbers of newlines.

I think it's just as well to just abort the algorithm if one has already searched in both directions and hasn't found anything. That removes the complexity of saving a fallback position, and having cues on top of other cues isn't helpful in any case.
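The two failure policies under discussion (comment 15: show the cue at its default position even if it overlaps; comment 16: abort and drop it) differ only in the final step of the search. A sketch assuming a whole-line model with slots numbered from 0 at the bottom and an outward search that alternates up and down from the default line; the enum and function names are hypothetical:

```python
from enum import Enum
from typing import Optional, Set


class NoRoomPolicy(Enum):
    OVERLAP_AT_DEFAULT = "overlap"  # comment 15: show it where it would have been
    DROP = "drop"                   # comment 16: abort and don't display the cue


def place_snapped_cue(occupied: Set[int], total_lines: int, cue_lines: int,
                      default_start: int, policy: NoRoomPolicy) -> Optional[int]:
    """Return the bottom line index for the cue, or None if it is dropped."""

    def fits(start: int) -> bool:
        in_video = 0 <= start <= total_lines - cue_lines
        return in_video and not any(
            line in occupied for line in range(start, start + cue_lines))

    # Search outward from the default line: same line, one up, one down, ...
    for distance in range(total_lines):
        for start in (default_start + distance, default_start - distance):
            if fits(start):
                return start

    # Both directions have been searched without finding room.
    if policy is NoRoomPolicy.OVERLAP_AT_DEFAULT:
        return default_start
    return None
```

Neither policy needs to remember anything during the search; the overlap policy only needs the default position, which the caller already has.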

> That raises a question:
> If we go with position seeking just for snap-to-lines cues, do we simply go
> through all of the available lines and check if they are occupied and pick a
> free one, or do we try to find the largest available space even if it's half
> of the screen and multi-line?

I don't think I understand what this means, is the question whether or not absolutely positioned cues should be avoided by the positioning algorithm?
Comment 17 Silvia Pfeiffer 2013-11-06 11:45:53 UTC
(In reply to Philip Jägenstedt from comment #16)
>
> I think it's just as well to just abort the algorithm if one has already
> searched in both directions and hasn't found anything.

Abort and not display the cue at all?

> That removes the
> complexity of saving a fallback position, and having cues on top of other
> cues isn't helpful in any case.

It sure is. If it has a non-transparent background color, overlaying it on top gives the user a chance to actually see it. Not displaying it at all is not helpful - deaf people might miss a lot of content.


> > That raises a question:
> > If we go with position seeking just for snap-to-lines cues, do simply go
> > through all of the available lines and check if they are occupied and pick a
> > free one, or do we try to find the largest available space even if it's half
> > of the screen and multi-line?
> 
> I don't think I understand what this means, is the question whether or not
> absolutely positioned cues should be avoided by the positioning algorithm?

No. I am suggesting an algorithm that checks all lines for existing occupancy (including region or percentage based cues) before placing a new line. Since we discussed earlier to just use avoidance in the snap-to-lines case, we only have to worry about which line or potentially which part of a line to place a new cue into. My suggestion is merely to avoid trying to place cues into parts of a line.
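Checking every line for existing occupancy, including boxes from region or percentage-positioned cues, can be sketched by mapping each box's vertical extent onto whole line slots: a partially covered slot counts as occupied, so new cues are never placed into parts of a line. This assumes a fixed line height and slots numbered from 0 at the bottom; the function name is hypothetical:

```python
from typing import Iterable, Set, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def occupied_lines(boxes: Iterable[Rect], video_h: float, line_h: float) -> Set[int]:
    """Return indices of line slots (0 = bottom) touched by any box.

    Horizontal extent is ignored on purpose: any vertical overlap marks the
    whole slot as taken, so only completely free lines remain available.
    """
    total = int(video_h // line_h)
    occupied: Set[int] = set()
    for _, y, _, h in boxes:
        top, bottom = max(y, 0), min(y + h, video_h)
        if bottom <= top:
            continue  # empty box, or entirely above/below the video
        for i in range(total):
            slot_top = video_h - (i + 1) * line_h
            slot_bottom = video_h - i * line_h
            if top < slot_bottom and slot_top < bottom:
                occupied.add(i)
    return occupied
```

The resulting set can then be fed to whichever whole-line scan is chosen for placing the new cue.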
Comment 18 Philip Jägenstedt 2013-11-06 12:15:47 UTC
(In reply to Silvia Pfeiffer from comment #17)
> (In reply to Philip Jägenstedt from comment #16)
> >
> > I think it's just as well to just abort the algorithm if one has already
> > searched in both directions and hasn't found anything.
> 
> Abort and not display the cue at all?

Yes.

> > That removes the
> > complexity of saving a fallback position, and having cues on top of other
> > cues isn't helpful in any case.
> 
> It sure is. If it has a non-transparent background color, overlaying it on
> top gives the user a chance to actually see it. Not displaying it at all is
> not helpful - deaf people might miss a lot of content.

If cues have an opaque background, the cue is going to obscure the cue behind it, and if the cues have a semi-transparent background, then likely neither the background nor the foreground cue will be readable.

An example of real-world captions where trying to salvage overflowing cues leads to an improved experience would be nice.

Note that an equivalent way of specifying this is to just find a non-overlapping position and apply clipping to hide the ones that are outside of the viewport.

> > > That raises a question:
> > > If we go with position seeking just for snap-to-lines cues, do simply go
> > > through all of the available lines and check if they are occupied and pick a
> > > free one, or do we try to find the largest available space even if it's half
> > > of the screen and multi-line?
> > 
> > I don't think I understand what this means, is the question whether or not
> > absolutely positioned cues should be avoided by the positioning algorithm?
> 
> No. I am suggesting an algorithm that checks all lines for existing
> occupancy (including region or percentage based cues) before placing a new
> line. Since we discussed earlier to just use avoidance in the snap-to-lines
> case, we only have to worry about which line or potentially which part of a
> line to place a new cue into. My suggestion is merely to avoid trying to
> place cues into parts of a line.

The algorithm for positioning snap-to-line cues does avoid overlap, do you mean to modify it in some way, or to add another algorithm that applies to all cues? (Since I'm not clear what we're talking about, an example might help.)
Comment 19 vcarbune 2013-11-07 22:32:47 UTC
(In reply to Philip Jägenstedt from comment #18)
> If cues have an opaque background, the cue is going to obscure the cue behind it,
> and if the cues have a semi-transparent background then likely neither the
> background nor the foreground cue will be readable.
> 
> An example of real-world captions where trying to salvage overflowing cues
> leads to an improved experience would be nice.

I too think that it's a bit unfair for deaf persons to have no idea that there's extra content that they are missing out on, but I agree that displaying unreadable cues might make the situation worse.

Could we go with providing a mechanism of notifying this through the JS API (for custom players) and a UI extension of the native video player that could display a transcript of the cues not visible at that certain time?

> Note that an equivalent way of specifying this is to just find a
> non-overlapping position and apply clipping to hide the ones that are outside of
> the viewport.

I can see clipping being applied anyway, and the cue considered dropped only if it's entirely out of the viewport.
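A sketch of the clipping rule described here, assuming axis-aligned boxes with y growing downward: a box is cut down to the viewport, and only a box with no area left is reported as dropped (None). The function name is hypothetical.

```python
from typing import Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x, y, width, height); y grows downward


def clip_to_viewport(box: Rect, video_w: float, video_h: float) -> Optional[Rect]:
    """Clip `box` to the video; return None (dropped) if nothing is left."""
    x, y, w, h = box
    left, top = max(x, 0), max(y, 0)
    right, bottom = min(x + w, video_w), min(y + h, video_h)
    if right <= left or bottom <= top:
        return None  # entirely outside the viewport: the cue counts as dropped
    return (left, top, right - left, bottom - top)
```

For example, a 50x20 box at (-10, 90) in a 100x100 video keeps its visible 40x10 part at (0, 90), while a box starting exactly at y = 100 is dropped.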

(In reply to Silvia Pfeiffer from comment #17)
> No. I am suggesting an algorithm that checks all lines for existing
> occupancy (including region or percentage based cues) before placing a new
> line. Since we discussed earlier to just use avoidance in the snap-to-lines
> case, we only have to worry about which line or potentially which part of a
> line to place a new cue into. My suggestion is merely to avoid trying to
> place cues into parts of a line.

I think that according to Jer's comment #12, we should be specifying an algorithm that avoids overlapping for the non snap-to-lines case too.
Comment 20 Philip Jägenstedt 2013-11-08 06:46:36 UTC
(In reply to vcarbune from comment #19)
> (In reply to Philip Jägenstedt from comment #18)
> > If cues have an opaque background, the cue is going to obscure the cue behind it,
> > and if the cues have a semi-transparent background then likely neither the
> > background nor the foreground cue will be readable.
> > 
> > An example of real-world captions where trying to salvage overflowing cues
> > leads to an improved experience would be nice.
> 
> I too think that it's a bit unfair for deaf persons to have no idea that
> there's extra content that they are missing out on, but I agree that displaying
> unreadable cues might make the situation worse.

One bit of content that everyone is missing is the video itself, because it's completely covered with text. It would have to be a very peculiar case where this is done deliberately and the cues remain visible for long enough for everyone to finish reading them.

> Could we go with providing a mechanism of notifying this through the JS API
> (for custom players) and a UI extension of the native video player that
> could display a transcript of the cues not visible at that certain time?

IMHO, this failure mode is so far outside of the reasonable that we should just use the simplest possible error handling and nothing more. Even filling half of the video with text is a complete failure, so any solution to this should ideally trigger far before the whole video is covered.