20349 – [WebVTT] Need to be able to move group of cues to a different on-screen location

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Summary: [WebVTT] Need to be able to move group of cues to a different on-screen location

Status:	REOPENED

Alias:	None

Product:	TextTracks CG
Classification:	Unclassified
Component:	WebVTT
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Silvia Pfeiffer
QA Contact:	This bug has no owner yet - up for the taking

URL:
Whiteboard:	v2
Keywords:

Depends on:
Blocks:

Reported:	2012-12-12 00:04 UTC by Silvia Pfeiffer
Modified:	2013-12-09 07:28 UTC
CC List:	7 users

See Also:

Description Silvia Pfeiffer 2012-12-12 00:04:31 UTC

Sometimes when you render captions on a video, there is burnt-in text appearing at exactly the location where your captions are rendered. So you want to be able to move the cues that are currently rendered in that location, as a group, to another on-screen location while that text is shown, and back to the original position afterwards.

This can be the case for both pop-on and roll-up captions. For example, two cues are both rendered bottom middle; they are then moved together to top middle to avoid the text; then one disappears and the next cue is rendered at top middle; finally, these two cues move back to the original bottom-middle position as the burnt-in text disappears.

It should be possible in WebVTT to move the rendering location of a group of captions. It should also be possible for users to do this interactively in the UA after rendering.

Comment 1 Glenn Maynard 2012-12-12 01:53:31 UTC

In my experience, the usual approach is simply not to put the captions there in the first place, rather than moving them and then moving them back. Moving captions around while the viewer is trying to read them would be painfully disruptive, particularly with pop-on captions, where the viewer isn't expecting it.

Comment 2 Silvia Pfeiffer 2012-12-12 02:11:44 UTC

It happens on TV already, and it helps readers to move their eyes to the new position where they read the cues, rather than suddenly being confused that there is different text in the location where they used to find the captions and having to actively search with their eyes for the captions in a different on-screen location.

Also, the problem is often synchronization. Let's say you have a cue that lasts for 4 sec (which is typical for a multi-line cue). The on-screen text, however, only comes in at 3 sec. If you paint the cue at the new location directly, the viewer is left to wonder for 3 sec why the cue is suddenly being shown in a different location.

Also, what if your default rendering locations are the top middle and the bottom middle, just as for most burnt-in text? What if you first have a presenter name displayed at the bottom, then a soccer game score at the top? During that same time, you have captions that are offset in time from the burnt-in text by exactly half a cue duration. You'd have to split your cues in half and duplicate them at the two different on-screen locations to make this work.

Essentially, what I am trying to get at is that we need to separate the concept of making cues visible on screen from their "default" rendering location. It should be simple to say "just render those cues into this box" and then say "the box should now be here", then "now here", with whatever cues are currently visible in the box moving along with it. Incidentally, that's how CEA-708 works.
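The separation described above can be sketched as a tiny data model: each cue names a box (a region) instead of carrying its own position, so changing the box's position once relocates every cue currently shown in it. This is an illustrative sketch only; `Region`, `Cue`, `visible_cues`, and `layout` are hypothetical names, not WebVTT syntax or a browser API, and positions are assumed to be viewport percentages.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical rendering box; x and y are its position as viewport percentages."""
    region_id: str
    x: float
    y: float


@dataclass
class Cue:
    start: float    # seconds, inclusive
    end: float      # seconds, exclusive
    text: str
    region_id: str  # the cue names a box instead of fixing its own position


def visible_cues(cues: list[Cue], t: float) -> list[Cue]:
    """Return the cues active at playback time t."""
    return [c for c in cues if c.start <= t < c.end]


def layout(cues: list[Cue], regions: dict[str, Region], t: float) -> list[tuple[str, float, float]]:
    """Place every visible cue at the *current* position of its box."""
    return [
        (c.text, regions[c.region_id].x, regions[c.region_id].y)
        for c in visible_cues(cues, t)
    ]


# Two overlapping cues share one box, rendered bottom middle.
regions = {"captions": Region("captions", x=50.0, y=90.0)}
cues = [
    Cue(0.0, 4.0, "First caption", "captions"),
    Cue(1.0, 5.0, "Second caption", "captions"),
]

print(layout(cues, regions, 2.0))  # both cues at (50.0, 90.0)

# Burnt-in text appears: move the box once; every cue shown in it follows.
regions["captions"].y = 10.0
print(layout(cues, regions, 2.0))  # both cues now at (50.0, 10.0)
```

Mutating the box in place like this resembles a script-driven update; on its own it does not say *when* the move takes effect, which is the timing question raised in comments 8 and 20.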

Comment 3 Glenn Maynard 2012-12-12 02:35:37 UTC

(In reply to comment #2)
> It happens on TV already, and it helps readers to move their eyes to the
> new position where they read the cues, rather than suddenly being confused
> that there is different text in the location where they used to find the
> captions and having to actively search with their eyes for the captions in
> a different on-screen location.

Moving the text is even more confusing, since the text I was just trying to read suddenly disappears at a weird time, and I have to figure out that it moved somewhere else (which I may not even realize) so I can continue reading it. Worse, I'd probably have to start reading it from scratch, since I was interrupted. If I had already finished reading it and gone back to the video, there's yet another artifact: I'd start reading the "new" caption until I realized I was reading the same text a second time.

> Also, the problem is often synchronization. Let's say you have a cue
> that lasts for 4 sec (which is typical for a multi-line cue). The on-screen
> text, however, only comes in at 3 sec. If you paint the cue at the new
> location directly, the viewer is left to wonder for 3 sec why the cue is
> suddenly being shown in a different location.

I'd rather vaguely wonder about caption positioning than have captions shift around out from underneath me.  (I recall wondering about that now and then myself, for the second or two before the reason appears on screen, but it didn't impede my ability to read the subtitles.)

> Also, what if your default rendering locations are the top middle and the
> bottom middle, just as for most burnt-in text? What if you first have a
> presenter name displayed at the bottom, then a soccer game score at the
> top? During that same time, you have captions that are offset in time from
> the burnt-in text by exactly half a cue duration. You'd have to split your
> cues in half and duplicate them at the two different on-screen locations
> to make this work.

You'd probably only have to do that for one or two captions, which doesn't seem unreasonable if that's really how you want it to render.  It'd be a little "nicer" to not have to duplicate captions, but I doubt that would justify the added complexity.
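The duplication workaround discussed in comments 2 and 3 can be sketched as splitting a cue at the moment the burnt-in text appears and emitting one copy per position. The sketch also shows the side effect Ken describes further down the thread: a naive cue-to-transcript conversion repeats the text. `PositionedCue`, `split_cue`, and `naive_transcript` are hypothetical names; `line` is assumed to be a vertical position as a viewport percentage, loosely modeled on the WebVTT `line` cue setting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PositionedCue:
    start: float  # seconds
    end: float    # seconds
    text: str
    line: float   # vertical position as a viewport percentage (assumption)


def split_cue(cue: PositionedCue, at: float, new_line: float) -> list[PositionedCue]:
    """Duplicate `cue` so that from time `at` onwards it is shown at `new_line`."""
    if not cue.start < at < cue.end:
        return [cue]  # the move happens outside this cue: nothing to split
    return [
        PositionedCue(cue.start, at, cue.text, cue.line),
        PositionedCue(at, cue.end, cue.text, new_line),
    ]


def naive_transcript(cues: list[PositionedCue]) -> str:
    """Concatenate cue text in order, as a simple cue-to-transcript converter might."""
    return "\n".join(c.text for c in cues)


# A 4 s caption at the bottom; burnt-in text appears there at 3 s.
cue = PositionedCue(0.0, 4.0, "Welcome back to the match.", line=90.0)
pieces = split_cue(cue, at=3.0, new_line=10.0)
for p in pieces:
    print(p)
print(naive_transcript(pieces))  # the same sentence appears twice
```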

Comment 4 Silvia Pfeiffer 2012-12-12 02:42:00 UTC

(In reply to comment #3)
> (In reply to comment #2)
> > Also, what if your default rendering locations are the top middle and the
> > bottom middle, just as for most burnt-in text? What if you first have a
> > presenter name displayed at the bottom, then a soccer game score at the
> > top? During that same time, you have captions that are offset in time from
> > the burnt-in text by exactly half a cue duration. You'd have to split your
> > cues in half and duplicate them at the two different on-screen locations
> > to make this work.
> 
> You'd probably only have to do that for one or two captions, which doesn't
> seem unreasonable if that's really how you want it to render.  It'd be a
> little "nicer" to not have to duplicate captions, but I doubt that would
> justify the added complexity.

Not by itself, but separating the cues from the rendering region (called a "window" by the FCC) also helps with the FCC requirements for roll-up captions and with the need to paint a different background on windows.

Comment 5 Silvia Pfeiffer 2012-12-12 02:43:57 UTC

Plus, I completely forgot to go into the need to allow UAs to change the rendering target of cues through user interaction. If users move a cue to a different on-screen location to avoid it overlapping something they want to see underneath, they want other cues rendered to that location to move as well, and to continue being rendered at the new location.

Comment 6 Ian 'Hixie' Hickson 2013-03-15 22:05:59 UTC


*** This bug has been marked as a duplicate of bug 17273 ***

Comment 7 Loretta Guarino Reid 2013-11-26 00:32:22 UTC

We would like the ability for the author to move or resize a region. This is different from the issue discussed in 17273.

One way to do this is to allow regions to be defined (and redefined) inline. This would be helpful for live captions, and would let authors move or resize regions by redefining them within the file.

Comment 8 Philip Jägenstedt 2013-11-26 19:54:39 UTC

Does 608 have this feature, or is it one required by the FCC? Taken in isolation, it seems like a weird little feature that would add complexity.

Specifically, we'd have to give regions timings, so that they are actually updated at a particular time, and not whenever the new definition happens to be parsed. Needs a bit of pondering...

Comment 9 Ian 'Hixie' Hickson 2013-11-26 20:11:44 UTC

Loretta, can you explain how this differs from the other bug?

Comment 10 Loretta Guarino Reid 2013-11-26 22:34:22 UTC

(In reply to Ian 'Hixie' Hickson from comment #9)
> Loretta, can you explain how this differs from the other bug?

Bug 17273 is wrestling with how cues can avoid overlapping, including reserving space that shouldn't be overlapped. 

Moving a cue to a different location is an operation under the control of the author or the user. The cues that are being moved have already been positioned and rendered once (possibly invoking overlap avoidance, depending upon how they were positioned). It may result in overlap (although ideally not). 

The two features clearly overlap, and there are problems that could use either solution. If this interpretation is a change in the intention of this bug, I can open a new one.

Comment 11 Silvia Pfeiffer 2013-11-27 04:07:57 UTC

(In reply to Loretta Guarino Reid from comment #10)
> If this interpretation is a change in the intention of this
> bug, I can open a new one.

Moving a group of cues as an authoring feature is certainly the feature that I had in mind when I opened the bug.

The overlap avoidance that this bug addresses is not about avoiding overlap with a region of the video that should never have any captions; it's about avoiding overlap with a temporarily present feature in the video (such as burnt-in text at the cue location for the duration of the cue).

Comment 12 Rick Eyre 2013-11-27 14:26:31 UTC

Wasn't there an issue with doing live WebVTT and being able to redefine regions as well? The ability to redefine a region anywhere in the file would solve Loretta's issue and the live WebVTT one.

Comment 13 Silvia Pfeiffer 2013-11-27 22:31:21 UTC

The live case that we discussed at FOMS related to HLS. In the HLS case, WebVTT segments are formed by grouping one or more cues (IIUC), so every WebVTT segment also needs to contain a copy of the region definitions used by the cues in that segment. That makes it simple to have different region definitions for a group of cues (i.e. for a time interval that covers only part of the full WebVTT file timeline).

For general WebVTT files, introducing a region active time interval would indeed solve that problem. By default, a region spec would then be active for the full duration of the WebVTT file.

Comment 14 Ian 'Hixie' Hickson 2013-11-27 23:50:26 UTC

If this is just about moving cues around the video viewport, then isn't that just a WebVTT API issue? And a resolved one, at that, since you can manually change the position of any cue, from script?

I don't really understand the use case here.

Comment 15 Loretta Guarino Reid 2013-11-28 00:10:58 UTC

(In reply to Ian 'Hixie' Hickson from comment #14)
> If this is just about moving cues around the video viewport, then isn't that
> just a WebVTT API issue? And a resolved one, at that, since you can manually
> change the position of any cue, from script?
> 
> I don't really understand the use case here.

User repositioning can (must?) be implemented using the WebVTT API. But author repositioning needs to be supported in the file itself.

Comment 16 Silvia Pfeiffer 2013-11-28 00:55:03 UTC

Ian: this bug is not a WHATWG bug - maybe you got confused.

Comment 17 Rick Eyre 2013-11-28 01:22:19 UTC

(In reply to Silvia Pfeiffer from comment #13)
> The live case that we discussed at FOMS related to HLS. In the HLS case,
> WebVTT segments are formed by grouping one or more cues (IIUC), so every
> WebVTT segment also needs to contain a copy of the region definitions used
> by the cues in that segment. That makes it simple to have different region
> definitions for a group of cues (i.e. for a time interval that covers only
> part of the full WebVTT file timeline).
>
> For general WebVTT files, introducing a region active time interval would
> indeed solve that problem. By default, a region spec would then be active
> for the full duration of the WebVTT file.

Works for me. Thanks for clarifying that Silvia.

Comment 18 Ian 'Hixie' Hickson 2013-12-02 20:25:02 UTC

Ah, I see, the assignee didn't get updated cos it was reopened. Never mind! :-)

Comment 19 Philip Jägenstedt 2013-12-06 11:51:25 UTC

Sanity check: is it absolutely necessary to solve this declaratively? It should be quite straightforward to update a region using the API, after all.

To make this work declaratively, we'd have to add syntax for updating a region at a particular time.

If bug 17273 were fixed, could we live with not knowing exactly where the region is moved, safe in the knowledge that it at least won't overlap stuff in the video?

Comment 20 Philip Jägenstedt 2013-12-06 22:21:49 UTC

Valuable information from Ken via email:

---------- Forwarded message ----------

Hi folks,

Loretta forwarded this bug to me.  Since I'm not yet officially a
working group member (although I have finally applied as of today),
I thought I'd just send my response directly.  You can cut and paste
the bits of interest, or we can continue discussing it off-group
before reporting a conclusion.

With respect to using a JS API method to "move" a region, the
problem is that this API can only be invoked by a program.  A WebVTT file
is not a program (unless you squint really really hard).  The question
is really how, within WebVTT, to represent the fact that a region
which contains text is being moved to a different location.

Why is this needed?  Because that capability is present in
CEA-608 (and 708).  It is frequently used to move rollup
windows from top to bottom and vice versa, so as to get the
text out of the way of some underlying information in the video.

There are three ways to accomplish this:

(1) Use duplicate captions (cues).  Whatever text was in
the previous region gets repeated as a new cue in a different
region.  The downside of this approach is it makes it hard
to convert WebVTT cues into other forms, such as transcripts,
without having a lot of seemingly duplicated text.

(2) Use an explicit action operation saying "at time T,
move region thusly".  This turns out to cause a lot of
messy implementation problems, especially if you want
things to look right even while moving the play head around.

(3) The solution we've adopted with our own internal
formats is to give every region a time range (start time,
stop time) just like a cue, and re-use region identifiers
with different definitions.  For a re-size, the original
region definition and the newer one would have
contiguous but non-overlapping time ranges.

A cue that references a certain region ID thus gets a
region definition that depends on the actual time of
the play head.  This allows us to specify the cue text
only once (no duplicates), and works exactly as desired
even if the play head is seeking around at random.

As a bonus, this use of redefinition also lets us trivially
implement region re-sizing, which is another
CEA-608/708 operation that is also used in
broadcasts (changing the # of lines used in rollup
windows).

Hope this explanation helps!

--Ken
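Ken's option (3) can be sketched as a lookup: every region definition carries its own time range, one region identifier is reused with contiguous, non-overlapping ranges, and the definition a cue sees is chosen from the current play-head time alone. An open-ended range (`math.inf`) stands in for the default proposed in comment 13, where a region is active for the whole file. This is a hypothetical model, not WebVTT syntax or a browser API; `RegionDef`, `resolve_region`, `viewport_y`, and `lines` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RegionDef:
    """One time-bounded definition of a region (hypothetical model of option 3)."""
    region_id: str
    start: float       # seconds, inclusive
    end: float         # seconds, exclusive; math.inf means "until the end of the file"
    viewport_y: float  # vertical anchor as a viewport percentage
    lines: int         # height of the roll-up window in lines


def resolve_region(defs: list[RegionDef], region_id: str, t: float) -> Optional[RegionDef]:
    """Return the definition of `region_id` in effect at play-head time `t`.

    The result depends only on `t`, so it stays correct after any seek,
    forwards or backwards, without replaying earlier "move" actions.
    """
    for d in defs:
        if d.region_id == region_id and d.start <= t < d.end:
            return d
    return None


# One roll-up region id, reused with contiguous, non-overlapping time ranges.
defs = [
    RegionDef("rollup", 0.0, 30.0, viewport_y=90.0, lines=3),       # bottom
    RegionDef("rollup", 30.0, 45.0, viewport_y=10.0, lines=3),      # moved to the top
    RegionDef("rollup", 45.0, math.inf, viewport_y=90.0, lines=2),  # back down, resized
]

# A cue that references "rollup" is written once; where it renders depends on t.
for t in (29.0, 31.0, 50.0, 31.0):  # the last value simulates seeking backwards
    d = resolve_region(defs, "rollup", t)
    print(t, d.viewport_y, d.lines)
```

Because the lookup is a pure function of the play-head time, the seek problems Ken attributes to option (2) do not arise, and a resize (the `lines` change at 45 s) is just another redefinition.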