18501 – WebVTT: position:0%, position:50% and position:100% are all weird/broken

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: WebVTT: position:0%, position:50% and position:100% are all weird/broken

Status:	RESOLVED DUPLICATE of bug 20037

Alias:	None

Product:	TextTracks CG
Classification:	Unclassified
Component:	WebVTT
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Ian 'Hixie' Hickson
QA Contact:	This bug has no owner yet - up for the taking

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2012-08-08 13:07 UTC by Philip Jägenstedt
Modified:	2012-11-21 19:52 UTC
CC List:	5 users

See Also:

Attachments
Screen shot video with middle aligned captions on the left (57.48 KB, image/jpeg) 2012-09-12 23:47 UTC, Silvia Pfeiffer
left positioned middle aligned cue (52.67 KB, image/jpeg) 2012-09-12 23:52 UTC, Silvia Pfeiffer

Description Philip Jägenstedt 2012-08-08 13:07:18 UTC

The concept is defined in http://www.whatwg.org/specs/web-apps/current-work/multipage/the-video-element.html#text-track-cue-text-position as

"A number giving the position of the text of the cue within each line, to be interpreted as a percentage of the video, as defined by the writing direction."

However, in http://dev.w3.org/html5/webvtt/#webvtt-cue-text-rendering-rules it influences the maximum size in non-obvious ways:

* For align:middle, both position:0% and position:100% result in a maximum size of 0.
* For align:start, position:100% results in a maximum size of 0.
* For align:end, position:0% results in a maximum size of 0.

The default position:50% also interacts badly with align:start and align:end, where (for left-to-right horizontal text), align:start by default makes the text occupy the right half of the viewport, while align:end makes the text occupy the left half of the viewport.
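For illustration, the behaviour described above can be reproduced with a minimal sketch. It is reconstructed from the cases listed in this description, not copied from the spec text, and it assumes horizontal, left-to-right cues; the function name `current_box` is hypothetical.

```python
def current_box(align, position=50.0, size=100.0):
    """Return (x, width) in percent of the viewport width.

    Assumption: the box is anchored at `position` according to `align`
    and its width is capped so that it never overflows the viewport.
    """
    if align == "start":
        maximum = 100.0 - position  # room to the right of the start point
    elif align == "end":
        maximum = position  # room to the left of the end point
    elif align == "middle":
        maximum = 2.0 * min(position, 100.0 - position)  # symmetric around the centre
    else:
        raise ValueError("unknown alignment: " + align)

    width = min(size, maximum)

    if align == "start":
        x = position
    elif align == "end":
        x = position - width
    else:
        x = position - width / 2.0
    return x, width


if __name__ == "__main__":
    for align in ("start", "middle", "end"):
        for position in (0.0, 50.0, 100.0):
            print(align, position, current_box(align, position))
```

With the default position of 50, `current_box("start")` yields a box covering the right half of the viewport and `current_box("end")` a box covering the left half, which is the interaction described above.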

Evidence of this being less than obvious is http://www.delphiki.com/html5/playr/examples/dw_trailer_low.vtt where the author clearly assumed that position:0% would do something useful, perhaps left-aligning?

It seems to me that what authors need is the ability to specify:

1. The width of the boxes to fit the cue text into (0-100%)
2. The alignment of cue text within those boxes (start/middle/end)
3. The alignment of those boxes within the viewport if they are not 100% wide.

We have 1 (size:50%) and 2 (align:end) but 3 is not easy. The solution I would find most intuitive is to redefine position to be more similar to e.g. CSS background-position, such that 0% means "all the way to the left" and 100% means "all the way to the right". That way, size:50% position:100% would result in the right half of the viewport being used. The default position should remain 50%, but would have no effect unless width < 100%.

Comment 1 Ian 'Hixie' Hickson 2012-08-08 23:08:31 UTC

This is working as designed. If you center something, the algorithm attempts to position the box centered at the given position. Since it also avoids any overflow, that means that a centered box on the edge can't have any width.

I don't see why this is a problem.

We could have a different algorithm, but then we should start over and design it from scratch, not try to bolt-on a different behaviour for edge cases.

Comment 2 Philip Jägenstedt 2012-08-09 09:30:04 UTC

I think it's a problem that a position property valued 0-100% degenerates into something broken at 0% and 100% and is completely different from CSS's background-position and object-position.

Julien just linked to <http://leanbackplayer.com/other/webvtt.html#cue-settings-textposition>, which has the same misunderstanding.

To summarize, position currently means:
 * for align:middle, the point where the text is centered
 * for align:start, the point where the text starts
 * for align:end, the point where the text ends

This looks logical, but since the default position is 50%, specifying position essentially becomes mandatory whenever align:start or align:end is used.

At a minimum, we should make align:start and align:end by default occupy the full width of the viewport, but have the text left/right-aligned. Given this input:

WEBVTT

00:00:00.000 --> 00:00:30.000 align:start line:0
align:start

00:00:00.000 --> 00:00:30.000 align:middle line:1
align:middle

00:00:00.000 --> 00:00:30.000 align:end line:2
align:end


00:00:00.000 --> 00:00:30.000 size:50% align:start line:4
size:50% align:start

00:00:00.000 --> 00:00:30.000 size:50% align:middle line:5
size:50% align:middle

00:00:00.000 --> 00:00:30.000 size:50% align:end line:6
size:50% align:end


00:00:00.000 --> 00:00:30.000 position:0% size:80% align:start line:8
position:0% size:80% align:start

00:00:00.000 --> 00:00:30.000 position:50% size:80% align:middle line:9
position:50% size:80% align:middle

00:00:00.000 --> 00:00:30.000 position:100% size:80% align:end line:10
position:100% size:80% align:end

Compare the rendering of the current algorithm and a hacked algorithm:

http://people.opera.com/philipj/2012/08/webvtt/position-opera-next.png
http://people.opera.com/philipj/2012/08/webvtt/position-opera-hack.png

The quick hack is to remove the big chunk of logic for setting x-position based on alignment, and simply let x-position be ("text track cue text position" * (100 - size)) / 100.
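A minimal sketch of that formula applied to cue settings strings like the ones in the test file above. The helper names are hypothetical, only `position` and `size` are parsed (other settings such as `line` and `align` are ignored), and the defaults of position 50 and size 100 are assumptions of the sketch.

```python
def parse_settings(settings):
    """Extract position and size (in percent) from a cue settings string."""
    values = {"position": 50.0, "size": 100.0}  # assumed defaults
    for item in settings.split():
        name, _, value = item.partition(":")
        if name in values:
            values[name] = float(value.rstrip("%"))
    return values


def hacked_x_position(position, size):
    """Distribute the unused width (100 - size) according to position."""
    return position * (100.0 - size) / 100.0


if __name__ == "__main__":
    for settings in (
        "align:start line:0",
        "size:50% align:start line:4",
        "position:0% size:80% align:start line:8",
        "position:50% size:80% align:middle line:9",
        "position:100% size:80% align:end line:10",
    ):
        parsed = parse_settings(settings)
        print(settings, "->", hacked_x_position(parsed["position"], parsed["size"]))
```

Under this formula a full-width cue always starts at x = 0 regardless of position, and alignment only affects the text inside the box.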

We could start over from scratch, but this bug is really only about how align, position and size interact, and it certainly seems possible to improve without starting over.

Comment 3 Silvia Pfeiffer 2012-08-09 12:31:05 UTC

I was under the impression that what Ronald demonstrated at http://leanbackplayer.com/other/webvtt.html#cue-settings-textposition was correct:

position: 0% of the video width implied that the 0% position of the rendering box was attached to 0% of the video width, similarly 50% and 100%. None of these change the size of the rendering box.

I think that differs from either of your understandings of the spec.

I also think that what Ronny described is now implemented in WebKit.

Comment 4 Philip Jägenstedt 2012-08-10 09:23:42 UTC

Silvia, I'm unable to get WebVTT rendering at all in recent Chrome or Chromium, even though the DOM interfaces are there, are you sure the rendering has been implemented? Even though I dislike the current spec, it would be quite surprising if the implementors in WebKit had ignored it to the point you're suggesting. If you could poke whoever did the WebKit implementation to take a look at this bug and comment, that'd be much appreciated.

Comment 5 vcarbune 2012-08-10 10:00:14 UTC

(In reply to comment #4)

Hi Philip,

> Silvia, I'm unable to get WebVTT rendering at all in recent Chrome or Chromium,
> even though the DOM interfaces are there, are you sure the rendering has been
> implemented? Even though I dislike the current spec, it would be quite
> surprising if the implementors in WebKit had ignored it to the point you're

Discrepancies from the current spec are not on purpose and WebKit rendering is still work-in-progress. A patch for implementing rendering ad litteram to the current spec is on its way.

> suggesting. If you could poke whoever did the WebKit implementation to take a
> look at this bug and comment, that'd be much appreciated.

I wrote an email about the exact same issue you point out here, back in April [1], but I guess it got lost because I didn't file it as a proper bug. Your examples illustrate most of the limitations I had in mind as well.

Right now these properties (size, alignment and position) are quite strongly coupled, and I think this limits the flexibility we can offer to authors.

For point number 3) that you mentioned in your initial comment, it would be very easy to achieve very flexible positioning using anchor points [2]. This would greatly enhance the possibility of annotating videos, using cues without the snap-to-lines flag set (so both horizontal / vertical alignment within the viewport of a specified fixed width box that contains the cues).

[1] http://lists.w3.org/Archives/Public/public-texttracks/2012Apr/0055.html 
[2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=15859

Comment 6 Silvia Pfeiffer 2012-08-10 10:35:24 UTC

(In reply to comment #4)
> Silvia, I'm unable to get WebVTT rendering at all in recent Chrome or Chromium,
> even though the DOM interfaces are there, are you sure the rendering has been
> implemented?

You need to activate the Text Track interfaces experiment in "chrome://flags". It hasn't been released.

Oh, and Victor, one of the core implementers, has already replied.

Comment 7 Philip Jägenstedt 2012-08-10 14:03:50 UTC

(In reply to comment #5)
> (In reply to comment #4)
> 
> Hi Philip,
> 
> > Silvia, I'm unable to get WebVTT rendering at all in recent Chrome or Chromium,
> > even though the DOM interfaces are there, are you sure the rendering has been
> > implemented? Even though I dislike the current spec, it would be quite
> > surprising if the implementors in WebKit had ignored it to the point you're
> Discrepancies from the current spec are not on purpose and WebKit rendering is
> still work-in-progress. A patch for implementing rendering ad litteram to the
> current spec is on its way.

Excellent! I hope that we can rapidly evolve the spec and implementations together, before people start depending on the buggy bits.

> > suggesting. If you could poke whoever did the WebKit implementation to take a
> > look at this bug and comment, that'd be much appreciated.
> I wrote an email about the exact same issue you point out here, back in
> April [1], but I guess it got lost because I didn't file it as a proper bug.
> Your examples illustrate most of the limitations I had in mind as well.

I remember seeing that email now, but at the time I didn't really understand what the spec was trying to do, and it looks like there was no discussion...

> Right now these properties (size, alignment and position) are quite strongly
> coupled, and I think this limits the flexibility we can offer to
> authors.

I think it should be possible with the current spec to have text cues of any width at any position with any text alignment inside, but I don't think it's very nice to work with. What do you think about:

1. Let size set the width (height) directly.
2. Let position distribute the remaining width (height) to the left or right (top or bottom), just like e.g. CSS's background-position. Remove the coupling to (maximum) size.
3. Let align set the alignment within the resulting box. Remove the coupling to position.

At least in our implementation, this was a simplification, and seems easier to work with as an author. It's tempting to also let align influence the default value of position so that align:start size:60% implies position:0% (the left 60% of the viewport) and not position:50% (middle 60% of the viewport), but that'd require introducing something like 'auto' as the default.
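A minimal sketch of the three steps above, including the tempting 'auto' default. It assumes horizontal, left-to-right cues; the function name and the use of `None` to stand for 'auto' are hypothetical.

```python
# Assumed mapping for the 'auto' default: position follows align.
AUTO_POSITION = {"start": 0.0, "middle": 50.0, "end": 100.0}


def proposed_box(size=100.0, position=None, align="middle"):
    """Return (x, width, text_align) in percent of the viewport width.

    1. size sets the width directly.
    2. position distributes the remaining width, like CSS background-position.
    3. align only affects the text inside the resulting box.
    """
    if position is None:  # 'auto': derive the default from align
        position = AUTO_POSITION[align]
    width = size
    x = position * (100.0 - width) / 100.0
    return x, width, align


if __name__ == "__main__":
    print(proposed_box(size=60.0, align="start"))                 # left 60% of the viewport
    print(proposed_box(size=60.0, position=50.0, align="start"))  # middle 60% of the viewport
```

Without the 'auto' default, the first call would need an explicit position:0% to avoid ending up in the middle 60% of the viewport.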

> For point number 3) that you have mentioned in your initial comments, it would
> be very easy to achieve very flexible positioning using anchor points [2]. This
> would greatly enhance the possibility of annotating videos, using cues without
> snap-to-lines flag set (so both horizontal / vertical alignment within the
> viewport of a specified fixed width box that contains the cues).
> 
> [1] http://lists.w3.org/Archives/Public/public-texttracks/2012Apr/0055.html 
> [2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=15859

I'm not sure I fully understand this bug, can you express it in terms of changes to the existing spec for comparison with my suggestion?

Comment 8 vcarbune 2012-08-12 15:38:48 UTC

> 1. Let size set the width (height) directly.
> 2. Let position distribute the remaining width (height) to the left or right
> (top or bottom), just like e.g. CSS's background-position. Remove the coupling
> to (maximum) size.
> 3. Let align set the alignment within the resulting box. Remove the coupling to
> position.

The drawback that I see in completely removing the maximum size is that an author would have no way to specify where the cue text starts while leaving the remaining size, up to the end of the video, to be determined automatically.

Since working with text requires a finite and exact box in which the text flows and wraps (unlike an image, where the width / height are predefined), I think it would be good to still have a maximum size, defined exactly as you mentioned at 2 (the remaining width (height) ... depending on horizontal / vertical / ltr / rtl), but decoupled completely from the "align:" value.

In other words, I would prefer that position specify the resulting x-position of the left (right) edge of the box, with the text flowing in the remaining width up to the right (left) edge of the video.

> At least in our implementation, this was a simplification, and seems easier to
> work with as an author. It's tempting to also let align influence the default
> value of position so that align:start size:60% implies position:0% (the left
> 60% of the viewport) and not position:50% (middle 60% of the viewport), but
> that'd require introducing something like 'auto' as the default.

I would say that we need to properly define the size and position of the box in which the text is rendered, completely independently of the align property.

Summing up, I agree with 1. and 3. with independent size and alignment, but at 2. I would keep the maximum size as being a sort of "auto" value for unspecified size, but decoupled from the alignment.
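One possible reading of this variant, as a minimal sketch for horizontal, left-to-right cues. The function name is hypothetical, and what should happen when an explicit size does not fit in the remaining width is left open here, so the sketch does not clamp it.

```python
def edge_anchored_box(position, size=None):
    """Return (x, width) in percent of the viewport width.

    position gives the left edge of the box. When size is unspecified,
    the width defaults to the remaining width up to the right edge
    (the maximum size acting as an 'auto' value). align is not an input.
    """
    remaining = 100.0 - position
    width = remaining if size is None else size  # assumption: explicit size is not clamped
    return position, width


if __name__ == "__main__":
    print(edge_anchored_box(30.0))        # width determined automatically
    print(edge_anchored_box(30.0, 50.0))  # explicit size is used as given
```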

> > [2] https://www.w3.org/Bugs/Public/show_bug.cgi?id=15859
> 
> I'm not sure I fully understand this bug, can you express it in terms of
> changes to the existing spec for comparison with my suggestion?
Consider the resulting box (defined by width and height after text wrapping) in which the text is rendered. An anchor point would allow you to specify which point *within* the cue box actually matches the (text, line) position within the video.

Compared to your suggestion, and in the case where snap-to-lines is not set, it would mean allowing the author to specify which point within the box is actually located at position% (the left edge, the center, the right edge or another variable point), instead of automatically assuming that the point at position% across the box width matches the position% across the video width (which is what happens for CSS background-position).
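A minimal sketch of the anchor point idea for the horizontal axis. The function and parameter names are hypothetical; `anchor` is the percentage along the box width that is placed at position% across the video width.

```python
def anchored_x(position, width, anchor):
    """Return the x-position (percent of the video width) of the box's left edge.

    The point `anchor` percent along the box width is placed at `position`
    percent across the video width.
    """
    return position - anchor * width / 100.0


if __name__ == "__main__":
    width = 50.0
    print(anchored_x(40.0, width, 0.0))    # left edge of the box at 40%
    print(anchored_x(40.0, width, 50.0))   # centre of the box at 40%
    print(anchored_x(40.0, width, 100.0))  # right edge of the box at 40%
    print(anchored_x(40.0, width, 40.0))   # anchor equal to position
```

Setting the anchor equal to position gives the same result as the CSS background-position style formula, position * (100 - width) / 100. Other anchors can place part of the box outside the video, so overflow handling would still need to be defined.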

Comment 9 Silvia Pfeiffer 2012-09-12 23:47:44 UTC

Created attachment 1184
Screen shot video with middle aligned captions on the left

Comment 10 Silvia Pfeiffer 2012-09-12 23:51:21 UTC

The one thing that I think is really wrong with the layout is that the width of a cue (i.e. "size") is changed depending on the position, even if "size" was explicitly set by the author and even if there is sufficient space in the video viewport to display the cue with that "size".

For example: here is a cue that is middle aligned, but positioned at the left of the viewport:

00:00:04.367 --> 00:00:06.400 align:middle position:10% size:50%
I AM AT THE LEFT
OF THE SCREEN.

(also see the image I just attached).

Since the cue is middle aligned and positioned at 10% of the viewport, the width of the cue is reduced to 50% * 1/2 - 10% = 15% rather than the 50% requested by the author. There is plenty of space for the 50%. This automatic change of the cue width makes it almost impossible to place middle aligned text anywhere but in the middle of the viewport.

Comment 11 Silvia Pfeiffer 2012-09-12 23:52:54 UTC

Created attachment 1185
left positioned middle aligned cue

Comment 12 Silvia Pfeiffer 2012-09-12 23:55:22 UTC

The problem gets worse when trying to render a middle aligned cue at the left edge of the video:

00:00:04.367 --> 00:00:06.400 align:middle position:0% size:50%
I AM AT THE LEFT
OF THE SCREEN.

(See the second attached image).

As Philip says: it reduces the width of the cue to 0. Both Opera and (as in the screenshot above) Chrome try to make something sensible of it and at least display some text, but according to the spec, the cue's width is 0 and nothing would be displayed.

This makes it impossible to position middle aligned text at the left edge of the video.

Comment 13 Simon Pieters 2012-09-13 06:14:11 UTC

So essentially, what is requested here is for size to be respected and position changed, instead of position respected and size changed, when they have values such that the text would end up outside the viewport if both were to be respected?
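A minimal sketch of "size respected, position changed" for horizontal, left-to-right cues. The function name is hypothetical and the clamping rule is an assumption about what such an algorithm could look like, not spec text; it assumes size is at most 100.

```python
def size_wins_box(align, position, size):
    """Return (x, width) in percent of the viewport width.

    The requested size is kept. The box is first anchored at position
    according to align, then shifted just enough to fit in the viewport.
    """
    anchor = {"start": 0.0, "middle": 0.5, "end": 1.0}[align]
    x = position - anchor * size          # where position alone would put the box
    x = max(0.0, min(x, 100.0 - size))    # shift the box back inside the viewport
    return x, size


if __name__ == "__main__":
    # The cues from comment 10 and comment 12: align:middle size:50%
    print(size_wins_box("middle", 10.0, 50.0))
    print(size_wins_box("middle", 0.0, 50.0))
```

Both cues keep their 50% width and end up flush with the left edge of the viewport instead of shrinking.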

Comment 14 Silvia Pfeiffer 2012-09-13 06:52:03 UTC

Yes, I think that makes more sense than what we currently have.

Comment 15 Silvia Pfeiffer 2012-09-13 06:59:20 UTC

I also find the position calculation not intuitive.

I always thought that independent of the align property, a cue (that is not 100% wide) will be positioned x% along the video viewport's width *at x% of its own width*.

However, IIUC right now position x% of a cue box is specified to be *at 0% of its own width if it's left aligned*, *at 50% of its own width if it's middle aligned* and *at 100% of its own width when it's right aligned*.

If instead we regarded the cue text rendering area as a reduced-width but percentage-equivalent stretch-area of the 100% video width, it becomes easier to position it without having to make any changes to the size.

Comment 16 Silvia Pfeiffer 2012-10-28 16:10:18 UTC

Replace:

Position the boxes in boxes such that the point x% along the width of the bounding box of the boxes in boxes is x% of the way across the width of the video's rendering area, and the point y% along the height of the bounding box of the boxes in boxes is y% of the way across the height of the video's rendering area, while maintaining the relative positions of the boxes in boxes to each other.

With:

Position the boxes in boxes such that:

* for left aligned or (start aligned & ltr) horizontal cues the 0% point along the width of the bounding box of the boxes in boxes is x% of the way across the width of the video's rendering area,

* for middle aligned horizontal cues the 50% point along the width of the bounding box of the boxes in boxes is x% of the way across the width of the video's rendering area,

* for right aligned or (end aligned & ltr) horizontal cues the 100% point along the width of the bounding box of the boxes in boxes is x% of the way across the width of the video's rendering area.

(rtl and vertical cues analogously)

Comment 17 Silvia Pfeiffer 2012-10-28 16:12:19 UTC

Ignore that last comment - though related, I hit submit in the wrong bug with an only half-written comment. Please read bug 19178 comment #2 instead.

Sorry.

Comment 18 Ian 'Hixie' Hickson 2012-10-29 23:08:34 UTC

(In reply to comment #2)
> 
> 00:00:00.000 --> 00:00:30.000 align:start line:0
> align:start
> 
> 00:00:00.000 --> 00:00:30.000 align:middle line:1
> align:middle
> 
> 00:00:00.000 --> 00:00:30.000 align:end line:2
> align:end

> http://people.opera.com/philipj/2012/08/webvtt/position-opera-next.png

These aren't what I intended.

I think there's some logic to the idea of making size win instead of position, but I'll have to think about it some more to see what implications this has (especially with alignment and bidi).

Comment 19 Ian 'Hixie' Hickson 2012-11-21 19:52:01 UTC


*** This bug has been marked as a duplicate of bug 20037 ***