This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 28257 - [webvtt] start/end linked to left/right [I18N-ISSUE-422]
Summary: [webvtt] start/end linked to left/right [I18N-ISSUE-422]
Status: RESOLVED MOVED
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: Web Media Text Tracks CG
URL:
Whiteboard: widereview
Keywords:
Depends on:
Blocks:
 
Reported: 2015-03-22 00:05 UTC by Silvia Pfeiffer
Modified: 2017-08-09 12:01 UTC
CC List: 8 users

See Also:
silviapfeiffer1: needinfo? (ishida)
silviapfeiffer1: needinfo? (addison)


Description Silvia Pfeiffer 2015-03-22 00:05:05 UTC
Feedback by Addison Phillips from W3C I18N group:
http://lists.w3.org/Archives/Public/public-tt/2015Mar/0056.html

I18N comment: https://www.w3.org/International/track/issues/422

http://www.w3.org/TR/webvtt1/#other-features

In section 1.3 there is a feature for defining the width and alignment of text. The keywords used are 'start' and 'end', which is great. However, the text describing this feature implies that 'start' is always 'left' and 'end' is always the opposite side. Here's what the text says:

--
The first cue has its cue box left aligned at the 10% mark of the video viewport width and the text is left aligned within that box - probably underneath a speaker on the left of the video image. "start" alignment of the cue box is the default for start aligned text, so does not need to be specified in "position". The second cue has its cue box right aligned at the 90% mark of the video viewport width. The same effect can be achieved with "position:55%,start", which explicitly positions the cue box. The third cue has middle aligned text within the same type of cue box as the first cue.
--

Explicit support for text directionality should be provided and a clear linkage established between direction of the text and 'start'/'end' positioning.
Comment 1 Philip Jägenstedt 2015-03-23 02:56:09 UTC
This is only an example; I think the behavior is accurately described in the model:
http://www.w3.org/TR/webvtt1/#dfn-text-track-cue-text-alignment

Is that sufficient, or is there a concrete change to the example to suggest?
Comment 2 Addison Phillips 2015-03-23 23:04:11 UTC
I think the text pointed to in Comment 1 is correct and meets the needs. There remain two issues here, one small and one large.

The larger issue is covered nicely by bug 28266, so let's use this bug to focus on the smaller issue of the example:

The example provided in 1.3 could be modified to at least acknowledge the linkage between text direction and the choice of left or right sides. While it is an example, reading it gives the distinct impression of start == left, particularly since the example says:

--
The first cue has its cue box left aligned at the 10% mark of the video viewport width and the text is left aligned within that box - probably underneath a speaker on the left of the video image.
--

If you wanted to ensure that the text was underneath the speaker on the left side, you would use the "left" keyword rather than "start". If the subtitles appeared translated into Arabic, for example, the "start" would be on the right side and no longer underneath the character.

I would suggest rephrasing as something like:

--
The first cue has its cue box "start" (in this case left) aligned at the 10% mark of the video viewport width with the text left aligned within that box - possibly underneath a speaker on the left of the video image.
--
Comment 3 Simon Pieters 2015-04-21 07:50:34 UTC
Alternatively this example should be using 'left' and 'right' since it seems to want to position the cue together with the speaker. We can add another example where 'start'/'end' make more sense and be clear about LTR vs RTL.
Comment 4 Silvia Pfeiffer 2015-06-27 13:24:48 UTC
The "position" cue setting always positions the cue box left, middle, right for horizontal cues. It does not react to text directionality (start/end are used here to deal with horizontal/vertical cues).

It's the "align" cue setting that reacts to directionality and has start, middle, end, left, right settings.

The example has a cue box width of 35% so that the cue box is positioned from 10-45% of the video's width, i.e. the left. This does not change with directionality. The "align" setting deliberately has "start" though, so the text within that box either goes from the left to the right, or the right to the left.
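
The arithmetic in the paragraph above can be sketched in Python. This is only an illustration for horizontal cues: the function name `cue_box_extent` and its `anchor` parameter are invented here and are not spec terms, and clamping to the viewport is left out.

```python
def cue_box_extent(position, size, anchor):
    """Return the (left, right) edges of a horizontal cue box, in percent
    of the video width.

    `anchor` names the part of the box that sits at `position`:
    "start" (left edge), "middle" (centre) or "end" (right edge).
    """
    if anchor == "start":
        left = position
    elif anchor == "middle":
        left = position - size / 2
    elif anchor == "end":
        left = position - size
    else:
        raise ValueError(f"unknown anchor: {anchor!r}")
    return (left, left + size)


# First cue of the example: 35% wide, left edge at the 10% mark.
first = cue_box_extent(10, 35, "start")
# Second cue: 35% wide, right edge at the 90% mark.
second = cue_box_extent(90, 35, "end")
# The equivalent explicit form "position:55%,start".
explicit = cue_box_extent(55, 35, "start")
print(first, second, explicit)
```

Nothing in this calculation looks at the text direction, which is the point being made: the box stays put, and only the text inside it reacts to "align".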

Thus, the first cue is correct and so are the others.

However, I've created a pull request that should really help explain this much better: https://github.com/w3c/webvtt/pull/200

See what you think.
Comment 5 Richard Ishida 2015-07-29 18:54:09 UTC
i'm inclined to think that mixing information about the alignment of the text with the positioning of the text in the explanation is one of the things that makes people jump to incorrect conclusions about what's going on.

Would this version help?

<p>Since the cues in these examples are horizontal, the position setting refers to a percentage of the width of the video viewport. If the text were vertical, the position setting would refer to the height of the viewport.</p>

<p>The first cue has its <a title="WebVTT cue box">cue box</a> positioned at the 10% mark. The "start" and "end" within the "position" setting indicate which side of the <a title="WebVTT cue box">cue box</a> the position refers to. Since in this case the text is horizontal, "start" refers to the left side of the box, and the cue box is thus positioned between the 10% and the 45% mark of the video viewport's width, probably underneath a speaker on the left of the video image. If the cue was vertical, "start" positioning would be from the top of the video viewport's height and the <a title="WebVTT cue box">cue box</a> would cover 35% of the video viewport's height.</p>

<p>The "start" or "end" in this case only refers to the physical side of the box to which the position setting applies, in a way which is agnostic regarding the horizontal or vertical direction of the cue.  It does not affect or relate to the direction or position within the box of the text itself. </p>

<p>The text within the first cue's cue box is aligned using ...
Comment 6 Silvia Pfeiffer 2015-08-09 07:51:15 UTC
Richard - I've added most of your feedback to the PR at https://github.com/w3c/webvtt/pull/200 . Please have another look.
Comment 7 Silvia Pfeiffer 2015-09-30 23:08:12 UTC
Richard: ping. We'll land the patch in a week, if there is no new feedback.
Comment 9 Addison Phillips 2015-11-23 18:34:29 UTC
I'm not sure I agree with what has happened here. Reading the text Richard supplied that appears in the patch, I felt we were "on track", but reading the specific text for processing makes me think we haven't addressed the original problem.

In CSS there is a difference between the direction neutral keywords 'start'/'end' and the direction specific keywords 'left'/'right'. (Ditto their vertical friends 'before'/'after' and 'top'/'bottom'). The difference is that the direction or writing mode of the text affects the meaning of the direction neutral flavored items. In an RTL context, 'start' means 'right' and 'end' means 'left'. This is done so that when the base direction of the layout changes (for example: you localize the file), you don't have to go through and rebuild all of the positioning "backwards". It just works.
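
As a rough illustration of that CSS behavior for horizontal text, the mapping from direction-neutral to direction-specific keywords could be sketched like this in Python. The function name `resolve_inline_keyword` is made up for this example and does not come from any specification.

```python
def resolve_inline_keyword(keyword, direction):
    """Map a horizontal alignment keyword to a physical side.

    'left' and 'right' are direction-specific and pass through unchanged.
    'start' and 'end' are direction-neutral and depend on `direction`,
    which must be "ltr" or "rtl".
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if keyword in ("left", "right"):
        return keyword
    if keyword not in ("start", "end"):
        raise ValueError(f"unknown keyword: {keyword!r}")
    starts_on_left = direction == "ltr"
    if keyword == "start":
        return "left" if starts_on_left else "right"
    return "right" if starts_on_left else "left"


for direction in ("ltr", "rtl"):
    for keyword in ("start", "end", "left", "right"):
        print(direction, keyword, "->", resolve_inline_keyword(keyword, direction))
```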

I understand the problem of needing to position cues under a specific part of the screen (such as which side the speaking character appears on). In this case, you would want to use direction-specific keywords like 'left' and 'right' to ensure that the cue box doesn't "mirror" when the language is changed to Arabic and end up on the wrong side. But in other, more general, cases, you *do* want the mirroring to occur.

In looking at the text this morning I see the formal description of these words in Section 3.1 under 'position' says:

---
If the cue text alignment is start or left, return 0 and abort these steps.

If the cue text alignment is end or right, return 100 and abort these steps.

If the cue text alignment is middle, return 50 and abort these steps.
---

This means that the direction neutral words are just synonyms for their direction specific brethren and we haven't solved the problem.
Comment 10 Silvia Pfeiffer 2015-11-24 11:46:49 UTC
(In reply to Addison Phillips from comment #9)
> I understand the problem of needing to position cues under a specific part
> of the screen (such as which side the speaking character appears on). In
> this case, you would want to use direction-specific keywords like 'left' and
> 'right' to ensure that the cue box doesn't "mirror" when the language is
> changed to Arabic and end up on the wrong side. But in other, more general,
> cases, you *do* want the mirroring to occur.


There is no use case in captioning where you want the mirroring of the cue box to happen. The thing that should change direction is the text inside it, not the box.


> In looking at the text this morning I see the formal description of these
> words in Section 3.1 under 'position' says:
> 
> ---
> If the cue text alignment is start or left, return 0 and abort these steps.
> 
> If the cue text alignment is end or right, return 100 and abort these steps.
> 
> If the cue text alignment is middle, return 50 and abort these steps.
> ---
> 
> This means that the direction neutral words are just synonyms for their
> direction specific brethren and we haven't solved the problem.


No it doesn't. You've skipped over the most important part of that section. Here is the full quote:

--
A text track cue has a text track cue computed text position whose value is that returned by the following algorithm, which is defined in terms of the other aspects of the cue:

1. If the text track cue text position is numeric, then return the value of the text track cue text position and abort these steps. (Otherwise, the text track cue text position is the special value auto.)

2. If the text track cue text alignment is start or left, return 0 and abort these steps.

3. If the text track cue text alignment is end or right, return 100 and abort these steps.

4. If the text track cue text alignment is middle, return 50 and abort these steps.
--


What it says is that: if the cue box position is not set explicitly (that's 1.), then the cue box's position depends on the alignment of the text inside the box.
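
For readers who prefer code, the four quoted steps can be transcribed into Python roughly as follows. The function name `computed_position` and the `AUTO` sentinel are names chosen for this sketch only; the keyword values are the ones used in the quoted text.

```python
AUTO = "auto"  # stand-in for the special value "auto"


def computed_position(position, align):
    """Transcription of the four quoted steps for the computed text position."""
    # Step 1: an explicit numeric position always wins.
    if position != AUTO:
        return position
    # Steps 2-4: otherwise the position follows the text alignment keyword.
    if align in ("start", "left"):
        return 0
    if align in ("end", "right"):
        return 100
    if align == "middle":
        return 50
    raise ValueError(f"unknown alignment: {align!r}")


print(computed_position(10, "start"))    # explicit position is used as-is
print(computed_position(AUTO, "start"))  # falls through to the alignment
```

Note that text direction is not an input anywhere in these steps, so with an automatic position "start" and "left" give the same result.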

Further, the note below that text explains how this works for right-to-left text:

--
Even for horizontal cues with right-to-left paragraph direction text, the cue box is positioned from the left edge of the video frame. This allows defining a rendering space template which can be filled with either left-to-right or right-to-left paragraph direction text. If such a cue box template is created with start or end aligned text, it is best to also specify a size since otherwise the text may flip from one side of the video frame to the other.
--


It's a little tricky to follow, but it's correct.
Comment 11 Simon Pieters 2015-11-24 13:53:46 UTC
Maybe we could make the auto position use 50 for start/end text alignment.

Currently,

    00:00:00.000 --> 00:00:10.000 size:50% align:start
    English text
    Arabic text

would render as

+---------------------------------------+
|English text                           |
|         txet cibarA                   |
+---------------------------------------+
 ^ 0%               ^ 50%              ^ 100%

Arguably it is "wrong" for RTL languages. At least it's not clear to me why the box should appear on the left side.

We made the auto positioning "do the right thing" so it would suffice to specify align:left to get text flushed to the left, but doing the same thing for start-aligned text was maybe a bad call?
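
To make the two options concrete, here is a small Python sketch comparing the current automatic position with the suggested one for the `size:50% align:start` cue above. The `start_end_centered` flag is hypothetical and only switches between the two behaviors. The extent formula is an assumption modelled on background-position: the box is anchored at its left edge for position 0, at its centre for 50, and at its right edge for 100.

```python
def auto_position(align, start_end_centered=False):
    """Automatic cue position in percent for a given text alignment.

    With start_end_centered=False this follows the current rules
    (start/left -> 0, end/right -> 100, middle -> 50).
    With start_end_centered=True, start and end are treated like middle.
    """
    if align == "middle" or (start_end_centered and align in ("start", "end")):
        return 50
    if align in ("start", "left"):
        return 0
    if align in ("end", "right"):
        return 100
    raise ValueError(f"unknown alignment: {align!r}")


def box_extent(position, size):
    """(left, right) edges in percent, using a background-position-like anchor."""
    left = position * (100 - size) / 100
    return (left, left + size)


current = box_extent(auto_position("start"), 50)
suggested = box_extent(auto_position("start", start_end_centered=True), 50)
print("current:  ", current)
print("suggested:", suggested)
```

In both cases the box is the same for LTR and RTL text; only the alignment of the lines inside it differs.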
Comment 12 Philip Jägenstedt 2015-11-24 14:12:26 UTC
My mental model of this is that we have a cue box which in all typical cases should be 100% of the width, and then you can align the text within it. The only reason to use a smaller cue box is to avoid overlapping something on screen, but you still want to make it as large as possible to get as little line wrapping as possible.

In this model, it doesn't make sense to adjust the size or position of the box depending on text directionality. Mirroring makes especially little sense, because then you're pretty much guaranteed to overlap the thing that the box was moved in order to avoid.
Comment 13 Simon Pieters 2015-11-27 09:29:26 UTC
In addition, I think the names start and end are just wrong for position alignment. They should be left and right. They are the same as align:left and align:right. Text alignment also depends on vertical vs horizontal: in vertical text, left actually means top. I suppose it should really be inline-left, i.e. left in the inline direction.

Maybe there is a similar problem for line alignment. (Why are these called alignments, and not anchor points? Why do we have different solutions between cues and regions?)
Comment 14 Simon Pieters 2015-11-27 09:36:15 UTC
Regions allow placing the anchor arbitrarily, not just center or at the edges.

Since the original positioning scheme was like background-position, why don't we allow setting a percentage as the alignment? And maybe switch back to be like background-position if no "position alignment" is specified?
Comment 15 Philip Jägenstedt 2015-11-27 10:36:20 UTC
I support any harmonization between regions and cues. Another way to do that would be to get rid of arbitrary anchor points for regions and let them use only 9 anchor points as well.
Comment 16 Philip Jägenstedt 2015-11-27 10:39:18 UTC
(In reply to Simon Pieters from comment #13)
> In addition, I think the names start and end are just wrong for position
> alignment. They should be left and right.

If you mean PositionAlignSetting, then I agree.
Comment 17 Simon Pieters 2015-11-27 12:18:08 UTC
https://github.com/w3c/webvtt/pull/273
Comment 18 Silvia Pfeiffer 2015-11-29 12:56:52 UTC
I think there is a lot of misunderstanding here. Don't apply this patch yet. I will need to find time to explain all this, but your diagram in comment 11 is wrong, since the size of the cue is only 50% of the width of the video.
Comment 19 Silvia Pfeiffer 2015-11-29 13:04:05 UTC
(In reply to Silvia Pfeiffer from comment #18)
> I think there is a lot of misunderstanding here. Don't apply this patch yet.
> I will need to find time to explain all this, but your diagram in comment 11
> is wrong, since the size of the cue is only 50% of the width of the video.

Actually, I think maybe your diagram is right and your box does not represent the cue box, but the video size.

Think of caption authoring in this way:

1st you find a place on the video into which the text should go. This box depends heavily on the context of the video content and the speaker. So, if you decide to make the cue box only 50% width and place it, then you have decided that the text should go into that area.

2nd you write the text into that box. Whether that text is ltr or rtl is irrelevant.

Anyway ... I'll get back to you on the rest.
Comment 20 Martin Dürst 2015-11-29 22:53:34 UTC
(In reply to Silvia Pfeiffer from comment #19)

> 1st you find a place on the video into which the text should go. This box
> depends heavily on the context of the video content and the speaker. So, if
> you decide to make the cue box only 50% width and place it , then you have
> decided that the text should go into that area.
> 
> 2nd you write the text into that box. Whether that text is ltr or rtl is
> irrelevant.

Well, in general, that's okay. But the case we are worrying about is a case like the following: Due to what's on the video, you have a choice of putting the box in the lower left or the lower right (or the lower center). Because you think about left-to-right writing systems, you decide to put it lower left.

The 1st/2nd model will work for all left-to-right languages, but for right-to-left languages, the box will need to be replaced (which the translator tasked with creating the text may not be able to do) or the result will look weird.
Comment 21 Simon Pieters 2015-11-30 11:24:47 UTC
(In reply to Silvia Pfeiffer from comment #19)
> Actually, I think maybe your diagram is right and your box does not
> represent the cue box, but the video size.

Yes.

You have not explained why align:start should position the cue box on the left instead of being positioned the same as align:middle. I think it should be (again the "box" is the video):

+---------------------------------------+
|          English text                 |
|                   txet cibarA         |
+---------------------------------------+
 ^ 0%               ^ 50%              ^ 100%

This seems consistent with your thinking of first positioning and then writing any cue text, unless I'm missing something.
Comment 22 Richard Ishida 2015-12-14 18:34:33 UTC
i read through the thread again.  Here is my summary of the situation as i understand it (trying to bring the threads together in one coherent sum-up). Of course, please point out where i have misunderstood things.


[1] I think we have established that the base direction of the text (ie. the inline direction of the text contained inside a cue box) should have no effect on the placement of the frame of the cue box.  In respect of base direction, the cue box is simply a receptacle into which text flows.  The cue text may be rtl or ltr, or a mixture of the two, but the cue box stays where it is.  This makes sense, since most of the time a positioned cue box is positioned to match the geometry of what is going on behind it (eg. to avoid overlapping something, or to position relative to a speaker, etc.)

[2] The alignment of the lines of text within the cue box also has no effect on the position of the frame of the cue box (other than that you need to take into account the width of the box if you have some rtl and some ltr lines).

[3] On the other hand, the *writing-mode* of the text will indeed affect the placement of the box.  (NB: writing-mode has *nothing* to do with base direction, it only refers to the direction in which the lines progress, one after the other.) The positioning of the cue box varies between vertical and horizontal coordinates, depending on the writing-mode direction. 

As i understand it, the spec currently uses `start` to abstract away from whether a position relates to left (for horizontal writing-mode) or top (for vertical writing-mode), and end to mean either right or bottom.  I'm not sure the terms start and end are apposite here, but that's a terminology issue – start in this case does not always mean 'left', it may also mean 'top'.

[4] In most cases, as Philip says, if you position the text on the screen it is to avoid overlapping something, or to appear relative to something (eg. the speaker on the video behind). What appears in the background is unlikely to change based on the language of the webvtt script. 

[5] On the other hand, Addison i think is saying that, in cases where you are not positioning the text relative to what's happening behind it, but you *are* positioning it, you may want the position to be influenced by reading direction.  I think Philip would argue that in such cases there should be no positioning, and the text should be centred in the full width of the screen (the default), so as to reduce line breaking.

If Addison's use case holds, then i believe we may need to use words along the lines of start and end for horizontal text that are related to the script direction, but i think it relates to the language of the webvtt annotations *as a whole* (ie. for an arabic transcription do the mirroring, but do it throughout - the position is still not affected by the base direction of the text inside the box). However, i can't see how that use case would in practice be relevant for cue boxes for which the writing-mode is set to vertical – i think the necessary keywords would be relevant for horizontal writing-mode only.
Comment 23 Simon Pieters 2015-12-15 10:35:29 UTC
(In reply to Richard Ishida from comment #22)
> [2] The alignment of the lines of text within the cue box also has no effect
> on the position of the frame of the cue box (other than that you need to
> take into account the width of the box if you have some rtl and some ltr
> lines).

No, currently align:left and align:start will position the box on the left (unless it is explicitly positioned somewhere). My proposal https://github.com/w3c/webvtt/pull/273 is to let align:start/end be positioned the same as align:center.

> [3] On the other hand, the *writing-mode* of the text will indeed affect the
> placement of the box.  (NB: writing-mode has *nothing* to do with base
> direction, it only refers to the direction in which the lines progress, one
> after the other.) The positioning of the cue box varies between vertical and
> horizontal coordinates, depending on the writing-mode direction. 
> 
> As i understand it, the spec currently uses `start` to abstract away from
> whether a position relates to left (for horizontal writing-mode) or top (for
> vertical writing-mode), and end to mean either right or bottom.  I'm not
> sure the terms start and end are apposite here, but that's a terminology
> issue – start in this case does not always mean 'left', it may also mean
> 'top'.

Yeah, but this is pretty inconsistent with CSS, as is pointed out in comment 9.

> [5] On the other hand, Addison i think is saying that, in cases where you
> are not positioning the text relative to what's happening behind it, but you
> *are* positioning it, you may want the position to be influenced by reading
> direction.  I think Philip would argue that in such cases there should be no
> positioning, and the text should be centred in the full width of the screen
> (the default), so as to reduce line breaking.
> 
> If Addison's use case holds, then i believe we may need to use words along
> the lines of start and end for horizontal text that are related to the
> script direction, but i think it relates to the language of the webvtt
> annotations *as a whole* (ie. for an arabic transcription do the mirroring,
> but do it throughout - the position is still not affected by the base
> direction of the text inside the box). However, i can't see how that use
> case would in practice be relevant for cue boxes for which the writing-mode
> is set to vertical – i think the necessary keywords would be relevant for
> horizontal writing-mode only.

I think it's a more reasonable expectation that "start" position alignment would react to LTR/RTL, as it does in CSS, and as the 'align' setting does, instead of meaning line-left. The PR renames these to left and right, since they map to align:left and align:right sides.
Comment 24 Silvia Pfeiffer 2016-02-11 16:15:23 UTC
(In reply to Simon Pieters from comment #13)
> Maybe there is a similar problem for line alignment. (Why are these called
> alignments, and not anchor points? Why do we have different solutions
> between cues and regions?)

Think of regions as providing a boundary for groups of cues, particularly since the cues in a region are allowed to move (scroll). I agree that the differences should be minimized. But I don't think you can fully unify the two concepts: one is an explicitly placed and unmovable cue, while the other provides a boundary for cues that can push each other away.
Comment 25 Silvia Pfeiffer 2016-02-11 16:18:31 UTC
(In reply to Martin Dürst from comment #20)
> (In reply to Silvia Pfeiffer from comment #19)
> 
> > 1st you find a place on the video into which the text should go. This box
> > depends heavily on the context of the video content and the speaker. So, if
> > you decide to make the cue box only 50% width and place it , then you have
> > decided that the text should go into that area.
> > 
> > 2nd you write the text into that box. Whether that text is ltr or rtl is
> > irrelevant.
> 
> Well, in general, that's okay. But the case we are worrying about is a case
> like the following: Due to what's on the video, you have a choice of putting
> the box in the lower left or the lower right (or the lower center). Because
> you think about left-to-right writing systems, you decide to put it lower
> left.

You also can decide not to restrict the width of the box at all. You would only restrict it if there is something you're trying to avoid. That avoidance is independent of the writing direction.

> The 1st/2nd model will work for all left-to-right languages, but for
> right-to-left languages, the box will need to be replaced (which the
> translator tasked with creating the text may not be able to do) or the
> result will look weird.

Why would it need to be replaced? If it avoided something on the right, then rendering the text as in the image is completely correct.
Comment 26 Silvia Pfeiffer 2016-02-11 16:24:07 UTC
(In reply to Simon Pieters from comment #23)
> (In reply to Richard Ishida from comment #22)
> > [2] The alignment of the lines of text within the cue box also has no effect
> > on the position of the frame of the cue box (other than that you need to
> > take into account the width of the box if you have some rtl and some ltr
> > lines).
> 
> No, currently align:left and align:start will position the box on the left
> (unless it is explicitly positioned somewhere).

The reason for this is that it feels much more natural when you author a cue text and you say "align:start" and you've made your cue box smaller (e.g. 50% size) that this box sits at the beginning of the screen and not in the middle. Everything else is confusing.


> My proposal
> https://github.com/w3c/webvtt/pull/273 is to let align:start/end be
> positioned the same as align:center.

That has been found to be counter-intuitive by many captioners.


> > [3] On the other hand, the *writing-mode* of the text will indeed affect the
> > placement of the box.  (NB: writing-mode has *nothing* to do with base
> > direction, it only refers to the direction in which the lines progress, one
> > after the other.) The positioning of the cue box varies between vertical and
> > horizontal coordinates, depending on the writing-mode direction. 
> > 
> > As i understand it, the spec currently uses `start` to abstract away from
> > whether a position relates to left (for horizontal writing-mode) or top (for
> > vertical writing-mode), and end to mean either right or bottom.  I'm not
> > sure the terms start and end are apposite here, but that's a terminology
> > issue – start in this case does not always mean 'left', it may also mean
> > 'top'.
> 
> Yeah, but this is pretty inconsistent with CSS, as is pointed out in comment
> 9.

I could live with replacing 'start' and 'end' with 'left', 'top', 'right' and 'bottom'.
Comment 27 Silvia Pfeiffer 2016-02-11 16:30:19 UTC
(In reply to Simon Pieters from comment #21)
> (In reply to Silvia Pfeiffer from comment #19)
> > Actually, I think maybe your diagram is right and your box does not
> > represent the cue box, but the video size.
> 
> Yes.
> 
> You have not explained why align:start should position the cue box on the
> left instead of being positioned the same as align:middle. I think it should
> be (again the "box" is the video):
> 
> +---------------------------------------+
> |          English text                 |
> |                   txet cibarA         |
> +---------------------------------------+
>  ^ 0%               ^ 50%              ^ 100%
> 
> This seems consistent with your thinking of first positioning and then
> writing any cue text, unless I'm missing something.

Sure, but think about what a caption author expects when they decide to use a 50% wide box and don't provide explicit positioning.

A 50% wide box with left aligned text is most naturally aligned to the left edge of the viewport. As you can see in your example above, it doesn't look like it's left aligned at all - it looks like it's misplaced somewhere random.
Comment 28 Simon Pieters 2016-02-12 14:30:37 UTC
(In reply to Silvia Pfeiffer from comment #26)
> The reason for this is that it feels much more natural when you author a cue
> text and you say "align:start" and you've made your cue box smaller (e.g.
> 50% size) that this box sits at the beginning of the screen and not in the
> middle. Everything else is confusing.

So for RTL text, with your proposal, the text begins in the middle of the screen and extends leftwards. Are you saying that that is not confusing?

 +---------------------------------------+
 |                                       |
 |         txet cibarA                   |
 +---------------------------------------+
  ^ 0%               ^ 50%              ^ 100%


> > My proposal
> > https://github.com/w3c/webvtt/pull/273 is to let align:start/end be
> > positioned the same as align:center.
> 
> That has been found to be counter-intuitive by many captioners.

Do you have a citation?


> I could live with replacing 'start' and 'end' with 'left', 'top', 'right'
> and 'bottom'.

(In the PR we agreed with 'line-left' and 'line-right'.)
Comment 29 Simon Pieters 2016-02-12 14:36:14 UTC
(In reply to Silvia Pfeiffer from comment #27)
> Sure, but think about what a caption author expects when they decide to use
> a 50% wide box and don't provide explicit positioning.
> 
> A 50% wide box with left aligned text is most naturally aligned to the left
> edge of the viewport. As you can see in your example above, it doesn't look
> like it's left aligned at all - it looks like it's misplaced somewhere
> random.

The example doesn't use left aligned text. It uses *start* aligned text, which for RTL text is *right* aligned. We're not going to make progress here if you keep talking about align:start as if it were the same as align:left.
Comment 30 Silvia Pfeiffer 2016-02-14 00:28:40 UTC
(In reply to Simon Pieters from comment #28)
> (In reply to Silvia Pfeiffer from comment #26)
> > The reason for this is that it feels much more natural when you author a cue
> > text and you say "align:start" and you've made your cue box smaller (e.g.
> > 50% size) that this box sits at the beginning of the screen and not in the
> > middle. Everything else is confusing.
> 
> So for RTL text, with your proposal, the text begins in the middle of the
> screen and extends leftwards. Are you saying that that is not confusing?
> 
>  +---------------------------------------+
>  |                                       |
>  |         txet cibarA                   |
>  +---------------------------------------+
>   ^ 0%               ^ 50%              ^ 100%
> 

When the text is "align:start" and the box is 50% size, and you have RTL text, the text box sits in the right half rather than the left half, like this:

+---------------------------------------+
|                                       |
|                           txet cibarA |
+---------------------------------------+
 ^ 0%               ^ 50%              ^ 100%

That's the intention of automating the box placement based on the alignment and directionality.

If the algo doesn't describe it this way, then there's a bug for RTL.
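
A sketch of that intended behavior in Python, for comparison with the algorithm text. This illustrates the intention described in this comment and is not what the quoted spec steps say; the function name `intended_box` and its parameters are invented for the example, and it covers horizontal cues only.

```python
def intended_box(align, direction, size):
    """Return (left, right) in percent for an auto-positioned cue box whose
    placement follows both the text alignment and the text direction."""
    if direction not in ("ltr", "rtl"):
        raise ValueError(f"unknown direction: {direction!r}")
    if align == "middle":
        left = (100 - size) / 2
    else:
        if align in ("left", "right"):
            side = align
        elif align == "start":
            side = "left" if direction == "ltr" else "right"
        elif align == "end":
            side = "right" if direction == "ltr" else "left"
        else:
            raise ValueError(f"unknown alignment: {align!r}")
        # Flush the box against the resolved side of the video.
        left = 0 if side == "left" else 100 - size
    return (left, left + size)


print("ltr:", intended_box("start", "ltr", 50))
print("rtl:", intended_box("start", "rtl", 50))
```

Under this reading a 50% box with "align:start" lands in the left half for LTR text and in the right half for RTL text.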


> > > My proposal
> > > https://github.com/w3c/webvtt/pull/273 is to let align:start/end be
> > > positioned the same as align:center.
> > 
> > That has been found to be counter-intuitive by many captioners.
> 
> Do you have a citation?

Was mostly oral feedback. Specifically, when people used less than 100% width and specified alignment to be left. Then it seemed that the text would appear at random positions somewhat off the center and that was extremely confusing.


> > I could live with replacing 'start' and 'end' with 'left', 'top', 'right'
> > and 'bottom'.
> 
> (In the PR we agreed with 'line-left' and 'line-right'.)

Yes, I think that works even better!
Comment 31 Silvia Pfeiffer 2016-02-14 00:29:48 UTC
(In reply to Silvia Pfeiffer from comment #30)
> (In reply to Simon Pieters from comment #28)
> > (In reply to Silvia Pfeiffer from comment #26)
> > > The reason for this is that it feels much more natural when you author a cue
> > > text and you say "align:start" and you've made your cue box smaller (e.g.
> > > 50% size) that this box sits at the beginning of the screen and not in the
> > > middle. Everything else is confusing.
> > 
> > So for RTL text, with your proposal, the text begins in the middle of the
> > screen and extends leftwards. Are you saying that that is not confusing?
> > 
> >  +---------------------------------------+
> >  |                                       |
> >  |         txet cibarA                   |
> >  +---------------------------------------+
> >   ^ 0%               ^ 50%              ^ 100%
> > 
> 
> When the text is "align:start" and the box is 50% size, and you have RTL
> text, the text box sits in the right half, not the left half like this:
> 
> +---------------------------------------+
> |                                       |
> |                           txet cibarA |
> +---------------------------------------+
>  ^ 0%               ^ 50%              ^ 100%
> 
> That's the intention of automating the box placement based on the alignment
> and directionality.
> 
> If the algo doesn't describe it this way, then there's a bug for RTL.

Incidentally, that's why the keywords were "start" and "end" in the past and not "left" and "right".
Comment 32 Simon Pieters 2016-02-15 09:34:49 UTC
(In reply to Silvia Pfeiffer from comment #30)
> When the text is "align:start" and the box is 50% size, and you have RTL
> text, the text box sits in the right half, not the left half like this:
> 
> +---------------------------------------+
> |                                       |
> |                           txet cibarA |
> +---------------------------------------+
>  ^ 0%               ^ 50%              ^ 100%
> 
> That's the intention of automating the box placement based on the alignment
> and directionality.
> 
> If the algo doesn't describe it this way, then there's a bug for RTL.

OK, what you say now contradicts what you said earlier (and what the spec currently says). Do you now want the base direction of the cue text to affect the position of the box? (The base direction of the first line? Note that each line can have different base direction.)


(In reply to Silvia Pfeiffer from comment #10)
> No it doesn't. You've skipped over the most important part of that section.
> Here is the full quote:
> 
> --
> A text track cue has a text track cue computed text position whose value is
> that returned by the following algorithm, which is defined in terms of the
> other aspects of the cue:
> 
> 1. If the text track cue text position is numeric, then return the value of
> the text track cue text position and abort these steps. (Otherwise, the text
> track cue text position is the special value auto.)
> 
> 2. If the text track cue text alignment is start or left, return 0 and abort
> these steps.
> 
> 3. If the text track cue text alignment is end or right, return 100 and
> abort these steps.
> 
> 4. If the text track cue text alignment is middle, return 50 and abort these
> steps.
> --
> 
> 
> What it says is that: if the cue box position is not set explicitly (that's
> 1.), then the cue box's position depends on the alignment of the text inside
> the box.
> 
> Further, the note below that text explains how this works for right-to-left
> text:
> 
> --
> Even for horizontal cues with right-to-left paragraph direction text, the
> cue box is positioned from the left edge of the video frame. This allows
> defining a rendering space template which can be filled with either
> left-to-right or right-to-left paragraph direction text. If such a cue box
> template is created with start or end aligned text, it is best to also
> specify a size since otherwise the text may flip from one side of the video
> frame to the other.
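
For reference, the steps quoted above can be sketched in Python roughly as follows. This is only an illustration of the quoted text, not spec or implementation code; the function name `computed_text_position` and the representation of the settings as plain values are assumptions made for the sketch.

```python
from typing import Union


def computed_text_position(position: Union[int, float, str], alignment: str) -> float:
    """Return the computed text position in percent, following the quoted steps."""
    # Step 1: a numeric position is returned as-is; otherwise it is "auto".
    if position != "auto":
        return float(position)
    # Step 2: start or left alignment puts the position at 0%.
    if alignment in ("start", "left"):
        return 0.0
    # Step 3: end or right alignment puts the position at 100%.
    if alignment in ("end", "right"):
        return 100.0
    # Step 4: middle alignment puts the position at 50%.
    if alignment == "middle":
        return 50.0
    raise ValueError("unknown alignment: %r" % alignment)
```

Note that the base direction of the cue text is not an input here, so under the quoted steps "start" and "left" give the same position for LTR and RTL text alike.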
Comment 33 Silvia Pfeiffer 2016-02-15 10:49:02 UTC
(In reply to Simon Pieters from comment #32)
> (In reply to Silvia Pfeiffer from comment #30)
> > When the text is "align:start" and the box is 50% size, and you have RTL
> > text, the text box sits in the right half, not the left half like this:
> > 
> > +---------------------------------------+
> > |                                       |
> > |                           txet cibarA |
> > +---------------------------------------+
> >  ^ 0%               ^ 50%              ^ 100%
> > 
> > That's the intention of automating the box placement based on the alignment
> > and directionality.
> > 
> > If the algo doesn't describe it this way, then there's a bug for RTL.
> 
> OK, what you say now contradicts what you said earlier (and what the spec
> currently says).

Yeah, it's confusing. I think it also changed at some point in the past and I remembered a former spec.

Originally, there was a difference between 'align:left' and 'align:start' that also influenced the positioning of the box. That is what I remembered.

But now that I think back, I think we changed it.

I think it was a use case from YouTube where they had a 50% wide box with left aligned LTR text and were surprised when the box flipped over to the other side after a translation. That was unexpected because the box had been reduced to avoid stuff on screen. Basically the argument was that a translation should not move the text box.


> Do you now want the base direction of the cue text to
> affect the position of the box?

It doesn't matter what I want - it's what makes sense to captioners that matters.

> (The base direction of the first line? Note
> that each line can have different base direction.)

For WebVTT, I would think it's the first line that determines the position of the box.

BTW: this is all a bit tricky and it took me a while to put it all in a logical system, because this box related positioning is so different from the CSS flow based positioning. Anyway - I'm sorry about the confusion.
Comment 34 Simon Pieters 2016-02-15 14:09:28 UTC
(In reply to Silvia Pfeiffer from comment #33)
> (In reply to Simon Pieters from comment #32)
> Yeah, it's confusing. I think it also changed at some point in the past and
> I remembered a former spec.
> 
> Originally, there was a difference between 'align:left' and 'align:start'
> that also influenced the positioning of the box. That is what I remembered.

OK.

> But now that I think back, I think we changed it.

Yep.

> I think it was a use case from YouTube where they had a 50% wide box with
> left aligned LTR text and were surprised when the box flipped over to the
> other side after a translation. That was unexpected because the box had
> been reduced to avoid stuff on screen. Basically the argument was that a
> translation should not move the text box.

Right. So then we shouldn't move the box to the right for RTL, even though doing so would "make sense" in isolation.


> It doesn't matter what I want - it's what makes sense to captioners that
> matters.

Sure, it's just a level of indirection. ;-)


> For WebVTT, I would think it's the first line that determines the position
> of the box.

(It was until it was changed based on feedback from YouTube to not do that...)

> BTW: this is all a bit tricky and it took me a while to put it all in a
> logical system, because this box related positioning is so different from
> the CSS flow based positioning. Anyway - I'm sorry about the confusion.

Yeah, no worries. It seems we're on the same page now, so that's good. Also sorry if I came across as harsh or whatever. :-)

The remaining question is what is the least bad option: the current spec (align:start moves the box to the left) or my proposal (align:start leaves the box centered)?

The main reason I think my proposal is less bad is that moving the box to the left is "wrong" for RTL text, but leaving it centered is not. Having semantics where a keyword "start" doesn't reflect base directionality of the text confuses the terminology. Bidi is confusing as it is, we shouldn't make it worse.
Comment 35 Silvia Pfeiffer 2016-02-15 21:01:53 UTC
(In reply to Simon Pieters from comment #34)
> 
> > But now that I think back, I think we changed it.
> 
> Yep.
> 
> > I think it was a use case from YouTube where they had a 50% wide box with
> > left aligned LTR text and were surprised when the box flipped over to the
> > other side after a translation. That was unexpected because the box had
> > been reduced to avoid stuff on screen. Basically the argument was that a
> > translation should not move the text box.
> 
> Right. So then we shouldn't move the box to the right for RTL, even though
> doing so would "make sense" in isolation.

Yeah, I think it has to stay where it would when it's LTR text. Basically, we're saying that LTR determines the box position. If you have RTL, you have to put explicit positioning on it with the offset.

> > For WebVTT, I would think it's the first line that determines the position
> > of the box.
> 
> (It was until it was changed based on feedback from YouTube to not do
> that...)

Yeah, it does seem unfair between LTR and RTL text. But maybe that's ok? I'd be curious about thoughts from actual RTL captioners!

> The remaining question is what is the least bad option: the current spec
> (align:start moves the box to the left) or my proposal (align:start leaves
> the box centered)?

I'd suggest moving it to the left. For a LTR box this is the natural thing to do and what captioners expect. This also covers the case where you're captioning in a LTR language and then add translations automatically. If the captioner instead starts captioning in RTL, the box is so obviously in the wrong position that they will probably explicitly position it anyway.


> The main reason I think my proposal is less bad is that moving the box to
> the left is "wrong" for RTL text, but leaving it centered is not.

Leaving it centered is wrong, too. For RTL text that is in a 50% box and start aligned, it should be positioned at the right edge if it were not for the translation use case.


> Having
> semantics where a keyword "start" doesn't reflect base directionality of the
> text confuses the terminology.

'align:start' still makes the text inside the box align correctly. It's just the box position that is confusing.

> Bidi is confusing as it is, we shouldn't make
> it worse.

Yeah, I worry about that, too.

Alternatively, we could ignore the translation use case and say that if you have intended to explicitly position the box on the left, you have to put a 'position:0%' on it, too, otherwise the box might move if the text directionality changes. A validator could even raise a warning in this case.

WDYT?
Comment 36 Simon Pieters 2016-02-16 10:13:54 UTC
(In reply to Silvia Pfeiffer from comment #35)
> Yeah, I think it has to stay where it would when it's LTR text. Basically,
> we're saying that LTR determines the box position. If you have RTL, you have
> to put explicit positioning on it with the offset.

I don't like baking in LTR semantics for this one thing when the rest of WebVTT is agnostic to LTR vs RTL. I think we should either be *clearly* LTR-by-default and require something explicit to opt into RTL semantics, or have *everything* be agnostic about the direction. Going half-way seems like it would be difficult to reason about what's going on.


> Yeah, it does seem unfair between LTR and RTL text. But maybe that's ok? I'd
> be curious about thoughts from actual RTL captioners!

WebVTT has clearly tried from the start to be agnostic about directionality. Personally I'd like to maintain that, it seems like a missed opportunity for a new format not to.

> I'd suggest moving it to the left. For a LTR box this is the natural thing
> to do and what captioners expect. This also covers the case where you're
> captioning in a LTR language and then add translations automatically. If the
> captioner instead starts captioning in RTL, the box is so obviously in the
> wrong position that they will probably explicitly position it anyway.

The last sentence is this bug exactly. :-)


> Leaving it centered is also wrong, too. For RTL text that is in a 50% box
> and start aligned, it should be positioned at the right edge if there was
> not the translation use case.

I'm not really convinced it's wrong. A cue can contain *both* LTR and RTL text, on different lines. Which side is the correct one in such a case?

Also consider if we were to add align:justified. Then text is aligned on both sides, regardless of directionality. Where should the box be?

Since center is the default, and there's no clear right answer, I still think center is the best choice. It might not be what the captioner wants, but it's not like it can't be set to something else. This is a heuristic to give a useful default, but we don't need to apply the heuristic for everything; we can avoid applying it for ambiguous cases (like align:start).


> 'align:start' still makes the text inside the box align correctly. It's just
> the box position that is confusing.

Right.

> > Bidi is confusing as it is, we shouldn't make
> > it worse.
> 
> Yeah, I worry about that, too.
> 
> Alternatively, we could ignore the translation use case and say that if you
> have intended to explicitly position the box on the left, you have to put a
> 'position:0%' on it, too, otherwise the box might move if the text
> directionality changes. A validator could even raise a warning in this case.
> 
> WDYT?

I think if we want captioners to explicitly position their boxes, it is better to let the default position be center for all alignment values. Having heuristics and then advising against using them makes little sense to me. Translations seem like a very common use case, and sometimes they will be automated; if we go this route I would expect the net result will be that users see translated subtitles where the cues' positions are on the opposite side.

I could live with dropping the heuristic completely and always having center position by default, if it's confusing that align:left and align:start have different positions.
Comment 37 Silvia Pfeiffer 2016-02-16 10:53:57 UTC
(In reply to Simon Pieters from comment #36)
> 
> I don't like baking in LTR semantics for this one thing when the rest of
> WebVTT is agnostic to LTR vs RTL.

Yeah, I tend to agree. Maybe here consistency should go ahead of the translation use case. If a box is not expected to move, it has to be explicitly positioned.


> > Leaving it centered is also wrong, too. For RTL text that is in a 50% box
> > and start aligned, it should be positioned at the right edge if there was
> > not the translation use case.
> 
> I'm not really convinced it's wrong. A cue can contain *both* LTR and RTL
> text, on different lines. Which side is the correct one in such a case?

That's a third use case: LTR, RTL, and then mixed. If it was just RTL, right edge alignment would be correct.
Mixed text with 'align:start' is Example 15 and we haven't really made an example for when the size is restricted also. I guess mixed text should be center aligned.


> Also consider if we were to add align:justified. Then text is aligned on
> both sides, regardless of directionality. Where should the box be?

That should certainly be a centered box.


> Since center is the default, and there's no clear right answer, I still
> think center is the best choice.

Only for mixed directionality. I think for purely RTL, center is as wrong as for purely LTR.


> > Alternatively, we could ignore the translation use case and say that if you
> > have intended to explicitly position the box on the left, you have to put a
> > 'position:0%' on it, too, otherwise the box might move if the text
> > directionality changes. A validator could even raise a warning in this case.
> 
> I think if we want captioners to explicitly position their boxes, it is
> better to let the default position be center for all alignment values.
> Having heuristics and then advice against using it makes little sense to me.

Not advising against them. Only advising that in one particular case they have to be careful that it has the expected consequence.


> Translations seems like a very common use case, and sometimes it will be
> automated; if we go this route I would expect the net result will be that
> users see translated subtitles where the cues' positions are on the opposite
> side.

We are optimising for a small part of the translation problem though: one where we have 'align:start' and LTR text initially and then RTL text after translation.

As a consequence of optimising for this, we are making the default rendering of the common "align:left/right size:50%" case very confusing and forcing extra markup onto it:
* "align:left size:50% position:0%" or
* "align:right size:50% position:100%".

This is in contrast to merely adding position to the 'align:start/end' case like this:
* "align:start size:50% position:0%"
* "align:end size:50% position:100%"

If we keep the heuristic, we could even go as far as requiring 'position' to be added when 'align:start/end' is used - at least then there's a conformance error and we require explicit expression of intent by the captioner.
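
A minimal sketch of what such a conformance check could look like, in Python. It assumes the cue settings arrive as the space-separated `name:value` string that follows the cue timings; `parse_cue_settings` and `check_position_present` are hypothetical names used only for this illustration and are not taken from any existing validator.

```python
from typing import Dict, List


def parse_cue_settings(settings: str) -> Dict[str, str]:
    """Split a settings string such as 'align:start size:50%' into a dict."""
    parsed: Dict[str, str] = {}
    for item in settings.split():
        name, sep, value = item.partition(":")
        # Ignore anything that is not of the form name:value.
        if sep and name and value:
            parsed[name] = value
    return parsed


def check_position_present(settings: str) -> List[str]:
    """Report align:start / align:end used without an explicit position."""
    parsed = parse_cue_settings(settings)
    align = parsed.get("align")
    if align in ("start", "end") and "position" not in parsed:
        return ["align:%s without an explicit position setting" % align]
    return []
```

With this sketch, "align:start size:50%" would be flagged, while "align:start size:50% position:0%" and "align:left size:50%" would pass.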
Comment 38 Simon Pieters 2016-02-16 13:29:39 UTC
(In reply to Silvia Pfeiffer from comment #37)
> Yeah, I tend to agree. Maybe here consistency should go ahead of the
> translation use case. If a box is not expected to move, it has to be
> explicitly positioned.

> That's a third use case: LTR, RTL, and then mixed. If it was just RTL, right
> edge alignment would be correct.
> Mixed text with 'align:start' is Example 15 and we haven't really made an
> example for when the size is restricted also. I guess mixed text should be
> center aligned.

OK... I'm not sure it's great to have the position change depending on the text in the cue. It would make the cues jump around if streaming pop-on is supported (which can be done today with the DOM API).

Since mixed directionality is presumably very rare, it also seems bad to add complexity to handle it (recording the base direction of all lines and having that influence the positioning).

 
> > Also consider if we were to add align:justified. Then text is aligned on
> > both sides, regardless of directionality. Where should the box be?
> 
> That should certainly be a centered box.

OK.

> Only for mixed directionality. I think for purely RTL, center is as wrong as
> for purely LTR.

Agreed.


> Not advising against them. Only advising that in one particular case they
> have to be careful that it has the expected consequence.
> 
> 
> > Translations seems like a very common use case, and sometimes it will be
> > automated; if we go this route I would expect the net result will be that
> > users see translated subtitles where the cues' positions are on the opposite
> > side.
> 
> We are optimising for a small part of the translation problem though: one
> where we have 'align:start' and LTR text initially and then RTL text after
> translation.

Right.

> As a consequence of optimising for this, we are making the default rendering
> of the common "align:left/right size:50%" case very confusing and forcing
> extra markup onto it:
> * "align:left size:50% position:0%" or
> * "align:right size:50% position:100%".

Right, I agree it would be good to keep the heuristic for align:left and align:right.

> This is in contrast to merely adding position to the 'align:start/end' case
> like this:
> * "align:start size:50% position:0%"
> * "align:end size:50% position:100%"
> 
> If we keep the heuristic, we could even go as far as requiring 'position' to
> be added when 'align:start/end' is used - at least then there's a
> conformance error and we require explicit expression of intent by the
> captioner.

How do you feel about align:start and align:end having center position by default, with an authoring conformance requirement that the position setting must be present when align:start/align:end is used? (And keep the heuristic for align:left/right.)
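
To make the proposal concrete, here is a rough Python sketch of the default position it implies when no position setting is given. The function name `proposed_default_position` is made up for this illustration, and the sketch reflects only the behaviour described in this comment, not the text of the pull request.

```python
def proposed_default_position(alignment: str) -> float:
    """Default position in percent when the cue has no position setting."""
    # The heuristic is kept for the physical alignments.
    if alignment == "left":
        return 0.0
    if alignment == "right":
        return 100.0
    # start, end and middle all leave the box centered by default, so the
    # box does not depend on the base direction of the text.
    return 50.0
```

Under this sketch a translation from LTR to RTL cannot move an "align:start" box, because the default is centered in both cases; an author who wants the box at an edge has to say so with the position setting.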
Comment 39 Silvia Pfeiffer 2016-02-16 14:43:45 UTC
(In reply to Simon Pieters from comment #38)
> 
> How do you feel about align:start and align:end having center position by
> default and have an authoring conformance requirement that the position
> setting must be present when align:start/align:end? (And keep the heuristic
> for align:left/right.)

Hmm, I guess that works.
Comment 40 Simon Pieters 2016-02-17 09:23:13 UTC
(In reply to Silvia Pfeiffer from comment #39)
> Hmm, I guess that works.

Yay! I've rebased https://github.com/w3c/webvtt/pull/273 and added new commits to address the feedback; PTAL.
Comment 41 David Singer 2016-10-11 18:07:58 UTC
We clarified start and end to be line-left and line-right; this was a breaking change and not yet reflected fully in implementations.
Comment 42 Silvia Pfeiffer 2017-08-09 12:01:15 UTC
Richard, Addison,
I know this has turned into a very long bug, but would you mind checking the resolution applied in https://github.com/w3c/webvtt/pull/273 and whether it is satisfactory to you for moving WebVTT to CR?