This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 24129 - [WebVTT] Support for bidi
Summary: [WebVTT] Support for bidi
Status: RESOLVED DUPLICATE of bug 28266
Alias: None
Product: TextTracks CG
Classification: Unclassified
Component: WebVTT
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Silvia Pfeiffer
QA Contact: This bug has no owner yet - up for the taking
URL:
Whiteboard: v1
Keywords:
Depends on:
Blocks:
 
Reported: 2013-12-18 14:15 UTC by Richard Ishida
Modified: 2015-09-30 23:04 UTC
CC List: 7 users

Description Richard Ishida 2013-12-18 14:15:03 UTC
If I understand correctly, the spec at http://dev.w3.org/html5/webvtt/ dated 14 November 2013 relies on the Unicode Bidirectional Algorithm's first strong character heuristic to determine the ltr/rtl base direction for a paragraph, and has no way of modifying that base direction where needed.

This method of detecting direction can sometimes yield undesirable results, and it would be good to have a way of overriding it, where necessary.
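To illustrate the concern, here is a minimal Python sketch of a first-strong-character heuristic. It is a simplification written for this discussion, not the WebVTT or UBA reference algorithm: the function name `first_strong_direction` is hypothetical, and unlike the full Unicode rules it does not skip text inside isolates or embeddings.

```python
import unicodedata

# Bidi classes that the Unicode Bidirectional Algorithm treats as "strong".
STRONG_LTR = {"L"}
STRONG_RTL = {"R", "AL"}


def first_strong_direction(text, default="ltr"):
    """Return 'ltr' or 'rtl' based on the first strong character in text.

    Simplified: characters inside isolates or embeddings are not skipped.
    Falls back to `default` when the text has no strong character.
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in STRONG_LTR:
            return "ltr"
        if bidi_class in STRONG_RTL:
            return "rtl"
    return default


# A Hebrew cue that happens to begin with a Latin-script name.
cue = "W3C \u05d4\u05d5\u05d0 \u05d0\u05e8\u05d2\u05d5\u05df"
print(first_strong_direction(cue))  # prints "ltr", although the sentence is Hebrew
```

Because the cue starts with "W3C", the heuristic picks a left-to-right base direction for what is otherwise a right-to-left sentence, which is the kind of result an override mechanism would correct.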

In addition, it may be necessary to provide directional disambiguation inline to obtain expected results in bidi situations. To see examples of why this is necessary, and how it is done in HTML, see http://www.w3.org/International/articles/inline-bidi-markup/update

Could we include directional controls in VTT to make it easier to use for those who write in scripts such as Arabic, Hebrew, and Thaana?

Unicode control characters could be used, but they sometimes cause problems because they are invisible, and the recommended new isolating control characters are not yet supported by browsers, so something like an optional tag may be preferable. (A tag has the added advantage that it can help avoid problems during the transition period, and in older browsers, while support for the newly introduced isolating control characters is not yet enabled. In the major browsers, styles can already be associated with tags to produce isolation rather than embedding, and other browsers can support embedding as a fallback.)
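For reference, a small Python sketch of what the control-character approach looks like in practice. The code points are those defined by the Unicode Standard; the helper names `isolate` and `embed` are hypothetical and exist only for this illustration.

```python
# Unicode directional formatting characters.
LRE, RLE, PDF = "\u202a", "\u202b", "\u202c"  # embeddings and their terminator
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"  # isolates (Unicode 6.3)


def isolate(text, direction=None):
    """Wrap text in an isolate; direction None means first-strong isolate (FSI)."""
    opener = {"ltr": LRI, "rtl": RLI, None: FSI}[direction]
    return opener + text + PDI


def embed(text, direction):
    """Wrap text in an embedding, the older mechanism usable as a fallback."""
    opener = {"ltr": LRE, "rtl": RLE}[direction]
    return opener + text + PDF


wrapped = isolate("\u05e9\u05dc\u05d5\u05dd", "rtl")
# The controls are invisible when rendered, so list the code points instead.
print([f"U+{ord(c):04X}" for c in wrapped])
```

Printing the code points rather than the string itself shows the authoring problem: in most editors the wrapped and unwrapped text look identical.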
Comment 1 Silvia Pfeiffer 2013-12-19 07:09:53 UTC
Richard, thanks for doing an i18n review of the WebVTT spec.

Let me first state the obvious: we don't have to worry about old browsers with WebVTT, because only new browsers support it. Also, we don't actually want to repeat the full set of HTML bidi markup in WebVTT, but only do as little as absolutely necessary given that we always have a UTF-8 document.

Given this background, what directional controls are you missing in VTT?

I was under the impression that Unicode covered all features that bidi text requires, seeing as it is used for much more than just HTML. If people are capable of creating fully functional bidi documents using only Unicode, I don't think we need to replicate any Unicode functionality with duplicated markup.

As for what happens when WebVTT is mapped into HTML for rendering, that may be a different problem, in particular if some of the Unicode functionality is not yet fully supported by modern browsers.

In short, can you please help me understand the issues that you're seeing better?
Comment 2 Martin Dürst 2013-12-23 07:16:01 UTC
(In reply to Silvia Pfeiffer from comment #1)

Hello Silvia,

> Also, we don't actually
> want to repeat the full set of HTML bidi markup in WebVTT, but only do as
> little as absolutely necessary given that we always have a UTF-8 document.
> 
> Given this background, what directional controls are you missing in VTT?
> 
> I was under the impression that Unicode covered all features that bidi text
> requires, seeing as it is used for much more than just HTML. If people are
> capable of creating fully functional bidi documents with using only Unicode,
> I don't think we need to replicate any Unicode functionality with duplicated
> markup.

Let me give you some historical background. The spec that introduced bidi markup in HTML is RFC 2070 ("Internationalization of the Hypertext Markup Language" http://tools.ietf.org/search/rfc2070). As one of its authors, I can tell you with confidence that when we worked on it, questions like the above were exactly what we considered. (Bidi features were significantly extended in Unicode 6.3, but before that they had stayed essentially the same since well before 1997.)

What we realized when working on RFC 2070, and what has stood the test of time since then, is that there is a strong correlation between document structure and bidi structure. Using Unicode bidi controls in a markup language obscures this correlation and complicates things. Making bidi controls available in markup (i.e. HTML and WebVTT) exposes and strengthens this relationship.

As simple examples, bidi embeddings (and the newer isolates) usually correlate with changes in language, quotations, changes in emphasis or other style changes, and so on.

So indeed Unicode bidi control characters and HTML bidi controls cover the same features, but from a markup point of view, the Unicode bidi controls are just there for cases where markup isn't available, and definitely not the solution of choice if markup is available.
Comment 3 Silvia Pfeiffer 2013-12-23 08:36:41 UTC
Richard, thanks for the history. I will have to read up on RFC 2070.

One big difference between HTML and WebVTT is that WebVTT is fundamentally a line-based file format, while HTML is fundamentally a markup-based format. For example, in WebVTT newlines are actually meaningful. We only introduce markup into WebVTT when there is no other remedy. In all other situations, UTF-8 characters are what counts. WebVTT is more like a txt file than an HTML file in that respect.

Another big difference is that WebVTT has a very small and limited means of manipulating text - there are no tables, no canvas, not even image tags etc. WebVTT in particular doesn't have an extensive document structure like HTML.

That Unicode control characters are hard to read (where? in a typical text editor? on the command line?) isn't sufficient reason to introduce a duplicate solution - an authoring tool for WebVTT could simply expose those characters better, thus providing a remedy for this problem.

We should only introduce new markup in cases where the Unicode characters are not sufficient to provide the correct rendering.

So, I think we will have to re-do the analysis that you have made for HTML and apply it specifically to WebVTT. I'll need to take some time to read up on it.
Comment 4 Martin Dürst 2013-12-23 08:48:00 UTC
(In reply to Silvia Pfeiffer from comment #3)
> Richard, thanks for the history. I will have to read up on RFC 2070.

Sorry, but I'm not Richard :-)
Comment 5 Silvia Pfeiffer 2013-12-23 08:50:59 UTC
(In reply to Martin Dürst from comment #4)
> (In reply to Silvia Pfeiffer from comment #3)
> > Richard, thanks for the history. I will have to read up on RFC 2070.
> 
> Sorry, but I'm not Richard :-)

Haha, sorry! :-) I just assumed!
Comment 6 Richard Ishida 2013-12-23 12:17:47 UTC
Hi Silvia, this is Richard :-)  To understand which features required for bidi the Unicode Bidi Algorithm doesn't support, read the article I pointed you to at http://www.w3.org/International/articles/inline-bidi-markup/

Hope that helps. (PS: I'm out of the office now until mid-Jan.)
Comment 7 Philip Jägenstedt 2014-01-27 08:37:34 UTC
Silvia, can you label this as v1 or v2? Do &lrm; and &rlm; have anything to do with this?
Comment 8 Silvia Pfeiffer 2014-01-27 11:45:11 UTC
We should address this for v1 for now. Yes, I think the point is that &lrm; and &rlm; are not sufficient to change directionality. I haven't had the time to read up on all the linked materials yet.
Comment 9 Silvia Pfeiffer 2015-09-30 23:04:49 UTC
Resolving as duplicate of bug 28266 because the rationale of https://www.w3.org/Bugs/Public/show_bug.cgi?id=28266#c17 applies here, too.

*** This bug has been marked as a duplicate of bug 28266 ***