RE: Change Proposals toward Issue-9: "how accessibility works for <video> is unclear" from Sean Hayes on 2010-04-12 (public-html@w3.org from April 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Mon, 12 Apr 2010 16:04:43 +0000
To: Henri Sivonen <hsivonen@iki.fi>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: public-html <public-html@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B911A4ED36F@DB3EX14MBXC303.europe.corp.microsoft.c>
While I agree with you that the text track should be rendered in a separate context, I disagree that this is a browsing context. It’s a media context. The TTML DOM should not be exposed in the HTML DOM, and the HTML renderer should not be part of the caption rendering. The point of TTML is that by defining a declarative model of time, the caption engine can prepare overlay frames in advance at video rates and inject them into the video pipeline, something your made-up semantics don’t seem to deal with. TTML was designed to be implemented in a modern media engine, not a browser, and the trade-offs are different. But even assuming you could do a decent job of synchronising and drawing the HTML and video in the same space using your browser engine, injecting random bits of HTML from arbitrary URIs seems like a really bad idea to me.

I would point out that since the text formatting model of TTML is XSL-FO, which for the subset of properties in TTML is essentially CSS, TTML doesn’t reinvent CSS, it references it. However, it's precisely because a formal model that elaborates how properties behave over time was required that TTML was defined as the intersection of XSL and SMIL. It's largely irrelevant how TTML is defined, however, as I don’t believe it should be included or mixed with HTML. That said, importing namespaced elements and attributes into HTML doesn’t seem to be a problem for SVG or MathML, so I guess a similar arrangement could be made.

-----Original Message-----
From: public-html-request@w3.org [mailto:public-html-request@w3.org] On Behalf Of Henri Sivonen
Sent: Monday, April 12, 2010 4:01 PM
To: Silvia Pfeiffer
Cc: public-html
Subject: Re: Change Proposals toward Issue-9: "how accessibility works for <video> is unclear"

"Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

> Change Proposal 1:
> http://www.w3.org/WAI/PF/HTML/wiki/Media_MultitrackAPI

> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9452


I think this API should fire events when the user uses the UA-provided UI (e.g. context menu) to enable or disable a track. This way, author-supplied playback UI can stay in sync when the user uses the UA-supplied UI to change the track state.
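A minimal sketch of the synchronisation idea above, assuming a track list object that notifies listeners whenever a track's enabled state changes, regardless of who changed it. The names `TrackList`, `setEnabled` and the `"trackchange"` event type are hypothetical and are not taken from the Media_MultitrackAPI proposal.

```javascript
// Hypothetical track list: every state change goes through setEnabled(),
// whether it comes from the UA-provided UI or from author script.
class TrackList {
  constructor(tracks) {
    this.tracks = tracks.map((track) => ({ ...track }));
    this.listeners = [];
  }

  addEventListener(type, callback) {
    this.listeners.push({ type, callback });
  }

  setEnabled(index, enabled) {
    const track = this.tracks[index];
    if (track.enabled === enabled) {
      return; // No state change, so no event.
    }
    track.enabled = enabled;
    for (const listener of this.listeners) {
      if (listener.type === "trackchange") {
        listener.callback({ type: "trackchange", index, enabled });
      }
    }
  }
}

// Author-supplied playback UI mirrors the track state via the event.
const trackList = new TrackList([{ title: "English captions", enabled: false }]);
const authorUiState = { checked: [false] };
trackList.addEventListener("trackchange", (event) => {
  authorUiState.checked[event.index] = event.enabled;
});

// Simulate the user enabling the track from the UA context menu.
trackList.setEnabled(0, true);
```

With this arrangement the author UI never has to poll; it only reacts to the event, so it cannot drift from the state shown in the UA-supplied UI.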

I think the names used by this API should align with the names used by the <track> element. Comments on the attribute names on the <track> element below.

> Change Proposal 2:
> http://www.w3.org/WAI/PF/HTML/wiki/Media_TextAssociations

> http://www.w3.org/Bugs/Public/show_bug.cgi?id=9471

> 
> This proposal introduces declarative markup to associate external 
> timed text resources (such as captions and subtitles) with a media 
> resource. It introduces <track> and <trackgroup> elements to be used 
> inside media elements and provides some recommendations on how to 
> render the text resources.

The name <track> is an improvement over <text> or <itext>. Thanks. I feel the attribute names need some bikeshedding, though:
 * src: This is good.
 * name: Since this is a human-readable UI string, please call it @title. Using @title would be consistent with the way style sheet naming works. (@name is the old name for @id.)
 * role: Please find another name for this. @role is taken by ARIA for overriding how an element is exposed to accessibility APIs. This attribute isn't about overriding the native HTML semantic. This attribute is a native HTML semantic, and the value space of this attribute is different from the value space of @role as used by ARIA.
 * type: This is good.
 * media: This is good.
 * language: Please rename this to hreflang for consistency with pre-existing HTML elements that have an attribute that designates the natural language of an external resource.

In order to integrate with the security model of the Web, I think captions should be rendered using a nested browsing context whose origin is the URL of the text track file instead of using a <div>-like area.

I think TTML support shouldn't be required. On the contrary, I think implementing support for it should be actively discouraged in order to avoid making the Web depend on TTML support.

The text formatting model of the Web is CSS. I think it is counter-productive to introduce a format that is targeted at Web use but reinvents large parts of CSS. Furthermore, CSS formatting operates on a DOM. Since TTML is an XML vocabulary, it maps to a DOM. However, rendering this DOM using a CSS formatter would show all the timed text at the same time, so it would have to be specified how the formatting works over time. TTML isn't defined in terms of CSS Animations, which deals with the problem of changing CSS properties as a function of time. Finally, TTML introduces 7 XML namespaces and uses namespaced attributes.

> Should they be included in plain sight in the DOM? Should they be 
> included in a shadow DOM?

It would be good if the accessibility TF didn't overload the term "shadow DOM" for non-XBL purposes.

> Should they be rendered into an iframe-like construct?

I suggest the following:
 1) Support two captioning formats: plain SRT (the timed strings are plain text) and HTML-extended SRT (like SRT but the timed strings are HTML fragments).
 2) Establish a nested browsing context that overlays the video frame exactly.
 3) Initialize a document into the nested browsing context as if "data:text/html;charset=utf8,<!DOCTYPE html>" had been loaded.
 4) Set the origin on the document in the nested browsing context to the URL of the timed text file.
 5) Associate the document in the nested browsing context with a UA style sheet along the lines of:
html {
  display: table;
  height: 100%;
  font-family: sans-serif;
  font-size: /* Computed magically from the size of the video frame. */;
  color: white;
  background-color: transparent; /* Show the video frame underneath */
  text-outline: black 0.1em 0.1em; /* Just guessing with 0.1em here. */
}
body {
  display: table-cell;
  padding: 0.5em 1em 0.5em 1em;
  vertical-align: bottom;
}
 6) When the video playback advances to a time given as the start time of a given timed text string, post a timed text display task on the main thread with that string as its |value|, the document of the nested browsing context as its |targetDocument| and a flag indicating whether the string is plain text or an HTML fragment.
 7) When the timed text display task fires, make it run the equivalent of targetDocument.body.textContent = value; if the flag indicated plain text, or targetDocument.body.innerHTML = value; if the flag indicated an HTML fragment.
 8) When video playback advances to a point where the timed text needs to be cleared from view, post a timed text display task on the main thread with "" as its |value|, the document of the nested browsing context as its |targetDocument| and a flag indicating that the string is plain text.
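Steps 6 to 8 above can be sketched roughly as follows. This is an illustration under stated assumptions, not a proposed implementation: the cue shape `{ start, end, text, isHtml }`, the helper names and the stand-in document are all hypothetical, and the stand-in only mimics the two `body` setters so that the sketch runs without a browser.

```javascript
// Stand-in for the document of the nested browsing context (assumption:
// a real document would be used in a browser). Setting textContent
// escapes markup; setting innerHTML stores the fragment as given.
function makeFakeDocument() {
  let html = "";
  const escapeText = (text) =>
    text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
  return {
    body: {
      set textContent(value) { html = escapeText(String(value)); },
      set innerHTML(value) { html = String(value); },
      get innerHTML() { return html; },
    },
  };
}

// Steps 6 and 8: one "show" task at each start time and one "clear" task
// (empty plain-text string) at each end time, ordered by playback time.
function buildDisplayTasks(cues, targetDocument) {
  const tasks = [];
  for (const cue of cues) {
    tasks.push({ time: cue.start, value: cue.text, isHtml: cue.isHtml, targetDocument });
    tasks.push({ time: cue.end, value: "", isHtml: false, targetDocument });
  }
  // Array.prototype.sort is stable, so a clear task at time t stays ahead
  // of the next cue's show task at the same time t.
  return tasks.sort((a, b) => a.time - b.time);
}

// Step 7: what the task does when it fires.
function runDisplayTask(task) {
  if (task.isHtml) {
    task.targetDocument.body.innerHTML = task.value;
  } else {
    task.targetDocument.body.textContent = task.value;
  }
}

const captionDocument = makeFakeDocument();
const displayTasks = buildDisplayTasks(
  [
    { start: 1, end: 3, text: "Tom & Jerry <live>", isHtml: false },
    { start: 3, end: 5, text: "<i>Hello</i>", isHtml: true },
  ],
  captionDocument
);

// Fire the tasks in order and record what the caption body holds after each.
const rendered = [];
for (const task of displayTasks) {
  runDisplayTask(task);
  rendered.push(captionDocument.body.innerHTML);
}
```

In a browser the tasks would be posted as playback reaches each `time` rather than run in a loop; the loop here only stands in for the passage of playback time.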

As a bonus, allow the author to designate a style sheet that cascades with the UA style sheet defined in point #5 above.

The suggestion, by design, doesn't support overlapping display time ranges for two timed text strings. The suggestion could be elaborated on by allowing the timed text container to have a top/bottom alignment flag on a per string basis and making the display task flip body's vertical-align from bottom to top accordingly.
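Since overlapping display ranges are out of scope by design, a caption file could be checked for them before use. The following is a small sketch of such a check, assuming the same hypothetical cue shape `{ start, end }` with times in seconds; `findOverlappingCues` is an illustrative name, not part of any proposal.

```javascript
// Returns every cue that starts before an earlier cue has ended.
// Cues that merely touch (one ends exactly when the next starts) are fine.
function findOverlappingCues(cues) {
  const sorted = [...cues].sort((a, b) => a.start - b.start);
  const offenders = [];
  let latestEnd = -Infinity;
  for (const cue of sorted) {
    if (cue.start < latestEnd) {
      offenders.push(cue);
    }
    latestEnd = Math.max(latestEnd, cue.end);
  }
  return offenders;
}

const overlapping = findOverlappingCues([
  { start: 1, end: 4 },
  { start: 3, end: 6 }, // Starts while the first cue is still showing.
  { start: 6, end: 8 }, // Back-to-back with the previous cue: allowed.
]);
```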

The suggestion above is defined in terms of the HTML, CSS and task queue machinery that browser engines already have, whereas TTML is defined in terms of XSL-FO, which isn't supported by browser engines. I believe the suggestion above would be much simpler to implement in a modern browser engine than any subset of TTML. (I also concur with Jonas Sicking's concerns about subsets leading to feature creep towards the whole spec that the subset was a part of.)

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 12 April 2010 16:05:14 UTC