Re: Change Proposals toward Issue-9: "how accessibility works for <video> is unclear" from Henri Sivonen on 2010-04-13 (public-html@w3.org from April 2010)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Tue, 13 Apr 2010 00:11:41 -0700 (PDT)
To: Sean Hayes <Sean.Hayes@microsoft.com>
Cc: public-html <public-html@w3.org>, Henri Sivonen <hsivonen@iki.fi>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Message-ID: <840451708.153506.1271142701931.JavaMail.root@cm-mail03.mozilla.org>

"Sean Hayes" <Sean.Hayes@microsoft.com> wrote:

> While I agree with you that the text track should be rendered in a
> separate context, I disagree that this is a browsing context. It’s a
> media context. The TTML DOM should not get exposed into the HTML DOM,

What I suggested doesn't require having an API for scripts in the parent document to be able to reach the DOM in the nested browsing context.

> and the HTML renderer should not be not part of the caption rendering.

That's the big question: Should the Web Platform require runtime implementations (a.k.a. browser engines) to have more than one line layout engine? If the answer is that a Web Platform implementation should have only one line layout engine, TTML (and SVG 1.2 Tiny textArea) are harmful. If the answer is that it's OK to have multiple line layout engines, TTML is OK, even though its overuse of XML Namespaces would remain an annoyance for authors.

My suggestion assumed that the answer is that the Web Platform should not require implementations to have more than one line layout engine and should not require the line layout engine to be capable of being run outside the main thread.

> The point of TTML is that by defining a declarative model of time, the
> caption engine can pre-prepare overlay frames at video rates and
> inject them into the video pipeline, something your made up semantics
> doesn’t seem to deal with.

Indeed, what I suggested, by design, doesn't try to achieve per-frame alignment of timed text and video frames. Instead, my suggestion optimizes for maximal reuse of the existing browser engine capabilities. I'm assuming that for video accessibility it is not critically important to achieve per-frame time alignment of text and video frames and it's acceptable to have slight jitter in the alignment of the video and timed text display times.
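
To illustrate the kind of jitter meant here, a minimal sketch, assuming cues are switched on a periodic main-thread tick rather than per video frame. The names (Cue, first_display_times) and the 250 ms tick are hypothetical and chosen only for illustration; they do not come from any spec or browser engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cue:
    start_ms: int  # time at which the cue should appear
    end_ms: int    # time at which the cue should disappear
    text: str


def active_cue(cues: List[Cue], t_ms: int) -> Optional[Cue]:
    """Return the cue whose interval contains t_ms, if any."""
    for cue in cues:
        if cue.start_ms <= t_ms < cue.end_ms:
            return cue
    return None


def first_display_times(cues: List[Cue], tick_ms: int, duration_ms: int) -> Dict[str, int]:
    """Simulate a main-thread timer that fires every tick_ms and record
    the playback time at which each cue is first shown."""
    shown: Dict[str, int] = {}
    t = 0
    while t <= duration_ms:
        cue = active_cue(cues, t)
        if cue is not None and cue.text not in shown:
            shown[cue.text] = t
        t += tick_ms
    return shown


cues = [Cue(1010, 3000, "Hello"), Cue(3500, 5000, "World")]
shown = first_display_times(cues, tick_ms=250, duration_ms=6000)
for cue in cues:
    # Lateness is bounded by the tick period, not by the video frame rate.
    print(cue.text, "shown", shown[cue.text] - cue.start_ms, "ms late")
```

In this model a cue appears at most one tick period after its nominal start time, which is the "slight jitter" trade-off described above.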

I was assuming that browser engines will have a way to composite a main-thread-owned RGBA overlay on top of the video frame "for free" on the GPU, so that preparing the overlays and compositing them with the video frames ahead of video frame display time is not a requirement for retaining the proper video frame rate.
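
As a rough model of that compositing step, here is the standard "source over" operation for a single pixel, written as a CPU sketch. It assumes straight (non-premultiplied) 8-bit alpha in the overlay and an opaque video pixel; the function name composite_over is hypothetical, and a real engine would do the equivalent on the GPU.

```python
from typing import Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def composite_over(overlay: RGBA, video: RGB) -> RGB:
    """Blend one overlay pixel (straight alpha, 0..255) over an opaque
    video pixel: out = overlay * a + video * (1 - a), with rounding."""
    r, g, b, a = overlay
    return tuple(
        (o * a + v * (255 - a) + 127) // 255
        for o, v in zip((r, g, b), video)
    )


# A fully transparent overlay pixel leaves the video pixel untouched;
# a fully opaque one replaces it.
print(composite_over((255, 255, 255, 0), (10, 20, 30)))
print(composite_over((255, 255, 255, 255), (10, 20, 30)))
```

The operation depends only on the current overlay and the current video frame, so the overlay can be updated on its own schedule without being baked into frames ahead of time.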

> TTML was designed to be implemented in a
> modern media engine, not a browser; and the trade-offs are different.

I understand that that's what TTML was designed for. It's not clear to me that that's the right design and set of tradeoffs for adding video captioning and subtitling to the Web Platform.

> But even assuming you could do a decent job of synchronising and
> drawing the HTML and video in the same space using your browser
> engine, injecting random bits of HTML from arbitrary URI's seems like
> a really bad idea to me. 

Why? What I suggested has the same security characteristics as an iframe. (It could be made more secure by clamping the nested browsing context to have some of the new sandboxing features always on.) As for performance characteristics, sure, authors could do something that wrecks performance on the main thread, but authors can already wreck the user experience of Web pages in countless ways.

> I would point out that since the text formatting model of TTML is
> XSL:FO, which  for the subset of properties in TTML is essentially
> CSS, it doesn’t reinvent it, it references it;

I think the key question is whether the *code* of the CSS formatter already in each browser engine is reused, not whether CSS spec concepts are reused.

> However importing
> namespace elements and attributes in HTML doesn’t seem to be a problem
> for SVG or MathML, so I guess a similar arrangement could be made.

The SVG and MathML arrangements are for inlining those vocabularies in text/html. TTML is proposed to be delivered in separate files. Building an almost-but-not-quite-XML parser just for captioning files would have a poor utility-to-implementation-effort ratio. Also, for a language that the Web Platform doesn't depend on yet, I think the right fix is to fix the language, not to introduce a new parsing layer.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 13 April 2010 07:12:15 UTC