RE: Requirements for external text alternatives for audio/video from Sean Hayes on 2010-03-30 (public-html-a11y@w3.org from March 2010)

From: Sean Hayes <Sean.Hayes@microsoft.com>
Date: Tue, 30 Mar 2010 05:39:16 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: Laura Carlson <laura.lee.carlson@gmail.com>, Eric Carlson <eric.carlson@apple.com>, Geoff Freed <geoff_freed@wgbh.org>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>, Matt May <mattmay@adobe.com>, Philippe Le Hegaret <plh@w3.org>
Message-ID: <8DEFC0D8B72E054E97DC307774FE4B9119EFF50B@DB3EX14MBXC315.europe.corp.microsoft.c>

I'm fully aware of what can be done with an interactive media system; I've worked on dozens of them over the last 20-odd years. What I'm saying is that you are trying to insert functionality here that the <video> and <audio> tags were not scoped for in HTML5, and doing so under the guise of accessibility seems to me somewhat contrived.

There are a few well-understood modes of accessibility for media which we need to address with high priority: captions, description, and transcript. Captions are a time-based text equivalent to audio; audio is not interactive, and neither should the captions be. Adding interactivity to captions would break the semantic idea that they match the audio, and could end up being badly abused and confusing for the user, as well as introducing unnecessary security and social engineering issues. Captions should be as near as possible the exact equivalent of the audio, with adequate typography to be easily readable. Captions also belong to the media, and so if any branding is to be supplied then it should be matched to the video content, not to the player, and it would be up to the content owner to supply the styling. Such branding should not be at the expense of readability. A similar argument would apply to subtitles. The caption text needs to be available to assistive technology, but that does imply that the HTML author needs to get involved to make that happen.

Now if you want to introduce interactive media into HTML5 without invoking the full SMIL model, then you could certainly define another kind of timed track, perhaps along the lines of ATVEF, which creates JavaScript events and carries a payload which could be injected into the HTML DOM. This is quite powerful enough to do all the things you list and more, and I'd be happy to contribute to a debate on the pros and cons of such a model vs. SMIL. However, that debate should not be part of an accessibility discussion, and if we have it here I think there is a very real danger of derailing the whole concept of caption and subtitle support in HTML5.
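
To make the timed-track idea above concrete, here is a minimal TypeScript sketch, assuming a track is nothing more than a list of (time, payload) entries that fire a callback as playback passes them. Every name in it (TimedTrigger, TimedEventTrack, advanceTo) is hypothetical; it is not an ATVEF, SMIL, or HTML5 API, and it deliberately ignores seeking backwards.

```typescript
// Hypothetical sketch only: these names are illustrative assumptions,
// not part of HTML5, SMIL, or ATVEF.

// One entry in a timed track: comes due when playback reaches `time` (seconds).
interface TimedTrigger {
  time: number;
  payload: string; // e.g. markup that page script may choose to inject
}

type TriggerHandler = (trigger: TimedTrigger) => void;

class TimedEventTrack {
  private readonly triggers: TimedTrigger[];
  private readonly handler: TriggerHandler;
  private nextIndex = 0;

  constructor(triggers: TimedTrigger[], handler: TriggerHandler) {
    // Sort a copy by time so dispatch can walk the list in order.
    this.triggers = [...triggers].sort((a, b) => a.time - b.time);
    this.handler = handler;
  }

  // Call with the current playback position. Fires, in time order, every
  // trigger that has come due since the previous call. Each trigger fires
  // at most once; seeking backwards is not handled in this sketch.
  advanceTo(currentTime: number): void {
    for (;;) {
      const next = this.triggers[this.nextIndex];
      if (next === undefined || next.time > currentTime) {
        return;
      }
      this.handler(next);
      this.nextIndex += 1;
    }
  }
}
```

In a page, advanceTo could plausibly be driven from the media element's timeupdate event using its currentTime, with the handler deciding whether and where to inject the payload into the DOM. That decision sitting in page script, rather than in the caption renderer, is what keeps such a track separate from captions.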

-----Original Message-----
From: Silvia Pfeiffer [mailto:silviapfeiffer1@gmail.com] 
Sent: Monday, March 29, 2010 12:19 AM
To: Sean Hayes
Cc: Laura Carlson; Eric Carlson; Geoff Freed; HTML Accessibility Task Force; Matt May; Philippe Le Hegaret
Subject: Re: Requirements for external text alternatives for audio/video

On Mon, Mar 29, 2010 at 7:38 AM, Sean Hayes <Sean.Hayes@microsoft.com> wrote:
> I don't disagree with the need to provide appropriate alternatives to media, but a transcript is perhaps not best provided through the mechanism of trapping captions. As you say, captions would in fact probably not be an adequate replacement for the media without the text of description being included at minimum. Thus a transcript is more like the alt text on an image, a different semantic beast than captions, and probably better provided by other means.
>
> I think there is an important larger issue here. Is the text mechanism intended to provide captions and subtitles; or is the intention, as Silvia's examples would seem to suggest, to use it to turn HTML5 into a time-based medium like SMIL or HTML+TIME? If the latter, and this mechanism is intended to address corporate branding and advertising, then I think we are straying out of the remit of accessibility into something much larger which would need to be taken up in the wider group.

The two examples that you are providing are two extremes: captions/subtitles at one end, and SMIL/HTML+TIME at the other. Right now and for the purposes of this group we are focused on captions/subtitles. But already with the features of DFXP there is a possibility to go a step further, without going all the way to the complexity of SMIL/HTML+TIME - which, IMO, needs to come in at a different level.

What I was describing is simply time-aligned text that is a bit more capable than plain text. In particular I am talking about hyperlinks, which are essentially nothing more than styled text, but provide Web functionality - something that should be very important to us in the given context. This has nothing to do with going all the way to SMIL/HTML+TIME. It is still no more than captions or subtitles, but with the possibility of linking out at a given time.

Think about it: we could have captions that allow us to explain things further - e.g. in a movie about a historic event where names of people are mentioned, you could click through on those names and find out what the people were really like and why they are portrayed as they are in the movie. Directly related "supplementary material" - not banished to another resource as it currently is on DVDs, but actually available at your fingertips when you are interested in it.

Or we could have captions of a political discussion with links to explain some background on the speakers.

Or we could have captions that would link to a dictionary entry for words that are used very infrequently in a language.

Or, of course, we could have links in ads to the eCommerce site of the current ad, so we can directly go and purchase the product.

This is not difficult to do on top of what we have right now, but requires the ability to at least interact with links inside timed text.
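
As a rough illustration of what "links inside timed text" could mean as data, here is a minimal TypeScript sketch, assuming each cue simply carries a list of link targets alongside its text. The types and functions (CueLink, LinkedCue, activeCues, activeLinks) are hypothetical and are not DFXP/TTML syntax or any existing API; how links would actually be marked up in a caption format is left open.

```typescript
// Hypothetical data model for illustration; not DFXP/TTML and not an HTML5 API.

// A hyperlink attached to a span of caption text.
interface CueLink {
  text: string; // the caption words that act as the link
  href: string; // where the link points
}

// A caption or subtitle cue shown from `start` (inclusive) to `end` (exclusive), in seconds.
interface LinkedCue {
  start: number;
  end: number;
  text: string;
  links: CueLink[];
}

// Returns the cues that are on screen at playback time `t`.
function activeCues(cues: LinkedCue[], t: number): LinkedCue[] {
  return cues.filter((cue) => cue.start <= t && t < cue.end);
}

// Returns the links a viewer could follow at playback time `t`.
function activeLinks(cues: LinkedCue[], t: number): CueLink[] {
  const result: CueLink[] = [];
  for (const cue of activeCues(cues, t)) {
    for (const link of cue.links) {
      result.push(link);
    }
  }
  return result;
}
```

The caption text itself is unchanged by this model; a player that ignores the links list would render exactly the same captions, which is the sense in which this stays short of a full timed-media system.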

Note that I am not even sure if current DFXP/TTML supports hyperlinks, but if it doesn't I would be very keen on introducing them because they are extremely useful. Since DFXP/TTML is declared as being easily extensible, that should not be so hard to do.

Regards,
Silvia.

Received on Tuesday, 30 March 2010 05:39:55 UTC