The Hypertext CG plans to hold an informal gathering or two on the subject of Accessibility of Media Elements in HTML 5. The media elements are audio and video, together with their supporting elements such as source.
The current specification of the timed media elements in HTML5 takes a fairly hard-nosed approach to what is presented as timed media: only what is inside the media files selected from the sources. There is currently no provision for linking or synchronizing other material, and no discussion of how to make the media accessible. This needs addressing.
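As a concrete illustration of the point above, here is a minimal sketch of the markup under discussion (the file names are invented for the example). The user agent selects the first source it can play, and everything presented to the user comes from inside that media file; there is no standard hook for attaching captions, descriptions, or a transcript:

```html
<!-- Sketch only: the browser picks the first playable source. -->
<video controls>
  <source src="talk.ogv" type="video/ogg">
  <source src="talk.mp4" type="video/mp4">
  <!-- Fallback content for user agents without video support -->
  Your browser does not support the video element.
</video>
```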
We would like to understand the 'landscape' and put in place good architectural support in general, as well as make sure that specific solutions exist for the more pressing problems. We anticipate working, in public, to develop proposals for any changes to specifications that might be suggested by the work, and also to develop a cohesive 'best practices' document that shows how those provisions can be used, by authors, by user agents (browsers), and by users, to address the issues we identify.
We are aware that good accessibility rests on four legs (at least):
It is easy to fail on any one of these, and good accessibility is then not achieved.
Accessibility provisions for timed media might themselves be timed (e.g. captions) or un-timed (e.g. a readable screenplay or transcript). We wish to consider both categories.
The questions we would like to address include, but are not limited to, the following:
We are all aware of captioning for those who cannot hear the audio; less common is audio description of video, for those who cannot see. The BBC recently had some content with optional sign-language overlays. Issues can also arise from viewer susceptibility (e.g. flashing video and photosensitive epilepsy, color-vision deficiencies, and so on).
We are all aware of the existence, for example, of screen readers and perhaps even Braille output devices. We've seen tags in other parts of HTML that are there to support accessibility, and frameworks such as ARIA. Are there existing good practices that naturally extend to Timed Media?
There have been ongoing debates about whether 'unique' provisions for accessibility (functions with no other purpose) are desirable. We do not intend to have this philosophical debate, but it would be useful to hear of related problems and opportunities that help make the debate irrelevant. For example, the provision of a transcript or separately accessible captions, in text form, makes indexing and searching content much easier. Are there problems like this that we can address that will make it more likely that authors build accessible timed media?
Much of the work and research in this area has been done for isolated, analog systems (classic television). Instead, we now have digital content presented in a rich context (web content). What new opportunities and solutions does this open up?
The work of the W3C on a common Timed Text format, and the existence of general frameworks such as ARIA (Accessible Rich Internet Applications), suggest that there are pieces of the solution space we should consider. What are they?
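One such piece of the solution space, the W3C Timed Text work, expresses captions as timed text cues in an XML document separate from the media file itself. A minimal sketch (the cue text and timings here are invented, and element details vary between drafts of the format) looks roughly like this:

```xml
<!-- Sketch of a Timed Text caption document: each p carries
     its own begin/end times, independent of the media file. -->
<tt xmlns="http://www.w3.org/ns/ttml">
  <body>
    <div>
      <p begin="0s" end="3s">Hello, and welcome.</p>
      <p begin="3s" end="6s">[door slams]</p>
    </div>
  </body>
</tt>
```

Because the captions live in a separate, text-based document, they can be indexed, searched, and restyled independently of the audio and video, which is exactly the kind of opportunity noted earlier.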
We are aware that there are a number of pioneering organizations in this area. The BBC's work with sign-language has already been noted; workflows for captioning content have been developed in a number of places. There have been script-based experiments on captioning. What are some of these systems and experiments, and what can we learn from them?
We think that at least the following communities and groups might be affected:
If you feel prepared to attend, present, and work cooperatively on the problem outlined in the Scope section, please respond to the questionnaire as soon as possible. There is no registration fee, but registration is required. W3C membership is not required in order to participate in the gathering.
To attend the gathering, you must come prepared to present on one of the questions in this document, or another suitable question, drawing on your experience or expertise to help inform the discussion and make progress on proposing solutions.
We expect the gathering to spend perhaps two-thirds of the time on these presentations, with short Q&A for each. Then we may have a panel session or two, or moderated discussion, to address focused questions. As stated in the introduction, we are looking for a framework and solutions with good 'longevity', simplicity, and efficacy, that will be embraced by the standards community, content authors, user agent developers, and end users. This is ambitious but achievable, we believe, and opportunities such as this to 'get it right from the start' come up all too rarely.
This gathering is done under the auspices of the HyperText Coordination Group.
Philippe Le Hégaret, W3C
This informal gathering will last one day, and the first one will be held in the Bay Area on November 1st at Stanford University. The meeting place is 20 minutes away from the TPAC 2009 hotel (see directions).
Stanford University
Tresidder Union building, 2nd floor
459 Lagunita Dr,
Palo Alto, CA
There are numerous way-finding signs on campus for both Tresidder Union and the Faculty Club, which is next door. Free weekend parking is available in the lot across from Tresidder Union.