Copyright © 1999 W3C (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This document summarizes the accessibility features of the Synchronized Multimedia Integration Language (SMIL), version 1.0 Recommendation ([SMIL10]). This document has been written so that other documents may refer in a consistent manner to the accessibility features of SMIL.
This document is a draft W3C Note made available by the W3C and the W3C Web Accessibility Initiative. This NOTE has not yet been jointly approved by the WAI Education and Outreach Working Group (EOWG), the WAI Protocols and Formats Working Group (PFWG), and the Synchronized Multimedia (SYMM) Working Group.
Publication of a W3C Note does not imply endorsement by the W3C Membership. A list of current W3C technical reports and publications, including working drafts and notes, can be found at http://www.w3.org/TR.
The W3C Recommendation "Web Content Accessibility Guidelines" [WAI-WEBCONTENT] explains to authors how to create accessible content that is rich in (synchronized) multimedia. These guidelines explain to authors how to create content that may be used by people who cannot see, hear, move, or may not be able to process some types of information easily or at all. They explain how to design content that works for a variety of input devices, such as pointing devices, keyboards, head wands, or speech input. They explain how to provide equivalent information in formats that will meet the needs of users with blindness or low vision, who may not be able to use the visual part of a presentation (video, images, graphics, etc.). Other formats will help users who cannot use the audio part of a presentation (sound track, sound cues, etc.). Content that satisfies the "Web Content Accessibility Guidelines" will be accessible to users with disabilities and will benefit the Web community as a whole since it will be easier to manage, search, and render to a variety of devices (e.g., mobile devices).
The "Web Content Accessibility Guidelines" apply to synchronized multimedia presentations (such as SMIL 1.0 presentations) as well as less dynamic content (such as HTML documents [HTML]). Some issues discussed in the "Web Content Accessibility Guidelines" arise specifically in the context of synchronized multimedia presentations and are discussed below.
As with HTML documents, part of the responsibility for making SMIL 1.0 presentations accessible lies with the author and part with the user's software, the SMIL player. Authors must include equivalent alternatives for images, video, audio, etc. They must acknowledge that to ensure accessibility, users must be able to control style and layout. They must synchronize tracks correctly, describe relationships between tracks, provide useful default behavior, mark up the natural language of content, etc.
In turn, SMIL players must allow users to control style and layout (e.g., to control font size) and to choose from alternatives provided by the author. Users with some cognitive disabilities, or people using combinations of assistive technologies such as a refreshable braille display and speech synthesis, may require additional time to view a presentation or its captions. Since users do not process information at the same rate, players must allow them to speed up, slow down, or pause a presentation (as one can do with most home video players). Users must also be able to turn alternatives on and off and to control their size, position, volume, etc. Users might also want to specify how synchronized audio tracks are rendered, for instance by using a male voice for the auditory description to contrast with female voices in the audio track.
Users with some disabilities may require that time-sensitive information be rendered in a time-independent form. For example, SMIL 1.0 allows authors to create links whose destination varies over time. Some users may not have enough time to select these links, so players should provide access to them in a time-independent manner. Multimedia players can also offer an index to time-dependent information in a time independent form. For more information about accessible multimedia players, please consult the W3C "User Agent Accessibility Guidelines" [WAI-USERAGENT].
This Note describes the accessibility features of [SMIL10] and explains how authors and SMIL players should make use of them.
Note. Recommendations for authors and SMIL players are made in accordance with the recommendations made in [WAI-WEBCONTENT] and [WAI-USERAGENT].
Multimedia presentations have two main types of equivalent alternatives: discrete and stream. Discrete equivalents contain no time references and have no intrinsic duration. Most common in SMIL are discrete text equivalents specified by attributes such as the alt attribute of the img element.
Stream equivalents, such as text captions or auditory descriptions, have intrinsic duration and may contain references to time. For instance, a text stream equivalent consists of pieces of text associated with a time code. Stream equivalents may be constructed out of discrete equivalents by using the par element (for parallel presentation) and seq element (for sequential presentation).
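For instance, a simple caption-like stream can be sketched from discrete text objects by giving each one a duration inside a seq element (the file names are hypothetical):

```xml
<seq>
  <!-- each discrete text object is displayed for a fixed
       duration, yielding a sequential stream of captions -->
  <text src="caption-part1.txt" dur="5s"/>
  <text src="caption-part2.txt" dur="7s"/>
  <text src="caption-part3.txt" dur="4s"/>
</seq>
```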
As explained in [WAI-WEBCONTENT], text equivalents are fundamental to accessibility since they may be rendered visually, as speech, or by a braille device. In multimedia presentations, text stream equivalents must be synchronized with other time-dependent media. We recommend embedding time codes in text streams in these cases.
The Web Content Accessibility Guidelines also require that, until user agents can automatically read aloud the text equivalent of a visual track, authors provide an auditory description of the important information of the visual track of a multimedia presentation. This benefits users who may not be able to read text or may not have access to software or hardware for speech synthesis or braille.
The following sections describe in more detail the SMIL features for specifying discrete and stream equivalents for video, audio, text, and other SMIL elements.
Authors specify discrete text equivalents for SMIL elements through the following attributes. Discrete text equivalents, when rendered by players or assistive technologies to the screen, as speech, or on a dynamic braille display, allow users to make use of the page, even if they cannot make use of all of its content. For instance, providing a text equivalent of an image that is part of a link will enable someone with blindness to decide whether to follow the link.
The following example includes a video element that graphically illustrates trends in Web commerce and privacy. The alt, title, and abstract attributes specify discrete equivalents that provide different types of information at different levels of granularity. The longdesc attribute designates a more complete text equivalent of the video presentation, with details about what information is being displayed in the graph, the units of the graph, etc. The long description might also include links back to anchors associated with key points of the presentation.
<video src="rtsp://foo.com/graph.imf"
title="Web Trends: Graph 1"
alt="The number of online stores
and consumers is increasing, but privacy
is decreasing."
abstract="The number of Web users, online stores, and
the influence of Web communities are
all steadily increasing while privacy for
Web users is slowly diminishing. This graph
explains the trends and Web technologies
that will most impact the future of
Web commerce."
longdesc="http://foo.com/graph-description.htm"/>
Two stream equivalent formats that promote accessibility are captions and auditory descriptions. A caption is a text transcript of spoken words and non-spoken sound effects that is synchronized with an audio stream. Captions benefit people who are deaf or hard of hearing. They also benefit anyone in a setting where audio tracks would cause a disturbance, where ambient noise prevents them from hearing the audio track, or who has difficulty understanding spoken language.
An auditory description is a recorded or synthesized voice that describes key visual elements of the presentation, including information about actions, body language, graphics, and scene changes. Like captions, auditory descriptions must be synchronized with the original audio stream, generally during natural pauses in the sound track. Synchronizing long auditory descriptions may also affect the timing of the original audio and video tracks, since natural pauses may not be long enough to include them. Auditory descriptions benefit people with blindness or low vision. They also benefit anyone in an eyes-busy setting or whose devices cannot show the original video or visual media object.
Below we discuss how to associate captions and auditory descriptions with multimedia presentations in SMIL 1.0 such that users may control the presentation of the alternative stream. We also examine how SMIL 1.0 supports multilingual presentations and how this affects stream equivalents for accessibility.
Note. The SMIL 1.0 specification explains how to synchronize events in one or more text streams with events in other tracks. The examples in the following sections do not include explicit information about synchronization.
In SMIL 1.0, captions may be included in a presentation with the textstream element. The following example includes a caption in addition to the audio and video tracks.
<par>
<audio src="audio.rm"/>
<video src="video.rm"/>
<textstream src="closed-caps.rtx"/>
</par>
The limitation of the previous example is that the user cannot easily turn the caption on or off. Style sheets (in conjunction with markup such as an "id" attribute) may be used to hide the text stream, but only for SMIL 1.0 players that support the particular style sheet language. Note. In CSS, authors may turn off the visual display of captions using 'display: none' and turn them back on with 'display: block'.
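As a sketch for players that support CSS, a style sheet could hide a text stream identified by an id attribute (the id "caps" is hypothetical), with a second rule showing how to reveal it again:

```css
/* hide the caption stream (assumes the textstream element
   carries id="caps") */
textstream#caps { display: none }

/* to show captions again, replace the rule above with: */
/* textstream#caps { display: block } */
```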
Since user control of presentation is vital to accessibility, SMIL 1.0 allows authors to create presentations whose behavior varies depending on how the user has configured the player. When a SMIL element such as textstream has the system-captions test attribute with value "on" and the user has configured the player to support captions, the element may be rendered. Whether the element is actually rendered depends on other markup in the document (such as language support).
The following example is a TV news presentation that consists of four media object elements: a video track that shows the news announcer, an audio track containing her voice, and two text streams containing a stream of stock values and captions. All the elements are to be played in parallel due to the par element. The caption will only be rendered if the user has configured the player to support captions.
<par>
<audio src="audio.rm"/>
<video src="video.rm"/>
<textstream src="stockticker.rtx"/>
<textstream src="closed-caps.rtx"
system-captions="on"/>
</par>
The system-captions attribute can be used with elements other than textstream. Like the other SMIL test attributes (refer to [SMIL10], section 4.4), system-captions acts like a boolean flag that evaluates to "true" or "false" according to the player configuration. Section 3.1 illustrates how system-captions can be used to specify different presentation layouts according to whether the user has configured the SMIL player to support captions.
Note. Authors should only use system-captions="on" for captions and system-captions="off" for caption-related effects such as layout changes. This allows players to distinguish accessibility captions from other types of content (which may allow them to avoid overlapping captions and other content automatically, for example).
In SMIL 1.0, auditory descriptions may be included in a presentation with the audio element. However, SMIL 1.0 does not provide a mechanism that allows users to turn player support for auditory descriptions on or off. Note. In CSS, authors may turn off auditory descriptions using 'display: none', but it is not clear which value of 'display' would turn them back on.
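A minimal sketch of an auditory description added with the audio element (the file names are hypothetical). The description track is simply played in parallel; as noted above, SMIL 1.0 offers no test attribute with which users could turn it on or off:

```xml
<par>
  <video src="video.rm"/>
  <audio src="audio.rm"/>
  <!-- auditory description, recorded so that it falls within
       natural pauses of the main sound track; it cannot be
       toggled via a SMIL 1.0 test attribute -->
  <audio src="audio-desc.rm"/>
</par>
```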
SMIL 1.0 allows authors to create multilingual presentations with subtitles (which are text streams) and overdubs in another language (which are audio streams). Multilingual presentations themselves do not pose accessibility problems. Indeed, providing additional tracks (even in a different language) will probably help many users.
However, multilingual presentations are linked to accessibility because these text and audio streams may co-exist with text and audio streams provided for accessibility. Authors of accessible multilingual presentations should therefore be aware of how the streams interact. For instance, captions and subtitles should be laid out so that they do not overlap on the screen (unless there is no other possibility or the user prefers it that way). Audio tracks should not overlap unless carefully synchronized.
In SMIL 1.0, the system-overdub-or-caption test attribute allows users to select (through the player's user interface) whether they would rather have the player render overdubs or subtitles. Note. The term "caption" in "system-overdub-or-caption" does not refer to accessibility captions. Authors must not use this attribute to create accessibility captions (use system-captions instead).
In the following example, the TV news presentation is offered in both Spanish and English. If the user has configured the player to support both Spanish and overdubs, the Spanish audio track will be rendered. Otherwise the second audio track of the first switch element (the English audio track) will be rendered. Note that since there is only one set of captions (in English), they will be rendered when the user has configured the player to support captions.
<par>
<switch> <!-- audio -->
<audio src="audio-es.rm"
system-overdub-or-caption="overdub"
system-language="es"/>
<audio src="audio.rm"/>
</switch>
<video src="video.rm"/>
<textstream src="stockticker.rtx"/>
<textstream src="closed-caps.rtx"
system-captions="on"/>
</par>
To add Spanish subtitles to the example, the author would specify a second textstream element. The first textstream element will be rendered if the user has configured the player to prefer subtitles and Spanish. The second text stream will be rendered instead if the user has configured the player to support accessibility captions.
<par>
<!-- audio section same as before -->
<video src="video.rm"/>
<textstream src="stockticker.rtx"/>
<switch> <!-- captions or subtitles -->
<textstream src="subtitles-es.rtx"
system-overdub-or-caption="caption"
system-language="es"/>
<textstream src="closed-caps.rtx"
system-captions="on"/>
</switch>
</par>
Since captions include text descriptions of actions, sounds, etc. in addition to dialog, they can be more helpful than subtitles. Authors who provide captions need not provide subtitles in the same language since the two are so similar. The following example (based on the previous one) illustrates how to provide a text stream that may serve as either a caption or subtitle, without being rendered twice on the screen.
In the switch element, the three text streams are evaluated in order: Spanish captions, then Spanish subtitles, then English captions. This design allows authors to reuse captions as subtitles and ensures that the text stream is not rendered twice when the user has configured the player to support both.
<par>
<!-- audio section same as before -->
<video src="video.rm"/>
<textstream src="stockticker.rtx"/>
<switch> <!-- captions or subtitles -->
<textstream src="closed-caps-es.rtx"
system-captions="on"
system-language="es"/>
<textstream src="closed-caps-es.rtx"
system-overdub-or-caption="caption"
system-language="es"/>
<textstream src="closed-caps.rtx"
system-captions="on"/>
</switch>
</par>
Note. In SMIL 1.0, values for system-overdub-or-caption only refer to user preferences for either subtitles or overdubs; there are no values for the test attribute that refer to user preferences for neither or both.
Authors may specify the visual layout of SMIL 1.0 media objects through SMIL's own layout markup or with a style sheet language such as CSS [CSS1, CSS2]. In both cases, the layout element specifies the presentation information. The "Web Content Accessibility Guidelines" recommend style sheets for a number of reasons (refer to [CSS-ACCESS] for details): they are designed to ensure that the user has final control of the presentation, they may be shared by several documents, they make document and site management easier. Style sheets may not be supported by all SMIL players, however. SMIL's layout facilities allow authors to arrange rectangular regions visually (via the region element), much like frames in HTML.
The following example illustrates how to regain space when captions are turned off or not supported. In this example, the same layout is defined both with SMIL markup and CSS2 style sheets. Since both style sheets appear in a switch element, the SMIL player will use the CSS style sheet if supported, otherwise the SMIL style sheet. Note that the type attribute of the layout element specifies the MIME type of the style sheet language, here "text/css".
The style sheets in this example specify two layouts. When the user has chosen to view captions, they appear in a region (the "captext" region) that takes up 20% of available vertical space below a region for the video presentation (the "capvideo" region), which takes up the other 80%. When the user does not wish to view captions, the video region takes up all available vertical space (the "video" region). The choice of which layout to use depends on the value of the system-captions test attribute.
<smil>
<head>
<switch>
<layout type="text/css">
[region="video"] {top: 0px; height: 100%}
[region="capvideo"] {top: 0px; height: 80%}
[region="captext"] {top: 80%; height: 20%; overflow: scroll}
</layout>
<layout>
<region id="video" top="0" height="100%" fit="meet"/>
<region id="capvideo" top="0" height="80%" fit="meet"/>
<region id="captext" top="80%" height="20%" fit="scroll"/>
</layout>
</switch>
</head>
<body>
<par>
<switch> <!-- if captions off use first region, else second -->
<video region="video" src="movie-vid.rm"
title="Video presentation of soccer match, 100% vert"
system-captions="off"/>
<video region="capvideo" src="movie-vid.rm"
title="Video presentation of soccer match, 80% vert"/>
</switch> <!-- if captions on render also captions -->
<textstream region="captext" src="closed-caps.rtx"
title="Caption of soccer match, 20% vert"
system-captions="on"/>
</par>
</body>
</smil>
In SMIL 1.0, the only style attribute that can be set directly is background-color, but without other color definitions, it has little effect on accessibility.
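For example, the caption region from the earlier layout could be given a background color in SMIL layout markup (a sketch; the text color itself is determined by the text stream or player, so authors cannot guarantee sufficient contrast this way):

```xml
<layout>
  <!-- background-color is the only style attribute that can
       be set directly in SMIL 1.0 layout markup -->
  <region id="captext" top="80%" height="20%" fit="scroll"
          background-color="black"/>
</layout>
```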
SMIL 1.0 includes a number of interesting linking features, including HTML-like hyperlinks and image maps (as well as video maps). SMIL 1.0 also allows authors to create time-dependent links that are active only at certain times during a presentation (as defined by the author). To make these hyperlinks accessible, authors must provide textual information and SMIL players should allow users to control the link rendering.
To create an accessible image or video map, authors must describe the nature of each link in the map for users who cannot see or use the visual information. Authors provide the description via the title attribute on the a and anchor elements. This text description may be rendered by SMIL players on the screen or by assistive technologies as speech or dynamic braille.
Here is an example of a video clip with an associated map. Each link describes the rectangular region of the video where it may be activated via the coords attribute.
<video src="http://www.w3.org/CoolStuff">
<anchor href="http://www.w3.org/AudioVideo"
coords="0%,0%,50%,50%"
title="W3C Multimedia Activity"/>
<anchor href="http://www.w3.org/Style"
coords="50%,50%,100%,100%"
title="W3C Style Sheet Activity"/>
</video>
Note that the anchor element is an empty element, while the a element has link content.
Until SMIL players are able to present this information to users on demand, authors should also make textual links available in addition to non-text links. Authors might want to control the presentation with the system-captions test attribute. In the following example, text links corresponding to those of the video map will be rendered when the user has configured the player to support captions. The example does not specify a particular screen layout.
<par>
<video src="http://www.w3.org/CoolStuff">
<anchor href="http://www.w3.org/AudioVideo"
coords="0%,0%,50%,50%"
title="W3C Multimedia Activity"/>
<anchor href="http://www.w3.org/Style"
coords="50%,50%,100%,100%"
title="W3C Style Sheet Activity"/>
</video>
<par system-captions="on">
<a href="http://www.w3.org/AudioVideo"
title="W3C Multimedia Activity">
<text src="./AudioVideo.txt"/>
</a>
<a href="http://www.w3.org/Style"
title="W3C Style Sheet Activity">
<text src="./Style.txt"/>
</a>
</par>
</par>
The time-dependent linking mechanisms offered by SMIL 1.0 pose an accessibility challenge to both authors and players. The following example from the SMIL 1.0 specification illustrates time-dependent linking. In the example, the duration of a video clip is split into two time intervals: from 0-5 seconds and from 5-10 seconds. A different link is associated with each of these intervals.
<video src="http://www.w3.org/CoolStuff">
<anchor href="http://www.w3.org/AudioVideo"
title="W3C Multimedia Activity"
begin="0s" end="5s"/>
<anchor href="http://www.w3.org/Style"
title="W3C Style Sheet Activity"
begin="5s" end="10s"/>
</video>
Some users require more time than anticipated by the author to interact with the presentation. Therefore, SMIL players should allow users to access all links in a time-independent manner. Until SMIL players enable this, authors should make all time-dependent links available in a static form. This may be done in a variety of ways. For instance, authors might list all time-dependent links in a separate document (and link to it from the presentation). The static list of links should include information about when the links are active during the presentation. This type of catalog will help all users, and allow people to find information about all links associated with a particular media object or that are active at a particular moment of the presentation.
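Such a static catalog might be an ordinary HTML document linked from the presentation. A sketch, reusing the links from the example above (the file name is hypothetical):

```html
<!-- links-index.html: time-independent list of the
     presentation's time-dependent links -->
<ul>
  <li><a href="http://www.w3.org/AudioVideo">W3C Multimedia
      Activity</a> (active from 0s to 5s)</li>
  <li><a href="http://www.w3.org/Style">W3C Style Sheet
      Activity</a> (active from 5s to 10s)</li>
</ul>
```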
Navigation mechanisms help all users but are particularly important for users with blindness or cognitive impairments who may not be able to grasp the structure of a page through visual cues. In addition to HTML-like linking mechanisms that may be used to create site maps and navigation bars, SMIL allows authors to create "temporal navigation bars" that let users navigate directly to important points in time of a presentation.
As an example, we first identify key points in a presentation that includes three interviews conducted sequentially (Joe, Tim, then Judy). Each segment is marked by an anchor (identified by the "id" attribute).
<smil>
<head>
<layout>
<region id="video" top="0" height="100%" fit="meet"/>
</layout>
<video src="http://www.w3.org/BBC"
region="video"
title="Future of the Web"
alt="Interview with Joe, Tim, and Judy for BBC"
abstract="The BBC interviews Joe, Tim, and Judy about
the Future of the Web. Joe and Tim talk about
social and technological impact. Judy
addresses the benefits to accessibility of
good design.">
<anchor region="joe" id="joe"
begin="0s" end="5s"
title="Joe's interview on Web trends"/>
<anchor region="tim" id="tim"
begin="5s" end="10s"
title="Tim's interview on Web trends"/>
<anchor region="judy" id="judy"
begin="10s" end="60s"
title="Judy's interview on Web accessibility"/>
</video>
Authors might add a temporal navigation bar in parallel with the presentation. The navigation bar takes up the lower 10% of the presentation, and consists of a photo of each of the interviewees that links to their interview. Selecting a link causes the player to play that part of the video. Refer to the section on opening new windows for information about where the interview will be played.
<smil>
<head>
<layout>
<region id="video" top="0" height="90%" fit="meet"/>
<region id="joe" top="90%" height="10%" fit="meet"/>
<region id="tim" top="90%" height="10%"
left="35%" fit="meet"/>
<region id="judy" top="90%" height="10%"
left="70%" fit="meet"/>
</layout>
<par>
<video> <!-- Video presentation information here -->
</video>
<a href="#joe" title="Joe's interview on Web trends">
<img title="Photo of Joe" src="joe-photo.png"/>
</a>
<a href="#tim" title="Tim's interview on Web trends">
<img title="Photo of Tim" src="tim-photo.png"/>
</a>
<a href="#judy" title="Judy's interview on Web accessibility">
<img title="Photo of Judy" src="judy-photo.png"/>
</a>
</par>
The show attribute on the a and anchor elements controls the behavior of the source document containing the link when the link is followed. The default value of the attribute is "replace", which means that the destination presentation replaces the current presentation (in the same window, in audio, etc.).
Two other values for the attribute, "new" and "pause", cause the destination presentation to appear in a new context (e.g., window). Opening new windows without warning may disorient some users with blindness or cognitive impairments and may simply bother others. To promote accessibility, authors should not cause new windows to open without warning. SMIL players should allow users to turn support for opening new windows on and off.
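For instance, a link whose destination opens in a new context while the current presentation pauses, with a title that warns the user about the behavior (a sketch; the target URL and image are hypothetical):

```xml
<a href="http://www.w3.org/WAI" show="pause"
   title="WAI home page (opens in a new context and pauses
          this presentation)">
  <img src="wai-logo.png" title="WAI logo"/>
</a>
```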
As mentioned earlier, the authors of SMIL 1.0 presentations can define alternative designs based on some user or system settings. The author can test these settings through test attributes set on various elements.
Test attributes for captions, overdubs, and language are described in Section 2.2. SMIL 1.0 also includes attributes to test the speed of the connection and some characteristics of the player. Authors may use these tests to tailor the content or style of a presentation according to the user's device or connection. The SMIL 1.0 test attributes that may be used with synchronization elements are system-bitrate, system-captions, system-language, system-overdub-or-caption, system-required, system-screen-depth, and system-screen-size.
These attributes may be used e.g., to deliver content more appropriately for various devices and connections. For example, if a connection is slow, the author may specify that images should not be downloaded. While these attributes may make some content more accessible, they may overly constrain what a user can access. For instance, users may still want to download important images despite a slow connection. Authors should use these attributes conservatively. In addition, players should offer possibilities for overriding the restrictions built in by the author when necessary.
The following example delivers different qualities of video based on available bandwidth. The player evaluates each of the choices in the switch element in order and chooses the first one whose system-bitrate value is less than or equal to the available bandwidth of the connection between the media player and the media server.
<switch> <!-- video -->
<video src="high-quality-movie.rm" system-bitrate="40000"/>
<video src="medium-quality-movie.rm" system-bitrate="24000"/>
<video src="low-quality-movie.rm" system-bitrate="10000"/>
</switch>
The first place to learn more about SMIL is the Recommendation itself [SMIL10]. The Synchronized Multimedia home page at the W3C Web site also includes information about SMIL tutorials, SMIL authoring tricks, examples of interesting presentations, player support for SMIL, and links to other sources of information about SMIL.
For more information about making SMIL presentations accessible, authors should consult the Web Content Accessibility Guidelines ([WAI-WEBCONTENT]) and the accompanying techniques document ([WAI-WEBCONTENT-TECHS]), which explains the guidelines in detail and with many examples. Although the techniques document emphasizes HTML and CSS, many of the principles and examples apply to SMIL as well.
Player developers should consult the User Agent Accessibility Guidelines ([WAI-USERAGENT]) and accompanying techniques document ([WAI-USERAGENT-TECHS]), which explains how to design accessible user agents, including synchronized multimedia players.
Developers of SMIL authoring tools should consult the Authoring Tool Accessibility Guidelines ([WAI-AUTOOLS]).
The SMIL 1.0 elements and attributes discussed in this document are listed here, followed by links to their definitions in the SMIL 1.0 specification.
W3C's Web Accessibility Initiative (WAI) addresses accessibility of the Web through five complementary activities: ensuring that Web technologies support accessibility; developing accessibility guidelines; developing tools to evaluate and facilitate accessibility; conducting education and outreach; and coordinating with research and development.
WAI's International Program Office enables partnering of industry, disability organizations, accessibility research organizations, and governments interested in creating an accessible Web. WAI sponsors include the US National Science Foundation and the Department of Education's National Institute on Disability and Rehabilitation Research; the European Commission's DG XIII Telematics Applications Programme for Disabled and Elderly; the Government of Canada, Industry Canada; and IBM, Lotus Development Corporation, and NCR.
Additional information on WAI is available at http://www.w3.org/WAI.
Many people in W3C and WAI have given valuable comments to this document. The authors would like to thank Charles McCathieNevile, Philipp Hoschka, Judy Brewer, and the SYMM Working Group for their contributions.
A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.