Accessible Graphics and Multimedia on the Web
Marja-Riitta Koivunen, Charles McCathieNevile
Web Consortium (W3C)/MIT
Graphics and multimedia available on the Web play important roles in explaining, clarifying, illustrating, drawing attention to, grouping, and structuring information. With some effort and awareness of the issues, it is possible to make this information available to all users, including users with disabilities. The World Wide Web Consortium's (W3C) Web Accessibility Initiative (WAI) supports this goal by providing guidelines for accessible Web content, authoring tools, and user agents. In addition, the people designing markup languages need to be aware of the accessibility issues in the context of the user experience. This paper presents a few easily remembered heuristics for designers of markup languages, based on the accessibility themes and principles of the WAI guidelines and on the W3C Notes on the accessibility features of graphics and multimedia languages.
Accessibility, design heuristics, multimedia, graphics, design process, Web.
Multimedia and graphics help users focus on important information, help them understand complicated concepts, provide concrete examples, visualize large models that are hard to understand otherwise, and provide engaging hands-on experience. Images, sound, animation, live video, different 3D models and interactive elements have become a normal part of the Web. They are often indispensable for users, particularly users with cognitive disabilities, who may not be able to easily understand complicated textual content. However, they can also create barriers for other users, such as blind users.
When multimedia and graphics are made accessible, the number of potential users increases considerably. Not only can users with disabilities access the material, but also users with nontraditional devices or users in noisy or mobile environments. Furthermore, accessibility often increases usability as well.
The Web Accessibility Initiative (WAI) has developed several guidelines for accessibility: the Web Content Accessibility Guidelines 1.0 (WCAG), the User Agent Accessibility Guidelines 1.0 (UAAG), and the Authoring Tool Accessibility Guidelines 1.0 (ATAG). These guidelines went through extensive review while they were developed by working groups that include disabled users, disability organizations, software developers, researchers, and experts in accessibility, usability, Web design, and different Web technologies and application domains.
Each markup language has its own set of techniques for achieving accessibility. When new, specialized markup languages, and the user agents that display them, are developed for multimedia and graphics, it is important that they provide the necessary means to support accessibility. Language designers are not always deeply knowledgeable about accessibility and need help keeping the accessibility requirements in mind. Similarly, HTML Web page designers benefit from the WAI Quick Tips, which provide a short and easily digestible reminder until they know the full guidelines thoroughly.
To help the designers of specialized multimedia and graphics languages, we developed a few simple heuristics that can help them remember and understand the core demands of accessibility. We selected the heuristics based on the themes and principles of accessible design and on the experience gained from writing the Accessibility Notes for Scalable Vector Graphics (SVG) and Synchronized Multimedia Integration Language (SMIL). In addition, we extrapolated our experience to 3D-style interfaces. As the heuristics look at the user experience as a whole, they may also help Web tool and page designers while they are getting familiar with the WAI guidelines.
We first examine the design themes and principles that can be found in the WAI guidelines. Then we present the heuristics, which were generated from the experience of presenting accessibility features specific to SMIL, SVG, and XML languages in general.
Finally, we examine how the accessibility heuristics and guidelines could be employed in a usability process. Our aim is for accessibility to become an integral part of usability methods and of the design process.
The user's experience of a Web page comes both from the design of the page and from the user agent used to view it. Therefore we base these accessibility heuristics on the themes in the Web Content Accessibility Guidelines (WCAG) and the design principles in the User Agent Accessibility Guidelines (UAAG). The themes and principles provide a good abstraction level for developing accessibility heuristics. They are explained briefly in the following paragraphs.
The two general WCAG themes are 1) "Ensure graceful transformation", and 2) "Make content understandable and navigable". The first theme includes the following concepts:
The theme is further developed into WCAG guidelines 1-11. Similarly, the other main WCAG theme, "Make Content Understandable and Navigable", is developed into guidelines 12-14. It includes the following issues:
The UAAG 1.0 design principles are written as guidelines:
In summary, these principles ensure access to all content and its structure, provide user control of the presentation, provide navigation mechanisms and orientation to the users, and ensure that content and user agent information is available to assistive technologies.
These themes and principles enhance accessibility, but they are not written in the form of a short list of accessibility heuristics that is easy to remember and keep in mind when designing new graphics or multimedia languages.
We started by selecting general principles that would help explain the accessibility features of specific Web languages in an easy-to-understand usage context. The first language-specific Accessibility Note, for SMIL, prompted some changes to the next version of the language. The Accessibility Note for SVG was written while the working group was still refining the language, so in some places its advice was incorporated into the SVG specification itself, while in other places it remains guidance on how to produce accessible SVG.
Based on the Accessibility Notes and the themes and principles described in the previous section, we came up with six accessibility heuristics and their explanations. Our goal is for these heuristics to be easy to remember and understand, so that they can help designers of new markup languages keep accessibility in mind. The heuristics are presented in the following.
Heuristic 1: Provide alternative equivalents to make information suitable for auditory, visual, and tactile channels. In that way it is possible to follow the presentation even if the user has limitations in some senses or some cognitive limitations, or the device cannot handle some media very well.
As text is easy to create, and easy to transfer to almost any sense with the help of assistive technologies, it can usually be used as an alternative to other media. For instance, assistive technology for a blind user can easily render an image's alternative text as speech or Braille. With timed material, such as animations and video, the text often needs synchronization, and therefore it might be easier to also provide it directly as audio. Section 3.1 explains more about alternative equivalents and their use.
Heuristic 2: Provide means to select equivalent content. Users should be given flexible means to access the equivalent content in whatever combination is most suitable for them, given their disabilities or the limitations of the devices used.
Normally these means are provided by user agents, but some languages also provide switches for selecting content. If the author provides default selections, he or she should make sure that nothing in the design prevents flexible user control. Sometimes the defaults can also be negotiated automatically.
Heuristic 3: Provide user control of the presentation by separating it from the rest of the content. This benefits users with disabilities as well as devices with limited capabilities.
For instance, a blind user may want to define that emphasized text is read in a louder voice, or a user with low vision can change the fonts to a larger size and use colors that have more contrast. This principle can be implemented by using style sheet technology. It is discussed more in Section 3.3.
Heuristic 4: Provide device independent interaction so that users with different input and output devices can easily get to all the available functionality.
This is often achieved by using a user agent that can provide access to the functions by emulating a mouse. However, it is good to provide shortcuts that take users to functions without any need for a mouse or spatial positioning. This is hardest when walking through a 3D world, but doing it might also help other users who are not so familiar with 3D navigation with a mouse.
Heuristic 5: Provide semantics for structure. This helps provide alternative ways for user navigation and orientation. This can also help the use of alternative presentations. Use authoring tools that support this.
Semantics can be provided by using the elements of the language in the correct way and by describing the site and page navigation and the structured components with other available means. Languages usually include general grouping elements that can be used to add semantics. The other means include the use of class hierarchies and semantic languages, such as the Resource Description Framework (RDF). These are discussed in more detail in Section 3.5.
Heuristic 6: Provide reusable components. This helps users who use a medium that makes it more laborious to compare the components.
Multimedia often contains application-defined components that are repeated several times. Graphics components especially might be repeated, but video sequences, or houses and other objects in a 3D model, might also contain the same elements. A user examining a structured image visually can do so much faster than a blind user navigating through the structure and the equivalent alternative explanations, or even the graphical components. Reuse of components saves time, as a component needs to be examined only once.
All multimedia content needs to be inherently suitable for visual, auditory, and tactile channels, or have alternative equivalents that make it suitable. Markup languages need to offer means for including the equivalents in the language and for associating them with the primary content.
Text equivalents are fundamental to accessibility, since they may be rendered graphically, as speech, or on a Braille device. They are also easy to handle, as they are discrete in nature: they don't contain time references or have an intrinsic duration. They are typically specified by attributes, such as the alt or longdesc attributes of the img element in XHTML and SMIL, by the title and desc elements in SVG, or as part of an XHTML object element. The use of elements is preferred, as elements allow class IDs, styles, and other information to be attached to the equivalents.
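As a sketch of the two XHTML mechanisms just mentioned (file names and description text are invented for illustration):

```xml
<!-- alt gives a short equivalent; longdesc points to a longer description -->
<img src="network.png" alt="A computer network based on a hub"
     longdesc="network-desc.html" />

<!-- with object, the equivalent is element content, so class and
     style information can be attached to it -->
<object data="network.png" type="image/png">
  <p class="equivalent">A hub connects two desktop PCs
  with 10BaseT twisted pair cables.</p>
</object>
```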
Figure 3.1 shows an SVG image of a network with a) a graphical rendering and b) a rendering of the text equivalents.
Figure 3.1a: An image of a network with embedded structure.
Network: An example of a computer network based on a hub
Hub: A typical 10baseT/100BaseTX network hub
Computer A: A common desktop PC
Computer B: A common desktop PC
Cable A: 10BaseT twisted pair cable
Cable B: 10BaseT twisted pair cable
Cable N: 10BaseT twisted pair cable
Figure 3.1b: Alternative equivalents describing the components of the network in Figure 3.1a. With additional semantics, connections between components can also be described.
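The structure behind Figure 3.1 could be marked up along the following lines; the coordinates are invented, and only the nesting of g, title, and desc elements matters here:

```xml
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">
  <title>Network</title>
  <desc>An example of a computer network based on a hub</desc>
  <g id="hub">
    <title>Hub</title>
    <desc>A typical 10baseT/100BaseTX network hub</desc>
    <rect x="160" y="220" width="80" height="40"/>
  </g>
  <g id="computerA">
    <title>Computer A</title>
    <desc>A common desktop PC</desc>
    <rect x="40" y="40" width="80" height="60"/>
  </g>
  <g id="cableA">
    <title>Cable A</title>
    <desc>10BaseT twisted pair cable</desc>
    <line x1="80" y1="100" x2="180" y2="220"/>
  </g>
  <!-- Computer B and the remaining cables are marked up the same way -->
</svg>
```

A renderer that shows only the title and desc elements produces the text rendering of Figure 3.1b from the same document.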
Multimedia presentations often include continuous equivalents, such as text captions, which describe the spoken dialog and sound effects as text, or auditory descriptions, which are sound tracks describing what is happening visually in the video. Continuous equivalents have an intrinsic duration and may contain references to time. For instance, a continuous text equivalent consists of pieces of text associated with time codes. They need to be synchronized with the other time-dependent media. Continuous equivalents may be constructed out of discrete equivalents, for instance by using the SMIL timing features.
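A continuous text equivalent built from discrete pieces might look like this in SMIL; the media sources and timings are invented:

```xml
<par>
  <!-- the primary time-dependent medium -->
  <video src="lecture.mpg"/>
  <!-- captions: discrete text pieces synchronized with the video -->
  <seq>
    <text src="caption1.html" dur="5s"/>
    <text src="caption2.html" dur="4s"/>
  </seq>
  <!-- an audio description track for the visual content -->
  <audio src="description.ra"/>
</par>
```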
Figure 3.2. Image of a WGBH tutorial video with captions.
Figure 3.2 presents an example of an educational video explaining Einstein's theory of relativity using video and sound. The video was created by the WGBH National Center for Accessible Media. Text captions located under the video window present the content of the original soundtrack as text. In addition, the video also has an audio description. The user can control the playing of the captions and the audio description from the player's user interface.
Other forms of continuous equivalents, not explicitly required by WCAG 1.0, may also promote the accessibility of multimedia and graphics. Representations of sign language benefit people with auditory limitations, and simplified sound tracks may help people with some cognitive difficulties.
3D worlds need alternative equivalents too, but there are not many examples yet. In 3D, the spatial location of the alternative equivalents becomes important. A blind user turning his head around in a 3D world, such as near the Helsinki Senate Square in Figure 3.3, cannot see the surroundings but may want to hear what is around him: the Helsinki Senate Square, the Helsinki Lutheran Cathedral, Unioninkatu street, and the Main Building of Helsinki University. We could provide discrete equivalents for the houses and geographical features, such as hills or rivers. These could also be tied to a certain place or range of places, such as paths through the world.
In 3D we may also need to let the user define the level of detail and the information categories, e.g. what kinds of buildings and stores exist, and how far away equivalents may be located and still be read when the user turns. For instance, sometimes the user may want to know generally what exists in the city in a certain direction, and not just about the buildings in the current visual view.
Figure 3.3: Image of Senaatintori taken from a 3D model of Helsinki.
Every user is different and therefore may need to see the content through different media at different times. There might be users who can only use media suitable for audio, or users with devices capable of showing only small amounts of text at a time. Users should be able to select the equivalent content suitable for their needs. This can be achieved by including appropriate features in markup languages as well as in user agents.
Figure 3.4: Selecting text captions or audio description for a video with Real Player.
Technologies such as SMIL offer the author a lot of control over which media to show, at what time, and where in a visual layout. In addition, the design should be able to accommodate user preferences, as otherwise it may be impossible for many users to get to the content. For instance, if the user wants to see captions, the design should allow that, whether or not the author has designed an alternative layout for embedding the captions in the user interface. Captions benefit not only users with hearing disabilities, but also users who have problems understanding the spoken language or who are in a noisy environment.
Both SMIL and SVG let authors design alternative user interfaces with switch technology. A switch can check the values of attributes set by the user or provided by the user agent. Furthermore, the normal features of the SMIL language can be used for adding equivalent content; unfortunately, in that case the user does not have the choice of selecting between different content. If the different media tracks could be identified and exposed to user agents, the user could select combinations of tracks not predicted by the author.
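As a sketch, a SMIL switch using the system test attributes might offer captioned and uncaptioned alternatives; the file names are invented:

```xml
<switch>
  <!-- chosen when the user has asked the player for captions -->
  <par system-captions="on">
    <video src="movie.mpg"/>
    <text src="captions.rt"/>
  </par>
  <!-- default alternative, without captions -->
  <video src="movie.mpg"/>
</switch>
```

The player evaluates the children in order and plays the first one whose test attributes match the user's settings, which is why the unconditional default comes last.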
With SVG, authors can provide information on graphical components at different levels with the title and desc elements. When the information is available, the user agent or player can let the user select the level at which it is provided. The selection could be done with a style sheet or through an interface provided by the user agent.
In 3D interfaces, the selection of alternative equivalents becomes even more complicated. It is difficult for the author to provide alternatives at all possible abstraction levels; however, some default alternatives can be provided. For instance, an author could provide an alternative equivalent for ready-made tours through a model. When alternative information is constructed and labeled automatically from information attached to components in the 3D world, the user should be able to select the preferred level of detail and the areas from which the information should be provided.
Users need to be able to control how the information is presented so that they can adapt the rendering to meet their specific needs. A user who cannot see small blinking text, or read red text on a green background, should be given the option of changing the colors or font size and stopping the blinking, so that he can get to information that would otherwise be out of his reach.
Part of the user control can be provided by separating the presentation from the structure and content of the language with style sheet technologies, such as CSS [13, 14] or XSL. The rest should be provided by the user agent. Some of the control can be quite automatic from the user's point of view, as the user agent interprets and applies the current user settings, while other settings are selected only when problems appear.
Figure 3.5: Different style sheets provided by a site author
Style sheets can be used to increase accessibility in most W3C languages, especially those based on XML. CSS provides cascading of style sheets, so that user style sheets can be built on top of author and system style sheets. Authors who separate presentation from structure and content also gain flexibility in authoring and can easily provide alternative presentations for users.
It is good practice to provide a set of style sheets for different disabilities and devices, as doing so helps ensure that the semantics needed for different user styles exist. Figure 3.5 shows an example of alternatives from the Trace Center site.
Style definitions can be bound to the element names of a markup language, but often the author also needs to define application-specific classes that group similar elements before styles can be bound to those classes. This is especially true with graphics languages, such as SVG. When style sheets are separated into their own documents, or at least into a separate style definition element, it is easier to remember to group similar elements into classes.
Style sheets can also be used to hide or show elements according to the context. For instance, the text equivalents for the network image in Figure 3.1b could be selected by using an XML renderer with a special style sheet that shows only the title and desc elements. User agents can also select the most suitable style sheet according to the characteristics of the device by using the @media rules in the style sheet.
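For instance, author styles bound to application-specific classes could be placed in an SVG style element, where a user style sheet can later override them; the class names here are invented:

```xml
<svg xmlns="http://www.w3.org/2000/svg">
  <style type="text/css"><![CDATA[
    /* author styles bound to application-specific classes */
    .device { fill: silver; stroke: black; }
    .cable  { stroke: black; stroke-width: 2; }
    /* a user style sheet could override these with high-contrast
       colors, or hide purely decorative elements: */
    .decoration { display: none; }
  ]]></style>
  <rect class="device" x="40" y="40" width="80" height="60"/>
  <line class="cable" x1="80" y1="100" x2="180" y2="220"/>
</svg>
```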
Languages for new media or new types of representation may require new style properties. For instance, SVG has filter, gradient, and mask capabilities that are not part of CSS, and 3D graphics will probably also need special style properties. However, it is not always possible to update style sheet languages such as CSS to include all these new properties. In SVG they are part of the SVG language itself, but they can be defined as symbols in separate files and referred to as attribute values in style definitions, so they can essentially function as style sheet definitions.
When elements can be recognized, the user agent can also provide control to the user. For instance, movement can be disturbing for many users and, at certain frequencies, can cause seizures in some users. A user agent could let the user stop all movement to reduce the disturbance, or slow down a fast presentation for users who have difficulties with the pace. All this depends on how many options the technology provides to the user agent, and how well the user agent utilizes them.
Authors should be able to write pages so that it is possible to interact with them using different input and output devices. When functionality is provided in a device-independent way, it suits both spatial devices (mouse, joystick) and command-based devices (text or voice input).
It should also be possible to use assistive technologies that emulate input devices, such as a mouse, joystick, keyboard, switch, or voice input. Languages that use general event names suitable for all devices make these goals easier to reach. For instance, the Document Object Model defines such an interface, in addition to an interface to document elements and attributes.
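As a small XHTML sketch of device-independent interaction (the accesskey values and targets are invented), a keyboard shortcut can complement pointing, and a script can listen for the generic DOMActivate event rather than a mouse-only click event:

```xml
<!-- reachable by pointing or by a keyboard shortcut -->
<a href="search.html" accesskey="s">Search</a>

<!-- activation works from keyboard, switch, or voice input as well,
     if the handler is bound to DOMActivate instead of a mouse event -->
<input type="submit" value="Submit order" accesskey="u"/>
```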
As with input, the alternative equivalents discussed in Sections 3.1 and 3.2, together with user control of the presentation, make it possible to create output suitable for various output devices.
In 3D, or in other highly spatially oriented interfaces, it is challenging but possible to come up with commands that let users perform the same functions as when navigating with a mouse and visual output. For instance, old versions of games such as Dungeon or Adventureland, where users moved through caves in a 3D world, used keyboard commands for moving and text for describing the world (see Figure 3.6).
I'm in a dismal swamp. Obvious exits: North, South, East, West, Up. I can also see: cypress tree - evil smelling mud - swamp gas - floating patch of oily slime - chiggers
Figure 3.6: A sample description from Adventureland.
Similarly, it should be possible to take a 3D model, such as the model of Helsinki, and add commands for moving along a street to the next corner or to a named place, turning the head right and left while moving, and lingering a bit longer to explore nearby statues or other places marked as interesting. Alternatively, users could move freely in the 3D model by using spatial commands for left, right, forward, backward, up, and down, or use a mouse emulator to do so.
Semantics are needed to control alternative equivalents and presentations, and to support alternative ways for navigation and orientation. For instance, the user agent needs to know the types of alternative equivalents so that it can provide control to select the right ones, and the style technology needs to allow appropriate presentations to be given for different types of content.
Specialized languages for multimedia and graphics can provide means to add semantics to the elements. Additional semantics can be attached with general metadata technology, such as RDF.
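As a sketch, RDF metadata could be attached to an SVG document through its metadata element; the property vocabulary here (Dublin Core) and the values are illustrative:

```xml
<svg xmlns="http://www.w3.org/2000/svg">
  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="">
        <dc:title>Network diagram</dc:title>
        <dc:description>A computer network based on a hub</dc:description>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <!-- graphical content follows -->
</svg>
```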
Providing the semantics involves many issues, such as choosing the markup language that best suits the task, and using it the way it is intended to be used rather than for its visual side effects, such as the bigger letter size of XHTML header elements. It also involves good class definitions, so that elements are grouped into classes according to their characteristics. For instance, the author should be able to mark the navigation bar and other large structures within the page so that non-visual users don't need to read through unnecessary information sequentially but can easily skip back and forth over these structures.
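In XHTML, the navigation bar could be marked as a group with a skip link, along these lines; the class and id names are invented:

```xml
<div class="navbar">
  <a href="#content">Skip navigation</a>
  <a href="home.html">Home</a>
  <a href="products.html">Products</a>
  <a href="contact.html">Contact</a>
</div>
<div id="content">
  <h1>Page heading</h1>
  <!-- main content -->
</div>
```

A non-visual user can follow the skip link past the repeated links, and a user style sheet can restyle or hide the whole group via its class.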
The markup language may contain special elements for structural semantics, such as the g element in SVG for hierarchical grouping. A time-dependent presentation, such as a SMIL presentation, can also include meaningful time-dependent structures that should be marked so that users can skip uninteresting parts easily. In 3D worlds there are similar semantic structures: for instance, a city consists of streets and different types of houses, statues, lakes or ponds, railway stations, and so on. Furthermore, some coordinates in the world can form tours through a city or other more abstract semantic structures.
It may also be possible to create some semantics by interpreting spatial relations. For instance, it is possible to look at an SVG image and see that there are two concentric circles in the upper right corner of a rectangle. However, without other metadata a blind user only knows that there are two circles and a rectangle, and their sizes and positions. With some processing it is possible to derive the former description from the SVG element information.
Reusable components are important for users and authors with disabilities. This is because alternative access methods or formats are often sequential and therefore may be more time-consuming than the original spatial presentation. Good navigational means can help somewhat. For instance, a well-designed visual page can be skimmed quickly, and only when the user sees something interesting does she start to read the content sequentially. A visual user can do a lot of skipping and barely notice it. A non-visual user may be a fast reader, but often needs more time to read information sequentially and to skip parts of the content. A user with low vision or a small screen needs to zoom into different parts of an image, while a sighted user can take in the same image at a glance.
When components are reused, a user can explore a component once, remember it, and save a lot of time. Similarly, when creating information takes much more time by alternative means, it is helpful to be able to reuse a component once it has been created.
SVG provides means for defining reusable graphics components and referring to them. In time-dependent presentations it should likewise be possible to reuse certain parts of the presentation; in SMIL it is possible to reuse individual media components, but not parts of a SMIL presentation. In 3D, the components in the world, as well as tours and other abstract structures, should be reusable.
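In SVG, a component defined once can be instantiated several times with the use element, so a non-visual user who has examined the definition need not re-examine each instance; the coordinates are invented:

```xml
<svg xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <defs>
    <!-- the reusable component, defined once with its equivalents -->
    <g id="computer">
      <title>Computer</title>
      <desc>A common desktop PC</desc>
      <rect width="80" height="60"/>
    </g>
  </defs>
  <!-- two instances of the same component -->
  <use xlink:href="#computer" x="40" y="40"/>
  <use xlink:href="#computer" x="280" y="40"/>
</svg>
```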
As with usability, it is extremely important to consider accessibility in the early phases of the design process of Web-related technologies and content. Many things, such as alternative equivalents, are easier and less expensive to plan and include in these early phases.
The accessibility heuristics can help the designers of graphics and multimedia languages to provide means for making the language and the corresponding user agents accessible in several phases of the design. In addition, these heuristics could also keep the accessibility-related aspects in the designer's mind during the design process of Web content.
In the case of Web content, accessibility checks can often be included in the usability processes that are already in place. For instance, user scenarios should include scenarios of users with disabilities and of users with nontraditional devices or assistive technologies. In addition, usability tests should include users with disabilities, and conformance with the accessibility guidelines should be a natural part of the usability requirements.
The accessibility heuristics can also be included in various phases of the usability process, when more generic usability heuristics or usability heuristics targeted at a special disability group are being checked. The heuristics should help designers select the applicable WAI guidelines and use their checklists when needed.
We developed accessibility heuristics to help the designers of multimedia and graphics related languages keep accessibility goals and requirements in mind. The heuristics are based on the themes and principles in WCAG 1.0 and UAAG 1.0, as well as on the feedback we received when explaining the accessibility features of the SVG and SMIL languages in the Accessibility Notes. In addition, we have explored some potential features needed in 3D languages.
The heuristics gather the experience and expertise of the people in the WAI working groups. As they put this experience in a new form, we would like to test them in practice. However, formally evaluating the accessibility of the resulting languages and user agents is not practical because of the complexity and long time spans involved, so instead we plan to gather more feedback from the designers.
These heuristics may be also used for capturing more abstract design principles for WCAG 2.0 guidelines and the XML Accessibility Guidelines.
We thank Judy Brewer, Daniel Dardailler, Wendy Chisholm, and Susan Lesch who provided helpful comments on previous versions of this document.
1. Adler, S. et al. Extensible Stylesheet Language (XSL), Version 1.0, W3C Working Draft 27 March 2000. Available at http://www.w3.org/TR/2000/WD-xsl-20000327/
2. Brewer, J. (ed.). How People with Disabilities Use the Web. W3C Working Draft, 4 January 2001. Available at http://www.w3.org/WAI/EO/Drafts/PWD-Use-Web/ .
3. Dardailler, D., Palmer, S. (eds.). XML Accessibility Guidelines, W3C Working Draft, 22 April 2001. Available at http://www.w3.org/WAI/PF/xmlgl.
4. Ferraiolo, J. (ed.). Scalable Vector Graphics 1.0 Specification (SVG), W3C Candidate Recommendation 2 August 2000. Available at http://www.w3.org/TR/2000/CR-SVG-20000802/.
5. Gunderson, J., and Jacobs, I. (eds.). User Agent Accessibility Guidelines, W3C Working Draft 28 July 2000. Available at http://www.w3.org/WAI/UA/WD-UAAG10-20000728.
6. Hoschka, P. (ed.). Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, Recommendation 15 June 1998. Available at http://www.w3.org/TR/1998/REC-smil-19980615.
7. Jacobs, I., and Brewer, J. (eds.). Accessibility Features of CSS, W3C Note 4 August 1999.
8. Katz, I. Web Access by Persons with Visual Disabilities. In CHI'97 Workshop on Usability Testing of World Wide Web Sites (March 23-24, 1997, Atlanta). Available at http://www.acm.org/sigchi/web/chi97testing/katz.htm.
9. Koivunen, M., and Jacobs, I. Accessibility features of SMIL, W3C Note 21 September 1999. Available at http://www.w3.org/TR/1999/NOTE-SMIL-access-19990921/.
10. Koivunen, M., and McCathieNevile, C. Accessibility features of SVG, W3C Note 7 August 2000. Available at http://www.w3.org/TR/2000/NOTE-SVG-access-20000807/.
11. Lassila, O., and Swick, R. (eds.). Resource Description Framework (RDF) Model and Syntax, W3C Recommendation 22 February 1999. Available at http://www.w3.org/TR/1999/REC-rdf-syntax-19990222.
12. Le Hors, A. et al. (eds.). Document Object Model (DOM) Level 2 Core Specification Version 1.0, W3C Recommendation, 13 November 2000. Available at http://www.w3.org/TR/DOM-Level-2-Core/.
13. Lie, H., and Bos, B. (eds.). Cascading Style Sheets, Level 1, W3C Recommendation 17 December 1996, revised 11 January 1999.
14. Lie, H., Bos, B., Lilley, C., and Jacobs, I. (eds.). Cascading Style Sheets, Level 2, W3C Recommendation 12 May 1998.
15. Linturi, R., Koivunen, M., and Sulkanen, J. Helsinki Arena 2000 - Augmenting a Real City to a Virtual One. In Ishida, T., and Isbister, K. (eds.). Digital Cities: Technologies, Experiences, and Future Perspectives, Lecture Notes in Computer Science 1765, Springer-Verlag, 2000. Pages 83-96. Available at http://link.springer.de/link/service/series/0558/bibs/1765/17650083.htm.
17. Nielsen, J. Enhancing the explanatory power of usability heuristics. In Proc. of ACM CHI'94 (Boston, MA, April 24-28), pages 152-158.
18. Schmitz, P., and Cohen, A. (eds.). SMIL animation, W3C Working Draft 31 July 2000. Available at http://www.w3.org/TR/2000/WD-smil-animation-20000731.
19. Treviranus, J., McCathieNevile, C., Jacobs, I., and Richards, J. (eds.). Authoring Tool Accessibility Guidelines 1.0, W3C Recommendation 3 February 2000.
20. Chisholm, W., Vanderheiden, G., and Jacobs, I. (eds.). Web Content Accessibility Guidelines 1.0, W3C Recommendation 5 May 1999. Available at http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505.