This Wiki page is edited by participants of the HTML Accessibility Task Force. It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Task Force participants, WAI, or W3C. It may also have some very useful information.

Spec Review/Focus


Under review:

UAWG Keyboard Use Cases and Recommendations

Migrated from UAWG wiki 26 July 2011

  • Complete and efficient keyboard access is critical for accessibility.
  • We examine high-level things that web protocols and formats can do to enable good keyboard UI:
    • Let users accomplish any task using the keyboard alone
    • Let content coexist and adapt to a wide range of user agents, browser add-ins, nested user agents, other content, and assistive technologies
    • Let the user take advantage of the widest range of keyboard commands and shortcuts
    • Provide the information needed to enable a wide range of keyboard features
    • Let the user retain ultimate control of their experience
    • Protect the user from badly behaved content and nested user agents
  • We present a number of specific topics with use cases, issues, and recommendations, as well as topics that have no clear recommendations.
  • Some of these are already covered by the latest HTML5 draft, while others are not.

Background and concepts used here are discussed below.


Content in this section migrated from UAWG wiki Keyboard Concepts for HTML5 Discussion, 26 July 2011

Basics of keyboard access

Good keyboard access means:

  • Make keyboard access universal:
    • for every task (letting the user do everything using only a keyboard or keyboard emulator), and
    • for every user (not limiting keyboard access to users with good vision, with a particular keyboard layout, etc.)
  • Make keyboard access usable:
    • easy,
    • efficient (keep the number of input steps as low as is feasible, and not disproportionately higher than for people using both mouse and keyboard),
    • reliable/predictable (making sure that keyboard commands act in ways the user expects, including consistency with standards, within related content, and between user agents)
    • easily learned and remembered

Techniques for keyboard access include:

  • sequential navigation: the ability to explore by moving the keyboard focus forward and backward through all the items that can be visited (e.g. tab and shift+tab to move between controls, left and right arrow to move between characters, or up and down arrows to move between lines of text)
  • direct commands:
    • direct navigation: being able to move the focus directly to the target you want, rather than having to go through everything in between
    • direct activation: being able to trigger an element's action without having to move the focus to a corresponding element (e.g. pressing Ctrl+S at any time to activate the Save menu item)
  • structural navigation: moving the focus using the structure of the content (e.g. forward and backward by sentence, paragraph, group, page, section, frame), as well as being able to see the structure (e.g. choosing a destination by moving through a hierarchical view of the document headings)
  • spatial commands:
    • spatial navigation: moving the focus based on the multidimensional location of things on the screen (e.g. using arrow keys to move between spreadsheet cells)
    • spatial manipulation: commands that move an object in particular directions (e.g. scrolling a window, moving a pushpin on a map, moving the pointer on the screen)
  • textual navigation: moving the focus to destinations based on text (e.g. finding a search string on a page, or moving to a control by typing the first characters of its label)

Mission, Goals, and Principles

Mission: To ensure that software and web technologies can provide the user with full and efficient keyboard access even when working with multiple documents, web apps, add-ins, nested user agents, and accessibility aids.

Goals and Principles:

  • Let users do everything from the keyboard:
    • Access to all elements. Let the user discover, navigate to, and manipulate everything.
    • Access to all input operations. Provide the ability to simulate all non-keyboard input operations so they can be done using the keyboard and other input systems.
  • Let things give users maximum flexibility over the keyboard UI:
    • Put the user in control. User agents should have flexibility to alter or supplement author-specified keyboard shortcuts, tab stops, behaviors, etc. That is, as with many things, user agents should be able to give the user (rather than the author) ultimate control. (The downside to this approach is that a badly designed user agent can mess up both the author and the user, but at least the user can choose user agents, while in many cases they cannot choose their content.)
    • Let content provide hints that allow user agents to add sophisticated keyboard capabilities. For example, if the user agent wants to provide a command to navigate to the navigation bar, it needs to know which element(s) in the content are the navigation bar, and if it wants to provide commands to navigate forward and backward through the document, it needs to understand the content's recommended reading order.
    • Support the full range of keyboard inputs, including unmodified keys, key combinations using the full range of modifier keys, and key sequences.
    • Let things provide information to support enhanced keyboard UI. Specific information needs to be defined in content (etc.) and passed on by the host and platform. For example, accessibility aids and the user agent need to be able to determine an author's intended reading and navigation order, accessibility aids need to be able to determine properties of content elements such as name and keybinding (e.g. to present them to the user) and location (e.g. to click on it if it does not fully support programmatic control).
    • Let things provide methods to support enhanced keyboard UI. Programmatic interfaces need to be defined, and passed on by the host, so that other components can programmatically perform any actions on an element that can be done using a mouse, keyboard, or touchscreen, and so that they can alter the presentation of elements, such as to indicate focus more prominently or to make keybindings discoverable.
  • Let users work with multiple things at the same time:
    • Allow content, components and add-ins to coexist and cooperate, even when they were developed independently. For example, allowing them to negotiate allocation of the limited set of possible keyboard shortcuts, and allowing the user to transition between the UI of the user agent, content, nested user agents, etc.
    • User agents should be able to prevent things from breaking keyboard access. For example, an embedded object should not be able to trap the keyboard focus or break the methods of exiting that the user is used to, and content should not be able to trap the focus on an input field until the user enters properly formatted data.
  • Let things adapt to keyboard restrictions, conventions, and conflicts:
    • Let things negotiate keybindings, determining and adapting to the set of possible keyboard inputs and which are impossible (e.g. not on the user's keyboard), reserved by the platform (inputs reserved by the host or operating system so they cannot be changed by the component, e.g. Alt+Tab), restricted by convention (e.g. standard keybindings for copy, paste, exit, etc. that should not be changed lightly), or in use. For example, allowing them to negotiate allocation of the limited set of possible keyboard shortcuts.
    • Let content reflect actual keyboard shortcuts, such as incorporating them into their instructions, even if they are changed to adapt to the environment (platform, user agent, other content, add-ins, etc.).
    • Allow components to conform to the host's keyboard conventions, such as allowing a custom control (e.g. a drop-down list box implemented entirely in JavaScript) to set its keyboard commands to match those its host browser provides for its equivalent controls, or allowing a form in HTML to emulate the navigation and access key behavior of native dialog boxes.
  • Let users discover keyboard UI:
    • Let things determine the active keybindings for all components so they can be presented to the user as a training aid or reference. This also allows the user to determine what command a keyboard input is currently mapped to, so they can identify what they accidentally did, or whether it is safe to repurpose a specific keyboard input because its current function is available through other means or is not one the user would ever need.


User agents are platforms. Most platforms are designed to support multitasking, but most only support it well when the user is interacting directly with a single application at a time. That's sufficient for most users, most of the time, but is entirely insufficient for a lot of people who rely on assistive technology. Those products often need to modify input or output, or provide global commands that the user can activate regardless of what application or context they're working in, and on many platforms creating such products requires undocumented, unsupported, and unreliable hacks simply because the platforms don't provide the necessary infrastructure. As we are defining the future platform architecture we need to do better, not just for assistive technology, but because Web browsers have made add-ins and plug-ins more prominent and popular even among the mainstream.

Unfortunately, most users and developers—including platform and standards developers—only think about a very limited set of input and output options: graphical output with keyboard for text entry and a mouse or touchscreen for navigation, and users interacting directly with only one application at a time. Those limited views do not accommodate the variety of users and their needs.

In reality the user works in an environment made up of many things, including:

  • hardware (e.g. available primary and modifier keys),
  • platforms (e.g. operating systems, GUIs, and window managers that might use or reserve keys),
  • user agents (e.g. web browsers),
  • content, including documents (e.g. HTML pages) and web apps (e.g. dynamic HTML) rendered in a browser,
  • browser extensions and plug-ins (e.g. components that modify browser UI or content, or act within the browser on behalf of external utilities),
  • nested user agents (e.g. embedded media player), and
  • external utilities (e.g. accessibility aids that don't run within the browser, yet examine, modify, or provide alternate input or output for browsers and the content they render—and usually other applications as well).

These things often nest and coexist, as in:

  • Content hosting content, e.g. HTML rendered by the browser uses iframe to host HTML also rendered by the browser
  • Content hosting user agents, e.g. HTML rendered by the browser hosts a media player
  • User agent hosting multiple content frames, e.g. browser showing window split between two web pages, which could be static documents or web apps that may or may not interact with the content of other frames
  • User agent hosting add-ins, e.g. add-in creates sidebar providing separate view of or interacting with content being viewed in the browser, and creates its own keyboard commands that may function globally within the browser window or entire browser session
  • User agent interacting with external accessibility aids, e.g. a screen reader providing keyboard shortcuts to read portions of the text

Keyboard Commands

There are several types of keyboard commands:

  • Basic keys are simple keys that have dedicated functions (e.g. the A key enters the letter A; the Del key deletes something, be it a character, a file, contents of a cell; the Left arrow key moves something to the left, etc.)
  • Shortcut keys:
    • Access keys on some platforms are part of the visible label that can be used as a quick way to activate and/or navigate to the associated UI element (e.g. in a dialog box, an underlined S on the Save button indicates that you can press S to activate the button if the focus is not in a text input field, or Alt+S to activate it even if the focus is in a text input field)
    • Hotkeys trigger an action without necessarily triggering any particular UI element (e.g. Ctrl+S saves the current document, and works even if there is no Save menu item or it is not visible, and F1 brings up help).

Keyboard commands and shortcuts are particularly important for people with disabilities:

  • Many users with disabilities rely entirely on the keyboard or keyboard emulators because they have difficulty manipulating a mouse, seeing a screen, etc.
  • Users who cannot use a mouse often increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful.
  • Many users rely on accessibility aids that use their own set of keyboard commands, which can conflict with those defined by user agents, add-ins, or content.

While users with disabilities may need more keyboard shortcuts, the possible shortcuts are limited by:

  1. limitations in the hardware and system (e.g. no Command key on Windows or Windows key on a Macintosh, no Theta on non-Greek keyboards, no numeric plus key on compact keyboards),
  2. need to avoid conflicts with keys reserved by the platform and the user agent's user interface (e.g. Alt+Space displays the window menu on Windows, or Spotlight on OS X),
  3. need to avoid conflicts with keyboard conventions (e.g. if the user expects Ctrl+C to be Copy everywhere on the platform, content and applications can but shouldn't override that to make it trigger a different action),
  4. user agent limits on shortcut characters (e.g. whether the browser limits them to letters, allows compound characters, etc.), and
  5. user agent and format limits on using unmodified keys, different modifier keys, and key sequences (e.g. HTML4 and HTML5 accesskey only allows specifying a single base character)

Ways around these limitations include:

  1. allowing alternate modifiers (e.g. Ctrl+E does one thing, Ctrl+Shift+E another),
  2. allowing key sequences in addition to key combinations (e.g. Ctrl+E,a does one thing, while Ctrl+E,b does another), and
  3. allowing unmodified keys (e.g. F12, or the letter A), although this presents risk for users relying on alternate or automated input, or who have trouble perceiving mode-change indicators while working
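
As a rough illustration of the first two workarounds, the sketch below parses bindings written in the notation used above ("Ctrl+Shift+E" for a combination, "Ctrl+E,a" for a sequence) and matches incoming chords against them. The class name, the binding syntax as a parseable format, and the command names are all assumptions made for this example; no user agent is known to expose this interface.

```python
from typing import Optional

Chord = tuple[frozenset[str], str]  # (modifier names, base key)


def parse_chord(text: str) -> Chord:
    """Parse one chord such as "Ctrl+Shift+E" into (modifiers, base key)."""
    *modifiers, key = text.split("+")
    # Single letters are compared case-insensitively; Shift is a modifier.
    if len(key) == 1:
        key = key.upper()
    return frozenset(modifiers), key


def parse_binding(text: str) -> tuple[Chord, ...]:
    """Parse a binding such as "Ctrl+E,a" into a sequence of chords."""
    return tuple(parse_chord(part) for part in text.split(","))


class KeySequenceMatcher:
    """Hypothetical matcher for key combinations and key sequences."""

    def __init__(self, bindings: dict[str, str]) -> None:
        self._table = {parse_binding(k): cmd for k, cmd in bindings.items()}
        self._pending: tuple[Chord, ...] = ()

    def feed(self, chord_text: str) -> Optional[str]:
        """Consume one chord; return a command name once a binding completes."""
        candidate = self._pending + (parse_chord(chord_text),)
        if candidate in self._table:
            self._pending = ()
            return self._table[candidate]
        # Keep waiting only if some binding starts with what was typed so far.
        if any(seq[: len(candidate)] == candidate for seq in self._table):
            self._pending = candidate
        else:
            self._pending = ()  # unknown input cancels the pending sequence
        return None
```

With the bindings "Ctrl+E,a" and "Ctrl+E,b" registered, feeding "Ctrl+E" returns nothing and feeding "a" afterwards returns the first command. A real implementation would also need a timeout and a visible indication that a sequence is pending, since the mode change is otherwise easy to miss.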

We want content and nested user agents to register their keybindings with their host in order to:

  1. allow negotiation to avoid conflicts (e.g. web app changes bindings that conflict with the browser)
  2. allow the user agent and tools to provide enhanced UI (e.g. keybinding lists and reconfiguration)
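
One way to picture such registration is a single table owned by the host, sketched below. The `KeybindingRegistry` class and its methods are hypothetical and exist only for this illustration; the point is that a refused registration tells the component to pick something else, and that the same table can drive a keybinding list for the user.

```python
from typing import Optional


class KeybindingRegistry:
    """Hypothetical host-side table of who owns which keyboard input."""

    def __init__(self, reserved: dict[str, str]) -> None:
        # Bindings the host keeps for itself, e.g. {"Ctrl+F": "Find"}.
        self._bindings: dict[str, tuple[str, str]] = {
            key: ("host", action) for key, action in reserved.items()
        }

    def register(self, owner: str, key: str, action: str) -> bool:
        """Record a binding; return False if the key is already taken."""
        if key in self._bindings:
            return False
        self._bindings[key] = (owner, action)
        return True

    def lookup(self, key: str) -> Optional[tuple[str, str]]:
        """Return (owner, action) for a key, or None if it is unassigned."""
        return self._bindings.get(key)

    def listing(self) -> list[str]:
        """Human-readable list of active bindings, e.g. for a help panel."""
        return [
            f"{key}: {action} ({owner})"
            for key, (owner, action) in sorted(self._bindings.items())
        ]
```

A web app whose request for Ctrl+F is refused could retry with Ctrl+Shift+F and then show the binding it actually received in its instructions.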

Shortcut Conflicts

It's worth noting that direct activation and direct navigation commands (shortcut keys) come in two flavors, which we'll call:

  • Access keys provide a quicker way to activate and/or navigate to a UI element in your current context, but they should never be the only way.
    • In some contexts accelerator characters may be usable without any modifier or prefix keys (e.g. S to press the Save button in a dialog box if the focus is not in a text input field) and/or with certain modifier or prefix keys (e.g. Alt+S to press the Save button in a dialog box even if the focus is in a text input field). However, neither works when the focus is in another context, such as an active menu.
    • HTML implements this using the accesskey attribute, which can be put on nearly any element.
  • Hotkeys trigger an action without necessarily triggering any particular UI element (e.g. Ctrl+S to save the current document works even if there is no Save menu item or it is not visible, although it still wouldn't work if the focus is in another context such as an active menu).
    • The platform should not change these because there is no UI informing the user of the change.
    • HTML5 implements this with the command element, which lets the author associate an accesskey attribute with any scripted action.

Note: Hotkeys are often called shortcut keys, but I'm avoiding that term because the HTML spec uses it to include both access keys and hotkeys.

How shortcut keys conflict

One difficulty with hotkeys is that you can have multiple sets active at the same time. For example, the browser might define Ctrl+F as Find, while an add-on defines F12 as displaying a particular sidebar, and the active document might define Ctrl+F as moving focus to one of its input fields. On the other hand, hotkeys can be disabled when the focus is in another context, such as when a menu is active.

By contrast, access keys are designed so that, for the most part, only one set can be active at any time. For example, when typing in a word processor the menu bar is visible and only the access keys for its menus work, along with the access keys of controls in the document. However, when you display the File menu only the access keys for its menu items work, and when a dialog box is active, only the access keys for its controls work.

Unfortunately this can break down at times, such as when the active window has a menu bar and also contains controls; in those cases it's often undefined what the proper behavior should be if the access key for a menu conflicts with that of a control. When an application designer controls both the menu and the window content they can choose access keys that don't conflict, but if the application designer isn't in charge of the window content (e.g. developing a word processor that can show documents with embedded controls) or a content designer isn't in charge of the menu (e.g. authoring a web page that can be viewed in a variety of browsers) it is difficult to avoid conflicts.

Three solutions for conflicts

One solution is to use different modifier keys for application vs. content; Firefox 4 does this by using Alt with access keys in the application but Shift+Alt with access keys in content. This avoids conflicts between application and content, but it's not perfect because users have to learn a new method of invoking access keys and constantly switch back and forth depending on context. It doesn’t seem too terrible to make the user learn that content in a particular browser works in a different way than applications, but it can be a problem if an author can hide the browser's window controls, in which case the window may look like a native application window but still function like browser content.

Another solution would be for the application to simply not have access keys, but that obviously reduces usability of the application.

Neither of these methods prevents conflicts when using multiple pieces of content (e.g. with iframes) or multiple application components (e.g. a browser and its add-ins).

A third method is to allow access keys to conflict, but ensure that it doesn't make anything unavailable. For example, if there are multiple items with the same access key, pressing the access key can merely move the focus to the next item in that set without actually activating it. The user can then use the access key to quickly navigate between the items, and another key (such as Enter) to activate the one with the focus. This system is how Windows handles conflicts within a menu bar, menu, or form, and so it's already familiar to many users. It decreases efficiency (as things take more keystrokes) and usability (because access keys can behave differently depending on the combination of application, add-ins, and documents), but at least all the functionality is still available and the number of additional keystrokes is usually very small; in that way it's far better than having one access key win and the others be ignored, which could multiply the number of keystrokes required for a task many, many times.
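
The behavior described for this third method can be sketched as follows. The `Item` type and the `press_access_key` function are hypothetical names for this illustration: a unique access key moves focus and activates, while a shared one only cycles focus among the items that share it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    label: str
    access_key: str


def press_access_key(
    items: list[Item], key: str, focused: Optional[int]
) -> tuple[Optional[int], bool]:
    """Return (new focus index, whether the focused item was activated)."""
    matches = [i for i, item in enumerate(items) if item.access_key.lower() == key.lower()]
    if not matches:
        return focused, False  # key is unassigned: nothing changes
    if len(matches) == 1:
        return matches[0], True  # unique key: focus and activate
    # Conflict: move to the next matching item after the current focus,
    # wrapping around, and leave activation to a second key such as Enter.
    later = [i for i in matches if focused is not None and i > focused]
    return (later[0] if later else matches[0]), False
```

With Save and Send both on "S", repeated presses alternate focus between them and neither is activated until the user confirms.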

This method does not work for hotkeys as they may not have any UI to move focus to, and so no way to give users feedback when the hotkey didn't have its expected effect.


The most significant change for keyboard support in HTML5 is the introduction of the command element, which can be used to create menu items with access keys and also to associate a hotkey with some other scripted action.

The introduction of native drag and drop also provides an opportunity to improve keyboard emulation of this activity for cases where the content doesn't implement an equivalent mechanism for the keyboard.


Sequential navigation to all elements that take focus or input

Users need to be sure they can explore and find all focusable and actionable elements, even if they cannot use a mouse.

  • Use case: Laurie is tabbing through a dynamic web page, but finds that there are certain buttons she cannot reach because the author, thinking only of mouse users, has specified that the buttons should not be included in the tab order by setting tabindex to a negative number. Therefore Laurie, who relies entirely on keyboard input, cannot access some functionality on the page.
  • Use case: Laurie is using a web page that contains a custom control, an image that does not take keyboard input or focus but does have an onClick handler. Therefore Laurie, who relies entirely on keyboard input, cannot click on the element to activate it, and even though her browser provides a context menu that would let her activate the image's onClick event, it does not let her move focus to it because that would violate the HTML5 specification.
  • Recommendation: Specification should explicitly state that user agents are allowed and encouraged to provide modes or commands that let the user move focus to all elements that take focus or input, even if the author has indicated that the element should not normally be included in sequential navigation, and even if the element takes input (e.g. has an onClick handler) but lacks other attributes that would normally render it focusable.

Greg to file bug though need to stay clear of deliberately non-focusable (aria-hidden, @hidden, display:none) Bug 13532

  • HTML5 Status: The current HTML5 specification (7.3.1 Sequential focus navigation and the tabindex attribute) says that if the tabindex value is a negative number, "The user agent must allow the element to be focused, but should not allow the element to be reached using sequential focus navigation." I'm not sure whether the use of "should" rather than "must" means that the user agent is allowed to include these in sequential navigation, or whether it is still forbidden. Likewise the spec says that if the tabindex value is zero the user agent "must" allow the element to be focused, but only "should" allow the element to be reached using sequential focus navigation; if the user agent doesn't provide the ability to sequentially focus all elements that take input, then the spec should be changed to read that elements with tabindex of zero "must" be included in sequential navigation.
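
To make the recommendation concrete, the sketch below computes a tab order roughly the way the tabindex rules describe (positive values first in ascending order, then zero or unspecified focusable elements in document order) and adds an opt-in mode that also visits elements the author excluded. The `Element` model and the `include_all` option are assumptions for this example, not part of any specification, and deliberately hidden elements are left out of the model entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Element:
    name: str
    tabindex: Optional[int] = None     # None means the attribute is absent
    natively_focusable: bool = False   # e.g. links, buttons, form fields
    has_click_handler: bool = False    # takes input but may not take focus


def sequential_order(elements: list[Element], include_all: bool = False) -> list[str]:
    """Return element names in tab order; elements are given in document order."""
    positive: list[Element] = []
    document_order: list[Element] = []
    for el in elements:
        if el.tabindex is not None and el.tabindex > 0:
            positive.append(el)
        elif el.tabindex == 0 or (el.tabindex is None and el.natively_focusable):
            document_order.append(el)
        elif include_all and (el.tabindex is not None or el.has_click_handler):
            # User-requested mode: also visit negative-tabindex elements and
            # elements that only have a click handler.
            document_order.append(el)
    positive.sort(key=lambda el: el.tabindex)  # stable sort keeps document order for ties
    return [el.name for el in positive + document_order]
```

In the default mode Laurie's unreachable buttons stay out of the order; with `include_all=True` they appear at their document position.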

Navigation to and through non-editable content

  • Issue: Is there anything that HTML5 should do to facilitate this feature? I haven’t thought of any. It seems like the user agent can do everything it needs without any explicit support in the source language.
  • Use case: Wayne needs to select and copy some content from a Web page. Pressing the Tab key would normally move the focus between controls, links, frames, and the browser UI, but it would not stop at blocks of read-only text and images. For this task Wayne turns on his browser's "caret browsing mode," which adds each block of read-only content to the tab order. He can then move focus to the appropriate block, move the text cursor through it, select a range, and copy it to the clipboard or invoke its shortcut menu, all using the keyboard.

Greg to file bug about UA handling of caret browsing Bug 13533

Discuss if caret keeps its position when going "back" to a page

Preventing validation from trapping focus

  • Use case: Svetlana is tabbing through the controls on a form and lands on a field that expects a telephone number, but when she tries to tab away the user agent puts up an error message saying that a valid telephone number is required. Even though she had no intention of completing the form, she is stuck until she makes up and enters a telephone number.
  • Use case: Etta brings up a web form showing her account information. As she tabs between the fields she lands on one containing her current password. Unfortunately, the security requirements for this web site have recently been increased and her password is no longer considered secure enough, so even though she's only tabbing through the fields, the current value fails to validate, and she is prevented from tabbing on with its current value or by clearing the field.
  • Recommendation: Any field that validates input should allow the user to exit the field. At minimum, allow them to exit with the field being empty or retaining its initial value.
  • HTML5 Status: Unknown

Greg to submit bug that spec be explicit about this
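
A minimal sketch of the minimum behavior recommended above: validation may flag a value, but it only blocks leaving the field when the user has actually changed it to something invalid. The `ValidatedField` class and the patterns are illustrative assumptions (the phone pattern is intentionally crude and is not a real telephone number validator).

```python
import re
from dataclasses import dataclass

# Crude illustrative pattern: optional "+", then at least 7 digits/spaces/()/-.
PHONE = re.compile(r"\+?[0-9 ()-]{7,}")


@dataclass
class ValidatedField:
    initial_value: str
    value: str
    pattern: re.Pattern

    def is_valid(self) -> bool:
        return self.pattern.fullmatch(self.value) is not None

    def may_leave(self) -> bool:
        """Focus may move on if the field is empty, untouched, or valid."""
        return self.value == "" or self.value == self.initial_value or self.is_valid()
```

Svetlana can tab through the empty telephone field, and Etta can tab past a stored password that no longer meets the site's rules, because in both cases the value is unchanged or empty. Form submission can still be refused later.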

Reading and navigation order

Authors should be able to specify preferred direction and/or order for sequential navigation, even among things such as tables that would not normally have a tab order.

  • Use case: Masahiko is reading a web page, and uses browser commands to move the text cursor to the next and previous paragraphs. In most cases this works fine because the suggested reading order is that in which elements occur in the HTML. However, when Masahiko encounters a table that is designed to be read down the columns rather than across the rows, this simplistic navigation is entirely inappropriate. A similar problem occurs when CSS is used to rearrange blocks of text on the screen.
  • Recommendation: Allow marking up a table to indicate whether the preferred reading order is by columns, by rows, both, or neither. This could be done with a new attribute, such as orientation="columns".

Greg to file bug Bug 13539

  • Recommendation: Allow marking up any element with a reference to the logically next and/or previous elements, for use when those are not the next/previous elements in the source. This could be done with new general attributes. For example, if a browser implementing caret browsing moved the caret beyond the end of a paragraph marked up with next="story5", that attribute would be a hint to the browser to move the caret to the element with id="story5" rather than to the element that follows the paragraph in the HTML source.

Greg to file bug but not with a11ytf keyword until we shake out the suggested mechanism Bug 13540

  • HTML5 Status: HTML5 allows the author to mark up tables with column and row headings, but not specify whether the table's preferred reading order is by rows or by columns. HTML5 also provides the tabindex attribute to specify order, but it is optimized for a relatively small number of controls, and applies to controls that normally take keyboard focus but not to anchors or to text and other elements that are subject to navigation in caret browsing modes.
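
The two recommendations in this section could be consumed by a user agent roughly as sketched below. Both the orientation value and the next hint are the proposed, not yet existing, attributes discussed above, and the data structures (a table as a list of rows, elements as dictionaries with "id" and optional "next") are simplifications chosen for this example.

```python
from typing import Optional


def table_reading_order(rows: list[list[str]], orientation: str = "rows") -> list[str]:
    """Flatten a rectangular table by rows, or by columns if so marked."""
    if orientation == "columns":
        return [cell for column in zip(*rows) for cell in column]
    return [cell for row in rows for cell in row]


def reading_order(elements: list[dict]) -> list[str]:
    """Visit elements once each, following 'next' hints where they exist."""
    index = {el["id"]: i for i, el in enumerate(elements)}
    seen: set[str] = set()
    order: list[str] = []
    current: Optional[int] = 0 if elements else None
    while current is not None:
        el = elements[current]
        order.append(el["id"])
        seen.add(el["id"])
        hint = el.get("next")
        if hint in index and hint not in seen:
            current = index[hint]  # author's hint wins over source order
            continue
        # Otherwise continue in source order, then pick up anything skipped.
        after = (i for i in range(current + 1, len(elements)) if elements[i]["id"] not in seen)
        skipped = (i for i, e in enumerate(elements) if e["id"] not in seen)
        current = next(after, next(skipped, None))
    return order
```

Tracking visited elements keeps a pair of elements that point at each other from producing an endless loop, which any real design for such an attribute would also have to address.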

Facilitate navigating related pages

  • Use case: Jason wants to reduce the number of keystrokes he enters, so when reading web sites he doesn't want to have to tab through all the links and controls on a page just to use the link that takes him to the next page. Instead, he uses his browser's keyboard commands that load the next, previous, first, last, and main pages based on the link elements in the current page's header. Unfortunately, a significant number of sites, including some major news sites, fail to provide these elements, so Jason is forced to tab his way through their pages.
  • Recommendation: Provide a way to mark up a link or control to identify it unambiguously as leading to the next, previous, first, last page, etc. This could be an attribute on a link (e.g. rel="next") or other mechanism. Even though this would be redundant to the existing link elements, it may increase the number of sites that support automated navigation shortcuts.
  • HTML5 Status: HTML5 doesn't add anything beyond the existing link elements.
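
As an illustration of how a tool might find these targets, the sketch below uses Python's standard `html.parser` module to collect the first href for each navigation relationship, from either link elements or anchors carrying a rel attribute. The function name and the set of rel values it looks for are assumptions for this example.

```python
from html.parser import HTMLParser

# Relationship names this sketch looks for (an illustrative, assumed set).
WANTED = {"next", "prev", "previous", "first", "last"}


class RelLinkCollector(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.targets: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel may hold several space-separated tokens; keep the first href seen.
        for rel in (attributes.get("rel") or "").lower().split():
            if rel in WANTED:
                self.targets.setdefault(rel, href)


def find_rel_links(html_text: str) -> dict[str, str]:
    """Map each navigation relationship found in the page to its URL."""
    collector = RelLinkCollector()
    collector.feed(html_text)
    return collector.targets
```

A browser command such as "go to next page" could then load the URL recorded for "next", when the page provides one.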

Shortcut Keys

Greg to file bug catch-all and supporting bugs that depend on it for the issues in this section Bug 13555 is catch-all; supporting bugs are Bug 13564, Bug 13565, Bug 13575, Bug 13576, Bug 13576

Shortcut keys consist of both access keys (e.g. S or Alt+S to activate the Save button, or Alt+F to activate the File menu) where the user agent is trying to emulate behavior of the platform, and hotkeys (e.g. Ctrl+S to trigger the "save" action, Ctrl+Shift+S to trigger the "save as" action, or Ctrl+C to trigger the "copy" action). See detailed discussion above.

We want content and nested user agents to register their keybindings with their host in order to:

  1. allow negotiation to avoid conflicts (e.g. web app changes bindings that conflict with the browser)
  2. allow the user agent and tools to provide enhanced UI (e.g. list and/or modify keybindings)

Negotiating shortcut keybindings

There should be a mechanism for components (user agents, documents, web apps, embedded objects, accessibility aids, etc.) to negotiate which keyboard commands will be used by one or the other.

  • Why is this an accessibility issue:
    • Users with disabilities are much more likely to rely on keyboard access. For them, keyboard conflicts might present insurmountable barriers, while they'd only be minor inconveniences for users routinely using the mouse.
    • Users who cannot use a mouse often need to drastically increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful. An increased number of shortcuts increases the number of potential conflicts.
    • Users with some cognitive impairments have more difficulty adjusting when their accustomed methods suddenly fail to work, or when commands they use suddenly do something unexpected.

Negotiation between host and embedded object

  • Use case: An embedded object uses Shift+Esc for one of its control functions, but it's run on a browser that uses this same key combination as the method for returning focus to the browser. In the simple cases, either the user wouldn't be able to exit the object using the method they're familiar with, or else they couldn’t use some function in the object because the keystroke would exit the object instead.
  • Recommendation: I'd say that it is important that the user have a consistent way to exit all embedded objects, because without this they can become effectively trapped; even if there is a way out, the user may not know it or be able to look it up when needed. The embedded object needs to be able to determine that on this particular browser it cannot use a particular command (e.g. Shift+Esc) and adjust its command set, user interface and instructions accordingly. If the user does exit the embedded object using the host's command, the host should inform the embedded object so that it can "clean up" and handle the action gracefully.
  • HTML5 Status: Unknown

Negotiation with nested hosts

  • Use case: Pablo is used to pressing Ctrl+W to close a browser window. In his browser he's reading a page that contains an embedded user agent, and while browsing in THAT context he presses Ctrl+W to close the window. Unfortunately, the script being run by the inner user agent did not know that Ctrl+W was used by the outer user agent, so it grabs and consumes the keyboard input and carries out some action, so Pablo is unable to use his accustomed method to close the outer browser's window.
  • Recommendation: There should be some way for embedded objects, including nested user agents, to determine which shortcut keys are being used by all the layers hosting them, so they can modify their own shortcut keys to avoid conflicting with them.
  • HTML5 Status: Unknown
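
The negotiation recommended above could be sketched roughly as follows. This is only an illustration: HTML5 defines no such query mechanism, and the HostLayer type and the reservedShortcuts and pickShortcut functions are hypothetical names invented for this sketch.

```typescript
// Hypothetical model: each hosting layer (outer browser, add-in, nested
// user agent) reports the shortcuts it reserves. This is not an HTML5 API.
interface HostLayer {
  layerName: string;
  reserved: string[]; // normalized names such as "Ctrl+W"
}

// Collect every shortcut reserved by any layer hosting the embedded object.
function reservedShortcuts(layers: HostLayer[]): string[] {
  const all: string[] = [];
  layers.forEach((layer) => {
    layer.reserved.forEach((key) => {
      if (all.indexOf(key) === -1) all.push(key);
    });
  });
  return all;
}

// Return the first preferred shortcut that no hosting layer reserves,
// or null when every candidate conflicts.
function pickShortcut(preferred: string[], layers: HostLayer[]): string | null {
  const taken = reservedShortcuts(layers);
  for (let i = 0; i < preferred.length; i++) {
    if (taken.indexOf(preferred[i]) === -1) return preferred[i];
  }
  return null;
}

const demoLayers: HostLayer[] = [
  { layerName: "outer browser", reserved: ["Ctrl+W", "Ctrl+F"] },
  { layerName: "nested user agent", reserved: ["Shift+Esc"] },
];
const closeShortcut = pickShortcut(["Ctrl+W", "Ctrl+Shift+W"], demoLayers);
```

In Pablo's case the inner script would fall back to its second choice and leave Ctrl+W to the outer browser.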
Negotiation between host and content
  • Use case: Pablo is used to pressing Ctrl+F to find a string on the current page. However, an online encyclopedia grabs Ctrl+F and moves the focus to its own text input field, which is used to search the entire encyclopedia.
  • Recommendation: The script on the page should be able to query whether Ctrl+F is already assigned to something in the system (the host browser, a browser add-in, etc.). If it is, the script can identify a new, unused keyboard input, map that to its control, and incorporate that into the instructions it presents to the user.
  • HTML5 Status: HTML5 allows the content to provide a list of suggested Unicode characters, but the user agent gets to decide on the actual key assignment, including base character and modifier keys. The content script can retrieve a user-friendly string representing the assignment. (That's enough for this use case but is too limited for some of the others.)
Disabling unmodified keys as shortcuts

Some components use unmodified letters, numbers, and punctuation marks as keyboard commands. This can be handy for users who want to make keyboard input as efficient as possible, including some users with disabilities, but for other users with disabilities it can be a significant problem because everyday text input can trigger a large number of seemingly random actions if it's entered in the wrong context. Therefore user agents should be permitted to make this available as a user option.

  • Use case: Tom uses speech recognition to input text and commands, and he's working in a Web-based word processor while in the background another Web app or browser add-in is downloading a large file. He's in the middle of dictating a letter when the background task steals the activation to notify him that the download has completed. Suddenly the text that was supposed to go into a letter is interpreted in the new context as dozens of commands. Tom looks at the browser and finds that his project in that context has been altered or deleted altogether, and also that display options have changed and he has no idea what command would serve to restore them.
  • Recommendation: Tom goes into his browser's preference settings and clears a check box to disable the use of unmodified keys as commands and shortcuts. When the Web app starts up, it asks the browser whether the letter "u" is available as a shortcut and is told that it is restricted by policy. Therefore the app goes down its list of preferred keystrokes, determines that Ctrl+U is available, and configures itself to use that instead. It may even display an indicator on its status bar warning the user that non-default keyboard commands are being used. The user can then go into the app's configuration screen to find out the current keybindings.
  • HTML5 Status: Unclear. The example in 7.4.2 shows that a user agent can use a key unmodified, but 7.4.3 merely says the user agent can assign its choice of "a combination of modifier keys" and does not specify whether no modifier key is a valid option.
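
A rough sketch of the exchange described in the recommendation above, in which the Web app asks whether "u" is available and falls back to Ctrl+U. The ShortcutPolicy type and the queryShortcut and firstAvailable functions are hypothetical; nothing like them exists in HTML5 today.

```typescript
// Hypothetical answers a user agent might give when asked about a shortcut.
type Availability = "available" | "in-use" | "restricted-by-policy";

interface ShortcutPolicy {
  allowUnmodified: boolean; // the user's preference setting
  inUse: string[];          // shortcuts already taken elsewhere
}

// A shortcut is treated as unmodified when it names no modifier key.
// A lone "+" counts as an unmodified key rather than as a separator.
function isUnmodified(shortcut: string): boolean {
  return shortcut.length === 1 || shortcut.indexOf("+") === -1;
}

function queryShortcut(shortcut: string, policy: ShortcutPolicy): Availability {
  if (!policy.allowUnmodified && isUnmodified(shortcut)) {
    return "restricted-by-policy";
  }
  if (policy.inUse.indexOf(shortcut) !== -1) return "in-use";
  return "available";
}

// Walk the app's list of preferred keystrokes and keep the first usable one.
function firstAvailable(preferred: string[], policy: ShortcutPolicy): string | null {
  for (let i = 0; i < preferred.length; i++) {
    if (queryShortcut(preferred[i], policy) === "available") return preferred[i];
  }
  return null;
}

const tomsPolicy: ShortcutPolicy = { allowUnmodified: false, inUse: ["Ctrl+S"] };
const underlineKey = firstAvailable(["u", "Ctrl+U"], tomsPolicy);
```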

Retrieving actual keybindings

If the author can only suggest keyboard shortcuts using accesskey, how can they provide instructions to the user?

  • Why is this an accessibility issue?
    • Many users with disabilities rely entirely on the keyboard or keyboard emulators because they cannot physically manipulate a mouse, or see the screen, etc.
    • Many users have difficulty memorizing keybindings, and this is more extreme for users with some cognitive impairments.
    • Memorizing keybindings is more difficult when they are likely to be adjusted to avoid conflicts, and such adjustments are more common among people with disabilities. Users who cannot use a mouse often increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful, and many users rely on accessibility aids that use their own set of keyboard commands.
Retrieving user-friendly keybindings
  • Use case: In a web browser, Aaron views a Web page that has a button with accesskey="E". The author wants to incorporate instructions on the page or popup that explain the keyboard commands, but unfortunately they can't predict what keybinding the browser will use: the accesskey attribute is merely a suggestion, and the actual value will vary depending on both the browser and the platform it's running on.
  • Recommendation: HTML5 defines the new command object with a property called assignedaccesskey. To solve this we could define a new property, accessible through the DOM, that returns the actual keybinding associated with an element. In one potential implementation, the page script retrieves the anchor's accesskeystring value, which Firefox 4 would set to "Shift+Alt+E", but Internet Explorer would set to "Alt+E", Opera would set to "Shift+Esc, E", Konqueror would set to "Ctrl+E", Safari 4 on a MacBook Pro would set to "control+E" but on Windows would set to "Alt+E", etc. The script can then insert this string into the instruction paragraph on the page so that on Firefox it would read "To delete, press Shift+Alt+E", but on Internet Explorer it would read "To delete, press Alt+E", and so forth.
  • HTML5 Status: It looks like HTML5 defines a new command element with a programmatically-retrievable accessKeyLabel attribute, whose value the user agent calculates based on its accesskey attribute. The accessKeyLabel string is presumably human-friendly, but no specific guidance or examples are provided. (This also means that the string cannot be parsed by the script, as discussed below.)
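
As a minimal sketch of how a page script might build its instructions from the accessKeyLabel attribute mentioned above: the shortcutInstruction helper is a hypothetical name, and it accepts any object exposing accessKeyLabel so that it can be tried without a browser. It assumes the label is an empty string when no shortcut was assigned.

```typescript
// Structural type: any object exposing accessKeyLabel, as an element would.
interface HasAccessKeyLabel {
  accessKeyLabel: string;
}

// Build the instruction text from whatever keybinding the user agent chose.
// Assumes an empty label means that no shortcut was assigned.
function shortcutInstruction(action: string, el: HasAccessKeyLabel): string {
  const label = el.accessKeyLabel;
  return label ? "To " + action + ", press " + label : "";
}

// Stand-ins for the same button as seen in two different browsers.
const inFirefox: HasAccessKeyLabel = { accessKeyLabel: "Shift+Alt+E" };
const inInternetExplorer: HasAccessKeyLabel = { accessKeyLabel: "Alt+E" };
```

In a real page the argument would be the button element itself, and the returned text would be inserted into the instruction paragraph.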
Retrieving machine-friendly keybindings
  • Use case: Aaron asks his web browser to display a list of the currently active keyboard shortcuts. In the draft HTML5 specification it would do this by enumerating the command elements and retrieving each one's accessKeyLabel property. The user agent wants to make the list more useful by offering views sorted and organized in different fashions, including a view that includes separate groupings for unmodified keys, Ctrl key combinations, Shift key combinations, and Ctrl+Shift key combinations. Unfortunately, it cannot easily parse the accessKeyLabel string, as it knows neither the names the user agent will use for modifier keys (e.g. "Ctrl", "control", "?", etc.) nor what method will be used to concatenate them (e.g. "Ctrl+a", "Ctrl/A", "^A", "A+Control", "?A", etc., all of which may depend on the platform, user agent, language, and/or locale).
  • Recommendation: I see two reasonable approaches. The first, analogous to that used in most native programming environments, would be for a property to return a programmatic representation of the keybinding, such as a list of codes representing keys and their modifiers (e.g. an integer representing the F5 key along with a mask of bits representing Alt and Shift modification). However, this should not be a substitute for returning a user-friendly string representation, as scripts could have a difficult time creating one from the programmatic representation. The machine-friendly version could either be returned by an element using a property analogous to accessKeyLabel, or the user agent could be required to provide a function that converts the string returned by accessKeyLabel into a machine-friendly form. The second approach would be to have a different function analogous to accessKeyLabel that returned a string compounded from strings that are language-, locale- and platform-neutral (e.g. "Ctrl+A" for the combination of the "A" key and the equivalent of the control key, even in environments where the latter is referred to as "control").
  • HTML5 Status: This is not supported; currently only a human-friendly string is returned (via the accessKeyLabel property).
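
To make the two approaches above concrete, here is a small sketch that pairs a key name with a modifier bitmask (the first approach) and derives a neutral string from it (the second). The KeyChord type, the MOD_ constants, and the function names are all hypothetical; HTML5 exposes nothing of the kind.

```typescript
// Hypothetical machine-friendly form: a key name plus a modifier bitmask.
const MOD_CTRL = 1;
const MOD_ALT = 2;
const MOD_SHIFT = 4;

interface KeyChord {
  key: string;       // e.g. "A" or "F5"
  modifiers: number; // bitwise OR of the MOD_ constants
}

// Modifier names in a fixed, locale-neutral order.
function modifierNames(mask: number): string[] {
  const names: string[] = [];
  if (mask & MOD_CTRL) names.push("Ctrl");
  if (mask & MOD_ALT) names.push("Alt");
  if (mask & MOD_SHIFT) names.push("Shift");
  return names;
}

// Neutral string form, e.g. "Ctrl+Shift+A".
function chordToNeutralString(chord: KeyChord): string {
  return modifierNames(chord.modifiers).concat([chord.key]).join("+");
}

// Group chords by modifier set, as in the sorted views Aaron asks for.
function groupByModifiers(chords: KeyChord[]): { [group: string]: string[] } {
  const groups: { [group: string]: string[] } = {};
  chords.forEach((chord) => {
    const group = modifierNames(chord.modifiers).join("+") || "Unmodified";
    (groups[group] = groups[group] || []).push(chordToNeutralString(chord));
  });
  return groups;
}

const demoChords: KeyChord[] = [
  { key: "A", modifiers: 0 },
  { key: "S", modifiers: MOD_CTRL },
  { key: "S", modifiers: MOD_CTRL | MOD_SHIFT },
];
const demoGroups = groupByModifiers(demoChords);
```

With a structured form like this, grouping becomes a matter of comparing masks rather than guessing at how a label string was assembled.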

Maximizing potential keyboard shortcuts

Specifying detailed hotkeys

Today's accesskey attribute is a single character, and it's up to the browser to decide whether that character will be used or another substituted, and what modifiers if any are required (e.g. accesskey="s" might map to Ctrl+S in one environment and Alt+Shift+S in another). That may make sense for access keys (e.g. S or Alt+S to press the Save button) where the user agent is trying to emulate behavior of the platform, but it is an unnecessary limitation for hotkeys (e.g. Ctrl+S to trigger the "save" action, or Ctrl+C to copy the current record). (See detailed discussion of access keys vs. hotkeys above.)

When defining hotkeys, developers of Web apps and authors of documents should be able to assign more specific keys, such as actually requesting Ctrl+I for one frequently used command and Ctrl+Shift+I for another, less frequently used one. This would let them assign shortcuts in a way that is more meaningful, both in terms of grouping and being easier to remember. For example, a page for reading email could assign shortcut Ctrl+R to the Reply button and Ctrl+Shift+R to the Reply All button, Ctrl+J to the Forward button and Ctrl+Shift+J to the Forward As Attachment button, and so forth. (Of course any specific key combination may already be used by the host or another component in the system, and some keys may not be available on the user's system, so the browser would either treat these as hints to be modified as needed or else the page's script could negotiate with the user agent as described elsewhere in this document.)

  • Use case: Roger is using a Web-based email client that has a row of buttons for things he can do to the selected message. The author wanted to make keyboard usage as efficient as possible and minimize the number of keystrokes that users such as Roger need to enter, so they assigned easily typed keyboard inputs to the most commonly used commands and more complex inputs to less frequently used commands. In this case, it assigns the shortcut Ctrl+R to the Reply button and Ctrl+Shift+R to the Reply All button, Ctrl+J to the Forward button and Ctrl+Shift+J to the Forward As Attachment button, and so forth.
  • Recommendation: Facilities to establish and adjust keybindings should allow content, add-ins, and the user to bind commands to specific key combinations, rather than merely specifying a single base character. HTML would let the author specify preferred keyboard inputs for accesskey and the like, including recommended base keys and modifiers. For example, accesskey="ctrl+shift+i", using standardized, non-locale-specific names for keys and modifier keys. There should also be a way for the author to request that a key be unmodified, as discussed below in the section "Enabling unmodified keys as shortcuts". These recommended keyboard inputs would be modified by the browser to accommodate impossible, reserved, restricted, or conflicting inputs. For example, when an HTML5 command element is used to create a menu item and the user agent wants to emulate native access key behavior on the Windows platform, it would limit the access key to a single character that could be used alone or with the Alt key depending on the situation, but when a command element is used to establish a hotkey for a scripted action the user agent could allow a wider range of modifiers and/or sequences.
  • HTML5 Status: Currently the author can only specify a single, unmodified Unicode code point, and the user agent assigns any combination of modifier keys it chooses with no further hinting from the author.
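
A sketch of how a value such as accesskey="ctrl+shift+i" might be parsed, assuming the proposed syntax were adopted. The syntax is this page's proposal, not part of HTML5, and the ParsedHint type and parseKeyHint function are hypothetical names.

```typescript
// Hypothetical parsed form of a proposed hint such as "ctrl+shift+i".
interface ParsedHint {
  ctrl: boolean;
  alt: boolean;
  shift: boolean;
  key: string; // base key, lower-cased
}

// Parse a "+"-separated hint. Returns null for an unknown modifier name or
// a missing base key. Limitation of this sketch: the "+" key itself cannot
// be expressed as a base key.
function parseKeyHint(hint: string): ParsedHint | null {
  const tokens = hint.trim().toLowerCase().split("+");
  const key = tokens.pop();
  if (!key) return null;
  const parsed: ParsedHint = { ctrl: false, alt: false, shift: false, key: key };
  for (let i = 0; i < tokens.length; i++) {
    if (tokens[i] === "ctrl") parsed.ctrl = true;
    else if (tokens[i] === "alt") parsed.alt = true;
    else if (tokens[i] === "shift") parsed.shift = true;
    else return null; // not a standardized modifier name
  }
  return parsed;
}

const replyAllHint = parseKeyHint("Ctrl+Shift+R");
```

The user agent would still treat the parsed result as a hint and remain free to substitute another binding when the requested one is reserved or unavailable.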
Sequences as shortcuts

Allowing key sequences to be used as commands greatly increases the number of keyboard commands that can be used, as well as the ability to make these commands mnemonic.

  • Use case: Jemiah is a keyboard user who wants to make her work as efficient as possible. She configures her Web-based word processor to set up shortcut keys for the dozens of built-in commands and macros she uses frequently, especially those that would normally take many keystrokes to carry out. However, she quickly runs out of good keystrokes, both because of the sheer number and because she needs to keep them mnemonic in order to easily remember them. Therefore she uses key sequences so she can group related commands together with the same prefix. For example, she uses Ctrl+R as the prefix for all the commands dealing with revisions, with Ctrl+R followed by S to show revisions, Ctrl+R followed by H to hide revisions, Ctrl+R followed by A to accept revisions, Ctrl+R followed by R to reject revisions, and so forth.
  • Recommendation: Facilities to establish and adjust keybindings should allow content, add-ins, and the user to bind commands to key sequences as well as single keys and key combinations. HTML would let the author specify preferred keyboard inputs for accesskey and the like including key sequences, such as accesskey="ctrl+r a" for Ctrl+R followed by the letter A.
  • HTML5 Status: Currently key sequences are not supported as shortcuts. The HTML5 processing model (7 User interaction) states: "If the value is not a string exactly one Unicode code point in length, then skip the remainder of these steps for this value."
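
The prefix-based sequences in Jemiah's use case can be modeled as a small state machine. The sketch below assumes the proposed "ctrl+r a" syntax, with steps separated by spaces; SequenceBinding, parseSequence, createSequenceMatcher, and the command names are hypothetical.

```typescript
interface SequenceBinding {
  sequence: string[]; // e.g. ["ctrl+r", "a"]
  command: string;
}

type StepResult =
  | { state: "matched"; command: string }
  | { state: "pending" } // a longer sequence may still match
  | { state: "none" };   // no binding starts this way

// Split a proposed value such as "ctrl+r a" into its steps.
function parseSequence(spec: string): string[] {
  return spec.trim().toLowerCase().split(/\s+/);
}

// Returns a function that is fed one key press at a time. If one binding
// is a prefix of another, the shorter one always wins in this sketch.
function createSequenceMatcher(bindings: SequenceBinding[]): (chord: string) => StepResult {
  let pressed: string[] = [];
  return function press(chord: string): StepResult {
    pressed.push(chord.toLowerCase());
    let isPrefix = false;
    for (let i = 0; i < bindings.length; i++) {
      const seq = bindings[i].sequence;
      const matchesSoFar = pressed.every((c, j) => seq[j] === c);
      if (!matchesSoFar) continue;
      if (seq.length === pressed.length) {
        pressed = [];
        return { state: "matched", command: bindings[i].command };
      }
      isPrefix = true;
    }
    if (isPrefix) return { state: "pending" };
    pressed = [];
    return { state: "none" };
  };
}

const revisionKeys: SequenceBinding[] = [
  { sequence: parseSequence("ctrl+r s"), command: "show-revisions" },
  { sequence: parseSequence("ctrl+r a"), command: "accept-revisions" },
  { sequence: parseSequence("ctrl+r r"), command: "reject-revisions" },
];
const pressKey = createSequenceMatcher(revisionKeys);
```

A real implementation would also need a timeout or an explicit cancel key so that a pending prefix does not swallow the user's next unrelated keystroke.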
Enabling unmodified keys as shortcuts

Some components use unmodified letters, numbers, and punctuation marks as keyboard commands. This can be handy for users who want to make keyboard input as efficient as possible, including some users with disabilities, but for other users with disabilities it can be a significant problem because everyday text input can trigger a large number of seemingly random actions if it's entered in the wrong context. Therefore user agents should be permitted to make this available as a user option.

  • Use case: Reggie finds it difficult to press key combinations, and even though many platforms provide the StickyKeys feature that lets her simulate them, she wants to author her own web site so that the shortcuts on the pages are accessed with single keys rather than key combinations.
  • Recommendation: See the section "Specifying detailed hotkeys".
  • HTML5 Status: See the section "Specifying detailed hotkeys".
Allowing non-character keys as shortcuts

Allowing non-character keys such as F12 and Del to be used as shortcuts significantly increases the number of keys and key combinations available.

  • Use case: Jeanine is developing a Web app and because most of the normal character keys are taken, she wants to have the F12 key activate the "Exit" link on the page, and Shift+F12 activate the "Save and Exit" button.
  • Recommendation: Provide a way to use keys that cannot be represented as a single Unicode code point. For example, define keywords such as "del", "num-del" and "f12", and allow their use in the accesskey attribute along with normal Unicode characters (e.g. accesskey="f12 del d").
  • HTML5 Status: Currently non-character keys such as F12 cannot be assigned as accesskey shortcuts, because each accesskey value must be a key equating to a single Unicode code point (see 7.4.3 Processing model).
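
A sketch of how a space-separated value such as accesskey="f12 del d" might be tokenized if keywords were allowed. The keyword list holds only the three examples given above and is purely illustrative; the AccessKeyToken type and parseAccessKeyList function are hypothetical.

```typescript
// Illustrative keyword list only; no such keywords are defined in HTML5.
const NAMED_KEYS = ["del", "num-del", "f12"];

type AccessKeyToken =
  | { kind: "named"; keyName: string }
  | { kind: "char"; char: string };

// True when the string is exactly one Unicode code point, which may take
// two UTF-16 code units (a surrogate pair).
function isSingleCodePoint(s: string): boolean {
  if (s.length === 1) return true;
  if (s.length !== 2) return false;
  const hi = s.charCodeAt(0);
  const lo = s.charCodeAt(1);
  return hi >= 0xd800 && hi <= 0xdbff && lo >= 0xdc00 && lo <= 0xdfff;
}

// Split the attribute value on whitespace and classify each token.
// Tokens that are neither a keyword nor a single code point are skipped,
// mirroring how HTML5 skips values it cannot use.
function parseAccessKeyList(value: string): AccessKeyToken[] {
  const tokens: AccessKeyToken[] = [];
  value.split(/\s+/).forEach((raw) => {
    if (raw === "") return;
    if (NAMED_KEYS.indexOf(raw.toLowerCase()) !== -1) {
      tokens.push({ kind: "named", keyName: raw.toLowerCase() });
    } else if (isSingleCodePoint(raw)) {
      tokens.push({ kind: "char", char: raw });
    }
  });
  return tokens;
}

const exitKeys = parseAccessKeyList("f12 del d");
```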

Shortcuts for Navigation

Navigation shortcuts without visible links
  • Use case: Jeanine is creating a web page, and wants to define a shortcut that would assist users with disabilities by moving the keyboard focus and point of regard to a specific bookmark in the page. However, she doesn't want to have a link to that location be visible on the page.
  • Recommendation: It should be possible to define a keyboard shortcut for the sole purpose of navigation, rather than activation. It should be possible to target essentially any element in the document, and the command should be defined as navigating to without activating the element.
  • HTML5 Status: Unlike HTML4, HTML5 allows accesskey on all elements, and states that it creates "a keyboard shortcut that activates or focuses the element". However, it leaves the actual behavior undefined and thus up to the user agent, meaning that a user agent could implement it so that the keyboard command does nothing if the element does not take input. This should be clarified.
Distinguishing activation from navigation shortcuts

HTML4's specification for accesskey says "Pressing an access key assigned to an element gives focus to the element. The action that occurs when an element receives focus depends on the element." In actuality, whether the accesskey moves focus to and/or activates an element varies from one user agent to another. This means the same content or web app behaves differently in different browsers, and in some cases functionality may not be available at all.

  • Issue: Is it worthwhile to define a mechanism whereby authors could specify separate shortcuts for activating vs. navigating to an element, or one but not the other? Or do we merely expect user agents to provide some mechanism for activation and navigation without any input from the content?
  • Issue: Should the user be able to activate a button and afterward have the focus remain (or be restored to) where it was? Would this be the author's choice and/or the user's?
  • Use case: Juan is viewing a web page which has a link to a document, and assigns an accesskey to the link. In one browser, Juan can press the accesskey to move the focus to the link, then press a key to display the link's shortcut menu, then select the command to show him the destination and title attributes of the link. However, in another browser pressing the accesskey activates the link, taking him away from the page he was on, so he cannot use the accesskey to navigate to the link, instead having to press the tab key a dozen times.
  • HTML5 Status: The HTML5 spec provides even less guidance than that for HTML4, saying merely "The accesskey attribute's value is used by the user agent as a guide for creating a keyboard shortcut that activates or focuses the element."
User choice between navigating and activating
  • Use case: Roger is using a Web-based email client that has a row of buttons for things he can do to the selected message. Roger can flag a message by clicking the Flag button or pressing the button's shortcut key. When he does either, the focus is left on the button, so after doing this he almost always has to navigate back to the message list before he can use the arrow key to select the next message. He would much rather have focus remain on the message, or move to the next message after activating the Delete button.
  • Recommendation: Let the author specify separate keyboard shortcuts for activating and navigating to an element. For example, HTML could replace or supplement the accesskey attribute with separate activationkey and navigationkey attributes; pressing the former would activate the element without moving focus, while pressing the latter would move the focus to the element without activating it, and if the values were the same then pressing the key would move focus to and activate the element.
  • HTML5 Status: Currently it appears that command elements can only be used for activation, not for navigation. It is implied, but not explicitly stated, that the target control should be activated without moving the focus.
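
The behavior proposed in the recommendation above can be written down as a small decision function. The activationkey and navigationkey attributes are this page's proposal and do not exist in HTML; ShortcutTarget, ShortcutAction, and actionsForKey are hypothetical names.

```typescript
// Hypothetical model of an element carrying the two proposed attributes.
interface ShortcutTarget {
  id: string;
  activationKey?: string;
  navigationKey?: string;
}

interface ShortcutAction {
  targetId: string;
  moveFocus: boolean;
  activate: boolean;
}

// Decide what a key press should do: activate without moving focus,
// move focus without activating, or both when the two keys are the same.
function actionsForKey(key: string, targets: ShortcutTarget[]): ShortcutAction[] {
  const actions: ShortcutAction[] = [];
  targets.forEach((target) => {
    const activate = target.activationKey === key;
    const moveFocus = target.navigationKey === key;
    if (activate || moveFocus) {
      actions.push({ targetId: target.id, moveFocus: moveFocus, activate: activate });
    }
  });
  return actions;
}

const mailButtons: ShortcutTarget[] = [
  { id: "flag", activationKey: "Ctrl+L", navigationKey: "Ctrl+Shift+L" },
  { id: "reply", activationKey: "Ctrl+R", navigationKey: "Ctrl+R" },
];
```

With bindings like these, Roger could flag a message with Ctrl+L while focus stays in the message list.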

Emulating non-keyboard operations

Simulating drag and drop
  • Use case: June is interacting with a Web page that uses HTML5's drag and drop facilities. For users like June who can't use a mouse, the web browser provides a keyboard mechanism that lets her carry out all the drag and drop operations using only the keyboard, including not only the normal drag and drop but also the behavior of drag and drop modified by various modifier keys. June presses the tab key until the focus is on the element she wants to drag (and she may have had to turn on a special mode to add drag sources to the tab order), and from its shortcut menu chooses the "Drag and drop" submenu, then "Select for Shift + drag and drop". She then presses the tab key or an access key to move the focus to the drag target, and from its shortcut menu chooses the "Drag and drop" submenu, then "Drop". The browser sends the proper commands to the page's script to simulate all stages of the process, including triggering dragstart, drag, dragenter, drop, etc. events.
  • Issue: Does HTML5 provide everything it needs to allow a user agent to enable this type of feature?
  • Issue: Can the content tell the user agent what commands are supported, e.g. drag for move, shift+drag for copy? Can there be a way to register these?
  • HTML5 Status: Unknown.
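
As a rough illustration of what the browser would have to simulate for June, the sketch below lists a simplified order of drag and drop events for one keyboard-driven operation. It is an assumption-laden simplification: a real user agent fires drag and dragover repeatedly and may also fire dragleave. SimulatedDragEvent and planKeyboardDrag are hypothetical names.

```typescript
// Simplified description of one event the browser would dispatch.
interface SimulatedDragEvent {
  type: string;
  targetId: string;
  shiftKey: boolean; // carries the modifier chosen from the shortcut menu
}

// List the events for a keyboard-driven drag from a source to a drop target.
function planKeyboardDrag(
  sourceId: string,
  dropTargetId: string,
  shiftKey: boolean
): SimulatedDragEvent[] {
  const steps: Array<[string, string]> = [
    ["dragstart", sourceId],
    ["drag", sourceId],
    ["dragenter", dropTargetId],
    ["dragover", dropTargetId],
    ["drop", dropTargetId],
    ["dragend", sourceId],
  ];
  return steps.map((step) => ({
    type: step[0],
    targetId: step[1],
    shiftKey: shiftKey,
  }));
}

// June chose "Select for Shift + drag and drop", so shiftKey is true.
const junesDrag = planKeyboardDrag("photo-3", "album-2", true);
```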

Greg to file bug to be sure drag and drop fully supported Bug 13591

Greg to file bug about allowing content to register what it supports Bug 13593

Other Keyboard Issues

  • Handling shortcut conflicts. While we can define mechanisms to let dynamic content, add-ins and the like negotiate to try to avoid conflicts, they are likely to still occur at times. With HTML4 and below, the behavior in such circumstances is left undefined, and consequently is handled differently by different user agents and so cannot be planned for.
    • Issue: Should the author be able to suggest ways to handle shortcut conflicts? If an author could specify that pressing a shortcut would move focus sequentially between the elements that have that accesskey, and wrap, the author could design keyboard UI to take advantage of this in a way that is not otherwise possible. On the other hand, issues such as whether sequential navigation wraps should ideally be under the user's control, since wrapping is a significant advantage to some users (e.g. those who need to minimize the number of input commands) but a disadvantage to others (such as those who may not be able to tell when wrapping has occurred).
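
One way to picture the sequential behavior discussed above is sketched below, with wrapping as an explicit option that could be set by the author or by the user. The nextWithAccessKey function is a hypothetical name, and the element ids are made up for the example.

```typescript
// Given the ids of all elements sharing one accesskey, in document order,
// return the id that should receive focus next, or null when the end is
// reached and wrapping is turned off.
function nextWithAccessKey(
  ids: string[],
  currentId: string | null,
  wrap: boolean
): string | null {
  if (ids.length === 0) return null;
  // When focus is elsewhere (or unknown), start from the first match.
  const index = currentId === null ? -1 : ids.indexOf(currentId);
  const next = index + 1;
  if (next < ids.length) return ids[next];
  return wrap ? ids[0] : null;
}

const sharedKeyIds = ["search-top", "search-side", "search-footer"];
```

Keeping wrap as a parameter leaves open whether that choice belongs to the author or the user.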

Greg to file bug Bug 13594

  • Partially downloaded content. What if a web page pulls down content as needed (e.g. Google Maps), and elements in the content may have keyboard shortcuts, but some of those elements are in portions of the content that won't be downloaded until needed?
    • Issue: Should the Web page be able to download a complete list of the keyboard shortcuts associated with elements in the content, and have host user agent notify it when those keys are pressed so that it can download and present the corresponding portion of the content? Is this necessary, or since the as-needed downloading is handled by scripting, is it simply the script's responsibility to handle these shortcuts as well?
    • Use case: A web page displays a list of tens of thousands of names in alphabetical order, with a heading for each initial letter, and a shortcut for each heading. Rather than downloading the entire list, it wants to download portions of the content only as they're needed, in response to the user scrolling the window, moving the text cursor through the content, or pressing a shortcut key associated with the headings.
  • Presenting access keys to the user.
    • Issue: Can there be ways to automatically incorporate the shortcuts into the way content is presented? For example, many GUIs underline the shortcut key in a text label, but that doesn't work with browsers that underline all link text. Do most browsers still not present accesskeys defined by the element? Is there anything that could or should be done in the protocols or formats to assist with this?
    • Recommendation: Where the label element is described in the HTML5 spec, it should specifically discuss and recommend use of the accesskey attribute, and give an example of how a user agent may not only use this value but also present it to the user. For example, when displaying label text a user agent could underline the first occurrence of the accesskey character in the displayed label text (called an implicit designator), or if the character does not appear in the label text, it could append a space and the underlined accesskey character in parentheses (called an explicit designator).
  • Distinguishing between access keys and hotkeys
    • I'm concerned that perhaps HTML5 should distinguish more clearly between access keys, hotkeys, and menu items. If they should behave differently, should the same element (command element) really be used for both? Even if the user agent can distinguish between them, treating them differently, will it confuse authors?
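
The implicit and explicit designators described in the recommendation above could be computed along these lines. This is a sketch only: DesignatedLabel and designateAccessKey are hypothetical names, and the case-insensitive match assumes that lower-casing does not change the length of the label.

```typescript
// The label split into the text before the underlined character, the
// underlined character itself, and the text after it.
interface DesignatedLabel {
  before: string;
  underlined: string;
  after: string;
}

function designateAccessKey(label: string, accessKey: string): DesignatedLabel {
  const index = label.toLowerCase().indexOf(accessKey.toLowerCase());
  if (index !== -1) {
    // Implicit designator: underline the first occurrence in the label.
    return {
      before: label.slice(0, index),
      underlined: label.slice(index, index + accessKey.length),
      after: label.slice(index + accessKey.length),
    };
  }
  // Explicit designator: append a space and the key in parentheses.
  return { before: label + " (", underlined: accessKey, after: ")" };
}

const saveLabel = designateAccessKey("Save", "s");
const openLabel = designateAccessKey("Open", "x");
```

Returning the three parts separately leaves the choice of presentation (underline, color, speech) to the user agent.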

Greg Lowney

Defining generic commands that have associated keybindings is an extremely powerful mechanism that lets user agents give the user control over keybindings. One use of this is to automatically generate documentation for the user providing a reference and guide to the keyboard commands as they're actually configured. However, a long, unorganized list of keybindings, while better than nothing, is still extremely difficult to use. This could be much easier if the user agent (or tool) could organize the list, as well as allowing the user to filter and navigate it intelligently. This would be possible if the author could provide hints with each command, such as recommended categories or keywords.

Use case: Carlos relies on the keyboard, and command keybindings are a very important way for him to perform tasks efficiently. He is using a web-based application, and asks his web browser to present a list of all the commands defined by the web-app, which he can consult and print out for future reference. The browser has already processed all the commands defined in the HTML source, including those created by interactive elements with accesskey as well as those command elements that associate an action with a keyboard input. Unfortunately, this list is very long, especially if it's combined with the browser's own commands. Luckily, the browser's dialog box contains buttons for sorting the list alphabetically or by category (e.g. commands relating to tables, commands relating to view options, commands for formatting, etc.), and it's able to do that because the author was able to supply a user-friendly name and category (or keywords) for each command. When Carlos wants to look up a command but doesn't know the name assigned to it, or wants to look up a bunch of related commands, he can use the category view, just like those provided in printed software user guides. When he already knows the official name of the command, he can use the alphabetical view to find it quickly. Note that this is particularly important when Carlos moves between different user agents that assign different keybindings to the author's commands.

HTML5 Status: HTML5 lets the author provide a user-friendly name for command elements, but

Recommendation: HTML5 should allow the author to associate a category or, even better, multiple categories or keywords, with each author-defined command. My preferred method would be to allow a keywords or tags attribute on any element that can be used to define a command (that is, all the methods described in section 4.11.5 Commands), as discussed in more detail elsewhere in this document (see 11 Standard pieces of information should be automation-friendly). There would also be value in letting the author define a primary keyword or category, but I think that's less critical than allowing multiple keywords or categories.
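
A sketch of the two views in Carlos's use case, assuming each command carried author-supplied keywords. The keywords attribute is only proposed here, and the AuthorCommand type, the function names, and the sample commands are hypothetical.

```typescript
// Hypothetical record for one author-defined command.
interface AuthorCommand {
  label: string;      // user-friendly name
  keybinding: string; // as assigned by the user agent
  keywords: string[]; // proposed author-supplied categories or keywords
}

// Category view: a command with several keywords is listed under each one.
function commandsByKeyword(commands: AuthorCommand[]): { [keyword: string]: string[] } {
  const groups: { [keyword: string]: string[] } = {};
  commands.forEach((command) => {
    command.keywords.forEach((keyword) => {
      (groups[keyword] = groups[keyword] || []).push(command.label);
    });
  });
  return groups;
}

// Alphabetical view: sort a copy by label, leaving the input untouched.
function commandsAlphabetically(commands: AuthorCommand[]): AuthorCommand[] {
  return commands.slice().sort((a, b) => a.label.localeCompare(b.label));
}

const appCommands: AuthorCommand[] = [
  { label: "Insert row", keybinding: "Ctrl+Shift+I", keywords: ["tables"] },
  { label: "Bold", keybinding: "Ctrl+B", keywords: ["formatting"] },
  { label: "Align cell", keybinding: "Ctrl+Shift+A", keywords: ["tables", "formatting"] },
];
const categoryView = commandsByKeyword(appCommands);
```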

Greg to file bug Bug 13616

Andrew Kirkpatrick

The sections under 7.5.1 (Move the Caret and Change the selection) provide this as a vague indication of potential keyboard support:

"This could be triggered as the default action of keydown events with various key identifiers and as the default action of mousedown events."

Does this suffice as a requirement for keyboard access? Do we expect that the spec should indicate this more clearly, or is this expected to be handled by browsers wanting to comply with Section 508 or other accessibility standards?

Andrew to file bug Bug 13573