- Important note: This Wiki page is edited by participants of the User Agent Accessibility Guidelines working group (UAWG). It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Working Group participants, WAI, or W3C. It may also have some very useful information.

Keyboard Concepts for HTML5 Discussion

From WAI UA Wiki


  • Complete and efficient keyboard access is critical for accessibility.
  • We examine high-level things that web protocols and formats can do to enable good keyboard UI:
    • Let users accomplish any task using the keyboard alone
    • Let content coexist and adapt to a wide range of user agents, browser add-ins, nested user agents, other content, and assistive technologies
    • Let the user take advantage of the widest range of keyboard commands and shortcuts
    • Provide the information needed to enable a wide range of keyboard features
    • Let the user retain ultimate control of their experience
    • Protect the user from badly behaved content and nested user agents
  • We present a number of specific topics with use cases, issues, and recommendations, as well as topics that have no clear recommendations.
  • Some of these are already covered by the latest HTML5 draft, while others are not.

Basics of keyboard access

Good keyboard access means:

  • Make keyboard access universal:
    • for every task (letting the user do everything using only a keyboard or keyboard emulator), and
    • for every user (not limiting keyboard access to users with good vision, with a particular keyboard layout, etc.)
  • Make keyboard access usable:
    • easy,
    • efficient (keep the number of input steps as low as is feasible, and not disproportionately higher than for people using both mouse and keyboard),
    • reliable/predictable (making sure that keyboard commands act in ways the user expects, including consistency with standards, within related content, and between user agents),
    • easily learned and remembered

Techniques for keyboard access include:

  • sequential navigation: the ability to explore by moving the keyboard focus forward and backward through all the items that can be visited (e.g. Tab and Shift+Tab to move between controls, Left and Right arrows to move between characters, or Up and Down arrows to move between lines of text)
  • direct commands:
    • direct navigation: being able to move the focus directly to the target you want, rather than having to go through everything in between
    • direct activation: being able to trigger an element's action without having to move the focus to a corresponding element (e.g. pressing Ctrl+S at any time to activate the Save menu item)
  • structural navigation: moving the focus using the structure of the content (e.g. forward and backward by sentence, paragraph, group, page, section, frame), as well as being able to see the structure (e.g. choosing a destination by moving through a hierarchical view of the document headings)
  • spatial commands:
    • spatial navigation: moving the focus based on the multidimensional location of things on the screen (e.g. using arrow keys to move between spreadsheet cells)
    • spatial manipulation: commands that move an object in particular directions (e.g. scrolling a window, moving a pushpin on a map, moving the pointer on the screen)
  • textual navigation: moving the focus to destinations based on text (e.g. finding a search string on a page, or moving to a control by typing the first characters of its label)
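As a concrete illustration of sequential navigation, the sketch below orders focusable items the way HTML's tabindex rules do. Elements with a positive tabindex come first in ascending order (ties keep document order), followed by elements with tabindex="0" and natively focusable elements without a tabindex, in document order. A negative tabindex leaves an element focusable but out of the Tab sequence. The `Focusable` type and both function names are hypothetical, and plain objects stand in for DOM elements so the logic runs outside a browser.

```typescript
// Hypothetical stand-in for a focusable DOM element.
interface Focusable {
  id: string;
  docIndex: number;        // position in document (tree) order
  tabIndex: number | null; // null = no tabindex attribute on a natively focusable control
}

// Tab order: positive tabindex values first, ascending (ties keep document order),
// then tabindex="0" and natively focusable elements in document order.
// Negative tabindex values are focusable but left out of the sequence.
function sequentialOrder(items: Focusable[]): Focusable[] {
  const positive = items
    .filter((i) => i.tabIndex !== null && i.tabIndex > 0)
    .sort((a, b) => (a.tabIndex ?? 0) - (b.tabIndex ?? 0) || a.docIndex - b.docIndex);
  const rest = items
    .filter((i) => i.tabIndex === null || i.tabIndex === 0)
    .sort((a, b) => a.docIndex - b.docIndex);
  return [...positive, ...rest];
}

// Tab (forward) or Shift+Tab (backward) from the item with id `currentId`.
// Wraps at either end for simplicity; a real browser may instead move focus to its own UI.
function moveFocus(order: Focusable[], currentId: string | null, backward: boolean): string | null {
  if (order.length === 0) return null;
  const pos = order.findIndex((i) => i.id === currentId);
  // Nothing in the sequence has focus yet: start at the first or last item.
  if (pos === -1) return backward ? order[order.length - 1].id : order[0].id;
  const next = (pos + (backward ? -1 : 1) + order.length) % order.length;
  return order[next].id;
}
```

For items a (no tabindex), b (tabindex 2), c (tabindex -1), d (tabindex 1), and e (tabindex 0) in that document order, the sequence is d, b, a, e; c can still receive focus by pointer or script but is skipped by Tab.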

Mission, Goals, and Principles

Mission: To ensure that software and web technologies can provide the user with full and efficient keyboard access even when working with multiple documents, web apps, add-ins, nested user agents, and accessibility aids.

Goals and Principles:

  • Let users do everything from the keyboard:
    • Access to all elements. Let the user discover, navigate to, and manipulate everything.
    • Access to all input operations. Provide the ability to simulate all non-keyboard input operations so they can be done using the keyboard and other input systems.
  • Let things give users maximum flexibility over the keyboard UI:
    • Put the user in control. User agents should have flexibility to alter or supplement author-specified keyboard shortcuts, tab stops, behaviors, etc. That is, as with many things, user agents should be able to give the user (rather than the author) ultimate control. (The downside to this approach is that a badly designed user agent can mess up both the author and the user, but at least the user can choose user agents, while in many cases they cannot choose their content.)
    • Let content provide hints that allow user agents to add sophisticated keyboard capabilities. For example, if the user agent wants to provide a command to navigate to the navigation bar, it needs to know which element(s) in the content are the navigation bar, and if it wants to provide commands to navigate forward and backward through the document, it needs to understand the content's recommended reading order.
    • Support the full range of keyboard inputs, including unmodified keys, key combinations using the full range of modifier keys, and key sequences.
    • Let things provide information to support enhanced keyboard UI. Specific information needs to be defined in content (etc.) and passed on by the host and platform. For example, accessibility aids and the user agent need to be able to determine an author's intended reading and navigation order, accessibility aids need to be able to determine properties of content elements such as name and keybinding (e.g. to present them to the user) and location (e.g. to click on it if it does not fully support programmatic control).
    • Let things provide methods to support enhanced keyboard UI. Programmatic interfaces need to be defined, and passed on by the host, so that other components can programmatically perform any action on an element that can be done using a mouse, keyboard, or touchscreen, and so that they can alter the presentation of elements, for example to indicate focus more prominently or to make keybindings discoverable.
  • Let users work with multiple things at the same time:
    • Allow content, components, and add-ins to coexist and cooperate, even when they were developed independently. For example, they should be able to negotiate allocation of the limited set of possible keyboard shortcuts, and the user should be able to transition between the UI of the user agent, content, nested user agents, etc.
    • User agents should be able to prevent things from breaking keyboard access. For example, an embedded object should not be able to trap the keyboard focus or break the methods of exiting that the user is used to, and content should not be able to trap the focus on an input field until the user enters properly formatted data.
  • Let things adapt to keyboard restrictions, conventions, and conflicts:
    • Let things negotiate keybindings, determining which keyboard inputs are impossible (e.g. not on the user's keyboard), reserved by the platform (inputs reserved by the host or operating system so they cannot be changed by the component, e.g. Alt+Tab), restricted by convention (e.g. standard keybindings for copy, paste, exit, etc. that should not be changed lightly), or already in use by another component, and adapting accordingly.
    • Let content reflect actual keyboard shortcuts, such as incorporating them into its instructions, even if they are changed to adapt to the environment (platform, user agent, other content, add-ins, etc.).
    • Allow components to conform to the host's keyboard conventions, such as allowing a custom control (e.g. a drop-down list box implemented entirely in JavaScript) to set its keyboard commands to match those its host browser provides for the equivalent native controls, or allowing a form in HTML to emulate the navigation and access key behavior of native dialog boxes.
  • Let users discover keyboard UI:
    • Let things determine the active keybindings for all components so they can be presented to the user as a training aid or reference. This also allows the user to determine what command a keyboard input is currently mapped to, so they can identify what they accidentally did, or whether it is safe to repurpose a specific keyboard input because its current function is available through other means or not one the user would ever need.
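To make the discovery goal concrete, here is a minimal sketch of a directory that aggregates the active keybindings reported by every component. It answers the three questions raised above: what a given input currently does, whether it is safe to repurpose that input, and what the full reference list looks like. The `KeybindingDirectory` class, its methods, and the idea that components "report" bindings are all hypothetical; no existing browser API is implied.

```typescript
// Hypothetical record of one active keybinding and the component that owns it.
interface Binding {
  keys: string;    // normalized key description, e.g. "Ctrl+S"
  command: string; // human-readable command name
  owner: string;   // component that defined it, e.g. "browser", "content", "add-in:sidebar"
}

class KeybindingDirectory {
  private bindings: Binding[] = [];

  // Each component (platform, browser, add-in, content) reports its active bindings.
  report(owner: string, entries: [string, string][]): void {
    for (const [keys, command] of entries) this.bindings.push({ keys, command, owner });
  }

  // "What did that key just do?": every component currently claiming this input.
  lookup(keys: string): Binding[] {
    return this.bindings.filter((b) => b.keys === keys);
  }

  // Safe to repurpose? True if every command on this input is also reachable
  // through another key owned by the same component (trivially true if unbound).
  hasAlternative(keys: string): boolean {
    return this.lookup(keys).every((b) =>
      this.bindings.some((o) => o.owner === b.owner && o.command === b.command && o.keys !== keys),
    );
  }

  // Reference list grouped by component, e.g. for a help or training screen.
  listByOwner(): Map<string, Binding[]> {
    const groups = new Map<string, Binding[]>();
    for (const b of this.bindings) {
      const list = groups.get(b.owner) ?? [];
      list.push(b);
      groups.set(b.owner, list);
    }
    return groups;
  }
}
```

For example, if the browser reports Ctrl+R and F5 both as Reload, F5 can be repurposed without losing Reload; if the browser uses Ctrl+F for Find while the page uses it to focus a search field, `lookup("Ctrl+F")` shows both owners and the input is not free.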


User agents are platforms. Most platforms are designed to support multitasking, but most only support it well when the user is interacting directly with a single application at a time. That's sufficient for most users most of the time, but it is entirely insufficient for many people who rely on assistive technology. Those products often need to modify input or output, or provide global commands that the user can activate regardless of what application or context they're working in, and on many platforms creating such products requires undocumented, unsupported, and unreliable hacks simply because the platforms don't provide the necessary infrastructure. As we define the future platform architecture we need to do better, not just for assistive technology, but because Web browsers have made add-ins and plug-ins more prominent and popular, even among mainstream users.

Unfortunately, most users and developers—including platform and standards developers—only think about a very limited set of input and output options: graphical output with keyboard for text entry and a mouse or touchscreen for navigation, and users interacting directly with only one application at a time. Those limited views do not accommodate the variety of users and their needs.

In reality the user works in an environment made up of many things, including:

  • hardware (e.g. available primary and modifier keys),
  • platforms (e.g. operating systems, GUIs, and window managers that might use or reserve keys),
  • user agents (e.g. web browsers),
  • content, including documents (e.g. HTML pages) and web apps (e.g. dynamic HTML) rendered in a browser,
  • browser extensions and plug-ins (e.g. components that modify browser UI or content, or act within the browser on behalf of external utilities),
  • nested user agents (e.g. embedded media player), and
  • external utilities (e.g. accessibility aids that don't run within the browser, yet examine, modify, or provide alternate input or output for browsers and the content they render—and usually other applications as well).

These things often nest and coexist, as in:

  • Content hosting content, e.g. HTML rendered by the browser uses iframe to host HTML also rendered by the browser
  • Content hosting user agents, e.g. HTML rendered by the browser hosts a media player
  • User agent hosting multiple content frames, e.g. browser showing window split between two web pages, which could be static documents or web apps that may or may not interact with the content of other frames
  • User agent hosting add-ins, e.g. add-in creates sidebar providing separate view of or interacting with content being viewed in the browser, and creates its own keyboard commands that may function globally within the browser window or entire browser session
  • User agent interacting with external accessibility aids, e.g. a screen reader providing keyboard shortcuts to read portions of the text
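One way to let nested contexts coexist without letting any of them trap the keyboard is sketched below. It is a hypothetical policy, not something any specification mandates: each context (platform, browser, page, embedded player) lists keys it reserves, and reserved keys are claimed from the outermost context inward before any nested context sees them. All other keys go to the innermost (focused) context first and bubble outward until something handles them. The `KeyContext` shape and `dispatch` function are illustrative names only.

```typescript
// Hypothetical keyboard context: platform, browser, content frame, nested user agent, etc.
interface KeyContext {
  name: string;
  reserved: Set<string>;        // keys this context never yields to nested contexts
  actions: Map<string, string>; // key -> command name
}

// `chain` is ordered outermost (platform) to innermost (the context that has focus).
function dispatch(chain: KeyContext[], key: string): string | null {
  // Pass 1, outside in: reserved keys are claimed before any nested context sees them,
  // so embedded content cannot trap the user's usual way out.
  for (const ctx of chain) {
    const action = ctx.actions.get(key);
    if (ctx.reserved.has(key) && action !== undefined) return `${ctx.name}: ${action}`;
  }
  // Pass 2, inside out: the focused context gets first chance, then the key bubbles outward.
  for (let i = chain.length - 1; i >= 0; i--) {
    const action = chain[i].actions.get(key);
    if (action !== undefined) return `${chain[i].name}: ${action}`;
  }
  return null; // nobody handles this key
}
```

With a browser that reserves F6 to leave embedded content, an embedded media player that also binds F6 never receives it, while Space still reaches the player. The inside-out second pass is a design choice that favors the focused content; a user agent that wants to give the user ultimate control could let the user move bindings into the reserved sets.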

Keyboard Commands

There are several types of keyboard commands:

  • Basic keys are simple keys that have dedicated functions (e.g. the A key enters the letter A; the Del key deletes something, be it a character, a file, or the contents of a cell; the Left Arrow key moves something to the left, etc.)
  • Shortcut keys:
    • Access keys on some platforms are part of the visible label that can be used as a quick way to activate and/or navigate to the associated UI element (e.g. in a dialog box, an underlined S on the Save button indicates that you can press S to activate the button if the focus is not in a text input field, or Alt+S to activate it even if the focus is in a text input field)
    • Hotkeys trigger an action without necessarily triggering any particular UI element (e.g. Ctrl+S saves the current document even if there is no Save menu item or it is not visible, and F1 brings up help).

Keyboard commands and shortcuts are particularly important for people with disabilities:

  • Many users with disabilities rely entirely on the keyboard or keyboard emulators because they have difficulty manipulating a mouse, seeing a screen, etc.
  • Users who cannot use a mouse often increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful.
  • Many users rely on accessibility aids that use their own set of keyboard commands, which can conflict with those defined by user agents, add-ins, or content.

While users with disabilities may need more keyboard shortcuts, the possible shortcuts are limited by:

  1. limitations in the hardware and system (e.g. no Command key on Windows or Windows key on a Macintosh, no Theta on non-Greek keyboards, no numeric plus key on compact keyboards),
  2. need to avoid conflicts with keys reserved by the platform and the user agent's user interface (e.g. Alt+Space displays the window menu on Windows, and Command+Space opens Spotlight on OS X),
  3. need to avoid conflicts with keyboard conventions (e.g. if the user expects Ctrl+C to be Copy everywhere on the platform, content and applications can but shouldn't override that to make it trigger a different action),
  4. user agent limits on shortcut characters (e.g. whether the browser limits them to letters, allows compound characters, etc.), and
  5. user agent and format limits on using unmodified keys, different modifier keys, and key sequences (e.g. HTML4 and HTML5 accesskey only allows specifying a single base character)

Ways around these limitations include:

  1. allowing alternate modifiers (e.g. Ctrl+E does one thing, Ctrl+Shift+E another),
  2. allowing key sequences in addition to key combinations (e.g. Ctrl+E,a does one thing, while Ctrl+E,b does another), and
  3. allowing unmodified keys (e.g. F12, or the letter A), although this presents risk for users relying on alternate or automated input, or who have trouble perceiving mode-change indicators while working
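Alternate modifiers and key sequences both need a normalized representation and an incremental matcher, because after the first chord of a sequence the handler must wait rather than fire or discard. The sketch below is one minimal way to do that, assuming chords written as "Ctrl+Shift+E" and sequences separated by commas, as in the "Ctrl+E,a" example above. `normalizeChord`, `SequenceMatcher`, and the string format are hypothetical, and the comma separator means a chord on the comma key itself is not supported here.

```typescript
const MODIFIER_ORDER = ["Ctrl", "Alt", "Shift", "Meta"];

// Normalize one chord so modifier order and letter case don't matter:
// "shift+ctrl+e" -> "Ctrl+Shift+E".
function normalizeChord(chord: string): string {
  const parts = chord.split("+").map((p) => p.trim()).filter((p) => p.length > 0);
  const key = parts.pop() ?? "";
  const mods = parts
    .map((m) => m.charAt(0).toUpperCase() + m.slice(1).toLowerCase())
    .sort((a, b) => MODIFIER_ORDER.indexOf(a) - MODIFIER_ORDER.indexOf(b));
  return [...mods, key.length === 1 ? key.toUpperCase() : key].join("+");
}

type MatchResult =
  | { kind: "complete"; command: string }
  | { kind: "pending" } // input so far is a prefix of a bound sequence; wait for more keys
  | { kind: "none" };

class SequenceMatcher {
  private bindings = new Map<string, string>(); // "Chord,Chord" -> command
  private buffer: string[] = [];

  bind(sequence: string, command: string): void {
    this.bindings.set(sequence.split(",").map(normalizeChord).join(","), command);
  }

  // Feed one chord; an exact match wins over a longer sequence with the same prefix.
  press(chord: string): MatchResult {
    this.buffer.push(normalizeChord(chord));
    const typed = this.buffer.join(",");
    const command = this.bindings.get(typed);
    if (command !== undefined) {
      this.buffer = [];
      return { kind: "complete", command };
    }
    const isPrefix = Array.from(this.bindings.keys()).some((seq) => seq.startsWith(typed + ","));
    if (isPrefix) return { kind: "pending" };
    this.buffer = [];
    return { kind: "none" };
  }
}
```

After binding "Ctrl+E,a" and "Ctrl+E,b", pressing Ctrl+E returns `pending`, and the next key decides which command runs; Ctrl+Shift+E remains a separate single-chord binding.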

We want content and nested user agents to register their keybindings with their host in order to:

  1. allow negotiation to avoid conflicts (e.g. web app changes bindings that conflict with the browser)
  2. allow the user agent and tools to provide enhanced UI (e.g. keybinding lists and reconfiguration)
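A minimal sketch of the host side of that negotiation follows, assuming each component asks for a preferred key plus fallbacks (for instance the same letter with an extra modifier, as suggested above). The host skips keys that are reserved by the platform, restricted by convention, or already granted, and returns the key actually assigned so content can show the real shortcut in its instructions. `KeybindingHost` and its methods are hypothetical names, not an existing API.

```typescript
// Hypothetical host-side registry for negotiating keybindings.
class KeybindingHost {
  private readonly reserved: Set<string>;     // platform/host keys, e.g. "Alt+Tab"
  private readonly conventional: Set<string>; // keys that shouldn't be repurposed lightly, e.g. "Ctrl+C"
  private readonly granted = new Map<string, string>(); // key -> owning component

  constructor(reserved: string[], conventional: string[]) {
    this.reserved = new Set(reserved);
    this.conventional = new Set(conventional);
  }

  // Try candidates in order of preference; return the key granted, or null if none is free
  // (at which point a user agent might ask the user to pick a binding).
  request(owner: string, candidates: string[]): string | null {
    for (const key of candidates) {
      if (this.reserved.has(key) || this.conventional.has(key) || this.granted.has(key)) continue;
      this.granted.set(key, owner);
      return key;
    }
    return null;
  }

  ownerOf(key: string): string | undefined {
    return this.granted.get(key);
  }
}
```

If the browser has already been granted Ctrl+F, a web app asking for ["Ctrl+F", "Ctrl+Shift+F"] receives Ctrl+Shift+F and can label its control accordingly.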

Shortcut Conflicts

It's worth noting that direct activation and direct navigation commands (shortcut keys) come in two flavors, which we'll call:

  • Access keys provide a quicker way to activate and/or navigate to a UI element in your current context, but they should never be the only way.
    • In some contexts accelerator characters may be usable without any modifier or prefix keys (e.g. S to press the Save button in a dialog box if the focus is not in a text input field) and/or with certain modifier or prefix keys (e.g. Alt+S to press the Save button in a dialog box even if the focus is in a text input field). However, neither works when the focus is in another context, such as an active menu.
    • HTML implements this using the accesskey attribute, which can be put on nearly any element.
  • Hotkeys trigger an action without necessarily triggering any particular UI element (e.g. Ctrl+S to save the current document works even if there is no Save menu item or it is not visible, although it still wouldn't work if the focus is in another context such as an active menu).
    • The platform should not change these because there is no UI informing the user of the change.
    • HTML5 implements this with the command element, which lets the author associate an accesskey attribute with any scripted action.

Note: Hotkeys are often called shortcut keys, but I'm avoiding that term because the HTML spec uses it to include both access keys and hotkeys.

How shortcut keys conflict

One difficulty with hotkeys is that you can have multiple sets active at the same time. For example, the browser might define Ctrl+F as Find, while an add-on defines F12 as displaying a particular sidebar, and the active document might define Ctrl+F as moving focus to one of its input fields. On the other hand, hotkeys can be disabled when the focus is in another context, such as when a menu is active.

By contrast, access keys are designed so that, for the most part, only one set can be active at any time. For example, when typing in a word processor the menu bar is visible and only the access keys for its menus work, along with the access keys of controls in the document. However, when you display the File menu only the access keys for its menu items work, and when a dialog box is active, only the access keys for its controls work.
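The "one active set at a time" behavior can be modeled as a stack of contexts, where opening a menu or dialog pushes a context and closing it pops back. The sketch below is a simplified, hypothetical model (`AccessKeyContext` and `ContextStack` are illustrative names): only the topmost context's access keys respond, so the window's menu-bar keys stop working while the File menu is open.

```typescript
// Hypothetical model: each context (window, open menu, dialog) owns its access keys.
interface AccessKeyContext {
  name: string;
  accessKeys: Map<string, string>; // uppercase character -> item label
}

class ContextStack {
  private stack: AccessKeyContext[] = [];

  open(ctx: AccessKeyContext): void {
    this.stack.push(ctx);
  }

  close(): void {
    this.stack.pop();
  }

  // Only the topmost (active) context's access keys respond.
  resolve(char: string): string | null {
    const top = this.stack[this.stack.length - 1];
    if (!top) return null;
    return top.accessKeys.get(char.toUpperCase()) ?? null;
  }
}
```

The model breaks down exactly where the next paragraph says it does: if the window context holds both menu-bar keys and document-control keys, two entries can want the same character, and a plain map has no answer for that.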

Unfortunately this can break down at times, such as when the active window has a menu bar and also contains controls; in those cases the proper behavior is often undefined if the access key for a menu conflicts with that of a control. When an application designer controls both the menu and the window content they can choose access keys that don't conflict, but if the application designer isn't in charge of the window content (e.g. developing a word processor that can show documents with embedded controls) or a content designer isn't in charge of the menu (e.g. authoring a web page that can be viewed in a variety of browsers), it is difficult to avoid conflicts.

Three solutions for conflicts

One solution is to use different modifier keys for application vs. content; Firefox 4 does this by using Alt with access keys in the application but Shift+Alt with access keys in content. This avoids conflicts between application and content, but it's not perfect because users have to learn a new method of invoking access keys and constantly switch back and forth depending on context. It doesn't seem too terrible to make the user learn that content in a particular browser works differently from applications, but it can be a problem if an author can hide the browser's window controls, in which case the window may look like a native application window but still function like browser content.
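The modifier split described above amounts to routing on the Shift state. The sketch below is a simplified, hypothetical version of that routing (`KeyPress` and `activate` are illustrative names, and it ignores the platform differences in which modifiers a browser actually uses); it shows why the same letter no longer collides once application and content keys live in separate tables.

```typescript
// Hypothetical minimal key event; a real handler would read a KeyboardEvent.
interface KeyPress {
  key: string;
  alt: boolean;
  shift: boolean;
  ctrl: boolean;
}

// Alt+key reaches application access keys; Shift+Alt+key reaches content access keys.
function activate(
  e: KeyPress,
  appKeys: Map<string, string>,
  contentKeys: Map<string, string>,
): string | null {
  if (!e.alt || e.ctrl) return null; // not an access-key chord in this scheme
  const table = e.shift ? contentKeys : appKeys;
  return table.get(e.key.toUpperCase()) ?? null;
}
```

Both the browser's File menu and a page's search field can use F, at the cost the text notes: the user must remember which modifier applies to which context.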

Another solution would be for the application to simply not have access keys, but that obviously reduces usability of the application.

Neither of these methods prevents conflicts when using multiple pieces of content (e.g. with iframes) or multiple application components (e.g. a browser and its add-ins).

A third method is to allow access keys to conflict, but ensure that it doesn't make anything unavailable. For example, if there are multiple items with the same access key, pressing the access key can merely move the focus to the next item in that set without actually activating it. The user can then use the access key to quickly navigate between the items, and another key (such as Enter) to activate the one with the focus. This system is how Windows handles conflicts within a menu bar, menu, or form, and so it's already familiar to many users. It decreases efficiency (as things take more keystrokes) and usability (because access keys can behave differently depending on the combination of application, add-ins, and documents), but at least all the functionality is still available and the number of additional keystrokes is usually very small; in that way it's far better than having one access key win and the others be ignored, which could multiply the number of keystrokes required for a task many, many times.
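The third method can be sketched in a few lines, assuming all items from the application, add-ins, and documents are flattened into one list (the `Item` type and `pressAccessKey` function are hypothetical). A unique access key activates its item directly; a shared one only moves focus to the next item with that key, wrapping around, and the user then presses Enter to activate the focused item.

```typescript
// Hypothetical flattened list of items with their access keys.
interface Item {
  id: string;
  accessKey: string;
}

type Outcome = { action: "activate" | "focus"; id: string } | null;

function pressAccessKey(items: Item[], key: string, focusedId: string | null): Outcome {
  const matches = items.filter((i) => i.accessKey.toUpperCase() === key.toUpperCase());
  if (matches.length === 0) return null;
  // Unique key: no conflict, so activate immediately.
  if (matches.length === 1) return { action: "activate", id: matches[0].id };
  // Shared key: move focus to the next match. findIndex returns -1 when focus is
  // elsewhere, so the first match is chosen; the modulo wraps after the last one.
  const current = matches.findIndex((i) => i.id === focusedId);
  return { action: "focus", id: matches[(current + 1) % matches.length].id };
}
```

With three items sharing S, each press of S moves focus one step through them, so every item stays reachable in at most a couple of extra keystrokes, whereas a "first one wins" rule would leave the other two unreachable by access key.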

This method does not work for hotkeys, as they may not have any UI to move focus to, and so there is no way to give users feedback when the hotkey didn't have its expected effect.


The most significant change for keyboard support in HTML5 is the introduction of the command element, which can be used to create menu items with access keys and also to associate a hotkey with some other scripted action.

The introduction of native drag and drop also provides an opportunity to improve keyboard emulation of this activity for cases where the content doesn't implement an equivalent mechanism for the keyboard.