- Important note: This Wiki page is edited by participants of the User Agent Accessibility Guidelines working group (UAWG). It does not necessarily represent consensus and it may have incorrect information or information that is not supported by other Working Group participants, WAI, or W3C. It may also have some very useful information.

HTML5 review by UAWG notes

From WAI UA Wiki
Jump to: navigation, search

This page serves as a repository for last call accessibility issues discovered in the HTML5 Draft. This page will be edited and submitted to the overall WAI HTML5 comments.


NOTE: PF Wiki for HTML5 tracking

New Attributes on <input>: pattern and placeholder (This section in progress by mhakkine)

HTML5 includes new input types and attributes, such as type="tel", pattern, and placeholder. These new types and attribute have usability and accessibility implicates, and guidance within the HTML5 specification is at times contradictory.

The pattern attribute

The pattern attribute specifies a regular expression against which the control's value, or, when the multiple attribute applies and is set, the control's values, are to be checked.


The pattern attribute value, in regular expression notation, is not suitable as a hint or description for end-users. The HTML specification describes methods for presenting the pattern to end-users:

When an input element has a pattern attribute specified, authors should include a title attribute to give a description of the pattern. User agents may use the contents of this attribute, if it is present, when informing the user that the pattern is not matched, or at any other suitable time, such as in a tooltip or read out by assistive technology when the control gains focus.

The specification goes on to state:

When a control has a pattern attribute, the title attribute, if used, must describe the pattern. Additional information could also be included, so long as it assists the user in filling in the control. Otherwise, assistive technology would be impaired.

For instance, if the title attribute contained the caption of the control, assistive technology could end up saying something like The text you have entered does not match the required pattern. Birthday, which is not useful.

UAs may still show the title in non-error situations (for example, as a tooltip when hovering over the control), so authors should be careful not to word titles as if an error has necessarily occurred.

The lack of a keyboard accessible mechanism for displaying title content within all major UAs prevents keyboard users from accessing the pattern description. The statement "Otherwise, assistive technology would be impaired" fails to address the actual accessibility implications.

The use of the title, implied in the second paragraph above, as text for an error message, implies processing by the UA or assistive technology, and is in effect a special casing of the title attribute.

The definition of the placeholder attribute also contradicts the recommendation of using title.

User understandable rendering of Element Names and Roles

The addition of new semantic elements and roles within HTML5 implies that the elements must be understandable by users of assistive technologies. Authors can use a variety of techniques to supply human readable labels that name or describe the element. In cases where the UA does not identify an authored label, the UA is suggested to announce the element name. For example, the <details> element has an optional <summary> element that provides a label for or summary of the content of the detail. If the summary element is not used, the HTML5 specifications states:

The first summary element child of the element, if any, represents the summary or legend of the details. If there is no child summary element, the user agent should provide its own legend (e.g. "Details").


To support internationalization, semantic element names or roles, reported by the UA in lieu of an author specified replacement text, must be localized by the UA prior to any visual rendering and before passing to the Accessibility API.

Miscellaneous Comments on Specification (or applying Principle 3, Understandable)


checkedness is used 44 times in the HTML5 specification, but not defined. What is the definition of checkedness?

Global Attributes (Questions and Concerns)

Simon Harper 08:06, 17 June 2011 (UTC)

The following are notes and concerns for primarily UAWG but with notes to ATWG and HTML5 if the PFWG decide they can be integrated.

3.2.3 Global attributes

  • accesskey - By only allowing an accessskey for be one unicode character we remove the possibly of sequenced entry such as Alt+F S for file save. Also should we have a Web Key (think Windows or Apple key) which give focus to accesskeys first before chrome? Myself and mhakkine have had an email discussion about this which should inform any decision titled 'HTML5 Sanity Check - before it goes to the Wiki' http://lists.w3.org/Archives/Public/w3c-wai-ua/2011AprJun/0068.html
  • Class - looks OK
  • contenteditable - Does, contenteditable equal to true, mean that the content/UA now needs to conform to ATAG?
  • contextmenu - typo 'by the invoking the'; Does the 'show' event modify the DOM (cannot find a definition) - if not how will AT 'see' contextmenu?
  • dir - looks OK
  • draggable - looks OK
  • dropzone - looks OK
  • hidden - '— if something is marked hidden, it is hidden from all presentations, including, for instance, screen readers.' Do we need the ability to see a hidden presentation if the user so wishes?
  • id - looks OK
  • lang - looks OK
  • spellcheck - looks OK
  • style - looks OK
  • tabindex - looks OK but 'an element that is only focusable because of its tabindex attribute will fire a click event in response to a non-mouse activation (e.g. hitting the "enter" key while the element is focused' - seems right but thoughts?
  • title - 'If this attribute is omitted from an element, then it implies that the title attribute of the nearest ancestor HTML element with a title attribute set is also relevant to this element. Setting the attribute overrides this, explicitly stating that the advisory information of any ancestors is not relevant to this element. Setting the attribute to the empty string indicates that the element has no advisory information.' Do we need to say the nearest ancestor HTML element with a title attribute set is automatically used as the uaag repair default? Myself and Jim Allan have had an email discussion about this which should inform any decision, again, titled 'HTML5 Sanity Check - before it goes to the Wiki' http://lists.w3.org/Archives/Public/w3c-wai-ua/2011AprJun/0069.html

Global accessibility settings

Added by Greg Lowney 21:13, 12 July 2011

Issue: Should the HTML5 spec define a standard, platform-independent way for content to query the user agent's accessibility settings, and by extension platform settings that are known to the user agent? Are there any equivalents today?

Use case: All major operating systems support a "high contrast" mode that tells software the user wants high contrast between foreground and background. Yev turns on this option, and in his browser he loads a web-based flow chart editor that displays all its document content in an HTML5 canvas element. The flow chart editor wants to detect when the user has high contrast mode turned on so it can adjust its graphical display appropriately. Because it's designed to run on any browser and any operating system, it needs a platform-independent means of querying this setting.

Use case: Kevin has turned on the "Show Extra Keyboard Help" option in the Windows Control Panel, which tells all software that he wants any and all options that enhance keyboard access to be automatically enabled. His web browser responds to this setting by, for example, always displaying the underlined access keys in menu and control labels. He would like web pages and web apps to also respond to this setting, even if they're creating custom controls.

Keyboard Use Cases and Recommendations

Added by Greg Lowney 19:21, 6 July 2011 (UTC)

  • Complete and efficient keyboard access is critical for accessibility.
  • We examine high-level things that web protocols and formats can do to enable good keyboard UI:
    • Let users accomplish any task using the keyboard alone
    • Let content coexist and adapt to a wide range of user agents, browser add-ins, nested user agents, other content, and assistive technologies
    • Let the user take advantage of the widest range of keyboard commands and shortcuts
    • Provide the information needed to enable a wide range of keyboard features
    • Let the user retain ultimate control of their experience
    • Protect the user from badly behaved content and nested user agents
  • We present a number of specific topics with use cases, issues, and recommendations, as well as topics that have no clear recommendations.
  • Some of these are already covered by the latest HTML5 draft, while others are not.

Background and concepts used here are discussed in a separate page, Keyboard Concepts for HTML5 Discussion.


Sequential navigation to all elements that take focus or input

Users need to be sure they can explore and find all focusable and actionable elements, even if they cannot use a mouse.

  • Use case: Laurie is tabbing through a dynamic web page, but finds that there are certain buttons she cannot reach because the author, thinking only of mouse users, has specified that the buttons should not be included in the tab order by setting tabindex to a negative number. Therefore Laurie, who relies entirely on keyboard input, cannot access some functionality on the page.
  • Use case: Laurie is using a web page that contains a custom control, an image that does not take keyboard input or focus but does have an onClick handler. Therefore Laurie, who relies entirely on keyboard input, cannot click on the element to activate it, and even though her browser provides a context menu that would let her activate the image's onClick event, it does not let her move focus to it because that would violate the HTML5 specification.
  • Recommendation: Specification should explicitly state that user agents are allowed and encouraged to provide modes or commands that let the user move focus to all elements that take focus or input, even if the author has indicated that the element should not normally be included in sequential navigation, and even if the element takes input (e.g. has an onClick handler) but lacks other attributes that would normally render it focusable.
  • HTML5 Status: The current HTML5 specification (7.3.1 Sequential focus navigation and the tabindex attribute) says that if the tabindex value is a negative number, "The user agent must allow the element to be focused, but should not allow the element to be reached using sequential focus navigation." I'm not sure whether the use of "should" rather than "must" means that the user agent is allowed to include these in sequential navigation, or whether it is still forbidden. Likewise the spec says that if the tabindex value is a zero the user agent "must" allow the element to be focused, but only "should" allow the element to be reached using sequential focus navigation; if the user agent doesn't provide the ability to sequentially focus all elements that take input, then the spec should be changed to read that elements with tabindex of zero "must" be included in sequential navigation.

Navigation to and through non-editable content

  • Issue: Is there anything that HTML5 should do to facilitate this feature? I haven’t thought of any. It seems like the user agent can do everything it needs without any explicit support in the source language.
  • Use case: Wayne needs to select and copy some content from a Web page. Pressing the Tab key would normally move the focus between controls, links, frames, and the browser UI, but it would not stop at blocks of read-only text and images. For this task Wayne turns on his brower's "caret browsing mode," which adds each block of read-only content to the tab order. He can then move focus to the appropriate block, move the text cursor through it, select a range, and copy it to the clipboard or invoke its shortcut menu, all using the keyboard.

Preventing validation from trapping focus

  • Use case: Svetlana is tabbing through the controls on a form and lands on a field that expects a telephone number, but when she tries to tab away the user agent puts up an error message saying that a valid telephone number is required. Even though she had no intention of completing the form, she is stuck until she makes up and enters a telephone number.
  • Use case: Etta brings up a web form showing her account information. As she tabs between the fields she lands on one containing her current password. Unfortunately, the security requirements for this web site have recently been increased and her password is no longer considered secure enough, so even though she's only tabbing through the fields, the current value fails to validate, and she is prevented from tabbing on with its current value or by clearing the field.
  • Recommendation: Any field that validates input should allow the user to exit the field. At minimum, allow them to exit with the field being empty or retaining its initial value.
  • HTML5 Status: Unknown

Reading and navigation order

Authors should be able to specify preferred direction and/or order for sequential navigation, even among things such as tables that would not normally have a tab order.

  • Use case: Masahiko is reading a web page, and uses browser commands to move the text cursor to the next and previous paragraphs. In most cases this works fine because the suggested reading order is that in which elements occur in the HTML. However, when Masahiko encounters a table that is designed to be read down the columns rather than across the rows, this simplistic navigation is entirely inappropriate. A similar problem occurs when CSS is used to rearrange blocks of text on the screen.
  • Recommendation: Allow marking up a table to indicate whether the preferred reading order is by columns, by rows, both, or neither. This could be done with a new attribute, such as orientation="columns".
  • Recommendation: Allow marking up any element with a reference to the logically next and/or previous elements, for use when those are not the next/previous elements in the source. This could be done with new general attributes, such when a browser was implementing caret browsing and the caret moved beyond the end of a paragraph marked up with next="story5", that attribute would be a hint to the browser to move the caret to the element with id="story5" rather than to the element that follows the paragraph in the HTML source.
  • HTML5 Status: HTML5 allows the user to mark up tables with column and row headings, but not specify whether the table's preferred reading order is by rows or by columns. HTML5 also provides the tabindex attribute to specify order, but it is optimized for a relatively small number of controls, and applies to controls that normally take keyboard focus but not to anchors or to text and other elements that are subject to navigation in caret browsing modes.

Facilitate navigating related pages

  • Use case: Jason wants to reduce the number of keystrokes he enters, so when reading web sites he doesn't want to have to tab through all the links and controls on a page just to use the link that takes him to the next page. Instead, he uses his browser's keyboard commands that load the next, previous, first, last, and main pages based on the link elements in the current page's header. Unfortunately, a significant number of site—including some major news sites—fail to provide these elements, so Jason is forced to tab his through their pages.
  • Recommendation: Provide a way to mark up a link or control to identify it unambiguously as leading to the next, previous, first, last page, etc. This could be an attribute on a link (e.g. rel="next") or other mechanism. Even though this would be redundant to the existing link elements, it may increase the number of sites that support automated navigation shortcuts.
  • HTML5 Status: HTML5 doesn't add anything beyond the existing link elements.

Shortcut Keys

Shortcut keys consist of both access keys (e.g. S or Alt+S to activate the Save button, or Alt+F to activate the File menu) where the user agent is trying to emulate behavior of the platform, and hotkeys (e.g. Ctrl+S to trigger the "save" action, Ctrl+Shift+S to trigger the "save as" action, or Ctrl+C to trigger the "copy" action). See detailed discussion above.

We want content and nested user agents to register their keybindings with their host in order to:

  1. allow negotiation to avoid conflicts (e.g. web app changes bindings that conflict with the browser)
  2. allow the user agent and tools to provide enhanced UI (e.g. list and/or modify keybindings)

Negotiating shortcut keybindings

There should be a mechanism for components (user agents, documents, web apps, embedded objects, accessibility aid, etc.) to negotiate which keyboard commands will be used by one or the other.

  • Why is this an accessibility issue:
    • Users with disabilities are much more likely to rely on keyboard access. For them, keyboard conflicts might present insurmountable barriers, while they'd only be minor inconveniences for users routinely using the mouse.
    • Users who cannot use a mouse often need to drastically increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful. Increased number of shortcuts increase the number of potential conflicts.
    • Users with some cognitive impairments have more difficulty adjusting when their accustomed methods suddenly fail to work, or when commands they use suddenly do something unexpected.
Negotiation between host and embedded object
  • Use case: An embedded object uses Shift+Esc for one of its control functions, but it's run on a browser that uses this same key combination as the method for returning focus to the browser. In the simple cases, either the user wouldn't be able to exit the object using the method they're familiar with, or else they couldn’t use some function in the object because the keystroke would exit the object instead.
  • Recommendation: I'd say that it is important that the user have a consistent way to exit all embedded objects, because without this they can become effectively trapped; even if there is a way out, the user may not know it or be able to look it up when needed. The embedded object needs to be able to determine that on this particular browser it cannot use a particular command (e.g. Shift+Esc) and adjust its command set, user interface and instructions accordingly. If the user does exit the embedded object using the host's command, the host should inform the embedded object so that it can "clean up" and handle the action gracefully.
  • HTML5 Status: Unknown
Negotiation with nested hosts
  • Use case: Pablo is used to pressing Ctrl+W to close a browser window. In his browser he's reading a page that contains an embedded user agent, and while browsing in THAT context he presses Ctrl+W to close the window. Unfortunately, the script being run by the inner user agent did not know that Ctrl+W was used by the outer user agent, so it grabs and consumes the keyboard input and carries out some action, so Pablo is unable to use his accustomed method to close the outer browser's window.
  • Recommendation: There should be some way for embedded objects, including nested user agents, to determine which shortcut keys are being used all the layers hosting it, so it can modify its own shortcut keys to avoid conflicting with them.
  • HTML5 Status: Unknown
Negotiation between host and content
  • Use case: Pablo is used to pressing Ctrl+F to find a string on the current page. However, an online encyclopedia grabs Ctrl+F and moves the focus to its own text input fields that's used to search the entire encyclopedia.
  • Recommendation: The script on the page should be able to query whether Ctrl+F is already assigned to something in the system (the host browser, a browser add-in, etc.), and if it is it can identify a new, unused keyboard input, map that to its control, and incorporate that into the instructions it presents to the user.
  • HTML5 Status: HTML5 allows the content to provide a list of suggested Unicode characters, but the user agent gets to decide on the actual key assignment, including base character and modifier keys. The content script can retrieve a user-friendly string representing the assignment. (That's enough for this use case but is too limited for some of the others.)
Disabling unmodified keys as shortcuts

Some components use unmodified letters, numbers, and punctuation marks as keyboard command. This can be handy for users who want to make keyboard input as efficient as possible, including some users with disabilities, but for other users with disabilities it can be a significant problem because everyday text input can trigger a large number of seemingly random actions if it's entered in the wrong context. Therefore user agents should be permitted to make this available as a user option.

  • Use case: Tom uses speech recognition to input text and commands, and he's working in a Web-based word processor while in the background another Web app or browser add-in is downloading a large file. He's in the middle of dictating a letter the background task steals the activation to notify him that the download has completed. Suddenly the text that was supposed to go into a letter is interpreted in the new context as dozens of commands. Tom looks at the browser and finds that his project in that context has been altered or deleted altogether, and also that display options have changed and he has no idea what command would server to restore them.
  • Recommendation: Tom goes into his browser's preference settings and clears a check box to disable the use of unmodified keys as commands and shortcuts. When the Web app starts up, it asks the browser whether the letter "u" is available as a shortcut and is told that it is restricted by policy. Therefore the app goes down its list of preferred keystrokes, determines that Ctrl+U is available, and configures itself to use that instead. It may even display an indicator on its status bar warning the user that non-default keyboard commands are being used. The user can then go into the app's configuration screen to find out the current keybindings.
  • HTML5 Status: Unclear. The example in 7.4.2 shows that a user agent can use a key unmodified, but 7.4.3 merely says the user agent can assign its choice of "a combination of modifier keys", but does not specify whether no modifier key is a valid option.

Retrieving actual keybindings

If the author can only suggest keyboard shortcuts using accesskey, how can they provide instructions to the user?

  • Why is this is an accessibility issue?
    • Many users with disabilities rely entirely on the keyboard or keyboard emulators because they cannot physically manipulate a mouse, or see the screen, etc.
    • Many users have difficulty memorizing keybindings, and this is more extreme for users with some cognitive impairments.
    • Memorizing keybindings is more difficult when they are likely to be adjusted to avoid conflicts, and such adjustments are more common among people with disabilities. Users who cannot use a mouse often increase the number of shortcut keys in order to make tasks more efficient, especially people for whom each keypress is time-consuming, tiring, or painful, and many users rely on accessibility aids that use their own set of keyboard commands.
Retrieving user-friendly keybindings
  • Use case: In a web browser, Aaron views a Web page that has a button with accesskey="E". The author wants to incorporate instructions on the page or popup that explain the keyboard commands, but unfortunately they can't predict what keybinding the browser will use: the accesskey attribute is merely a suggestion, and the actual value will vary depending on both the browser and the platform it's running on.
  • Recommendation: HTML5 defines the new command object with property called assignedaccesskey. To solve this we could define a new property, accessible through the DOM, that returns the actual keybinding associated with an element. In one potential implementation, the page script retrieves the anchor's accesskeystring= value, which Firefox 4 would set to "Shift+Alt+E", but Internet Explorer would set to "Alt+E", Opera would set to "Shift+Esc, E", Konqueror would set to "Ctrl+E", Safari 4 on MacBook Pro would set it to "control+E" but on Windows would set it to "Alt+E", etc. The script can then insert this string into the instruction paragraph on the page so that on Firefox it would read "To delete, press Shift+Alt+E", but on Internet Explorer it would read "To delete, press Alt+E", and so forth.
  • HTML5 Status: It looks like HTML5 defines a new command element with a programmatically-retrievable accessKeyLabel attribute, whose value the user agent calculates based on its accesskey attribute. The accessKeyLabel string is presumably human-friendly, but no specific guidance or examples are provided. (This also means that the string is cannot be parsed by the script, as discussed below.)
Retrieving machine-friendly keybindings
  • Use case: Aaron asks his web browser to display a list of the currently active keyboard shortcuts. In the draft HTML5 specification it would do this by enumerating the command elements and retrieving each one's accessKeyLabel property. The user agent wants to make the list more useful by offering views sorted and organized in different fashions, including a view that includes separate groupings for unmodified keys, Ctrl key combination, Shift key combinations, and Ctrl+Shift key combinations. Unfortunately, it cannot easily parse the accessKeyLabel string, as it knows neither the names the user agent will use for modifier keys (e.g. "Ctrl", "control", "?", etc.) nor what method will be use to concatenate them (e.g. "Ctrl+a", "Ctrl/A", "^A", "A+Control", "?A", etc., all of which may depend on the platform, user agent, language, and/or locale).
  • Recommendation: I see two reasonable approaches. The first, analogous to that used in most native programming environment, would be for a property to return a programmatic representation of the keybinding, such as a list of codes representing keys and their modifiers (e.g. an integer representing the F5 key along with a mask of bits representing Alt and Shift modification). However, this should not be a substitute for returning a user-friendly string representation, as scripts script could have a difficult time creating one from the programmatic representation. The machine-friendly version could either be returned by an element using a property analogous to accessKeyLabel, or the user agent could be required to provide a function that converts the string returned by accessKeyLabel into a machine-friendly form. The second approach would be to have a different function analogous to accessKeyLabel that returned a string compounded from strings that are language-, locale- and platform-neutral (e.g. "Ctrl+A" for the combination of the "A" key and the equivalent of the control key, even in environments where the latter is referred to as "control").
  • HTML5 Status: This is not supported; currently only a human-friendly string is returned (via the accessKeyLabel property).

Maximizing potential keyboard shortcuts

Specifying detailed hotkeys

Today's accesskey attribute is a single character, and it's up the browser to decide which whether that character will be used or another substituted, and what modifiers if any are required (e.g. accesskey="s" might map to Ctrl+S on one environment and Alt+Shift+S on another). That may make sense for access keys (e.g. S or Alt+S to press the Save button) where the user agent is trying to emulate behavior of the platform, but it is an unnecessary limitation for hotkeys (e.g. Ctrl+S to trigger the "save" action, or Ctrl+C to copy the current record). (See detailed discussion of access keys vs. hotkeys above.)

When defining hotkeys developers of Web apps and authors of documents should be able to assign more specific keys, such as actually requesting Ctrl+I for one a frequently used command and Ctrl+Shift+I for another less frequently used. This would let them assign shortcuts in a way that is more meaningful, both in terms of grouping and being easier to remember. For example, a page for reading email could assign shortcut Ctrl+R to the Reply button and Ctrl+Shift+R to the Reply All button, Ctrl+J to the Forward button and Ctrl+Shift+J to the Forward As Attachment button, and so forth. (Of course any specific key combination may already be used by the host or another component in the system, and some keys may not be available on the user's system, so the browser would either treat these as hints to be modified as needed or else the page's script could negotiate with the user agent as described elsewhere in this document.)

  • Use case: Roger is using a Web-based email client that has a row of buttons for things he can do to the selected message. The author wanted to make keyboard usage as efficient as possible and minimize the number of keystrokes that users such as Roger need to enter, so they assigned easily typed keyboard inputs to the most commonly used commands and more complex inputs to less frequently used commands. In this case, it assigns the shortcut Ctrl+R to the Reply button and Ctrl+Shift+R to the Reply All button, Ctrl+J to the Forward button and Ctrl+Shift+J to the Forward As Attachment button, and so forth.
  • Recommendation: Facilities to establish and adjust keybindings should allow content, add-ins, and the user to bind commands to specific key combinations, rather than merely specifying a single base character. HTML would let the author specify preferred keyboard inputs for accesskey and the like including recommended base keys and modifiers. For example, accesskey="ctrl+shift+i", using standardized, non-locale-specific names for keys and modifier keys. There should also be way for the author to request that a key be unmodified, as discussed below in the section "Unmodified Keys as Shortcuts". These recommended keyboard inputs would be modified by the browser to accommodate impossible, reserved, restricted, or conflicting inputs. For example, when an HTML5 command element is used to create a menu item and the user agent wants to emulate native access key behavior on the Windows platform, it would limit the access key to a single character that could be used alone or with the Alt key depending on the situation, but when a command element is used to establish a hotkey for a scripted action the user agent could allow a wider range of modifiers and/or sequences.
  • HTML5 Status: Currently the author can only specify a single, unmodified Unicode code point, and the user agent assigns any combination of modifier keys it chooses with no further hinting from the author.
Sequences as shortcuts

Allowing key sequences to be used as commands greatly increases the number of keyboard commands that can be used, as well as the ability to make these command mnemonic.

  • Use case: Jemiah is a keyboard user who wants to make her work as efficient as possible. She configures her Web-bases word processor to set up shortcut keys for the dozens of the built-in commands and macros she uses frequently, especially those that would normally take many keystrokes to carry out. However, she quickly runs out of good keystrokes, both because of the sheer number and because she needs to keep them mnemonic in order to easily remember them. Therefore she uses key sequences so she can group related commands together with the same prefix. For example, she uses Ctrl+R as the prefix for all the commands dealing with revisions, with Ctrl+R followed by S to show revisions, Ctrl+R followed by H to hide revisions, Ctrl+R followed by A to accept revisions, Ctrl+R followed by R to reject revisions, and so forth.
  • Recommendation: Facilities to establish and adjust keybindings should allow content, add-ins, and the user to bind commands to key sequences as well as single keys and key combinations. HTML would let the author specify preferred keyboard inputs for accesskey and the like including key sequences, such as accesskey="ctrl+r a" for Ctrl+R followed by the letter A.
  • HTML5 Status: Currently key sequences are not supported as shortcuts. [7 User interaction — HTML5 http://www.w3.org/TR/2011/WD-html5-20110525/editing.html#assigned-access-key: If the value is not a string exactly one Unicode code point in length, then skip the remainder of these steps for this value.]
Enabling unmodified keys as shortcuts

Some components use unmodified letters, numbers, and punctuation marks as keyboard command. This can be handy for users who want to make keyboard input as efficient as possible, including some users with disabilities, but for other users with disabilities it can be a significant problem because everyday text input can trigger a large number of seemingly random actions if it's entered in the wrong context. Therefore user agents should be permitted to make this available as a user option.

  • Use case: Reggie finds it difficult to press key combinations, and even though many platforms provide the StickyKeys feature that lets her simulate them, she wants to author her own web site so that the shortcuts on the pages are accessed with single keys rather than key combinations.
  • Recommendation: See section on Specifying Detailed Shortcuts.
  • HTML5 Status: See section on Specifying Detailed Shortcuts.
Allowing non-character keys as shortcuts

Allowing non-character keys such as F12 and Del to be used as shortcuts significantly increases the number of keys and key combinations available.

  • Use case: Jeanine is developing a Web app and because most of the normal character keys are taken, she wants to have the F12 key activate the "Exit" link on the page, and Shift+F12 activate the "Save and Exit" button.
  • Recommendation: Provide a way to use keys that cannot be represented as a single Unicode code point. For example, define keywords such as "del", "num-del" and "f12", and allow their use in the accesskey attribute along with normal Unicode characters (e.g. accesskey="f12 del d").
  • HTML5 Status: Currently non-character keys such as F12 cannot be assigned as accesskey shortcuts, as it can only be keys equating to single Unicode code points. 7.4.3 Processing model

Shortcuts for Navigation

Navigation shortcuts without visible links
  • Use case: Jeanine is creating a web page, and wants to define a shortcut that would assist users with disabilities by moving the keyboard focus and point of regard to a specific bookmark in the page. However, she doesn't want to have a link to that location be visible on the page.
  • Recommendation: It should be possible to define a keyboard shortcut for the sole purpose of navigation, rather than activation. It should be possible to target essentially any element in the document, and the command should be defined as navigating to without activating the element.
  • HTML5 Status: Unlike HTML4, HTML5 allows accesskey on all elements, and that it creates "a keyboard shortcut that activates or focuses the element". However, it leaves the actual behavior undefined and thus up to the user agent, meaning that a user agent could implement it so that the keyboard command does nothing if the element does not take input. This should be clarified.
Distinguishing activation from navigation shortcuts

HTML4's specification for accesskey says "Pressing an access key assigned to an element gives focus to the element. The action that occurs when an element receives focus depends on the element." In actuality, whether the accesskey moves focus to and/or activates an element varies from one user agent to another. This means the same content or web app behaves differently in different browsers, and in some cases functionality may not be available at all.

  • Issue: Is it worthwhile to define a mechanism whereby authors would want to be able to specify separate shortcuts for activating vs. navigating to an element, or one but not the other? Or do we merely expect user agents to provide some navigation mechanism for activation and navigation without an input from the content?
  • Issue: Should the user be able to activate a button and afterward have the focus remain (or be restored to) where it was? Would this be the author's choice and/or the user's?
  • Use case: Juan is viewing a web page which has a link to a document, and assigns an accesskey to a link. In one browser, Juan can press the accesskey to move the focus to the link, then press a key to display the link's shortcut menu, then select the command to show him the destination and title attributes of the link. However, on one browser pressing the accesskey activates the link, taking him away from the page he was on, so he cannot use the accesskey to navigate to the link, instead having to press the tab key a dozen times.
  • HTML5 Status: The HTML5 spec provides even less guidance than that for HTML4, saying merely "The accesskey attribute's value is used by the user agent as a guide for creating a keyboard shortcut that activates or focuses the element."
User choice between navigating and activating
  • Use case: Roger is using a Web-based email client that has a row of buttons for things he can do to the selected message. Roger can flag a message by clicking the Flag button or pressing the button's shortcut key. When he does either, the focus is left on the button, so after doing this he almost always has to navigate back to the message list before he can use the arrow key to select the next message. He would much rather have focus remain on the message, or move to the next message after activating the Delete button.
  • Recommendation: Let the author specify separate keyboard shortcuts for activating and navigating to an element. For example, HTML could replace or supplement the accesskey attribute with separate activationkey and navigationkey attributes; pressing the former would activate the element without moving focus, while pressing the latter would move the focus to the element without activating it, and if the values were the same then pressing the key would move focus to and activate the element.
  • HTML5 Status: Currently it appears that command elements can only be used for activation, not for navigation. It is implied, but not explicitly stated, that the target control should be activated without moving the focus.
Facilitate organizing keyboard commands

Added by Greg Lowney 04:39, 28 July 2011 (UTC)

Defining generic commands that have associated keybindings is an extremely powerful mechanism that lets user agents give the user control over keybindings. One use of this is to automatically generate documentation for the user providing a reference and guide to the keyboard commands as they're actually configured. However, a long, unorganized list of keybindings, while better than nothing, is still extremely difficult to use. This could be much easier if the user agent (or tool) could organize the list, as well as allowing the user to filter and navigate it intelligently. This would be possible if the author could provide hints with each command, such as recommended categories or keywords.

Use case: Carlos relies on the keyboard, and command keybindings are very important way for him to perform tasks efficiently. He is using a web-based application, and asks his web browser to present a list of all the commands defined by the web-app, which he can consult and print out for future reference. The browser has already processed all the commands defined in the HTML source, including those created by interactive elements with acccesskey as well as those command elements that associate an action with a keyboard input. Unfortunately, this list is very long, especially if it's combined with the browser's own commands. Luckily, the browser's dialog box contains buttons for sorting the list alphabetically or by category (e.g. commands relating to tables, commands relating to view options, commands for formatting, etc.), and it's able to do that because the author was able to supply a user-friendly name and category (or keywords) for each command. When Carlos wants to look up a command but doesn't know the name assigned to it, or wants to look up a bunch of related commands, he can use the category view, just like those provided in printed software user guides. When he already knows the official name of the command, he can use the alphabetical view to find it quickly. Note that this is particularly important when Carlos moves between different user agents that assign different keybindings to the author's commands.

HTML5 Status: HTML5 lets the author provide a user-friendly name for command elements, but

Recommendation: HTML5 should allow the user to associate a category or, even better, multiple categories or keywords, with each author-defined command. My preferred method would be to allow a keywords or tags attribute on any element that can be used to define a command (that is, all the methods described in section 4.11.5 Commands), as discussed in more detail elsewhere in this document (see 11 Standard pieces of information should be automation-friendly). There would also be value in letting the author define a primary keyword or category, but I think that's less critical than allowing multiple keywords or categories.

Emulating non-keyboard operations

Simulating drag and drop
  • Use case: June is using interacting with Web page that uses HTML5's drag and drop facilities. For users like June who can't use a mouse, the web browser provides a keyboard mechanism that lets her carry out all the drag and drop operations using only the keyboard, including not only the normal drag and drop but also the behavior of drag and drop modified by various modifier keys. June presses the tab key until the focus is on the element she wants to drag (and she may have had to turn on a special mode to add drag sources to the tab order), and from its shortcut menu chooses the "Drag and drop" submenu, then "Select for Shift + drag and drop". She then presses the tab or an access key to move the focus to the drag target, and from its shortcut menu chooses the "Drag and drop" submenu, then "Drop". The browser sends the proper commands to the page's script to simulate all stages of the process, including triggering dragstart, ondrag, ondragenter, ondrop, etc. events.
  • Issue: Does HTML5 provide everything it needs to allow user agent to enable this type of feature?
  • Issue: Can the content tell the user agent what commands are supported, e.g. drag for move, shift+drag for copy? Can there be a way to register these?
  • HTML5 Status: Unknown.

Other Keyboard Issues

  • Handling shortcut conflicts. While we can define mechanisms to let dynamic content, add-ins and the like negotiate to try to avoid conflicts, they are likely to still occur at times. With HTML4 and below, the behavior in such circumstances is left undefined, and consequently is handled differently by different user agents and so cannot be planned for.
    • Issue: Should the author be able to suggest ways to handle shortcut conflicts? If an author could specify that pressing a shortcut would move focus sequentially between the elements that have that accesskey, and wrap, they author could design keyboard UI to take advantage of this in a way that is not otherwise possible. On the other hand, issues such as whether sequential navigation wraps should ideally be under the user's control, since wrapping is a significant advantage to some users (e.g. those who need to minimize the number of input commands) but a disadvantage to others (such as those who may not be able to tell when wrapping has occurred).
  • Partially downloaded content. What if a web page pulls down content as needed (e.g. Google Maps), and elements in the content may have keyboard shortcuts, but some of those elements are in portions of the content that won't be downloaded until needed?
    • Issue: Should the Web page be able to download a complete list of the keyboard shortcuts associated with elements in the content, and have host user agent notify it when those keys are pressed so that it can download and present the corresponding portion of the content? Is this necessary, or since the as-needed downloading is handled by scripting, is it simply the script's responsibility to handle these shortcuts as well?
    • Use case: A web page displays a list of tens of thousands of names in alphabetical order, with a heading for each initial letter, and a shortcut for each heading. Rather than downloading the entire list, it wants to download portions of the content only as they're needed, in response to the user scrolling the window, moving the text cursor through the content, or pressing a shortcut key associated with the headings.
  • Presenting access keys to the user.
    • Issue: Can there be ways to automatically incorporate the shortcuts into the way content is presented? For example, many GUIs underline the shortcut key in a text label, but that doesn't work with browsers that underline all link text. Do most browsers still not present accesskeys defined by the element? Is there anything that could or should be done in the protocols for formats to assist with this?
    • Recommendation: Where the label element is described in the HTML5 spec, it should specifically discuss and recommend use of the accesskey attribute, and give an example of how a user agent may present not only use this value but also present it to the user. For example, when displaying label text a user agent could underline the first occurrence of the accesskey character in the displayed label text (called an implicit designator), or if the character does not appear in the label text, it could append a space and the underlined accesskey character in parentheses (called an explicit designator).
  • Distinguishing between access keys and hotkeys
    • I'm concerned that perhaps HTML5 should distinguish more clearly between access keys, hotkeys, and menu items. If they should behave differently, should the same element (command element) really be used for both? Even if the user agent can distinguish between them, treating them differently, will it confuse authors?

Command element and images should support different resolutions

Added by Greg Lowney 22:40, 27 July 2011

Why allow multiple icons in different resolutions for a page icon but not for a command element icon or an img element?

Use case: Todd is working on a high resolution monitor but has moderately low vision, so he uses his browser's zoom setting so that he can read the text easily and discern the details of images. He goes to a web page that includes multiple buttons implemented using command elements, each represented by an icon. Unfortunately, since HTML5 only allows a command element to link to a single image, the browser has to use brute force methods to enlarge the image, with a result that's blocky and difficult to understand. If the author had been able to link to multiple versions of the image, each optimized for different screen resolutions or sizes, Todd could have been presented with graphical buttons he could understand.

Recommendation: HTML5 should allow the author to link a command element to multiple images optimized for different screen resolutions and zoom ratios.

It would also be beneficial to allow static images (e.g. the img element) to also link to multiple image files optimized for different resolutions, so that pages can better adapt to different screen resolutions and zoom ratios without having to resort to complex scripting. However, it could be argued that there are different priorities for the relative priority of applying this to command elements--which the user always has to identify to interact with them--and possibly static img elements

Facilitate grouping of menu items

Added by Greg Lowney 23:51, 27 July 2011

The description of how user agent should or must handle the menu element and its contents might be overly prescriptive. There are cases where the user agent should be allowed to adjust the presentation, particularly for different screen resolutions or navigation methods, that would require going against the prescribed behaviors. It could also do more to facilitate hierarchical presentation and navigation by allowing labeled groups of items.

Use case: Nadia is blind and using a web browser with a screen reader. The document contains a menu structure created with the HTML5 menu element, and it includes some very long menus with many groups of menu items separated by horizontal rules into various groups or sections. As Nadia uses the down arrow key to navigate through the menu items, she has to pause for each one to be read to her, so traversing a long menu takes a long time and a lot of effort. She would prefer to have the menu presented to her in hierarchical fashion that uses progressive disclosure, so she could navigate through the short list of sections, and then through the short list of commands in the desired section, rather than through one long list of items.

Use case: Aidan is the opposite of Nadia. He uses an alternative input system and input is difficult for him, so he wants to reduce the number of actions he has to take. Therefore he prefers to see all the options visible at once so that he can choose one directly, rather than having to use mechanisms involving progressive disclosure. (He has even invested in a large, high-resolution monitor to support this work style.) Rather than choosing a sub-menu and then items from them, he'd rather have all the sub-menus and their items displayed together. Unfortunately, the HTML5 specification explicitly states that the menu element with a label must be presented as a sub-menu rather than displayed inline.

HTML5 Status: The current HTML5 draft specification has a few problems with this. First, it clearly spells out when user agents should render groups of menu items inline vs. as a submenu, seemingly giving the user agent no leeway to adjust for user preference. Instead, it should note that user agents may override the default presentation described. Second, much to Nadia's dismay, it strongly emphasizes the use of hr elements to divide menu items into groups, rather than recommending methods that allow providing labels for those groups. This prevents the user agent from giving her a meaningful list of sections to navigate through. It looks like some methods are probably supported, such as by using a nested menu element that's labeled by a label element rather than a label attribute, and so would by default be presented inline rather than as a submenu, but they're not discussed in detail or recommended.

Recommendation: The HTML5 specification should note that user agents may override the default presentation described in order to comply with user preferences.

Recommendation: In order to facilitate structural or hierarchical navigation and progressive disclosure, the HTML5 specification should emphasize giving names to groupings, including to groups of menu items.

Tri-State Checkboxes

Added by Greg Lowney 00:31, 28 July 2011

Why is there no true support for tri-state checkboxes (those with states of checked, unchecked, and mixed/undefined)? Not providing this means that authors and developers will still have to implement custom controls and behaviors for an extremely common feature.

Use case: Aidan uses speech recognition for input. When he views an interactive web page or web-based application that uses standard HTML5 controls, his speech recognition program can let him control them using standardized commands, such as checking or unchecking recognized checkboxes by saying "Check italic" or "Uncheck bold". However, when it encounters custom controls or controls with nonstandard behaviors, he has to resort to saying actual keystrokes, such as "Press tab. Press tab. Press tab. Press space." He uses a web-based text editor that provides a tri-state check box that is checked with the entire selection is italicized, unchecked when the entire selection is not italicized, and in a third, "mixed" state when only part of the selection is italicized. In one scenario, he accidentally checks the Italics check box then realizes he wants to change it back to the "mixed" state. Let's say it's implemented as an HTML5 input element with type=checkbox, but with scripting to handle the tristate behavior (perhaps as described in Shams' Blog: Tri-State Checkbox using Javascript - http://shamsmi.blogspot.com/2008/12/tri-state-checkbox-using-javascript.html); in this case the keyboard UI is not standardized, so the speech recognition utility cannot provide a corresponding voice command. In a second scenario, it's implemented as an entirely custom control.

Use case: Nadia uses a screen reader with the same web-based text editor that provides a tri-state checkbox for indicating and adjusting italics. Regardless of whether the author used an HTML5 input element with type=checkbox or if they used an entirely custom control, the screen reader has no way of determining which state it's really in, and so can't convey this to Nadia using speech.

Use case: Ryan, a keyboard user, is using the same web-based text editor that provides a tri-state checkbox for indicating and adjusting italics. Unfortunately, because each web site or web-based app has to implement its tri-state checkbox itself, they often implement entirely different keyboard UI, and so when Ryan comes to one he cannot easily figure out how to use it with the keyboard.

Recommendation: HTML5 should support tri-state check boxes and menu items so that user agents can provide standardized user interface for them and so assistive technology can provide alternative input and output for them.

Non-Interactive User Agents

Added by Greg Lowney 00:40, 28 July 2011

The current draft HTML5 spec says:

Non-interactive presentation user agents

User agents that process HTML and XHTML documents purely to render non-interactive versions of them must comply to the same conformance criteria as Web browsers, except that they are exempt from requirements regarding user interaction.

Typical examples of non-interactive presentation user agents are printers (static UAs) and overhead displays (dynamic UAs). It is expected that most static non-interactive presentation user agents will also opt to lack scripting support.

A non-interactive but dynamic presentation UA would still execute scripts, allowing forms to be dynamically submitted, and so forth. However, since the concept of "focus" is irrelevant when the user cannot interact with the document, the UA would not need to support any of the focus-related DOM APIs.

It is very important that developers should not relegate user agents to the "non-interactive" category if there is benefit to the user in being able to interact with them.

Use case: Imelda uses a screen enlarger. She runs an application that displays a static, web-based slide detailing today's weather forecast. Imelda uses a screen enlarger, and normally reads blocks of text by having the magnifier track the text caret as she moves it through the content. However, as the developers considered theirs a non-interactive user agent, they left out the ability to do caret browsing. With luck, the screen enlarger will be able to access the application's DOM and determine the screen coordinates of each word, but it would certainly be easier if the application supported caret browsing.

Use case: The weather application that Imelda is running allows the user to select text with the mouse and automatically copies that text to the clipboard. However, the developers did not consider this "focus" or "activation", or even "selection" because the selection does not persist and the user can't perform their choice of actions on it. However, by this decision they are making functionality available to only one input modality, and users who rely on other modalities such as keyboard or speech recognition are denied full access. In this case, the developers should not have considered their application non-interactive, and instead implemented full focus and selection functionality.

Recommendation: The HTML5 spec should include wording that clarifies that user agents should not consider themselves non-interactive if they render content to the user on any system that can take input, and more specifically should not omit support for focus-related DOM APIs just because they do not expect to be taking input. Ideally it would include use cases similar to the above in order to help readers understand the issue.

Standard pieces of information should be automation-friendly

Added by Greg Lowney 02:34, 28 July 2011

As a general rule, when it's common for pieces of content to serve a conventional purpose, the markup should allow this purpose to be identified in an unambiguous and automation-friendly way. This allows user agents and assistive technology to make use of it to provide users with advanced navigation and customization features. HTML5 does this in many areas, many more in fact than HTML4, but there are still common purposes that are not yet addressed.

For example, in order to let people find and navigate content using a wider range of mechanisms, markup languages could provide a way to add tags (also called keywords) to any or almost any content elements. Currently keyword tagging is supported at the HTML document level, but not at smaller granularities. As many web sites support content tagging on images, posts, or articles, and HTML5 puts a lot of effort into facilitating aggregation of these, the new feature would also be an example of providing a standardized way for content to expose information that is currently handled in an ad hoc or site-specific fashion, thus allowing user agents and assistive technology to make use of it in novel ways.

Use case: Nadia uses a screen reader to explore a blogger's web site. The page shows the first paragraph of each recent post, each in an HTML5 article element, and each includes the title, poster, posting date, keyword tags, and a link to the full article. Because scanning the entire page searching for articles that match her criteria is much more difficult and time consuming for her than for most users, Nadia uses a browser add-in that lets her navigate to the most recent article before or after a given date. Each article's title and publication date are marked up in automation-friendly ways, as <article><header><h1>The Very First Rule of Life</h1><header><p>

Use case: Marge is browsing a web page with a hundred images, but she wants to find the one of a sailboat. While many users can make a quick visual scan of each screenful of content, paging down until they find the correct image, this is much more difficult for her, so she activates an add-in for her browser, selects the check box for "Images", and a list box is populated with the keywords for all the images on the page. She types the first characters of "boats" to move the focus to this entry, presses Space to select it, and presses Enter, at which point the extension temporarily hides all the content except the two images that have the boat keywords.

Use case: Randy is easily distracted so he uses a browser add-in to filter each page, hiding information that it can recognize as not relevant to his current task. Using standard markup it can, for example, hide all content on a page excepting articles that meet certain criteria.

Use case: In order to help him understand and navigate a collection of HTML documents, Joshua runs a tool that generates and presents him with an index and table of contents. If the author has no way to recommend keywords and phrases for portions of content, the functionality can only provide a simplistic guess at what are the primary topic of each section, but if the author can explicitly associate key words or phrases with headings, tables, and even paragraphs, the tool can do a much better job. This can also provide tools such as search engines to data and hints to work with.

Recommendation: HTML5 should allow a keywords or tag element to be added to all or nearly all elements. This could be a set of space-separated tokens, and although this precludes including spaces in the keywords or key phrases, it would allow it to be processed in standard ways already used by other attributes. For example:

  • <img keywords="flags swan union-jack" src="western-australia.png" alt="Flag of Western Australia">.
  • <article keywords="computers apple news"…>
  • <h2 keywords="html markup automation keywords author">

Alternatively, a keywords element could be defined that could be associated with another element, either by referencing an element ID or by surrounding content with which it's associated.

Recommendation: HTML5 should define a text-level author element that would identify its content as identifying the author of the associated content. The typical use would be in cases like <article><header><h1>The Very First Rule of Life</h1><header><p><time pubdate datetime="2009-10-09T14:28-08:00"><p>Posted by: <author><a href="/users/gwashington">George Washington</a></author>

Added Greg Lowney 08:23, 28 July 2011 (UTC)

Use case: Nadia is editing text on a web page, in a textarea or an element with contenteditable=true. Her browser's spelling checker is turned on, and the region has the spellcheck=true, so when the browser's spelling checker marks up spans with a red, wavy underline to indicate what it thinks is a spelling error, and a green, wavy underline to indicate what it thinks is a grammar error. Following the advice in the HTML5 spec (4.6.19 The mark element), the browser marks up these phrases with u elements, distinguishing them using different classes. Unfortunately, these classes are meaningless to Nadia's screen reader, which is unable to distinguish them from ordinary underlines; that is, it has no idea what the classes mean, and whether they have semantic meaning or merely stylistic effects. If only they were marked up in a standardized way, her screen reader could use spoken phrases, audio icons, or inflections to indicate which ranges were flagged as spelling errors, etc.

Recommendation: HTML5 should define a standardized mechanism for marking up phrases flagged by a spelling checker or the like, so that content scripts or assistive technology could react to them intelligently. One option would be to extend the mark element with one or more new attributes, such as a flag attribute with enumerated type whose values could be spelling (e.g. "generral") or grammar (e.g. "He run"), or syntax (e.g. coding errors such as end if without preceding if), and also a meaning attribute that could use set to a user-friendly string that would convey the meaning of the markup to the user (e.g. in a multilingual editor, or <mark meaning="Break Point"> in a debugger). Alternatively, a new element could be created for this purpose, rather than extending the use of the mark or u elements. (It's also worth noting that the important aspect of these indicators is *not* that they're underlines, but that they're spelling or grammar errors, and therefore using the u element is really inappropriate.)

Marking up content that repeats on related pages

Added by Greg Lowney 09:11, 28 July 2011 (UTC)

Can you mark up content that is repeated across pages on the site or subsite? For example, an example given in the HTML5 spec for the <footer> element describes it as showing a "site wide footer", but there is no easy way to distinguish that from a page-specific footer. This is particularly, but not exclusively, an issue for <header>, <footer>, and <nav> elements. This is an issue because assistive technology may often want to hide or skip over things that are repeated on multiple pages, but not skip over equivalents that are unique to the current page. Note that repeated content may still vary somewhat from page to page, as in when items in <nav> are all links except the one representing the current page, or the fact that different pages may have different copyright dates. One mechanism might be to allow an string attribute on these elements that could be compared to elements on other pages of the same site, and if the strings match the user agent can assume that they are equivalent (e.g. <nav sitewide="toplevel">).