Proposed revision/modification/addition to Guideline 2.1 and related SCs to cover touch+AT scenarios

See further discussion of this initial proposal on the mailing list https://lists.w3.org/Archives/Public/public-mobile-a11y-tf/2016Jul/0009.html

Proposal

Expanding the current Guideline 2.1 and SCs 2.1.1, 2.1.2 and 2.1.3 to cover not only "keyboards", but inputs that provide a functionally similar interaction mode (in particular in the touchscreen+AT scenario). The primary reason is to avoid duplicating various SCs purely for "touch+AT" when the concept is already expressed in the same way in the current 2.1/2.1.1/2.1.2/2.1.3 but, due to the language used in those, "touch+AT" is not covered.

Considerations

Please review these against the current WCAG 2.0 2.1/2.1.1/2.1.2/2.1.3.
Does the revised language still adequately cover the traditional "keyboard" scenario? We don't want to make the new guideline/SCs "looser". We want to expand their applicability, without providing any new loopholes/wiggle room for authors NOT to support traditional keyboard input.
Does the revised language adequately cover the touch+AT scenario and our intended requirements (that users of touch+AT must be able to navigate/operate stuff, and that they don't get trapped)?

A reminder that this is specifically about touch+AT. In this scenario, it's the AT (VoiceOver, TalkBack, Narrator, JAWS on a touchscreen Win 10 laptop, NVDA on a touchscreen Win 10 laptop) that interprets the gestures/swipes/taps and translates them into instructions to the OS/browser (move focus to next element, to previous element, activate). Scenarios where an author OVERRIDES the AT (which were mentioned in some of our calls, but which I feel violate the non-interference requirement), and scenarios where we're looking at touch WITHOUT AT (for instance, where authors implemented their own gesture-based interface) are NOT covered by my suggestions below, and these WILL need new SCs (so to be clear, I'm not saying "if we just do the below, we can go home early folks..."). The suggestions below simply cover one particular aspect, which we can then set aside to concentrate on the touch w/out AT stuff, stylus, fancy stylus with tilt/rotation/etc, device motion sensor, light sensors, etc.

Note: in many respects, the touch+AT scenario is actually more limited than traditional keyboard, since it does not generally allow for arbitrary keys (like cursor keys, any letters, ESC, etc) to be triggered unless an on-screen keyboard is provided by the OS/AT (and, in the case of VoiceOver/iOS, TalkBack/Android, Narrator/Win10Mobile, this only happens when a user explicitly sets their focus on an input, like a text input in a form). It is functionally similar to the most basic keyboard TAB/SHIFT+TAB/ENTER/SPACE interactions (though it does NOT fire "fake" keyboard events, like a fake TAB for instance). This actually makes it potentially more involved for authors to satisfy this new/modified guideline/SC, meaning that if the below were to be included in 2.1, it would tighten (not loosen) the requirement on authors. If this is felt to be too limiting, one possibility could be to add some form of additional exception to the modified 2.1.1 to keep it as loose as current WCAG 2.0, and rely solely on 2.1.3 and its "no exceptions" (but 2.1.3 would then, I'd say, need to be promoted to AA instead of AAA).
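To make the "no fake keyboard events" point concrete, here is a minimal sketch of why a control that only listens for key events can work with a keyboard yet fail for touch+AT. The event names are standard DOM event types, but `InputMode`, `activationEvents` and `isOperable` are hypothetical names, and the event lists are a simplified assumption about a natively activatable element such as a button; real AT/browser combinations vary and may dispatch additional pointer-like events.

```typescript
// Hypothetical, simplified model: which DOM event types reach a native,
// focusable control (e.g. a <button>) when the user activates it.
type InputMode = "keyboard" | "touch+AT" | "pointer";

function activationEvents(mode: InputMode): string[] {
  switch (mode) {
    case "keyboard":
      // ENTER/SPACE on a focused native button: key events plus a click.
      return ["keydown", "keyup", "click"];
    case "touch+AT":
      // Assumption: the AT's "activate" gesture results in a click,
      // with no simulated key events.
      return ["click"];
    case "pointer":
      return ["pointerdown", "pointerup", "click"];
  }
}

// A control is treated as operable in a mode if it handles at least one
// event type that the mode actually produces.
function isOperable(handledEvents: string[], mode: InputMode): boolean {
  return activationEvents(mode).some((e) => handledEvents.indexOf(e) !== -1);
}

// Two hypothetical authoring choices for the same control.
const keydownOnly = ["keydown"]; // custom key handler only
const clickBased = ["click"];    // relies on the click event
```

Under these assumptions, `isOperable(keydownOnly, "touch+AT")` is false while `isOperable(clickBased, mode)` is true for all three modes, which is the sense in which the modified wording would tighten the requirement on authors.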

Of course, beyond the below, there'd be a need to review all relevant "understanding", "how to meet", and related failure/success examples/techniques. But for now, I feel we need to nail this part, as it's quite fundamental to any further input-specific work we want to carry out under MATF.

Pro

It is important to keep in mind that we do not want to increase the size of WCAG. If each group submits 5-10 success criteria, this will be a substantial increase to WCAG. I think we need to integrate work into existing WCAG SCs wherever possible.
It avoids duplication of concepts that are closely related - otherwise, the risk is that WCAG 2.1 will end up with a wide range of SCs all saying the same thing (e.g. "don't trap focus", but specific to different input modalities - "don't trap focus for keyboard users", "don't trap focus for touch+AT users", "don't trap focus for speech input users", etc)

Con

Removing the proposal for functionality to work when AT is turned on may leave a hole that we haven't plugged, and since we are the "Mobile accessibility TF", one primary objective is to ensure things work when AT is turned on, which this SC doesn't discuss. The AT layer is on a separate level, and we'd have to make sure that meeting this SC will make things "just work" once AT is turned on.
There may be two requirements in this one SC, and sufficient techniques may end up with AND statements joining two techniques to meet the two parts of the SC, which complicates meeting the SC.

Modified Guideline/SCs

Guideline 2.1 Non-pointer Accessible: Make all functionality available from non-pointer input.

2.1.1 Non-pointer:

All functionality of the content is operable through a keyboard and similar non-pointer input interfaces, without requiring specific timings for individual interactions, except where the underlying function requires input that depends on the path of the user's movement and not just the endpoints. (Level A)

Note 1: non-pointer inputs include (but are not limited to) physical keyboards, on-screen keyboards, single-switch and two-switch interfaces, assistive technologies such as speech input (which translate spoken commands into simulated keystrokes and user agent interactions) and screen readers on a touchscreen device (which translate touchscreen swipes and other gestures into user agent interactions). [ED: this is pretty much what should go in the glossary, but I'd say the note can reiterate it here for clarity?]

Note 2: The exception relates to the underlying function, not the input technique. For example, if using handwriting to enter text, or gestures on a touchscreen device running gesture-controlled assistive technology to move the current focus, the input technique (handwriting, touchscreen gesture) usually requires path-dependent input, but the underlying function (text input, moving the focus) does not.

Note 3: This does not forbid and should not discourage authors from providing pointer input (such as mouse, touch or stylus) in addition to non-pointer operation.

2.1.2 No Focus Trap:

If focus can be moved to a component of the page using a keyboard and similar non-pointer input interfaces, then focus can be moved away from that component using only a keyboard and similar non-pointer input interfaces. If it requires more than an unmodified exit method (such as arrow keys, tab keys, or other standard exit methods), the user is advised of the method for moving focus away. (Level A)

Note: Since any content that does not meet this success criterion can interfere with a user's ability to use the whole page, all content on the Web page (whether it is used to meet other success criteria or not) must meet this success criterion. See Conformance Requirement 5: Non-Interference.
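As an illustration of how the modified 2.1.2 applies per input, the following sketch flags any component that an input can move focus into but not out of. It is only a model over data a tester would gather by hand, not an automated check; `FocusStop`, `findFocusTraps` and the sample `dialog` are hypothetical names introduced for this example.

```typescript
// Hypothetical record of manual test results for one focusable component.
interface FocusStop {
  id: string;
  enterableBy: string[]; // inputs that can move focus onto the component
  exitableBy: string[];  // inputs that can move focus away again
}

interface FocusTrap {
  id: string;
  input: string;
}

// Report every (component, input) pair where focus can get in but not out.
function findFocusTraps(stops: FocusStop[]): FocusTrap[] {
  const traps: FocusTrap[] = [];
  for (const stop of stops) {
    for (const input of stop.enterableBy) {
      if (stop.exitableBy.indexOf(input) === -1) {
        traps.push({ id: stop.id, input: input });
      }
    }
  }
  return traps;
}

// Sample: a dialog that can only be dismissed with the ESC key. Touch+AT
// users generally cannot trigger ESC, so for them it is a trap.
const dialog: FocusStop = {
  id: "esc-only-dialog",
  enterableBy: ["keyboard", "touch+AT"],
  exitableBy: ["keyboard"],
};
```

With this sample, `findFocusTraps([dialog])` returns a single entry for `touch+AT`: the component would pass a keyboard-only reading of 2.1.2 but fail the expanded wording.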

2.1.3 Non-pointer (No Exception):

All functionality of the content is operable through a keyboard and similar non-pointer input interfaces without requiring specific timings for individual interactions. (Level AAA)

Additions to glossary

pointer input

an input device that can target a specific coordinate (or set of coordinates) on a screen, such as a mouse, pen, or touch contact. (cross-reference https://w3c.github.io/pointerevents/#dfn-pointer) [ED: I'd also be happy to modify the glossary definition in the Pointer Events Level 2 spec (which is about to go to FPWD) to talk about "pointer input" rather than "pointer", to make it match this proposed wording]

non-pointer input

compared to a pointer input (cross-reference previous glossary entry) - which allows a user to target a specific coordinate (or set of coordinates) on a screen - non-pointer inputs generally provide an indirect way for users to move their focus and activate controls/functionality. Non-pointer inputs include (but are not limited to) physical keyboards, on-screen keyboards, single-switch and two-switch interfaces, assistive technologies such as speech input (which translate spoken commands into simulated keystrokes and user agent interactions) and screen readers on a touchscreen device (which translate touchscreen swipes and other gestures into user agent interactions).