Rich's Proposal, edited by Chaals

The problem statement

Scalable Vector Graphics (SVG) 2 adds tabindex to support scriptable linear focus navigation.

SVG Tiny introduced directional keyboard navigation, and logical next/previous linear navigation. The directional keyboard navigation was restrictive in granularity. Linear navigation was only slightly better than tabindex, but it had the additional restriction of being linear and was less conducive to dynamic interaction, as the author has to keep a mental model of the linear sequence through a drawing. Additionally, SVG Tiny did not consider depth in its navigational model, which is essential in very large, complex drawings.

Touch Interfaces have created a new dimensional navigational mode. Users who are unable to see the screen can receive feedback on what the sighted user can see, increasing usability.

We need an approach to focus navigation that allows an author to

Use tabindex to establish essential keyboard landmarks in a drawing and provide backward compatibility for people who use script to create rich UIs.
Leverage WAI-ARIA and its modular extensions supported in SVG
Leverage the spatial and location features of SVG and touch devices

Proposal Part 1 – Semantics and Focus

SVG exposes the location of all drawing objects to assistive technologies, similar to HTML.

SVG2 also introduces strong native host language semantics and WAI-ARIA.

There is a draft Connector specification that defines linkages between drawing objects.

Each technology allows the author to define intentional host language semantics to separate the wheat from the chaff. Only drawing objects having rich semantic meaning are exposed to assistive technologies; the remaining drawing objects are not. This simplifies support for a directional navigation model by filtering out drawing objects that have no semantic meaning to the end user because they are simply used as building blocks for a higher-level semantic user interface construct. Objects with no semantic meaning have a semantic “role”, in WAI-ARIA vernacular, of “none” and are exposed neither to accessibility services layers nor to the directional navigation model.

Using this premise, we propose that in SVG all drawing objects having a semantic role of “none” are not focusable, and those that do not have a role of “none” become focusable, equivalent to having a tabindex value of “-1”, and are not placed in the tab order. See Intrinsic ARIA Semantics.
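One reading of this rule can be sketched in Python as follows. The DrawingObject class and the effective_tabindex function are hypothetical names used only for illustration, and the choice to let an explicit author tabindex take precedence is an assumption, not something the proposal states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrawingObject:
    """Hypothetical stand-in for an SVG drawing object."""
    name: str
    role: str = "none"              # computed semantic role; "none" means no meaning
    tabindex: Optional[int] = None  # author-supplied tabindex, if any


def effective_tabindex(obj: DrawingObject) -> Optional[int]:
    """Return the implied tabindex, or None when the object is not focusable."""
    if obj.tabindex is not None:
        return obj.tabindex  # assumption: an explicit author value wins
    if obj.role == "none":
        return None          # no semantic meaning: not focusable
    return -1                # focusable, but not placed in the tab order
```

Under these assumptions a bar with a role of "img" gets an implied tabindex of -1, while an unlabelled gridline stays out of focus navigation altogether.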

Proposal Part II: Directional Focus Navigation

From wherever focus is on the screen, user agents must provide a user-activated directional rotation interaction object around the currently focused object. It allows the user to use multiple forms of input, such as a keyboard arrow key or a gesture, to rotate in either a clockwise or counterclockwise pattern. This rotation stops on the first, closest semantic element encountered along the line of sight out from the current focus point or point of regard, much as radar detects objects in the vicinity. Information about that object that can be derived from the semantics defining it and its relationship to the currently focused object should be relayed to the user, such as “what the object is” and “are they connected and how they are connected.” Additionally, the direction can be conveyed to the user in the form of degrees or points on a compass. The user can then decide whether to give that object focus or to move on to the next semantic object encountered in the rotation.
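A minimal sketch of such a rotational scan, in Python, might look like the following. It assumes each object is reduced to a single point, that bearings are compass degrees (0 is up, increasing clockwise) in SVG coordinates where y grows downward, and that ties on bearing are broken by distance. Target, bearing_and_distance and next_hit are hypothetical names, not part of any specification.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Target:
    """Hypothetical drawing object reduced to a point and a role."""
    name: str
    x: float
    y: float
    role: str = "img"


def bearing_and_distance(ox: float, oy: float, t: Target) -> Tuple[float, float]:
    """Compass bearing (0 = up, clockwise) and distance from (ox, oy) to t."""
    dx, dy = t.x - ox, t.y - oy
    # SVG's y axis points down, so "up" on screen is negative dy.
    bearing = math.degrees(math.atan2(dx, -dy)) % 360.0
    return bearing, math.hypot(dx, dy)


def next_hit(ox: float, oy: float, targets: List[Target], heading: float,
             clockwise: bool = True) -> Optional[Tuple[Target, float, float]]:
    """Sweep from `heading` and return (target, bearing, distance) of the first hit."""
    best = None
    for t in targets:
        if t.role == "none" or (t.x == ox and t.y == oy):
            continue  # skip objects without semantics and the origin itself
        bearing, dist = bearing_and_distance(ox, oy, t)
        sweep = (bearing - heading) % 360.0 if clockwise else (heading - bearing) % 360.0
        if sweep == 0.0:
            sweep = 360.0  # an object on the current heading is reached last
        key = (sweep, dist)
        if best is None or key < best[0]:
            best = (key, t, bearing, dist)
    return None if best is None else (best[1], best[2], best[3])
```

The returned bearing and distance are the values a user agent could announce as degrees or compass points; whether distance should be conveyed at all is left open by the proposal.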

This approach removes the granularity restrictions from SVG Tiny, leverages the tabindex support carried over into SVG2 from HTML and leverages accessibility semantics provided by the author. In SVG Tiny you could perform directional navigation but you would have to navigate to things that had absolutely no relevance to the user or the author other than to be used to form something of greater importance.

It also leverages existing navigational paradigms found in mobile devices, like the iOS rotor, to facilitate navigation. The rotor can be used by a screen reader or by alternative input solutions, including single-switch devices. This solution, or a derivation of it, could be used on a wearable device with a touch screen, such as a watch.

Other considerations and scoping

Do we convey distance information about the object encountered by the line of sight hit?

Proposal Part III: The Drill Down and Drill Up

SVG is designed such that authors can manage how much the user sees or has access to. In SVG we should include the ability to indicate whether an object has more underlying information and allow the user to drill down on an object. We propose that additional information be provided to indicate whether more details reside within an object and to allow a user to drill down into it from the UI. This semantic property would be exposed through new WAI-ARIA attributes called aria-maxdepth and aria-currentdepth, which indicate that more information is available and allow the user to drill down into, or drill up and out of, the object.

aria-maxdepth

0 (default) You cannot drill down at all
>0 The number of times you can drill down
<0 invalid

aria-currentdepth

>=0 the depth
<0 current height from the default

aria-depthvalue

This is a localized string defined by the author representing the current value. If the user agent is to manage the current depth, then it should notify the author when the depth changes for an object, and allow the author to modify the localized string for that element.
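The depth values and the change notification described above could be modelled roughly as in the Python sketch below. DepthState and its methods are hypothetical. The sketch assumes that drilling down is bounded by aria-maxdepth, that drilling up has no lower bound (the proposal allows negative values of aria-currentdepth but does not name a limit), and that listeners stand in for whatever notification mechanism would let the author update aria-depthvalue.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DepthState:
    """Hypothetical model of aria-maxdepth / aria-currentdepth for one object."""
    max_depth: int = 0
    current_depth: int = 0
    listeners: List[Callable[[int], None]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.max_depth < 0:
            raise ValueError("aria-maxdepth must not be negative")
        if self.current_depth > self.max_depth:
            raise ValueError("aria-currentdepth must not exceed aria-maxdepth")

    def _set(self, depth: int) -> None:
        self.current_depth = depth
        for listener in self.listeners:
            listener(depth)  # lets the author refresh the localized depth string

    def drill_down(self) -> bool:
        """Go one level deeper; return False when already at the maximum depth."""
        if self.current_depth >= self.max_depth:
            return False
        self._set(self.current_depth + 1)
        return True

    def drill_up(self) -> None:
        """Go one level up; negative depths mean height above the default."""
        self._set(self.current_depth - 1)
```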

The group should strongly consider making current depth managed by a user agent, as part of SVG vs. ARIA, given that this is a form of navigation and point of regard should probably include depth. This forms a 3 dimensional navigational model that could be modified for 3D drawing models like WebGL in the future.

Other considerations and scoping

Do we have document wide depth considerations?
Should we separate drawing object and document depth?

Conclusion

This approach to focus navigation leverages legacy keyboard navigation found in SVG2 and HTML5. It provides a new 2 dimensional navigation model that can be aligned with today’s mobile and wearable devices. It also allows for navigation by depth. It achieves this by leveraging the semantic intent of the essential parts of a drawing to produce a more usable experience both for users with disabilities and for users impaired by the context in which they operate.

Amelia's Proposal & Related Brainstorming

Summarizing various issues we've discussed on the teleconferences and in email. And then running with it...

What do we mean by navigation?

There are two distinct types of navigation:

Navigation of the input focus between interactive components.
Navigation of the point of regard (e.g., of a screen reader) through any and all content with semantic meaning.

In many cases there won't be any interactive components, but there still needs to be a way for a screen reader to navigate the complex content in a logical and useful reading order, and for the user to control that order for a meaningful exploration of the content. In other cases, there could be interactive components that contain complex sub-content that needs to be presented to the user in order to understand how to interact with it.

Whether an element should be able to receive input focus depends on:

Strict native semantics (e.g., links with a valid href should always be focusable)
Explicit author instructions (e.g., a tabIndex value, maybe a CSS focusable property)
(Maybe) implicit author instructions (e.g., a role attribute with a widget value)

Whether an element should be in the reading order (able to receive the point of regard) depends on:

If it is focusable
If it has any alternative text or a role that does not map to "none"
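The two sets of conditions above can be written as a pair of predicates. This Python sketch is only illustrative: the Element class, the short list of widget roles, and the flag controlling the "(Maybe)" rule are all assumptions, and roles are taken to be already computed.

```python
from dataclasses import dataclass
from typing import Optional

# A few WAI-ARIA widget roles, listed for illustration only.
WIDGET_ROLES = {"button", "checkbox", "link", "slider", "textbox"}
# Roles that carry no semantic meaning.
NO_MEANING_ROLES = {"none", "presentation"}


@dataclass
class Element:
    """Hypothetical element with only the properties the rules need."""
    tag: str
    href: Optional[str] = None
    tabindex: Optional[int] = None
    role: str = "none"   # computed role
    alt_text: str = ""   # e.g. the text of a <title> child


def can_receive_focus(el: Element, widget_roles_imply_focus: bool = True) -> bool:
    if el.tag == "a" and el.href:
        return True   # strict native semantics
    if el.tabindex is not None:
        return True   # explicit author instruction
    # "(Maybe)" rule: implicit author instruction through a widget role
    return widget_roles_imply_focus and el.role in WIDGET_ROLES


def in_reading_order(el: Element) -> bool:
    return (can_receive_focus(el)
            or bool(el.alt_text)
            or el.role not in NO_MEANING_ROLES)
```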

I would like to see browsers treat any SVG element that has a tooltip as a focusable interactive element, included in the navigation sequence. The tooltip would be revealed when that component receives focus. This would not automatically put the element in the tabIndex sequence; however, it would make it accessible from the other navigation methods. On a touchscreen, it would allow the tooltip to be revealed by tapping on the shape.

In other words, I'm suggesting that the idea of a "focusable" element be distinct from whether or not it exists in the tabIndex sequence. This is consistent with Florian's focusable CSS property. It is also consistent with WCAG's recommendations for using scripting to control the focus within sub-components of a widget that has a single overall tabIndex value, and the related aria-activedescendant attribute.

By broadening the definition of focusable in that way, we can use the same navigation system for both keyboard users and screen readers. That means that some focusable items might not be interactive, even after factoring tooltip reveal. However, they could still be a waypoint for activating drill-down or directional navigation. Within each focusable object, there could be sub-structure that a screen reader could navigate within using normal text reading commands, but there would be no special SVG navigation of that structure.

If that approach isn't adopted, the reading-order navigation pattern should always be a super-set of the focus order navigation. Or to say it in reverse, the focus order navigation should be a simplified version of the reading-order navigation, restricted only to the focusable elements. The same types of navigation (e.g., directional versus ordered sequence) should be available.

There is also a third type of navigation to keep in mind: pan & zoom of a visual display. This is separate, in that it is not associated with specific elements receiving focus/regard. It would not require any author or browser definition of a navigation order. However, it is relevant in that it is another complexity for user agents mapping input commands to navigation actions. How many ways can a user agent overload basic inputs such as arrow keys with extra modifiers before it becomes too complicated for users to remember?

How are users going to communicate navigation intent?

We don't want to make normative requirements about which user inputs are mapped to which commands. However, we do have to keep in mind that there are only so many ways to communicate "go to the next item". If we have too many ways of defining "next", it becomes too complex: too complex for users to remember, and also too complex for implementations.

Most of the chart navigation demos we've looked at use complex custom keyboard shortcuts, e.g., using the control/command key plus letter keys. These are unlikely to be adopted by browsers for wide use because of potential conflicts with other web pages or plug-ins or the browser's own shortcuts.

The standard keyboard commands supported by most browsers are:

tab/shift-tab

Navigate between user-input components. In HTML, the default is to navigate in DOM order, but it can be affected by the tabIndex attribute. SVG 2 would adopt this.

page-up/page-down

Scroll the visible region of the document. In HTML pages on most browsers, this specifically translates to scrolling the nearest scrollable component that is an ancestor of the element with input focus (which is a problem if there are scrollable sections that cannot receive focus).

arrow keys

Navigate within form fields that have logical sub-components (such as a drop-down list or radio-button group), or within editable text regions.

WCAG recommends arrow-key navigation for many custom widgets, although it has to be implemented with complex scripting.
If the input focus is not within a component that supports arrow navigation, the arrow keys map to scrolling behavior, in smaller amounts than a page-up/page-down key.

Some browsers (e.g., Presto-based Opera) use a modified arrow-based navigation that jumps between tabIndex-focusable components based on document layout instead of DOM order. Florian Rivoal has summed up some of the resulting issues in a post to the mailing list.

Screen readers have extra commands for navigating text separate from changing the focus, with different modes for different types of components: text, tables, forms. Ideally, the graphics navigation system would map to familiar commands, but it would probably require a new navigation mode similar to table mode. Modified arrow keys are often mapped to functions such as repeating or spelling out a word, so expecting a screen reader to implement multiple modifications of arrow-based inputs is not realistic.

What navigation orders and options make sense?

The basic tabIndex focus navigation is strictly linear. The order may be generated automatically based on the DOM order, or it may be set explicitly with numerical tabIndex values. Either way, there is always a single option for next and previous. All focusable elements are traversed to reach the end. This becomes unwieldy when there are many interactive components.
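As a reminder of how that single linear order is derived in HTML, here is a small Python sketch: elements with a positive tabIndex come first in ascending order (ties in document order), followed by elements with a tabIndex of 0 in document order, while negative values are skipped. The tab_order function and the list-of-pairs input are hypothetical simplifications, and natively focusable elements are assumed to have been given a value of 0 already.

```python
from typing import List, Optional, Tuple


def tab_order(elements: List[Tuple[str, Optional[int]]]) -> List[str]:
    """Return element names in sequential focus order.

    `elements` lists (name, tabindex) pairs in document order; a tabindex of
    None means the element is not focusable at all.
    """
    positive = [(ti, i, name)
                for i, (name, ti) in enumerate(elements)
                if ti is not None and ti > 0]
    zero = [name for name, ti in elements if ti == 0]
    # Sorting on (tabindex, document index) keeps document order for ties.
    return [name for _, _, name in sorted(positive)] + zero
```

For instance, tab_order([("a", 0), ("b", 2), ("c", None), ("d", 1), ("e", -1), ("f", 2)]) visits d, b, f and then a.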

The basic text reading-order navigation used by screen readers also assumes a single well-defined reading order based on the DOM, but it has a structured outline (if the author has correctly marked-up the document). The user can therefore skip to relevant sections or drill down. ARIA introduces the possibility of an alternative branched reading order with the aria-flowto attribute. I don't know how well implemented this is, or how frequently it is used.

The different types of graphical navigation we've talked about can be grouped as follows:

linear

there are a finite number of navigation points, and a single traversal order through all of them

structured network

navigation follows meaningful connections between discrete components

This includes hierarchical or tree-based navigation, which might be based on the DOM tree
It also includes network graphs with cycles and branches that cannot be represented in a tree structure

ordered navigation

navigation searches for the nearest element in a certain direction along a specific scale

This includes 2D navigation based on the visual layout of the graphics
It could also include an ordered navigation based on some other data scale

In a complex application, some combination of all these options would likely be used.

Questions to think about:

Should authors need to explicitly enable each navigation mode? I would argue that each mode should have a default behavior, that could be modified or enhanced by the author providing information. That way, users would be able to get familiar with the navigation modes & use them regardless of whether the author has put extra effort in.
Should there be some sort of switcher between navigation modes, so that the same input (e.g. arrow key) has different meaning depending on the mode? Or should they all exist simultaneously, mapped to different inputs (e.g. shift+arrow versus ctrl+arrow)? Should this be up to the user agent?

Linear Navigation

For a limited number of major components, tabbing through them is reasonable. The model for this is well established, using tabIndex. It's not perfect, but I don't think we'll convince anyone to change it now.

Structured Navigation

The idea of structured navigation is that there are clear relationships between individual components. I'm going to use the term navigation routes to describe these relationships. I want to distinguish the logical connections from the idea of connectors as visual components. I also don't want to use "path" since that has a separate meaning in SVG.

With this perspective, a drill-down/drill-up navigation sequence would be an example of navigating through clearly defined routes from parent to child components. The navigation routes may not be displayed on screen, but they exist logically.

When navigating a tree structure, this would mean that you have to navigate back up the tree before moving to the next branch; you could not move from the last leaf on one branch to the first leaf on the next branch.
A screen reader might do this automatically, scanning to find the next logical thing to read. However, that could cause problems when the routes are cyclical instead of hierarchical. The software would need to have a way of identifying and communicating when it came back to a node it has already visited.
However, in the general case there would be no way of declaring the "level" within the structure. If level had a semantic meaning, the author would need to annotate individual nodes with this information.
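The revisit problem for cyclical routes can be illustrated with a short Python sketch. It assumes routes are given as a simple mapping from each node to the nodes it leads to; the walk function and the "(already visited)" wording are hypothetical.

```python
from typing import Dict, List


def walk(routes: Dict[str, List[str]], start: str) -> List[str]:
    """Depth-first walk that announces, rather than re-enters, visited nodes."""
    visited = set()
    announcements: List[str] = []

    def visit(node: str) -> None:
        if node in visited:
            announcements.append(f"{node} (already visited)")
            return
        visited.add(node)
        announcements.append(node)
        for nxt in routes.get(node, []):
            visit(nxt)

    visit(start)
    return announcements
```

With a three-node cycle such as {"A": ["B"], "B": ["C"], "C": ["A"]}, walking from "A" announces A, B, C and then reports A as already visited instead of looping forever.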

More generally, navigation routes may have a direction (incoming vs outgoing). If the connector is represented as a visible object with properties, that can be mapped as a route from the node to the connector, and then another route from the connector to the next node.

Some examples:

For the chemical diagrams, both atoms and bonds would be objects that could be traversed. Bonds have properties based on whether they are single, double, triple, or part of an aromatic (benzene) ring. Atoms have properties based on their chemical code and whether they have an ionic charge. Each atom has one or more non-directional routes to a bond. Each bond has exactly two routes (an atom on either side), but again there is no specific direction to them. If you follow enough bond-atom-bond-atom routes you might end up right back where you started. There is no semantically defined start point for exploring the molecule.
For a flow chart such as the W3C recommendation track diagram, the routes all have clear directionality. The logical start of the first chart is an arrow labelled "First WD", which has one forward route to a node labelled "WD". This node has two outgoing routes: one is a dashed arrow that cycles back to the same node, the other is a solid arrow that leads to the node labelled "CR". However, the "WD" node also has two incoming routes: the aforementioned arrow labelled "First WD", but also a grey dashed arrow returning from the "CR" node. The second chart (for handling revisions) has many similarities, but it logically starts with the "REC" node. It also has question nodes, where the routes out are through connectors labelled "Yes" or "No".
For a grouped bar chart, you would have an implicit node for the data plot component of the graphic (i.e., distinct from the title, axis, or legend components). That data plot node has routes to each data group, identified by their category. These routes are directional in that there is a clear parent-child relationship. If you follow the route to that group, you would then have a reverse route back up to the parent level, but also additional forward routes to the sub-categories.

That said, a simple 2D grouped bar chart could be more effectively represented as a table structure for non-visual users. The tree structure is more generalizable, but it does not allow you to skip from a sub-category in one group to the same sub-category in the next group. I think this could be covered by author guidance: if data can be effectively presented as a table, use markup that does so!

For the user, the functional requirements for getting this to work involve discovering and selecting routes:

Within a node, the user can discover which components are next in a route.
If routes are directional, forward and reverse routes can be distinguished. If routes are not directional, ideally the user can distinguish between the route they arrived by and any additional routes (this is more important for screen readers than visual users). More generally, a non-directional route might show up on both the forward and reverse route lists.
The user can cycle through these options, and choose which route to take.
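These requirements can be made concrete with a small route model. In the Python sketch below, Route, forward_routes and reverse_routes are hypothetical names, and a non-directional route is deliberately listed in both directions, as suggested above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    """Hypothetical logical connection between two components."""
    source: str
    target: str
    directed: bool = True


def forward_routes(node: str, routes: List[Route]) -> List[Route]:
    """Routes leaving `node`; non-directional routes count in both directions."""
    return [r for r in routes
            if r.source == node or (not r.directed and r.target == node)]


def reverse_routes(node: str, routes: List[Route]) -> List[Route]:
    """Routes arriving at `node`; non-directional routes count in both directions."""
    return [r for r in routes
            if r.target == node or (not r.directed and r.source == node)]
```

For a simplified pair of directed routes WD to CR and CR to WD, the node WD has one forward and one reverse route. For a non-directional atom-bond-atom chain, the bond reports both atoms on both lists, so a user agent would still need to remember which route the user arrived by.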

For visual displays, there would need to be some sort of automatic highlighting of the options as you cycle through them, probably using the CSS :focus pseudo-class. Or maybe a new pseudo-class with a similar purpose: the current element would still have the input focus while you cycle through the "next" values.

For simple examples, this could be implemented solely with arrow keys. E.g., left/right arrows to cycle options, up/down to move forward along the currently selected option or reverse back up the route you took to get here. Or vice versa. Menus in many desktop applications use this approach: up/down to cycle through the menu, and left/right to open or close a sub-menu. The problem is that the routes in a graphical display will not necessarily be arranged horizontally or vertically.

A more flexible structure would require explicit activation to move forward on the currently selected route. This is like a file-directory explorer where you have to explicitly select a folder before being able to cycle through its contents. The benefit is that the means of cycling through options can be completely independent of any direction or layout.

The commands would need to be:

highlight (or describe) next/prev outgoing route
highlight (or describe) next/prev incoming route
select highlighted route
(maybe) go back along the last route taken

The author needs to be able to specify explicit connections between elements; the aria-flowto attribute could be used. However, you'd want a way to indicate whether that relationship had a strict direction or whether it was bi-directional. Either way, the browser would need to calculate the reverse relationships. Connector elements would have native semantics to define an incoming and an outgoing route.
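Calculating the reverse relationships is a simple inversion of the author's data. The Python sketch below assumes the aria-flowto values have already been parsed into a mapping from each element id to the ids it flows to; reverse_flow is a hypothetical helper, not an existing API.

```python
from typing import Dict, List


def reverse_flow(flowto: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a flowto mapping so each id lists the ids that flow into it."""
    flowfrom: Dict[str, List[str]] = {}
    for source, targets in flowto.items():
        for target in targets:
            flowfrom.setdefault(target, []).append(source)
    return flowfrom
```

A bi-directional relationship would simply appear in both mappings; how the author would declare that is one of the open questions above.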

The author should also be able to indicate that the DOM structure should be used to generate a basic tree structure. Or even better, the DOM structure should be used automatically, and the author would have a way to explicitly override it.

Ordered Navigation

The idea of ordered navigation is that, within a group of sibling elements (logical siblings or actual DOM siblings), the user can cycle through them in a meaningful order.

By defining navigation according to siblings in a group, you avoid the issue of suddenly jumping from data points to axis or legends. The group would by default be a DOM container, but could be constructed using aria-owns relationships or maybe some other relationship we define as part of the charts API.

The elements you are navigating among would be "focusable" elements in the broader sense defined above: either made focusable by giving them a <title> or other alt text, by giving them a specific role, or by setting the CSS focusable property.

Directional navigation would be one instance of searching through elements based on ordered properties. The browser would be expected to order the elements automatically based on their visual displayed position. That raises some questions:

How should the browser convert the position of a shape to a point on a scale? Using the nearest edge of the bounding box? Using the center of the bounding box? I would argue for the center, since it is more consistent across different graphic types, and the position would be the same regardless of which direction you're moving.
The position should be the apparent position, after all 2D transformations are applied. But what should be done about 3D transformations?
Should we support 8-direction (i.e., diagonal) searching?
How can the semantic meaning of the layout be communicated to non-visual users? E.g., Can the author associate a direction on the screen with an axis scale and labels?
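If the center of the bounding box is chosen, the apparent position after 2D transformations can be computed as in this Python sketch. It assumes the bounding box is given as (x, y, width, height) and the accumulated transform as the six values of an SVG matrix(a b c d e f); apparent_center is a hypothetical helper. 3D transformations are not handled.

```python
from typing import Tuple

Matrix = Tuple[float, float, float, float, float, float]
IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def apparent_center(bbox: Tuple[float, float, float, float],
                    matrix: Matrix = IDENTITY) -> Tuple[float, float]:
    """Center of a bounding box after applying a 2D affine transform."""
    x, y, w, h = bbox
    cx, cy = x + w / 2, y + h / 2
    a, b, c, d, e, f = matrix
    # SVG matrix(a b c d e f): x' = a*x + c*y + e, y' = b*x + d*y + f
    return (a * cx + c * cy + e, b * cx + d * cy + f)
```

Because the transform is affine, transforming the center gives the same point as taking the center of the transformed box, which keeps the position independent of the direction of travel.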

The commands would need to be:

go to the next/prev item in the horizontal ordering
go to the next/prev item in the vertical ordering
go to the first/last item in the horizontal ordering
go to the first/last item in the vertical ordering
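These four commands per axis reduce to sorting the group on one coordinate and stepping through the result. The Python sketch below assumes each element is already reduced to an (x, y) position; ordering and move are hypothetical names, ties keep their original (for example DOM) order, and stepping past either end returns None rather than wrapping.

```python
from typing import Dict, List, Optional, Tuple

HORIZONTAL, VERTICAL = 0, 1


def ordering(positions: Dict[str, Tuple[float, float]], axis: int) -> List[str]:
    """Element names sorted along one axis; the sort is stable for ties."""
    return sorted(positions, key=lambda name: positions[name][axis])


def move(positions: Dict[str, Tuple[float, float]], current: str,
         axis: int, command: str) -> Optional[str]:
    """Apply 'next', 'prev', 'first' or 'last' along the chosen axis."""
    order = ordering(positions, axis)
    if command == "first":
        return order[0]
    if command == "last":
        return order[-1]
    if command not in ("next", "prev"):
        raise ValueError(f"unknown command: {command}")
    j = order.index(current) + (1 if command == "next" else -1)
    return order[j] if 0 <= j < len(order) else None
```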

More generally, we've discussed whether authors could be able to define a meaningful scale and give elements a value on that scale, possibly using aria-value and related properties. However, that brings up many questions of its own:

Would this replace geometric ordering based on visual layout, or be an alternative to it?
How many custom scales would be possible?
How would the user select which alternative scale to use for ordering? How would it be communicated to screen reader users but also to visual users?

Some examples to think about:

In certain map projections, the cardinal directions (E/W and N/S) do not directly correspond to the visual positions on the screen. It would be nice to be able to specify exact latitude and longitude positions for the items on the map, and have the arrow key navigations reflect these values instead of the SVG coordinates.

In a choropleth map of countries colored according to some data value, the user would like to cycle through them according to this data value, or jump to the country with the highest/lowest value. However, another user might still want to navigate by geometric/geographic position.

I think the following is an implementable solution:

There should only be two scales available, horizontal and vertical. This makes it easy to map a fixed set of user input navigation commands to clear actions.
The author should be able to specify a label for each direction with an aria attribute on the parent container element.
- The label could be indicated with a link to a component of the graphic (e.g., an axis with a visible label). That component would have a role to indicate it was a scale or axis, and aria attributes indicating max and min values.
- Alternatively, the labels could simply be strings. Maybe paired strings, so that there could be different labels for up versus down, left versus right. E.g., in a map you might not have a separate element like a chart axis, but you would still want to describe the directions as North, South, East, West.
On each component within the group, the author can specify a horizontal position and a vertical position as numerical values. If a value is not specified, the browser uses the position of the midpoint of the shape in SVG user units, converted to the coordinate system for the parent container.
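The fallback rule in the last point could work as in the following Python sketch. The Component class and its horizontal and vertical fields are hypothetical placeholders for whatever attributes would carry the author's values, and the bounding box is assumed to be expressed already in the parent container's coordinate system.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Component:
    """Hypothetical component inside a navigable group."""
    name: str
    bbox: Tuple[float, float, float, float]  # x, y, width, height
    horizontal: Optional[float] = None       # author-specified scale value
    vertical: Optional[float] = None         # author-specified scale value


def position(c: Component) -> Tuple[float, float]:
    """Author values where given, otherwise the midpoint of the shape."""
    x, y, w, h = c.bbox
    horizontal = c.horizontal if c.horizontal is not None else x + w / 2
    vertical = c.vertical if c.vertical is not None else y + h / 2
    return (horizontal, vertical)
```

On a map, for example, an author could supply longitude and latitude as the two values so that ordering follows geography rather than the projected SVG coordinates.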

Note that this is significantly different from Rich's suggestion to scan in a rotational manner and communicate the nearest elements by angle and distance. I think this makes it more practical. User input mechanisms are geared towards horizontal/vertical instead of clockwise/counter-clockwise. So are all SVG coordinates. Furthermore, by separating out the horizontal and vertical motion, you reflect the fact that these values are often on unrelated scales in a data chart: distance on a diagonal might not have any meaning.

That said, when the author is specifying scales and values on those scales, they do not have to correspond to geometric position. The author could use a polar coordinate system (radius and angle) instead of horizontal and vertical. However, for any more than two scales the author would need to implement custom focus control with scripting.

Can we put this all together?

I (who?) will have to code up a sample system to try it out.

My tentative keyboard mapping would be something like:

Tab / Shift-Tab: Linear navigation of links and major interactive components, defined by tabIndex.
Alt+Arrow keys: Cycle through child components or flowto components for the currently focused element; CMN: It seems more natural that shift-alt would go through the cycle backwards…
Alt+Shift+Arrow keys: Cycle through parent components and flow-from components for the currently focused element
Spacebar: Follow the currently highlighted route; CMN: how does this interact with "buttons" or links?
Arrow keys: Directional navigation along horizontal and vertical axes of elements within a group.
Ctrl+Arrow keys: Jump to first/last element within the horizontal and vertical ordering of a group; CMN: what about Home/End, PgDn/PgUp? That could free ctrl-arrow for navigating parents…
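As a starting point for such a sample system, the tentative mapping could be expressed as a lookup table. In this Python sketch the key and modifier names loosely follow DOM KeyboardEvent conventions, while the action names and the resolve function are invented for illustration; which arrow means "next" versus "previous" is left to the caller because the mapping above does not settle it.

```python
from typing import Iterable, Optional

ARROWS = {"ArrowLeft", "ArrowRight", "ArrowUp", "ArrowDown"}

# Modifier combination -> hypothetical action name for arrow keys.
ARROW_ACTIONS = {
    frozenset(): "move-directional",
    frozenset({"Alt"}): "cycle-outgoing",
    frozenset({"Alt", "Shift"}): "cycle-incoming",
    frozenset({"Control"}): "jump-to-end",
}


def resolve(key: str, modifiers: Iterable[str] = ()) -> Optional[str]:
    """Map a key press to a navigation action, or None if it is unmapped."""
    mods = frozenset(modifiers)
    if key == "Tab":
        if not mods:
            return "linear-next"
        if mods == {"Shift"}:
            return "linear-previous"
        return None
    if key == " " and not mods:
        return "follow-route"
    if key in ARROWS:
        return ARROW_ACTIONS.get(mods)
    return None
```

Keeping the table separate from the dispatch logic makes it easy to try the alternatives raised in the comments above, such as moving the first/last jumps to Home and End.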