Response to: ChangeProposals/DeprecateLongdesc
The following is in response to the Change Proposal Deprecate Longdesc submitted by Jonas Sicking on 25 June 2011. While some of this is directed to Jonas as author of the Proposal, I hope that other members of the community will offer feedback and comments where appropriate. I would also note that the core issues discussed here affect some of the other open discussions we are currently engaged in.
There are three core issues that I believe Jonas’ proposal does not address and that @longdesc currently does. They are:
- User Interaction: Discoverability and User choice
- Preservation of HTML Semantics and Richness
- User-Agent Support
For each of these issues, it is my assertion that Jonas' proposal incompletely or inaccurately portrays the behavior of browsers and screen readers with regard to longdesc and/or ARIA, and thus it contributes to inaccurate conclusions about how user needs would be addressed by different approaches to longdesc, and/or how other users would experience or potentially benefit from changes to this feature.
Issue 1: User Interaction
In Jonas’ submission, he indirectly hits upon one of the fundamental problems additional data (be it "metadata" or "hidden" data) suffers from: discoverability.
...there is no hint indicating that the longdesc attribute is available until the user right-clicks the element which means that it's still not very discoverable...
In fact, this problem is at the root of many of the most difficult accessibility challenges left to finalize in HTML5. Whether it’s the longer description of a complex image, author-supplied keyboard shortcuts provided by @accesskey, additional table navigation (supplied by @summary), or newer issues such as the need to describe the initial key-frame of a video or the "hit-testing" issue of canvas, all of these issues share the core fact that the data must be discoverable to the end user to be of value.
Jonas’ Proposal acknowledges that design constraints often require that this text not be shown under the majority of cases:
One use case that has been brought up is the ability to provide a description to AT users, while still not affecting the design for non-AT users. This is because page designers often have quite strict requirements on the visual appearance of the page and it would likely negatively impact the level of accessibility support if contents specifically for for example screen readers had to be provided within those requirements.
Currently, in Screen Readers that support @longdesc the behavior is to announce the presence of a longer description, and then allow the user to pursue the description (or not). Today, Browser extensions to support @longdesc such as the Opera "TellMeMore" (https://addons.opera.com/addons/extensions/details/tellmemore/1.2/) or the recent Firefox "Longdesk" Plug-in (https://addons.mozilla.org/en-US/firefox/addon/longdesk/) provide 2 examples of visual indicators of the presence of @longdesc content – these are user-choice extensions that can be toggled on or off by the end user.
While Jonas does acknowledge the problems with discoverability, his Change Proposal does not address these issues. The Change Proposal does not explain how using aria-describedby would be more discoverable than @longdesc to end users, including sighted users. It simply proposes to change the mechanism by which authors would associate longer textual descriptions, from an established mechanism to one designed to expressly communicate with accessibility APIs (aria-describedby), while not addressing the root problem – user agent support (or lack of). Retaining @longdesc preserves the discoverability mechanisms we have today in those user agents and Adaptive Technology that do support it, and we have no user-agents today that provide the same functionality for aria-describedby.
1b. User Choice
With the notion of Discoverability comes the companion need for Choice: once the user discovers the presence of an action or additional information (for @longdesc the choice is follow the link, or not), how is that Choice conveyed to the end user? With attributes such as @accesskey and @longdesc (as currently implemented) there is a choice conveyed to the end user: they can "consume" the author’s enhancements or choose not to.
Today tools that support @longdesc present the option of consuming the longer description as a link that can be followed: whether a link announced via a synthesized voice or rendered onscreen either in the Browser Chrome, as part of a screen overlay, or in the Contextual menu, users choose to have longer textual descriptions presented to them, not forced upon them.
In contrast, the designed behavior of Screen Reading technology that supports aria-describedby is to automatically 'read aloud' the text string referenced by the attribute, whether or not the end-user actually wants this information. While prescribing this behavior is certainly outside of the scope of HTML5, it is a known and existing behavior nonetheless, and using Jonas’ proposal at this time will introduce a "force-fed" longer description on the Screen Reader user whether they want it or not. I have previously likened this to the equivalent of my physically forcing your head to stare at a complex image, and not releasing you until you have completely described the image. This is, understandably, an extremely disruptive user-experience and one we should be avoiding at all costs. While the W3C could perhaps recommend alternative user-agent behavior here (either via the ARIA Specification, or the User Agent Accessibility Guidelines) there would certainly be a distinct time-frame before we can expect to see all Screen Readers re-tooled to behave differently.
Jonas’ Change Proposal does not explain how aria-describedby would preserve user-choice in consuming (or not consuming) a longer textual description (when supplied). It does not propose how that choice would be acted upon by the end user, nor does it determine who is responsible for supplying the actual trigger mechanism. With @longdesc today, both of these key issues are already addressed by supporting AT, and existing interaction patterns could be (should be) adopted by other non-supporting user agents.
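To make the contrast between the two interaction patterns concrete, here is a deliberately simplified model. It is only a sketch: the function names and the spoken phrases are invented for illustration and do not reproduce what any particular screen reader says or does.

```python
def longdesc_flow(alt_text, user_wants_description, long_description):
    """Announce that a description exists, then honor the user's choice."""
    spoken = [alt_text, "has long description"]  # hypothetical announcement
    if user_wants_description:
        spoken.append(long_description)
    return spoken


def describedby_flow(alt_text, description):
    """Read the referenced description automatically, with no opt-out step."""
    return [alt_text, description]


alt = "Quarterly sales chart"
desc = "Sales rose from 80 units in Q1 to 120 units in Q4."

skipped = longdesc_flow(alt, False, desc)   # user declines the description
followed = longdesc_flow(alt, True, desc)   # user asks for the description
forced = describedby_flow(alt, desc)        # description is always read
```

In this toy model only the first function has a branch that depends on the user, which is the point at issue: the choice has to exist somewhere in the user agent or the AT.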
It has long been argued by opponents to @longdesc that providing longer textual descriptions to complex images would benefit users beyond "blind users" – users with cognitive disabilities being an oft-cited other group. There is no disagreement with this sentiment; in fact on the contrary, as it has long been held that not all users will consume or experience their web content the same way. This holds true whether we are discussing the visual consumption of the web page (fluid layout, progressive enhancement/graceful degradation, "mobile-first", etc.) or the cognitive processing of the content.
Yet Jonas writes:
...the description is available to both AT users and non-AT users alike. This is generally preferable since it allows both groups of people to collaborate better, for example if a blind and a seeing person sit at the same computer and look at the same document, both have the description available and can discuss its contents.
...not available visually directly in the page which means that a person who inspects a page visually will not have the same experience as a person who uses a screen reader.
...the presumption being that all users must have the same experience.
However, reality and experience already tell us that this is simply an impossible and perhaps even harmful goal to seek. If we accept the axioms of the visual design paradigm of adapting to the end user’s display constraints (a.k.a. fluid design, etc.), why would we not also accept that for different user-groups the consuming experience will also be adaptive and thus different? The goal has never been about "identical" experience, but rather equivalent experience. Thus a user who can see and process complex content will study a sophisticated image and derive an understanding, whilst a non-sighted person, or someone with a lower cognitive capacity, might benefit from additional assistance when encountering the same complex image. The goal then is not an identical experience, but rather one best tailored to all end users regardless of their needs.
Q: Is there agreement that the goal is equivalent user-experience and not identical user-experience? How does this Change Proposal support that goal?
Issue 2: Preservation of HTML Semantics and Richness
At the heart of this Change Proposal is the suggestion that aria-describedby can solve many of the alleged problems that @longdesc has suffered from in the past.
One simple way for page authors to provide such AT-specific content is to put it inside an element and add a hidden attribute to that element… There doesn't seem to be any harm in such markup. It doesn't for example negatively impact the usability or searchability of the page. And since it both is intuitive for authors and provides value to them it would be a net win to allow this pattern.
The single largest problem with this Change Proposal can be traced to this comment: "There doesn't seem to be any harm". For while it appears that a technique such as this might be benign, it actually introduces a whole raft of new issues that impact other Standards and Specifications, as well as existing hardware and software.
Potential for Harm
Changes to the WAI-ARIA section
The Change Proposal continues:
After the first paragraph, add the following note (using the green text layout as used for notes elsewhere): Note: Implementations are encouraged to follow the recommendation in ARIA specification and expose the full semantics of any HTML elements linked to by aria attributes. For example if an aria-describedby attribute points to a <a> element or a <table> element, then it is recommended that implementations expose the full semantics of those elements rather than for example just their textual contents. This applies even if those elements aren't rendered, for example due to CSS.
Setting aside the potentially harmful effect this addition to HTML5 would have on the Candidate Recommendation of ARIA 1.0 (by shoe-horning the requirement into HTML5, but not into ARIA 1.0), it is not technically feasible at this time.
This comes down to:
- how ARIA and the different Accessibility APIs must treat content referenced by describedby,
- how the difference between visibility and non-visibility of the text content (here, the longer textual description) factors into how that text must be processed by the Accessibility APIs, and
- how, when hidden, any further HTML "richness" is lost because of how the Accessibility APIs are hard-wired to process an accessible description (which is why aria-describedby was originally minted).
Section 4.6 of the CR ARIA Specification User Agent Implementation Guide states how to compute the accessible description. That is different from a describedby relationship API mapping. If you are going to compute a string description, you need to strip out the semantic content and keep only the text. Platforms do have accessible description properties, and they are strings.
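What "strip out the semantic content and get the text" costs can be seen in a rough sketch. The code below is not the text alternative computation from the Implementation Guide; it is a crude stand-in, assuming Python's standard html.parser module, that keeps character data and throws every tag away. The class name, function name and sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects character data and discards every tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def flatten_to_string(markup):
    """Reduce a fragment of markup to a single whitespace-normalized string."""
    parser = TextOnly()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# A description that relies on table structure and a link.
description = (
    '<table><tr><th>Year</th><th>Sales</th></tr>'
    '<tr><td>2010</td><td><a href="q4.html">120</a></td></tr></table>'
)
flat = flatten_to_string(description)
```

After flattening, nothing remains to tell an AT which words were headers, which were cells, or that one of them was a link.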
In Section 5.5 we have a mapping table:
- For MSAA + UIA Express, the description is exposed as the accessibleDescription (an MSAA string) if it is hidden. The table does not say what happens when it is visible.
- For MSAA + IA2, the description is exposed as the accessibleDescription (MSAA) if hidden; otherwise the relationship is exposed, whereby an AT can get the full semantics of the content (you could have access to tables, etc.).
- For UIA, a reference to the description is exposed in both the hidden and the visible case, by exposing the relationship.
- ATK/AT-SPI exposes a relationship. ATK/AT-SPI does not have a function to expose an accessible description string, which would indicate that ATK/AT-SPI exposes hidden elements to the AT in the accessibility API.
- On Mac OS X the accessibleDescription is exposed, but the table states that this is reserved for the non-visible case and says nothing about the visible case.
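The list above can be restated as a small lookup table. This is my reading of the list as summarized here, not a transcription of the specification's table; the key names and the three constants are invented for the sketch, and for ATK/AT-SPI I assume that a relationship is exposed in both the hidden and the visible case.

```python
STRING = "string description"
RELATION = "relationship"
UNSPECIFIED = "not stated"

# Platform -> what is exposed when the described-by target is hidden or visible.
MAPPING = {
    "MSAA + UIA Express": {"hidden": STRING, "visible": UNSPECIFIED},
    "MSAA + IA2": {"hidden": STRING, "visible": RELATION},
    "UIA": {"hidden": RELATION, "visible": RELATION},
    "ATK/AT-SPI": {"hidden": RELATION, "visible": RELATION},  # assumed for both
    "Mac OS X": {"hidden": STRING, "visible": UNSPECIFIED},
}


def platforms_flattening_hidden_descriptions(mapping=MAPPING):
    """Platforms on which a hidden description reaches the AT only as a string."""
    return [name for name, row in mapping.items() if row["hidden"] == STRING]


flattening = platforms_flattening_hidden_descriptions()
```

On this reading, three of the five mappings reduce a hidden description to a plain string, which is why hiding the description and keeping its HTML richness pull in opposite directions.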
In 188.8.131.52 it simply states how to compute the accessible description (this is different from a relationship). This has nothing to do with when the accessibleDescription is exposed.
In Section 5.6.3 it simply states that aria-describedby must be mapped to the accessibility API in accordance with the state and property mapping table. However, the accessibility API cannot provide a reference to an object that is not visible, as some OS platforms may not produce accessible objects in the platform accessibility API for non-visible content. Further, if the content is not visible and it contains a tabbable item, what happens when you tab to it? You can't render it, and the browsers will not accept that. For canvas the author has the ability to render the focus.
Therefore, aria-describedby can be used here only under the following conditions: when the content is hidden, it must be exposed as a string description; to view the full semantically rich text, you must provide a way to render it. It does no good to have semantically rich descriptive text if all users can't see it.
This Change Proposal also seeks to make modification to the @hidden attribute by adding new text:
"Elements that are not hidden should not link to or refer to elements that are hidden. However, ARIA attributes are exempted from this rule and are allowed to point to contents inside hidden elements"
There are a number of problems with this suggestion. The HTML5 Draft Spec states:
"The hidden attribute is a boolean attribute. When specified on an element, it indicates that the element is not yet, or is no longer, relevant. User agents should not render elements that have the hidden attribute specified.
There is a logical fallacy in asking aria-describedby to point to content that is "...not yet, or no longer, relevant" and thus "...not rendered by the user agents". How does one point to something that does not exist?
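For readers who want to see the pattern under discussion, the following sketch scans a fragment of markup for aria-describedby references whose target carries the hidden attribute. It assumes Python's standard html.parser module; the class name, function name and sample markup are hypothetical, and the check is simplified in that it looks only at the referenced element itself, not at hidden ancestors.

```python
from html.parser import HTMLParser


class DescribedByAudit(HTMLParser):
    """Records aria-describedby references and the ids of hidden elements."""

    def __init__(self):
        super().__init__()
        self.references = []     # (tag, referenced id) pairs
        self.hidden_ids = set()  # ids of elements carrying `hidden`

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # A valueless boolean attribute such as `hidden` is reported as None.
        if "hidden" in attributes and attributes.get("id"):
            self.hidden_ids.add(attributes["id"])
        for ref in (attributes.get("aria-describedby") or "").split():
            self.references.append((tag, ref))


def references_to_hidden(markup):
    """Return the (tag, id) references that point at a hidden element."""
    audit = DescribedByAudit()
    audit.feed(markup)
    audit.close()
    return [(tag, ref) for tag, ref in audit.references if ref in audit.hidden_ids]


page = (
    '<img src="chart.png" alt="Sales chart" aria-describedby="chart-desc">'
    '<div id="chart-desc" hidden>'
    '<p>Sales rose through the year. See the <a href="data.html">data</a>.</p>'
    '</div>'
)
found = references_to_hidden(page)
```

The sample markup is exactly the case the proposed exemption would permit: a visible image pointing at content that the hidden attribute declares not relevant, and that happens to contain a focusable link.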
The Draft Specification continues:
The hidden attribute must not be used to hide content that could legitimately be shown in another presentation. For example, it is incorrect to use hidden to hide panels in a tabbed dialog, because the tabbed interface is merely a kind of overflow presentation — one could equally well just show all the form controls in one big page with a scrollbar. It is similarly incorrect to use this attribute to hide content just from one presentation — if something is marked hidden, it is hidden from all presentations, including, for instance, screen readers.
This Change Proposal does not address the apparent contradictions located in the Specification. How does using @hidden aid sighted but cognitively challenged users who are not using a specific piece of AT but would nonetheless benefit from a longer textual description? And like the over-arching problem of aria-describedby’s hidden vs. visible content, what happens when the text that is “hidden” includes tab-focusable content?
Screen-readers need to be able to tab through this content (say perhaps table cells), yet for sighted users unable to use a mouse there would be a whole bunch of apparently non-useful/non-successful tabbing, and our cognitively challenged user would potentially be hopelessly lost. (We cannot assume that only users of Adaptive Technology will need to be able to tab, and any strategy that attempts to sniff for AT is fraught with security and privacy concerns as well.)
On one hand, this is a very large problem, but on the other it is one that is relatively easy to address. As Rich Schwerdtfeger previously noted:
I believe you can use aria-describedby under the following situations. When it is hidden the content may be exposed as a String description. To view the full semantic rich text you provide a way to render it. It does no good to have semantically rich descriptive text if all users can't see it - at least on request...
The solution is to visibly render the longer textual description. While this may appear counter-intuitive with regard to supporting non-sighted users, this is in fact what happens with @longdesc today: it launches a new tab/window (or replaces the current window content with new GET content) so that the screen reader can then semantically process the data. Further, the long descriptions are usually written in Plain Old Semantic HTML.
Issue 3: User-Agent Support
The real issue under discussion is not the means by which we associate longer textual descriptions to complex images (both @longdesc and aria-describedby are attributes), but rather how user-agents subsequently engage the user with that associated content. If we accept that forcing additional content on some users (non-sighted) is a bad User-Experience, then by extension we must ensure that a means of pursuing the longer description or choosing not to must be part of any solution.
In discussion with the 2 lead developers of the NVDA Screen Reader this past March at CSUN, they confirmed to me that for NVDA to support @longdesc they would want the browser to provide the actual interaction: they were not interested in creating a new ‘screen-reader-only’ interaction, but rather to simply map to an existing interaction. They also agreed that the contextual menu would be a good place for this trigger.
Sean Hayes (Microsoft) has also posted an MSDN blog entry on how to extend similar functionality to Internet Explorer (http://blogs.msdn.com/b/accessibility/archive/2011/03/25/configuring-internet-explorer-to-handle-longdesc.aspx), so it seems that there is also a convergence around a standard activation mechanism for sighted users: the contextual menu.
HTML5 deliberately does not seek to define how user-agents are to act or provide interaction. This is a factor not under debate. Yet without a common interaction model users are no further advantaged, and changing the current means of providing longer textual descriptions breaks existing implementations: For Screen Readers that already support @longdesc, currently the screen reader will announce to the user the presence of a longer text description, and then await further user-input; in the GUI based solutions we have available today (native or via plug-in) for @longdesc the Trigger is provided by way of the Contextual Menu.
Jonas’ Change Proposal says nothing about improving user-interaction, while at the same time it breaks (or at a minimum deprecates) existing interaction models, with no replacements provided. It provides no evidence or mandate that user agents will provide a means for sighted users (not using Assistive Technology) to access longer textual descriptions when the @hidden attribute is (as suggested by Jonas) employed. Finally, WAI-ARIA is mapped only to the Accessibility APIs, yet Jonas’ proposal makes no mention of any interaction outside those APIs, effectively shutting out those users who could benefit from longer textual descriptions but who do not use AT tools.
Other Assertions and Comments
Jonas also makes some assertions in his proposal that bear some further scrutiny. He wrote:
"...longdesc suffers the problem that it encourages people to make the description available in a separate document..."
No evidence has been brought forward that having a longer textual description in a separate document is a problem. Is there any evidence to support this claim?
"...Note that both aria-describedby and longdesc support putting the description in either the same document or in an external document."
This is factually inaccurate: at this time aria-describedby can only point to IDREFS and not full URIs (http://www.w3.org/TR/wai-aria/states_and_properties#aria-describedby)
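The difference is easy to demonstrate. In the sketch below the attribute value is treated as what the ARIA specification defines it to be, a whitespace-separated list of ID references into the current document. The function name, the sample ids and the example.com address are all hypothetical.

```python
def resolve_describedby(value, ids_in_document):
    """Split an aria-describedby value into tokens and look each one up as an id."""
    resolved, unresolved = [], []
    for token in value.split():
        if token in ids_in_document:
            resolved.append(token)
        else:
            unresolved.append(token)
    return resolved, unresolved


ids = {"chart-desc", "footnote-1"}

# References within the same document resolve.
same_document = resolve_describedby("chart-desc footnote-1", ids)

# A URI is just another token: it matches no id, so nothing is referenced.
external = resolve_describedby("http://example.com/chart-description.html", ids)
```

An author who wants the description in a separate document therefore has no direct way to say so with aria-describedby; the attribute can at best point at something in the current document, such as a link.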
"...Another problem that longdesc suffers is that a lot of people seem to misunderstand how the attribute works. In a large number of cases discovered on the web people put the actual description in the attribute rather than a URI pointing to the description."
This assertion is poorly supported.
In the early days of the web, a lot of mistakes were made by under-informed authors, not only with the use of @longdesc, but with virtually every other element and attribute in HTML. In the days of writing your own web-pages in a text editor, it was not uncommon to discover or generate these types of mistakes. However today most content authors do not use Notepad to create their web content, but rather some form of visual (WYSIWYG) editor, whether stand-alone (such as Dreamweaver) or embedded within a larger web application such as a Content Management System.
In May 2011 I undertook a basic review of some of these tools, to see how they supported the creation and delivery of @longdesc content (http://john.foliot.ca/wysiwyg_longdesc). Using today’s tools, it is extremely difficult to make these kinds of mistakes: in fact one tool – a WordPress plugin – automates the entire process and writes everything to the site’s database and creates the programmatic reference with little author interference at all.
This is a problem that has already been solved.
"...HTML5 is supposed to pave the way forward. We should create HTML5 for how we want things to develop, not simply describe the best recommendations for the tools available today."
...yet the very reason why HTML5 argues for the retention of <b>, <i>, and even <u> is that this is the way things are today, and ignoring the present in favor of the future breaks existing content; one of the core goals of HTML5 was to be backward compatible.
"...One simple way for page authors to provide such AT-specific content..."
It has already been argued that these longer textual descriptions could benefit users beyond those who are using AT (Jonas actually means Screen Readers here, as head-mice, speech input software, sip-and-puff switches, and other devices are all also AT, but not related to this discussion). If we are in agreement that more than just blind users will benefit from these descriptions, why are we seeking a solution that only benefits users of AT?
"...is to put it inside an element and add a hidden attribute to that element. As the specification is written today that is required to work and so authors are likely to do this even though the specification says it is invalid use of HTML. It is however unclear if HTML validators are expected to warn about such invalid markup. But even if they did, few people validate their markup and simply do what works. Hence it's likely that we'll see such markup used."
…and yet earlier:
"...we should encourage authors to choose aria-describedby over longdesc. The most gentle tool we have for doing so is to deprecate longdesc."
If most authors fail to validate their mark-up, what value is derived from deprecating @longdesc?
If one of Jonas’ arguments is that by deprecating @longdesc we will encourage authors who never bother to validate their content to choose a solution that is valid over one that is not valid, then this seems to be a poor strategy... this is circular logic that fails.
For Jonas’ Proposal to truly solve the problem at hand the Browsers would have to address the core issues identified – this is not a mark-up problem, this is a User Agent problem.
- For any Proposal to succeed, the Browsers must address the discoverability problem for all users. This is not indicated anywhere in Jonas' Change Proposal.
- For any Proposal to succeed, the Browsers must natively support the user-choice of consuming or not consuming the longer textual description. This is not indicated anywhere in Jonas' Change Proposal.
- For any Proposal to succeed, the Browsers must preserve the HTML richness of the longer textual description content. Currently with aria-describedby the only way to do this is to visibly render the text on screen, which makes it a browser requirement. This is not indicated anywhere in Jonas' Change Proposal.
As we have seen, it is not the mark-up mechanism used for supporting longer textual descriptions that is the problem: both @longdesc and aria-describedby can provide the linkage (albeit differently). The real problem is how User Agents then process and support that linkage.
While @longdesc still has some issues with how the mainstream User Agents afford users interaction abilities, many Screen Readers have pressed ahead and offer the required support today, and emergent plug-ins from a number of developers are seeking to "fill the gaps" that exist for their favorite/preferred GUI browser(s).
The proposal to retain @longdesc, endorsed by the Accessibility Task Force, preserves what support we have in the 3 key areas of Discoverability & User choice, Preservation of HTML Semantics and Richness, and (the limited but existing) User-Agent Support. Jonas' Change Proposal does not address these concerns, while at the same time undoing any gains already made with existing authoring tools, authoring-base and user agents.