13668 – Generated content must be accessible

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	VERIFIED WONTFIX

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	LC1 HTML5 spec
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Charles McCathieNevile
QA Contact:	HTML WG Bugzilla archive list

Keywords:	a11y, a11ytf

Reported:	2011-08-04 03:31 UTC by Greg Lowney
Modified:	2013-01-09 14:29 UTC
CC List:	13 users

Description Greg Lowney 2011-08-04 03:31:02 UTC

The HTML5 spec should clarify that generated content should be included in the DOM, available to assistive technology, and copied to the clipboard along with other content. This applies to content that styles add before or after elements (e.g. content: "Note: ") as well as to numbers and bullet characters inserted before li elements, etc.

Current browsers vary in their handling of this, causing problems for some users.

Use case: Nadia's screen reader is reading her a document, but it doesn't make sense because the browser does not expose generated content, and thus the screen reader cannot detect or voice the explanatory text such as "Example:", "Note:", and "Warning:" added before the paragraphs.

Use case: Ralph has difficulty entering text, so he relies on copy and paste more than most users. However, when he selects and copies a web page that includes lots of numbered list items, then pastes it into email, all the item numbers are replaced by hash signs. Because editing the text is difficult for Ralph, it is much more of a problem for him than for other users who can correct the errors more easily.

Note that it should still be possible to distinguish generated content from author-supplied content. 

I apologize if this is not the specific spec where this should be addressed, but it must be addressed somewhere.

Comment 1 Tab Atkins Jr. 2011-08-04 04:09:17 UTC

Generated content is definitely not part of the DOM.  However, it should definitely be exposed to AT and (imo) be copied as part of the clipboard.  I'm not sure if this is best dealt with by CSS or HTML, though.

Comment 2 Michael[tm] Smith 2011-08-04 05:36:29 UTC

mass-move component to LC1

Comment 3 Benjamin Hawkes-Lewis 2011-08-04 06:32:45 UTC

(In reply to comment #0)
> The HTML5 spec should clarify that generated content should be included in the
> DOM,

It is included in the CSS Object Model, not the DOM, and is out of scope for HTML5:

    http://dev.w3.org/csswg/cssom/

For example, given a DOM element reference "el" you can query the content of ::before using:

    window.getComputedStyle(el, '::before').content
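
A minimal sketch of that query, extended to both ::before and ::after. The helper names literalContent and generatedText are hypothetical, and the sketch assumes the computed value is a single quoted string such as "Note: "; anything else (none, normal, values built from counters, images) is reported as null rather than guessed at. The getStyle parameter is only there so the function can be exercised outside a browser.

```javascript
// Hypothetical helper: turn a computed `content` value into plain text.
// Assumes a single quoted string; returns null for `none`, `normal`,
// and anything more complex (counters, images, several strings).
function literalContent(value) {
  const match = /^(["'])((?:\\.|(?!\1)[^\\])*)\1$/.exec(value.trim());
  if (!match) return null;
  // Undo simple backslash escapes of quotes and backslashes.
  return match[2].replace(/\\(["'\\])/g, "$1");
}

// Hypothetical helper: read the generated text around an element.
// By default this calls window.getComputedStyle, as in the example above.
function generatedText(
  el,
  getStyle = (e, pseudo) => window.getComputedStyle(e, pseudo)
) {
  return {
    before: literalContent(getStyle(el, "::before").content),
    after: literalContent(getStyle(el, "::after").content),
  };
}
```

In a browser, generatedText(el) would be called with just the element; a script or extension could then decide how to surface the two strings.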

> available to assistive technology

HTML references WAI-ARIA which defines text exposed to accessibility APIs to include generated content "as appropriate":

    http://dev.w3.org/html5/spec/content-models.html#wai-aria

    http://www.w3.org/WAI/PF/aria/roles#textalternativecomputation

There are problems with this of course. See in particular this www-style thread:

    http://lists.w3.org/Archives/Public/www-style/2010Nov/0313.html

Note also PFWG's view that "It is ultimately up to a group working on CSS to define the accessibility API mapping of that information when applied to a host language."

    http://lists.w3.org/Archives/Public/public-pfwg-comments/2010OctDec/0017.html

So again, this seems out of scope for HTML5.

> and copied to the clipboard along with other content.

That requirement is unrealistic because it's unclear what we should standardize and browser vendors want to continue to experiment with the right behaviours for copy and paste:

    http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-February/030182.html

> Use case: Nadia's screen reader is reading her a document, but it doesn't make
> sense because the browser does not expose generated content, and thus the
> screen reader cannot detect or voice the explanatory text such as "Example:",
> "Note:", and "Warning:" added before the paragraphs.

I think this accessibility failure reflects authorial misuse of CSS for content.

This text would be lost if author styles were replaced with user agent or user styles, so I think such code violates WCAG2 1.3.1:

    http://www.w3.org/TR/WCAG20/#content-structure-separation-programmatic

WAI-ARIA warns against this practice: "Note: Though the user agent may make efforts to compute a text alternative from CSS-generated text in the absence of text content determinable from the DOM, authors should not provide text through a style sheet, as the user agent may incorrectly determine the text alternative."

However, authors continue to abuse generated content. For example, the HTML5 spec itself abuses CSS to insert the text "This box is non-normative. Implementation requirements are given below this box" on content with the class "domintro".

> Use case: Ralph has difficulty entering text, so he relies on copy and paste
> more than most users. However, when he selects and copies a web page includes
> lots of numbered list items, then pastes it into email, all the item numbers
> are replaced by hash signs. Because editing the text is difficult for Ralph, it
> is much more of a problem for him than for other users who can correct the
> errors more easily.

I think this accessibility failure reflects a design error in HTML4 of treating the numbering of lists as a stylistic rather than content attribute:

    http://www.w3.org/TR/REC-html40-971218/struct/lists.html#h-10.3.1

When the numbering used in a list is significant, authors should arguably use <p> or <div> elements with numbers in the DOM text. User agents should arguably copy <ol> elements with numbering of some sort, since the fact that they are ordered lists is part of the content. 
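
To illustrate what "copy <ol> elements with numbering of some sort" could mean, here is a small sketch of a plain-text serializer. It is not taken from any browser; the function name orderedListToPlainText, the "N. " format, and the item shape are all assumptions made for the example. It works on plain objects so that it does not depend on a DOM.

```javascript
// items: array of { text, value? } in document order.
// `value` mirrors an explicit <li value="..."> and, as in HTML, later
// items continue counting from it. `start` mirrors <ol start="...">.
// The "N. " prefix is an assumed output format, not a standard one.
function orderedListToPlainText(items, start = 1) {
  let n = start;
  return items
    .map((item) => {
      if (Number.isInteger(item.value)) n = item.value;
      return `${n++}. ${item.text}`;
    })
    .join("\n");
}
```

The point of the sketch is only that the order is conveyed in the copied text; which numbers or symbols convey it is a separate choice.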

> Note that it should still be possible to distinguish generated content from
> author-supplied content.

Can you be more specific about this requirement? Distinguish it in what context? Do you mean it should be distinguished in the platform accessibility APIs? If so, I suggest taking that up with PFWG for a future version of WAI-ARIA.

Comment 4 Michael Cooper 2011-09-06 15:18:45 UTC

Bug triage sub-team thinks HTML A11Y TF should track this. Will look for a volunteer to make a proposal for this issue, addressing the concerns that have been raised already.

Comment 5 Michael Cooper 2011-09-08 15:31:36 UTC

Assigning to John Foliot to lead a sub-group in HTML A11Y TF to explore the issues in this bug and bring a proposal to the group.

Comment 6 Michael[tm] Smith 2011-11-20 15:30:14 UTC

See comment #5:
> Assigning to John Foliot to lead a sub-group in HTML A11Y TF to explore the
> issues in this bug and bring a proposal to the group.

So I note that this bug is waiting on that proposal to be brought to the group.

Comment 7 theimp 2011-11-29 14:59:50 UTC

> However, authors continue to abuse generated content.

This is, fundamentally, the problem.

> I think this accessibility failure reflects a design error in HTML4 of treating the numbering of lists as a stylistic rather than content attribute:

This seems more an error of authors (which HTML5 seems to want to make very easy with the reintroduction of @value and related presentational attributes). Numbering is not implied by Ordered Lists.

As you say:

> When the numbering used in a list is significant, authors should arguably use <p> or <div> elements with numbers in the DOM text.

Indeed.

> User agents should arguably copy <ol> elements with numbering of some sort, since the fact that they are ordered lists is part of the content.

Exactly.

OL implies order (structure), not numbering generally (presentation) or specific numbers (content). All that is required of the user agent is that (by some method) the order is conveyed. In the case of numbering, this only implies that earlier entries have lower numbers than later entries, and not that any particular number is applied. If a particular number is required by the author, that number must be specified as content.

On the other hand, user agents are *not* copying numbering. And authors are *not* separating content from presentation.

> For example, given a DOM element reference "el" you can query the content of ::before using:

The ::marker pseudoelement from the CSS Lists and Counters Module Level 3 remains largely unimplemented in current browsers, meaning that, in the case of lists, ECMAScript cannot determine the value, or set it (in this way).

Furthermore, this makes it very cumbersome for authors using non-User Agent (generic ECMAScript) solutions, because the "generated content" that they're looking for in the case of a list item, for example, might be in the ::before pseudoelement, or in the ::marker pseudoelement, or in the value attribute, or in none of them, or with different values in each (possibly with each browser choosing differently which one is ultimately rendered).
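
A rough sketch of what such a script ends up doing: it has to look in all three places and cannot tell which one is rendered. The function names markerCandidates and populatedSources are hypothetical, and whether a given browser answers a getComputedStyle query for ::marker at all is an assumption to verify per browser.

```javascript
// Collect every place a list item's "number" might be hiding.
// getStyle defaults to window.getComputedStyle; it is a parameter only
// so the sketch can be run without a browser.
function markerCandidates(
  li,
  getStyle = (e, pseudo) => window.getComputedStyle(e, pseudo)
) {
  const read = (pseudo) => {
    const style = getStyle(li, pseudo);
    return style ? style.content : null;
  };
  return {
    before: read("::before"),
    marker: read("::marker"),
    valueAttribute: li.getAttribute("value"),
  };
}

// Names of the sources that actually carry something.
// Treating `none`, `normal`, and the empty string as "nothing" is an
// assumption of this sketch.
function populatedSources(candidates) {
  const empty = new Set([null, "", "none", "normal"]);
  return Object.keys(candidates).filter((key) => !empty.has(candidates[key]));
}
```

If populatedSources returns more than one name, the script has no reliable way to know which value the user actually sees.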

So already, authors who do not know/care about accessibility have at least three different ways of inserting vital content as non-content, and no reliable way for users or Assistive Technologies to access that content.

Even before it reached this level of complexity, the typical response of voice Assistive Technologies was to number ordered lists themselves (that is, count up from one, in exactly list order), and always simply use the text "bullet" for unordered lists. While this has started to change in some cases, it has also become generally worse than it used to be.

Perhaps something does need to happen at this (or another) level.

One possibility might be to standardize, as part of HTML, a method of generating an intermediate format for export (whether to be saved to disk, or presented to Assistive Technologies, or copied to the clipboard, or converted to a non-HTML file format, etc.). In such a format, the HTML, CSS, scripts, and other resources could be interspersed in a standard way, with a view to preserving content rather than reproducing the original. For example, all required CSS rules would be reduced to only those which apply (last of each type) to a given element, and stored in the style attribute of each element. content: style rules could be expanded into actual content characters in a new element in the appropriate position. And so on.
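
As a very rough illustration of the expansion step, the sketch below serializes a tiny tree of plain objects, inlining each node's style and turning generated text into real elements. Everything here is an assumption made for the example: the node shape, the function names, and the data-generated attribute used to keep generated content distinguishable from author-supplied content. Resolving the cascade and counters is assumed to have happened already.

```javascript
// node: { tag, text?, children?, before?, after?, style? }
// `before` / `after` hold already-resolved generated text.
function escapeHtml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

function exportNode(node) {
  // Generated text becomes a real element, marked as generated.
  const generated = (text) =>
    text ? `<span data-generated="true">${escapeHtml(text)}</span>` : "";
  // The rules that apply to this element are stored in its style attribute.
  const style = node.style
    ? ` style="${escapeHtml(node.style).replace(/"/g, "&quot;")}"`
    : "";
  const inner =
    escapeHtml(node.text || "") +
    (node.children || []).map(exportNode).join("");
  return (
    `<${node.tag}${style}>` +
    generated(node.before) +
    inner +
    generated(node.after) +
    `</${node.tag}>`
  );
}
```

For a paragraph with a "Note: " prefix, the exported markup carries the prefix as ordinary text inside a marked span, so a tool reading the export needs no CSS support to find it.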

This would be a fallback method, because unlike the DOM it would likely be too difficult to change in real-time; but it would allow standardized access to Assistive Technologies that do not specifically utilize the Accessibility APIs, and a standard method of piping data from a browser to offline processing tools (such as a spellchecker, or download manager). It might also have other benefits, such as for debugging, comparison with the output of other browsers even if you can't run them to access their DOMs at the same time, or simply because, as a single monolithic format, it can be used to reliably resume reading, or to retransmit.

This actually sounds a bit too ambitious for an HTML5 deliverable, but it is an idea.

Comment 8 Ian 'Hixie' Hickson 2011-12-07 19:59:05 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Rejected
Change Description: no spec change
Rationale: This is a CSS issue, not an HTML issue. HTML doesn't have "generated content" and I don't see any way that it makes sense to address this in HTML.

I'm not even sure it's a CSS issue. CSS puts the content in the rendering tree exactly like page content. If the UAs aren't reading it, that's a UA bug.

Comment 9 Charles McCathieNevile 2012-11-21 01:25:03 UTC

I'm with Hixie. This isn't an HTML bug, and I think it is fundamentally a User Agent bug. I know that Opera makes the content available to VoiceOver, and best practice is generally to render to a screen reader what is on the screen (one way or another). Maybe there should be something in the relevant CSS module, but I don't see how it fits in HTML at all.