ChangeProposals/Issue31cMetaGenerator

From HTML WG Wiki
Revision as of 06:08, 2 May 2012 by Jbrewer (Talk | contribs)


Updated Re-Open Request and Change Proposal on Meta Generator

  • The table of contents immediately follows the Summary.
  • Authors: Judy Brewer, Mike Smith
  • Contributors: Laura Carlson, Janina Sajka
  • Status: Ready for review

Summary

This change proposal:

  • provides background on prior discussions regarding meta name="generator";
  • identifies new information regarding deficiencies in specification of the "generator" value;
  • identifies deficiencies in the weighting of evidence for and against removing the "generator exception";
  • re-frames the core question as one of end-user requirements rather than authoring tool conformance considerations;
  • provides details and describes the impact of the proposed changes.

A requirement for alternative text on images in HTML has existed for fifteen years in order to help ensure accessibility of Web content for people with disabilities. The HTML Co-Chairs have previously decided that HTML5 should allow missing alt to be evaluated as conformant for any pages which include the string <meta name="generator" />, a value that is automatically inserted by a variety of types of authoring and content management tools to indicate tools used in creating a page or processing content on the page.

This conformance exception for pages that contain the "generator" value for meta name exempts images on a large portion of pages on the web from an accessibility requirement that is essential to effective use of the web by blind users. This exception has been based on misunderstandings about the authoring production process, and ambiguous criteria about hand-authoring.

This re-open request and change proposal (CP) presents new information regarding inaccuracies and ambiguities in the specification of the "generator" value, and proposes removing the sentence "The value must not be used on hand-authored pages" in order to address these problems.

This CP also re-analyzes the Co-Chairs' previous decisions in favor of the <meta name="generator" /> exception and against the requirement for alternative text, identifying factual and logical inaccuracies. It describes the harm that is done by the generator exception, and proposes removing the generator conformance exception in order to address these problems.

Rationale

Background of "generator exception" discussion

This proposal is an updated re-open request for a change proposal on the <meta name="generator"> exception for alternative text, with an extensive discussion history going back several years. Key steps have included:

The specification of the "generator" value is deficient

The current specification of the "generator" value for meta name is:

The value must be a free-form string that identifies one of the software packages used to generate the document. The value must not be used on hand-authored pages.

There are two significant deficiencies in this statement:

Fatal ambiguity in the specification

The specification introduces the authoring-conformance constraint that a meta generator element must not be used on "hand-authored" pages. However, it does not define what a "hand-authored" page is, and the definition of that term is not obvious. The lack of a clear definition for "hand-authored" is thus a fatal ambiguity in the spec -- fatal in the sense that it entirely undermines the implicit rationale that the spec uses for excepting certain documents from the requirement to provide alternative text for images. That rationale rests completely on the assumption that a supposedly "hand-authored" page can be clearly distinguished from a supposedly non-"hand-authored" page.

It is not at all clear whether a "hand-authored page" is intended to mean, for example, a document that was created in a text editor without using any post-processing tools at all, or whether it can mean a page that was at some point created with an automated tool of some sort, but subsequently was edited exclusively in a text editor or other tool.

It is also worth noting -- regardless of what the ambiguous term "hand-authored" was intended to mean -- that given a document in isolation from its author, it is impossible to know whether that document was "hand-authored" or authored in some other way. So conformance-checking tools can never be expected to reliably detect whether a document was "hand-authored" or not.

Tool-mediated insertions of alternative text are ignored by the "generator exception"

A mistaken assumption that seems to be implicit in the current specification language regarding meta generator is that the authoring production process is binary: that a document is either generated in a completely automated fashion, or that it is completely authored by hand. That assumption does not match reality. The document production process often includes multiple steps. For instance, documents may be first generated using some kind of automated tool, or some conversion process from another format (for example, from a word-processing application such as Microsoft Word or OpenOffice); then they may be processed through intermediate stages to address layout, format, animations, etc.; then they may be validated, cleaned or repaired using various tools. Any of the tools in this production chain may add generator tags, and, once added, these tags are generally not stripped out by other tools.

At most of these stages, most types of authoring or processing tools allow tool-mediated adjustment of content. Some of these tools, such as Tidy (which adds a generator tag), allow hand-editing as well.

Steve Faulkner's change proposal provides data on a variety of tools that allow tool-mediated editing of content, which is relevant evidence for establishing the range of production processes. Some of the tools listed also allow hand-editing.

Implications of correcting the deficiencies in specification of the "generator" value

Following this new information through to its logical conclusion, the remaining arguments against removal of the "generator exception" should be considered void. Nevertheless, additional perspectives on these points are provided below for the record, given the multiple misunderstandings that they represent regarding accessibility-related user requirements and authoring practices.

The "generator exception" results in inequitable rendering of graphical content

In their April 2011 decision, the HTML Co-Chairs incorrectly asserted that arguments regarding structural integrity against the generator exception were circular and gave these no weight:

Another objection was that the generator exemption breaks the structure of the img element:

Requiring a set of programmatically determinable valid options helps ensure that images have complete structure. Complete structure of the <img> element requires both src and text alternatives.

This claim seems to be based on a circular argument. Omitting alt should not be allowed, because that would make the img element have incomplete structure, because img requires alt. Thus, the objection fails to make its case and was given no weight.

This argument is not circular. For web content to be independent of presentation, both the src attribute and the alt attribute are necessary for images.

  • Omit the src attribute, and sighted users have no content;
  • Omit text alternatives, and non-sighted users have no content.

For a sighted user, if there is no src attribute, then no content is rendered, and therefore it is a document error. For a blind user, if no content is rendered, then there is likewise a document error; without alt content, the img element is not representing anything to that user. It is inequitable for a document to represent something to a sighted user but not to a non-sighted user.

Without both a src attribute and a text alternative the img element is incomplete, as further discussed in Laura Carlson's change proposal on conformance checking.
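
The completeness claim can be illustrated with a minimal Python sketch. The function name img_structure_errors and its messages are hypothetical; they are not taken from the spec or from any existing conformance checker. The sketch also assumes that an empty alt="" counts as a provided text alternative.

```python
def img_structure_errors(attrs):
    """Return the problems that leave one <img> element incomplete.

    `attrs` maps attribute names to values for a single <img> element.
    Assumption: an empty alt ("") counts as present.
    """
    errors = []
    if not attrs.get("src"):
        # Without src, nothing is rendered for sighted users.
        errors.append("missing src: no content for sighted users")
    if "alt" not in attrs:
        # Without alt, nothing is represented for non-sighted users.
        errors.append("missing alt: no content for non-sighted users")
    return errors
```

Under these assumptions, img_structure_errors({"src": "chart.png"}) reports one problem, and the same element with an alt attribute reports none. The two checks are symmetrical, which is the point of the argument: each attribute serves one group of users.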

The argument against the "generator exception," regarding structural integrity of the <img> element, should have been evaluated as a strong objection rather than given no weight.

The "generator exception" inadvertently and retroactively introduces new, undocumented, magic semantics

The effect of the generator exception is that it inadvertently assigns additional new semantics to meta generator -- magic semantics that are not clearly documented in the spec and not obvious to implementors and users of the spec. Specifically, the generator exception has the effect of making meta generator provide the new meaning, "I do not want conformance checkers to emit error messages about missing alternative text for any img elements in this document." And it unilaterally and retroactively assigns that additional meaning to all existing documents that contain meta generator, not just to newly created documents.

If I am implementing, for example, an HTML editing application based on the spec, it is not clear to me from reading the spec that having my editing application add a meta generator element means that for every single document any author creates with my application, conformance checkers are never going to emit error messages about missing alternative text for img elements.

And as a document author, it is not at all clear to me from reading the spec that if I keep a meta generator element that has been added by any tool in the production or evaluation process, anywhere in any document, it means that I am choosing to completely opt out of having conformance checkers emit any error messages about missing alternative text for any img elements in the document. Among other things, the result is a significantly reduced ability for me to identify potential errors which I otherwise would have been alerted to; I lose an important capability due to meta generator having surprise magic semantics that are not clearly inferable from the spec.

Moreover, assigning this new "do not emit error messages about missing alternative text for any img elements in this document" meaning to meta generator results in retroactively changing the processing behavior of an entire conformance class for all existing documents that have ever been created on the Web which contain meta generator instances. That is, prior to HTML5, the meaning of meta generator in those documents was simply that it was a stamp to indicate which applications were used to create the document. But now, an additional meaning that the original creators of those documents never intended is unilaterally and retroactively being assigned to those documents, with one of the consequences being that the documents will now be handled by conformance checkers in a way that is very different from the way in which they were handled previously. In that sense, it "breaks" existing content.

The meta generator exception is therefore actively harmful and should not be part of the specification.
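
The document-level effect described in this section can be sketched as a toy conformance check in Python. This is only an illustration under stated assumptions, not the behavior of any real validator: the names _Scan, missing_alt_errors, and PAGE are hypothetical, "ExampleTool 1.0" is an invented generator value, and str.lower() stands in for the spec's ASCII case-insensitive match.

```python
from html.parser import HTMLParser


class _Scan(HTMLParser):
    """Record whether a meta generator is present and which imgs lack alt."""

    def __init__(self):
        super().__init__()
        self.has_generator = False
        self.imgs_missing_alt = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # str.lower() approximates the ASCII case-insensitive match.
        if tag == "meta" and (attrs.get("name") or "").lower() == "generator":
            self.has_generator = True
        elif tag == "img" and "alt" not in attrs:
            self.imgs_missing_alt.append(attrs.get("src"))


def missing_alt_errors(html, generator_exception):
    """List missing-alt errors, optionally applying the generator exception."""
    scan = _Scan()
    scan.feed(html)
    scan.close()
    if generator_exception and scan.has_generator:
        # One meta element anywhere silences every img in the document.
        return []
    return [f"img {src!r} has no alt attribute" for src in scan.imgs_missing_alt]


# Hypothetical page: one generator stamp, two images without alt.
PAGE = """<!DOCTYPE html>
<html><head>
<meta name="generator" content="ExampleTool 1.0">
<title>Report</title>
</head><body>
<img src="chart.png">
<img src="logo.png" alt="Example Corp logo">
<img src="photo.jpg">
</body></html>"""
```

With generator_exception=True the sketch reports no errors for PAGE; with generator_exception=False it reports the two images that lack alt. The markup is identical in both cases, so the only thing that changes is what the author is told.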

The weighting of objections against the "generator exception" is deficient

In the HTML Co-Chairs' April 2011 compound decision on Issues 31 and 80, they asserted that the "generator" exception, on the whole, drew weaker objections than removing the exception would:

Overall, there were many claimed disadvantages that flow from the generator exception, ranging from weak to moderately weak. They were generally unsupported by details or concrete evidence. Even though the use case for omitting alt when the generator mechanism is used was disputed and only found to be a medium objection, it still outweighs these claimed disadvantages, as they were all found to be weak or moderately weak.

Thus, on the whole, the proposal to allow alt to be omitted when the generator mechanism is used was found to draw weaker objections, compared to the proposal to still require alt, even when the generator mechanism is used.

Within the "generator" portion of the HTML Co-Chairs' April 2011 compound decision on issues pertaining to alternative text, the Co-Chairs used six different levels of "weights" in evaluating the objections gathered in the survey: "strong," "moderately strong," "medium," "moderately weak," "weak," and "zero." These were presented without any definitions. Further complicating the rating scheme, these levels were applied bidirectionally -- on the one hand, to objections to the generator exception, and on the other hand, to objections to removing the generator exception -- thus totaling twelve possible rankings for any argument, all without definition. The primary criterion suggested by the HTML Co-Chairs to explain the low weighting of objections to the generator exception was insufficient evidence, asserted repeatedly; yet inaccurate assertions regarding authoring production processes, on which the generator exception was originally based, were apparently accepted without evidence. No point values were declared for the different levels used in the decision, with the exception of "zero," making it impossible to verify the arithmetic implied in the summary conclusion.

The multiple problems in the rating scheme described above make it difficult for readers to follow, let alone contest, its application to objections to the generator exception, or to accept the uniformly low weighting that was assigned to these objections. It is therefore difficult to accept the conclusion that maintained the generator exception as an appropriate reflection of the arguments presented. Relevant arguments from the April 2011 decision are re-examined below, along with the weighting of those arguments, although these do not represent all the arguments of concern from that decision.

The "generator exception" breaks harmonization with other standards and guidelines

In their decision, the Co-Chairs suggested that the disharmonization with standards and guidelines introduced by an HTML5 "generator" exception is a failure of other standards and guidelines to update, and therefore they weighted this objection to the generator exception low:

Yet another objection was based on standards and guidelines:

The generator mechanism breaks standards and guidelines requiring text equivalents on an individual element basis.

Many specific standards and guidelines were listed. However, these guidelines were generally created before the generator mechanism exemption was invented, so it's not clear if the disagreement indicates a problem, or just failure to update. Thus, this was taken to be a weak objection.

This disagreement indicates a problem that cannot be solved, as the HTML Co-Chairs seem to suggest, by updating numerous other standards and guidelines; it must instead be solved by removing the "generator" exception in HTML5 that introduced this disharmonization.

The problem introduced by the "generator" exception is major with regard to standards harmonization: accessibility standards require alternative text for images, but the "generator" exception makes this requirement meaningless on the large number of web pages that have a "generator" tag inserted. By exempting a large percentage of pages on the web from the requirement to provide alternative text for images, the exception denies people with visual disabilities equitable access to web content -- equitable access that could have readily been provided by following the provisions of these standards and guidelines.

The date that the guidelines and standards were created--whether before, during or after creation of the generator exemption--has no bearing on the validity of the user requirement for alt. User needs for alternative text as captured in provisions of web standards and guidelines do not somehow become irrelevant because of a standard that does not follow the provisions of existing standards and guidelines. The existence of people with visual disabilities and the need to accommodate them on the web have not disappeared in the intervening years.

This assertion also presumes that the other standards would now agree with the generator exception, though the opposite is a far more likely conclusion; the other standards are quite aware of automated authoring tools and simply do not agree that alt is unnecessary and expendable when a tool is involved. There is no evidence that these standards have not considered the implications of CMS; in fact some of these standards have indeed been recently updated, and none have abandoned their reliance on alt.

To suggest that the problem of standards breakage (standards fragmentation) introduced by the generator exception is a failure of standards and guidelines to update is to suggest that the reality of users' needs is shaped by a technical standard rather than vice versa. It shows a lack of understanding of or disregard for the requirements of users with disabilities and the role that standards and guidelines serve in accommodating those requirements.

The argument against the <meta name="generator"> exception, regarding failure to harmonize with standards and guidelines, should have been evaluated as a strong rather than a weak objection.

The "generator exception" inappropriately gives authoring tool conformance considerations precedence over end-user requirements

In their decision, the Co-Chairs asserted multiple rationales for retaining the generator exception based on the inconvenience that might otherwise result for authoring tool conformance:

At least one Change Proposal argued that when a page is created by an automated content generation tool, and that tool indicates this using <meta name=generator>, it should be permitted to omit the alt attribute.

It was argued that there was a valid use case for the generator exemption, namely automated content generators which cannot produce alt themselves and for various reasons cannot or will not demand alt from the user. The following objection, though entered for role=presentation, directly argues one such use case:

Consider a GUI authoring tool used by end-users, not professional Web developers or content authors. Such tools generate <img> elements, but it is not always appropriate for such tools to pause and demand alt text from the user before continuing.

Whether or not authoring tools prompt for alternative text at any given stage of the document production process is immaterial to the question at hand. Even authoring tools that fail to prompt for missing alternative text nevertheless may permit the introduction of alternative text for images, or else can be used with other production tools that do provide that capability; so content authors are not prevented from creating appropriate alternative text. And even the Authoring Tool Accessibility Guidelines (ATAG), which address support for production of accessible content, do not recommend intrusive prompting. But regardless, whether or not an author was prompted for alt does not change the fact that the end-user requires it, and that the generator exception will interfere with determining whether or not the resulting document contains it.

Several objectors cited this use case, and further pointed out that if content generators are forced to generate nonconforming markup to satisfy this use case, they may instead enter bogus alt values, which would merely exacerbate the problem:

If an authoring tool or other generator does not have sufficient information to include either alternative text or a caption, there is nothing the tool can do. If we say that in those cases the authoring tool would be non-conforming if it didn't provide alternative text or a caption, then the tool will just provide bogus (placebo) alt="" attribute values, which just makes the problem non-machine-detectable instead. To address this, therefore, we should allow generator tools to include images without alternative text or captions if absolutely necessary.

Also:

I object to treating the absence of the alt attribute as a validation error when the generator mechanism is used, because if it is treated as an error in that case, generator developers are incented to generate bogus values in order to make their products emit markup that doesn't trigger errors. (There are always generator developers who want to make the output of their programs validate.)

The use case of GUI tools that do not prompt for alt seems well established.

While there may indeed be authors who intransigently choose to enter bogus alternative text, knowingly violating the intended use of alternative text as an accessibility accommodation, this is not a reason to codify their bad practices by removing validation alerts for missing alternative text for content authors who would prefer to do the right thing and provide alternative text for images.

The lines of reasoning included here imply that considerations of authoring tool conformance should take precedence over end-user requirements for accessible web content. Given that alternative text is essential to understanding graphical web content for some web users, the proposed justification for the omission of alternative text based on conformance convenience for authoring tools, and validation convenience for content authors, is an inadequate counter-consideration to the needs of end-users for accessible content.

After examining a variety of arguments regarding the convenience of the generator exception for assessing conformance of authoring tools, without consideration of the experience of the end-users who require alt as an accessibility accommodation for graphical web content, the HTML Co-Chairs asserted that:

After considering all these arguments, it seems established that there is a valid use case for allowing the alt attribute to be omitted when the generator mechanism is specified. This use case makes for a moderately strong objection. However, the claim of negative consequences to disallowing this use case was somewhat weakened by the lack of concrete evidence that bogus values have been used in the past or would be used in the future. So overall, this makes for a medium objection.

Given the multiple problems with the arguments above, the omission of alternative text in the presence of one or more generator tags has by no means been established as a valid use case. These objections to the removal of the generator exception should have been given no weight when juxtaposed with the end-user requirements to have alternative text for images.

The "generator exception" obviates the intent of the Validator

In their decision, the Co-Chairs asserted that because the "generator exception" did not assert benefits relative to the Validator, such benefits were immaterial to the decision:

Another objection argues that the generator mechanism fails to have certain benefits:

The generator mechanism does not improve user experience or the chances of accessible content being produced. It does not help authors catch mistakes. It does not help educate developers.

No one disputed this argument; but conversely, no one argued that generator has these benefits or should be allowed because of such benefits. With no concrete argument as to why the generator exception ought to have these benefits, this was taken to be a weak objection.

The W3C Validator service provides these benefits without regard to whether the generator mechanism claims such benefits. As the W3C Validator documentation states, "Validating Web documents is an important step which can dramatically help improving and ensuring their quality...". It provides a teachable moment, to wit: "Validation helps teach good practices". Additional information is available at HTML5 Should Help Facilitate Accessibility Awareness and Education.

In the presence of the generator exception, the validator suppresses error identification, and is thereby stripped of its educative benefits. If content developers are not notified that a problem (missing alternative text) exists, they remain unaware of it and have no opportunity to rectify specific instances of missing alternative text. They are therefore deprived of the opportunity to learn about the general issue, and deprived of the opportunity to improve their content in the future.

So while the survey comments specifically raised the question of generator benefits, by extension they also raise the important question of Validator benefits, and how not to inadvertently undermine them. The issue of validator benefit should have been evaluated as a strong objection to the generator exception, rather than a weak argument against the generator exception.

Sufficient evidence of harm to end-users is implicit in arguments supporting the generator exception

In their decision, the Co-Chairs mentioned survey comments that asserted harm from the omission of alt:

Some argued that omitting alt and using the generator mechanism had harmful consequences:

Hence, the generator mechanism should not have any bearing on the @alt requirements as the generator string/mechanism has no bearing on the attributes of the <img> or the context in which the img appears in. The negative effects of omission of an empty or non-empty @alt are in no way made up for by the presence of the generator mechanism.

This statement in itself lacks specifics...

Inexplicably, this was taken as non-evident despite widespread understanding that alternative text is necessary to ensure accessibility of images on the web for people who cannot see. Additional information is available in Understanding Guideline 1.1 of WCAG 2.0. The negative effects of omission of an empty or non-empty alt are in no way made up for by the presence of the generator mechanism, because the generator mechanism does nothing to fulfill the function that alternative text would have. The generator mechanism does not provide a functional replacement for the information provided by alternative text; nor does it provide a replacement for the visual functions of sighted users. It simply provides an excuse for why the necessary but absent alternative text is not there.

The HTML Co-Chairs also asserted that lists of authoring tools inserting the generator tag had no evidentiary value, dismissing the notion that a significant amount of web content would use the generator exception:

A list was provided of example <meta name=generator> values, and from this a conclusion was drawn that a tremendous amount of Web content would make use of the generator exemption. However, it's not clear where this list came from. It is not present in the spec, and does not seem to align with the spec's definition of a content generator. In particular, it includes many text editors which do not seem to qualify as automated markup generators. Was this list derived from the output of actual authoring tools? Was it found by looking at real Web content? In the absence of information about where this list came from, it was taken to have no evidentiary value.

The generator exception does not differentiate between content generators and authoring tools that insert a generator string. Authors of all documents with a <meta name=generator> string would remain ignorant that their document had any missing text alternatives. The HTML Co-Chairs were mistaken to disqualify the original evidence. It was derived from searching and from examining tools that automatically insert a meta generator string and their resulting content. Since this initial evidence was provided, a copious amount of further evidence of widespread use has been provided.

Other generator studies seem to indicate further usage of <meta name=generator>.

The HTML Co-Chairs also dismissed objections that a document-level generator option would make it easy for authors to forget to provide alternative text for images:

...but there were some concrete arguments supporting the case for negative consequences:

The generator mechanism facilitates the creation of inaccessible content.

No evidence was provided that more inaccessible content would be created if the generator exemption is allowed than otherwise. So this was taken to be a weak objection.

And, in a related argument, they rated an objection to the generator exception as moderately weak on the basis that there should already have been sufficient time for evidence of expanded harm from missing alt to emerge -- though it is unclear which authoring tools and validators, if any, have already built in a generator exception for alternative text, and there has been no time to accumulate statistical evidence of this presumed expanded harm.

Another objection was based on the possibility of authoring mistakes:

The generator mechanism is actively harmful to accessibility. If the generator option is left at document level, it would be far too easy for authors to have the software automatically insert "generator" and then forget to provide any text alternatives for images.

If supported by concrete evidence, this would have been a strong objection. This seems like a plausible authoring mistake which would have negative consequences. But it was weakened by lack of any specific evidence that this problem has actually occurred in practice. This provision has been in HTML WG Editor's Drafts and Working Drafts since September 3, 2009:

http://dev.w3.org/cvsweb/html5/spec/Overview.html?rev=1.2915

This should be enough time to see at least anecdotal evidence of the claimed problem. Even though normally lack of supporting evidence would render an objection weak, in this case, there is a plausible-sounding argument even in the absence of evidence, so the objection based on authoring mistakes is overall taken as moderately weak.

It is incongruous to, on the one hand, argue that the generator mechanism is in sufficient use to merit an exception, while on the other hand dismissing objections to the generator exception by claiming that insufficient evidence was provided that the generator mechanism is in use.

It is likewise incongruous to claim, on the one hand, that content authors would be bothered by validation failures in the event of missing alt, while on the other hand claiming insufficient evidence that the generator exception would undermine the creation of accessible content.

These objections regarding harm to end-users with disabilities should have been evaluated as strong, not weak, objections to the generator exception, as they speak to the core issue of unmet user requirements resulting from exempting pages containing "generator" from the requirement to provide alternative text.

Details

Change [1], at 4.2.5.1

Remove:

The value must not be used on hand-authored pages.

Change [2], at 4.8.1.1.13

Remove:

The document has a meta element with a name attribute whose value is an ASCII case-insensitive match for the string "generator". (This case does not represent a case where the document is conforming, only that the generator could not determine appropriate alternative text - validators are required to not show an error in this case to discourage markup generators from including bogus alternative text purely in an attempt to silence validators.)

Impact

Positive Effects

  • Authors can more safely assume that when they use a conformance checker it will check the conformance of the document, rather than some arbitrary subset of requirements.
  • The use of meta name=generator will not change in a way that is contrary to its current usage and effects.
  • Authors will be made aware that they have not provided a text alternative, giving them the opportunity to fix their error and produce a conforming document.
  • Upholds the structural integrity of the <img> element.
  • Enables automatic validators to programmatically detect the presence or absence of text alternatives (Bug 9218).
  • Facilitates accessibility awareness and education.
  • Upholds the "Priority of Constituencies" Design Principle (HTML Design Principles, section 3.2).

Negative Effects

  • None

Conformance Class Changes

  • The meta generator exception is removed from section 4.8.1.1.13 Guidance for conformance checkers. (Refer to Change [2] in the Details section of this document.)

Risks

  1. There are no risks in making the changes.
  2. Risks of not making the changes include:
    1. unmet user requirements, in that individuals who are blind will not have equitable access to web content;
    2. HTML5 conformance evaluation for <meta name="generator"> will assert false conditions on authoring tool production processes;
    3. the HTML5 spec will be ambiguous with regard to the hand-authoring test for alternative text in the presence of the "generator" tag.