HTML 4.01 validity errors

These notes were collected as research for a proposal for a WCAG 2.0 Guideline 4.1 Level 1 Success Criterion. They have not been reviewed or approved by the WCAG WG. For questions or comments, please contact Wendy Chisholm.

Common HTML 4.01 validity errors and their effect on accessibility

Each entry below lists: the HTML 4.01 error (validity or conformance); where the error is defined; the accessibility issue caused by the invalid content; examples; and how this is addressed in WCAG 2.0.
improper nesting of elements
  Where defined: "Nesting" refers to well-formedness, which is perceived as an XML concept. It is also a concept in SGML, where it is called "overlapping"; it is not allowed, but tolerated. @@ ref from SGML standard
  Accessibility issue: incorrect rendering of content, "bad" info in the DOM
  WCAG 2.0: Guideline 1.3 L1 SC1 Structures within the content can be programmatically determined. If "programmatically determined" means that the assistive technology can determine the appropriate structure (DOM), then this criterion would address the error.

html:img or html:area missing alt attribute
  Where defined: DTD and spec @@copy/paste from dtd
  Accessibility issue: unknown purpose of image, missing critical information, may be unable to interact with buttons
  WCAG 2.0: Guideline 1.1 L1

decorative image should have alt=""
  Where defined: @@section in spec
  Accessibility issue: repetitive or unnecessary information clutters reading
  WCAG 2.0: Guideline 1.1 L1 SC4 Non-text content that is not functional, is not used to convey information, and does not create a specific sensory experience is implemented such that it can be ignored by assistive technology.
alt attribute should not be placeholder text
  Where defined: @@section in spec
  Accessibility issue: information provided isn't useful, missing critical information, may be unable to interact with buttons
  WCAG 2.0:
  • G1.1 L1 SC1 For all non-text content that is used to convey information, text alternatives identify the non-text content and convey the same information. For multimedia, provide a text-alternative that identifies the multimedia.
  • G1.1 L1 SC2 For functional non-text content, text alternatives serve the same purpose as the non-text content. If text alternatives can not serve the same purpose as the functional non-text content, text alternatives identify the purpose of the functional non-text content.
  • G1.1 L1 SC3 For non-text content that is intended to create a specific sensory experience, text alternatives at least identify the non-text content with a descriptive label.

  These all describe how to provide useful text alternatives.

undefined attribute
  Where defined: DTD, spec. List of defined attributes.
  Accessibility issue: Depends on the purpose of the attribute. Unknown attributes are ignored, so information might be missing. This is more of a user agent support issue than an assistive technology issue, unless there is some reason the AT cannot implement support as soon as other user agents do. It could also be an accessibility benefit.
  Examples: tabindex on div is not valid, but it is an accessibility benefit, although usable only by the few who have user agents and ATs that support it.
  WCAG 2.0: @@not addressed by WCAG?? Doesn't need to be if it is a support issue? Baseline?
Element not allowed by document type
  Where defined: DTD, spec. List of defined elements.
  Accessibility issue: Depends on the purpose of the element. Unknown elements are usually ignored, although their content may be displayed, so information might be missing or might be displayed when it should be hidden. This is more of a user agent support issue than an assistive technology issue, unless there is some reason the AT cannot implement support as soon as other user agents do. It could also be an accessibility benefit.
  Examples: @@example of a new element from dhtml roadmap or only introducing attributes?
  WCAG 2.0: @@not addressed by WCAG?? Doesn't need to be if it is a support issue? Baseline?
End tag for not opened element
  Where defined: DTD (or SGML standard?)
  Accessibility issue: @@
  Examples: example 5 in test file
  WCAG 2.0: Ignored? Shouldn't affect the DOM other than to create an orphan, no? Or part of "programmatically determined"? Handled by the 1.3 criterion? Same effect for everyone?
Required attribute not specified
  Required attributes related to accessibility:
  • alt on area
  • alt on img
  Where defined: DTD
  Accessibility issue: unknown purpose of image, missing critical information, may be unable to interact with buttons
  WCAG 2.0: Guideline 1.1
Missing required end tag
  Where defined: DTD
  Accessibility issue: @@
  Examples: example 4 in test file
  WCAG 2.0: Part of "programmatically determined"? Handled by the 1.3 criterion? Or same effects for everyone? If an a element is not closed, the anchor will be very long, and a screen reader will read all of the text as anchor text.
Non-SGML character number
  ?? Not sure what this one means.
General entity not defined and no default entity
  ?? Not sure what this one means.
Attribute value must be literal
  Where defined: DTD
  Accessibility issue: @@
  Examples: @@create examples and test
End tag for element not finished
  Where defined: DTD
  Accessibility issue: @@
  Examples: @@create examples and test
  WCAG 2.0: Part of "programmatically determined"? Handled by the 1.3 criterion?
Attribute value not allowed
  Where defined: DTD
  Similar to "undefined attribute", no?
ID defined more than once
  Where defined: DTD? SGML standard?
  Examples: Sailesh's example of a table heading with the same id, then referenced by a data cell. Created a sample file, but couldn't replicate the issue in HPR (Home Page Reader).
  WCAG 2.0: Part of "programmatically determined"? Handled by the 1.3 criterion?
literal is missing closing delimiter
  Where defined: SGML standard?
  Examples: example 6 in test file. HPR shows the code and does not create a link; Firefox creates a link.
  WCAG 2.0: A support issue or an accessibility issue? Is there some reason HPR doesn't have better error handling on this, or ??
character data is not allowed here
  Where defined: SGML standard? Or DTD?
  Examples: example 7 in test file. HPR reads both the text in the p elements and the text outside of them. Firefox also displays both.
  WCAG 2.0: Not an accessibility error?
document type does not allow element X here; missing one of Y start-tag
  Where defined: DTD
  Examples: example 8 in test file. Emphasis and list item are handled without being enclosed in the proper parent elements; td and tr are not.
  WCAG 2.0: Not an accessibility error?
duplicate specification of attribute X
  Where defined: DTD
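
Two of the errors listed above (img or area missing alt, and ID defined more than once) can be detected mechanically. The following is a minimal sketch, not a validator: it uses Python's standard html.parser, which tolerates invalid markup and never consults the DTD. The names ValidityAudit and audit and the sample markup are hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser


class ValidityAudit(HTMLParser):
    """Records img/area elements without alt and id values used more than once."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []    # (tag name, line number)
        self.duplicate_ids = []  # (id value, line number of the repeat)
        self._seen_ids = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        names = dict(attrs)
        line, _column = self.getpos()
        # alt="" is present (and correct for decorative images), so only
        # a completely absent attribute is reported.
        if tag in ("img", "area") and "alt" not in names:
            self.missing_alt.append((tag, line))
        id_value = names.get("id")
        if id_value is not None:
            if id_value in self._seen_ids:
                self.duplicate_ids.append((id_value, line))
            self._seen_ids.add(id_value)


def audit(markup):
    """Parse markup and return the populated ValidityAudit instance."""
    parser = ValidityAudit()
    parser.feed(markup)
    parser.close()
    return parser


# Hypothetical sample: two header cells share an id, one image lacks alt.
SAMPLE = """<table>
<tr><th id="h1">Name</th><th id="h1">Role</th></tr>
<tr><td headers="h1">Ada</td><td><img src="a.png"></td></tr>
</table>
<p><img src="rule.gif" alt=""></p>"""

result = audit(SAMPLE)
print("missing alt:", result.missing_alt)
print("duplicate ids:", result.duplicate_ids)
```

A check like this only finds the validity error; whether a given alt value is a useful text alternative or placeholder text still needs human review.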

Miscellaneous notes and references

HTML 4.01 section 19.1 Document Validation:

Beware that such validation, although useful and highly recommended, does not guarantee that a document fully conforms to the HTML 4 specification. This is because an SGML parser relies solely on the given SGML DTD which does not express all aspects of a valid HTML 4 document. Specifically, an SGML parser ensures that the syntax, the structure, the list of elements, and their attributes are valid. But for instance, it cannot catch errors such as setting the width attribute of an IMG element to an invalid value (i.e., "foo" or "12.5"). Although the specification restricts the value for this attribute to an "integer representing a length in pixels," the DTD only defines it to be CDATA, which actually allows any value. Only a specialized program could capture the complete specification of HTML 4.
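
The "specialized program" mentioned in the passage above could start with a value check like the one below. This is a minimal sketch under stated assumptions: it accepts either "nn" (pixels) or "nn%", following the comment on the Length entity in the HTML 4.01 DTD, whereas the quoted passage only mentions pixels. The names LENGTH, WidthCheck and bad_img_widths are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: digits, optionally followed by "%". The DTD itself declares
# the attribute as CDATA, so an SGML parser accepts any value here.
LENGTH = re.compile(r"\d+%?")


class WidthCheck(HTMLParser):
    """Collects img width values that are not a valid length."""

    def __init__(self):
        super().__init__()
        self.bad_widths = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        for name, value in attrs:
            # value is None when the attribute is written without a value.
            if name == "width" and not LENGTH.fullmatch(value or ""):
                self.bad_widths.append(value)


def bad_img_widths(markup):
    """Return the invalid width values found on img elements, in order."""
    checker = WidthCheck()
    checker.feed(markup)
    checker.close()
    return checker.bad_widths


print(bad_img_widths('<img width="foo" alt=""><img width="12.5" alt="">'
                     '<img width="120" alt=""><img width="50%" alt="">'))
```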

Notes from "A Gentle Introduction to SGML"

From these rules, it may be inferred that we do not need to mark the ends of stanzas or lines explicitly. From rule 2 it follows that we do not need to mark the end of the title---it is implied by the start of the first stanza. Similarly, from rules 3 and 1 it follows that we need not mark the end of the poem: since poems cannot occur within poems but must occur within anthologies, the end of a poem is implied by the start of the next poem, or by the end of the anthology. Applying these simplifications, we could mark up the same poem as follows:...

From section 4, SGML Structures. End tags do not always need to be provided.

The second part of the declaration specifies what are called minimization rules for the element concerned. These rules determine whether or not start- and end-tags must be present in every occurrence of the element concerned. They take the form of a pair of characters, separated by white space, the first of which relates to the start-tag, and the second to the end-tag. In either case, either a hyphen or a letter O (for "omissible" or "optional") must be given; the hyphen indicating that the tag must be present, and the letter O that it may be omitted. Thus, in this example, every element except <line> must have a start-tag. Only the <poem> and <anthology> elements must have end-tags as well.

From section 5, Defining SGML Document Structures: The DTD. Sometimes even start tags do not need to be provided? It depends on the DTD.
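
To make the minimization flags concrete, the sketch below reads the pair of characters out of an element declaration. It is illustrative only: the declarations in DECLARATIONS are reconstructed from the description in the quoted passage rather than copied from the source, and the names DECLARATION, minimization and DECLARATIONS are hypothetical. The pattern handles only the simple one-element form shown, not name groups or inclusion/exclusion exceptions.

```python
import re

# <!ELEMENT name start-flag end-flag content-model>
DECLARATION = re.compile(
    r"<!ELEMENT\s+(?P<name>\S+)\s+(?P<start>[-O])\s+(?P<end>[-O])\s+(?P<model>.+?)\s*>",
    re.IGNORECASE,
)


def minimization(declaration):
    """Return which tags an element declaration requires."""
    match = DECLARATION.fullmatch(declaration.strip())
    if match is None:
        raise ValueError(f"unsupported element declaration: {declaration!r}")
    return {
        "name": match["name"],
        # A hyphen means the tag must be present; O means it may be omitted.
        "start_tag_required": match["start"] == "-",
        "end_tag_required": match["end"] == "-",
        "content_model": match["model"],
    }


# Assumed declarations, consistent with the quoted text: only <line> may
# omit its start-tag; only <anthology> and <poem> require end-tags.
DECLARATIONS = [
    "<!ELEMENT anthology - - (poem+)>",
    "<!ELEMENT poem - - (title?, stanza+)>",
    "<!ELEMENT title - O (#PCDATA) >",
    "<!ELEMENT stanza - O (line+) >",
    "<!ELEMENT line O O (#PCDATA) >",
]

for text in DECLARATIONS:
    print(minimization(text))
```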

Some HTML element types allow authors to omit end tags (e.g., the P and LI element types). A few element types also allow the start tags to be omitted; for example, HEAD and BODY. The HTML DTD indicates for each element type whether the start tag and end tag are required.

Some HTML element types have no content. For example, the line break element BR has no content; its only role is to terminate a line of text. Such empty elements never have end tags. The document type definition and the text of the specification indicate whether an element type is empty (has no content) or, if it can have content, what is considered legal content.

From HTML 4.01, section 3.2, SGML constructs used in HTML. I do not believe this violates SGML, since the previous two SGML examples show that elements can be defined such that the end tag is optional.
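
The consequence for validity checking is that a missing end tag is an error only for element types whose end tag is required. A minimal sketch of that lookup follows. The table covers only the element types named in the quoted passage, plus A, which is added here as an assumption taken from the HTML 4.01 DTD (both tags required). The names Tag, TAG_RULES and missing_end_tag_is_error are hypothetical.

```python
from enum import Enum


class Tag(Enum):
    REQUIRED = "required"
    OPTIONAL = "optional"
    FORBIDDEN = "forbidden"


# element type -> (start tag, end tag)
TAG_RULES = {
    "p": (Tag.REQUIRED, Tag.OPTIONAL),
    "li": (Tag.REQUIRED, Tag.OPTIONAL),
    "head": (Tag.OPTIONAL, Tag.OPTIONAL),
    "body": (Tag.OPTIONAL, Tag.OPTIONAL),
    "br": (Tag.REQUIRED, Tag.FORBIDDEN),  # empty element: never has an end tag
    "a": (Tag.REQUIRED, Tag.REQUIRED),    # assumption from the DTD, not the quote
}


def missing_end_tag_is_error(element):
    """True only when the DTD requires an end tag for this element type."""
    _start, end = TAG_RULES[element.lower()]
    return end is Tag.REQUIRED


for name in ("P", "BR", "A"):
    print(name, missing_end_tag_is_error(name))
```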

SGML provides a way to create two hierarchies from the same data; however, different DTDs and "namespaces" must be adhered to. Example:

<(anthology)anthology>
<(p.anth)p.anth>
<(p.anth)page>
<!-- other titles and lines on this page here -->
<(anthology)poem><title>The SICK ROSE
<(anthology)stanza>
<line>O Rose thou art sick.
<line>The invisible worm,
</(p.anth)page>
<(p.anth)page>
<line>That flies in the night
<line>In the howling storm:
<(anthology)stanza>
<line>Has found out thy bed
<line>Of crimson joy:
<line>And his dark secret love
<line>Does thy life destroy.
</(anthology)poem>
<!-- rest of material on this page here -->
</(p.anth)page>
</(p.anth)p.anth>
</(anthology)anthology>

From section 6, Complicating the Issue: More on Element Declarations. Does this violate the concept of well-formedness in XML? How does this relate to "overlapping"? More information is in Chapter 10, Multiple Document Structures (SUBDOC, CONCUR and LINK), of "Web SGML and HTML 4.0 Explained".
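
One way to explore the overlap question is to flatten the two hierarchies into a single stream of start and end tags and check it against a stack, which is what proper nesting in a single hierarchy requires. The sketch below is an illustration under assumptions: the event list is a hand-simplified version of the page/stanza portion of the example, with end tags written out explicitly, and the name find_overlaps is hypothetical.

```python
def find_overlaps(events):
    """Return (closed, innermost_open) pairs for end tags that break nesting.

    events is a sequence of ("open", name) or ("close", name) tuples.
    innermost_open is None when the end tag has no matching open element.
    """
    open_elements = []
    overlaps = []
    for kind, name in events:
        if kind == "open":
            open_elements.append(name)
        elif open_elements and open_elements[-1] == name:
            open_elements.pop()  # properly nested end tag
        elif name in open_elements:
            # The element is open, but something inside it is still open too.
            overlaps.append((name, open_elements[-1]))
            innermost = len(open_elements) - 1 - open_elements[::-1].index(name)
            del open_elements[innermost]
        else:
            overlaps.append((name, None))  # end tag for element not opened
    return overlaps


# Simplified from the example: the first page ends inside the first stanza,
# and that stanza ends inside the second page.
MERGED = [
    ("open", "page"), ("open", "stanza"),
    ("close", "page"), ("open", "page"),
    ("close", "stanza"), ("close", "page"),
]

print(find_overlaps(MERGED))
print(find_overlaps([("open", "a"), ("open", "b"), ("close", "b"), ("close", "a")]))
```

Read as a single hierarchy, the merged stream overlaps, so it would not be well-formed XML; in the SGML example each tag is qualified with its document type, so each hierarchy can be checked separately.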

"Web SGML and HTML 4.0 Explained" by Martin Bryan


$Date: 2005/10/18 15:24:05 $