This document describes techniques to be used by tools (browsers, proxies, converters) that take HTML as input (incomplete or otherwise inaccessible: IMG with no ALT, FRAME with no TITLE, etc.) and try to come up with textual alternatives for all sorts of visuals lacking a native description.
See the WAI/ER/WG page for a recent discussion of the implementation of these techniques in the Altifier.
It has been discussed and completed based on comments received on the WAI ER IG list.
Daniel Dardailler Last updated: October 6, 1998
Visual information is taken here in the broad sense:
Visuals left aside for now: APPLET, TABLE, Structure (Hx/LI/Color)
For each case, the first line presents the markup situation, followed by an ordered list of candidate techniques for getting to the textual equivalent.
Note that this doesn't mandate any particular UI and in many cases it would be good to present the human author with several alternatives.
<A HREF=url><IMG SRC=url></A>
<MAP> <AREA HREF=url> ...
<A HREF=url> "click here" </A>
<INPUT SRC=url> no ALT
Sometimes elements such as IMG or FRAME carry a NAME or ID attribute whose value is a human-understandable string, like "west" or "letter". A tool could apply an algorithm (roughly the inverse of the one used to validate passwords) to decide whether the string is suitable to present to a user, and in which order to present candidates (e.g. "#id9808" or "ASD-987-000X" would have low priority).
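As an illustration only, such a ranking could be sketched as below. The function name `label_priority`, the weights, and the thresholds are all assumptions chosen for the example, not part of any tool described here.

```python
import re

def label_priority(value: str) -> float:
    """Return a score in [0, 1]; higher means more likely human-readable.

    Hypothetical heuristic: the weights below are illustrative guesses.
    """
    if not value:
        return 0.0
    # Share of alphabetic characters: "west" scores 1.0, "#id9808" scores low.
    letters = sum(c.isalpha() for c in value)
    score = letters / len(value)
    # A run of three or more digits suggests a generated identifier.
    if re.search(r"\d{3,}", value):
        score -= 0.5
    # Alphabetic runs with no vowel at all are rarely natural-language words.
    words = re.findall(r"[A-Za-z]+", value)
    if words and not any(re.search(r"[aeiouyAEIOUY]", w) for w in words):
        score -= 0.3
    return max(0.0, min(1.0, score))

def rank_labels(candidates):
    """Order candidate NAME/ID values from most to least presentable."""
    return sorted(candidates, key=label_priority, reverse=True)
```

A tool might present only candidates above some cutoff, and fall back to other techniques when none qualifies.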
Some image file formats (like PNG or GIF) can carry human-readable comments that are meant as metadata about the image, such as a description. A tool could fetch that part either by doing a byte-range request, if the size of the comment section is known (which depends on the format), or by getting the full image and extracting the comments on the tool side.
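For the second option (extracting comments on the tool side), a minimal sketch for PNG could look like the following. It reads only uncompressed `tEXt` chunks and ignores `zTXt`, `iTXt`, and GIF comment extensions; the names `png_text_chunks` and `make_chunk` are made up for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_text_chunks(data: bytes) -> dict:
    """Collect keyword/text pairs from the tEXt chunks of a PNG byte stream."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG stream")
    found = {}
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt" and b"\x00" in body:
            # tEXt holds a Latin-1 keyword, a NUL separator, then Latin-1 text.
            key, _, text = body.partition(b"\x00")
            found[key.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length
    return found

def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one chunk; used here only to fabricate demo input."""
    crc = zlib.crc32(ctype + body)
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

# Demo input: not a displayable image, just enough structure for the parser.
sample = (PNG_SIGNATURE
          + make_chunk(b"tEXt", b"Description\x00A red mailbox")
          + make_chunk(b"IEND", b""))
comments = png_text_chunks(sample)
```

The sketch does not verify CRCs; a real tool would probably want to, so that a corrupted comment is not offered as a description.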
Because it makes it easier to retrieve them on their local system, people often create and save images with meaningful names, like "mailbox.gif" or "Swimming Pool". A tool could extract the filename part of the URL indicated in the HREF (fax.gif in http://www.foo.com/People/fax.gif) and present that to the user.
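The extraction step can be sketched with the standard library alone. The helper name `filename_hint` and the choice to split on separators and camelCase boundaries are assumptions made for this example.

```python
import re
from posixpath import basename, splitext
from urllib.parse import unquote, urlsplit

def filename_hint(url: str) -> str:
    """Turn the last path segment of a URL into a lowercase word string."""
    # Drop scheme, host, and query, then undo %-escapes such as %20.
    path = unquote(urlsplit(url).path)
    stem, _extension = splitext(basename(path))
    # Insert a space at lowercase-to-uppercase boundaries ("mailBox").
    stem = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", stem)
    # Treat whitespace, underscores, hyphens, and dots as word separators.
    words = re.split(r"[\s_\-.]+", stem)
    return " ".join(w for w in words if w).lower()
```

For the URL quoted above, `filename_hint("http://www.foo.com/People/fax.gif")` yields `fax`, which the tool could then offer to the user as a candidate.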
A lot of images used as decoration have specific sizes, like 2x400 (a colored line), 1x1 (spacer), less than 15x15 (probably a bullet of some sort). A tool can try to recognize these (in the absence of descriptive text of course) and attach decorative ALT.
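A rough sketch of such a size test follows. The 1x1 and 15x15 limits come from the text above; the rule used for lines (one side at most 2 pixels, the other at least 50) and the function name `decorative_guess` are assumptions.

```python
from typing import Optional

def decorative_guess(width: int, height: int) -> Optional[str]:
    """Guess a decorative role from pixel dimensions, or None if unsure."""
    if width <= 0 or height <= 0:
        return None
    if width == 1 and height == 1:
        return "spacer"
    # Very thin and long in either direction, e.g. 2x400: a rule or line.
    # The 2 and 50 thresholds are illustrative assumptions.
    if min(width, height) <= 2 and max(width, height) >= 50:
        return "line"
    if width < 15 and height < 15:
        return "bullet"
    return None
```

The returned label is only a hint; the tool would still apply it only in the absence of descriptive text.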
Some images have a repeated pattern that is easy to recognize: the same pixel everywhere (monocolor image), or N pixels of one color followed by M pixels of another color in the horizontal or vertical direction.
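Assuming the image has already been decoded into rows of pixel values (decoding itself is out of scope here), these patterns can be detected with a few comparisons. The function name `simple_pattern` and the returned labels are hypothetical.

```python
from typing import Optional, Sequence

def simple_pattern(pixels: Sequence[Sequence]) -> Optional[str]:
    """Classify a decoded image given as a list of rows of pixel values."""
    rows = [tuple(row) for row in pixels]
    if not rows or not rows[0]:
        return None
    colors = {p for row in rows for p in row}
    if len(colors) == 1:
        return "monocolor"
    # Every row identical: the color changes only along the horizontal axis.
    if all(row == rows[0] for row in rows):
        return "vertical stripes"
    # Every row uniform: the color changes only along the vertical axis.
    if all(len(set(row)) == 1 for row in rows):
        return "horizontal stripes"
    return None
```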
Optical Character Recognition has made a lot of progress in the past few years. There are small devices you can buy in a drugstore that look like a pencil and can read (and translate) text on paper. The same goes for faxes. I think the technology is available to take an image and determine, first, whether there is some text in it (this information alone is important), in whatever direction, and second, what this text is. The same technique can be used for client-side image maps, on the sub-images defined by the COORDS/SHAPE attributes.
Whenever an image is used as an anchor, or there is an image map, a frame without a label, or a meaningless anchor, one just needs to follow the pointer to the document on the other side of the HREF to get metadata about it and present that to the user so that s/he can decide where to go next. For an HTML document, this metadata should be the TITLE, either returned as an HTTP header or obtained with a byte-range request on the HTML source (in the HEAD). For other formats, such as images, PDF, or Word documents, the tool can determine metadata in an ad hoc fashion: size, type, name.
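Once the first bytes of the target document have been fetched (the fetching itself is left out here), pulling out the TITLE could be sketched as follows. `TitleGrabber` and `extract_title` are names invented for the example, built on Python's standard `html.parser`.

```python
from html.parser import HTMLParser
from typing import Optional

class TitleGrabber(HTMLParser):
    """Collect the text of the first TITLE element encountered."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> and <title> both match.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_title(html_prefix: str) -> Optional[str]:
    """Return the whitespace-normalized TITLE text, or None if absent."""
    parser = TitleGrabber()
    parser.feed(html_prefix)
    title = " ".join("".join(parser.parts).split())
    return title or None
```

Because the TITLE sits in the HEAD, feeding only a prefix of the document is usually enough, which is what makes the byte-range approach attractive.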
When a link has only "Click Here" as content, one usually looks around to find out what "Here" refers to, so a tool could do some minimal analysis of the context and the sentence to find out more, like "Click Here for *ThisAndThat*", or "To get the new *BLAH*, Click Here".
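The two sentence shapes quoted above can be matched with simple regular expressions. This is a deliberately naive sketch: the function name `link_context`, the verb list, and the assumption that the tool already has the text before and after the link are all illustrative.

```python
import re
from typing import Optional

GENERIC_LINK = re.compile(r"^\s*click\s+here\s*$", re.IGNORECASE)

def link_context(before: str, link_text: str, after: str) -> Optional[str]:
    """Suggest a label for a link from the text surrounding it."""
    if not GENERIC_LINK.match(link_text):
        return link_text.strip()
    # Shape 1: "Click Here for <something>" or "Click Here to <something>".
    m = re.match(r"\s*(?:for|to)\s+([^.!?]+)", after, re.IGNORECASE)
    if m:
        return m.group(1).strip()
    # Shape 2: "To get <something>, Click Here" (verb list is an assumption).
    m = re.search(r"\bto\s+(?:get|see|read|download)\s+([^.!?,]+),?\s*$",
                  before, re.IGNORECASE)
    if m:
        return m.group(1).strip()
    return None
```

When neither shape matches, the function returns `None` and the tool can fall back to another technique, such as fetching the TITLE of the target.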
This one is far-fetched but can be done. The goal is to find all the possible links in a server-side image map, and where they go. A tool can emulate a series of clicks by generating a loop of requests to the server, increasing the X and Y coordinates in each request, 10 by 10 for instance.
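A sketch of that loop is given below. It assumes the usual ISMAP convention of appending `?x,y` to the map URL, and it takes the network step as a caller-supplied `resolve` function (hypothetical) that returns the target for one probe URL, or `None` when the click leads nowhere.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Point = Tuple[int, int]

def probe_points(width: int, height: int, step: int = 10) -> Iterator[Point]:
    """Yield click coordinates on a grid covering the image."""
    for y in range(0, height, step):
        for x in range(0, width, step):
            yield x, y

def discover_targets(map_url: str, width: int, height: int,
                     resolve: Callable[[str], Optional[str]],
                     step: int = 10) -> Dict[str, List[Point]]:
    """Map each distinct target to the probe points that led to it."""
    targets: Dict[str, List[Point]] = {}
    for x, y in probe_points(width, height, step):
        # One simulated click: the coordinates go in the query string.
        target = resolve(f"{map_url}?{x},{y}")
        if target is not None:
            targets.setdefault(target, []).append((x, y))
    return targets

# Stand-in for the server: left half goes one way, right half the other.
def fake_resolve(url: str) -> Optional[str]:
    x = int(url.split("?")[1].split(",")[0])
    return "/left.html" if x < 20 else "/right.html"

found = discover_targets("http://www.foo.com/cgi-bin/nav.map", 40, 20, fake_resolve)
```

A 10-pixel step keeps the number of requests manageable but can miss regions smaller than the step, so the step size is a trade-off between server load and completeness.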
There are more heuristics one can apply to identify images with no proper description. For instance, if an image is small and is repeated on multiple consecutive lines, it is probably a bullet graphic. If there is a centered graphic in the first couple of lines, there is a good chance it is a masthead.
If an image is repeated in the same relative location on multiple pages, it probably is a logo.
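Two of these heuristics (bullets and logos) lend themselves to a short sketch. It assumes the tool has already reduced each page to simple lists of image URLs, and it approximates "same relative location" by the image's index in document order. All names and thresholds are illustrative.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set

def likely_bullets(line_images: Sequence[Optional[str]],
                   min_run: int = 3) -> Set[str]:
    """Find images that start at least min_run consecutive lines.

    line_images holds, per line, the URL of a small leading image or None.
    """
    bullets: Set[str] = set()
    run_src, run_len = None, 0
    for src in line_images:
        if src is not None and src == run_src:
            run_len += 1
        else:
            run_src, run_len = src, (1 if src is not None else 0)
        if run_len >= min_run:
            bullets.add(run_src)
    return bullets

def likely_logos(pages: Sequence[Sequence[str]],
                 min_pages: int = 2) -> List[str]:
    """Find images that occupy the same position on at least min_pages pages.

    Each page is the list of its image URLs in document order.
    """
    seen: Counter = Counter()
    for images in pages:
        for position, src in enumerate(images):
            seen[(position, src)] += 1
    return sorted({src for (_pos, src), n in seen.items() if n >= min_pages})
```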