Textual Equivalents

This document describes techniques for tools (browsers, proxies, converters) that take HTML as input (incomplete or otherwise inaccessible: IMG with no ALT, FRAME with no TITLE, etc.) and try to come up with textual alternatives for all sorts of visuals lacking their native description.

See the WAI/ER/WG page for a recent discussion of the implementation of these techniques in the Altifier.

It has been discussed and completed based on comments received on the WAI ER IG list.

Daniel Dardailler
Last updated: October 6, 1998

Visual information is taken here in the broad sense:

Visuals left aside for now: APPLET, TABLE, Structure (Hx/LI/Color)

Multiple cases to consider

For each case, the first line presents the markup situation, followed by an ordered list of candidate techniques for getting to the textual equivalent.

Note that this doesn't mandate any particular UI and in many cases it would be good to present the human author with several alternatives.

  1. simple image

    <IMG SRC=url>

  2. image used as link

    <A HREF=url><IMG SRC=url></A>

  3. client side imagemap

    <MAP> <AREA HREF=url> ...

  4. frameset

    <FRAME SRC=url>

  5. link text

    <A HREF=url> "click here" </A>

  6. input field image

    <INPUT SRC=url> no ALT

  7. server side imagemap

    <IMG ismap>
    <INPUT type=image>

    Details on techniques

    Element Name

    Sometimes elements such as IMG or FRAME carry a NAME or ID attribute whose value is a human-understandable string, like "west" or "letter". A tool could use an algorithm (a sort of inverse of the algorithm used to validate passwords) to decide whether the string is OK to present to a user, and in which order to present candidates (e.g. "#id9808" or "ASD-987-000X" have low priority).
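The ranking idea above can be sketched as a small scoring function. This is only an illustration: the function name `name_priority`, the three tests it applies, and the 0.8 threshold are assumptions made for this example, not part of any specification.

```python
import re

def name_priority(name: str) -> int:
    """Return a rough score: higher means more likely to be human-readable.

    All thresholds here are illustrative assumptions.
    """
    if not name:
        return 0
    letters = sum(c.isalpha() for c in name)
    digits = sum(c.isdigit() for c in name)
    score = 0
    # Names made mostly of letters tend to read like words.
    if letters / len(name) >= 0.8:
        score += 2
    # Digits suggest a generated identifier.
    if digits == 0:
        score += 1
    # Every run of letters containing a vowel suggests pronounceable words.
    runs = re.findall(r"[A-Za-z]+", name)
    if runs and all(re.search(r"[aeiouyAEIOUY]", run) for run in runs):
        score += 1
    return score

candidates = ["#id9808", "west", "ASD-987-000X", "letter"]
ranked = sorted(candidates, key=name_priority, reverse=True)
```

A tool could present the candidates in `ranked` order and drop those that score below some cutoff.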

    Image Comments

    Some image file formats (like PNG or GIF) can carry human-readable comments that are supposedly Metadata about the image, like a description. A tool could fetch that part either by doing a byte-range request, if the location and size of the comment section are known (which depends on the format), or by getting the full image and extracting the comments on the tool side.
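For the "extract on the tool side" option, a minimal sketch for PNG follows. It walks the chunk list and collects uncompressed tEXt chunks (a keyword, a zero byte, then Latin-1 text). The helper `make_chunk` and the `demo` bytes exist only to build a test input and are not a complete PNG image; compressed (zTXt) chunks and GIF comment extensions are not handled here.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} for every tEXt chunk in a PNG byte string."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt" and b"\x00" in body:
            keyword, text = body.split(b"\x00", 1)
            found[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 8 + length + 4
    return found

def make_chunk(ctype: bytes, body: bytes) -> bytes:
    """Build one PNG chunk (hypothetical helper, used only for the demo)."""
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)

# Demo input: signature, one text chunk, end marker (no pixel data).
demo = (PNG_SIGNATURE
        + make_chunk(b"tEXt", b"Description\x00A red mailbox")
        + make_chunk(b"IEND", b""))
comments = png_text_chunks(demo)
```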


    Image File Name

    Because it makes it easier to retrieve them on their local system, people often create and save images with meaningful names, like "mailbox.gif" or "Swimming Pool". A tool could extract the filename part of the URL indicated in the SRC (fax.gif in http://www.foo.com/People/fax.gif) and present that to the user.
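Extracting the filename part can be sketched with the standard library alone. The function name `filename_hint` is hypothetical, and turning underscores, hyphens, plus signs and dots into spaces is an assumption about how authors tend to name files.

```python
import posixpath
import re
from urllib.parse import unquote, urlsplit

def filename_hint(url: str) -> str:
    """Return the file name from a URL, without extension, as plain words."""
    path = unquote(urlsplit(url).path)      # decode %20 and similar escapes
    base = posixpath.basename(path)         # "fax.gif"
    stem, _extension = posixpath.splitext(base)
    # Assumption: common separators stand for spaces between words.
    return re.sub(r"[_\-+.]+", " ", stem).strip()

hint = filename_hint("http://www.foo.com/People/fax.gif")
```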

    Image Size

    A lot of images used as decoration have specific sizes, like 2x400 (a colored line), 1x1 (spacer), less than 15x15 (probably a bullet of some sort). A tool can try to recognize these (in the absence of descriptive text of course) and attach decorative ALT.
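A sketch of the size test, using the sizes mentioned above. The function name, the returned labels, and the rule that a "line" is at most 2 pixels thick and at least 100 pixels long are assumptions chosen for illustration.

```python
from typing import Optional

def guess_decorative_role(width: int, height: int) -> Optional[str]:
    """Guess a decorative role from pixel dimensions, or None if unsure."""
    if width == 1 and height == 1:
        return "spacer"
    # Assumption: very thin and long means a rule or colored line.
    if min(width, height) <= 2 and max(width, height) >= 100:
        return "line"
    if width < 15 and height < 15:
        return "bullet"
    return None

role = guess_decorative_role(2, 400)
```

A tool would only consult this guess when no descriptive text is available, and could map each role to whatever decorative ALT it prefers.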

    Image Pattern

    Some images have a simple repeated pattern that is easy to recognize: the same pixel everywhere (monocolor image), or N pixels of one color followed by M pixels of another color in the horizontal or vertical direction.
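The pattern check can be sketched on an already decoded image. The sketch assumes the pixels are available as a non-empty list of rows of comparable values (decoding the file is out of scope), and the labels "horizontal bands" and "vertical bands" are names invented for this example.

```python
from typing import Optional, Sequence

def classify_pattern(pixels: Sequence[Sequence[int]]) -> Optional[str]:
    """Classify a decoded image as monocolor, banded, or neither."""
    # A row (or column) is uniform when it holds a single distinct value.
    rows_uniform = all(len(set(row)) == 1 for row in pixels)
    cols_uniform = all(len(set(col)) == 1 for col in zip(*pixels))
    if rows_uniform and cols_uniform:
        return "monocolor"
    if rows_uniform:
        return "horizontal bands"   # color changes only from row to row
    if cols_uniform:
        return "vertical bands"     # color changes only from column to column
    return None

pattern = classify_pattern([[0, 0, 0], [1, 1, 1]])
```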

    Image OCR

    Optical Character Recognition has made a lot of progress in the past few years. There are little devices you can buy in a drugstore that look like a pencil and can read (and translate) text on paper. The same goes for fax. I think the technology is available that could take an image and determine, first, whether there is some text in it (this information alone is important), in whatever direction, and second, what this text is. The same technique can be used for a Client Side Image Map, on the sub-images defined by the COORDS/SHAPE attributes.

    Target Document Title

    Whenever an image is used as an anchor, or there is an imagemap, a frame without a label, or a meaningless anchor, one just needs to follow the pointer to the document on the other side of the HREF to get Metadata about it and present that to the user so that s/he can decide where to go next. For an HTML document, this Metadata should be the TITLE, either returned as an HTTP header or fetched as a byte-range of the HTML source (in the HEAD). For other formats, like images, PDF, or Word documents, the tool can determine Metadata in an ad hoc fashion: size, type, name.
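Once the start of the target document has been fetched, pulling out the TITLE can be sketched with Python's standard `html.parser`. Fetching is left out on purpose; `extract_title` is a hypothetical name and takes the HTML text (or just its HEAD) as a string.

```python
from html.parser import HTMLParser
from typing import Optional

class TitleParser(HTMLParser):
    """Collect the text of the first TITLE element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> and <title> both match.
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)

def extract_title(html_text: str) -> Optional[str]:
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    # Collapse runs of whitespace and line breaks into single spaces.
    title = " ".join("".join(parser.parts).split())
    return title or None

title = extract_title("<HTML><HEAD><TITLE>Swimming  Pool\n Hours</TITLE></HEAD><BODY>")
```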

    Context Sentence

    When a link has only "Click Here" as content, one usually looks around to find out what "Here" refers to, so a tool could do some minimal analysis of the context and the sentence to find out more, like "Click Here for *ThisAndThat*", or "To get the new *BLAH*, Click Here".
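A very small sketch of such context analysis, covering only the two sentence shapes quoted above. The function name and the short list of trigger words ("for", "to get", "to see", "to read") are assumptions; real text would need far more patterns.

```python
import re
from typing import Optional

def context_for_link(sentence: str, link_text: str) -> Optional[str]:
    """Return the phrase a vague link seems to refer to, or None."""
    idx = sentence.lower().find(link_text.lower())
    if idx < 0:
        return None
    before = sentence[:idx]
    after = sentence[idx + len(link_text):]
    # Shape 1: "Click Here for X."
    m = re.match(r"\s*(?:for|to (?:get|see|read))\s+(.+?)[.!?]?\s*$", after, re.I)
    if m:
        return m.group(1)
    # Shape 2: "To get X, Click Here"
    m = re.match(r"\s*(?:to (?:get|see|read)|for)\s+(.+?),?\s*$", before, re.I)
    if m:
        return m.group(1)
    return None

found = context_for_link("To get the new BLAH, Click Here", "Click Here")
```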

    Server Side Image Map Loop

    This one is far-fetched but can be done. The goal is to find all the possible links in a server-side image map, and where they go. A tool can emulate a series of clicks by generating a loop of requests to the server, increasing the X and Y coordinates in each request, in steps of 10 for instance.
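The probing loop can be sketched as follows. It relies on the convention that a click on an ISMAP image inside a link is sent by appending "?x,y" to the link URL; coordinates from INPUT type=image are sent as form fields instead and are not covered. The network call is deliberately left out: `resolve` is a hypothetical callable supplied by the tool that returns the target URL for a probe (or None), and `fake_resolve` merely simulates a server for the demo.

```python
from typing import Callable, Iterator, List, Optional

def imagemap_probe_urls(map_url: str, width: int, height: int,
                        step: int = 10) -> Iterator[str]:
    """Yield one request URL per grid point, scanning row by row."""
    for y in range(0, height, step):
        for x in range(0, width, step):
            yield f"{map_url}?{x},{y}"

def discover_targets(map_url: str, width: int, height: int,
                     resolve: Callable[[str], Optional[str]],
                     step: int = 10) -> List[str]:
    """Probe the map and return the distinct targets in first-seen order."""
    targets: List[str] = []
    for probe in imagemap_probe_urls(map_url, width, height, step):
        target = resolve(probe)
        if target is not None and target not in targets:
            targets.append(target)
    return targets

def fake_resolve(url: str) -> str:
    """Simulated server: left half of the image goes to one page, right half to another."""
    x = int(url.split("?")[1].split(",")[0])
    return "/left.html" if x < 20 else "/right.html"

targets = discover_targets("http://www.foo.com/cgi-bin/map", 40, 10, fake_resolve)
```

A smaller step finds narrower regions at the cost of more requests.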

    What you get back is a list of documents, that the tool can then sort for uniqueness (might not be easy) and present to the user as a simple list.

    Image repetition or position

    There are more heuristics one can apply to identify images with no proper description. For instance, if an image is small and is repeated on multiple consecutive lines, it is probably a bullet graphic. If there is a centered graphic in the first couple of lines, there is a good chance it is a masthead.

    If an image is repeated in the same relative location on multiple pages, it probably is a logo.
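The consecutive-lines heuristic for bullets can be sketched like this. The input format (one entry per rendered line holding the SRC of its leading image, or None) and the minimum run of 3 are assumptions; the size check mentioned above would be applied separately.

```python
from itertools import groupby
from typing import List, Optional, Set

def likely_bullets(line_images: List[Optional[str]], min_run: int = 3) -> Set[str]:
    """Return image URLs that lead at least min_run consecutive lines."""
    found = set()
    # groupby clusters identical neighbours, so each group is one run.
    for src, run in groupby(line_images):
        if src is not None and len(list(run)) >= min_run:
            found.add(src)
    return found

lines = ["logo.gif", "dot.gif", "dot.gif", "dot.gif", None, "photo.gif"]
bullets = likely_bullets(lines)
```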

    URL repetition

    Another technique for image map AREA tags that don't have ALT text (or other URLs with no description of the target) would be to search for a text link with the same URL, and use that text link as ALT text for the AREA tag (optionally, the tool would then delete the text link at the bottom since it is then redundant).
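A sketch of the matching step, using Python's standard `html.parser`. It only reports which ALT text could be borrowed; rewriting the page and deleting the redundant text link are not shown. The class and function names are hypothetical, and URLs are compared as literal strings, which is an assumption (a real tool would probably resolve relative URLs first).

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect text links and AREA elements that lack ALT text."""

    def __init__(self):
        super().__init__()
        self.text_links = {}          # href -> first link text seen
        self.areas_missing_alt = []   # hrefs of AREA elements without ALT
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "area" and attrs.get("href") and not attrs.get("alt"):
            self.areas_missing_alt.append(attrs["href"])
        elif tag == "a" and attrs.get("href"):
            self._href, self._text = attrs["href"], []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            if text:
                self.text_links.setdefault(self._href, text)
            self._href = None

def alt_from_text_links(html_text: str) -> dict:
    """Map each ALT-less AREA href to the text of a link with the same URL."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return {href: collector.text_links[href]
            for href in collector.areas_missing_alt
            if href in collector.text_links}

page = ('<MAP NAME="m"><AREA HREF="news.html" COORDS="0,0,10,10">'
        '<AREA HREF="faq.html" ALT="FAQ"></MAP>'
        '<P><A HREF="news.html">Latest news</A> | <A HREF="faq.html">FAQ</A>')
suggestions = alt_from_text_links(page)
```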

    Hash code repetition

    Keep a record of all distinct image signatures processed by the tool. Collect text equivalents from other pages that use the same image and provide text. This should handle bullets and popular clip art.
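The record of signatures can be sketched as an in-memory table keyed by a digest of the image bytes. Using SHA-256 as the signature is an assumption made here (any stable hash would do), and it only matches byte-identical files; the class name `AltTextCache` is hypothetical.

```python
import hashlib
from collections import Counter, defaultdict
from typing import List

def signature(image_bytes: bytes) -> str:
    """Signature of an image: here, simply the SHA-256 of its bytes."""
    return hashlib.sha256(image_bytes).hexdigest()

class AltTextCache:
    """Remember which text equivalents were seen with which image."""

    def __init__(self):
        self._seen = defaultdict(Counter)   # digest -> Counter of ALT texts

    def record(self, image_bytes: bytes, alt_text: str) -> None:
        # Ignore empty or whitespace-only text: it describes nothing.
        if alt_text and alt_text.strip():
            self._seen[signature(image_bytes)][alt_text.strip()] += 1

    def suggest(self, image_bytes: bytes) -> List[str]:
        """Return known text equivalents, most frequently seen first."""
        counts = self._seen.get(signature(image_bytes))
        return [text for text, _count in counts.most_common()] if counts else []

cache = AltTextCache()
bullet = b"GIF89a-demo-bullet-bytes"   # stand-in for real image data
cache.record(bullet, "*")
cache.record(bullet, "bullet")
cache.record(bullet, "*")
proposals = cache.suggest(bullet)
```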