In the discussion about alt text for images, little if any data has been presented about how alt text is actually used in the wild. What data has been presented was collected automatically, and so says nothing about whether the alt text makes a suitable replacement for the image.
Questions To Address
- What fraction of images (or pages) use appropriate alt text?
- How does this fraction vary with characteristics of the image:
- The function of the image (e.g. icon vs photo)
- Whether the image is decorative or not
- How does this fraction vary with the characteristics of the page:
- The validity of the page
- The claimed validity of the page (whether it has a link to a validation service)
Other questions that it may be possible to address as a side effect:
- How people with different areas of expertise assess the appropriateness of the same alt text
The need to assess a large number of documents suggests that distributing the analysis will be necessary to get enough results to draw meaningful conclusions. This may itself introduce some problems, but without dedicated resources for the testing it would be hard to work under more controlled conditions.
In outline, the proposed approach is:
- Provide the tester with a (web) application with a two-pane display. In one pane, display a website chosen from a central list.
- In the second pane, display, one at a time, each unique (image, alt text) pair on the website. (In principle each image should be displayed regardless of whether it has been seen before, since it might appear in a different context; however, we need to deal with the possibility of a page using hundreds of spacer.gif images. Options include limiting the number of times an image can be repeated, or taking a random subset of the images on a page.)
- For each image displayed in the sidebar, hide the image in the main pane and replace it with its alt text. Highlight the alt text somehow so that the tester can see it in the context of the surrounding content. (Maybe CSS should be turned off in the main pane to make highlighting easier? Maybe images should always be off in the main pane? The latter would be a problem where some, but not all, of the images on a page had appropriate alt text, since the ones without appropriate alt text might make it difficult to understand the ones with.)
- Get the tester to answer some questions about the image and its alt text:
- What function does the image have (icon / banner / photograph / etc.)
- Is the image purely decorative (decorative / non-decorative)
- How well does the alt text replace the image (not at all / mediocre / good)
- Collect some automated data about each image
- Collect some automated data about the page
- Validity according to an external conformance checker
- Presence of any validate-this-page links or well-known conformance badges
- Performance in an automated accessibility checker?
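The per-page image collection step above could be sketched roughly as follows. This is an illustrative Python fragment using the standard library's HTML parser; the cap of 3 repeats per (src, alt) pair is an arbitrary placeholder standing in for whichever limiting strategy we choose:

```python
from collections import Counter
from html.parser import HTMLParser

class ImageCollector(HTMLParser):
    """Collect (src, alt) pairs from <img> tags, capping repeats of the
    same pair so that e.g. hundreds of spacer.gif instances yield only a
    few test items. The max_repeats value is an arbitrary placeholder."""

    def __init__(self, max_repeats=3):
        super().__init__()
        self.max_repeats = max_repeats
        self.seen = Counter()
        self.pairs = []  # (src, alt) in document order; alt is None if missing

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        # alt=None (attribute absent) is distinct from alt="" (empty string)
        pair = (attrs.get("src", ""), attrs.get("alt"))
        if self.seen[pair] < self.max_repeats:
            self.seen[pair] += 1
            self.pairs.append(pair)

def collect_images(html, max_repeats=3):
    parser = ImageCollector(max_repeats)
    parser.feed(html)
    return parser.pairs
```

Keeping the absent-alt case distinct from empty alt matters here, since alt="" is the conventional marker for a decorative image while a missing attribute is simply an authoring omission.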
Each tester will need to be uniquely identified in order to allow an assessment of their input (see the biases section below). In addition, by collecting a small amount of background information about each tester -- for example their main occupation (web developer / accessibility expert / etc.) and their familiarity with different classes of UA (text-only browser / aural browser / etc.) -- we will be able to detect systematic effects from people's differing ideas about what makes good alt text (although we should nevertheless try to provide clear instructions -- see the biases section below).
The client could be a web app or a native application. As a web app:
- Easy to get going with little effort -- makes attracting testers easier
- Harder to control things like whether images are displayed and/or whether CSS is on or not
- Harder to do things like cross-domain requests -- needs more server-side code
A desktop app or Firefox extension has the opposite problems. Leaning toward a simple FF extension, but discouraging participation is a big worry.
In either case we need something on the server side. It should be a simple database-backed site -- no problem in most frameworks. It just needs to accept e.g. JSON via POST and be able to hand out URLs. Handling authentication and user accounts is more difficult, which suggests that using something where that problem is already solved, rather than rolling something new, might be a better idea.
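A minimal sketch of such a server-side endpoint, using only Python's standard library. The in-memory SITES list and RESULTS store are stand-ins for the real database and central site list, and authentication is deliberately omitted:

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory stand-ins for the central site list and the database.
SITES = ["http://example.com/", "http://example.org/"]
RESULTS = []

class SurveyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Hand out the next site URL to test (round-robin, for the sketch).
        url = SITES[len(RESULTS) % len(SITES)]
        body = json.dumps({"url": url}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Accept a JSON blob of answers and store it.
        length = int(self.headers.get("Content-Length", 0))
        RESULTS.append(json.loads(self.rfile.read(length)))
        self.send_response(204)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the sketch quiet

def make_server():
    # Port 0 lets the OS pick a free port.
    return ThreadingHTTPServer(("127.0.0.1", 0), SurveyHandler)
```

Any framework with user accounts built in would replace essentially all of this; the point is only that the protocol surface is tiny: one URL-dispensing GET and one JSON-accepting POST.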
Possible Sources of Bias
Choice of Sites
This is a hard problem but we have to suck it up.
- Not possible to get a totally random sample of the web (and it's not clear that that's what we want anyway)
- Ideally want a large sample probably biased by page rank (since people are more likely to visit high page rank sites, although there is a long tail)
- Language is a problem. Start with English only and then, if successful, localize.
- Will inevitably have some bias, but this is not necessarily a problem as long as it is not too large and is well understood
- Breadth of sites is more important than depth within particular sites -- but we don't want to be biased entirely toward front pages
Possible strategy (needs refinement)
- Start with searches for common terms or individual letters (bias toward popular, high page rank sites)
- Take a random selection of the sites returned in the first n (e.g. 100) results
- For each site selected select up to 2 internal (same domain) and 2 external links
- When following the subsequent internal links, select only one internal link, and on that page select only external links
- These numbers could be tuned by looking at data on site depth -- we may want to go deeper but select only e.g. 1 front-page link
- People could also opt in websites by activating the plugin/app
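The sampling steps above might be sketched like this. The function names, the sample size k, and classifying links as internal by comparing netloc are all assumptions for illustration:

```python
import random
from urllib.parse import urlparse

def sample_sites(search_results, k=10, rng=None):
    """Take a random selection of sites from the first n search results."""
    rng = rng or random.Random()
    return rng.sample(search_results, min(k, len(search_results)))

def select_links(page_url, links, n_internal=2, n_external=2, rng=None):
    """From the links found on a page, pick up to 2 internal (same-domain)
    and 2 external links, per the strategy above."""
    rng = rng or random.Random()
    domain = urlparse(page_url).netloc
    internal = [l for l in links if urlparse(l).netloc == domain]
    external = [l for l in links if urlparse(l).netloc != domain]
    return (rng.sample(internal, min(n_internal, len(internal))),
            rng.sample(external, min(n_external, len(external))))
```

Passing in a seeded random.Random would make a given crawl reproducible, which would help when comparing different choices of the depth and link-count parameters.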
Different Expectations from Different Testers
Have a group of test pages that each tester is sent to at random during their first few sites. Check that their answers broadly match the expected answers for those pages.
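One way this calibration check could work, sketched in Python. The answer encoding (keyed by hypothetical (image_id, question) pairs) and the 0.7 agreement threshold are arbitrary placeholders:

```python
def calibration_score(answers, expected):
    """Fraction of a tester's answers on the calibration pages that match
    the expected answers. Both arguments map a hypothetical
    (image_id, question) key to a response string."""
    common = [k for k in expected if k in answers]
    if not common:
        return 0.0
    agree = sum(1 for k in common if answers[k] == expected[k])
    return agree / len(common)

def passes_calibration(answers, expected, threshold=0.7):
    """Flag testers whose agreement with the expected answers falls below
    a threshold (0.7 is an arbitrary placeholder)."""
    return calibration_score(answers, expected) >= threshold
```

Rather than discarding low-scoring testers outright, the score could also be kept alongside their submissions so that the analysis can down-weight or segment them later.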