W3C

Inaccessibility of Visually-Oriented Anti-Robot Tests

Problems and Alternatives

W3C Working Draft 5 November 2003

This version:
http://www.w3.org/TR/2003/WD-turingtest-20031105/
Latest version:
http://www.w3.org/TR/turingtest
Editor/Author:
Matt May, W3C

Abstract

A common method of limiting access to services made available over the Web is visual verification of a bitmapped image. This presents a major problem to users who are blind, have low vision, or have a learning disability such as dyslexia. This document examines a number of potential solutions that allow systems to test for human users while preserving access by users with disabilities.

Status of this document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a W3C Working Draft produced by the WAI Protocols and Formats Working Group. The Working Group intends to release this document as a W3C Note.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

The Protocols and Formats Working Group is part of the WAI Technical Activity.

Please send comments to the WAI XTech list. Messages to this list are archived publicly.


The problem

Web sites with resources that are attractive to aggregators (travel and event ticket sites, etc.) or other forms of automation (Web-based email and message boards) have taken measures to ensure that they can offer their service to individual users without having their content harvested or otherwise exploited by Web robots.

The most popular solution at present is the use of graphical representations of text in registration areas. The site attempts to verify that the user in question is in fact a human by requiring the user to read a distorted set of characters from a bitmapped image, then enter those characters into a form.

Researchers at Carnegie Mellon University have pioneered this method, which they have called CAPTCHA (Completely Automated Public Turing test to Tell Computers and Humans Apart) [CAPTCHA]. A Turing test [TURING], named after famed computer scientist Alan Turing, is any system of tests designed to differentiate a human from a computer. This type of visual verification comes at a huge price to users who are blind, visually impaired or dyslexic. Naturally, this image has no text equivalent accompanying it, as that would make it a giveaway to computerized systems. In many cases, these systems make it impossible for users with certain disabilities to create accounts or make purchases on these sites.

A false sense of security

It is important to note that, like seemingly every security system that has preceded it, this system can be defeated by those who benefit most from doing so. For example, spammers can pay a programmer to aggregate these images and feed them one by one to a human operator, who could easily verify hundreds of them each hour. The value of visual verification systems is low, and their usefulness will diminish rapidly once such exploits become common.

A hierarchy of needs

Sites implementing verification have very different needs, and they fall into a hierarchy. As the bar for authentication is raised, so is the risk that many users may be marginalized, and the damage that may cause.

Privilege

Most systems implement security in some form or another to preserve privileges for certain users. All but the most secure sites on the Web currently authenticate privileged users without any personal identification scheme that cannot be repudiated. We can open accounts on any number of email services, portals, newspapers, and message boards without providing any credentials of our own, such as a passport, driver's license or serial number. In these situations, the first priority may be to point users to the resources they may access; security itself may not take precedence until exploitable details such as credit card information are stored on a given site.

Humanity

Systems that offer attractive privileges are often exploited, particularly when users can do so anonymously. The ability to create several accounts to multiply a user's privileges is often the reason these Turing tests are put into place. It is generally accepted that human users interacting with sites cannot consume resources as quickly as programs designed to acquire and use free privileges. These sites wish to provide credentials to humans while eliminating robot access to the same resources.

Identity

Beyond humanity is unique human identity. A person's identity (including such details as nationality, property, or even personal features) needs to be established authoritatively in order to guarantee everything from secure and legal financial transactions, to the security of medical and legal information, to fair elections. All of these are becoming increasingly available online, including Web-based voting, which is undergoing trials in Sweden, Switzerland, France, the United Kingdom, Estonia, and the United States.

It is important to determine solutions for verifying unique identity in users, while balancing the needs of all potential users of such a system. The cost of failure ranges from inconvenience in privilege-based models to the denial of basic human rights in some identity-based systems.

Possible solutions

There are many techniques available to sites to discourage or eliminate fraudulent account creations or uses. Several of them may be as effective as the visual verification technique while being more accessible to people with disabilities. Others may be overlaid as an accommodation for the purposes of accessibility. Seven alternatives are listed below, with their individual pros and cons. Many are achievable today, while some hint at a near future that may render this need obsolete.

1: Logic puzzles

The goal of visual verification is to separate human from machine. One reasonable way to do this is to test for logic. Simple mathematical word puzzles, trivia, and the like may raise the bar for robots, at least to the point where it is more attractive to use them elsewhere.

Problems: Users with cognitive disabilities may still have trouble. Answers may need to be handled flexibly, if they require free-form text. A system would have to maintain a vast number of questions, or shift them around programmatically, in order to keep spiders from capturing them all. And this approach is also subject to defeat by human operators.
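As an illustration of the flexible answer handling mentioned above, the following is a minimal sketch, not a recommended implementation. It generates a simple arithmetic word puzzle and accepts free-form answers written either as digits or as spelled-out numbers. The function names (make_puzzle, normalize_answer, check_answer), the question wording, and the word table are all assumptions made for this example.

```python
import random
import re

# Hypothetical table for accepting spelled-out answers (covers 0 through 18,
# the full range of sums this sketch can produce).
_NUMBER_WORDS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
    "eleven": 11, "twelve": 12, "thirteen": 13, "fourteen": 14,
    "fifteen": 15, "sixteen": 16, "seventeen": 17, "eighteen": 18,
}


def make_puzzle(rng):
    """Return (question_text, expected_answer) for a small word puzzle."""
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    question = (
        f"If you have {a} apples and receive {b} more, "
        "how many apples do you have?"
    )
    return question, a + b


def normalize_answer(text):
    """Map free-form input such as ' Seven. ' or '7 apples' to an int.

    Returns None when no number can be recognized.
    """
    cleaned = text.strip().lower()
    digits = re.search(r"\d+", cleaned)
    if digits:
        return int(digits.group())
    for word in re.findall(r"[a-z]+", cleaned):
        if word in _NUMBER_WORDS:
            return _NUMBER_WORDS[word]
    return None


def check_answer(text, expected):
    """Return True when the normalized answer matches the expected value."""
    return normalize_answer(text) == expected
```

A sketch like this addresses only the flexibility problem. A fixed question template is trivially parsed by a robot, so a real deployment would still need a large or programmatically varied question pool, and it remains subject to defeat by human operators.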

2: Sound output

To reframe the problem, text is easy to manipulate, which is good for assistive technologies, but just as good for robots. So, a logical means of trying to solve this problem is to offer another non-textual method of using the same content. Hotmail serves a sound file that can be listened to if the visual verification is not suitable for the user.

However, according to a CNet article [NEWSCOM], Hotmail's sound output, which is itself distorted to avoid the same programmatic abuse, was unintelligible to all four test subjects, all of whom had "good hearing". Users who are deaf-blind, don't have or use a sound card, or don't have required sound plugins are likewise left in the lurch. Worse, some implementations of this approach are JavaScript-based, and designed in such a way that some blind users may not be able to access them.

3: Credit-card validation

A time-worn technique of testing for humans is requiring a credit card. More recently, a unique 3-digit ID has been attached to credit cards to further protect identity.

Unfortunately, this technique is full of problems. First, all those without credit or debit cards, which includes millions of adults in the United States alone, fail this test. Second, credit-card transactions cost money, and handling cards requires tight security. Third, many credit-card processing sites require matching billing addresses and names. And worst of all, these measures scare people away from registering with sites, both because of their severity and because of the lingering perception that online transactions are insecure.

4: Live operators

One way to supplement visual verification is through a link that allows users to state that they cannot read the image; site administrators then manually validate these users. Yahoo offers one such system, with a 24-hour turnaround on account requests.

Here, the problem arises that users with disabilities will not have access to new sites in a timely fashion, and have to depend on another human's help to complete a transaction. While one day may not seem like a lot in the grand scheme of things, imagine having to order tickets for a popular concert, and having the show sell out while waiting for that validation. Time-sensitivity is a factor to be taken into account, as is the cost to maintain the personnel. (Note also that some ticket sites have Web-only prices or allotments, so having a toll-free telephone number may not help.)

5: Limited-use accounts

Users of free accounts very rarely need full and immediate access to a site's resources. For example, users who are searching for concert tickets may need to conduct only three searches a day, and new email users may only need to send a canned notification of their new address to their friends, and a few other free-form messages. Sites may create policies that limit the frequency of interaction explicitly (that is, by disabling an account for the rest of the day) or implicitly (by slowing the response times incrementally). Creating limits for new users can be an effective means of making high-value sites unattractive targets to robots.

The drawbacks to this approach include having to take a trial-and-error approach to determine a useful technique. It requires site designers to look at statistics of normal and exceptional users, and determine whether a bright line exists between them.
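The explicit and implicit limits described above can be sketched as follows. This is a minimal, in-memory illustration under stated assumptions: the class name AccountLimiter, the default cap of three requests a day (taken from the ticket-search example), and the half-second delay step are all hypothetical, and a real site would derive its thresholds from its own usage statistics.

```python
from dataclasses import dataclass, field

SECONDS_PER_DAY = 86_400


@dataclass
class AccountLimiter:
    """Per-account limiter combining both kinds of policy.

    Explicit limit: once daily_limit requests have been made, further
    requests are refused for the rest of the day.
    Implicit limit: each request allowed earlier in the day adds
    delay_step seconds to the response time of the next one.
    """

    daily_limit: int = 3      # assumed value, for illustration only
    delay_step: float = 0.5   # assumed value, in seconds
    _day: int = field(default=-1, init=False, repr=False)
    _count: int = field(default=0, init=False, repr=False)

    def request(self, now):
        """Return (allowed, delay_seconds) for a request at time `now`.

        `now` is a timestamp in seconds; days are counted from zero.
        """
        day = int(now // SECONDS_PER_DAY)
        if day != self._day:
            # A new day has started, so the counter resets.
            self._day = day
            self._count = 0
        if self._count >= self.daily_limit:
            return False, 0.0
        delay = self._count * self.delay_step
        self._count += 1
        return True, delay
```

The caller is expected to sleep for the returned delay before responding. Keeping the two policies in one object is a design choice of this sketch; a site could equally apply only one of them.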

6: Heuristic checks

Heuristics are discoveries in a process that seem to indicate a given result. It may be possible to detect the presence of a robotic user based on the volume of data the user requests, series of common pages visited, IP addresses, data entry methods, or other signature data that can be collected.

Again, this requires a good look at the data of a site. If pattern-matching algorithms can't find good heuristics, then this is not a good solution. Also, polymorphism, or the creation of changing footprints, is apt to appear in robots, if it hasn't already, just as polymorphic ("stealth") viruses appeared to get around virus checkers looking for known viral footprints.
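A heuristic check of the kind described in this section might be sketched as below. The three signals (request volume, evenly spaced requests as a stand-in for data entry method, and a known series of pages) are drawn from the list above, but the Session type, the robot_score function, and every threshold are assumptions chosen for illustration. They are not measured values, and a polymorphic robot could evade each of them.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Session:
    """Signature data collected for one visitor (hypothetical structure)."""
    request_times: List[float]  # timestamps in seconds, ascending
    pages: List[str]            # paths in the order they were visited


def _contains_run(pages, pattern):
    """Return True if `pattern` occurs as a contiguous run within `pages`."""
    n = len(pattern)
    return n > 0 and any(
        pages[i:i + n] == pattern for i in range(len(pages) - n + 1)
    )


def robot_score(
    session: Session,
    max_requests_per_minute: float = 30.0,          # assumed threshold
    known_crawl_paths: Sequence[Tuple[str, ...]] = (),
) -> int:
    """Count how many of three simple heuristics the session trips (0 to 3)."""
    score = 0
    times = session.request_times
    if len(times) >= 2:
        # Heuristic 1: volume of requests per minute.
        span = max(times[-1] - times[0], 1.0)
        if len(times) / span * 60.0 > max_requests_per_minute:
            score += 1
        # Heuristic 2: near-identical gaps between requests suggest a script.
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        if len(gaps) >= 3 and max(gaps) - min(gaps) < 0.05:
            score += 1
    # Heuristic 3: a series of pages previously associated with robots.
    for path in known_crawl_paths:
        if _contains_run(session.pages, list(path)):
            score += 1
            break
    return score
```

A site would then decide what score, if any, warrants further verification. Where no clear separation between human and robot scores emerges from real traffic, this approach is not a good fit.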

7: Federated identity systems

Competing efforts by Microsoft and the Liberty Alliance are attempting to establish a "federated network identity" system, which can allow a user to create an account, set his or her preferences, payment data, etc., and have that data persist across all sites that use the same service. This sort of system, which is making inroads in both Web sites and Web Services, would allow a portable form of identification across the Web.

7.1: Single sign-on

Ironically enough, Microsoft's Passport system is itself one of the services that currently use visual verification techniques. These single sign-on services will have to be among the most accessible on the Web in order to offer these benefits to people with disabilities. Additionally, use of these services will need to be ubiquitous to truly solve the problems addressed here once and for all.

7.2: Public-key infrastructure solutions

A central authority could issue a set of certificates to individuals who wish to verify their identity. The certificate could be issued in such a way as to ensure something close to a one-person-one-vote system, by issuing these identifiers, for example, in person. The work and risk of creating fraudulent certificates would be so onerous as to repel all but the most severe schemes from circumventing them.

This is a large amount of work, which would need to be coordinated by one or a small number of certifying authorities in order to be done effectively. Sites would need to agree on this standard and implement it in their registration systems, either by itself or as an adjunct to existing systems.

A subset of this concept, in which only people with disabilities who are affected by other verification systems would register, raises a privacy concern in that the user would need to telegraph to every site that she has a disability. The stigma of users with disabilities having to register themselves to receive the same services should be avoided. With that said, there are a few instances in which users may want to inform sites of their disabilities or other needs: sites such as Bookshare [BOOKSHARE] require evidence of a visual disability in order to allow users to access printed materials which are often unavailable in audio or Braille form. An American copyright provision known as the Chafee Amendment [CHAFEE] allows copyrighted materials to be reproduced in forms that are only usable by blind and visually impaired users. A public-key infrastructure system would allow Bookshare's maintainers to ensure that the site and its users are in compliance with copyright law.

7.3: Biometrics

On the horizon, a more foolproof method of user verification is being formulated in the field of biometric technology. A host of tests, from fingerprint and retinal scanning to DNA matching, promise to check a person's identity authoritatively -- effectively limiting the ability of, say, spammers to create infinite email accounts. Microsoft has announced a new biometric system in its Longhorn operating system, complete with a new, secure connector to capture this data. Biometrics will very likely be used in conjunction with single sign-on services.

Again, the weakness here is based on infrastructure. It will take several years for biometric hardware to penetrate the market, and some political and social issues exist which may hold back the process. Biometric systems will also have to take into account the fact that not all people have the same physical features: for example, retinal scanning does not work for a user who was born without eyes.

Conclusion

Visual verification alone is known to create problems for users. It is imperative that site designers take the needs of users with disabilities into account, and it is likewise hoped that one or more of these potential solutions can make that process easier.


Acknowledgments

Thanks to the following contributors: Al Gilman, Charles McCathieNevile, David Pawson, David Poehlman, Janina Sajka, and Jason White.

References

[BOOKSHARE]
Bookshare.org home page. The site is online at http://www.bookshare.org
[CHAFEE]
17 USC 121, Limitations on exclusive rights: reproduction for blind or other people with disabilities (also known as the Chafee Amendment): This amendment is online at http://www.loc.gov/copyright/title17/92chap1.html
[NEWSCOM]
Spam-bot tests flunk the blind, Paul Festa. News.com, 2 July 2003. This article is online at http://news.com.com/2100-1032-1022814.html
[CAPTCHA]
The CAPTCHA Project, Carnegie Mellon University. The project is online at http://www.captcha.net
[TURING]
The Turing Test, The Alan Turing Internet Scrapbook, 2002. The document is online at http://www.turing.org.uk/turing/scrapbook/test.html