Shadi Abou-Zahra - W3C, Web Accessibility Initiative
2004, Route des Lucioles BP93 - 06902 Sophia-Antipolis, France
shadi@w3.org - http://www.w3.org/People/shadi/
Evaluating Web sites for accessibility is a quality assurance process that becomes increasingly difficult to manage as the size and complexity of a Web site increase. There is a growing need to effectively manage and monitor the accessibility of Web sites throughout the development process. The Evaluation and Report Language (EARL) is a semantic Web vocabulary and a royalty-free W3C format for expressing test results. While EARL can be used to support generic Web quality assurance processes, it has been developed specifically to assist Web accessibility evaluation reviews. EARL facilitates the exchange of test results between development and quality assurance tools. While much development effort has already gone into EARL and into tools that support the EARL format, considerable work and research remain to be addressed.
While the markup code of Web sites can generally be validated automatically against formal grammars, accessibility features are mostly evaluated with manual input from human reviewers. For example, it is generally not possible to automatically verify the validity of a textual description for an image(1). Depending on the size and complexity of a Web site, as well as on the type and thoroughness of the evaluation review, a considerable amount of time and resources may be required to determine the level of accessibility. Detailed accessibility evaluations of Web sites can also generate significant amounts of test results.
The challenge for many Web site owners is to employ quality assurance processes for Web accessibility that will improve the efficiency of evaluation reviews as well as enhance the management of issues. This paper describes how the Evaluation and Report Language (EARL) [1], a common format for expressing test results, supports such quality assurance processes. This paper will highlight some of the key features of EARL that facilitate monitoring and managing the accessibility of Web sites.
There are many types of Web accessibility evaluation reviews, ranging from less formal preliminary reviews to more comprehensive approaches that include technical and user testing of accessibility features. The W3C/WAI resource suite "Evaluating Web Sites for Accessibility" [2] describes some of these approaches in more detail. This paper focuses on reviews that evaluate the conformance of Web sites to accessibility standards such as the Web Content Accessibility Guidelines (WCAG) 1.0 [3]. Other examples of Web accessibility standards(2) include the requirements defined by Section 508 in the USA, BITV in Germany, and JIS in Japan.
While currently existing standards for Web accessibility may differ significantly, most share a common anatomy(3). Basically, it can be assumed that there is a target set of criteria against which a Web site is evaluated. Conformance to each criterion is determined by conducting a series of atomic tests. While some of these tests can be executed automatically by software, most require human reviewers to determine their result. The sequence in which the tests are executed usually depends on the results of previously executed tests. This introduces a complexity vs. transparency dilemma for reporting evaluation results: if each executed atomic test is recorded, huge reports are generated, but if atomic tests are excluded, it becomes less transparent why a criterion was not met.
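The anatomy described above can be sketched as follows. This is an illustrative model only, not part of any accessibility standard; the criterion and test names are hypothetical.

```python
# Illustrative sketch: a conformance criterion is decomposed into atomic
# tests, and later tests only run if an earlier test they depend on passed.

def evaluate_criterion(page, tests):
    """Run atomic tests in order; each test may depend on a prior result."""
    results = {}
    for name, test, depends_on in tests:
        # skip a test whose prerequisite did not pass
        if depends_on and results.get(depends_on) != "passed":
            results[name] = "skipped"
            continue
        results[name] = "passed" if test(page) else "failed"
    # the criterion is met only if no executed test failed
    outcome = "failed" if "failed" in results.values() else "passed"
    return outcome, results

# Hypothetical atomic tests for an image-related criterion:
tests = [
    ("has-img", lambda p: "<img" in p, None),
    ("img-has-alt", lambda p: "alt=" in p, "has-img"),
]

outcome, detail = evaluate_criterion('<img src="logo.png">', tests)
```

Recording only `outcome` keeps the report small; recording `detail` as well makes it transparent why the criterion failed, which is exactly the trade-off noted above.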
Another aspect that affects the evaluation review is the sampling strategy. Usually it is not economically feasible to manually evaluate every page on a Web site. Web applications that generate content dynamically also add another dimension of size complexity. Therefore, most Web accessibility evaluation reviews employ some sort of sampling mechanism to reduce the number of tests that need to be manually executed. Basic sampling mechanisms rely on a selection of Web pages that cover different types of features, templates, production methods, or content styles. More sophisticated mechanisms consider additional parameters such as link paths and transactions, page traffic and relative importance, as well as other factors.
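A basic sampling mechanism of the kind described above can be sketched as follows; the page URLs and template labels are hypothetical.

```python
# Illustrative sketch of basic sampling: group pages by the template
# (or content style) they use, and manually evaluate only one
# representative page per group rather than every page on the site.

def sample_pages(pages):
    """pages: list of (url, template) pairs; returns one URL per template."""
    representatives = {}
    for url, template in pages:
        representatives.setdefault(template, url)  # keep first page per template
    return sorted(representatives.values())

pages = [
    ("/index.html", "home"),
    ("/news/1.html", "article"),
    ("/news/2.html", "article"),
    ("/contact.html", "form"),
]
sample = sample_pages(pages)
```

More sophisticated mechanisms would weight this selection by link paths, traffic, or relative importance instead of taking the first page per group.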
Despite the fact that most of the tests required for conformance evaluations need to be executed manually, tools can assist many tasks and hence significantly improve the efficiency of Web accessibility evaluation processes. For example, tools can guide reviewers through the testing procedure, highlight issues that may be more applicable to specific areas of the Web site, or provide functionality to help reviewers determine the result of specific tests. W3C/WAI maintains a list of Web accessibility evaluation tools [4] which can be used during evaluation and development processes.
For accessibility tests that can be executed automatically, automation is generally cost effective: testing can be repeated periodically on large numbers of pages. Even though the accuracy of Web accessibility evaluation tools may vary, the error rate per tool and per Web site can usually be assumed to be constant and is thus simple to adjust for. It is also important to note that while some tests cannot be executed automatically, their applicability to a given Web page or site can sometimes be determined automatically. This potentially reduces the amount of manual evaluation that needs to be carried out by human reviewers and improves the efficiency of evaluations.
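For instance, whether a text alternative is adequate requires human judgment, but whether that test applies at all can be decided by software: the test is only relevant on pages that contain images. A minimal sketch using the Python standard library:

```python
# Illustrative sketch: automatically determine the *applicability* of a
# manual test (judging image text alternatives) by detecting whether a
# page contains any <img> elements.

from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))

def alt_text_test_applies(html):
    """True if the page has images, so the manual check is needed at all."""
    finder = ImageFinder()
    finder.feed(html)
    return len(finder.images) > 0

applies = alt_text_test_applies("<p>No images here.</p>")
```

Pages for which the test does not apply can be filtered out before the review, reducing the manual workload.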
Web accessibility evaluation tools can support human reviewers in manually carrying out accessibility tests in a number of ways. So-called transformation tools modify the presentation of Web pages to help reviewers find potential barriers for people with disabilities. For example, transformation tools can display Web pages in low color contrast or with large font sizes, or simulate how page elements would be presented by assistive technology(4). Other evaluation tool functions can highlight areas of Web pages or of the underlying markup code (such as HTML or CSS) to help reviewers identify barriers of a more technical nature on a Web site.
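As a sketch of one such technical check, the color-contrast heuristic from the W3C's Accessibility Evaluation and Repair Techniques (AERT) working draft of the WCAG 1.0 era compares color brightness and color difference between foreground and background; the thresholds below (125 and 500) are the ones suggested in that draft.

```python
# Sketch of the AERT color-contrast heuristic (W3C working draft
# technique): two colors are considered sufficiently distinct when the
# brightness difference is >= 125 and the color difference is >= 500.

def brightness(rgb):
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) / 1000

def color_difference(fg, bg):
    return sum(abs(a - b) for a, b in zip(fg, bg))

def sufficient_contrast(fg, bg):
    return (abs(brightness(fg) - brightness(bg)) >= 125
            and color_difference(fg, bg) >= 500)

ok = sufficient_contrast((0, 0, 0), (255, 255, 255))        # black on white
low = sufficient_contrast((119, 119, 119), (136, 136, 136)) # two mid-grays
```

A tool applying this check can flag borderline element pairs for a human reviewer rather than deciding on its own.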
Some enterprise Web accessibility evaluation tools and custom solutions implement their own formats for expressing evaluation test results. This enables these tools to integrate different modules (or other evaluation tools) for automatic and manual evaluations into a more complete testing framework(5). Such frameworks often offer further data monitoring and analysis capabilities, for example visualizing the number of issues encountered on a Web site over time in order to monitor progress. Also, some authoring tools (such as editors or content management systems) provide APIs(6) that can be used to integrate the output of evaluation tools.
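The monitoring capability mentioned above amounts to simple aggregation over exported results. A minimal sketch, with hypothetical (date, issue) records standing in for an evaluation tool's output:

```python
# Illustrative sketch: count reported issues per evaluation period to
# track how a Web site's accessibility develops over time.

from collections import Counter

# hypothetical (period, issue-id) records exported by an evaluation tool
records = [
    ("2006-01", "img-alt"), ("2006-01", "contrast"), ("2006-01", "img-alt"),
    ("2006-02", "contrast"),
]

issues_per_period = Counter(period for period, _ in records)
trend = sorted(issues_per_period.items())  # e.g. for plotting against time
```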
This exchange of test results between different types of development tools generally contributes to more efficient development of accessible Web sites. However, only a few tools support the exchange of test results. The reason for this slow adoption is possibly the cost of developing a format for expressing test results, or of developing the API functionality to process such data. The large number of different, mostly proprietary formats and APIs also makes it difficult for developers of evaluation tools to support them effectively.
One possible approach to addressing this lack of support for exchanging results between different types of development tools is to provide a commonly accepted format for expressing evaluation test results. Such a standardized format would encourage evaluation tool developers to provide mechanisms for exporting test results in this format, because the results could then be processed by a large number of tools. Conversely, authoring tool developers would be encouraged to provide mechanisms for importing evaluation results in this standardized format, as it allows a greater variety of evaluation tools to be integrated. Finally, third-party data analysis tools (for example, to prioritize the repair of reported problems according to their cost of repair or impact on Web accessibility, or to generate customized reports for specific developers) can be built upon this common format for expressing test results.
The Evaluation and Report Language (EARL) [1] is a simple vocabulary to record the following aspects of a quality assurance test: Who (which person or tool) carried out the test? What resource was tested? Which test criterion was it tested against? What was the outcome of the test?
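A tool exporting EARL records these aspects as RDF statements. The sketch below emits one assertion in Turtle syntax; the namespace URI and property names follow the EARL 1.0 drafts, and the tool, page, and test URIs are hypothetical, so treat the fragment as illustrative rather than normative.

```python
# Sketch: serialize one EARL-style assertion as Turtle. The earl:
# namespace and property names follow the EARL 1.0 drafts; all URIs
# passed in below are hypothetical examples.

def earl_assertion(assertor, subject, test, outcome):
    return (
        "@prefix earl: <http://www.w3.org/ns/earl#> .\n\n"
        "[] a earl:Assertion ;\n"
        f"   earl:assertedBy <{assertor}> ;\n"
        f"   earl:subject <{subject}> ;\n"
        f"   earl:test <{test}> ;\n"
        "   earl:result [ a earl:TestResult ;\n"
        f"                 earl:outcome earl:{outcome} ] .\n"
    )

report = earl_assertion(
    "http://example.org/tool",        # hypothetical evaluation tool
    "http://example.org/page.html",   # the Web page under test
    "http://example.org/tests/text-equivalent",  # hypothetical test URI
    "failed",
)
```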
While these questions apply to generic Web quality assurance testing (for example to determine conformance against usability or corporate design guidelines), many of the use cases and scenarios for developing EARL have originated from requirements of Web accessibility evaluation reviews. EARL can also be used beyond quality assurance processes on the Web, for example for generic software testing, but this remains outside the scope of this paper.
The vocabulary of EARL has been developed using the W3C Resource Description Framework (RDF) [5], a semantic Web format that allows custom vocabularies to be defined in machine-readable form. EARL is being developed under the W3C process(7), which encourages the participation of different stakeholders and ensures consensus among them. W3C specifications are also royalty-free.
Another compelling reason to implement the EARL vocabulary in RDF is to benefit from the existing and growing semantic Web community. For example, EARL reports can be queried using SPARQL(8). RDF is also supported by databases and by APIs in different programming languages(9). Furthermore, RDF is flexible and enables languages such as EARL to be extended or refined for specific usages while retaining the overall structure and compatibility between tools. For example, developers could adopt different test criteria descriptions yet retain interoperability of the reports.
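To illustrate the kind of query SPARQL enables over EARL data, the sketch below matches the same graph pattern over a plain in-memory list of triples; a real implementation would use an RDF library and an actual SPARQL engine, and the page URIs are hypothetical.

```python
# Sketch only: the equivalent SPARQL query would be roughly
#   SELECT ?page WHERE {
#     ?a earl:subject ?page ; earl:result ?r .
#     ?r earl:outcome earl:failed .
#   }
# Here the pattern is matched over plain (subject, predicate, object)
# tuples standing in for a parsed EARL report.

EARL = "http://www.w3.org/ns/earl#"  # namespace as in the EARL 1.0 drafts

triples = [
    ("_:a1", EARL + "subject", "http://example.org/page1.html"),
    ("_:a1", EARL + "result", "_:r1"),
    ("_:r1", EARL + "outcome", EARL + "failed"),
    ("_:a2", EARL + "subject", "http://example.org/page2.html"),
    ("_:a2", EARL + "result", "_:r2"),
    ("_:r2", EARL + "outcome", EARL + "passed"),
]

def failing_pages(triples):
    """Return the subjects of all assertions whose outcome is earl:failed."""
    results = {s: o for s, p, o in triples if p == EARL + "result"}
    subjects = {s: o for s, p, o in triples if p == EARL + "subject"}
    failed = {s for s, p, o in triples
              if p == EARL + "outcome" and o == EARL + "failed"}
    return sorted(subjects[a] for a, r in results.items() if r in failed)

pages = failing_pages(triples)
```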
The following are brief descriptions of some of the use cases for a common format for expressing test results in the context of evaluating Web sites for accessibility.
At the time of writing this paper, the Evaluation and Report Language (EARL) [1] is a W3C Working Draft. However, it is fairly mature and expected to proceed to Last Call stage in due course. At the same time, building blocks on which EARL is dependent (such as Dublin Core or Friend Of A Friend vocabularies, OWL Web Ontology Language, or various query languages as described in the previous section) are widely deployed and implemented. So, while EARL itself is still in draft stage, it is gaining stability through the evolution of the related technologies.
This is also reflected by the availability of tools that support the EARL format. There is a wide range of EARL implementations; however, most of them produce EARL output, and only a few tools process EARL reports. It is interesting to see the uptake of EARL both in reference prototypes and research projects(11) and in operational tools(12). The W3C/WAI Evaluation and Repair Tools Working Group (ERT WG) [6] develops EARL and maintains a list of resources related to EARL [7], which references further implementations and projects.
The main objective for the ERT WG [6] is to resolve the currently outstanding issues in the EARL working draft and publish the first version as a W3C Recommendation. One of the issues includes describing the occurrence of test results (for example accessibility violations) within a Web site. To promote the deployment of EARL, the working group is also developing an EARL 1.0 Guide to complement the EARL 1.0 Schema with more examples and guidance for implementing EARL. The EARL 1.0 specification is therefore becoming increasingly comprehensive yet modularized to fit the needs of different audiences.
At the same time, feedback from implementation experience will be essential to avoid unexpected issues. For example, current implementations show that methods need to be developed to reduce the amount of output generated in EARL reports. There is also a strong need to increase the uptake of EARL in authoring tools, especially in content management systems. While many implementations during the development of EARL may be reference prototypes and research projects, deployment in operational tools will be necessary as EARL becomes more mature.
There are also some open research questions, mainly for later versions of EARL. For example, how can EARL reports be made more resilient to changes made on an already evaluated Web site? One approach could be to analyze where the changes occurred on a page and infer which tests a given change may have affected. Another open research question is how to map or relate tests that have been developed by different vendors and that evaluate the same criteria. This is related to the issue that many vendors do not want to disclose their proprietary tests, or the sequence in which the tests were carried out to determine a result.
Managing and monitoring the accessibility level of Web sites is a challenging quality assurance task. Several evaluation tool developers have already reacted to this demand and developed systems to support reviewers or whole review teams in carrying out specialized actions in a comprehensive quality assurance process.
However, a common approach using a standardized format for expressing test results would facilitate the integration of different types of tools. A common standard would promote implementation in currently available tools as well as the development of new ones, simplify data analysis, and enable many different use cases.
The Evaluation and Report Language (EARL) is currently being developed by the W3C/WAI Evaluation and Repair Tools Working Group (ERT WG). It addresses several use cases for integrating Web accessibility evaluation reviews into the development processes. EARL is developed under the W3C Process and is the result of consensus among different stakeholders.
While EARL is basically a simple vocabulary for describing evaluation test results, it can support a wide range of tasks. EARL uses the W3C Resource Description Framework (RDF) to define its vocabulary, which makes it machine-readable and allows it to benefit from a rapidly growing semantic Web community, including readily available support in databases, query languages, and programming language APIs.
At the time of writing this paper, EARL 1.0 is still a W3C Working Draft but relatively mature. There is some existing tools support but also the opportunity for much more research and development work.
Last changed on 13 April 2006 by shadi.