Shadi Abou-Zahra - W3C, Web Accessibility Initiative
2004, Route des Lucioles BP93 - 06902 Sophia-Antipolis, France
shadi@w3.org - http://www.w3.org/People/shadi/
Evaluating Web sites for accessibility is a quality assurance process that becomes increasingly difficult to manage as the size and complexity of a Web site increase. There is a growing need to effectively manage and monitor the accessibility of Web sites throughout the development process. The Evaluation and Report Language (EARL) is a semantic Web vocabulary and a royalty-free W3C format for expressing test results. While EARL can be used to support generic Web quality assurance processes, it has been developed specifically to assist Web accessibility evaluation reviews. EARL facilitates the exchange of test results between development and quality assurance tools. While much development effort has already gone into EARL and into tools that support the EARL format, considerable work and research remain to be addressed.
While the markup code of Web sites can generally be validated automatically against formal grammars, accessibility features are mostly evaluated with manual input from human reviewers. For example, it is generally not possible to automatically verify the validity of a textual description for an image(1). Depending on the size and complexity of a Web site, as well as on the type and thoroughness of the evaluation review, a considerable amount of time and resources may be required to determine the level of accessibility. Detailed accessibility evaluations of Web sites can also generate significant amounts of test results.
The challenge for many Web site owners is to employ quality assurance processes for Web accessibility that will improve the efficiency of evaluation reviews as well as enhance the management of issues. This paper describes how the Evaluation and Report Language (EARL) [1], a common format for expressing test results, supports such quality assurance processes. This paper will highlight some of the key features of EARL that facilitate monitoring and managing the accessibility of Web sites.
There are many types of Web accessibility evaluation reviews, ranging from less formal preliminary reviews to more comprehensive approaches that include technical and user testing of accessibility features. The W3C/WAI resource suite "Evaluating Web Sites for Accessibility" [2] describes some of these approaches in more detail. This paper focuses on reviews that evaluate the conformance of Web sites to accessibility standards such as the Web Content Accessibility Guidelines (WCAG) 1.0 [3]. Other examples of Web accessibility standards(2) include the requirements defined by Section 508 in the USA, BITV in Germany, and JIS in Japan.
While currently existing standards for Web accessibility may differ significantly, most share a common anatomy(3). Basically, it can be assumed that there is a target set of criteria against which a Web site is evaluated. Conformance to each criterion is determined by conducting a series of atomic tests. While some of these tests can be executed automatically by software, most require human reviewers to determine their result. The sequence in which the tests are executed usually depends on the results of previously executed tests. This introduces a complexity vs. transparency dilemma for reporting evaluation results: if each executed atomic test is recorded, huge reports are generated, but if atomic tests are excluded, it becomes less transparent why a criterion was not met.
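The anatomy described above can be sketched as follows. This is an illustrative model only, not part of any accessibility standard; the criterion and test names are hypothetical.

```python
# Illustrative sketch: a conformance criterion is decomposed into atomic
# tests, and later tests only run if an earlier test they depend on passed.

def evaluate_criterion(page, tests):
    """Run atomic tests in order; each test may depend on a prior result."""
    results = {}
    for name, test, depends_on in tests:
        # skip a test whose prerequisite did not pass
        if depends_on and results.get(depends_on) != "passed":
            results[name] = "skipped"
            continue
        results[name] = "passed" if test(page) else "failed"
    # the criterion is met only if no executed test failed
    outcome = "failed" if "failed" in results.values() else "passed"
    return outcome, results

# Hypothetical atomic tests for an image-related criterion:
tests = [
    ("has-img", lambda p: "<img" in p, None),
    ("img-has-alt", lambda p: "alt=" in p, "has-img"),
]

outcome, detail = evaluate_criterion('<img src="logo.png">', tests)
```

Recording only `outcome` keeps the report small; recording `detail` as well makes it transparent why the criterion failed, which is exactly the trade-off noted above.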
Another aspect that affects the evaluation review is the sampling strategy. Usually it is not economically feasible to manually evaluate every page on a Web site. Web applications that generate content dynamically also add another dimension of size complexity. Therefore, most Web accessibility evaluation reviews employ some sort of sampling mechanism to reduce the number of tests that need to be manually executed. Basic sampling mechanisms rely on a selection of Web pages that cover different types of features, templates, production methods, or content styles. More sophisticated mechanisms consider additional parameters such as link paths and transactions, page traffic and relative importance, as well as other factors.
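A basic sampling mechanism of the kind described above can be sketched as follows; the page URLs and template labels are hypothetical.

```python
# Illustrative sketch of basic sampling: group pages by the template
# (or content style) they use, and manually evaluate only one
# representative page per group rather than every page on the site.

def sample_pages(pages):
    """pages: list of (url, template) pairs; returns one URL per template."""
    representatives = {}
    for url, template in pages:
        representatives.setdefault(template, url)  # keep first page per template
    return sorted(representatives.values())

pages = [
    ("/index.html", "home"),
    ("/news/1.html", "article"),
    ("/news/2.html", "article"),
    ("/contact.html", "form"),
]
sample = sample_pages(pages)
```

More sophisticated mechanisms would weight this selection by link paths, traffic, or relative importance instead of taking the first page per group.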
Despite the fact that most of the tests required for conformance evaluations need to be executed manually, tools can assist many tasks and hence significantly improve the efficiency of Web accessibility evaluation processes. For example, tools can guide reviewers through the testing procedure, highlight issues that may be more applicable to specific areas of the Web site, or provide functionality to help reviewers determine the result of specific tests. W3C/WAI maintains a list of Web accessibility evaluation tools [4] which can be used during evaluation and development processes.
For accessibility tests that can be executed automatically, automation is generally cost effective: testing can be repeated periodically on large numbers of pages. Even though the accuracy of Web accessibility evaluation tools may vary, the error rate per tool and per Web site can usually be assumed to be constant and is thus simple to adjust for. It is also important to note that while some tests cannot be executed automatically, their applicability to a given Web page or site can sometimes be determined automatically. This potentially reduces the amount of manual evaluation that needs to be carried out by human reviewers and improves the efficiency of evaluations.
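For instance, whether a text alternative is adequate requires human judgment, but whether that test applies at all can be decided by software: the test is only relevant on pages that contain images. A minimal sketch using the Python standard library:

```python
# Illustrative sketch: automatically determine the *applicability* of a
# manual test (judging image text alternatives) by detecting whether a
# page contains any <img> elements.

from html.parser import HTMLParser

class ImageFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images.append(dict(attrs))

def alt_text_test_applies(html):
    """True if the page has images, so the manual check is needed at all."""
    finder = ImageFinder()
    finder.feed(html)
    return len(finder.images) > 0

applies = alt_text_test_applies("<p>No images here.</p>")
```

Pages for which the test does not apply can be filtered out before the review, reducing the manual workload.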
Web accessibility evaluation tools can support human reviewers in manually carrying out accessibility tests in a number of ways. So-called transformation tools modify the presentation of Web pages to help reviewers find potential barriers for people with disabilities. For example, transformation tools can display Web pages in low color contrast or with large font sizes, or simulate how page elements would be presented by assistive technology(4). Other evaluation tool functions can highlight areas of Web pages or of the underlying markup code (such as HTML or CSS) to help reviewers identify barriers of a more technical nature on a Web site.
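As a sketch of one such technical check, the color-contrast heuristic from the W3C's Accessibility Evaluation and Repair Techniques (AERT) working draft of the WCAG 1.0 era compares color brightness and color difference between foreground and background; the thresholds below (125 and 500) are the ones suggested in that draft.

```python
# Sketch of the AERT color-contrast heuristic (W3C working draft
# technique): two colors are considered sufficiently distinct when the
# brightness difference is >= 125 and the color difference is >= 500.

def brightness(rgb):
    r, g, b = rgb
    return (r * 299 + g * 587 + b * 114) / 1000

def color_difference(fg, bg):
    return sum(abs(a - b) for a, b in zip(fg, bg))

def sufficient_contrast(fg, bg):
    return (abs(brightness(fg) - brightness(bg)) >= 125
            and color_difference(fg, bg) >= 500)

ok = sufficient_contrast((0, 0, 0), (255, 255, 255))        # black on white
low = sufficient_contrast((119, 119, 119), (136, 136, 136)) # two mid-grays
```

A tool applying this check can flag borderline element pairs for a human reviewer rather than deciding on its own.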
Some enterprise Web accessibility evaluation tools and custom solutions implement their own formats for expressing evaluation test results. This enables these tools to integrate different modules (or other evaluation tools) for automatic and manual evaluations into a more complete testing framework(5). Such frameworks often offer further data monitoring and analysis capabilities, for example visualizing the number of issues encountered on a Web site over time in order to monitor progress. Also, some authoring tools (such as editors or content management systems) provide APIs(6) that can be used to integrate the output of evaluation tools.
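The monitoring capability mentioned above amounts to simple aggregation over exported results. A minimal sketch, with hypothetical (date, issue) records standing in for an evaluation tool's output:

```python
# Illustrative sketch: count reported issues per evaluation period to
# track how a Web site's accessibility develops over time.

from collections import Counter

# hypothetical (period, issue-id) records exported by an evaluation tool
records = [
    ("2006-01", "img-alt"), ("2006-01", "contrast"), ("2006-01", "img-alt"),
    ("2006-02", "contrast"),
]

issues_per_period = Counter(period for period, _ in records)
trend = sorted(issues_per_period.items())  # e.g. for plotting against time
```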
This exchange of test results between different types of development tools generally contributes to more efficient development of accessible Web sites. However, only a few tools support the exchange of test results. The reason for this slow adoption is possibly the cost of developing a format for expressing test results, or of developing the API functionality to process such data. The large number of different, mostly proprietary formats and APIs also makes it difficult for developers of evaluation tools to support them effectively.
One possible approach to addressing this lack of support for exchanging results between different types of development tools is to provide a commonly accepted format for expressing evaluation test results. Such a standardized format would encourage evaluation tool developers to provide mechanisms for exporting test results in this format, because the results could then be processed by a large number of tools. Conversely, authoring tool developers would be encouraged to provide mechanisms for importing evaluation results in this standardized format, as it allows a greater variety of evaluation tools to be integrated. Finally, third-party data analysis tools (for example, to prioritize the repair of reported problems according to their cost of repair or impact on Web accessibility, or to generate customized reports for specific developers) can be built upon this common format for expressing test results.
The Evaluation and Report Language (EARL) [1] is a simple vocabulary to record the following aspects of a quality assurance test: Who (which person or tool) carried out the test? What resource was tested? Which test criterion was it tested against? What was the outcome of the test?
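A tool exporting EARL records these aspects as RDF statements. The sketch below emits one assertion in Turtle syntax; the namespace URI and property names follow the EARL 1.0 drafts, and the tool, page, and test URIs are hypothetical, so treat the fragment as illustrative rather than normative.

```python
# Sketch: serialize one EARL-style assertion as Turtle. The earl:
# namespace and property names follow the EARL 1.0 drafts; all URIs
# passed in below are hypothetical examples.

def earl_assertion(assertor, subject, test, outcome):
    return (
        "@prefix earl: <http://www.w3.org/ns/earl#> .\n\n"
        "[] a earl:Assertion ;\n"
        f"   earl:assertedBy <{assertor}> ;\n"
        f"   earl:subject <{subject}> ;\n"
        f"   earl:test <{test}> ;\n"
        "   earl:result [ a earl:TestResult ;\n"
        f"                 earl:outcome earl:{outcome} ] .\n"
    )

report = earl_assertion(
    "http://example.org/tool",        # hypothetical evaluation tool
    "http://example.org/page.html",   # the Web page under test
    "http://example.org/tests/text-equivalent",  # hypothetical test URI
    "failed",
)
```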
While these questions apply to generic Web quality assurance testing (for example to determine conformance against usability or corporate design guidelines), many of the use cases and scenarios for developing EARL have originated from requirements of Web accessibility evaluation reviews. EARL can also be used beyond quality assurance processes on the Web, for example for generic software testing, but this remains outside the scope of this paper.
The vocabulary of EARL has been developed using the W3C Resource Description Framework (RDF) [5], a semantic Web format that allows custom vocabularies to be defined in machine-readable form. EARL is being developed under the W3C process(7), which encourages the participation of different stakeholders and ensures consensus among them. W3C specifications are also royalty-free.
Another compelling reason to implement the EARL vocabulary in RDF is to benefit from the existing and growing semantic Web community. For example, EARL reports can be queried using SPARQL(8). RDF is also supported by databases and by APIs in different programming languages(9). Furthermore, RDF is flexible and enables languages such as EARL to be extended or refined for specific usages while retaining the overall structure and compatibility between tools. For example, developers could adopt different test criteria descriptions yet retain interoperability of the reports.
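To illustrate the kind of query SPARQL enables over EARL data, the sketch below matches the same graph pattern over a plain in-memory list of triples; a real implementation would use an RDF library and an actual SPARQL engine, and the page URIs are hypothetical.

```python
# Sketch only: the equivalent SPARQL query would be roughly
#   SELECT ?page WHERE {
#     ?a earl:subject ?page ; earl:result ?r .
#     ?r earl:outcome earl:failed .
#   }
# Here the pattern is matched over plain (subject, predicate, object)
# tuples standing in for a parsed EARL report.

EARL = "http://www.w3.org/ns/earl#"  # namespace as in the EARL 1.0 drafts

triples = [
    ("_:a1", EARL + "subject", "http://example.org/page1.html"),
    ("_:a1", EARL + "result", "_:r1"),
    ("_:r1", EARL + "outcome", EARL + "failed"),
    ("_:a2", EARL + "subject", "http://example.org/page2.html"),
    ("_:a2", EARL + "result", "_:r2"),
    ("_:r2", EARL + "outcome", EARL + "passed"),
]

def failing_pages(triples):
    """Return the subjects of all assertions whose outcome is earl:failed."""
    results = {s: o for s, p, o in triples if p == EARL + "result"}
    subjects = {s: o for s, p, o in triples if p == EARL + "subject"}
    failed = {s for s, p, o in triples
              if p == EARL + "outcome" and o == EARL + "failed"}
    return sorted(subjects[a] for a, r in results.items() if r in failed)

pages = failing_pages(triples)
```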
The following are brief descriptions of some of the use cases for a common format for expressing test results in the context of evaluating Web sites for accessibility.
At the time of writing this paper, the Evaluation and Report Language (EARL) [1] is a W3C Working Draft. However, it is fairly mature and expected to proceed to Last Call stage in due course. At the same time, building blocks on which EARL is dependent (such as Dublin Core or Friend Of A Friend vocabularies, OWL Web Ontology Language, or various query languages as described in the previous section) are widely deployed and implemented. So, while EARL itself is still in draft stage, it is gaining stability through the evolution of the related technologies.
This is also reflected by the availability of tools that support the EARL format. There is a wide range of EARL implementations; however, most of them produce EARL output, and only a few tools process EARL reports. It is interesting to see the uptake of EARL both in reference prototypes and research projects(11) and in operational tools(12). The W3C/WAI Evaluation and Repair Tools Working Group (ERT WG) [6] develops EARL and maintains a list of resources related to EARL [7], which references further implementations and projects.
The main objective for the ERT WG [6] is to resolve the currently outstanding issues in the EARL working draft and publish the first version as a W3C Recommendation. One of the issues includes describing the occurrence of test results (for example accessibility violations) within a Web site. To promote the deployment of EARL, the working group is also developing an EARL 1.0 Guide to complement the EARL 1.0 Schema with more examples and guidance for implementing EARL. The EARL 1.0 specification is therefore becoming increasingly comprehensive yet modularized to fit the needs of different audiences.
At the same time, feedback from implementation experience will be essential to avoid unexpected issues. For example, current implementations show that methods need to be developed to reduce the amount of output generated in EARL reports. There is also a strong need to increase the uptake of EARL in authoring tools, especially in content management systems. While many implementations during the development of EARL may be reference prototypes and research projects, deployment in operational tools will be necessary as EARL becomes more mature.
There are also some open research questions, mainly for later versions of EARL. For example, how can EARL reports be made more resilient to changes made on an already evaluated Web site? One approach could be to analyze where the changes occurred on a page and infer which tests a given change may have affected. Another open research question is how to map or relate tests that have been developed by different vendors and that evaluate the same criteria. This is related to the issue that many vendors do not want to disclose their proprietary tests, or the sequence in which the tests were carried out to determine a result.
Managing and monitoring the accessibility level of Web sites is a challenging quality assurance task. Several evaluation tool developers have already reacted to this demand and developed systems to support reviewers or whole review teams in carrying out specialized actions in a comprehensive quality assurance process.
However, a common approach using a standardized format for expressing test results would facilitate the integration of different types of tools. A common standard would promote implementation in currently available tools as well as the development of new ones, simplify data analysis, and enable many different use cases.
The Evaluation and Report Language (EARL) is currently being developed by the W3C/WAI Evaluation and Repair Tools Working Group (ERT WG). It addresses several use cases for integrating Web accessibility evaluation reviews into the development processes. EARL is developed under the W3C Process and is the result of consensus among different stakeholders.
While EARL is basically a simple vocabulary for describing evaluation test results, it can support a wide range of tasks. EARL uses the W3C Resource Description Framework (RDF) to define its vocabulary, which makes it machine-readable and allows it to benefit from a rapidly growing semantic Web community, including readily available support in databases, query languages, and programming language APIs.
At the time of writing this paper, EARL 1.0 is still a W3C Working Draft but relatively mature. There is some existing tools support but also the opportunity for much more research and development work.
Last changed on 13 April 2006 by shadi.