Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document specifies the Website Accessibility Evaluation Methodology (WCAG-EM) for Web Content Accessibility Guidelines (WCAG) 2.0. It is an internationally harmonized methodology for evaluating the conformance of websites to WCAG 2.0. The Methodology supports evaluation in different contexts, such as self-assessment and third-party evaluation of websites. It is applicable to all websites (including web applications and mobile applications) regardless of size and it is independent of any particular evaluation tools, browsers, and assistive technology. It answers a frequently voiced demand for harmonization of accessibility evaluations for complete websites. The Methodology guides individual evaluators or evaluation teams and supports the rendering of interchangeable results.
This clause describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This 9 February 2012 Editor's Draft of the Website Accessibility Evaluation Methodology for WCAG 2.0 is an initial outline for future refinement. This document is intended to be published and maintained as a W3C Working Group Note after review and refinement. This version provides a framework for the direction and focus.
The WCAG 2.0 Evaluation Methodology Task Force (Eval TF) invites discussion and feedback about this document by developers, evaluators, researchers, and other practitioners who have interest in web accessibility evaluation. In particular, Eval TF is looking for feedback on the completeness of this outline. We are currently not looking for details on the different sections of the document but would welcome any input on considerations we may have missed with regards to the evaluation process.
Please send comments on this Website Accessibility Evaluation Methodology for WCAG 2.0 document to public-wai-evaltf@w3.org (publicly visible mailing list archive).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The Website Accessibility Evaluation Methodology for WCAG 2.0 (WCAG-EM) is an internationally harmonized methodology for evaluating the conformance of websites to WCAG 2.0. The Methodology supports evaluation in different contexts, such as self-assessment and third-party evaluation of websites. It is applicable to all websites (including web applications) regardless of size and it is independent of any particular evaluation tools, browsers, and assistive technology.
The Website Accessibility Evaluation Methodology for WCAG 2.0 (WCAG-EM) is a standardized way to evaluate the conformance of websites to WCAG 2.0. The Methodology defines manual and semi-automated methods for selecting representative samples of web pages from websites that include complete processes. It provides guidance for evaluating the selected web pages to WCAG 2.0, and defines methods for integration and aggregation of the evaluation results into structured reports and conformance claims.
The Methodology provides guidance on evaluation throughout the development process but it is specifically applicable to conformance evaluation of existing websites. It extends the existing WAI resources on evaluating websites for accessibility. It includes complementary resources on Preliminary Review of Websites for Accessibility and Involving Users in Evaluating Web Accessibility. It also provides further advice on specific contexts, selecting tools and evaluation during other stages of the development process.
The primary target audience of the Methodology is anyone, including individuals and organizations, wanting to evaluate the conformance of existing websites to WCAG 2.0. This includes but is not limited to:
Other audiences who might benefit from the Methodology include:
Users of the Methodology are assumed to be knowledgeable about WCAG 2.0, accessible web design, assistive technology, and how people with different disabilities use the Web.
The following documents are important reading for this document and are indispensable for its application. More references can be found in Appendix B.
For the purposes of this document, the following terms and definitions apply.
The Methodology can be used by individuals who want to evaluate the conformance of websites to WCAG 2.0. Users of this methodology are assumed to be knowledgeable about WCAG 2.0, accessible web design, assistive technology, and how people with different disabilities use the Web. Evaluation according to this methodology can be carried out by individuals, but using review teams with combined expertise and involving people with disabilities and older people in the evaluation process is strongly recommended.
The expertise necessary for evaluating a website against WCAG 2.0 is described more explicitly in the W3C/WAI Evaluation Resource Suite. Evaluation activities require diverse kinds of expertise and perspectives. Individuals evaluating the accessibility of web content require training and experience across a broad range of disciplines. If you do not have all that expertise yet and you want to start evaluating a website, the W3C/WAI Evaluation Resource Suite contains a less formal preliminary review to get you started.
The W3C/WAI Evaluation Resource Suite provides complementary guidance on using combined expertise to evaluate web accessibility. The Methodology can be used by individuals, but review teams can provide better coverage of the expertise required in the understanding of web technologies, evaluation tools, barriers that people with disabilities experience, assistive technologies and approaches that people with disabilities use, and accessibility guidelines and techniques.
The W3C/WAI Evaluation Resource Suite provides complementary guidance on Involving Users in Evaluating Web Accessibility. Evaluating with users with disabilities and with older users can identify additional issues that are not easily discovered by expert evaluation alone. It can make the evaluation process more effective and more efficient, especially when users are involved throughout the development process.
Note: Although testing with users is not a requirement under this methodology, we strongly recommend that people with disabilities and older people are involved in the evaluation process.
Description: This clause describes the procedure to express the scope of an evaluation. This procedure will include options to express the scope of different kinds of websites including web and mobile applications regardless of size. The scope will also address issues like evaluation tools, browsers and assistive technology. The procedure also includes how to handle web resources where the state and content of the web page change based on user interaction with the page. The same URI could then be accessible in some states and not be accessible in other states based on the user or external events.
In this Evaluation Methodology, the scope of the evaluation is not limited to full websites only. While the WCAG 2.0 Recommendation focuses on web pages, the scope of an evaluation within this Evaluation Methodology can be any coherent collection of one or more related web pages that together provide common use or functionality. This includes static web pages, dynamically generated web pages, and web applications. The scope has to be stated clearly and unambiguously as part of the conformance claim. This information "should be provided in a form that users can use, preferably machine-readable metadata" (from: http://www.w3.org/TR/WCAG/#conformance-claims).
Web pages are not limited to HTML. They can include PDF, Office documents, audio, video, photos and other technologies. In principle, all web pages that are part of the full website or fall within complete processes can fall within the scope of an evaluation. It is important that the scope of an evaluation is properly set because the scope delimits the web pages and web applications to be included in the sample.
One of the purposes of this methodology is to support replicability of results. Therefore the scope of an evaluation must be expressed as a list of resources. This list can include a sample that represents a larger number of pages within the scope. This should be clearly and unambiguously stated in the conformance claim.
It is possible to exclude particular sections of a website from the scope even though they are part of a complete process. Examples of possible exclusions are: user generated content, wikis, bulletin boards etc. @@@Note: these are all not excluded in WCAG 2.0.
The Evaluation can also focus only on specific technologies excluding all other technologies used. @@@Outcome from discussion, more discussion necessary.
All pages of the complete process are included in the scope unless a complete process is excluded from the scope. Complete processes can be excluded from the scope in specific circumstances that will be described in this subclause. In this methodology, complete processes can bring parts of a website into the scope of an evaluation. In case of a series of steps that need to be completed in order to accomplish an activity, all web pages in the process are part of the scope of the evaluation and should conform at the specified level or better.
The Methodology describes the roles of manual and semi-automated evaluation. In case of manual evaluation, an evaluator can manually evaluate all pages, but on a website with millions of web pages that may not be feasible or desirable. This clause describes how to select a representative sample of a website. This can include static web pages, dynamically generated web pages, and web applications. This clause describes the contents of a representative sample of a website for evaluation. The clause will also look at barriers and their impact and at the relation between the sample size and the barrier probability when sampling more pages.
Editor note: This subclause describes how to select a sample, focusing primarily on the contents of the sample, the techniques and other relevant items to be included. We have to check if we sufficiently include dynamic pages and the different states of a dynamic page like a web application.
For the sample we distinguish three sets of sampled resources to be included in the total evaluation sample of a website:
Editor note: This subclause is currently taken from [UWEM] for discussion.
Core Resource List (non-random sample): The Core Resource List is a set of generic resources, which are likely to be present in most websites, and which are core to the use and accessibility evaluation of a website. The Core Resource List therefore represents a set of resources to be included in any expert accessibility evaluation of the site (if they are available). The Core Resource List cannot, in general, be automatically identified, but will probably require human judgement to be selected. The Core Resource List should consist of as many of the following resources as are applicable (if available in the scope):
If the evaluation sample is created manually, it must contain the core resources. If the core resources contain fewer pages than the required sample size, additional resources from the following categories must be added.
Of course, any single resource may belong to more than one of the categories above: the requirement is simply that the Core Resources as a whole should, as far as possible, collectively address all the applicable sampling objectives. Any given resource should appear only once in the total evaluation sample.
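The requirement that a resource appears only once in the total evaluation sample can be illustrated with a minimal sketch. The function name `merge_sample_sets` and the example.org URIs are assumptions made for illustration only; they are not defined by this Methodology or by [UWEM].

```python
from typing import Iterable, List


def merge_sample_sets(*resource_lists: Iterable[str]) -> List[str]:
    """Combine several resource lists, keeping only the first occurrence of each URI."""
    seen = set()
    merged = []
    for resources in resource_lists:
        for uri in resources:
            if uri not in seen:
                seen.add(uri)
                merged.append(uri)
    return merged


# Hypothetical resource lists; the URIs are placeholders.
core = ["https://example.org/", "https://example.org/contact"]
task_oriented = ["https://example.org/order/step1", "https://example.org/contact"]
random_resources = ["https://example.org/news/42", "https://example.org/"]

total_sample = merge_sample_sets(core, task_oriented, random_resources)
```

The order of the arguments decides which list a shared resource is attributed to. Here the Core Resources come first, so a page that is both a core resource and part of a task is listed with the Core Resources.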
Task Oriented Resources are the pages necessary for completing the complete processes on the website. They provide examples of real use cases for the website. This might include tasks such as finding certain information, placing an order or participating in a discussion.
Editor note: This subclause is currently taken from [UWEM] for discussion.
To generate a representative and unbiased total evaluation sample, a uniform random selection from the set of all resources belonging to the website is required. [Addition to UWEM: This can be done in many ways. The method used to select a random sample should be described in the Evaluation Report. Random sampling better supports comparability of results, e.g. monitoring and synchronous or asynchronous comparisons. Note that it is very difficult to generate a fully random sample without the use of a tool.]
There are different ways to determine the list of all resources belonging to a website to select a sample from. In some cases the list of resources is known beforehand, e.g. because it is provided by a site owner commissioning the evaluation. If the list of resources is unknown, the website has to be explored prior to the evaluation. This will typically be done by a web crawler automatically exploring the website by following links. The crawl starts out from one or more "seed resources", e.g. the home page.
If a website is very large, it may not be feasible to identify a complete list of resources belonging to a website. In this case the evaluator may choose to stop the crawling process when a sufficiently large number of resources has been identified.
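The exploration described above can be sketched as a breadth-first crawl that starts from the seed resources and stops once a chosen number of resources has been identified. This is only an illustration under stated assumptions: the function `crawl`, the parameter `max_resources` and the in-memory link table `LINKS` are hypothetical, and a real crawler would fetch and parse pages instead of reading a table.

```python
from collections import deque
from typing import Callable, Iterable, List


def crawl(seeds: Iterable[str],
          get_links: Callable[[str], Iterable[str]],
          max_resources: int) -> List[str]:
    """Breadth-first exploration from the seeds, capped at max_resources."""
    queue = deque(seeds)
    seen = set()
    discovered = []
    while queue and len(discovered) < max_resources:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        discovered.append(uri)
        for link in get_links(uri):
            if link not in seen:
                queue.append(link)
    return discovered


# Hypothetical link structure standing in for a real website.
LINKS = {
    "/": ["/about", "/products"],
    "/about": ["/", "/contact"],
    "/products": ["/products/1", "/products/2"],
    "/contact": [],
    "/products/1": ["/products"],
    "/products/2": ["/products"],
}

found = crawl(["/"], lambda uri: LINKS.get(uri, []), max_resources=4)
```

Stopping the crawl early biases the list towards resources close to the seeds, which is one reason why the crawling restrictions need to be disclosed in the Evaluation Report.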
The evaluation sample is chosen from the list of resources belonging to the website using uniform random sampling without replacement.
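Uniform random sampling without replacement can be sketched with the Python standard library. The function name `draw_random_sample` and the idea of recording a seed are assumptions for illustration; the Methodology does not prescribe a tool or a seed.

```python
import random
from typing import Iterable, List


def draw_random_sample(resources: Iterable[str], sample_size: int, seed: int) -> List[str]:
    """Draw a uniform random sample without replacement from the resource list."""
    # Sorting the de-duplicated list makes the draw independent of input order.
    population = sorted(set(resources))
    size = min(sample_size, len(population))
    rng = random.Random(seed)  # a recorded seed lets others repeat the draw
    return rng.sample(population, size)


# Hypothetical list of crawled resources.
crawled = [f"/page/{i}" for i in range(50)]
sample = draw_random_sample(crawled, sample_size=10, seed=2012)
```

Recording the seed together with the resource list in the Evaluation Report would allow another evaluator to reproduce the same sample.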
Note that both the crawling and sampling algorithms used, and any further restrictions limiting or biasing the result, including, but not limited to the set of restrictions below, should be explicitly disclosed in the Evaluation Report:
Editor note: This subclause describes the ideal size of the sample of a website. The size of a sample will depend on many different factors including but not limited to the number of web pages per technology related to the overall size of the website. The following is still subject to many discussions inside the EvalTF and needs more work:
The sample size is dependent on the size and the complexity of the website. The sample includes pages (if available) from the Core resource list, the Random resource list and the Task Oriented resource list. The Methodology requires that, for every failure of a success criterion, the evaluation report contains a minimum of two examples of resources where the failure can be found. If pages are the same and do not show new errors, the evaluator can limit the evaluation of a page to the selection of particular elements or technologies that are targeted. (Todo: add an example and describe how to conclude that a page is the same. We will also explore the use of scenarios for the selection of the samples, such as guest, regular user, task-based, etc.) In this way, more complex websites will in most cases have a larger sample.
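The requirement of at least two example resources per failed success criterion lends itself to a simple check before a report is finalized. The sketch below is an assumption about how such a check could look; the function name, the data layout (a mapping from success criterion number to example URIs) and the sample data are hypothetical.

```python
from typing import Dict, List

MIN_EXAMPLES = 2  # minimum number of example resources per failure


def criteria_lacking_examples(failures: Dict[str, List[str]]) -> List[str]:
    """Return the failed success criteria documented with fewer than two distinct resources."""
    return sorted(
        criterion
        for criterion, uris in failures.items()
        if len(set(uris)) < MIN_EXAMPLES
    )


# Hypothetical evaluation findings: success criterion -> resources where it fails.
failures = {
    "1.1.1": ["/home", "/news/42"],
    "2.4.4": ["/home"],
    "1.4.3": ["/contact", "/contact"],  # the same resource listed twice
}

incomplete = criteria_lacking_examples(failures)
```

The check counts distinct resources, so listing the same page twice does not satisfy the requirement.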
Starting with a minimum sample size of X (to be determined) resources (if available) for the Core and the Task Oriented resource list, continue with the Random sample resource. The Random sample resource is chosen from the list of web pages belonging to the website using uniform random sampling without replacement. The requirement of a minimum pool of pages for sampling and a minimum sample size ensures the representativeness of the sample. The sampling stops if the fixed maximum sample size has been reached or, for sites where the number of crawled pages is less than the maximum sample size, when no more pages are available. The error margin used to decide when to stop sampling must be part of the report.
Editor note: Needs more explanation for the statistical choice that has to be made about when to stop sampling resources for the Random Resources.
The evaluator may choose to stop the process of searching for more Random Resources when a sufficiently large number of resources has been identified. How does the evaluator know that it is sufficient? That can depend on the number of additional barriers when sampling new pages and on the statistical error level that is chosen.
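The procedure above can be sketched as follows: the Core and Task Oriented resources are taken first, and randomly selected pages are added until the maximum sample size is reached or no more pages are available. The function name, the parameter names and the example values are assumptions; the minimum size X and the maximum sample size are not yet determined by the Methodology.

```python
import random
from typing import Iterable, List


def build_total_sample(core: Iterable[str],
                       task_oriented: Iterable[str],
                       crawled: Iterable[str],
                       max_sample_size: int,
                       seed: int) -> List[str]:
    """Core and task-oriented resources first, then random pages up to the maximum."""
    sample: List[str] = []
    for uri in list(core) + list(task_oriented):
        if uri not in sample:  # each resource appears only once
            sample.append(uri)
    # Shuffling the remaining pages and taking them in order is equivalent to
    # uniform random sampling without replacement.
    remaining = sorted(set(crawled) - set(sample))
    random.Random(seed).shuffle(remaining)
    for uri in remaining:
        if len(sample) >= max_sample_size:
            break
        sample.append(uri)
    return sample


# Hypothetical input lists.
core = ["/", "/contact"]
task_oriented = ["/order/1", "/order/2", "/contact"]
crawled = ["/"] + [f"/page/{i}" for i in range(20)]

total = build_total_sample(core, task_oriented, crawled, max_sample_size=8, seed=7)
small_site = build_total_sample(core, task_oriented, ["/", "/a"], max_sample_size=8, seed=7)
```

In this sketch the Core and Task Oriented resources are always kept, even if together they exceed the maximum sample size; whether that is the intended behaviour is one of the open questions.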
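One possible reading of "the number of additional barriers when sampling new pages" is a saturation rule: sampling stops once a run of consecutive pages reveals no barrier that was not already known. This rule is an assumption made purely for illustration; it is not defined by the Methodology, and the function name, the `window` parameter and the sample data are hypothetical.

```python
from typing import List, Set


def pages_until_saturation(barriers_per_page: List[Set[str]], window: int) -> int:
    """Number of pages evaluated when `window` consecutive pages added no new barrier."""
    known: Set[str] = set()
    pages_without_new = 0
    for count, barriers in enumerate(barriers_per_page, start=1):
        new = set(barriers) - known
        known |= new
        pages_without_new = 0 if new else pages_without_new + 1
        if pages_without_new >= window:
            return count
    # The rule never triggered: all available pages were evaluated.
    return len(barriers_per_page)


# Hypothetical findings, in sampling order: failed success criteria per page.
findings = [
    {"1.1.1"},
    {"1.1.1", "2.4.4"},
    {"1.1.1"},
    set(),
    {"2.4.4"},
    {"3.1.1"},
]

stop_after = pages_until_saturation(findings, window=3)
```

With a window of three pages the rule stops after the fifth page and misses the new barrier on the sixth, which shows why the chosen window (or error level) would need to be part of the report.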
A complete description of the information necessary for the report can be found in clause 7.
Editor note: This is the clause describing step by step how to do the evaluation. The evaluation depends on many factors, like technologies used, technologies evaluated, accessibility support etc. An important factor in the evaluation is the accessibility barriers that are encountered and their impact. This clause will show how they are covered and describe a possible margin for errors on a website in relation to the barrier impact. Note: The Methodology will not provide WCAG 2.0 techniques but rather explain how to use them effectively for evaluation.
"The detailed evaluation of a full website has to be carried out by an accessibility expert. Many tests require human judgement and therefore have to be carried out manually. However, not the whole evaluation process has to be done by hand. The expert may choose to use tools to support sampling or perform fully automatable tests. To report the results the expert chooses between the text-based template report or the machine readable report in EARL format" [UWEM].
Editor note: This subclause describes how to relate manual and semi-automated evaluation. It describes both evaluation types and their relation to accessibility conformance evaluation.
Editor note: This subclause is currently taken from [UWEM] for discussion.
Accessibility testing may be done via semi-automated, manual and user testing. The different types of evaluation methods have a number of strengths and weaknesses. Only semi-automated and manual evaluation are covered in this methodology. Semi-automated evaluation can only test for conformance to a subset of the success criteria. Only a limited subset can be identified reliably by using semi-automated tools. Therefore, coverage of semi-automated evaluation as an overall indicator of accessibility is low; however, it can identify many barriers reliably. It may also be applied efficiently to test very large numbers of web resources within a website.
Some tools can also act as support systems for the manual evaluation process. The tools can provide reliable results for a subset of tests and can not only speed up the process by performing some tasks automatically, but also, by providing hints about barrier locations, indicate areas the evaluators should focus on.
User testing is able to identify barriers that are not caught by other testing means, and is also capable of estimating the accessibility for tested web pages. It is not included in this methodology.
The main advantages of semi-automated testing are:
Editor note: This subclause describes the role of technologies in the scope and sample. It also describes the dependency on and coverage of accessibility supported technologies and the availability of accessible and conformant alternatives.
This subclause provides a step by step description of the evaluation of the website (sample). This does not include going into the guidelines, success criteria etc from WCAG 2.0 or related techniques. It could be possible to propose different ways to evaluate the guidelines: one by one, per theme, per technology, etc. The Methodology will not prescribe one of those ways as necessary.
Editor note: Change name to Stop Criteria and move it to Random Resources. This subclause describes how to recognize the accessibility barriers and ascertain the impact in relation to the sample of web pages. This subclause will provide guidance to setting a barrier level. Also the barrier recognition can help limit the size of a sample of a website. The Methodology could provide input for a barrier probability index. This index could provide a breakdown of the need to extend the sample under specific circumstances. We also want to include quantification of the frequency in the status of a barrier (from minutes 20111215).
@@@ We should include the possibility for website owners to fix incidental errors without a totally new evaluation being necessary. After the website has fixed the failed criteria, it isn't only the original sample of pages that is re-evaluated. Instead it's a combination of the original sample and randomly selected new pages within the scope of the conformance claim.
Editor note: We will not make changes to WCAG 2.0 sections on conformance. In this section we will link the conformance inside WCAG 2.0 to the extended possibilities and options inside the Methodology and to the fact that conformance is for full websites. This includes provisions for conformity in relation to scope, error margin etc.
Editor note: We will provide a list of conformity requirements that relate to the scope and sample clauses.
Besides including a full description of the scope of the evaluation for the given conformance claim, the conformance claim should provide a “list of success criteria beyond the level of conformance claimed that have been met”. @@@ Are we sure we want this?
Editor note: This subclause will describe the role of accessibility supported in the Methodology and the impact on the website and the relation to the barriers and complete processes. This will also include what to do if you find technologies that are not accessibility supported. Also we will provide a way to describe the set of AT and User Agents available to the user and the impact on the conformance statement as far as not covered by WCAG 2.0.
Description: Partial conformance seems to be adequately described in and outside of WCAG 2.0 but mostly related to individual web pages. This subclause shows how to claim partial conformance in relation to clauses Scope, Sample and Evaluation for a full website if this should be necessary from the related subclauses.
Editor note: Mostly already inside WCAG 2.0. Additions related to scope, sample, accessibility supported, techniques etc.
Editor note: This section explains how to score both with manual and with machine evaluation of websites. Also this subclause shows how the results can be aggregated and made ready for reporting. The chosen path will also depend on the way that the evaluation is done (one by one, thematic, etc.).
Description: This clause covers how to write a report from the evaluation that is human readable, machine readable, or both. It describes the minimal contents of the report to support optimal interchange of information and harmonization and comparability of data on an international scale. This would make it possible for organisations in different countries, using different evaluators, to compare or benchmark the data. Templates will be included in the appendices.
Editor note: Explanation of the manual report in human understandable language. What should minimally be in the report. The template for the text based report is to be found in the appendix.
Editor note: Explanation of the machine readable report following EARL. The template is then to be found in the appendix.
Description: This clause will describe known limitations of the Methodology. It will be filled during the further work on the Methodology. We will place here the limitations and underlying assumptions that are important for readers to understand the Methodology and possible situations where the Methodology provides risk, e.g. for repeatability. The Task Force will limit these issues as far as possible.
This publication has been developed with support from the European Commission funded WAI-ACT Project (IST 287725).
Some of the clauses in this document, in particular sections 3, 4.1.1, 4.1.3, and 5.1, are derived from the Unified Web Evaluation Methodology [UWEM].
Editor note: Below list needs correct formatting.
Editor note: A complete example template for evaluation reporting in human understandable language.
Editor note: A complete example template for evaluation reporting using EARL.