Copyright © 2012 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document specifies the Website Accessibility Evaluation Methodology (WCAG-EM) for Web Content Accessibility Guidelines (WCAG) 2.0. It is an internationally harmonized methodology for evaluating the conformance of websites to WCAG 2.0. The Methodology supports evaluation in different contexts, such as self-assessment and third-party evaluation of websites. It is applicable to all websites (including web applications and mobile applications) regardless of size, and it is independent of any particular evaluation tools, browsers, and assistive technology. It answers a widespread demand for harmonization of accessibility evaluations of complete websites. The Methodology guides individual evaluators or evaluation teams and supports the rendering of interchangeable results.
This clause describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This 22 February 2012 Editor's Draft of the Website Accessibility Evaluation Methodology for WCAG 2.0 is an initial outline for future refinement. This document is intended to be published and maintained as a W3C Working Group Note after review and refinement. This version provides a framework for the direction and focus.
The WCAG 2.0 Evaluation Methodology Task Force (Eval TF) invites discussion and feedback about this document by developers, evaluators, researchers, and other practitioners who have interest in web accessibility evaluation. In particular, Eval TF is looking for feedback on the completeness of this outline. We are currently not looking for details on the different sections of the document but would welcome any input on considerations we may have missed with regards to the evaluation process.
Please send comments on this Website Accessibility Evaluation Methodology for WCAG 2.0 document to public-wai-evaltf@w3.org (publicly visible mailing list archive).
Publication as an Editor's Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The Website Accessibility Evaluation Methodology for WCAG 2.0 (WCAG-EM) is an internationally harmonized methodology for evaluating the conformance of websites to WCAG 2.0. The Methodology supports evaluation in different contexts, such as self-assessment and third-party evaluation of websites. It is applicable to all websites (including web applications) regardless of size, and it is independent of any particular evaluation tools, browsers, and assistive technology.
The Website Accessibility Evaluation Methodology for WCAG 2.0 (WCAG-EM) is a standardized way to evaluate the conformance of websites to WCAG 2.0. The Methodology defines manual and semi-automated methods for selecting representative samples of web pages from websites that include complete processes. It provides guidance for evaluating the selected web pages to WCAG 2.0, and defines methods for integration and aggregation of the evaluation results into structured reports and conformance claims.
The Methodology provides guidance on evaluation throughout the development process, but it is specifically applicable to conformance evaluation of existing websites. It extends the existing WAI resources on evaluating websites for accessibility, which include complementary resources on Preliminary Review of Websites for Accessibility and Involving Users in Evaluating Web Accessibility. It also provides further advice on specific contexts, on selecting tools, and on evaluation during other stages of the development process.
The primary target audience of the Methodology is anyone, including individuals and organizations, wanting to evaluate the conformance of existing websites to WCAG 2.0. This includes but is not limited to:
Other audiences who might benefit from the Methodology include:
Users of the Methodology are assumed to be knowledgeable of WCAG 2.0, accessible web design, assistive technology, and of how people with different disabilities use the Web.
The following documents are important reading for this document and are indispensable for its application. More references can be found in Appendix B.
For the purposes of this document, the following terms and definitions apply.
The Methodology can be used by individuals who want to evaluate the conformance of websites to WCAG 2.0. Users of this methodology are assumed to be knowledgeable of WCAG 2.0, accessible web design, assistive technology, and of how people with different disabilities use the Web. Evaluation according to this methodology can be carried out by individuals, but using review teams with combined expertise and involving people with disabilities and older people in the evaluation process is strongly recommended.
The expertise necessary for evaluating a website for WCAG 2.0 is described more explicitly in the W3C/WAI Evaluation Resource Suite. Evaluation activities require diverse kinds of expertise and perspectives. Individuals evaluating the accessibility of web content require training and experience across a broad range of disciplines. If you do not yet have all of that expertise and you want to start evaluating a website, the W3C/WAI Evaluation Resource Suite contains a less formal preliminary review to get you started.
The W3C/WAI Evaluation Resource Suite provides complementary guidance on using combined expertise to evaluate web accessibility. The Methodology can be used by individuals, but review teams can provide better coverage of the expertise required in the understanding of web technologies, evaluation tools, barriers that people with disabilities experience, assistive technologies and approaches that people with disabilities use, and accessibility guidelines and techniques.
The W3C/WAI evaluation resource suite provides complementary guidance on Involving Users in Evaluating Web Accessibility. Evaluating with users with disabilities and with older users can identify additional issues that are not easily discovered by expert evaluation alone. It can make the evaluation process more effective and more efficient, especially when users are involved throughout the development process.
Note: Although testing with users is not a requirement under this methodology, we strongly recommend that people with disabilities and older people are involved in the evaluation process.
Description: This clause describes the procedure to express the scope of an evaluation. This procedure will include options to express the scope of different kinds of websites including web and mobile applications regardless of size. The scope will also address issues like evaluation tools, browsers and assistive technology. The procedure also includes how to handle web resources where the state and content of the web page change based on user interaction with the page. The same URI could then be accessible in some states and not be accessible in other states based on the user or external events.
In this Evaluation Methodology, the scope of the evaluation is not limited to full websites only. While the WCAG 2.0 Recommendation focuses on web pages, the scope of an evaluation within this Evaluation Methodology can be any coherent collection of one or more related web pages that together provide common use or functionality. This includes static web pages, dynamically generated web pages, and web applications. The scope has to be clear and unambiguously provided as part of the conformance claim. This information "should be provided in a form that users can use, preferably machine-readable metadata" (from: http://www.w3.org/TR/WCAG/#conformance-claims).
Web pages are not limited to HTML. They can include PDF, office documents, audio, video, photos, and other technologies. In principle, all web pages that are part of the full website or fall within complete processes can fall within the scope of an evaluation. It is important that the scope of an evaluation is properly set, because the scope delimits the web pages and web applications to be included in the sample.
One of the purposes of this methodology is to support replicability of results. Therefore the scope of an evaluation must be expressed as a list of resources. This list can include a sample that represents a larger number of pages within the scope. This should be clearly and unambiguously stated in the conformance claim.
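As an illustration, such a scope statement could be serialized as machine-readable metadata. The JSON shape and field names below are hypothetical and not mandated by WCAG 2.0 or this Methodology; WCAG 2.0 only requires that the scope be clear, unambiguous, and preferably machine-readable:

```python
import json

def scope_statement(website, conformance_level, resources, excluded=()):
    """Serialize the scope of an evaluation as machine-readable metadata.

    The field names are illustrative only; any equivalent unambiguous,
    machine-readable representation would satisfy the requirement.
    """
    return json.dumps({
        "website": website,
        "conformanceLevel": conformance_level,  # "A", "AA" or "AAA"
        "resources": list(resources),           # list of resources in scope
        "excluded": list(excluded),             # e.g. user generated content
    }, indent=2)
```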
It is possible to exclude particular sections of a website from the scope even though they are part of a complete process. Examples of possible exclusions are user generated content, wikis, bulletin boards, etc. @@@Note: none of these are excluded in WCAG 2.0.
The Evaluation can also focus only on specific technologies excluding all other technologies used. @@@Outcome from discussion, more discussion necessary.
All pages of a complete process are included in the scope unless the complete process is excluded from the scope. Complete processes can be excluded from the scope in specific circumstances that will be described in this subclause. In this methodology, complete processes can bring parts of a website into the scope of an evaluation. In the case of a series of steps that need to be completed in order to accomplish an activity, all web pages in the process are part of the scope of the evaluation and should conform at the specified level or better.
The Methodology describes the roles of manual and semi-automated evaluation. In the case of manual evaluation, an evaluator can manually evaluate all pages, but on a website with millions of web pages that may not be feasible or desirable. This clause describes how to select a representative sample of a website, which can include static web pages, dynamically generated web pages, and web applications, and what the contents of such a sample should be.
Editor note: This subclause describes how to select a sample. Primarily focusing on the contents of the sample, the techniques and other relevant items to be included. We have to check if we sufficiently include dynamic pages and the different states of a dynamic page like a web application.
For the sample we distinguish three sets of sampled resources to be included in the total evaluation sample of a website:
Editor note: This subclause is currently taken from [UWEM] for discussion.
Core Resource List (non random sample): The Core Resource List is a set of generic resources, which are likely to be present in most websites, and which are core to the use and accessibility evaluation of a website. The Core Resource List therefore represents a set of resources to be included in any expert accessibility evaluation of the site (if they are available). The Core Resource List cannot, in general, be automatically identified, but will probably require human judgement to be selected. The Core Resource List should consist of as many of the following resources as are applicable (if available in the scope):
If the evaluation sample is created manually, it must contain the core resources. If the core resources contain fewer pages than the required sample size, additional resources from the following categories must be added.
Of course, any single resource may belong to more than one of the categories above: the requirement is simply that the Core Resources as a whole should, as far as possible, collectively address all the applicable sampling objectives. Any given resource should appear only once in the total evaluation sample.
Task Oriented Resources are the pages needed to carry out the complete processes on the website. They provide examples of real use cases for the website. This might include tasks such as finding certain information, placing an order, or participating in a discussion.
Editor note: This subclause is currently taken from [UWEM] for discussion.
To generate a representative and unbiased total evaluation sample, a uniform random selection from the set of all resources belonging to the website is required. [Addition to UWEM: This can be done in many ways. The method used to select a random sample should be described in the Evaluation Report. Random sampling better supports comparability of results, e.g. monitoring and synchronous or asynchronous comparisons. Note that it is very difficult to generate a fully random sample without the use of a tool.]
There are different ways to determine the list of all resources belonging to a website to select a sample from. In some cases the list of resources is known beforehand, e.g. because it is provided by a site owner commissioning the evaluation. If the list of resources is unknown, the website has to be explored prior to the evaluation. This will typically be done by a web crawler automatically exploring the website by following links. The crawl starts out from one or more "seed resources", e.g. the home page.
If a website is very large, it may not be feasible to identify a complete list of resources belonging to a website. In this case the evaluator may choose to stop the crawling process when a sufficiently large number of resources has been identified.
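The crawling procedure above can be sketched as a bounded breadth-first traversal. In this sketch an in-memory link graph stands in for actual HTTP fetching and link extraction, and the function name and data structures are illustrative only:

```python
from collections import deque

def crawl(link_graph, seeds, max_resources):
    """Breadth-first exploration of a website, starting from seed resources.

    `link_graph` maps each URI to the URIs it links to; in a real crawl
    this lookup would be an HTTP fetch plus link extraction. Crawling
    stops once `max_resources` URIs have been identified, matching the
    option to stop early on very large websites.
    """
    found = []              # resources in discovery order, no duplicates
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(found) < max_resources:
        uri = queue.popleft()
        found.append(uri)
        for target in link_graph.get(uri, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return found
```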
The evaluation sample is chosen from the list of resources belonging to the website using uniform random sampling without replacement.
Note that both the crawling and sampling algorithm used, and any further restrictions limiting or biasing the result, including, but not limited to the set of restrictions below, should be explicitly disclosed in the Evaluation Report:
Editor note: This subclause describes the ideal size of the sample of a website. The size of a sample will depend on many different factors, including but not limited to the number of web pages per technology relative to the overall size of the website. The following is still subject to many discussions inside the EvalTF and needs more work:
The sample size is dependent on the size and the complexity of the website. The sample includes pages (if available) from the Core resource list, the Random resource list, and the Task Oriented resource list. The Methodology requires that, for every failure of a Success Criterion, the evaluation report contain a minimum of two examples of resources where the failure can be found. If pages are the same and do not show new errors, the evaluator can limit the evaluation of a page to the selection of particular elements or technologies that are targeted. (Todo: add example and describe how to conclude that a page is the same. Also we will explore the use of scenarios for the selection of the samples, like for guest, regular user, task-based etc.). In this way, more complex websites will in most cases have a larger sample.
Starting with a minimum sample size of X (to be determined) resources (if available) for the Core and the Task Oriented resource lists, continue with the Random resource sample. The Random resource sample is chosen from the list of web pages belonging to the website using uniform random sampling without replacement. The requirement of a minimum pool of pages for sampling and a minimum sample size ensures the representativeness of the sample. The sampling stops when the fixed maximum sample size has been reached or, for sites where the number of crawled pages is less than the maximum sample size, when no more pages are available. The Stop Criteria used to decide when to stop sampling must be part of the report.
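The assembly of the total evaluation sample described above can be sketched as follows. The function name, signature, and random seed handling are illustrative assumptions, not prescribed by the Methodology; recording the seed (or otherwise describing the sampling method) in the Evaluation Report supports replicability:

```python
import random

def build_sample(core, task_oriented, all_resources, max_size, seed=None):
    """Assemble the total evaluation sample of a website.

    Core and Task Oriented resources are included first; the remainder
    is filled by uniform random sampling without replacement from the
    rest of the site's resources. Each resource appears at most once.
    Sampling stops at `max_size` or when the pool is exhausted.
    """
    rng = random.Random(seed)   # record the seed in the Evaluation Report
    # Deduplicate while preserving order: each resource appears only once.
    sample = list(dict.fromkeys(list(core) + list(task_oriented)))
    pool = [r for r in all_resources if r not in sample]
    remaining = max(0, max_size - len(sample))
    sample += rng.sample(pool, min(remaining, len(pool)))
    return sample
```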
Editor note: Needs more explanation for the statistical choice that has to be made about when to stop sampling resources for the Random Resources.
The evaluator may choose to stop the process of searching for more Random Resources when a sufficiently large number of resources has been identified. How does the evaluator know that this number is sufficient? It can depend on the number of additional barriers found when sampling new pages and on the statistical error level that is chosen.
A complete description of the information necessary for the report can be found in clause 7.
Editor note: This is the clause describing step by step how to do the evaluation. The evaluation depends on many factors, like the technologies used, the technologies evaluated, accessibility support, etc. The three levels of evaluation are related to the three levels of reporting in the appendix and in clause 7.
"The detailed evaluation of a website evaluation has to be carried out by an accessibility expert. Many tests require human judgement and therefore have to be carried out manually. However, not the whole evaluation process has to be done by hand. The expert may choose to use tools to support sampling or perform fully automatable tests. To report the results the expert chooses between the text-based template report or the machine readable report in EARL format" [UWEM]. The evaluation depends on many factors, including the technologies used on the website, whether they are accessibility supported, and the technologies addressed during the evaluation. The evaluation also depends on the level of expertise or the detail of the information required by the website owner. Therefore, the current editor's draft proposes three different levels of evaluation:
Editor note: This subclause describes how to relate manual and semi-automated evaluation. It describes the both evaluation types and their relation to accessibility conformance evaluation.
Editor note: This subclause is currently taken from [UWEM] for discussion.
Accessibility testing may be done via semi-automated, manual, and user testing. The different types of evaluation methods have a number of strengths and weaknesses. Only semi-automated and manual evaluation are covered in this methodology. Semi-automated evaluation can only test for conformance to a subset of the success criteria, as only a limited subset can be identified reliably by semi-automated tools. The coverage of semi-automated evaluation as an overall indicator of accessibility is therefore low; however, it can identify many barriers reliably. It may also be applied efficiently to test very large numbers of web resources within a website.
Some tools can also act as support systems for the manual evaluation process. The tools can provide reliable results for a subset of tests and can not only speed up the process by performing some tasks automatically, but also, by providing hints about barrier locations, indicate areas the evaluators should focus on.
User testing is able to identify barriers that are not caught by other testing means, and is also capable of estimating the accessibility for tested web pages. It is not included in this methodology.
The main advantages of semi-automated testing are:
Editor note: This subclause describes the role of technologies in the scope and sample. It also describes the dependency on and coverage of accessibility supported technologies and the availability of accessible and conformant alternatives.
This subclause provides a step by step description of the evaluation of the website (sample). This does not include going into the guidelines, success criteria, etc. from WCAG 2.0 or related techniques. It would be possible to propose different ways to evaluate the guidelines: one by one, per theme, per technology, etc. The Methodology will not prescribe any one of those ways as necessary.
The procedure requires first setting the scope for the evaluation and selecting the sample of resources, then determining the targeted conformance level for the evaluation. The evaluation procedure applies to the entire sample and checks each page and element against all applicable WCAG Success Criteria until the Stop Criteria have been reached.
Editor note: Do we want to use the Stop Criteria? And if yes, are they used before or during the evaluation as indicated here? The Stop Criteria are currently described in the sampling clause. Discussion needed.
The procedure does not require a particular order of evaluation. It may run through all resources testing one Success Criterion at a time, tackle the sample page by page, working through all applicable Success Criteria for one page at a time, or work per theme (multimedia, forms, etc.). The order in which pages and Success Criteria are covered is not prescribed.
For the evaluation of resources in the sample, all Success Criteria on the chosen level of conformance are applicable. This means that each full page in the sample is tested against all Success Criteria on the selected WCAG conformance level A, AA or AAA until the Stop Criteria are reached. If the Stop Criteria have been reached, the page can be used for evaluation of single elements without the need to evaluate the page for all Success Criteria.
For the testing of selected elements (for example, a table or a form) the selected element must be tested against all Success Criteria that are applicable to it.
In the case of a data table selected as the element to be tested for conformance on level AA, this would mean that Success Criteria 1.3.1 "Info and Relationships", 1.3.2 "Meaningful Sequence" and 1.4.3 "Contrast (Minimum)" clearly apply, while Success Criteria 2.4.2 "Page Titled", 2.4.5 "Multiple Ways", and 3.1.1 "Language of Page" clearly do not apply.
For a range of Success Criteria, applicability will depend on the content of the element selected. For example, for a table containing links, Success Criteria 2.1.1 "Keyboard", 2.4.3 "Focus Order", 2.4.4 "Link Purpose (In Context)" and 2.4.7 "Focus Visible" would also apply.
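The data-table example above can be sketched as a small applicability lookup. The mapping encodes only the Success Criteria named in the example and is purely illustrative; it is not a complete applicability matrix, and the structure is an assumption of this sketch:

```python
# Applicability of level AA Success Criteria to a selected data table,
# following the example above. Illustrative only, not a complete matrix.
ALWAYS_APPLY = {"1.3.1", "1.3.2", "1.4.3"}       # clearly apply
NEVER_APPLY = {"2.4.2", "2.4.5", "3.1.1"}        # clearly do not apply
CONTENT_DEPENDENT = {                            # apply per contained content
    "links": {"2.1.1", "2.4.3", "2.4.4", "2.4.7"},
}

def applicable_criteria(element_contents):
    """Success Criteria to test for a data table whose cells contain the
    given kinds of content (e.g. {"links"})."""
    applicable = set(ALWAYS_APPLY)
    for kind in element_contents:
        applicable |= CONTENT_DEPENDENT.get(kind, set())
    return applicable
```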
Editor note: The following draft requires the use of the Techniques. But we do not want to use Techniques as checkpoints. More discussion needed.
The evaluation of conformance of a particular page or element should draw on the list of Sufficient Techniques provided for each Success Criterion in the WCAG Quickref (http://www.w3.org/WAI/WCAG20/quickref/).
When the evaluation of the page or element shows that one of the documented Sufficient Techniques has been used successfully, the Success Criterion is met. Wherever possible, success should be determined by applying the tests provided at the end of Sufficient Techniques. @@@ More discussion needed.
The success of content under test in implementing a Sufficient Technique (or set of Techniques that are deemed sufficient when used together) demonstrates the conformance of the page or element to the respective Success Criterion.
However, the failure to implement a Sufficient Technique does not mean that the Success Criterion is not met, since other techniques might have been used to achieve conformance, including techniques not yet documented in the WCAG Quickref.
In addition to the Sufficient Techniques, the procedure must also check whether any of the documented WCAG Failures apply. If the test provided at the end of a WCAG Failure shows that the failure condition applies, then the page or element under test fails the associated Success Criterion.
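The three rules above can be combined into a single decision procedure per Success Criterion. The function name, inputs, and the three-valued outcome are illustrative assumptions of this sketch; the inputs would come from the tests provided at the end of the documented Techniques and Failures:

```python
def evaluate_success_criterion(technique_passed, failure_applies):
    """Outcome for one Success Criterion on one page or element.

    `technique_passed`: a documented Sufficient Technique (or a set
    deemed sufficient together) was implemented successfully.
    `failure_applies`: a documented WCAG Failure condition holds.
    """
    if failure_applies:
        return "fail"        # a documented Failure fails the criterion
    if technique_passed:
        return "pass"        # a Sufficient Technique demonstrates conformance
    # No documented Technique succeeded and no Failure applies: other,
    # possibly undocumented, techniques may still satisfy the Success
    # Criterion, so the outcome requires expert judgement.
    return "cannot tell"
```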
Editor note: @@@ We also want to include quantification of the frequency in the status of a barrier (from minutes 20111215).
@@@ We should include the possibility for website owners to fix incidental errors without a totally new evaluation being necessary. After the website owner has fixed the failed criteria, it is not only the original sample of pages that is re-evaluated; instead, a combination of the original sample and randomly selected new pages within the scope of the conformance claim is used.
Editor note: We will not make changes to WCAG 2.0 sections on conformance. In this section we will link the conformance inside WCAG 2.0 to the extended possibilities and options inside the Methodology and to the fact that conformance is for full websites. This includes provisions for conformity in relation to scope, error margin, etc.
Editor note: We will provide a list of conformity requirements that relate to the scope and sample clauses.
Note: Conformance claims are not required. Authors can conform to WCAG 2.0 without making a claim. However, if a conformance claim is made, then the conformance claim must include the following information:[WCAG2.0]
WCAG 2.0 also describes optional components of a Conformance Claim. They include:
This methodology adds a number of requirements for reporting, such as:
Editor note: This subclause will describe the role of accessibility support in the Methodology, its impact on the website, and its relation to the barriers and complete processes. This will also include what to do if you find technologies that are not accessibility supported. We will also provide a way to describe the set of assistive technologies and user agents available to the user and the impact on the conformance statement, as far as this is not covered by WCAG 2.0.
Description: Partial conformance seems to be adequately described in and outside of WCAG 2.0, but mostly in relation to individual web pages. This subclause shows how to claim partial conformance in relation to the Scope, Sample, and Evaluation clauses for a full website, should this be necessary.
Editor note: Mostly already inside WCAG 2.0. Additions related to scope, sample, accessibility supported, techniques etc.
Editor note: This section explains how to score both manual and machine evaluation of websites. This subclause also shows how the results can be aggregated and made ready for reporting. The chosen path will also depend on the way that the evaluation is done (one by one, thematic, etc.).
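As a sketch of such aggregation, the following combines per-page results into a per-Success-Criterion summary, keeping up to two example resources per failure in line with the minimum of two examples required in the sampling clause. The data structures are illustrative assumptions, not a prescribed format:

```python
def aggregate(results, max_examples=2):
    """Aggregate per-page results into a per-Success-Criterion summary.

    `results` is a list of (page, success_criterion, outcome) triples,
    where outcome is "pass" or "fail". For each failed criterion, up to
    `max_examples` example pages are recorded for the report.
    """
    summary = {}
    for page, sc, outcome in results:
        entry = summary.setdefault(sc, {"result": "pass", "examples": []})
        if outcome == "fail":
            entry["result"] = "fail"   # one failing page fails the criterion
            if len(entry["examples"]) < max_examples:
                entry["examples"].append(page)
    return summary
```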
Description: This clause covers how to write a report of the evaluation that is human readable and/or machine readable. It describes the minimal contents of the report to support optimal interchange of information and harmonization and comparability of data on an international scale. This makes it possible for organisations in different countries, using different evaluators, to compare or benchmark the data. Templates will be included in the appendices.
The report of an evaluation will have to include at least the required components of a conformance claim, even if the evaluation is not intended to state conformance.
Editor note: Explanation of the manual report in human understandable language. What should minimally be in the report. The template for the text-based report is to be found in the appendix.
Editor note: Explanation of the machine readable report following EARL. The template is then to be found in the appendix.
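A machine-readable result in EARL consists of assertions built from the core EARL terms (Assertion, assertedBy, subject, test, result, outcome). The sketch below serializes one such assertion as Turtle by string formatting; a real report would use an RDF library and carry richer metadata, and the example URIs are hypothetical:

```python
def earl_assertion(evaluator, page, test, outcome):
    """Serialize one evaluation result as an EARL assertion in Turtle.

    `outcome` maps to the EARL outcome values earl:passed, earl:failed,
    or earl:cantTell. All URIs are supplied by the caller.
    """
    assert outcome in ("passed", "failed", "cantTell")
    return (
        "@prefix earl: <http://www.w3.org/ns/earl#> .\n"
        "[] a earl:Assertion ;\n"
        f"   earl:assertedBy <{evaluator}> ;\n"
        f"   earl:subject <{page}> ;\n"
        f"   earl:test <{test}> ;\n"
        "   earl:result [ a earl:TestResult ;\n"
        f"                 earl:outcome earl:{outcome} ] .\n"
    )
```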
Description: This clause will describe known limitations of the Methodology. It will be filled in during further work on the Methodology. We will place here the limitations and underlying assumptions that are important for readers to understand the Methodology, and possible situations where the Methodology poses risks, e.g. for repeatability. The issues will be limited as far as possible by the Task Force.
This publication has been developed with support from the European Commission funded WAI-ACT Project (IST 287725).
Some of the clauses in this document, in particular section 3, 4.1.1, 4.1.3, and 5.1, are derived from the Unified Web Evaluation Methodology [UWEM].
Editor note: Below list needs correct formatting.
Editor note: A complete example template for evaluation reporting in human understandable language. The discussion proposal below is built upon the idea of having three different levels of reporting. The template will have to be suitable for official evaluations, so some changes are still necessary. The difference between option 1 and 2 is currently very small (open for discussion).
This Appendix proposes three different levels of reporting following the discussion in clause 5 on levels of evaluation. The three options are:
The options should at least have the following information:
Results
Guideline X (heading)
Checkpoint: SC XX (subheading)
Result: pass/fail
Character: global/regional (or another wording); if regional: a list of pages where the problem exists
General information about this checkpoint: link to How to Meet and Understanding this Success Criterion.
List of user agents, including assistive technologies, that were used during the evaluation of the website, and their versions.
List of pages in the sample
Results
Guideline X (heading)
Checkpoint: SC XX (subheading)
Result: pass/fail
Character: global/regional (or another wording); if regional: a list of pages where the problem exists
Description of existing problems and barriers for users or link to descriptions on W3C/WAI website.
General information about this checkpoint: link to How to Meet and Understanding this Success Criterion.
List of user agents, including assistive technologies, that were used during the evaluation of the website, and their versions.
List of pages in the sample
Results
Guideline X (heading)
Checkpoint: SC XX (subheading)
Result: pass/fail
Character: global/regional (or another wording); if regional: a list of pages where the problem exists
Description of existing problems and barriers for users or link to descriptions on W3C/WAI website.
General information about this checkpoint: link to How to Meet and Understanding this Success Criterion.
Action: Description of techniques for meeting the SC (could be techniques which are already in the techniques document or new techniques which are not in the document, but with which the SC can be met).
List of user agents, including assistive technologies, that were used during the evaluation of the website, and their versions.
List of pages in the sample
Editor note: A complete example template for evaluation reporting using EARL.