This extended abstract is a contribution to the Online Symposium on Website Accessibility Metrics. The contents of this paper have not been developed by the W3C Web Accessibility Initiative (WAI) and do not necessarily represent the consensus view of its membership.
We have designed and implemented an application named AMA (Accessibility Monitoring Application) to evaluate and collect the accessibility status of large sets of Web sites according to different guidelines, including WCAG 2.0. AMA [1] works in two different phases:
Our metric, the Barriers Impact Factor (BIF) [2], aims to summarize the results collected by AMA, offering an effective and practical view of the impact of the accessibility barriers measured on a large-scale sample of Web sites. In particular, our method evaluates each accessibility error in terms of how it affects users browsing with assistive technologies.
In the literature, different metrics for measuring accessibility barriers have been presented. Some of them rely on specific methods for sampling Web pages (often from the same Web site), and they usually include an intensive manual inspection phase as well, carried out by experts or users [3, 4, 5, 6].
The metric we present here has been inspired by such previous and related works (in particular by Giorgio Brajnik's Barrier Walkthrough method), but it has been defined and adapted to support accessibility monitoring of a large number of Web sites. It can also be used to measure automatic evaluation results alone. In particular, our metric is based only on WCAG 2.0 Success Criteria and can be computed with an automatic evaluation tool: each failed check is treated as an actual barrier.
To compute BIF, we have defined a barrier-error association table. For each error detected when evaluating against WCAG 2.0, this table lists the assistive technologies/disabilities affected by that error.
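As an illustration of how such a table can be represented, the following minimal Python sketch maps each error identifier to the set of groups it affects and inverts the mapping so that, for each group, the errors relevant to it can be listed. The error identifiers, the group labels other than "screen reader/blindness", and the names BARRIER_ERROR_TABLE, affected_groups and errors_by_group are hypothetical; AMA's actual table is not reproduced here.

```python
from typing import Dict, FrozenSet, List, Mapping

# Hypothetical excerpt of a barrier-error association table: the error
# identifiers and group labels are illustrative, not AMA's actual entries.
BARRIER_ERROR_TABLE: Dict[str, FrozenSet[str]] = {
    "img-missing-alt": frozenset({"screen reader/blindness"}),
    "missing-page-language": frozenset({"screen reader/blindness"}),
    "low-contrast-text": frozenset({"low vision"}),
}


def affected_groups(error_id: str,
                    table: Mapping[str, FrozenSet[str]] = BARRIER_ERROR_TABLE
                    ) -> FrozenSet[str]:
    """Assistive technologies/disabilities affected by an error.

    Errors missing from the table affect no group."""
    return table.get(error_id, frozenset())


def errors_by_group(table: Mapping[str, FrozenSet[str]] = BARRIER_ERROR_TABLE
                    ) -> Dict[str, List[str]]:
    """Invert the table: for each group, the (sorted) errors that affect it."""
    inverted: Dict[str, List[str]] = {}
    for error_id, groups in table.items():
        for group in groups:
            inverted.setdefault(group, []).append(error_id)
    return {group: sorted(errs) for group, errs in inverted.items()}
```

The inverted view is the one a per-group score needs, since each group collects the errors associated with it.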
The BIF metric is computed as follows:
$$\mathrm{BIF}(i) = \sum_{\mathit{error}} \#\mathit{error}(i) \times \mathit{weight}(i)$$
Where:
The lowest possible value of BIF is 0, which represents the absence of barriers. The higher the BIF value, the higher the impact of a certain barrier on a specific type of assistive technology/disability.
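The formula can be read as follows, under an interpretation we state as an assumption: i is a barrier group (an assistive technology/disability), and for every error type associated with i in the barrier-error association table, the number of its occurrences is multiplied by the weight of that error type. The Python sketch below follows this reading; the function name bif, the identifiers and the weights are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping, Set


def bif(group: str,
        detected_errors: Iterable[str],
        table: Mapping[str, Set[str]],
        weights: Mapping[str, int]) -> int:
    """BIF(group): sum, over the error types affecting `group`, of
    (number of occurrences) x (weight of that error type).

    Errors without a weight count as 0."""
    counts = Counter(detected_errors)
    return sum(
        n * weights.get(error_id, 0)
        for error_id, n in counts.items()
        if group in table.get(error_id, set())
    )


# Illustrative data (hypothetical identifiers and weights).
table = {
    "img-missing-alt": {"screen reader/blindness"},
    "low-contrast-text": {"low vision"},
}
weights = {"img-missing-alt": 3, "low-contrast-text": 2}
detected = ["img-missing-alt"] * 4 + ["low-contrast-text"] * 2

print(bif("screen reader/blindness", detected, table, weights))  # 12
print(bif("low vision", detected, table, weights))               # 4
print(bif("deafness", detected, table, weights))                 # 0
```

A group with no associated errors gets 0, matching the lower bound stated above.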
Barriers have been grouped into 7 sets, according to the assistive technologies and disabilities they affect:
The total BIF is:
$$\mathrm{tBIF} = \sum_{i} \mathrm{BIF}(i)$$
The average BIF is:
$$\mathrm{aBIF} = \mathrm{tBIF} / \#\mathit{pages}$$
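The two aggregate values follow directly from the per-group scores. In the minimal Python sketch below, total_bif and average_bif are hypothetical names and the per-group values are invented for illustration.

```python
from typing import Mapping


def total_bif(bif_by_group: Mapping[str, float]) -> float:
    """tBIF: sum of BIF(i) over all barrier groups i."""
    return sum(bif_by_group.values())


def average_bif(bif_by_group: Mapping[str, float], num_pages: int) -> float:
    """aBIF = tBIF / #pages, the mean impact per evaluated page."""
    if num_pages <= 0:
        raise ValueError("num_pages must be positive")
    return total_bif(bif_by_group) / num_pages


# Hypothetical per-group BIF values for a sample of 8 evaluated pages.
sample = {"screen reader/blindness": 12, "low vision": 4}
print(total_bif(sample))       # 16
print(average_bif(sample, 8))  # 2.0
```

Since tBIF sums over groups, an error associated with two groups contributes to both. Dividing by the number of pages presumably makes samples of different sizes comparable.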
A first issue we faced concerns the barrier-error association table. To associate errors with barriers as effectively as possible, we studied WCAG 2.0 and the related Techniques [7, 8]. For each error, the referenced success criteria and techniques were used to identify the disabilities/assistive technologies it affects.
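One way to make this derivation reproducible is to record which success criteria each check refers to and which groups each criterion affects, and then take the union per error. The Python sketch below assumes that representation; the check identifiers, the group annotations of the criteria and the name build_association_table are hypothetical and do not reproduce the outcome of our study.

```python
from typing import Dict, FrozenSet, Iterable, Mapping, Set

# Hypothetical inputs: the WCAG 2.0 success criteria each automatic check
# refers to, and the groups each criterion is judged to affect.
ERROR_TO_SC: Dict[str, Iterable[str]] = {
    "img-missing-alt": ["1.1.1"],
    "low-contrast-text": ["1.4.3"],
    "link-text-unclear": ["2.4.4", "2.4.9"],
}
SC_TO_GROUPS: Dict[str, Set[str]] = {
    "1.1.1": {"screen reader/blindness"},
    "1.4.3": {"low vision"},
    "2.4.4": {"screen reader/blindness", "cognitive disabilities"},
}


def build_association_table(
        error_to_sc: Mapping[str, Iterable[str]],
        sc_to_groups: Mapping[str, Set[str]]) -> Dict[str, FrozenSet[str]]:
    """Map each error to the union of the groups of the criteria it cites.

    Criteria without a group annotation (here 2.4.9) contribute nothing."""
    table: Dict[str, FrozenSet[str]] = {}
    for error_id, criteria in error_to_sc.items():
        groups: Set[str] = set()
        for sc in criteria:
            groups |= sc_to_groups.get(sc, set())
        table[error_id] = frozenset(groups)
    return table
```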
The optional manual assessment of the set of pages can be carried out once the automatic evaluation ends. In the meantime, or during the manual evaluation itself, a page can change (in both content and structure), compromising the whole evaluation process. To partially overcome this issue, AMA stores a local copy of every evaluated page.
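A minimal sketch of such local storage, assuming pages are kept as HTML files named after a hash of their URL, is shown below in Python. The file layout and the names save_snapshot, load_snapshot and page_changed are hypothetical and are not AMA's actual implementation. The same stored copy also allows checking whether the live page has changed since it was evaluated.

```python
import hashlib
import tempfile
from pathlib import Path


def snapshot_path(store: Path, url: str) -> Path:
    """File name derived from the URL's SHA-256, so any URL is a safe name."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return store / f"{digest}.html"


def save_snapshot(store: Path, url: str, html: str) -> Path:
    """Store the markup that was evaluated, for later manual review."""
    store.mkdir(parents=True, exist_ok=True)
    path = snapshot_path(store, url)
    path.write_text(html, encoding="utf-8")
    return path


def load_snapshot(store: Path, url: str) -> str:
    """Return the stored copy of the page at `url`."""
    return snapshot_path(store, url).read_text(encoding="utf-8")


def page_changed(store: Path, url: str, live_html: str) -> bool:
    """True if the live page no longer matches the stored copy."""
    return load_snapshot(store, url) != live_html


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        save_snapshot(store, "https://example.org/", "<html>v1</html>")
        print(page_changed(store, "https://example.org/", "<html>v1</html>"))  # False
        print(page_changed(store, "https://example.org/", "<html>v2</html>"))  # True
```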
AMA provides data about the type and number of accessibility errors detected in the considered set of pages. Although these data are a quantitative measure of the accessibility level of the set, they cannot tell how much these errors hinder navigation with assistive technologies.
When the user chooses to evaluate Web sites according to the Barriers Impact Factor, two values are added to the table showing AMA results: the total barrier impact factor and the average barrier impact factor. Barrier Impact Factor data give more meaningful results and sketch the accessibility level of a large sample of Web sites more effectively. We ran several experiments with different trial sets of pages, monitoring page accessibility and evaluating the resulting BIF. Some common issues emerged. In particular, the BIF results for screen reader/blindness are very high, as expected: barriers affecting screen readers are more frequent and more often detectable automatically, without manual evaluation.
Currently, the weight parameter in BIF is simply defined by giving the highest value to barriers related to level A errors (3) and decreasing it for level AA (2) and AAA (1) errors. Checks that require manual verification are not considered: their weight is set to 0.
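The current weighting scheme fits in a small lookup, sketched below in Python. It reads the third level as AAA, since level A is already weighted 3; the names LEVEL_WEIGHTS, weight and needs_manual_check are hypothetical.

```python
from typing import Dict

# Weights by WCAG 2.0 conformance level: A = 3, AA = 2, AAA = 1.
# Checks that need manual verification get weight 0.
LEVEL_WEIGHTS: Dict[str, int] = {"A": 3, "AA": 2, "AAA": 1}


def weight(level: str, needs_manual_check: bool = False) -> int:
    """Weight of an error given its success criterion's level."""
    if needs_manual_check:
        return 0
    try:
        return LEVEL_WEIGHTS[level]
    except KeyError:
        raise ValueError(f"unknown WCAG 2.0 conformance level: {level!r}") from None
```

Replacing this lookup with weights derived from user experiments would leave the rest of the BIF computation unchanged.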
Future research should derive weights from experiments with users, assigning a value to automatically checked errors as well as to manually verified ones.
Thanks to Giovanni Grazia and Jacopo Deyla (Emilia-Romagna Region) and to Simone Spagnoli.