This paper is a contribution to the Website Accessibility Metrics Symposium. It was not developed by the W3C Web Accessibility Initiative (WAI) and does not necessarily represent the consensus view of W3C staff, participants, or members.
Measuring accessibility barriers on large-scale sets of pages
1. Problem Addressed
We have designed and implemented an application, named AMA (Accessibility Monitoring Application), to evaluate and collect the accessibility status of large sets of Web sites, according to different guidelines, including WCAG 2.0. AMA works in two phases:
- Data Collecting: Data evaluation and storage are performed periodically and automatically by the system, without any human intervention. The system checks all the pages listed in the database and stores the results in it. The evaluated pages are also downloaded, so that experts can later assess their accessibility manually. This last phase is optional and may require a considerable amount of time.
- Data Synthesis: Quantitative syntheses of the evaluation results can be extracted from the database, either by browsing the Web interface or through direct queries.
Our metric, the Barriers Impact Factor (BIF), aims to summarize the results collected by AMA, offering an effective and feasible view of the impact of the accessibility barriers measured on a large-scale sample of Web sites. In particular, our method evaluates each accessibility error in terms of how it affects users who browse by means of assistive technologies.
Different metrics to measure accessibility barriers have been presented in the literature. Some of them are based on specific Web page sampling methods (often within the same Web site), and they usually also include an intensive manual assessment phase, which can be conducted by experts or by users [3, 4, 5, 6].
The metric we present here has been inspired by such previous and related work (in particular by Giorgio Brajnik's Barrier Walkthrough method), but it has been defined and adapted so as to support accessibility monitoring across a large number of Web sites. It can also be used to measure automatic evaluation results alone. In particular, our metric is based only on WCAG 2.0 Success Criteria and can be applied by exploiting an automatic evaluation tool: each failed check is treated as an actual barrier.
To compute the BIF, we have defined a barrier-error association table. For each error detected when evaluating against WCAG 2.0, this table lists the assistive technologies/disabilities affected by that error.
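As an illustration, such a table can be represented as a simple mapping from failed checks to the groups they affect. The check identifiers and associations below are hypothetical examples, not AMA's actual table.

```python
# Hypothetical excerpt of a barrier-error association table:
# each failed WCAG 2.0 check maps to the list of assistive
# technologies/disabilities it affects. Identifiers are illustrative only.
BARRIER_TABLE = {
    "img-missing-alt":   ["screen reader/blindness"],
    "low-contrast-text": ["screen magnifier/low vision", "color blindness"],
    "keyboard-trap":     ["input device independence/movement impairments"],
    "flashing-content":  ["photosensitive epilepsy"],
}
```

A single error may affect more than one group, as the low-contrast entry shows.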
The BIF metric is computed as follows:
BIF(i) = Σ_error #error(i) × weight(i)
- i denotes an assistive technology/disability affected by the detected errors;
- BIF(i) is the Barrier Impact Factor affecting the i-th assistive technology/disability;
- #error(i) is the number of detected errors which affect the i-th assistive technology/disability;
- weight(i) is the weight assigned to the i-th assistive technology/disability.
The lowest possible BIF value is 0, which represents the absence of barriers. The higher the BIF value, the higher the impact of barriers on a specific type of assistive technology/disability.
Barriers have been grouped into sets according to the assistive technologies and disabilities they impact:
- screen reader/blindness;
- screen magnifier/low vision;
- color blindness;
- input device independence/movement impairments;
- cognitive disabilities;
- photosensitive epilepsy.
The total BIF is:
tBIF = Σ_i BIF(i)
The average BIF is:
aBIF = tBIF/#pages
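The three formulas above can be sketched in a few lines of code. The check names, counts, weights, and page total below are hypothetical, chosen only to make the computation concrete.

```python
from collections import Counter

# Hypothetical data for a small evaluated sample: per-check error counts,
# per-check weights, and the groups each check affects.
error_counts = {"img-missing-alt": 12, "low-contrast-text": 5, "flashing-content": 1}
weights      = {"img-missing-alt": 3,  "low-contrast-text": 2, "flashing-content": 3}
affects = {
    "img-missing-alt":   ["screen reader/blindness"],
    "low-contrast-text": ["screen magnifier/low vision", "color blindness"],
    "flashing-content":  ["photosensitive epilepsy"],
}

def bif_per_group(error_counts, weights, affects):
    """BIF(i): sum of #error x weight over the errors affecting group i."""
    bif = Counter()
    for error, count in error_counts.items():
        for group in affects[error]:
            bif[group] += count * weights[error]
    return dict(bif)

bif = bif_per_group(error_counts, weights, affects)
tBIF = sum(bif.values())     # total BIF over all groups
aBIF = tBIF / 20             # average BIF, assuming 20 evaluated pages
```

Note that an error affecting two groups contributes to both BIF values, and hence twice to tBIF, which follows directly from summing BIF(i) over i.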
4. Major Difficulties
The first issue we faced concerns the barrier-error association table. To associate errors with barriers in the most effective way, we conducted a study of WCAG 2.0 and its related Techniques [7, 8]. For each error, the references to success criteria and techniques have been used to identify the disabilities/assistive technologies it affects.
The optional manual assessment of the set of pages can only be done once the automatic evaluation ends. In the meantime, or during the manual evaluation itself, a page can change (both in content and in structure), compromising the whole evaluation process. To partially overcome this issue, AMA locally stores a copy of every evaluated page.
AMA provides data about the type and the amount of accessibility errors detected in the considered set of pages. Although these data represent a quantitative measure of the accessibility level of the set, they cannot assess how much such errors afflict navigation performed with assistive technologies.
When the user chooses to evaluate Web sites according to the Barriers Impact Factor, two values are added to the table showing AMA results: the total Barrier Impact Factor and the average Barrier Impact Factor. These values provide more significant results and better sketch the accessibility level of a large sample of Web sites. We ran several experiments with different trial sets of pages, monitoring page accessibility and evaluating the resulting BIF, and some common issues emerged. In particular, the BIF results for screen reader/blindness are very high, as expected: barriers which impact screen readers are more frequent and are automatically detectable, without any manual evaluation.
6. Open Research Avenues
Currently, the parameter called weight in the BIF is simply defined by giving a higher value to barriers related to level A errors (3) and decreasing it for level AA (2) and level AAA (1) errors. Checks that must be verified manually are not considered: their weight is set to 0.
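As a sketch, this weighting scheme amounts to a small lookup on the WCAG conformance level; the function name and signature here are ours, not AMA's.

```python
# Current weighting scheme: level A errors weigh 3, AA weigh 2, AAA weigh 1;
# checks that require manual verification are excluded by weighting them 0.
def weight_for(level, automatic=True):
    if not automatic:
        return 0
    return {"A": 3, "AA": 2, "AAA": 1}[level]
```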
Future research should work out weights on the basis of experiments with users, associating a value to automatically checked errors as well as to manually verified ones.
Thanks to Giovanni Grazia and Jacopo Deyla (Emilia-Romagna Region) and to Simone Spagnoli.
[1] S. Mirri, L.A. Muratori and P. Salomoni (2011) Monitoring accessibility: large scale evaluations at a geo political level. Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'11). DOI:10.1145/2049536.2049566
[2] M. Battistelli, S. Mirri, L.A. Muratori and P. Salomoni (2010) Avoiding to dispense with accuracy: a method to make different DTDs documents comparable. Proceedings of the 2010 ACM Symposium on Applied Computing (SAC'10). 862-866. DOI:10.1145/1774088.1774265
[3] G. Brajnik (2006) Web Accessibility Testing: When the Method is the Culprit. Proceedings of the 10th International Conference on Computers Helping People with Special Needs (ICCHP'06). 156-163. DOI:10.1007/11788713_24
[4] M. Vigo, M. Arrue, G. Brajnik, R. Lomuscio and J. Abascal (2007) Quantitative Metrics for Measuring Web Accessibility. Proceedings of the 2007 International Cross-Disciplinary Workshop on Web Accessibility (W4A'07). 99-107. DOI:10.1145/1243441.1243465
[5] B. Parmanto and X. Zeng (2005) Metric for Web Accessibility Evaluation. Journal of the American Society for Information Science and Technology, 56(13):1394-1404. DOI:10.1002/asi.20233
[6] Y. Yesilada, G. Brajnik and S. Harper (2009) How Much Does Expertise Matter? A Barrier Walkthrough Study with Experts and Non-Experts. Proceedings of the Eleventh International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'09). DOI:10.1145/1639642.1639678
[7] World Wide Web Consortium (2008) Web Content Accessibility Guidelines (WCAG) 2.0, W3C Recommendation 11 December 2008. Available from: http://www.w3.org/TR/WCAG/
[8] World Wide Web Consortium (2010) Techniques and Failures for Web Content Accessibility Guidelines 2.0, W3C Working Group Note 14 October 2010. Available from: http://www.w3.org/TR/WCAG20-TECHS/