This extended abstract is a contribution to the Online Symposium on Website Accessibility Metrics. The contents of this paper have not been developed by the W3C Web Accessibility Initiative (WAI) and do not necessarily represent the consensus view of its membership.
Many authors have analyzed and developed metrics that express the results of Web accessibility evaluation processes in a quantitative way. These metrics, especially the ones associated with semi-automatic processes, require additional effort because they compute barriers rather than compliance with accessibility guidelines. In addition, several software and web-based applications, widely used in the software industry and in education, check accessibility compliance against the WCAG 2.0 guidelines and could easily have a related metric. However, those automatic tools introduce false positives that add noise to later measurement processes.
This study addresses the integration of several Web accessibility metrics into a semi-automatic measuring process performed by a prototype application that checks the accessibility of a website against WCAG checkpoints. The idea behind this research is to implement some of the existing accessibility metrics and compute them automatically in the software, in order to analyze the contribution of this approach in a real scenario with existing Web pages. Because automatic evaluations are not as reliable as the ones that incorporate human judgment, the proposed tool is also designed to let users filter the results before generating the metrics.
The goals of this experience are:
This study is based on previous research on accessibility measurement, especially the automatic and semi-automatic experiments performed in  and . The quantitative analysis of the relevant metrics is based on , while the WCAG analysis is taken from .
A software application prototype named 'OceanAcc', which implements a semi-automatic accessibility evaluation process, was developed to address the problem of metric calculation. The OceanAcc tool runs an automatic test of the WCAG 2.0 guidelines using the ATutor Web services. Test results are stored, and the tool automatically matches them with the corresponding barriers by using G. Brajnik's Barrier Walkthrough relationship. Then, the evaluators filter the results by removing false checkpoint and barrier violations (false positives) and by adding violations that the automatic test missed (false negatives). From this reviewed set of results, the tool generates and stores the metrics of a specific site, and it also tracks their history.
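The match-then-filter steps described above can be illustrated with a minimal sketch. It is not the OceanAcc implementation: the `Violation` type, the `BARRIERS_BY_CHECKPOINT` mapping, the barrier names, and the element identifiers are all hypothetical, and the automatic results are hard-coded rather than fetched from the ATutor Web services.

```python
from dataclasses import dataclass

# Hypothetical checkpoint-to-barrier mapping. The entries are illustrative
# only; they are not taken from the Barrier Walkthrough method or OceanAcc.
BARRIERS_BY_CHECKPOINT = {
    "1.1.1": ["images without text alternative"],
    "2.4.4": ["ambiguous links"],
}


@dataclass(frozen=True)
class Violation:
    checkpoint: str  # WCAG 2.0 success criterion identifier
    element: str     # location of the offending element (illustrative)


def match_barriers(violations):
    """Pair each reported violation with every barrier mapped to its checkpoint."""
    return [
        (violation, barrier)
        for violation in violations
        for barrier in BARRIERS_BY_CHECKPOINT.get(violation.checkpoint, [])
    ]


def apply_evaluator_filter(pairs, false_positives, false_negatives):
    """Drop the pairs the evaluator rejected and add the ones the tool missed."""
    kept = [pair for pair in pairs if pair not in false_positives]
    return kept + [pair for pair in false_negatives if pair not in kept]


# Assumed output of the automatic test (hard-coded for the example).
automatic = [
    Violation("1.1.1", "img#logo"),
    Violation("1.1.1", "img#spacer"),
    Violation("2.4.4", "a.more"),
]
matched = match_barriers(automatic)

# The evaluator rejects one automatic finding and adds one missed finding.
rejected = {(Violation("1.1.1", "img#spacer"), "images without text alternative")}
missed = [(Violation("2.4.4", "a.read"), "ambiguous links")]
reviewed = apply_evaluator_filter(matched, rejected, missed)
```

Metrics would then be computed from `reviewed` rather than from the raw automatic output, so the noise from false positives does not reach the measurement step.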
The following metrics were selected for the study:
One of the challenges of integrating different metrics into a single process is that, even if many of them are based on the assessment of WCAG 2.0 checkpoints, they require additional information that is external to the evaluation results. For instance, the Failure Rate metric requires the number of potential failure points of a specific website. For that purpose, the tool prototype estimates the existing failure points by prompting the user for the number of specific elements on the website (images, video, etc.). For semi-automatic metrics based on barriers, the tool provides a matrix to match each checkpoint failure to one or more barriers. Additionally, many metrics adjust each failure with a weight to control its impact on the results. This is also considered when performing the barrier-checkpoint mapping.
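As a rough illustration of these two inputs, the sketch below computes a failure rate as the ratio of actual to potential failure points, and aggregates checkpoint failures into weighted barrier scores through a mapping matrix. The element counts, checkpoint identifiers, barrier names, and weights are invented for the example, and the aggregation rule (failure count times weight, summed per barrier) is an assumption rather than the formula used by the prototype.

```python
def failure_rate(actual_failures, potential_failures):
    """Ratio of actual failure points to potential failure points."""
    if potential_failures <= 0:
        raise ValueError("potential_failures must be positive")
    return actual_failures / potential_failures


def barrier_scores(failures_by_checkpoint, weights):
    """Sum count * weight for every barrier mapped to a failed checkpoint."""
    scores = {}
    for checkpoint, count in failures_by_checkpoint.items():
        for barrier, weight in weights.get(checkpoint, {}).items():
            scores[barrier] = scores.get(barrier, 0.0) + count * weight
    return scores


# Hypothetical element counts entered by the evaluator (potential failure
# points) and the failures actually detected on those elements.
potential = {"images": 40, "videos": 2, "form controls": 8}
actual = {"images": 10, "videos": 1, "form controls": 4}
rate = failure_rate(sum(actual.values()), sum(potential.values()))

# Hypothetical barrier-checkpoint matrix: checkpoint -> {barrier: weight}.
BARRIER_WEIGHTS = {
    "1.1.1": {"images without text alternative": 1.0},
    "1.2.2": {"video without captions": 1.0, "audio-only content": 0.5},
}
scores = barrier_scores({"1.1.1": 10, "1.2.2": 2}, BARRIER_WEIGHTS)
```

With these made-up numbers, 15 of 50 potential failure points fail, so the rate is 0.3. Keeping the weights in the matrix rather than in the metric code lets an evaluator tune the impact of each failure without touching the calculation.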
The major difficulties encountered during the integration were related to adapting results that came from an automatic input. The most relevant issues are detailed in the following list:
In this experiment, the Failure Rate, WAB Score, UWEM Score, and False Positive rate metrics were computed from an automatic result set in a timely manner. A small set of websites was selected to run the test, consisting of the ACM, Yahoo Ar, and Google home pages. The main empirical conclusions of this study are the following:
Integrating metrics into accessibility evaluation processes provides valuable information on the accessibility status of each page and enables a quantitative comparison among sites. However, some open issues must be solved before metrics can be integrated accurately. First, the extra parameters that are required to calculate metrics should be generated automatically to prevent any bias. Second, to reduce the variability of human judgment, processes should aid the less experienced evaluators. For instance, tools could track the decision history and suggest the most likely alternatives based on their probability of occurrence, or on some other pattern.
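The decision-history idea could be prototyped along the following lines. This is a sketch under stated assumptions, not a feature of the tool: the `DecisionHistory` class, the decision labels, and the use of relative frequency as the "probability of occurrence" are all hypothetical.

```python
from collections import Counter, defaultdict


class DecisionHistory:
    """Remember past evaluator decisions per checkpoint and suggest the
    most frequent one, together with its relative frequency."""

    def __init__(self):
        # checkpoint id -> Counter of decision labels
        self._counts = defaultdict(Counter)

    def record(self, checkpoint, decision):
        """Store one evaluator decision for a checkpoint."""
        self._counts[checkpoint][decision] += 1

    def suggest(self, checkpoint):
        """Return (decision, relative frequency), or None without history."""
        counts = self._counts.get(checkpoint)
        if not counts:
            return None
        decision, occurrences = counts.most_common(1)[0]
        return decision, occurrences / sum(counts.values())


history = DecisionHistory()
for decision in ["false positive", "false positive", "confirmed", "false positive"]:
    history.record("1.1.1", decision)

suggestion = history.suggest("1.1.1")
```

Here three of the four recorded decisions for checkpoint 1.1.1 are "false positive", so that label is suggested with a relative frequency of 0.75. A less experienced evaluator would still make the final call; the suggestion only surfaces what earlier evaluators decided in similar cases.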
Finally, it would be interesting to explore the barrier-checkpoint mapping approach because evaluating checkpoints consumes fewer resources, while barrier metrics are more meaningful and can be easily matched with additional data. For instance, they can give a better idea of which groups of people with disabilities are really being affected by the low accessibility of a website.
To summarize, it has been demonstrated that metrics are useful from a quality assurance perspective. Integrating metrics into software tools is promising, and will result in a more efficient computing process.
The resulting insights open research avenues that can contribute to an enhanced metric integration with less human intervention. The experiment worked as intended for the set of WCAG checkpoints that can be easily tested by automatic tools.
However, the most complex checkpoints, which usually belong to the AAA level, still require human intervention and an adapted process so that they can contribute to the results without introducing considerable noise.
To deal with these WCAG checkpoints and facilitate the automatic calculation of metrics, the following approach is proposed (to be analyzed in a future study). Metrics should be categorized by process level to allow a better integration with tools:
This research was part of my degree thesis, mentored by Eng. Ba. Osvaldo Clua (Universidad de Buenos Aires, Facultad de Ingenieria, 2010-04-15).