This extended abstract is a contribution to the Online Symposium on Website Accessibility Metrics. The contents of this paper have not been developed by the W3C Web Accessibility Initiative (WAI) and do not necessarily represent the consensus view of its membership.
Metrics about barriers on Web pages have been proposed and justified both experimentally and theoretically, in order to compare or rank degrees of accessibility. Consistency of results is guaranteed by the constraints and goals stated in international guidelines and national regulations. However, permitted variations and the widespread violation of some requirements limit or prevent realistic assessment procedures. This is the case for the Italian regulations on Web content accessibility [1] and their observance by the Web sites of Italian public institutions. On the one hand, the law requires the markup to be HTML 4.01 strict, XHTML 1.0 strict or superior, and thus it allows distinct domains that are difficult to compare. On the other hand, only about 43% of the homepages of Italian public institutions declare a strict DTD, while the others refer to a transitional or frameset grammar. Any meaningful synthesis about code validity cannot simply exclude 57% of the monitored Web content. In [2] we approached these limitations by defining formulas that quantify the errors expected on a target DTD starting from a different one. Such a metric, which stems from peculiar characteristics of the Italian regulation, can be generalized to the broader problem of comparing documents with different DTDs.
Metrics for Web content accessibility have been defined and assessed in several works. Typically, they assume the need for a quantitative, continuous scale to cover degrees of accessibility. They try to go beyond the dichotomy between accessible and inaccessible Web content, since binary values might not be meaningful for quality assurance and assessment. Sullivan and Matson [3] propose a failure rate, defined as the ratio between actual and potential errors. In [4] Parmanto and Zeng are driven by general considerations about the meaning and expectations of a metric, such as the possibility of quantitatively evaluating the evolution of accessibility over time on a continuous scale. Brajnik and Lomuscio [5] have defined the Semi-Automatic Method for measuring Barriers of Accessibility (SAMBA), which integrates manual and automatic evaluations on the basis of the severity of barriers and of the error rates of tools. The Web Accessibility Quantitative Metric (WAQM) has been proposed by Vigo et al. [6] to overcome the limitations of previous measures. It takes into account each accessibility attribute (according to the WCAG 2.0 vocabulary), each checkpoint, its priority and how many times it has been tested, every warning (or potential error to be manually verified) and, finally, the ratio between errors and tests.
The Italian regulation on Web accessibility admits three different strict grammars: HTML 4.01 strict, XHTML 1.0 strict and XHTML 1.1.
In order to provide a more complete evaluation, we have considered three different properties (resulting from three different evaluations):
We defined the expected number of errors for a given DTD as:
$$\tilde{n}_j = x'\,n_i + x''\,(n_i + 1) \qquad (1)$$
Here j denotes the target DTD and i the declared one, according to a suitable order (from frameset to strict). n_i denotes the number of errors against the declared DTD divided by the number of DOM elements in the page p. Analogously, ñ_j is the expected number of errors against the DTD j, divided by the number of DOM elements in the same page p.
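As an illustration, equation (1) can be evaluated for a single page as in the following minimal Python sketch. The function names, the page figures and the two coefficient values are hypothetical and chosen only to show the arithmetic; they are not values taken from [2].

```python
def error_rate(error_count: int, dom_elements: int) -> float:
    """Errors per DOM element for one page, i.e. n_i."""
    if dom_elements <= 0:
        raise ValueError("a page must contain at least one DOM element")
    return error_count / dom_elements


def expected_error_rate(n_i: float, x1: float, x2: float) -> float:
    """Equation (1): expected errors per DOM element on the target DTD j.

    x1 and x2 stand for the coefficients x' and x'' of equation (1).
    """
    return x1 * n_i + x2 * (n_i + 1)


# Hypothetical page: 12 errors against its declared DTD, 400 DOM elements.
n_i = error_rate(12, 400)

# Hypothetical coefficients, for illustration only.
n_j = expected_error_rate(n_i, x1=1.5, x2=0.01)
print(round(n_j, 4))
```

Because both n_i and the result are normalized by the number of DOM elements, pages of different sizes remain comparable.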
We call the pair (x'_i, x''_i) the Errors Springing Up Rate (ESUR). Both values are computed from a wide sample of sites, as two weighted averages of the errors found against the DTDs.
For each page, we considered all three conformance evaluations listed above. The resulting data have been used to compute x'_i and x''_i as follows:
$$x'_i = \frac{1}{2S\,\#P_e} \sum_{k=1}^{\#P_e} \frac{n_{k,j}}{n_{k,i}}$$

$$x''_i = \frac{1}{\#P_v} \sum_{k=1}^{\#P_v} n_{k,j}$$
Here #P_e and #P_v are the numbers of pages declared with the DTD i that have errors and that have none, respectively, and k indexes the pages in each group. S has been experimentally assessed to minimize the variance between estimated and computed values. In particular:
Equation (1) makes the evaluation of accessibility errors independent of the markup grammar, thus allowing a quantitative comparison.
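The two averages that define the ESUR pair could be estimated from a sample of pages roughly as follows. This is only a sketch under stated assumptions: the term 2S#Pe is read as the product 2 · S · #P_e, pages are split by whether they show errors against their declared DTD i, and the sample values, the value of S and all names are hypothetical.

```python
from statistics import mean
from typing import List, Tuple

# One sample per page: (n_i, n_j), the errors per DOM element measured
# against the declared DTD i and against the target DTD j.
Sample = Tuple[float, float]


def estimate_esur(samples: List[Sample], s: float) -> Tuple[float, float]:
    """Return (x', x'') for pages that all declare the same DTD i."""
    with_errors = [(ni, nj) for ni, nj in samples if ni > 0]   # P_e
    without_errors = [nj for ni, nj in samples if ni == 0]     # P_v
    if not with_errors or not without_errors:
        raise ValueError("need pages both with and without errors")

    # x': average ratio n_j / n_i over pages with errors, scaled by 1 / (2S).
    # Reading "2S" as a product is an assumption of this sketch.
    x1 = sum(nj / ni for ni, nj in with_errors) / (2 * s * len(with_errors))

    # x'': average n_j over pages with no errors against their declared DTD.
    x2 = mean(without_errors)
    return x1, x2


# Hypothetical sample of four pages and a hypothetical S.
pages = [(0.02, 0.04), (0.05, 0.05), (0.0, 0.01), (0.0, 0.03)]
x1, x2 = estimate_esur(pages, s=1.0)
```

The resulting pair can then be used as the coefficients of equation (1) to estimate, for any further page declaring the DTD i, the errors it would be expected to show against the target DTD j.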
The final, analytical form of ESUR has been made possible by the Vamolà Monitor, an application used in Emilia-Romagna to evaluate the accessibility of a large number of Web sites according to the Italian regulation. Such an empirical approach needed a very large number of estimations before becoming general enough to appear consistent for any other given case.
ESUR and the expected error values for documents with different grammars provide interesting hints about point 3 of Section 3. In [2] we report the ESUR values computed for the sites on Vamolà, together with the expected errors regarding strictness, validity and quality in different geographical zones of Italy.
Our approach specifically targets the application of the Italian law, but it could be generalized to any scenario where different grammars make the evaluation of code quality unfeasible.
Refining the ESUR formulas and evaluating a wider set of Web pages to calibrate them are goals we must still pursue. An analogous evaluation process is going to be applied to the other accessibility requirements, in order to provide an accessibility evaluation that goes beyond code quality.
Thanks to Giovanni Grazia and Jacopo Deyla (Emilia-Romagna Region) and Simone Spagnoli.