This paper is a contribution to the Website Accessibility Metrics Symposium. It was not developed by the W3C Web Accessibility Initiative (WAI) and does not necessarily represent the consensus view of W3C staff, participants, or members.
A metric to make evaluations of documents with different DTDs comparable
1. Problem Addressed
Metrics about barriers on Web pages have been proposed and justified, both experimentally and theoretically, to compare or rank degrees of accessibility. Consistency of results is guaranteed by the constraints and goals stated by international guidelines and national regulations. However, admitted variations and the widespread violation of some requirements limit, or even prevent, realistic assessment procedures. This is the case of the Italian regulation on Web content accessibility (Italian Parliament, 2004) and of its application to the Web sites of Italian public institutions. On the one hand, the law requires markup to be HTML 4.01 Strict, XHTML 1.0 Strict or higher, thus defining distinct domains that are difficult to compare. On the other hand, only about 43% of the homepages of Italian public institutions declare a Strict DTD, while the others refer to a Transitional or Frameset grammar. Any meaningful synthesis about code validity cannot simply exclude the remaining 57% of monitored Web content. In a previous work (Battistelli et al., 2010) we addressed these limitations by defining formulas that quantify the errors expected against a target DTD from those measured against a different one. Although this metric stems from peculiarities of the Italian regulation, it can be generalized to the broader problem of comparing documents declared with different DTDs.
Metrics for Web content accessibility have been defined and assessed in several works. Typically, they assume that a quantitative, continuous scale is needed to cover degrees of accessibility, going beyond the dichotomy between accessible and non-accessible Web content, since binary values might not be meaningful for quality assurance and assessment. Sullivan and Matson propose a failure rate, defined as the ratio between actual and potential errors. Parmanto and Zeng start from general considerations about the meaning and expected properties of a metric, such as the possibility of quantitatively tracking the evolution of accessibility over time on a continuous scale. Brajnik and Lomuscio defined the Semi-Automatic Method for measuring Barriers of Accessibility (SAMBA), which integrates manual and automatic evaluations on the basis of the severity of barriers and the error rate of tools. The Web Accessibility Quantitative Metric (WAQM) was proposed by Vigo et al. to overcome the limitations of previous measures. It takes into account each accessibility attribute (according to the WCAG 2.0 vocabulary), each checkpoint, its priority and how many times it has been tested, every warning (i.e. potential error to be verified manually) and, finally, the ratio between errors and tests.
The Italian regulation on Web accessibility admits three strict grammars: HTML 4.01 Strict, XHTML 1.0 Strict and XHTML 1.1.
To provide a more complete evaluation, we considered three properties, each resulting from a separate evaluation:
- validity, i.e. conformance to the declared DTD; this value is computed by dividing the number of errors against the declared DTD by the number of DOM elements in page p;
- strictness, i.e. compliance with the corresponding Strict DTD: pages declared as Transitional or Frameset in a given markup language (HTML 4.01 or XHTML 1.0) are evaluated as if they declared the corresponding Strict DTD;
- markup quality, i.e. compliance with XHTML 1.1: pages declared with any other DTD are evaluated as if they declared the XHTML 1.1 DTD.
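The three evaluations differ only in the DTD a page is validated against, and each normalizes the error count by the number of DOM elements in the page. The Python sketch below makes that mapping explicit. The DTD labels (such as `html401-transitional`) and the functions `normalized_errors` and `evaluation_targets` are hypothetical names chosen for illustration, and treating XHTML 1.1 as its own strict counterpart is an assumption the text does not state.

```python
from __future__ import annotations

# Hypothetical labels for the DTDs; they are illustrative identifiers,
# not the public identifiers found in DOCTYPE declarations.
STRICT_COUNTERPART = {
    "html401-strict": "html401-strict",
    "html401-transitional": "html401-strict",
    "html401-frameset": "html401-strict",
    "xhtml10-strict": "xhtml10-strict",
    "xhtml10-transitional": "xhtml10-strict",
    "xhtml10-frameset": "xhtml10-strict",
    "xhtml11": "xhtml11",  # assumption: XHTML 1.1 is already strict
}

# Markup quality is always measured against XHTML 1.1.
QUALITY_TARGET = "xhtml11"


def normalized_errors(error_count: int, dom_elements: int) -> float:
    """Errors against one DTD divided by the number of DOM elements in page p."""
    if dom_elements <= 0:
        raise ValueError("a page must contain at least one DOM element")
    return error_count / dom_elements


def evaluation_targets(declared_dtd: str) -> dict[str, str]:
    """Return the DTD each of the three evaluations validates the page against."""
    return {
        "validity": declared_dtd,                        # the declared DTD itself
        "strictness": STRICT_COUNTERPART[declared_dtd],  # corresponding Strict DTD
        "quality": QUALITY_TARGET,                       # always XHTML 1.1
    }
```

For instance, a page declared as XHTML 1.0 Transitional is checked for validity against XHTML 1.0 Transitional, for strictness against XHTML 1.0 Strict and for quality against XHTML 1.1; a page with 12 errors over 400 DOM elements has a normalized value of 0.03.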
We defined the expected number of errors for a given DTD as:
$$\tilde{n}_j = x'_i\, n_i + x''_i\,(n_i + 1) \qquad (1)$$
Here j is the target DTD and i the declared (initial) one, according to a suitable order (from Frameset to Strict); $n_i$ is the number of errors against the declared DTD divided by the number of DOM elements in page p and, analogously, $\tilde{n}_j$ is the expected number of errors against DTD j divided by the number of DOM elements in p.
We call the pair $(x'_i, x''_i)$ the Errors Springing Up Rate (ESUR). Both components are computed from a wide sample of sites as weighted averages of the errors measured against the DTDs.
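Equation (1) can be transcribed directly. The following Python sketch, whose function and parameter names are hypothetical, computes the normalized expected error count against the target DTD j from a page's normalized error count against its declared DTD i and the two ESUR components for that pair of DTDs.

```python
def expected_errors(n_i: float, x1: float, x2: float) -> float:
    """Expected normalized error count on target DTD j, as in equation (1):
    n~_j = x'_i * n_i + x''_i * (n_i + 1).

    n_i -- errors against the declared DTD i divided by the page's DOM elements
    x1  -- x'_i, first ESUR component for the (i, j) pair
    x2  -- x''_i, second ESUR component for the (i, j) pair
    """
    if n_i < 0:
        raise ValueError("n_i is a ratio of counts and cannot be negative")
    return x1 * n_i + x2 * (n_i + 1)
```

With the made-up values $x'_i = 0.5$ and $x''_i = 0.02$ (illustrative only, not results from the paper), a page that is valid against its declared DTD ($n_i = 0$) gets $\tilde{n}_j = 0.02$, while a page with $n_i = 0.1$ gets about 0.072 (0.5 * 0.1 + 0.02 * 1.1). For a valid page the expected value reduces to $x''_i$, which is consistent with $x''_i$ being estimated from valid pages only.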
For each page we performed all three conformance evaluations listed above, and used the resulting data to compute $x'_i$ and $x''_i$ as follows:
$$x'_i = \frac{1}{2S\,\#P_e}\sum_{k=1}^{\#P_e}\frac{n_{kj}}{n_{ki}}$$
$$x''_i = \frac{1}{\#P_v}\sum_{k=1}^{\#P_v} n_{kj}$$
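Here $\#P_e$ and $\#P_v$ are the numbers of pages declared with DTD i that, respectively, contain errors and are valid against it, and $n_{ki}$, $n_{kj}$ are the normalized error counts of the k-th page of the corresponding set against DTDs i and j. S was determined experimentally to minimize the variance between estimated and computed values. In particular: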
- for strictness, S is 2 whenever the declared DTD is HTML and 1 for XHTML;
- for markup quality, S is 3 whenever the declared DTD is HTML and 2 for XHTML.
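The estimation of the two ESUR components from a sample of pages can be sketched as follows. The sketch assumes each sampled page has already been reduced to a pair $(n_{ki}, n_{kj})$ of normalized error counts against the declared and the target DTD. The function `esur`, the layout of the `S_VALUES` table and the sample data are illustrative assumptions; only the S values themselves come from the text above.

```python
from __future__ import annotations

from typing import Sequence

# S values stated in the text, keyed by (evaluation, declared-DTD family).
S_VALUES = {
    ("strictness", "html"): 2,
    ("strictness", "xhtml"): 1,
    ("quality", "html"): 3,
    ("quality", "xhtml"): 2,
}


def esur(pages: Sequence[tuple[float, float]], s: float) -> tuple[float, float]:
    """Estimate (x'_i, x''_i) from pages declared with DTD i.

    Each page is a pair (n_ki, n_kj): normalized errors against the
    declared DTD i and against the target DTD j.
    """
    # P_e: pages with errors against DTD i (n_ki > 0, so the ratio is defined).
    with_errors = [(n_ki, n_kj) for n_ki, n_kj in pages if n_ki > 0]
    # P_v: pages valid against DTD i; only their target-DTD errors are needed.
    valid = [n_kj for n_ki, n_kj in pages if n_ki == 0]
    if not with_errors or not valid:
        raise ValueError("the sample needs pages both with and without errors")
    # x'_i = (1 / (2 S #P_e)) * sum of n_kj / n_ki over P_e
    x1 = sum(n_kj / n_ki for n_ki, n_kj in with_errors) / (2 * s * len(with_errors))
    # x''_i = (1 / #P_v) * sum of n_kj over P_v
    x2 = sum(valid) / len(valid)
    return x1, x2
```

For the made-up sample `[(0.25, 0.5), (0.5, 0.5), (0.0, 0.25), (0.0, 0.75)]` of HTML pages evaluated for strictness (S = 2), the two pages with errors contribute ratios 2 and 1, so $x'_i = 3 / (2 \cdot 2 \cdot 2) = 0.375$, and the two valid pages give $x''_i = 0.5$.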
Equation (1) makes it possible to separate the markup grammar from the evaluation of accessibility errors, allowing a quantitative comparison of documents declared with different DTDs.
4. Major Difficulties
The final analytical form of ESUR was made possible by the Vamolà Monitor, an application used by the Emilia-Romagna Region to evaluate the accessibility of a large number of Web sites against the Italian regulation. This empirical approach required a very large number of estimations before it was general enough to appear consistent for any other case.
ESUR and the expected error values for documents with different grammars provide interesting hints about point 3 of Section 3. Battistelli et al. (2010) report the ESUR values computed for the sites monitored by Vamolà, together with the expected strictness, validity and quality errors across different geographical areas of Italy.
Our approach was designed specifically for the application of the Italian law, but it could be generalized to any scenario where different grammars make the evaluation of code quality unfeasible.
6. Open Research Avenues
Refining the ESUR formulas and computing them on a wider set of evaluated Web pages are current goals. An analogous evaluation process will be applied to the other accessibility requirements, in order to provide an accessibility evaluation that goes beyond code quality.
Thanks to Giovanni Grazia and Jacopo Deyla (Emilia-Romagna Region) and Simone Spagnoli.
- Italian Parliament (2004) Law no. 4 of 9 January 2004. Official Journal no. 13, 17 January 2004.
- M. Battistelli, S. Mirri, L.A. Muratori, P. Salomoni, S. Spagnoli (2010) Avoiding to dispense with accuracy: a method to make different DTDs documents comparable. Proceedings of the 2010 ACM Symposium on Applied Computing (SAC'10). 862-866. DOI:10.1145/1774088.1774265
- T. Sullivan and R. Matson (2000) Barriers to use: usability and content accessibility on the Web's most popular sites. Proceedings of the ACM Conference on Universal Usability (CUU'00). 139-144. DOI:10.1145/355460.355549
- B. Parmanto and X. Zeng (2005) Metric for Web Accessibility Evaluation. Journal of the American Society for Information Science and Technology, 56(13):1394-1404. DOI:10.1002/asi.20233
- G. Brajnik and R. Lomuscio (2007) SAMBA: a Semi-Automatic Method for Measuring Barriers of Accessibility. Proceedings of the 9th International ACM SIGACCESS Conference on Computers and Accessibility (ASSETS'07). 43-50. DOI:10.1145/1296843.1296853
- M. Vigo, M. Arrue, G. Brajnik, R. Lomuscio and J. Abascal (2007) Quantitative Metrics for Measuring Web Accessibility. Proceedings of the 2007 International Cross-Disciplinary Workshop on Web Accessibility (W4A'07). 99-107. DOI:10.1145/1243441.1243465