Measurement Resources

From Silver

Usability Testing

Task Success or Completion

Process Metrics

Tests for Existing WCAG Success Criteria

Literature Review of Testing from Frederick Boland

The following is a literature review of some resources that may be relevant to aspects of what you are trying to develop. It is offered as a basis for discussion; how these resources could apply specifically to your problem is what I would like to talk with you about:

There was a report on scoring WCAG 2.0 conformance: https://www.nist.gov/publications/challenges-and-benefits-methodology-scoring-web-content-accessibility-guidelines-wcag. Aspects of this could perhaps be used for what you are trying to achieve. In particular, I believe the weights assigned allow flexibility to account for user impact, ease of testing, etc.

There was a report on assessing the trustworthiness of software using the structured assurance case methodology. Although this paper refers to software specifically, the methods used may be applicable to accessibility, since the problem spaces have similarities: https://www.nist.gov/publications/toward-preliminary-framework-assessing-trustworthiness-software

More background on this is at: https://www.nist.gov/publications/software-assurance-using-structured-assurance-case-models

The Quality Assurance Working Group at W3C: https://www.w3.org/QA/WG/ had a number of resources, parts of which may be applied to the scoring issues. There is a QA Framework Primer and Usage Scenario: https://www.w3.org/QA/WG/qaframe-primer, QA Specification Guidelines: https://www.w3.org/TR/qaframe-spec/, Variability in Specifications: https://www.w3.org/TR/spec-variability/, and Test FAQ: https://www.w3.org/QA/WG/2005/01/test-faq.

OASIS also has a conformance document: https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ioc, and rules for writing conformance clauses: http://docs.oasis-open.org/templates/TCHandbook/ConformanceGuidelines.html

There is some work being done on how to develop good scoring systems. Much of the work deals with risk assessment in medical applications and scoring sales leads, but an interesting paper is one from Gallaudet University: https://www.gallaudet.edu/accreditation-certification-and-licensure/assessment/assessment-of-student-learning/instructions-and-examples/developing-a-scoring-criteria-(rubrics)

There is a difference between scoring and measurement. An overview of this is at: https://www.rasch.org/rmt/rmt33a.htm and also at: https://cehs01.unl.edu/aalbano/intromeasurement/mainch2.html

An introduction to measurement uncertainty is at: http://www.isobudgets.com/introduction-to-measurement-uncertainty/

I also looked a little bit into literature for how to design good conformance models. Some of this work deals with requirements traceability, as well as design quality vs conformance quality. Much of the existing literature is theoretical or esoteric in nature, but I will keep looking.

If your testing involves people with disabilities, you may need to consider sample size, independence, variance, etc.

If you are going to use a questionnaire, it should be validated if possible. A resource on how to do this is at: https://www.methodspace.com/validating-a-questionnaire/

Point System

Email from Tim Boland proposing a formula for the Point System: "A Silver score could be the sum of weighted test results per test divided by the sum of the weights for the different tests, and could be a decimal number between 0.0 and 1.0 in all instances, regardless of the content and number of requirements chosen to test. The weights could indicate the relative importance of that requirement to the overall accessibility of the content, as well as other pertinent factors, and weight values would also be between 0 and 1. The tester would need to explain how they assigned values to the weights for each test to come up with the results. Much more explanation and thought is needed on this idea, but a sample formula might be:

Silver score (for some content) = (w1t1 + w2t2 + … + wntn) / (w1 + w2 + … + wn), where the w's are the respective weights for tests 1 to n and the t's are the scores for each Silver requirement (1 to n) chosen.

" See the email for more detail.
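The proposed formula is a weighted average, which can be sketched in a few lines of Python. The weights and test results below are hypothetical, chosen only to illustrate the calculation; the function name is ours, not from the email.

```python
def silver_score(weights, results):
    """Weighted average of test results, per the proposed formula.

    weights: relative importance of each test, each between 0 and 1.
    results: score for each Silver requirement tested, each between 0.0 and 1.0.
    Returns a decimal number between 0.0 and 1.0.
    """
    if len(weights) != len(results):
        raise ValueError("each test needs exactly one weight")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not all be zero")
    # (w1*t1 + w2*t2 + ... + wn*tn) / (w1 + w2 + ... + wn)
    return sum(w * t for w, t in zip(weights, results)) / total_weight

# Hypothetical example: three tests, where the first is judged most important.
score = silver_score([0.9, 0.5, 0.2], [1.0, 0.5, 0.0])
# (0.9*1.0 + 0.5*0.5 + 0.2*0.0) / (0.9 + 0.5 + 0.2) = 1.15 / 1.6 = 0.71875
```

Note that because the result is normalized by the total weight, the score stays between 0.0 and 1.0 no matter how many requirements are chosen, which matches the intent described in the email.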

W3C Conformance and Testing

Other