WCAG 2.0 Evaluation Methodology Task Force Teleconference

29 Aug 2013

Vivienne, Martijn, Eric, Detlev, Liz, Sarah, Mike, Peter, Moe
Shadi, Tim


<ericvelleman> http://www.w3.org/WAI/ER/conformance/ED-methodology-20130712#step5c

Discussion of performance score

Eric: Some people had problems with the score since it could mask important barriers, assessments can be subjective, etc.

Mike: Problems with description of score calculation
... Description should highlight what differentiates the different methods of calculating the score (per website / per page / per instance)
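
To make the distinction Mike asks for concrete, the following is a minimal sketch of how the three aggregation levels (per website / per page / per instance) can diverge on the same audit data. It is not the draft's formula: the data layout, the function names, and the simple pass ratios are all assumptions chosen for illustration.

```python
from statistics import mean

# Hypothetical audit data (not from the draft):
# page -> success criterion -> pass/fail for each checked instance.
# An empty list means the criterion is not applicable on that page.
results = {
    "home": {"1.1.1": [True, True, False], "1.4.3": [True], "2.4.4": [True, True]},
    "contact": {"1.1.1": [True], "1.4.3": [False, False], "2.4.4": []},
}

def score_per_website(results):
    """Share of applicable criteria that are met on every sampled page."""
    criteria = {sc for page in results.values() for sc, inst in page.items() if inst}
    # all([]) is True, so a page where the criterion does not apply cannot fail it.
    met = [sc for sc in criteria
           if all(all(page.get(sc, [])) for page in results.values())]
    return len(met) / len(criteria)

def score_per_page(results):
    """Average, over pages, of the share of applicable criteria met on that page."""
    ratios = []
    for page in results.values():
        applicable = [inst for inst in page.values() if inst]
        if applicable:  # skip pages with nothing applicable
            ratios.append(sum(all(inst) for inst in applicable) / len(applicable))
    return mean(ratios)

def score_per_instance(results):
    """Share of all checked instances that pass, regardless of page or criterion."""
    outcomes = [ok for page in results.values() for inst in page.values() for ok in inst]
    return sum(outcomes) / len(outcomes)
```

On this sample the three functions return roughly 0.33, 0.58 and 0.67 respectively, which shows why a description of the score needs to state which level of aggregation was used.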

Peter: Uncomfortable with the idea of a performance score
... Boiling everything down to a single number can be misleading / hide a lot of information

Eric: can the score be made more specific?

Peter: Possibly, but it remains fraught with problems - you just get a number
... More difficult with web applications - here problems may derive from rarely used parts / features of a web application
... So the score may not reflect the degree of difficulty users have
... Degree of problems will vary with the context of use

<MartijnHoutepen> Detlev: agree that it's problematic, but scores can be of use to measure development effort, through time or in larger projects

Vivienne: Some of the purposes of web site evaluations make scores helpful - for example in comparing the accessibility of sites
... also useful for comparing different instances of evaluation of the same site - did we get better? Did it get worse?
... so a mechanism for scoring is beneficial

Eric: More focused on improving the text in the methodology

Peter: Will need caveats - reacting to the comparison example Vivienne provided
... Comparison between applications will be difficult because it implies choices (most used parts / rarely used parts); do I prioritize tests for existing employees or regardless of that?
... The different scores for different parts may not give valid information for decision making in the absence of particulars of users / use cases
... Huge differences in scores will be more helpful than when numbers are close
... Implementers in Australia are misusing Techniques where particular techniques are prescribed for use, in contravention of the WCAG approach

Eric: Score is related to the sample that embodies a use case already
... Aggregate scores will mask differences in any case

Vivienne: Scoring only used to indicate a percentage of SC failed - in a scientific context it can be useful to differentiate, for example, by counting violations per page
... Some mechanism is required to make any comparison at all

Peter: A single violation (big flashing object) may be worse than 15 other violations

Vivienne: there are all kinds of problems, and it is not mandatory

Martijn: People want to use numbers, demand scores - we had better provide an acceptable way of scoring rather than doing nothing
... The text that's there right now might be more fine-grained (scores for Level A, Level AA, etc.). SC may be rated as fine or N.A., so that might need more guidance

<MartijnHoutepen> impact of errors or failures is important

<Vivienne> Maybe check out Roger Hudson's Accessibility Priority Tool : http://usability.com.au/2013/01/accessibility-priority-tool/

<MartijnHoutepen> Peter: Any calculation will misrepresent values of W3C, make some disabilities more important than others

<MartijnHoutepen> Peter: calculation automatically prioritises one disability over another

Peter: Calculation cannot encapsulate severity of accessibility issues
... If there is a formula, it includes a value judgement; without a formula the score is not doable
... For improvement calculations it is straightforward, on the same sample

Mike: Using a score for comparing competing products is difficult - but it is useful for comparing tests of the same product to see the trend (getting better or worse)
... People doing research will be devising their own methodology
... We might recommend using the score only in cases for comparison of the same site or tool over time, rather than between the performance of different sites / apps
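
The restriction Mike and Peter describe (compare only evaluations of the same site, on the same sample) can be sketched as follows. This is an illustration under assumptions, not part of the methodology: the function name `score_trend` and the input format (page identifier mapped to the share of applicable SC met) are hypothetical.

```python
def score_trend(earlier, later):
    """Return (overall change, per-page change) between two evaluations.

    Both arguments map a page identifier to the share of applicable
    success criteria met on that page (a value between 0 and 1).
    The comparison is refused unless both evaluations cover the same sample.
    """
    if earlier.keys() != later.keys():
        raise ValueError("evaluations cover different samples; scores are not comparable")
    per_page = {page: later[page] - earlier[page] for page in earlier}
    overall = sum(per_page.values()) / len(per_page)
    return overall, per_page

# Hypothetical figures for two evaluations of the same two-page sample.
first_run = {"home": 0.50, "contact": 0.75}
second_run = {"home": 0.75, "contact": 0.75}
overall_change, page_changes = score_trend(first_run, second_run)
```

A positive overall change indicates improvement on that sample only; the per-page breakdown is kept so that an aggregate does not hide a page that got worse.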

<korn> +q to say: "We might have a section on 'scoring' that talks about all of the problems / issues with it, note the value for measuring improvement over time, set forth a variety of ways measuring within a site over time improvement, and then expressly state that this document does NOT set forth a canonical score and that it should NEVER be used for comparison across different sites/products"

Moe: Internally there are execs that want benchmark figures to see the degree of conformance - government is less granular
... internally it can be very useful, there may be differences regarding the support for different groups of people with disabilities

Peter: Suggests section about problems and issues related to scoring (pasted into IRC above)
... scores could specify the purpose of the score, delimiting its validity

Liz: If we have a score: be informed that Tim and Liz are writing a paper describing a methodology for scoring
... This might eliminate the concerns Peter expressed

<Vivienne> Liz I'd love to see that paper when you'd like some feedback

Liz: paper will be distributed / published at CSUN; Liz may make an earlier version available

Eric will work on the editor draft, possibly including some of that

<MartijnHoutepen> Detlev: We identify major use cases in the first part of the methodology, we can use this information for scoring.

<MartijnHoutepen> Detlev: comparison may be possible across sites on this basis

Sarah: Concern over misuse of scores, concerns at the WCAG WG about any scoring mechanism - can it not be a supplemental mechanism and not part of the formal methodology?

Eric: This seems an approach of last resort - there is a widespread need for scores
... would seem to belong to the methodology; with all the difficulties discussed it is still worthwhile - we should try to reach some consensus and clearly highlight the caveats
... if that's not possible, make it ancillary info

Sarah: many want a score to stack up against competitors - we should try to find a meaningful mechanism

Eric: What's in the draft now needs to be improved
... Next week there will be a new editor draft, discussion on detail of reporting

Peter: The fact that many people are clamouring for scores highlights the danger of providing a single score formula
... People will be scoring differently leading to vastly different numbers.
... Will the new editor draft include text drafted by Peter (on...?)

Eric: Text was worked on, also by Shadi. Text should reflect outcome of discussions
... will check if Peter's text has been included already

