See also: IRC log
<MartijnHoutepen> http://www.w3.org/2013/08/01-eval-minutes.html
<ericvelleman> http://www.w3.org/WAI/ER/conformance/ED-methodology-20130712#step5c
Discussion of performance score
Eric: Some people had problems with the score since it could mask important barriers, assessments can be subjective, etc.
Mike: Problems with description of score
calculation
... Description should highlight what differentiates the different methods of
calculating the score (per website / per page / per instance)
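A minimal sketch of the distinction Mike raises, using hypothetical evaluation data and formulas (the methodology does not define any of these): the same results yield different numbers depending on whether pass rates are aggregated per instance, per page, or per website.

  # Illustrative only: the methodology does not prescribe these formulas.
  # Each entry records, for one page and one success criterion (SC),
  # how many instances were checked and how many of them failed.
  results = [
      {"page": "home",    "sc": "1.1.1", "instances": 20, "failed": 0},
      {"page": "home",    "sc": "2.4.4", "instances": 10, "failed": 0},
      {"page": "contact", "sc": "1.1.1", "instances": 2,  "failed": 2},
      {"page": "contact", "sc": "2.4.4", "instances": 5,  "failed": 0},
  ]

  # Per instance: share of all checked instances that pass.
  total = sum(r["instances"] for r in results)
  per_instance = (total - sum(r["failed"] for r in results)) / total

  # Per page: a page passes an SC only if no instance on it fails;
  # the score is the average pass rate across sampled pages.
  pages = {r["page"] for r in results}
  def page_pass_rate(page):
      checks = [r for r in results if r["page"] == page]
      return sum(r["failed"] == 0 for r in checks) / len(checks)
  per_page = sum(page_pass_rate(p) for p in pages) / len(pages)

  # Per website: an SC counts as passed only if it fails nowhere in the sample.
  scs = {r["sc"] for r in results}
  per_website = sum(
      all(r["failed"] == 0 for r in results if r["sc"] == sc) for sc in scs
  ) / len(scs)

  print(per_instance, per_page, per_website)  # roughly 0.95, 0.75 and 0.5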
Peter: Uncomfortable with the idea of a
performance score
... Boiling everything down to a single number can be misleading and hide a lot of information
Eric: can the score be made more specific?
Peter: Possibly, but it remains fraught with
problems - you just get a number
... More difficult with web applications - here problems may derive from rarely used parts / features of a web application
... So the score may not reflect the degree of difficulty users have
... Degree of problems will vary with the context of use
<MartijnHoutepen> Detlev: agree that it's problematic, but scores can be of use to measure development effort, through time or in larger projects
Vivienne: Some of the purposes of web site evaluations make scores helpful - for example in comparing the accessibility of sites
... also useful for comparing different instances of evaluation of the same
site - did we get better? Did it get worse?
... so a mechanism for scoring is beneficial
Eric: More focused on improving the text in the methodology
Peter: Will need caveats - reacting to the
comparison example Vivienne provided
... Comparison between applications will be difficult because it implies choices (most used parts / rarely used parts); do I prioritize tests for existing employees or regardless of that?
... The different scores for different parts may not give valid information for decision making in the absence of particulars of users / use cases
... Huge differences in scores will be more helpful than when numbers are close
... Implementers in Australia are misusing Techniques: particular techniques are prescribed for use, in contravention of the WCAG approach
Eric: Score is related to the sample that
embodies a use case already
... Aggregate scores will mask differences in any case
Vivienne: Scoring only used to indicate a percentage of SC failed - in a scientific context it can be useful to differentiate, for example, by counting violations per page
... Some mechanism is required to make any comparison at all
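A small sketch of the simple measures Vivienne describes (percentage of SC failed, violations counted per page), applied to two evaluations of the same sample over time; the figures and function names are illustrative assumptions, not part of the methodology.

  # Illustrative only: two hypothetical evaluations of the same 10-page sample.
  def percent_sc_failed(num_failed_sc, num_evaluated_sc):
      # Share of evaluated success criteria with at least one failure.
      return 100.0 * num_failed_sc / num_evaluated_sc

  def violations_per_page(num_violations, num_pages):
      # Average number of recorded violations per sampled page.
      return num_violations / num_pages

  before = {"failed_sc": 12, "violations": 85}   # first evaluation
  after  = {"failed_sc": 7,  "violations": 30}   # re-evaluation after fixes

  for label, run in (("before", before), ("after", after)):
      print(label,
            percent_sc_failed(run["failed_sc"], 38),   # 38 Level A + AA SC in WCAG 2.0
            violations_per_page(run["violations"], 10))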
Peter: A single violation (big flashing object) may be worse than 15 other violations
Vivienne: there are all kinds of problems, and it is not mandatory
Martijn: People want to use numbers, demand
scores - we better provide an acceptable way of scoring rather than doing
nothing
... The text that's there right now might be more fine-grained (scores for Level A, Level AA, etc.). An SC may be rated as fine or N.A., so that might need more guidance
<MartijnHoutepen> impact of errors or failures is important
<Vivienne> Maybe check out Roger Hudson's Accessibility Priority Tool : http://usability.com.au/2013/01/accessibility-priority-tool/
<MartijnHoutepen> Peter: Any calculation will misrepresent the values of W3C, making some disabilities more important than others
<MartijnHoutepen> Peter: calculation automatically prioritises one disability over another
Peter: Calculation cannot encapsulate severity of
accessibility issues
... If there is a formula it includes a value judgement; without a formula the score is not doable
... For improvement calculations it is straightforward, on the same sample
Mike: Using score for comparing competing
products is difficult - but it is useful for comparing tests of the same
product to see the trend (getting better or worse)
... People doing research will be devising their own methodology
... We might recommend using the score only for comparison of the same site or tool over time, rather than between the performance of different sites / apps
<korn> +q to say: "We might have a section on 'scoring' that talks about all of the problems / issues with it, note the value for measuring improvement over time, set forth a variety of ways measuring within a site over time improvement, and then expressly state that this document does NOT set forth a canonical score and that it should NEVER be used for comparison across different sites/products"
Moe: Internally there are execs that want
benchmark figures to see the degree of conformance - government is less
granular
... internally it can be very useful; there may be differences regarding the support for different groups of people with disabilities
Peter: Suggests section about problems and issues
related to scoring (pasted into IRC above)
... scores could specify the purpose of the score, delimiting its validity
Liz: If we have a score: be informed that Tim and Liz are writing a paper describing a methodology for scoring
... This might eliminate the concerns Peter expressed
<Vivienne> Liz I'd love to see that paper when you'd like some feedback
Liz: paper will be distributed / published at CSUN; Liz may make an earlier version available
Eric will work on the editor draft, possibly including some of that
<MartijnHoutepen> Detlev: We identify major use cases in the first part of the methodology, we can use this information for scoring.
<MartijnHoutepen> Detlev: comparison may be possible across sites on this basis
Sarah: Concern over misuse of scores; concerns at the WCAG WG about any scoring mechanism - can it not be a supplemental mechanism rather than part of the formal methodology?
Eric: This seems an approach of last resort - there is a widespread need for scores
... scoring would seem to belong in the methodology; with all the difficulties discussed it is still worthwhile - we should try to reach some consensus and clearly highlight the caveats
... if that's not possible, make it ancillary info
Sarah: many want a score to stack up against competitors - we should try to find a meaningful mechanism
Eric: What's in the draft now needs to be
improved
... Next week there will be a new editor draft, discussion on detail of
reporting
Peter: The fact that many people are clamouring for scores highlights the danger of providing a single score formula
... People will be scoring differently leading to vastly different numbers.
... Will the new editor draft include text drafted by Peter (on...?)
Eric: Text was worked on, also by Shadi. Text
should reflect outcome of discussions
... will check if Peter's text has been included already