13:48:53 RRSAgent has joined #eval
13:48:53 logging to http://www.w3.org/2013/08/29-eval-irc
13:48:55 RRSAgent, make logs world
13:48:55 Zakim has joined #eval
13:48:57 Zakim, this will be 3825
13:48:57 ok, trackbot; I see WAI_ERTWG(Eval TF)10:00AM scheduled to start in 12 minutes
13:48:58 Meeting: WCAG 2.0 Evaluation Methodology Task Force Teleconference
13:48:58 Date: 29 August 2013
13:54:50 Vivienne has joined #eval
13:55:23 ericvelleman has joined #eval
13:55:49 WAI_ERTWG(Eval TF)10:00AM has now started
13:55:55 +[IPcaller]
13:56:01 zakim, IPcaller is me
13:56:01 +Vivienne; got it
13:56:52 + +31.30.239.aaaa
13:57:02 Zakim, aaaa is me
13:57:02 +MartijnHoutepen; got it
13:59:08 Detlev has joined #eval
13:59:09 +Eric_Velleman
14:00:36 + +49.404.318.aabb
14:01:06 Zakim, aabb is Detlev
14:01:06 +Detlev; got it
14:01:19 Sarah_Swierenga has joined #eval
14:01:49 Liz has joined #eval
14:01:57 + +1.517.432.aacc
14:02:29 + +1.301.975.aadd
14:02:59 zakim, aadd is Liz
14:02:59 +Liz; got it
14:03:01 Zakim, aacc is Sarah
14:03:02 +Sarah; got it
14:03:26 Mike_Elledge has joined #eval
14:03:45 + +1.313.322.aaee
14:04:06 Zakim, aaee is Mike
14:04:06 +Mike; got it
14:04:24 zakim, mute me
14:04:24 Vivienne should now be muted
14:05:06 korn has joined #eval
14:05:19 scribe: Detlev
14:05:21 Zakim, mute me
14:05:21 MartijnHoutepen should now be muted
14:05:49 http://www.w3.org/2013/08/01-eval-minutes.html
14:05:50 + +1.650.506.aaff
14:06:00 Zakim, aaff has Peter_Korn
14:06:00 +Peter_Korn; got it
14:06:02 http://www.w3.org/WAI/ER/conformance/ED-methodology-20130712#step5c
14:06:15 Discussion of performance score
14:06:33 MoeKraft has joined #eval
14:06:39 q+
14:06:52 +MoeKraft
14:07:08 Eric: Some people had problems with the score since it could mask important barriers, assessments can be subjective, etc.
14:07:13 q?
14:08:04 Mike: Problems with description of score calculation
14:09:06 q+
14:09:17 q- mike
14:10:07 Mike: Description should highlight what differentiates the different methods of calculating the score (per website / per page / per instance)
14:10:07 -Mike
14:10:25 +vivienne
14:10:30 q+
14:10:38 +Mike
14:10:58 Peter: Uncomfortable with the idea of a performance score
14:11:27 q-
14:11:41 zakim, ack me
14:11:41 unmuting Vivienne
14:11:42 I see no one on the speaker queue
14:11:43 Peter: Boiling everything down to a single number can be misleading and hide a lot of information
14:11:48 q+
14:12:09 Eric: Can the score be made more specific?
14:12:44 q?
14:12:50 Peter: Possibly, but it remains fraught with problems - you just get a number
14:13:37 Peter: More difficult with web applications - here problems may derive from rarely used parts / features of a web application
14:14:21 Peter: So the score may not reflect the degree of difficulty users have
14:15:10 Peter: Degree of problems will vary with the context of use
14:15:38 q+
14:17:03 Detlev: Agree that it's problematic, but scores can be of use to measure development effort, over time or in larger projects
14:18:04 q+
14:18:15 Vivienne: Some of the purposes of website evaluations make scores helpful - for example in comparing accessibility of sites
14:18:46 q- de
14:18:51 q- vi
14:19:14 Vivienne: Also useful for comparing different instances of evaluation of the same site - did we get better? Did it get worse?
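[Scribe note: a minimal, hypothetical sketch of the three aggregation levels Mike refers to above (per website / per page / per instance); the data and formulas are invented for illustration and are not prescribed by the methodology.]

    # Hypothetical illustration only - WCAG-EM does not mandate these formulas.
    # Each page maps success criteria (SC) to results: True = passed, False = failed.
    pages = {
        "home":    {"1.1.1": True,  "1.4.3": False, "2.4.4": True},
        "contact": {"1.1.1": False, "1.4.3": True,  "2.4.4": True},
    }

    # Per website: an SC counts as passed only if it passes on every page in the sample.
    all_sc = {sc for results in pages.values() for sc in results}
    site_score = sum(
        all(results.get(sc, True) for results in pages.values())  # absent SC treated as passed
        for sc in all_sc
    ) / len(all_sc)

    # Per page: average of each page's ratio of passed SC.
    page_score = sum(
        sum(results.values()) / len(results) for results in pages.values()
    ) / len(pages)

    # Per instance: ratio of passed occurrences (e.g. images with alt text); invented counts.
    passed_instances, failed_instances = 42, 8
    instance_score = passed_instances / (passed_instances + failed_instances)

    print(f"per website: {site_score:.2f}, per page: {page_score:.2f}, per instance: {instance_score:.2f}")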
14:19:50 Vivienne: so a mechanism for scoring is beneficial
14:20:22 q+
14:20:32 Eric: More focused on improving the text in the methodology
14:21:17 Peter: Will need caveats - reacting to the comparison example Vivienne provided
14:22:36 Peter: Comparison between applications will be difficult because it implies choices (most used parts / rarely used parts); do I prioritize tests for existing employees or regardless of that?
14:23:43 Peter: The different scores for different parts may not give valid information for decision making in the absence of particulars of users / use cases
14:24:21 Peter: Huge differences in scores will be more helpful than when numbers are close
14:25:37 Peter: Implementers in Australia are misusing Techniques - particular techniques are prescribed for use, in contravention of the WCAG approach
14:26:58 q+
14:26:59 Eric: Score is related to the sample that embodies a use case already
14:27:04 q+
14:27:04 q+
14:27:11 q- kor
14:27:38 Eric: Aggregate scores will mask differences in any case
14:28:17 q+
14:28:24 q+
14:29:11 q- vi
14:29:12 Vivienne: Scoring only used to indicate a percentage of SC failed - in a scientific context it can be useful to differentiate, for example, by counting violations per page
14:29:49 Vivienne: Some mechanism is required to make any comparison at all
14:30:27 Peter: A single violation (big flashing object) may be worse than 15 other violations
14:30:55 q+
14:30:57 Vivienne: There are all kinds of problems, and it is not mandatory
14:31:07 ack me
14:32:04 Martijn: People want to use numbers, demand scores - we better provide an acceptable way of scoring rather than doing nothing
14:33:11 Martijn: The text that's there right now might be more fine-grained (scores for Level A, Level AA, etc.); SC may be rated as fine or N.A., so that might need more guidance
14:33:41 Zakim, mute me
14:33:41 MartijnHoutepen should now be muted
14:34:20 -Detlev
14:35:13 impact of errors or failures is important
14:35:28 Maybe check out Roger Hudson's Accessibility Priority Tool: http://usability.com.au/2013/01/accessibility-priority-tool/
14:36:46 +Detlev
14:36:50 Peter: Any calculation will misrepresent the values of W3C, making some disabilities more important than others
14:37:52 Peter: Calculation automatically prioritises one disability over another
14:37:57 Peter: Calculation cannot encapsulate severity of accessibility issues
14:38:02 q?
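[Scribe note: a hypothetical contrast between the two measures Vivienne mentions above - the coarse percentage of SC failed versus the finer count of violations per page; all figures are invented for illustration.]

    # Invented evaluation results; not taken from any real assessment.
    violations_per_page = {"home": 3, "contact": 0, "search": 12}
    failed_sc = 5        # distinct success criteria with at least one failure
    applicable_sc = 38   # success criteria applicable to the sample

    pct_sc_failed = failed_sc / applicable_sc                                      # coarse measure
    avg_violations = sum(violations_per_page.values()) / len(violations_per_page)  # finer measure

    print(f"{pct_sc_failed:.0%} of SC failed; {avg_violations:.1f} violations per page on average")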
14:38:52 Peter: If there is a formula it includes a value judgement; without a formula the score is not doable
14:39:02 q-
14:40:18 Peter: For improvement calculations it is straightforward, on the same sample
14:40:21 q-
14:41:24 zakim, mute me
14:41:24 Vivienne should now be muted
14:41:31 q- mik
14:41:47 Mike: Using a score for comparing competing products is difficult - but it is useful for comparing tests of the same product to see the trend (getting better or worse)
14:42:15 Mike: People doing research will be devising their own methodology
14:43:09 Mike: We might recommend using the score only in cases of comparison of the same site or tool over time, rather than between sites / apps performances
14:43:38 +q to say: "We might have a section on 'scoring' that talks about all of the problems / issues with it, note the value for measuring improvement over time, set forth a variety of ways measuring within a site over time improvement, and then expressly state that this document does NOT set forth a canonical score and that it should NEVER be used for comparison across different sites/products"
14:43:52 -Mike
14:44:34 +Mike
14:44:46 Moe: Internally there are execs that want benchmark figures to see the degree of conformance - government is less granular
14:45:41 q+
14:45:46 Moe: Internally it can be very useful; there may be differences regarding the support for different groups of people with disability
14:46:02 q+
14:46:15 q-
14:46:54 Peter: Suggests section about problems and issues related to scoring (pasted into IRC above)
14:47:19 q+
14:47:56 Peter: Scores could specify the purpose of the score, delimiting its validity
14:48:45 q- kor
14:50:01 Liz: If we have a score: be informed that Tim and Liz are writing a paper describing a methodology for scoring
14:50:46 Liz: This might eliminate the concerns Peter expressed
14:51:29 Liz, I'd love to see that paper when you'd like some feedback
14:51:32 Liz: Paper will be distributed / published at CSUN; Liz may make an earlier version available
14:52:03 q- liz
14:52:31 Eric will work on the editor draft, possibly including some of that
14:53:50 Detlev: We identify major use cases in the first part of the methodology; we can use this information for scoring.
14:54:15 Detlev: Comparison may be possible across sites on this basis
14:54:23 q- det
14:55:34 Sarah: Concern over misuse of scores, concerns at the WCAG WG about any scoring mechanism - can it not be a supplemental mechanism and not part of the formal methodology?
14:56:04 q+
14:56:08 Eric: This seems an approach of last resort - there is a widespread need for scores
14:57:21 Eric: Would seem to belong in the methodology; with all the difficulties discussed it is still worthwhile - we should try to reach some consensus and clearly highlight the caveats
14:57:53 Eric: If that's not possible, make it ancillary info
14:57:58 q?
14:58:23 Sarah: Many want a score to stack up against competitors - we should try to find a meaningful mechanism
14:58:53 q- sar
14:58:58 Eric: What's in the draft now needs to be improved
14:59:26 q?
14:59:33 Eric: Next week there will be a new editor draft, discussion on detail of reporting
15:00:29 Peter: The fact that many people are clamouring for scores highlights the danger of providing a single score formula
15:01:20 Peter: People will be scoring differently, leading to vastly different numbers.
15:01:55 Peter: Will the new editor draft include text drafted by Peter (on...?)
15:02:39 Eric: Text was worked on, also by Shadi.
Text should reflect outcome of discussions
15:03:01 Eric: will check if Peter's text has been included already
15:03:27 have a good week everyone. bye!
15:03:28 ciao!
15:03:31 -Sarah
15:03:33 bye
15:03:35 -Mike
15:03:35 bye
15:03:36 -MoeKraft
15:03:36 ack me
15:03:38 bye
15:03:44 - +1.650.506.aaff
15:03:48 -Detlev
15:03:51 trackbot, end meeting
15:03:51 Zakim, list attendees
15:03:51 As of this point the attendees have been Vivienne, +31.30.239.aaaa, MartijnHoutepen, Eric_Velleman, +49.404.318.aabb, Detlev, +1.517.432.aacc, +1.301.975.aadd, Liz, Sarah,
15:03:54 ... +1.313.322.aaee, Mike, Peter_Korn, MoeKraft
15:03:59 RRSAgent, please draft minutes
15:03:59 I have made the request to generate http://www.w3.org/2013/08/29-eval-minutes.html trackbot
15:04:00 RRSAgent, bye
15:04:00 I see no action items
15:04:03 -Vivienne