See also: IRC log
<MartijnHoutepen> EV: We will collect all the received comments into a disposition of comments and publish this after the comments close
<MartijnHoutepen> http://lists.w3.org/Archives/Public/public-wai-evaltf/2012Oct/0017.html
<Kathy> yes
yes
<Liz> yes
vivienne: evaluation commissioner - maybe we need a question: to what extent do the views of the commissioner change the sample selection?
peter: concern with the survey is that it is focused on external evaluation and sampling. If this is the only view we have, then we'll miss other users and use cases, e.g., web apps, templates, boundary condition testing.
martijn: add questions if you like
peter: the questionnaire would have to expand significantly, so we might want to break them up into two questionnaires
<ericvelleman> sorry, my phone was also on internet, so I will have to dial in again
peter: different survey for structured reviews for websites that are too big, or have too many permutations
martijn: maybe another questionnaire for in-house testing
peter: craft a placeholder for random sampling for other types of website
<shadi> [[I wonder if we need two questionnaires that follow each other to dig deeper into particular areas (such as web applications) rather than splitting by in-house/external modalities?]]
Kathy: need more for web applications. right now we have #12 that can start the discussions. is that enough?
peter: how do you adjust your sampling approach? your testing approach? adjustment could be to remove Random from the title and add it to specific items.
kathy: #3 - this would change depending on what type of evaluation is done
<vivienne> * vivienne agrees that we should change the title and make the survey about sampling, not just random
kathy: mobile apps, websites, web apps, etc.
... methodology is very focused on websites; internal vs external views of applications should be addressed
ev: need to account for these ideas
katie: agrees with peter and kathy re: internal vs external views. when we figure out the universe of this document, we need to consider historic components of the project
shadi: mobile has to be in the scope
+1
<vivienne> yes, mobile has to be there
<Ryladog> +1
<Ryladog> +1 mobile
<shadi> [[not advocating to remove mobile but wondering about terminology difference between "mobile app" versus "mobile website"]]
peter: include mobile as part of the universe. question about re-review: white box testing where the engineering group tells you we made lots of changes in these areas, so we focus on the new areas. how would this adjust the sampling?
kathy: also missing is the role of automated testing. do automated tests change the sampling strategy?
ev: good idea for a new question
<shadi> +1
<MartijnHoutepen> +1
vivienne: likes adding question about automated testing. sometimes uses automated tools to help decide on pages to test for manual review.
+1
ev: send proposed questions/revisions to the list
<vivienne> sure
peter: native apps shouldn't be within the scope.
katie: agrees that native apps shouldn't be included, but if it opens in a browser it should be included.
peter: if it's not something that users can direct their browsers to view, then it shouldn't be included
... e.g., iTunes is not within scope
<shadi> [[propose a question to compare *sampling* in mobile web apps vs in traditional web apps]]
katie: the Excel version (plugin) that opens in the browser is covered by WCAG. so, bottom line is 'if it opens in a browser, it is included.'
kathy: to add on to Katie's comment, there are instances where the majority of the application runs in a browser but calls out to a native app; that should also be included.
ev: add your questions to the questionnaire list by next Tuesday
... next agenda item: goodness criteria - propose discussing this on the next call
<ericvelleman> http://www.w3.org/WAI/ER/2011/eval/track/issues/1
peter: aggregating individual results into a conformance statement - my fundamental concern is how to convey the results in a meaningful fashion; red, yellow, green ratings for the outputs are minimally helpful.
... confidence level is key
vivienne: does failing one item mean failing overall - we have decided yes, but we may want to consider a 'not quite there' rating, a 'conditional pass'
... this approach encourages developers to get those items fixed to get a complete pass
katie: uses a similar approach. uses high, medium, low severity for the items; also has rating for impact for persons with disabilities, e.g., descriptive text on a spacer image vs an unlabeled link that would impact PWD more.
<shadi> [[I see three parts of the evaluation "output": (1) conformance to WCAG - yes/no; (2) some type of score - indicative to help "motivate" developers and decision makers; (3) report - to show the types of issues to explain the "severity" and guide developers on how to fix issues]]
ev: two discussions on tolerance - 1) confidence level or 2) severity/impact for PWD
peter: these two approaches are closely linked; confidence in what you are reporting is a given. but how do you convey 'good, but not perfect' or 'not horrible' vs sites that are mostly inaccessible, since this is very valuable info?
ev: we should continue this discussion on the list
kathy: sent a couple of graphics to the list. want feedback on 1) what do we see as the differences in interactions between the different steps? and 2) how to convey this in the graphic? e.g., the arrows don't overlap, but they should.
<Ryladog> Nice job Kathy
ev: graphics are colorful and clear, so let's get them finalized
kathy: working with WCAG group on the graphic, too.
<shadi> [[I really like the life-cycle approach]]