See also: IRC log
<MartijnHoutepen> EV: We will collect all the received comments into a disposition of comments and publish this after the comments close
<MartijnHoutepen> http://lists.w3.org/Archives/Public/public-wai-evaltf/2012Oct/0017.html
<Kathy> yes
yes
<Liz> yes
vivienne: evaluation commissioner - maybe we need a question: to what extent do the views of the commissioner change the sample selection?
peter: concern with the survey is that it is focused on external evaluation and sampling. If this is the only view we have, then we'll miss other users and use cases, e.g., web apps, templates, boundary condition testing.
martijn: add questions if you like
peter: the questionnaire would have to expand significantly, so we might want to break them up into two questionnaires
<ericvelleman> sorry, my phone was also on internet, so I will have to dial in again
peter: different survey for structured reviews for websites that are too big, or have too many permutations
martijn: maybe another questionnaire for in-house testing
peter: craft a placeholder for random sampling for other types of website
<shadi> [[I wonder if we need two questionnaires that follow each other to dig deeper into particular areas (such as web applications) rather than splitting by in-house/external modalities?]]
Kathy: need more for web applications. right now we have #12 that can start the discussions. is that enough?
peter: how do you adjust your sampling approach? your testing approach? adjustment could be to remove Random from the title and add it to specific items.
kathy: #3 - this would change depending on what type of evaluation is done
<vivienne> * vivienne agrees that we should change the title and make the survey about sampling, not just random
kathy: mobile apps, websites, web apps, etc.
... methodology is very focused on websites; internal vs external views of applications should be addressed
ev: need to account for these ideas
katie: agrees with peter and kathy re: internal vs external views. when we figure out the universe of this document, we need to consider historic components of the project
shadi: mobile has to be in the scope
+1
<vivienne> yes, mobile has to be there
<Ryladog> +1
<Ryladog> +1 mobile
<shadi> [[not advocating to remove mobile but wondering about terminology difference between "mobile app" versus "mobile website"]]
peter: include mobile as part of the universe. question about re-review: white box testing where the engineering group tells you we made lots of changes in these areas, so we focus on the new areas. how would this adjust the sampling?
kathy: also missing is the role of automated testing. do automated tests change the sampling strategy?
ev: good idea for a new question
<shadi> +1
<MartijnHoutepen> +1
vivienne: likes adding question about automated testing. sometimes uses automated tools to help decide on pages to test for manual review.
+1
ev: send proposed questions/revisions to the list
<vivienne> sure
peter: native apps shouldn't be within the scope.
katie: agrees that native apps shouldn't be included, but if it opens in a browser it should be included.
peter: if it's not something that users can direct their browsers to view, then it shouldn't be included
... e.g., iTunes is not within scope
<shadi> [[propose a question to compare *sampling* in mobile web apps vs in traditional web apps]]
katie: the Excel version (plugin) that opens in the browser is covered by WCAG. so, bottom line is 'if it opens in a browser, it is included.'
kathy: to add on to Katie's comment, there are instances where the majority of the application runs in a browser but calls out to a native app; that should also be included.
ev: add your questions to the questionnaire list by next Tuesday
... next agenda item: goodness criteria - propose discussing this on the next call
<ericvelleman> http://www.w3.org/WAI/ER/2011/eval/track/issues/1
peter: aggregating individual results into a conformance statement - my fundamental concern is how to convey the results in a meaningful fashion; red, yellow, green ratings for the outputs are minimally helpful.
... confidence level is key
vivienne: does failing one item mean failing overall - we have decided yes, but we may want to consider a 'not quite there' rating, a 'conditional pass'
... this approach encourages developers to get those items fixed to get a complete pass
katie: uses a similar approach. uses high, medium, low severity for the items; also has rating for impact for persons with disabilities, e.g., descriptive text on a spacer image vs an unlabeled link that would impact PWD more.
<shadi> [[I see three parts of the evaluation "output": (1) conformance to WCAG - yes/no; (2) some type of score - indicative to help "motivate" developers and decision makers; (3) report - to show the types of issues to explain the "severity" and guide developers on how to fix issues]]
ev: two discussions on tolerance - 1) confidence level or 2) severity/impact for PWD
peter: these two approaches are closely linked; confidence in what you are reporting is a given. but how do you convey 'good, but not perfect' or 'not horrible' vs sites that are mostly inaccessible, since this is very valuable info?
ev: we should continue this discussion on the list
kathy: sent a couple of graphics to the list. want feedback on 1) what do we see as the differences in interactions between the different steps? and 2) how to convey this in the graphic? e.g., the arrows don't overlap, but they should.
<Ryladog> Nice job Kathy
ev: graphics are colorful and clear, so let's get them finalized
kathy: working with WCAG group on the graphic, too.
<shadi> [[I really like the life-cycle approach]]