W3C

WCAG 2.0 Evaluation Methodology Task Force Teleconference

11 Oct 2012

See also: IRC log

Attendees

Present
Kathy, Shadi, Martijn, Eric, Mike, Katie, Alistair, Aurelien, Sarah, Tim, Moe
Regrets
Tim, Vivienne
Chair
Eric
Scribe
Alistair


Graphic Illustration

kathy: sent out the diagram in two versions; need to get some feedback
... is the diagram usable now, and could it be updated later?
... the circular one could be the one we use, based on the feedback so far

eric: any comments?

shadi: kathy and I have been chatting - the graphic is better than the initial artwork
... we should use the graphic for now, and make alterations later if necessary
... the diagram should only be used to support an overview of the later content
... if the later sections become more complex the graphic would have to be updated in any case, e.g. after the sampling discussion
... thanks for the good work Kathy

kathy: ok will finalise and send round

eric: will use the graphic in the next editor draft
... moving to agenda point 2

Random Sampling Survey

<ericvelleman> <http://www.w3.org/WAI/ER/2011/eval/track/issues/9>

eric: a new version of the survey above has been sent round
... it contains 22 questions
... looks a little like a research project, rather than something to just start a process
... few remarks on the English, so changes will be made
... are there any comments on the survey

<Mike_Elledge> Your English is much better than many people I know in the U.S.

kathy: looks really good, one thing, we have text in 16 about automatic testing tools. We might add a bit to find out how people use the automatic testing tools as this would be very useful.

eric: yep

mike: liked the questionnaire, couple of things relating to the terms
... if we provided a link to the definitions e.g. random vs structured it could be helpful

eric: could we link directly to the working draft
... we could add an easy question relating to sampling

shadi: several comments
... comment1 - survey is good, comprehensive etc...
... we should close this topic before the f-2-f meeting
... comment2 - call it "sampling process" rather than "random sampling", just to reframe
... comment3 - the second half is independent, but overlaps in certain respects
... we want to see the role, but what they do is independent of their role

eric: using the correct checkboxes you can set a frame around who and what people are and do

shadi: question 2 is a bit extensive
... evaluating websites in early or development stages is ok, but it is not what the methodology is about
... question 5 - what is automated sampling?

eric: automated sampling - using a tool to do the complete sampling

shadi: may be better to separate the tool from what is being done, as it may be doing several things: structured, random, or both

<aurelien_levy> maybe it's better to ask : did you use a tool to make the sampling ?

mike: automated means using an automated evaluation tool to spider a site

<aurelien_levy> as an individual question

shadi: solution to open brackets after the word structured - explaining what structured actually is
... structured and random are still independent of tool use
... how the automation comes in is a separate issue

eric: is it covered in question 6?

shadi: in 5 it should be independent of the tool, with the next question being about the tools and how they are being used

<aurelien_levy> does structured sampling cover sampling based on the number of templates?

mike: question 12 - add a new choice after 51-100: 'it depends, see question 13 below'
... either 'it depends' or 'more than 100'

shadi: agreed - maybe add an 'other' category
... question 19 may need to be split due to technical issues of survey generator

mike: question 17 - how is 'web application' defined? Is it just server-side, or are we looking at websites?

<aurelien_levy> same question for mobile application: is it web app or native app, or both?

<shadi> maybe add "(client-side)" before "web application"?

eric: agreed, this was something which needed to be checked

mike: web application should be defined

eric: it is defined lightly in the public working draft
... we use web application in our definition of website
... need a good definition of web application

shadi: look through the documents i.e. WAI ARIA, web app working groups

<Tim> we should reuse existing definitions if we can (QA item)

shadi: maybe we can link to the term web application from the draft, and place the words client side next to it
... also looking at app, native app, etc

eric: a mobile application also needs to be defined

<aurelien_levy> mobile web application / mobile website

eric: every web page could be a mobile application, so a clear definition would be good

<shadi> [[How do you adjust your sampling approach for (client-side) rich web applications? This includes mobile web applications.]]

<shadi> [[#17 How do you adjust your sampling approach for (client-side) rich web applications?]]

eric: shall make the changes and additions: an explanation in 17 and 18; in 16 a request to explain how automated testing tools are used; and in 12 a new option referring to the question below
... will also change 5, and change 1 - which will be split

<shadi> [[#18 How do you sample mobile web applications (please specify if they are native or web-based mobile apps)?]]

eric: shall put it in a mail and a new survey

kathy: what survey program will we be using? Can we segment stuff?

eric: survey tool will be the W3C survey tool

<aurelien_levy> maybe a question can be did you do a sampling ?

kathy: other programs allow a greater range of possibilities

shadi: it's good to have the results archived

<aurelien_levy> I know some people that didn't do any sampling and simply look through the website searching for errors

shadi: we could in addition use other tools, but the results would have to be archived
... it's a good point as the WCS does not have the strongest export facilities

eric: will discuss in the office

kathy: SurveyMonkey?

<Zakim> shadi, you wanted to ask about extending this to WCAG WG and ERT WG

shadi: should we extend the survey to WCAG WG and ERT WG?

eric: good idea
... we will have a think about it - working first on the changes proposed

<aurelien_levy> maybe a question can be : did you do a sampling ?

eric: by tomorrow we will know where to place this as a survey
... moving to agenda point 3

Tolerance

eric: the outcome could be a yes/no conformance statement, some scores, and the report, which could add information about how to fix things
... the confidence level is what needs to be discussed
... the relationship between sample and confidence
... how can we rate the confidence of the output?

shadi: could you explain confidence level, as it could be a loaded term

eric: if you use a sample, there could still be issues - if you have a high confidence level in the sample, the chance of finding other errors is lower

<Tim> confidence level definition: http://stattrek.com/statistics/dictionary.aspx?definition=confidence_level

mike: confidence level in terms of statistics?
... could be complex to do, as this is subjective - there are so many different variables
... maybe someone with experience could tell us how to achieve this

eric: the sample must be representative of the whole website

<aurelien_levy> the question is more : is it useful regarding the work needed to prove or calculate your confidence level

tim: we have a stats division

<aurelien_levy> i'm not sure of the quality of my connection but i can try

alistair: too complex to worry about

aurelien: could be useful to have a good confidence level, but calculating the level is too work-intensive

aurelien: there must be an easier, faster way to do it

eric: are you saying the confidence level is linked to the number of pages?

aurelien: some people don't use sampling at all

eric: if you want a high confidence level you need to increase the number of pages being assessed

mike: if there is a method to set confidence levels, we need to make sure that people don't confuse confidence in the quality of an evaluation with confidence in the sample

martijn: some people look at the number of people checking it instead of an actual confidence level in a sample

shadi: important not to get bogged down in this
... such certainty is only needed in a small number of cases, like certification

<Mike_Elledge> Have to go for another call....bye!

shadi: testing phase is Jan to Jun, we could get feedback from the methodology - cross testing etc
... one more point - important to differentiate between confidence levels and reliability

<shadi> [inter-rater reliability]
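
[Editorial illustration, not part of the discussion: eric's point that a higher confidence level needs more pages can be sketched with the textbook sample-size formula for estimating a proportion, with a finite population correction. This assumes a simple random sample of pages and a statistical confidence level in the sense of Tim's link; the function name and the default values are hypothetical and were not agreed by the task force.]

```python
import math
from statistics import NormalDist


def required_sample_size(total_pages, confidence=0.95, margin=0.05, p=0.5):
    """Pages to sample to estimate the share of failing pages.

    Hypothetical helper: assumes simple random sampling and uses the
    standard proportion formula with a finite population correction.
    p=0.5 is the most conservative guess for the failure rate.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite site.
    n0 = z ** 2 * p * (1 - p) / margin ** 2
    # Correct for the finite number of pages on the site.
    n = n0 / (1 + (n0 - 1) / total_pages)
    return min(total_pages, math.ceil(n))


if __name__ == "__main__":
    for level in (0.90, 0.95, 0.99):
        print(level, required_sample_size(1000, confidence=level))
```

[Under these assumptions the required number of pages grows with the confidence level, and for a small site it approaches the whole site, which matches aurelien's concern about the work involved.]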

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.137 (CVS log)
$Date: 2012/11/13 18:35:12 $