W3C

WCAG 2.0 Evaluation Methodology Task Force Teleconference

30 May 2013

See also: IRC log

Attendees

Present
Shadi, Martijn, Vivienne, Eric, Liz, Peter, Moe, Detlev, Katie, Mike
Regrets
Kathy, Tim, Kostas, Sarah
Chair
Eric
Scribe
Vivienne

Contents

Current state of comments
Test run
Sampling
Summary of Action Items

<ericvelleman> <http://www.w3.org/WAI/ER/conformance/comments-20130226>

Eric: disposition of comments - the document is formatted the same way as last time, and he has tried to group items
... we have received many comments now - 96 in total, plus other editorial comments that are not included in this list

<ericvelleman> http://www.w3.org/WAI/ER/conformance/comments-20130226

Shadi: EOWG wiki had background discussion along with the comments

Eric: explained the structure of the Disposition of Comments document
... the approach last time was to split them into larger groups and address them with a proposal in an edited draft, with a request for review by the TF - suggest the same approach

<Detlev> Vivienne: happy with the way it's set up

<Mike_Elledge> +1

Eric: will make an edited draft based on the public working draft and ask the TF to review it

... is anyone against this?

<MoeKraft> +1

<MartijnHoutepen> +1

Eric: close this point

<Liz> +1

Test run

Eric: changed the survey questions for websites 2 and 3 to make it clearer what is required. Survey 2 had already been answered by a number of people, so the existing questions were left in place up to question 15 or so; you can complete the questions after that if you want to. Both surveys are now open.
... you have the opportunity to fill in the information until next week, when we'll have a summary of website 2 and hopefully website 3
... website 1 was easy (gaming), website 2 is more complex, website 3 is a library
... has anyone looked at the changed surveys?

<ericvelleman> <https://www.w3.org/2002/09/wbs/48225/testrun2/results>

Eric: the survey for website 3 now has a direct link to the URL
... in survey 2, on the answer page after the introduction, you can go down to question 11; the optional questions are below that

<ericvelleman> https://www.w3.org/2002/09/wbs/48225/testrun3/

Eric: on the testrun 3 page, the introduction has a direct link to the library part of the website
... survey 1 is closed now
... we need conclusions about the outcomes of the surveys to add to the disposition of comments
... adding them to the disposition of comments makes it easy to see whether we've addressed them
... if you want to address survey 2&3, please do it as quickly as possible
... Martijn will do the comment list for survey 2
... who can summarize the outcome of survey 3?

Detlev: should we wait till we've got some replies first? There is enough to talk about from survey 2.

Eric: we'll be working on the disposition of comments, so we have enough work and can give this more time

<Detlev> yes

<Mike_Elledge> +1

<MartijnHoutepen> +1

Eric: we can do website 3 in a couple of weeks

<Liz> yes

+1

Eric: next week we'll discuss the outcome of website 2 and will send a reminder to people to complete the survey

Sampling

Eric: Discussion: when to decide not to sample, when to sample
... Discussion: heterogeneity and homogeneity

<MartijnHoutepen> http://lists.w3.org/Archives/Public/public-wai-evaltf/2013May/0042.html

Eric: discussion around developers using different coding on different pages of the website, so that taking perhaps 1 or 2 tables would not be enough - the homogeneity of the website
... we need a good definition
... what causes are there for this situation
... how can an evaluator determine if there is homogeneity/heterogeneity on the website

Peter: what do tables have to do with it - in coding it can be just the HTML
... this isn't a hard & fast or precise thing, but more of an indicator as to the confidence level of the sampling
... you could have 100 pages all coded in the same style, which would indicate that probably all 10,000 are similar. However, we could have a host of coding styles within the sample, and this would decrease the confidence in our sample.
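[As a rough illustration of Peter's point, and not part of WCAG-EM or the discussion itself: a toy heuristic might scale an evaluator's page sample by how many distinct coding styles an initial probe of the site turns up. The function name, thresholds, and factors below are entirely hypothetical.]

# Hypothetical sketch: grow the sample when an initial probe of the site looks
# heterogeneous (many distinct coding styles/templates), shrink it when nearly
# every probed page shares one style. Nothing here comes from WCAG-EM.
def suggested_sample_size(base_size: int, styles_seen: int, pages_probed: int) -> int:
    if pages_probed <= 0:
        return base_size
    heterogeneity = styles_seen / pages_probed  # ranges from ~0 up to 1.0
    if heterogeneity <= 0.1:        # homogeneous: one template dominates
        factor = 0.75
    elif heterogeneity <= 0.3:      # some variation in coding styles
        factor = 1.0
    else:                           # many distinct styles: lower confidence
        factor = 1.5
    return max(1, round(base_size * factor))

# Example: 10 probed pages showing 5 distinct styles suggests a larger sample.
print(suggested_sample_size(base_size=20, styles_seen=5, pages_probed=10))  # 30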

Eric: tables are just an example of something you look for in an evaluation

Detlev: not sure what difference that makes in the methodology, or how much it would help developers. You can have many differences, such as the scripting base; it comes down to the evaluator's sampling

Mike: one of the criteria for a website being acceptable is that there is consistency throughout the site. Whether 1 developer has used a different format for say a widget or there are several developers, this is not the key thing. The key thing is the consistency. There is value to pointing it out to evaluators to watch for it. Maybe change 'style' to 'design patterns' or similar. 'Style' may refer more to coding.

Shadi: regarding Detlev's comments - don't make it too complicated. Looking at Step 3 Sampling - the intro paragraph talks about authoring mechanisms rather than styles - maybe we can be more specific with our wording. Not only do the types of pages dictate the sample size, but perhaps also the way the pages are coded.

<shadi> http://www.w3.org/WAI/ER/conformance/ED-methodology-20130128#step3b

Shadi: 3(b) (link above) could change from selecting 2 distinct pages to 1 distinct page. Maybe we can reflect on the test runs to see how large a sample is needed.
... maybe in that same step we should enumerate the different types of pages that an evaluator should select, as in step 2(c), and talk about the coding style as one of the parameters that would grow or shrink the sample size. It is difficult to put objective criteria on deciding what makes a website homogeneous or heterogeneous.

<shadi> http://www.w3.org/WAI/ER/conformance/ED-methodology-20130128#step2c

<shadi> [[Web pages with varying styles, layouts, structures, and functionality often have different implementations of accessibility features. They are also often generated by different templates and authored by different people.]]

Martijn: agree with Mike & Shadi. Can add something more in the general procedure - also see email.

Eric: is it so important that we have to add more text about it?

Peter: the confidence you have in your sample size needs to be adequate and may be influenced by the homogeneity of the coding used. This may influence the likelihood that your sample is capturing everything.

<shadi> +1 to Detlev

Detlev: need to consider the need for sampling, and think about the purpose of the evaluation. You may gain little in the overall result if you add more different bad pages, but it's good for the designer.

Eric: let's keep discussing this on the list

<shadi> http://www.w3.org/TR/WCAG-EM/#step2c

Shadi: we can sharpen the text here - also 2(c) where we talk about what constitutes a different type of page.
... it's not about building larger and larger samples. In some cases you come across something that is fairly accessible and you want to make sure that this is representative. We need to sharpen the terminology and reiterate that the coverage needs to get bigger depending upon the purpose of the evaluation and the type of the website

Peter: if the website is so awful that you've seen enough, that turns the focus of our work on its head. We're trying to help someone make a solid compliance claim - aiming for perfection. If "most websites are so bad you've seen enough", then we need to focus a lot more work on how you report the lack of perfection.

Shadi: at the beginning of the document we talk about its use. Cases where you're looking for perfection (an accessible website) and want to verify that, or issue a conformance claim, are one of the use cases that this methodology is targeted for. The other use case is how good or how bad the site is - what do I need to fix in order to conform
... we need to think about this in regard to reporting. It depends upon the purpose: if conformance is the goal, you can stop early once there are a certain number of errors. Even though you know a table is inaccessible and you've already realized the site fails conformance, you may continue to check in order to show the different types of problems

<shadi> +1 agree that we need to look more deeply on the reporting aspect

Eric: the words 'reasonable confidence' appear in step 3 and in the introduction. The concept is to use this as a way to conclude what the confidence is in the results that you've gathered.
... we should discuss the question of 'to sample or not to sample' on the list
... any other issues?

Detlev: are we open to changing the question of evaluation purposes? One scenario is a developer/designer asking for testing of a new website, and there is another situation - these 2 main scenarios for testing don't seem to be so well reflected in the 3 different reporting types - detailed info for the designers, or informing/challenging a conformance claim

<Detlev> Can do

Eric: will put it on the agenda for the next telco

Shadi: are there only the 3 different use cases? For future discussion - we may want to think about other use cases.

Eric: we could put it on the list

<Detlev> I will trigger that yes

<Detlev> conscientious

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2013/06/09 11:58:22 $