WCAG 2.0 Evaluation Methodology Task Force Teleconference

13 Feb 2014


See also: IRC log


Present: Kathy, Mike, Shadi, Liz, Eric, Detlev, Vivienne, Gavin, Tim, Martijn
Regrets: Sarah, Moe, Alistair


Test-run survey proposal

Eric sent around a draft questionnaire

<shadi> http://lists.w3.org/Archives/Public/public-wai-evaltf/2014Feb/0006.html

Received a few comments and incorporated them

Uploaded new version

E: looked at methodology, tried to replicate most important steps.

E: If answers could be put into a spreadsheet, could go through them easily.

E: Intro: where to find working draft. Note how long it may take. That it is confidential, published anonymously.

K: What are we using?

E: Qualtrics.

K: Has some a11y bugs that they're working on.

E: In Qualtrics can indicate what is/is not accessible. So only used items marked accessible.

K: Items that are accessible may take users to external pages that give directions for screen reader users. Some bugs still exist.
... Will test-drive the survey.

E: Has numbered questions. Not logical numbers, but clearly indicated in tool.
... Enter name, practical experience, days evaluating in 2013 (indicates experience) and email address.
... Some comments said not to ask for email address. Will suggest making that information optional.

K: Ask for email address or phone number so we can contact you.

E: Survey follows the WCAG-EM steps. Please remember we are not testing you. Follow the methodology as closely as possible. Then directions for how to save contents.
... Only saves, however, if you press Save before going back. Otherwise you lose everything.

M: Is it possible to show a warning message?

E: Will look into it.

<Detlev> Are you using the queue today or are we supposed to speak up queueless?

E: Scope. Q105 Eval scope. Part we have to fill in. Should do on mailing list. Scope of evaluation. Put @ signs to indicate we need input. Needs more work.

D: Comment on Q101, which mentions WCAG.

<Detlev> Q101 How much practical experience do you have evaluating websites using the WCAG 2.0 guidelines (or other schemes / regulations based on WCAG 2.0)?

D: The WCAG question should change to the one pasted into the survey. Tests against the German translation, BITV.

S: Maybe we can reformulate the question: how much experience evaluating websites.

D: Just to make it clear that it counts. Other countries have their own regulations based on WCAG.

S: Methodology does rely on WCAG 2.0. If using something else it would be apples/oranges.

D: Why said or other schemes based on WCAG 2.0.

S: What's purpose of question?

E: If people get different results, is it caused by different pages or by their experience? Not something that's definitive, but gives context.
... If using it differently they may get different results.

S: # Days gets at it too.

E: Could use one question, but it's kind of a control. Great experience may be 5 days to some people.

S: Not disagreeing with Detlev, just wonder if it will add confusion.

E: Could put some info that if persons use a national version with same checkpoints and success criteria would be okay.

<Vivienne> I agree with Kathy. Maybe a scale? 1-25, 26-50 etc?

K: How many days evaluating websites with WCAG 2.0? Does it all the time, but has no idea. Suggest number of websites and perhaps ranges.

M: +1

<EricVelleman> +1

<Vivienne> list or range

E: Could turn into bulleted list.

<Detlev> queue

E: Q104. The intro to eval scope.

S: Gavin first.

G: How many websites? Could do one, but it could be very large with sections. Should take into consideration.

E: Will come up with proposal.

<Detlev> approx. number of days/year may be more expressive

S: Back on requirements for scope?
... Baseline seems too vague. What would we usually expect?

E: Just typed something there. <vague, not big>
... Could put very specific things there.

G: Massive? Gets down to browsers, versions, etc. Important that HTML and ARIA are mentioned; earlier versions of AT important as well.

K: Also add asking what kind of applications have been done, i.e., mobile. Gives indication of level of evaluations they've done.

E: Will have to spend more time on this on the list. Have to decide on an accessibility benchmark.

S: Have to be precise. Enumerate browsers and assistive technologies and combinations, to really have a list of what we assume the website should work with.
... Isn't that how you would do testing? Define what needs to work, and that baseline is addressed during testing.

K: Have to prioritize though. There are so many combinations and limited budgets.

S: Agree, can make it short list. But need something so we have a target we're shooting for. Need to define minimum bar.

V: When we test, give the client a questionnaire asking if there is a specific requirement they have. E.g., an Australian government site may require only IE 9.

E: If we test a website that is only required to support IE6, then we will test only that.

S: Do have a definition of that in the methodology. Can't use an intranet website, so all can test. Assume it will be English as well. Might be dependent on the type of website.

E: Propose not to test Dutch website!

S: Put that in definition.

E: Let's come back to scope on mailing list. Decide on what site we want to use, part, baseline, target, etc. Will be easier once we have website in front of us.

<shadi> [["how many minutes" -> "how much time"?]]

E: Next section. Technologies relied upon (can be selected in Qualtrics). How much time to do, easy to do, comments, etc. For Step Two.
... Missing anything?

S: Just minor. May to change from minutes to how much time.

V: Step Two, don't we have to identify different templates?

E: Yes, but would have to report on all the different parts that come back in Section 3. In the reporting we put others as optional. This is the only non-optional one.
... Step 3: Representative sample. Read it, then paste exemplar instances; maybe paste URLs or descriptions instead.

<shadi> +1 to "provide the urls or descriptions ..."

Q115: Change to "how much time". Then how the random sample was chosen.

T: Have any indication of how big the site is, # pages?

E: We determine scope.
... We will be telling them which site to review for consistency.
... Were you able to get a representative sample? Not scientific, but indicates comfort level.

T: There is rationale, right?

E: Yes. In line with what is usually used. Q119: How easy to follow the instructions?
... Select the urls, paste them, then go back to them.
... Step 4. Audit the sample. Introduction, then a page for each SC.

Q121 is not a question so much as a remark.

S: Thought earlier version was AA. Need to go with that.

<Detlev> agree about using AA

E: Okay.

S: A bit more work, but no way around it. Trying to make it easier.

T: If one goes through it would check A and AA accessibility.

E: If we don't [limit to Level A], then all SC applicable for A and AA.

S: Maybe 36, for total, not each.


E: Agree to expand to WCAG 2.0 AA.

E: Please read for the audited sample. Then the real tests. Have to decide: easy or hard work for E.

E: Radio buttons or check boxes? This is relevant because an SC may pass on one page but fail on the next. Do we want the possibility of marking a page pass...

V: We do both. First page is pass/fail/NA and summarizes them according to SC.

E: Can't use radio buttons.

V: Dropdown for pass/fail/NA?

D: What would be the purpose of recording each pass/fail for each page? May not be the same pages. Not comparable. Someone may have made different decisions. Can't really compare unless you go back to every page. What would work is for people to report common problems: difficult to rate 1.3.1 for some reason. Detailed results for each page won't be usable.

E: Thought if we do it that way would be pushing people into one direction. Wanted to give them the possibility.

D: But will we be able to collect meaningful information. Comments, not necessarily specific results.

E: So you would prefer radio button that opens comment field?

D: Need a comments field, not so much radio button or checkbox.
... Checking P/F, no way to calculate results. Maybe one P/F for the entire site. Can process since not dependent on particular pages.

S: I agree with Detlev. Particularly since later in the process we want to see how the procedure worked: which pages would require more analysis. Would skew questions about overall performance.
... Step 5.a, looking at minimum requirements.

<shadi> http://www.w3.org/TR/WCAG-EM/#step5a

S: Reading from Step 5: right after the list there is a note. Need to decide what the desired granularity is; can't compare two different levels. Need to set it in scope.
... Would like to simplify so we have a greater chance of people completing the test run. A table might work: show SC with P/F and comments.
... Rather than by SC, have it done by the entire website. Otherwise do it by page.

E: But if we don't put that level of specificity in, we may wind up with differing but interesting results.
... Need people to paste where they found failures. If all pass but one fails, can see what the difference is.

S: If you want to see a long vs. short version, maybe we should have two tests. Was not clear to me what level of granularity is wanted.
... Should not leave it open, otherwise we won't know what we get.

E: But we have a method. If people have methodologies A, B, C then we won't know how they'll interpret our methodology.

S: Trying to test too many aspects? Clarity, efficiency, confidence. Lots of parameters.

V: Understand what Detlev was saying; part of it is seeing how people will use the methodology. Would expect them to use all SC on each page. If they don't, they won't be using it as we envision.
... If they don't use each of the guidelines, then how do we know what they'll do in practice?

D: Practical testing refers to WCAG techniques as one approach. In practice, people break items into individual checkpoints in 1.3. If we don't have that, we won't know what causes 1.3 to be rated as fail. If we stipulate that any failure leads to failure of the page, then we will be consistent and comparable. Whether it will be meaningful is another question. If we use checkboxes, then we can see there is variability, which would be useful.

M: I'm confused.

E: We decided not to include in the document that sometimes an error is okay. Every error is an error.
... Rest will be covered next week. You can send ideas for websites.

S: CSUN. Contract is still not signed. Scheduled to have a room for both Monday and Tuesday. About 90% chance we are meeting Monday or Tuesday.

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014/02/16 10:11:11 $