Silver Conformance Subgroup -- 31 Mar 2020

getting to FPWD

Rachael proposal

<Rachael> https://docs.google.com/document/d/1D18qg5pvne94jNvUwvj_Of36E8re6AB4tubdLpmhWOw/edit#

<Fazio_> Rachael talks first

<Fazio_> RM: wants to simplify definitions and approach

<Fazio_> Adding -1 when no testing was done was suggested

<Fazio_> numbers should equate to level of satisfaction

<Fazio_> 2 levels proposed: basic (automated) and advanced (manual)

<Fazio_> 0 would be nothing passes basic

<Rachael> +1 if more than 80% of possible tests pass (if 80% of basic tests are passed it becomes a 1; if 80% of advanced tests pass it becomes a 3)

<Rachael> +1 if no errors occur within tasks that stop a user in a functional area from completing a task

<Rachael> Define the portion of the site that will be tested

<Rachael> Step 2: Run basic tests

David: Instead of having qualitative testing and quantitative testing, we rely on numbers to determine the quality?

Rachael: Simplest form possible, then boiled down to aggregate.

<Fazio_> RM: so scoring aggregate would determine quality

<Rachael> Run basic tests

<Rachael> If 80% of basic tests pass, capture errors for work and rate the test a 1. Then move to step 3.

<Rachael> If less than 80% of the basic tests pass, capture errors for work and examine the location of the failures. If the failures do not affect the ability of users in any of the functional areas to complete key tasks, rate the test a 1.

<Rachael> If less than 80% of basic tests pass and failures are within the key tasks, capture the errors and rate the test a 0.

<Rachael> Step 3: Advanced tests

<Rachael> If the site is already rated and fails, capture errors for work but don’t adjust the rating

<Rachael> If 80% of advanced tests pass, capture errors for work and rate the site a 3.

<Rachael> If less than 80% of the advanced tests pass, capture errors for work and examine the location of the failures. If the failures do not affect the ability of users in any of the functional areas to complete key tasks, rate the site a 3.

<Rachael> If less than 80% of advanced tests pass and failures are within the key tasks, capture the errors and rate the test a 2.
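
A minimal sketch of the rating steps above, not part of the proposal itself. It assumes "80%" means at least 80%, reads the step 3 branches as referring to advanced tests, and treats the 0 and 2 branches as the case where fewer than 80% of tests pass and a failure sits inside a key task. The names `SuiteResult`, `blocks_key_task`, and `rate_site` are hypothetical, and the -1 value follows the suggestion for sites where no testing was done.

```python
from dataclasses import dataclass
from typing import Optional

PASS_THRESHOLD = 0.80  # the "80%" figure from the proposal; assumed inclusive


@dataclass
class SuiteResult:
    """Outcome of one test level (hypothetical structure, not from the proposal)."""
    passed: int
    total: int
    # True if any failure stops a functional area from completing a key task.
    blocks_key_task: bool

    def level_met(self) -> bool:
        # Met when enough tests pass, or when the failures sit outside key tasks.
        pass_rate = self.passed / self.total if self.total else 0.0
        return pass_rate >= PASS_THRESHOLD or not self.blocks_key_task


def rate_site(basic: Optional[SuiteResult],
              advanced: Optional[SuiteResult] = None) -> int:
    """Return -1, 0, 1, 2 or 3 following the basic/advanced steps."""
    if basic is None:
        return -1  # suggested value when no testing was done
    if not basic.level_met():
        return 0   # advanced errors may still be captured, but the rating stays
    if advanced is None:
        return 1   # basic level met, advanced tests not run
    return 3 if advanced.level_met() else 2
```

For example, `rate_site(SuiteResult(18, 20, False), SuiteResult(10, 20, True))` returns 2: the basic level is met, but half of the advanced tests fail and a failure blocks a key task.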

<Fazio_> Chuck: can't achieve higher without totality of prior milestone

david: I think it makes sense to have automated tests be a baseline.

<Zakim> jeanne, you wanted to attempt to channel Peter

<Fazio_> Jeanne: large sites might find automated testing difficult

<Fazio_> Chuck: are we reflecting different EU categories?

<Rachael> +1 if no errors occur within tasks that stop a user in a functional area from completing a task

<jeanne> Noting that these functional areas are not finalized and need work

<Rachael> Functional areas: Usage without vision, Usage with limited vision, Usage without perception of colour, Usage with limited hearing, Usage without vocal capability, Usage with limited manipulation or strength, Usage with limited reach, Minimize photosensitive seizures, Usage with limited cognition

David: who is "our"?

<Fazio_> RM: advantage to this approach puts complexity decision in "our" (the Working Group's) hands

<Fazio_> Chuck: can a test that is both automated and manual be either one, chosen by the tester?

<Fazio_> Jeanne: no

<Fazio_> Jeanne: WGs should not determine what tests are basic and what are advanced, because every test would have to be debated in the WG.

<Fazio_> RM: this method can be adjusted many ways

late my time too!

<jeanne> Charles Hall's email: Task Completion varies by task. Tasks can be granular and discrete actions, like activate the edit button in order to enable input; or broad utilities and goals, like update my account profile with new residence and employment information. So I think we first have to be a bit prescriptive of “task” with common examples.

<jeanne> ... In the UX practice, we often conduct what is referred to as “top task analysis” (some links to resources at the end). Essentially, when there are many broad goals a user may have with a service, we tend to divert attention and resources to supporting those found to be most important. For example, in an ecommerce experience, searching (locating) tends to be more important to the

<jeanne> user than checking out, but successfully checking out tends to be critical to both the user and the business. Task completion is easily measurable for each. In search, completion is when the user successfully activates and navigates to a link from among the provided result set. In checkout, it is when the transaction successfully posts and confirmation feedback is provided.

<jeanne> Top Task Analysis Resources:

<jeanne> Task Analysis<https://www.usability.gov/how-to-and-tools/methods/task-analysis.html> – usability.gov

<jeanne> Top Task Analysis<https://www.usability.de/en/services/methods/top-task-analysis.html> – usability.de

<jeanne> What Really Matters: Focusing on Top Tasks<https://alistapart.com/article/what-really-matters-focusing-on-top-tasks/> – Gerry McGovern

<jeanne> ... How we typically ensure users with disabilities can complete these top tasks, is during usability testing – usually moderated. We recruit people with disabilities for each study, with a goal of 50% participation. However, this is very inconsistent and many disabilities and specific functional needs are not represented.

<Fazio_> Jeanne: Alastair uses a barrier score

<jeanne> Alastair: 1. Barrier score

<jeanne> This is part of a standard audit, and is the auditor's assessment of how big a barrier the issues we found are.

<jeanne> We give a score out of 25 for 4 categories (to give a pseudo-percentage). The categories are quite wide, and we tend to score in 0/5/10/15/20/25 (so not very granular).

<jeanne> - If something is a blocker to a primary task (e.g. can't access 'add to basket' button with keyboard), that's an automatic 25/25.

<jeanne> - If the issues are not blockers (e.g. missing heading levels), 10 or 15, but with consideration that things like colour contrast or language issues can wear you down.

<jeanne> - I don't think we've scored less than 5 in a category, there's always something!

<jeanne> ... That's explained as 'how likely is someone with a disability going to encounter an issue that prevents task completion'. I.e. not *can* they, but how likely. The main benefits are that it helps us differentiate the better/worse sites, prioritise issues, and explain improvements more clearly.

<jeanne> ... Typically our clients want to improve their accessibility and/or fill in an accessibility statement. The ability to compare across sites (fairly) is not a big factor.
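
The barrier score arithmetic described above can be sketched as follows. This is an illustration under assumptions, not Alastair's actual tooling: the four categories are not named in the email, so the keys in the example are placeholders, and `category_score` and `barrier_score` are hypothetical names. A higher total means a bigger barrier.

```python
from typing import Mapping, Tuple

MAX_PER_CATEGORY = 25                   # each category is scored out of 25
ALLOWED_STEPS = (0, 5, 10, 15, 20, 25)  # scores tend to be given in steps of 5
NUM_CATEGORIES = 4                      # 4 x 25 gives the pseudo-percentage


def category_score(auditor_score: int, blocks_primary_task: bool) -> int:
    """Score one category; a blocker to a primary task is an automatic 25/25."""
    if blocks_primary_task:
        return MAX_PER_CATEGORY
    if auditor_score not in ALLOWED_STEPS:
        raise ValueError(f"score must be one of {ALLOWED_STEPS}")
    return auditor_score


def barrier_score(categories: Mapping[str, Tuple[int, bool]]) -> int:
    """Sum the four category scores into a pseudo-percentage out of 100."""
    if len(categories) != NUM_CATEGORIES:
        raise ValueError(f"expected {NUM_CATEGORIES} categories")
    return sum(category_score(score, blocker)
               for score, blocker in categories.values())


# Placeholder category names; (auditor score, blocks a primary task?)
audit = {
    "category_a": (10, False),
    "category_b": (15, False),
    "category_c": (5, False),
    "category_d": (10, True),   # blocker, so this counts as 25
}
print(barrier_score(audit))  # 55
```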

<jeanne> ... 2. Usability testing

<jeanne> Full usability testing, usually with around 12 participants drawn from the general population (with disabilities, obviously). Working with the client to define likely/desired tasks to set the participants.

<jeanne> ... I'm sure there is an in-between level that would work well if you are in the internal team, but as external consultants those methods work well together or separately.

david: We do 2 different...
... we do 2 different kinds of tests, human factors, where you let the client or participant tell you what's going on. We define the primary goals.
... We bring in the users, we ask them how they expect to complete these tasks, or what is it they want to do.
... We compare to the pre-conceived notion of what we expect them to do.
... the other thing we do is Likert scales, 1-5, how difficult, etc. We ask what they would do to make it better, what barrier to remove.
... We do psych tests. Devices that go on head to scan, monitor brain wave activities and monitor while they perform activities.
... we can measure various things like stress, regions of brain activated. These aren't lab grade, but they do well enough.
... They are very easy to have the results analyzed.
... You can utilize neuro-psychologists.
... They have their own scales. You can do eye tracking.... yesterday I sent info on all this.

Chuck says that's really impressive.

How much for one of those brain scan devices?

I want one.

Jeanne: I think all three of these have talked about it in the user research type model. Is there anybody who has experience doing this with more of a coga-walk-through approach?
... I think we'll need to include that for the smaller companies that can't afford expensive user research.
... I'll ask Charles Hall. He started us thinking about that.

David: I sent a minute ago, Kathy Chen from HP, she sent in an anonymous sample.
... she's happy to contribute.

Jeanne: I'm looking, but it's not what we need. She's talking about how they test it as opposed to "here's the results we obtained".
... That we could...

David: Sure, it's supplemental.

<Fazio_> blockers to primary tasks arethx Chuck :p

Jeanne: I'll save it and see if there's something we can do with it. It's kind of difficult to talk about some things, we use the same words for different meanings.
... She used "test cases".
... 'sall good. Nothing goes to waste.
... Given some ideas we've heard tonight. Any ideas on what we could do?

<Zakim> Rachael, you wanted to say that this moves the complexity into deciding which tests are basic

back to you David.

<Fazio_> Kim: clear distinction between user research and studies and post release in her company, but this group is lumping them all together

david: That's why Jeanne asked about cognitive walk-throughs. Be more like taking a wire frame and walking through the steps.

<Zakim> KimD, you wanted to ask about kinds of testing

<Fazio_> Jeanne: when task completion testing first came up we talked about acknowledging extra accessibility effort

<Fazio_> Kim: our company user tests during iteration

<Fazio_> Jeanne: Task completion testing should not be done at the end of development only

<Fazio_> Kim: Task completion testing helps determine sample

<Fazio_> Kim: task completion is used in her company to determine testing scope

<Fazio_> Kim: task completion testing should be done at end

david: We have tons of attorneys in the Bay Area!

janina: Groups of blind attorneys in NFB and ...

david: do you offer honorarium compensation?

Kim: Not sure.

hardish stop at top of hour... gotta go.

Thanks David for scribing!
