Silver Task Force & Community Group -- 03 Apr 2020

<jeanne> model

<scribe> scribe: bruce_bailey

<Chuck> I am in circumstances where I have to be extremely quiet, because I have two other family members in critical phone calls at this time. So my audio participation will be zero if possible.

Chatter about Zoom privacy and security issues

Zoom adoption has been a problem in the federal space too

@chuck try the slash me command...

<ChrisLoiselle> +present

<Chuck> scribe?

<Lauriat> https://docs.google.com/document/d/1D18qg5pvne94jNvUwvj_Of36E8re6AB4tubdLpmhWOw/

<Chuck> I can

<Chuck> I will back Bruce up.

Rachael to give overview of Tuesday night call

<Rachael> https://docs.google.com/document/d/1D18qg5pvne94jNvUwvj_Of36E8re6AB4tubdLpmhWOw/edit#heading=h.qrkd2z50hnm0

Rachael: drafted a document that builds on Bruce's proposal and divides tests into good / excellent
... a basic test would be: Does the page have a heading?
... a more advanced test would be whether the headings are semantic and correct
... the middle would be in between
... what makes a basic test versus an advanced test could be divided up
... did automated versus manual for now
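
To illustrate the basic-versus-advanced split described above, here is a minimal sketch. It is not taken from the draft document. The function names are hypothetical, and the rule used for "semantic and correct" (first heading is h1, no skipped levels) is an assumption chosen only for illustration.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects the levels (1-6) of h1..h6 start tags in document order."""

    def __init__(self):
        super().__init__()
        self.levels = []

    def handle_starttag(self, tag, attrs):
        # Match exactly h1..h6 (the parser lowercases tag names).
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.levels.append(int(tag[1]))


def heading_levels(html):
    collector = HeadingCollector()
    collector.feed(html)
    return collector.levels


def basic_heading_test(html):
    """Basic test: does the page have at least one heading?"""
    return len(heading_levels(html)) > 0


def advanced_heading_test(html):
    """Advanced test (assumed rule): starts at h1 and never skips a level going down."""
    levels = heading_levels(html)
    if not levels or levels[0] != 1:
        return False
    return all(nxt - cur <= 1 for cur, nxt in zip(levels, levels[1:]))
```

A page can pass the basic test and still fail the advanced one, for example when an h1 is followed directly by an h3.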

Jeanne: thanks, one of the things that came out of the meeting was GL versus task completion
... Kim raised the important point that task completion testing should not trigger the unintended consequence of having PWD test at the end instead of at the design phase
... Jeanne agrees that user testing needs to be at the beginning

Peter Korn: Sounds like there was no discussion of how the adjective levels relate to how central the tested portion of the website is?

Jeanne: Yes, but we started with just adjectival testing first

Rachael: We did not talk about it, but the centrality of the content could affect the adjective assigned

Charles Hall: Suppose issue is my iFrame needs a Title

... but suppose the technical requirement is not met, but there is no functional impact. Is this a 2?

Rachael: No, that's a 1 because it is not critical.
... plus 1 if passing a more advanced test, so that example stays at 1

Peter Korn: Could we create a shared doc for exploring edge cases for running proposals against?

<Rachael> +1 to a set of test cases

scribe: Charles's example of a 1-pixel untitled iFrame (for tracking) is a great example to see how it is scored.

Jeanne: Working on it now.

<jeanne> https://docs.google.com/document/d/15i3eW2USEctoZtWkhTzpzJEm1zRSckgHPM8J4Wo0Wzc/edit#heading=h.wwerdh4hmke

Shawn Lauriat: agree good idea. Where?

[see Jeanne's link]

SL: I think we need examples like JF's, ranging from good to terrible, arranged per GL
... but we need a similar approach for alt text.
... One thing to be careful about: we are using examples to test GL scoring, not as models for Understanding
... Not for complete and total conformance, but for stress testing scoring
... Might be more granular than we need.

Jeanne: We will let Peter plug in some examples, come back to it on this call or the next.

SL: OK

Bruce has no comments.

<CharlesHall> question on “Is the task a guideline task vs a task completion”

<CharlesHall> do we have a definition of each?

Bruce: the plus ones were not completely clear to me

CH: Do we have a definition of GL task versus task completion?
... What is a GL task?

Rachael: that is a typo, should be GL test

<Rachael> corrected: Is the task a guideline test vs a task completion test

Rachael: The idea is to get a base score, basic vs advanced
... if it does not pass the basic test, then could adjust
... look at what fails
... picked 80% as a starting point
... if no critical issues, then could give +1
... example, errors are in the footer only, so maybe go from 0 to 1

<Fazio_> we need to somehow flag certain failures that contribute to user stress and mental fatigue, adding them up to come up with a cognitive failure. Mental fatigue and user stress can shut users down mentally, and trigger psychotic episodes, depression, PTSD

Rachael: if all basic pass, then can look at adding 1 or 2

<jeanne> +1 to overall level of cognitive failure

David Fazio: non-critical failures need to aggregate to result in failure for coga reasons

SL: Yes, we want to avoid passing and just marking it down to being too hard
... want to get away from that sort of issue with previous testing
... something we want to try out.
... For each test, we have results one can point to, but want to recognize added cognitive load
... but re-look at the end for a holistic review

Charles Hall: Can look at each, but need to look for cumulative effect

DF: Each test could have a number for mental stress, then look at additive result at end.

<PeterKorn> David - makes sense to me.

DF: So two ways of looking at the problem: at each step, and at the end.
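
The "number per test, then an additive result at the end" idea can be sketched as follows. This is only an illustration under stated assumptions: the `StepResult` type, the 0-3 stress scale, and both limits are hypothetical and were not proposed on the call.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    name: str
    passed: bool
    stress: int  # hypothetical 0-3 rating of added mental load for this step


def stressful_steps(results, step_limit=3):
    """Per-step view: names of steps whose own stress rating reaches the limit."""
    return [r.name for r in results if r.stress >= step_limit]


def cumulative_stress(results):
    """End-of-task view: the additive result across all steps."""
    return sum(r.stress for r in results)


def cognitive_failure(results, total_limit=5):
    """Flags a cognitive failure when the summed stress exceeds the assumed limit,
    even if every individual step passed its own test."""
    return cumulative_stress(results) > total_limit
```

For example, three passing steps rated 2, 1 and 3 sum to 6, which exceeds the assumed limit of 5 and would be flagged even though no single test failed.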

Rachael: That is intent for plus ones
... Mostly a problem for COGA, but can be an issue with other disability types

<CharlesHall> so we need a death by a thousand cuts test

<Rachael> you would never have more than a +1

<Rachael> +1 if more than 80% of possible tests pass (if 80% of basic tests are passed it becomes a 1, if 80% of advanced tests pass it becomes a 3)

<Rachael> +1 if no errors occur within tasks that stop a user in a functional area from completing a task

Rachael: more true for COGA, but happens for other users: Many small errors can become large errors over time.
... corrects scribing: never +2, only +1
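
One possible reading of the rules typed above can be sketched in code. This is an interpretation, not the proposal itself: the function name and parameters are hypothetical, "80%" is treated as "at least 80%", and the single +1 is applied when no task-blocking errors exist.

```python
def adjectival_score(basic_passed, basic_total,
                     advanced_passed, advanced_total,
                     blocking_errors):
    """Returns a numeric rating under one assumed reading of the 80% rule."""
    threshold = 0.8  # starting value mentioned on the call

    def ratio(passed, total):
        return passed / total if total else 0.0

    # Base score: 0 below the basic threshold, 1 when basic tests clear it,
    # 3 when advanced tests clear it as well.
    if ratio(basic_passed, basic_total) < threshold:
        base = 0
    elif ratio(advanced_passed, advanced_total) < threshold:
        base = 1
    else:
        base = 3

    # At most one bonus point, never +2.
    bonus = 1 if blocking_errors == 0 else 0
    return base + bonus
```

Under this reading, the footer-only example (basic tests below 80%, but nothing blocks a task) moves from 0 to 1.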

PK: I like the merge with task analysis, a site riddled with paper cuts needs to be part of the analysis

DF: Agreed, won't apply to everything or even each step, but still can be a huge obstacle

<Zakim> sajkaj, you wanted to say that cognitive stress may reduce with site familiarity

Janina Sajka: How does familiarity with the site factor into testing?

<Zakim> Chuck, you wanted to say that if it's once every 10 years it really needs to be right for the once.

A census site (used once every 10 years) has different needs than a work time-and-attendance system.

SL: Agreed, cognitive load seems tied to the first instance of use.
... Census has a learning curve and is once-and-done
... Office system can take into account familiarity over time.

<Fazio_> I wonder if we can benchmark neuropsychology evals

continue discussion of integrating different proposals in Silver

Jeanne: We are looking at pieces from the last two weeks and seeing if and how they might fit together

<Zakim> Chuck, you wanted to say we have a lot of recent proposals, each with strengths and weaknesses. Can we review strengths?

Jeanne: the other recent topic is multiple currencies

<Chuck> Chuck will scribe for Bruce if needed.

<Chuck> Bruce: I have not seen multiple currencies added to the other proposals. I think that's ok. Rachael has a system for going above and beyond the baseline, which is... the core thing I'm addressing

<Chuck> Bruce: how do you allow for people doing more than the minimum for some guidelines without ensuring you are circling back to the other guidelines. Seems addressed in Rachael's.

<Chuck> Bruce: JF keeps going back to FICO, multiple currencies could be an issue there. new thing from me, if there's fewer achievements, maybe it becomes less of a burning need for multiple currencies.

<Chuck> Bruce: settlement and consent decrees, there's a requirement to test with users. They don't talk about pennies and ribbons, just requirements. Maybe just a few achievements to address with the adopting agencies.

<Chuck> Bruce: Even if it doesn't show up in scoring it's still a requirement.

Peter Korn: Where would we see WCAG2 AA in an alternative currency?

PK: Assume WCAG2 Double A is the legal requirement. But page does a few AAA items. How to get credit for that?

SL: We are moving away from this strict divide, so how does this work as a problem?

PK: Using the current good-enough goal, which pencils out to AA, what do we do with AAA things? How does a site get credit?
... If we shoot for a certain bar, how do things above that become credit with a second currency?

Jeanne: We are trying to get away from saying certain SC are more important than others.
... so that has left AAA SC being neglected. We want to change that.

<Chuck> Chuck can't talk, but that's my q (I typed it in).

<Zakim> Chuck, you wanted to say take sign language as an example. If a site includes a video of sign language, is that part of the basic scoring or is that a ribbon?

<Fazio_> Also, a lot of guidelines for other disabilities apply to cognitive

<Fazio_> like for hearing and vision

SL: For Silver, we want to get away from that sort of test being separate.

Charles Hall: Suppose basic requirement is 4.5:1 or 80% SAPC. Which currency do we get points for 3:1 or 100% SAPC?

<Chuck> ach ch

Charles Hall: I understood the 2nd currency to be for awards for testing. Did I evaluate how other disabilities benefited from high contrast?

Jeanne: Somewhat, there are things outside of conformance that might be acknowledged, like testing with people with disabilities.
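
For reference, the 4.5:1 and 3:1 figures in this exchange are WCAG 2.x contrast ratios. A minimal sketch of that calculation, following the relative luminance definition in WCAG 2.x, is below; the function names are illustrative, and SAPC percentages are a different measure that this sketch does not compute.

```python
def _linear(channel_8bit):
    """Converts one 8-bit sRGB channel to its linear value per WCAG 2.x."""
    c = channel_8bit / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb):
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg, bg):
    """Returns the WCAG 2.x contrast ratio, from 1.0 up to 21.0."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)
```

Black on white yields the maximum ratio of 21:1, while the gray #767676 on white sits just above the 4.5:1 threshold.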

<Chuck> just making a note

<Chuck> I won't need to talk or we don't need to discuss.

<Chuck> It was a technicality

CH: So to Peter's question, this goes to 2nd currency because it is not an additional set of people.

<Zakim> Chuck, you wanted to say I understand the q, about what do you get if you go beyond the minimum of contrast standard. Technically, we have learned that a higher ratio may NOT be

<Chuck> Bruce: That was one of the ... the contrast example is a good one. Ranges of higher contrast that are generally better overall.

<Chuck> Bruce: 100% SAPC which is not black on white but is strong compared to 80%, a bunch of small things like that could be added to second currency. If you get enough of those, maybe you can note that.

<Chuck> Bruce: We've met bronze, and we've earned ribbons... another ribbon could be getting these extra bonus points. The answer can be "yes" depending on how you structure the multiple currencies.

<Chuck> Bruce: distinct or cumulative by doing a bunch of little improvements.

Jeanne: Prefers Rachael's proposal.

Bruce: +1

Jeanne: So we probably do not need 2nd currency in addition to adjectival ratings

<Chuck> +1 to Rachael's proposal.

Jeanne: Doing research can be acknowledged separately
... but missing some AAA SC might not mean failing
... but can't miss all of them
... so if other scores are high enough, that might make up for failing other pieces

PK: hopefully not too much of a monkey wrench, but for a long time with 2.x we have had an issue with conforming alternative versions
... nowadays we are seeing mobile versions that are an alternative (accessible) version
... could a voice-only smart speaker be allowed as a way of meeting requirements?

SL: No.
... Each version should be evaluated separately
... So the main app could fail, and the alternative could then get high marks.

PK: Suppose main site has paper cuts, but speech version is so good it is magical?
... What could it say about the accessibility of the service over all?

SL: That seems outside of scope of conformance as we have been working so far.
... but could specify how some version works well for certain cases on certain platforms
... but that is outside of how we have been talking about conformance claims so far

[Shawn and Peter agree this is a problem for large companies]

<kirkwood> very good question Peter

SL: We will talk about ACT next time

Jeanne would like Wilco there.
