W3C

– DRAFT –
AGWG Teleconference

11 October 2022

Attendees

Present
alastairc, bruce_bailey, Chuck, Daniel, Fazio, Francis_Storr, GreggVan, JakeAbma, jaunita_george, jeanne, Jennie, jon_avila, joweismantel, Judy, Katie_Haritos-Shea, kirkwood, Laura_Carlson, Lauriat, Makoto, maryjom, mbgower, MichaelC, Peter_Bossley, Raf, sarahhorton, shadi, ShawnT, stefans, SuzanneTaylor, wendyreid, Wilco
Regrets
AzlanC, BruceB, ToddL
Chair
alastairc
Scribe
Chuck, Laura, wendyreid

Meeting minutes

<alastairc> https://www.w3.org/WAI/GL/wiki/Scribe_List

<bruce_bailey> regrets for second hour

alastairc: Does anyone have any announcements?

Announcements

Chuck: There will be a survey for WCAG2ICT
… one of the upcoming agenda items
… if there are any questions, q+

bruce_bailey: the Interagency Accessibility Forum is happening this week
… will post a link in the agenda

<Zakim> bruce_bailey, you wanted to mention IAAF

<bruce_bailey> https://www.section508.gov/iaaf/agenda-2022/

alastairc: First and only item for today

Continue conformance conversation https://docs.google.com/presentation/d/1yLYeNcybGxRu43KdrVUcOCL6iXsy6-gxl9-lbyr90dI/edit#slide=id.p

alastairc: last week's conversation on conformance
… we will talk through topics, first is adjectival scoring
… coming from conversations at TPAC
… looking to move on to potential solutions for these topics
… we've already reviewed percentages, 100% pass fail, scoring approaches
… issue severity, prioritizing by functional need
… scores, protocols
… starting with adjectival
… then going to move on to how we use these things together
… what is the best model for WCAG 3

Rachael: This is the last of the conformance scoring concepts
… as a reminder, these can all layer over each other
… we want to think through some of the ways we can move through these
… adjectival scoring, bands to improve understanding
… motivational, easier to summarize bands
… challenges, it's potentially confusing

Adjectival Scoring

Rachael: it's hard to spell, adds subjective judgement
… increases complexity
… challenges with i18n
… slide 12
… potential solutions
… Laura has pointed out, could subjectivity be mitigated by defining what falls into each band
… tying them to different approaches
… bronze to WCAG 2.2 AA, etc

<Fazio> regardless of approach we use I think we should somehow give credit for WCAG 2.x testing

alastairc: Next steps, are there any key benefits or challenges
… anything that would improve this solution

GreggVan: Has there been any thought to, seven point scales, or three
… it seems that reliability would go down, or stress, if we define these

alastairc: Question, did we consider scales
… poor to best in levels


GreggVan: Three point, might be better, everything could fall into the middle
… curious is there further thought

Rachael: We've had some of that work done in the first public working draft
… different scales

<Zakim> bruce_bailey, you wanted to suggest Venn Diagram to illustrate that conformance ideas are not exclusive to each other ?

bruce_bailey: Just had the idea, can we illustrate that the conformance ideas are compatible with each other
… there could be overlap
… possibly combinable

alastairc: How do we evaluate, look at the various models

Wilco: I don't think it's necessarily obvious that the same scale works for everything
… true/false seems to work well for most things, but not everything needs the same scaling

alastairc: I think when we've discussed this previously, for some things the scale is 2
… poor or best

<jeanne> +1 to not everything has to be the same scale.

alastairc: i.e. language on the page, two options, allowing for some things not to have a full scale
… from previous discussion, we didn't want different scales for everything with ranging numbers
… benefits, challenges?
… how would you solve the challenges

Chuck: There are problems with this solution, I'll admit
… one thing I learned, if you had individuals, and they had different results, but you averaged them, the average would work

alastairc: Is that trying to solve subjective difference

Chuck: And inter-rater reliability

<Zakim> Lauriat, you wanted to note the judgement expression as a feature and not a bug

Lauriat: As a note on inter-rater reliability, having a scale for subjective judgement
… one person's failure is another person's pass
… having a way to express "I marked it as x because..." vs a simple yes/no
… if you have an article about a park, with a photo of a dog, and the alt text is subjective
… this is something we need to test to confirm
… I'd rather prototype it, test it

Rachael: Chair hat off, something I have been thinking about
… tree structures of assessment
… rather than a single test
… a tree of tests
… alt text is an example of that
… is there alt, yes no
… is the quality good
… potentially, is there broader context that needs to be applied
… like a museum
… it might be possible to assign adjectival ratings to parts of the tree

<alastairc> +1 to Lauriat's comments, having bands for something like "equivalent" is easier than pass/fail.

I will scribe for wendy when she speaks

Rachael: some things would only have certain levels

<Lauriat> +1, exactly my thinking of how ACT can help guide that

wendyreid: Super strong +1 to Rachael. I'm getting into testing talk, this is my first thought for what we described so far.

wendyreid: There is a lot of strong yes/no in current WCAG. Unit test level is "does this page have a title". Second question is "is the title good".

wendyreid: There are lots of things that start with yes/no, and you go further down. The following questions can be subjective.

wendyreid: The tree structure can address the concerns, with subjectivity, time, etc.

back to wendy

Wilco: Seems to me that how you test, and how to break them up, is not correlated to quality
… skeptical of that approach

alastairc: How good what is?

Wilco: If you use different tests to measure an adjectival rating, and you fail the first level, this
… I don't think this is how tests are broken up today

alastairc: Are there guidelines that come to mind?

Wilco: Things stacking on other things, some of these scaled up into ACT, but not great for determining quality
… I wouldn't use them for that

<Zakim> Lauriat, you wanted to note I heard that as more of where to make decisions about rating, rather than direct correlation

Lauriat: +1 to Wilco
… I understood what Rachael described as educational and supportive, how to walk through a test
… here's where you can make these decisions
… possibly problematic
… if it fails x, it fails completely

<Zakim> mbgower, you wanted to say that an overall approach should help us align clear requirements with testing

Lauriat: but if you get to a point to assess quality, here's how to do that

mbgower: Thanks for the discussion, relate this to something Ken Nakata said in his keynote
… the problem with WCAG is that it can be vague and complex at the same time
… to Wilco's point
… if one path forward is a more mindful approach to look for clarity
… friction, we need to ensure our tests are moving people to clarity
… more reliability
… dependent on the clarity of requirements
… the scoring adds to that clarity

alastairc: Adjectival scoring would be more useful if we start with objective tests, then build on subjective ones

<Zakim> SuzanneTaylor, you wanted to suggest robust examples for each band, as a way to vet the bands now, to clarify the bands for testers in the future, and to also help guide designers and developers in the future

<laura> +1 to MG We need to keep the distinction between simple and clear front of mind.

SuzanneTaylor: We should list a robust set of examples as a solution

alastairc: Informative documentation

SuzanneTaylor: Robust examples to help vet the bands, for us and for designers and developers

alastairc: Which we've done in WCAG 2.x

GreggVan: This discussion gives me an idea
… I'm always concerned about people being required to do something
… but the goal is unclear
… adjectival seems unclear
… but we want to move past the binary
… what if we could combine adjectival, with a binary
… when done, you either do or don't conform, but with a score
… You got C's, you passed, but you could also pass with A's

<Lauriat> +1, something we've noted as something we should investigate, so let's definitely not lose that

GreggVan: encourage to do better

alastairc: I think that aligns

Combine adjectival with minimum level to pass

alastairc: to build on mbgower's example, do images have alt, then how good is the alt, etc
… concept of equivalence

+1 to exploring gv's idea

alastairc: but with clear cut bands, each instance is a minimum requirement, it moves it along
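As an illustrative sketch of combining a binary conformance decision with an adjectival grade (the bands and the minimum band are assumed for the example, not proposed values): every instance must reach a minimum band to conform at all, and the overall grade reported alongside reflects how well it was done.

// Illustrative only: conformance stays pass/fail, but a grade is reported alongside it.
type Band = "poor" | "fair" | "good" | "excellent"; // hypothetical adjectival bands
const bandValue: Record<Band, number> = { poor: 0, fair: 1, good: 2, excellent: 3 };

const MINIMUM_BAND: Band = "fair"; // assumed minimum every instance must reach to conform

interface ConformanceResult {
  conforms: boolean;
  grade: Band | null; // overall grade only reported when the minimum is met
}

function evaluate(instanceRatings: Band[]): ConformanceResult {
  const conforms =
    instanceRatings.length > 0 &&
    instanceRatings.every(r => bandValue[r] >= bandValue[MINIMUM_BAND]);
  if (!conforms) return { conforms: false, grade: null };
  // Overall grade: the lowest band achieved across instances
  // ("you got C's, you passed, but you could also pass with A's").
  const lowest = instanceRatings.reduce((a, b) => (bandValue[a] <= bandValue[b] ? a : b));
  return { conforms: true, grade: lowest };
}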

<Zakim> mbgower, you wanted to say I did a presentation on how to get to clarity on ALT a couple months back

mbgower: I just wanted to remind folks I did a playback on alt on images
… I proposed, what if every alt had a 2-5 word short description
… method to designate whether an image was important
… every image must have a short description
… if important, needs more
… then get to adjectival score of quality
… binary of whether it is poor or good
… then more detail for the higher levels
… requires us to go back to the testing for requirements

alastairc: Sounds like a good example
… something to try on other guidelines

<Ryladog> +1 to MG

alastairc: considering the issues we've discussed
… are there other solutions?
… good options there
… this is one of our scoring type options
… evaluating issue severity

Rachael: Just a reminder as we discuss issue severity
… we have already done this in WCAG 2 with A, AA, AAA
… different ways to do that levelling
… some benefits, addresses absolute barriers while recognizing difficulty of achieving perfection
… it does integrate and standardize
… prioritization, it happens informally outside of WCAG
… incorporate context

Issue severity

Rachael: reflect impact
… challenges to solve
… conversation on the list and in the previous meeting
… what is severe for one person in one context and task, may not be for another
… we need to define and test consistently
… who determines the context or task outside of the specification
… did we miss any of the pros and cons?

alastairc: Are we missing anything?

Ryladog: One thing I always include in informal prioritization, even when specific, higher than A
… if certain things aren't working, they are highest priority

alastairc: We had a subgroup working on this
… the main thing we worked through was looking at all of the tests in the FPWD, put in the spreadsheet
… group looked at them and classified as critical errors
… easier to point at critical than serious, medium, minor, etc
… grading the lower levels was harder
… as pointed out on the list
… across scenarios, it differs
… separate but integrated methods
… given a set of issues, whether you're working with or are the site owner, you may have some prioritization of the tasks, if you have a barrier that has been raised in testing
… info image missing alt text, if you took that away, what is the impact
… judgement of the barrier and the impact
… but this may not be part of the initial testing
… maybe to help move from bronze to silver, onwards

sarahhorton: I was going to add, a benefit, based on how we've been talking about it, the issues in the way we break them down in a matrix
… moving from identifying something as critical, critical for whom
… using functional and user needs
… one of the main design goals of WCAG 3 is to bring user needs more central to the standards
… this is critical to users with this functional need

shadi: We might be talking about different aspects or purposes of testing
… what we mean
… there is testing as in "did I meet the criteria"
… clarity of did I do this or not
… we also need the outside view, does it look like this content has met these criteria
… this is more of a statistical thing, check a sample
… is the claim correct
… another dimension, when we try to score
… is it likely that the claim is true
… what is the ranking of one vs the other
… I may be wrong, but I think we're discussing different goals, purposes for the testing

alastairc: In terms of approached or goals
… for the sub group, better aligning results with lived experience
… having a "better ruler"
… working on the feasibility at the guideline level
… we're meeting again tomorrow
… potential solution listed
… slide 15
… incorporating context with site owners

GreggVan: I don't have a solution
… we discussed before, we did severity before A AA AAA
… cognitive group points out that the cognitive are viewed as less severe
… something with no alt text is easy to envision as a blocker, but something being a blocker for someone with a cognitive disability is harder to envision
… physical barriers will appear to be more concrete

<Fazio> +1 COGA would have concerns about severity

GreggVan: than cognitive barriers

<Ryladog> +1

GreggVan: I worry that cognitive will be harder to measure

alastairc: It has come up
… chair hat off, I think it's something we need to actively keep an eye on

<Rachael> I added that as an additional challenge

alastairc: it's easier to assess concrete requirements
… but the cognitive view brings in other requirements
… if we look at a per-instance basis, it could be challenging to view pages as a whole
… difficult to assess at the guideline or test level
… but if you're looking at a higher level, site level, might be possible

shadi: The question to me, COGA could have concerns, could this also be an improvement for the community
… improvement from the current status

<Fazio> has yet to be seen

alastairc: The group should evaluate based on functional needs, assess with the various groups, involve COGA

Ryladog: I also wonder a little bit about the owner determining this; if we separate these things by need, an owner might just leave out the COGA stuff because they are a particular kind of site
… challenge with separating out
… in general usability, design will say we don't want this amount of info, we don't want more than x number of steps
… design can say that, I wonder if we need to look at that

<Fazio> rule of 4 in short term memory

Ryladog: work around the way people may avoid certain use cases
… let people make choices

shadi: Good point Ryladog, ruling out like that categorically, the model proposed would not work that way

alastairc: To allow site owners to choose what they test?
… depends on how we present it
… we'd want to avoid people picking and choosing between groups

I will scribe for wendy when she speaks.

alastairc: from the POV of people choosing, potentially a WCAG-EM style of assessment
… what would people need

Rachael: In this conversation, there are two kinds of critical errors
… one is clear, i.e. flashing, but others are more nuanced
… the outside ones can be handled as a protocol

<Zakim> Lauriat, you wanted to +1 mbgower's previous note on clarity as an absolute essential for this

Lauriat: +1 to Rachael
… wanted to raise one of mbgower points, the adjectival, it's the other side of expressing issue severity
… the previous point around clarity, it's very much true for severity
… it's clear for people to follow and understand

<jeanne> +1 to clarity and testing

<Jennie> +1 to Rachael. Certain criteria that issue severity may not be appropriate for could use other types of evaluation. Example: if related to cognitive accessibility

Lauriat: test it to understand

wendyreid: I see the role of issue severity. We see it today with what vendors are doing.

wendyreid: I get things assigned priority depending on impact, and user impact.

wendyreid: This works because I'm on an agile dev team.

wendyreid: I see the issue about not identifying by disability type. There is an element in the market of exclusion on purpose.

wendyreid: When a site owner raises accessibility of their site, they put focus on different things on user base they know.

wendyreid: Use cases and scenarios most important. It feels wrong to say we won't focus on a group now, but the reality is that they kind of have to issue priority based on use case or the product and service they sell.

wendyreid: Dev teams have to prioritize. I think it can feel challenging, but the subject matter experts do play a role in choosing severity.

Ryladog: I know that's the reality

<Fazio> Prioritize COGA as most severe :)

Ryladog: when one goes for conformance, somehow, we're measuring for each user group
… this site is a 2, on a scale of 1-4, it's not covering every user group

+1

alastairc: In general, we do try to approach things from a baseline and improve from there

Jennie: I found Wendy's point interesting, as someone who scores points, the people with cognitive disabilities who are part of a user base, the people making those decisions are unaware of them
… or could be part of the user base
… need to get more needs of cognitive users recognized
… I agree, there are ways to use multiple strategies for scoring


Fazio: If you think about cognitive needs, that should be the most severe level

<kirkwood> categorize severity by user group should be a seriously considered approach

Fazio: workaround where cognitive accessibility as highest level need

alastairc: We need to get guidelines in as part of conformance
… and work with COGA to define severity

<Zakim> Chuck, you wanted to ask for a scribe change

thank you wendy!

Prioritizing by Functional Needs

<Fazio> +1 COGA encompasses many groups

gregg: prioritising by group. COGA is thought of as one group, but it is many groups.

<Jay_Mullen> All I can say on issue severity (new here) is that when I deliver an audit there is the core rating of severity of Critical / Serious / Moderate / Minor / Best Practice, and that is great, but often when it's a large site you have hundreds of violations between each, and a product owner cannot really parse or understand well if they are not an expert, and they do not know where to start. So what we do is provide a second level of priority, 1-5 (1 being the highest priority), which is defined by details such as: is it a repeated pattern, is it part of a critical flow, does it impact a wide audience versus a small audience. The severity always stays critical but it is further set apart and differentiated by this second level that product owners will rely on to prioritize development actions on agile teams.
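A hypothetical sketch of the two-level approach Jay describes, with the priority factors and their weights invented for illustration: the severity rating stays as assigned, and a separate 1-5 priority is derived from whether the issue is a repeated pattern, sits in a critical flow, and affects a wide audience.

// Illustrative only: severity is unchanged; priority is a second axis for product owners.
type Severity = "Critical" | "Serious" | "Moderate" | "Minor" | "Best Practice";

interface Violation {
  severity: Severity;
  isRepeatedPattern: boolean; // appears across many pages/components
  inCriticalFlow: boolean;    // blocks a key user task (e.g. checkout)
  wideAudience: boolean;      // affects a large share of users
}

// Returns 1 (highest priority) to 5 (lowest), independent of severity.
function priority(v: Violation): number {
  let score = 5;
  if (v.inCriticalFlow) score -= 2;
  if (v.isRepeatedPattern) score -= 1;
  if (v.wideAudience) score -= 1;
  return Math.max(1, score);
}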

rm: functional needs doc has many groupings listed.

gregg: want to plant the idea of high and low.

<Fazio> I love the functional needs approach

rm: key benefit of functional needs.
… has challenges.
… has risk of bias
… If the number of functional needs matters, then it motivates greater subdivision in the FAST (may move inequality to FAST)
… May introduce biases against those functional needs that appear less frequently
… Adds complexity.

ac: any benefits or challenges?

<jeanne> +1 to being careful that we are improving equity

jk: integration with lawsuits.
… may match the real world. Gives impetus to do this.

ac: challenge: which are applicable across tech can vary.
… some tech may be more usable than others.

<kirkwood> lawsuits are by user groups with specific functional needs.

ac: hesitant to use this as part of conformance.

jennie: functional needs could inform other scoring mechanisms.

ac: looking at the test level.
… any solutions?

rm: reworking FAST.

gregg: Also need to consider combination of disabilities.

sh: could be another case where context is important

<Rachael> Link to FAST for anyone who hasn't seen it : https://w3c.github.io/fast/

sh: prioritizing in a closed environment like a kiosk.
… there is a contextual aspect to this.

RM: Weighting.
… it is a concept.
… Allows inclusion outcomes with lower benefits without skewing conformance
… Allowing lower weighted items may be a way to bring advisory techniques into conformance
… Adds complexity and is more difficult to understand
… Difficult problem to correctly weight
… Weighting by testing could be a problem for regulators
… have no solutions yet.

ac: back away slowly.

<jeanne> +1 to alastair's caution

<Ryladog> A good example of a lower weight item is people with severe ammonia allergies who must always wear a mask

<kirkwood> Can’t we just give the ability to measure which needs are covered and let the owner determine how to prioritize/weight/points

ja: Katie mentioned positive statements.

<jeanne> +1 to JA

ja: show people can take a standard approach or give them another way

ac: maybe have baseline conformance, but give them something extra to score points.

<Fazio> no. Because more important but more difficult things would always get left out in lieu of easy outs

<jon_avila> it would have to be to support the same disability

<Zakim> jeanne, you wanted to having flexibility in context and could be tied to minimum by Functional Need

<jeanne> +1 to Sarah's example -- that was what was I meant by flexibility in context

sh: headed in the direction of 3rd party content, and allowing not to meet criteria.
… doesn't mean I think it is a good idea.

ac: any solutions?

chuck: maybe look at an example.
… or maybe this is a warning sign that this isn't the right approach.

<Zakim> jeanne, you wanted to talk about weighting prototypes

jeanne: did work in 2019 on weighting.

<Chuck> I thought so, I thought that these existed, just couldn't find examples on the fly.

jeanne: take away is we didn't get any of them to work.

<kirkwood> agree with Jeanne

<Zakim> Chuck, you wanted to say that to give this "weighting" a fair opportunity, maybe invite any individuals to come up with a proposal if desired, otherwise we deem that there is not much support.

jeanne: think we should not pursue.

gregg: spent a couple of years on this & couldn't get it to work.

RM: Setting Minimum Scores
… Key Benefits
… Provides a baseline for prioritization and may motivate getting started
… A way to allow more flexible conformance approaches
… May promote equity
… Challenges to solve:
… Risk of gaming
… Organizations may adopt minimum and claim conformance. If less than WCAG 2, this would hurt progress
… Risks compliance stopping at the minimum and important needs being left out
… Adds complexity and is more difficult to understand

mg: minimum score may help improve accessibility.

<Zakim> alastairc, you wanted to suggest 'gating' rather than minumums, in general

<Ryladog> improved authentication will eventually help with that

chuck: could be a problem: site with one problem area but rest of the site is great

ac: more issues at A than AA. Issue with gating.
… could have a way of progression.

<Zakim> mbgower, you wanted to say emphasis on user process can help offset what Chuck is pointing out

<Chuck> +1

mg: emphasis on user process could help mitigate issues.

gregg: worry that parts of a process may be critical but we leave them out.

Protocols

rm: Being able to complete some kind of additional process (such as user testing) after testing to gain some type of additional credit
… Allows reporting conformance on things that can’t be tested with high inter-rater reliability
… May be easier for decision makers to understand and adopt based on their situations
… May motivate orgs to go above minimum accessibility (and provide ways to do so)
… Challenges to solve
… How to reduce gaming of the system
… Administrative burden for AG if more than just a few protocols are needed
… solutions:
… This could be done by affirmation

<Fazio> like our maturity model

katie: yes, ISO 9000/9001 and affirmation
… what about pairing? where does that fall?

<Zakim> Rachael, you wanted to answer

katie: should be extra points for it.

rm: good question. We should come back to it.

<Ryladog> Accessible Pairing

ac: Protocols are a good way to extend conformance.

rm: different ways it could play.

ac: seems like a necessary thing for functional needs.

sh: could provide hooks for regulators.

<Ryladog> Pointing to existing other standards

<Zakim> alastairc, you wanted to add example for broadcasters

ac: like the idea of provide hooks for regulators.
… could be an extension.

<Zakim> SuzanneTaylor, you wanted to say that protocols could be as simple as triple A broken down into categories

st: could be like WCAG AAA.

<Ryladog> +1

st: could encourage better accessibility

Evaluating conformance proposals

rm: Requirements Document, Accessibility Guidelines & Guidelines Process.
… we have metrics.
… Considerations for WCAG 3 Conformance
… we have 11 of them
… we have "Additional Criteria to Evaluate Success" questions that we are capturing.

<alastairc> https://docs.google.com/presentation/d/1cqnw0dw-xnEVJM7QSCAe9r9WyJNVkEPzgm9k8h4aSl4/edit#slide=id.gb3ceb32d61_0_33

jk: another one to add: Is the proposed conformance model easy to understand?

rm: we need to figure out 2 or 3 options to explore.
… need volunteers to write them up.
… (gives examples of options)

<alastairc> Current slide: https://docs.google.com/presentation/d/1yLYeNcybGxRu43KdrVUcOCL6iXsy6-gxl9-lbyr90dI/edit#slide=id.g1659f228417_0_0

ac: what would you do?
… any questions?

gregg: Are you looking at global view?

ac: yes.
… Bronze Pass/Fail
… Silver Adjectival
… Gold: Protocols

ac: this is a basic template you can work with.

Gregg: try to take the whole of what we have and see if it will fit in the model

rm: 2 ways forward.
… if you have a suggestion, copy slide 34.
… then have initial conversations.

ac: any volunteers?

st: copy slide 34?

rm: yes. I'll make a new version.

<mbgower> Thanks for a great summary and exercise

<Rachael> https://docs.google.com/presentation/d/15ZoKbczXw3JIoyDxAxKtBG0sWMKnB6lqAnV4V9xVsoM/edit#slide=id.g165c944dd8c_0_17

ac: try the exercise.

Minutes manually created (not a transcript), formatted by scribe.perl version 192 (Tue Jun 28 16:55:30 2022 UTC).

Diagnostics

Maybe present: ac, gregg, ja, jk, katie, mg, Rachael, rm, Ryladog, sh, st