W3C

– DRAFT –
Accessibility Conformance Testing Teleconference

15 September 2023

Attendees

Present
CarlosD, CarlosD_, daniel-montalvo, Jean-Yves, kathy, suji, thbrunet, trevor
Regrets
-
Chair
-
Scribe
Jean-Yves, kathy, trevor, Wilco

Meeting minutes

Empty Headings Rule

<daniel-montalvo> Wilco: We said yesterday that the rule would be deprecated because we were failing it under 1.3.1 and there are no longer cases where this is an issue

<daniel-montalvo> ... Deprecated may be an option, but ARIA requires headings to have an accessible name

<daniel-montalvo> ... We could have a rule that checks that from the ARIA perspective, much more generic

<daniel-montalvo> ... It could cover headings, buttons, column headers, etc

<daniel-montalvo> ... But we cannot transform the current rule

<daniel-montalvo> ... We could for the moment map this rule to ARIA until we have a new rule

<daniel-montalvo> ... But it creates a situation where implementations that currently fail this under 1.3.1 would be inconsistent

<daniel-montalvo> Jean-Yves: That would not be a problem for me

<daniel-montalvo> Carlos: Same here, good to have it from an ACT perspective

<daniel-montalvo> Mark: Same here, would be easy to change

<daniel-montalvo> Jean-Yves: Currently I wouldn't deprecate the rule only tool. I would do when we have the new one

<daniel-montalvo> Mark: Deprecating does not make much difference for us, we just continue using it until we have the new rule

<daniel-montalvo> Wilco: We'll need to check with all tools explicitly

<daniel-montalvo> Wilco: I like that approach much better

<daniel-montalvo> Daniel: What about Trusted Tester?

<daniel-montalvo> Wilco: They do not implement this rule, they are going to have opportunity to review PR anyway

<daniel-montalvo> Wilco: Do we think that's a 2-week Call for Review?

<daniel-montalvo> Jean-Yves: Yes.

<daniel-montalvo> Carlos: Yes, significant change.
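
As a rough illustration of the more generic ARIA check Wilco describes in this topic, the sketch below maps a role and an already computed accessible name to an outcome. The role list, the `Element` type, and the function name are hypothetical and only cover the roles named in the discussion; this is not an existing ACT rule.

```python
from dataclasses import dataclass

# Hypothetical subset of roles, taken from the ones named in the discussion
# (headings, buttons, column headers). A real rule would need the full list.
ROLES_REQUIRING_NAME = {"heading", "button", "columnheader"}


@dataclass
class Element:
    role: str
    accessible_name: str  # assumed to be computed elsewhere


def check_accessible_name(element: Element) -> str:
    """Return an outcome for one element: inapplicable, passed, or failed."""
    if element.role not in ROLES_REQUIRING_NAME:
        return "inapplicable"
    # A name made only of whitespace is treated as empty in this sketch.
    return "passed" if element.accessible_name.strip() else "failed"
```

An empty heading would fail here on ARIA grounds rather than under 1.3.1, which is the mapping change discussed above.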

Acc support & assumptions in the background

<kathy> https://www.w3.org/TR/act-rules-format/#act-rule-structure

kathy: Looking at the structure of an ACT rule. As you go through the list, assumptions and accessibility support are both required

<kathy> https://www.w3.org/WAI/standards-guidelines/act/rules/09o5cg/

kathy: pages or rules have a navigation sidebar. Interested in moving some things to be under background. Thinking specifically assumptions and accessibility support could go under there.
… right now the background has the bibliography as a subsection, interested to move a few more things.

daniel: Would probably be less cluttered, might want to rename background if we put these as subsections
… don't think assumptions or accessibility support are background because they are required

wilco: Thought this might be okay since we mix up these sections anyway. Sometimes accessibility support background shows up in the accessibility support
… find that people outside this group are regularly overlooking the information.
… if you don't know what is expected in accessibility support you might just check the background

daniel: agree with the thinking, but don't like the naming.

Jean-Yves: No strong opinions, might need to make other changes, like background becomes required if we put assumptions and accessibility in there
… will need to have a good structure since this will become a large section.
… might have times where accessibility support and assumptions are more normative than what is usually included in the background.
… slight concerns with blurring everything together

thbrunet: Would it help to borrow a term from WCAG and use the term "understanding"

daniel: Sounds better to me, gives a bit more idea of what to find there

Jean-Yves: I think that could be a good name

thbrunet: And I think from WCAG people are trained to go look at the understanding documents

kathy: I think "Understanding" works, might need to look at other categories that might fall under this and make sure they work as well.
… Thought rule versions or implementations could potentially also be included.

wilco: Personally, would like to leave implementations where they are.

kathy: Currently not in the format

wilco: The PR adds it to the rule structure and defines it.

kathy: Rule structure also has change log and live site has rule versions.

wilco: I like rule versions better

kathy: would that be something we could put under the background?

wilco: I think that is a separate thing.

Jean-Yves: Bibliography and related rules could also make it into the ACT structure.

wilco: Sounds like we have a good direction here, just need to write it down.

Optional test cases

Jean-Yves: Our rules have two uses: to make sure that humans understand what the rule is testing, and to make sure that tools and methodologies get correct results.

<Jean-Yves> act-rules/act-rules.github.io#1954

Jean-Yves: when we add things that clash with the accessibility support.
… according to the spec, when the alt is empty the name falls back to the title; others just say there is no name
… if you take a rule like image button has non-empty accessible name
… for the specification, it should be passed, but it does result in an actual accessibility violation
… in this case, we ended up removing the example from the rule because it couldn't really be a pass or fail example
… but it is a very valuable test case that we would like to have included in the rule.

<Jean-Yves> https://github.com/act-rules/act-rules.github.io/pull/2077#discussion_r1246410253

Jean-Yves: We have had multiple other times where we remove things because of how they were affected by accessibility support.
… Had a rule targeting empty headings, example was an empty div
… we also have problems in some rules of having too many examples, and it might be bad for the human, but also worthwhile having for the computer
… the solution that was mentioned for the accessibility support problem would be optional test cases that don't affect the implementation
… specs may say pass, but some accessibility support problem could cause it to fail
… optional cases to test some corner cases
… those are the two sides of it. Problem with accessibility support is much more important. Want to enforce standard, but understand where accessibility support lies.

thbrunet: Would propose to change the wording to use 'extended' instead of 'optional'. So you have a 'core' test set and an 'extended' test set

wilco: I see value in this for tracking problems around accessibility support. Have had problems with test cases changing depending on the accessibility support. Causes problems with tracking.
… that doesn't necessarily need to be part of the rules though. In adding this, it might add more work on maintaining on each rule.
… will move test cases around and then implementers would need to agree to that

Jean-Yves: This follows spec, but then some browser doesn't support it.

MarkRogers: XML1.1 is really hard to test due to all of the mimetypes, it's very easy to mess up some configuration and that could affect results. Need to be very specific about versions when testing, things update all the time
… versions don't need to be in the rule, but we probably should have some kind of metadata.

wilco: Agree, think there are better places to put the information. Second thing is that if the spec and browser are in conflict, it doesn't directly mean the browser is wrong and the spec is right

wilco: Have had talks of making a repo recording this information and including test instructions to verify the accessibility support problems
… not saying there is no room for optional test cases
… as for having more test cases, and having an extended set of test cases and using them for automation. Not sure I am keen on that either, since it makes manual implementation of those rules more burdensome
… would like for us to stay selective of our rules

kathy: I like the idea of tracking these separately, but if they are included in the rule, I think if people see optional they are just gonna not do it.
… I think there needs to be some significant description for why the other test cases are separated and not included in the consistency examples
… if they are optional, what result should they get? What would their results be used for?

Jean-Yves: Should we include them in the rule, should we link to them from the rule, then always a question of if we want detailed test instructions, specifically which assistive tech to consider (e.g., browser)
… that might be a bit on the line. It's something that has blocked us before. Think it makes sense to not necessarily include them in the rule
… could have some use linking them from the rule, could use them to demonstrate accessibility support issues.
… not exactly sure how we can manage it. Maybe have example without detailed instructions on the rule page and example with full instructions on other site.

wilco: There has been some work: AT drivers, which aim to programmatically run screen readers, and cross-browser platform testing, which is increasingly incorporating more accessibility feature testing. Being used for role and name computation
… those are resources that I'm hoping we can be using instead of having to do it ourselves
… the only reason I think we might want to do something like this, is to create transparency about how different implementers report these different test cases
… have a sense that people are not digging that deep into our rules today, except maybe ourselves.
… a possible example would be to have a dummy rule, but not have it as a rule that is tracked. Uses the systems we have already in place, but doesn't show up on the implementations list

Jean-Yves: Even without the full test example suite, we still need to update the accessibility support. We can't just point to those resources and say to check if they are conforming
… hope that it will bring some harmonization at that level. Might be able to link specifically to web platform examples that demonstrate the issues
… might be a way to track it without pointing fingers since it is a separate group

wilco: Is this something we want to continue pursuing, just keep thinking about it? Put it on the agenda?

suji: Would like to discuss it a bit more regularly to get a conclusion on it.

Jean-Yves: From the accessibility support side, using the web platform tests might be the nicest way and require less maintenance.
… need to investigate further before committing to optional cases

wilco: Proposal is to look at the platform tests for accessibility support. Which means for the moment if we see major accessibility support issues we don't include test cases for them.
… for the web platform, talked with member that it might be possible for us to have a section where we could contribute our own test cases to.

daniel: It would be on a browser, it doesn't tell us what the assistive technology might do.

wilco: I initially really liked the idea of optional test cases, but think there might be better ways to help our accessibility support problems.

Jean-Yves: Goal to aim for is finding accessibility support, try to find web platform test, submit one that is relevant to the rule. Finding the appropriate place and method for contributing to the web platform tests
… same goes for the AT driver test
… something where we add this guidance and know what we should do.
… something that would make the revisit test much easier. I think that should be our goal

Machine learning and ACT Rules

https://github.com/act-rules/act-rules.github.io/discussions/2113

wilco: Set up a discussion.
… brief introduction, ML is starting to change our industry. Seems that the rules we have been writing, especially for ones that haven't traditionally been automatable, we don't handle predictive models well
… on top of that, our examples are very minimal, so they may not work as well on our minimalist examples.
… even if they did, not clear that they would get these right in the real world.
… three topics, Do ACT test cases work with ML, Training on ACT test cases, How does confidence fit with ACT consistency

MarkRogers: One problem could be people poisoning the datasets, such as incorrectly labelling an email as spam on purpose

... add "Poisoning data" topic title

Do ACT test cases work with ML?

wilco: first topic - test cases

https://github.com/act-rules/act-rules.github.io/discussions/2113#discussioncomment-6948559

trevor: ML systems can work with small test cases if trained, and others can use the entire page
… if they can't handle the test case, change the configuration
… no guarantee on how well ML will perform in real world scenario

markrogers: may be difficult to understand a whole page for pass or fail

jean-yves: real web page many issues while test cases are one

trevor: even with large data sets, some may be simple like our test cases

<Zakim> thbrunet, you wanted to react to trevor

thbrunet: having multiple targets within the same test has been questionable

carlos: don't think we should build test cases for predictive models, but other way around
… train on our test cases

enrico: future implementations using ML may identify a portion of the page
… technology should be able to handle simplicity

wilco: summarizing, general agreement that ML should be able to handle simple test cases
… have a different category of implementations. talk about this in third topic
… contextual applicability
… confidence of ML in real world

markrogers: don't train on ACT examples
… how many rule examples are needed for testing depends on the requirement

<Zakim> Jean-Yves, you wanted to mention need of context for "look like" rules, and validity on any tool based on implementation report

jean-yves: examples in rules that require context will have bigger examples for ML
… validity of ML in real world, assume tool wasn't trained just on test cases

wilco: concerned ACT test cases are so simple the ML tool isn't challenged

jean-yves: not a problem
… tool only trained on test cases won't be good in real world

wilco: we never meant for ACT to be a quality checker

thbrunet: split test cases to explanatory and test examples
… explanatory doesn't have to be real world
… have tool vendors agree to not train on ACT

daniel: examples are granular, not worried. more worried about context

trevor: if it doesn't work in real world, not a big concern.
… not guaranteeing performance outside of test cases
… don't think it's our job

wilco: conclusion sounds like we don't need to pivot our approach
… also not worried about quality in real world

Kathy: Coming from manual, when I go through test cases
… The applicability is spelled out for me so I know there is something to test here
… We're expecting testers will be able to find the applicability
… It sounds similar to ML concern, ability to find what needs testing.
… I would agree that we're not responsible for how well the methodology performs, but we are for what should pass and what should not?

markrogers: ML can be scripted for the test cases

wilco: answer is known

Training on ACT test cases

wilco: on the links in context question, TF asked about Power Mapper implementation

markrogers: we used the test cases
… was that bad or good to do?

jean-yves: we can't prevent use of test cases for learning
… if overtrain on test cases, it could be bad.

carlosd: should be a big concern if tools only train on test cases, but don't think we need to change
… but we can't prevent it

jean-yves: implementers can also lie on their reports

trevor: no way to test an ML model even with a giant data set
… over training on examples isn't always a negative

wilco: existing models train on specific web sites
… open source Crest trains against the website first
… doesn't seem right to train on model being evaluated
… not opposed but not in favor

thbrunet: can't govern tool vendors to overfit

wilco: proposal is to recommend against training on test cases, but not to regulate

<CarlosD> +1

<Jean-Yves> +1

+1

<giacomo-petri_> +1

<daniel-montalvo> +1

<thbrunet> +1

How does confidence fit with ACT consistency?

https://github.com/act-rules/act-rules.github.io/discussions/2113#discussioncomment-6950358

markrogers: can ask implementation to report confidence.
… number as a percentage or probability since it's available with their tool

trevor: confidence is arbitrary across ML models
… an overconfident model vs conservative model

<Zakim> thbrunet, you wanted to react to trevor

trevor: a user setting for the confidence threshold can be followed for the implementation report

thbrunet: more of an issue for subjectivity. don't need confidence for objective pass/fail.
… maybe another category of test cases

jean-yves: agree to request a tool to give yes/no answer
… can request confidence to evaluate inconsistent results
… don't want to overload manual testers with too many test cases

carlosd: request pass/fail with confidence to measure consistency
… develop a different metric for AI-based tools

trevor: less concerned about things testing differently on different days
… think testing of ACT test cases will be consistent

wilco: expect with limited ACT test cases, tools won't vary day to day
… reporting confidence could be an option
… but might not be meaningful
… if not standardized
… testing in its own category

thbrunet: implementers should be doing their own checks
… implementation note on type might be good to have

wilco: check level and not at the tool level is good
… we expect tools to run in default configuration
… axe-core doesn't look at fallback roles by default, but we can turn that on. run ACT on default.
… can ask implementer to run in default for ACT

trevor: allow people to define their own confidence
… if new implementer class, metadata used for decisions could be interesting for users looking at implementation reports

<Zakim> Wilco, you wanted to talk about open fields on wai site

markrogers: link text goes to same page or similar may be hard

Wilco: i like to see how we can get more metadata, more info from implementers.
… implementers already want to report why things are different than they should be, can't do that because WAI doesn't allow random text to be injected.
… would need pre approved messages, numbers, ... but not a plain text.
… but I'm in favor of more metadata.
… We expect reports to use the default setting (confidence threshold)

Proposal: ML based tools must use "default" confidence threshold

<CarlosD> +1

+1

<MarkRogers> +1

+1

<suji> +1

<giacomo-petri_> +1

+1

<thbrunet> +1

<daniel-montalvo> +1

MarkRogers: we can put metadata in a github issue, then only store the number on WAI site.

Wilco: It seems we do not need a new category of tools. These are automated tools.

Proposal: ML based testing fits in the automated testing category.

+1

0

+1

<suji> +1

<daniel-montalvo> +1

<giacomo-petri_> +1

<MarkRogers> +1

<CarlosD> 0
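
A minimal sketch of why the first proposal above pins reports to the default threshold: the same model score can map to different outcomes depending on the threshold chosen. The threshold value, the function name, and the convention that the score is a failure probability are all assumptions made for illustration, not a description of any tool mentioned in the discussion.

```python
# Hypothetical default shipped by a tool; the value itself is arbitrary.
DEFAULT_THRESHOLD = 0.8


def to_act_outcome(failure_probability: float,
                   threshold: float = DEFAULT_THRESHOLD) -> str:
    """Collapse a model score into the yes/no answer an ACT report expects."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must be between 0 and 1")
    return "failed" if failure_probability >= threshold else "passed"
```

With these assumptions, a score of 0.7 is reported as passed at the default threshold and as failed if a user lowers the threshold to 0.6, so two reports from the same tool are only comparable when both use the default.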

Biases in learning data

Wilco: if you let users give you test data (and the answers), this can "poison" the model
… I do not think it is an ACT problem.
… But there can be bias in the training data. E.g. vision-based testing can be biased toward Western languages if there is no CJK in your training set.
… There can be a risk of leaving people out. We may think about best practices on training models for a11y.

CarlosD_: Are you suggesting we come up with "Best practices for training ML models for a11y testing"?
… Is it in or out of our scope?

CarlosD_: it can be good to identify set of biases and point to them.

trevor: are we building a dataset for people to look at? Is it in our area?

trevor: managing that list sounds almost like another TF.
… that sounds important to have, but maybe out of ACT scope.

Wilco: we might just set up a wiki and contribute to it; review from time to time. Then put on WAI website.
… I think we are the correct group to build these practices.
… building a list of ideas doesn't sound like too much work.

JYM: My first thought is that ACT rules don't care about that
… That it might be better suited to other groups. But I do feel this is the correct group of people
… Maybe it's in the scope of WAI but not necessarily ACT
… We did just have an issue we fixed where we missed CJK
… It is in scope for us to provide test cases. We've been bad at identifying these.
… We don't have examples with right-to-left script either.
… I've also wondered about ruby and accName
… It might be our role to provide examples in the rule
… I think that fits more with what we do in ACT rules

CarlosD_: +1 on what Jean-Yves said. Our examples are biased toward Western scripts.
… How do we deal with tools that have non-Western languages out of scope?

thbrunet: tools have the option of reporting CantTell if they don't support CJK, ....
… our test cases are not as exhaustive as they could be.

Wilco: I like the idea. I think there is value to figure out what we need to improve and to document it.
… we did look into this kind of thing for languages (Cantonese and Mandarin have the same primary language tag)
… can get fairly complex with vision-based AI.
… Do we think tools need to support these cases? I'm OK with that.
… This means the tools need to be able to detect what is out of their scope and report cantTell.
… AG is also having some bias in this area.
… we maybe need to connect with the i18n group
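
The cantTell behaviour discussed above can be sketched as a guard in front of a tool's own check. Everything here is hypothetical: the script list stands in for whatever a given tool does not support, the detection relies on Unicode character names and is only a rough approximation, and `check` represents the tool's real test.

```python
import unicodedata
from typing import Callable

# Hypothetical list of scripts this imaginary tool cannot evaluate.
UNSUPPORTED_SCRIPT_PREFIXES = ("CJK", "HIRAGANA", "KATAKANA", "HANGUL")


def uses_unsupported_script(text: str) -> bool:
    """Rough detection based on the Unicode name of each character."""
    return any(
        unicodedata.name(ch, "").startswith(UNSUPPORTED_SCRIPT_PREFIXES)
        for ch in text
    )


def evaluate(text: str, check: Callable[[str], str]) -> str:
    """Report cantTell when the text is out of scope, else run the check."""
    if uses_unsupported_script(text):
        return "cantTell"
    return check(text)
```

In this sketch a tool that only handles Latin-script link text would run its check on "Read more" and report cantTell for "続きを読む" instead of guessing.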

MarkRogers: Accessibility Support is much worse in some languages, e.g. Japanese screen readers used to not support ARIA because ARIA wasn't translated
… In such a case, can we really use aria-label to provide a name in these languages?
… it may be difficult to address the bias due to these limitations.

CarlosD_: we do have examples of hiding by moving to the left, which may not work in right-to-left languages.

Wilco: there seems to be agreement to identify our own biases.

Wilco: it may be useful to reach out to the i18n people to see if they can help us.

kathy: we should consider test cases. We'd need someone familiar with other languages. Reaching out to i18n makes sense.

daniel-montalvo: our rules will have to be reviewed by people with experience in other languages.
… we can do the same as with ARIA where i18n can review our rules and give feedback.

thbrunet: I can reach out to colleagues in Japan.

giacomo-petri: that would be a lot of work. But it is the right thing to do.
… we might be able to get more meaningful examples to be as generic as possible.

CarlosD_: I propose to build the guidelines, not the dataset. We can also improve our examples.

CarlosD_: we can say that other examples still exist and need to be used for training

Wilco: the TF can add a question to the rules survey (with link to known biases)

Wilco: in rules about "name is descriptive" all the examples are in English, it is assumed testers understand English.
… if we add more languages to these, what happens?

CarlosD_: they can report cantTell.

MarkRogers: If the examples are totally generic, they do not represent the real world.
… hard limit between genericity and representativity.

Jean-Yves: we need test cases for right-to-left language (e.g. moving content to the right to make it invisible, instead of left)

ACTION: 1: Create a wiki page to start tracking biases in rules

+1

<suji> +1

+1

<MarkRogers> +1

<giacomo-petri> +1

ACTION: 2: Update survey, adding a question to look for bias in rules

+1

<suji> +1

+1

<CarlosD_> +1

<giacomo-petri> +1

<MarkRogers> +1

ACTION: 3: Reach out to i18n WG for help on rule review

+1

<suji> +1

+1

<giacomo-petri> +1

<daniel-montalvo> +1

https://github.com/act-rules/act-rules.github.io/discussions/2046

states and transitions

trevor: if the rule has multiple states, we can add the manipulation with a "where... after" syntax.
… what happens when we are not concerned by only the states but also by what happens between them?
… e.g. a search result gives a loading state before the final result state.
… or an expandable box can have a message that it is expanding
… 2.2.2 flashing content also needs to detect the flickers that happen during the transition (after clicking, ...)

Wilco: I've seen color contrast issues reported during a CSS animation.

trevor: do we consider transition as a separate state, or as transition between states?

MarkRogers: stuff takes time to load, hydrate data, ... it is useful to have a "page loading" accessible message.

Wilco: I don't know how AT behaves in these transitions

MarkRogers: AT users need to know that the page is loading. The intermediate state can last some time. Only at the end is there a working state. What are the requirements for the intermediate state?

thbrunet: loading and loading partial can be different. there can be a skeleton page with various bits loading.
… previously, pages loaded mostly as one item. Now loading also happens after the initial page is loaded.

Wilco: I do not think it matters whether the transition is a state or something else, as long as it's 'something'.
… if a title is updated on contentReady, is it announced?

thbrunet: some things really need to be checked.

MarkRogers: there is risk of race conditions.
… some changes may happen 'too late'. but there is nothing in ARIA spec on sequencing these.

trevor: what about flashing?

Wilco: next steps? anything we have to do?

trevor: a lot of discussion on loading. Do we feel this can fit in our current proposal?

trevor: I'll try writing some examples to see how it goes.

Wilco: final thoughts?

Summary of action items

  1. Create a wiki page to start tracking biases in rules
  2. Update survey, adding a question to look for bias in rules
  3. Reach out to i18n WG for help on rule review
Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics

Maybe present: carlos, daniel, enrico, giacomo-petri, JYM, MarkRogers, wilco

All speakers: carlos, carlosd, CarlosD_, daniel, daniel-montalvo, enrico, giacomo-petri, Jean-Yves, JYM, kathy, MarkRogers, suji, thbrunet, trevor, wilco

Active on IRC: CarlosD, CarlosD_, daniel-montalvo, giacomo-petri, giacomo-petri_, Jean-Yves, kathy, MarkRogers, suji, thbrunet, trevor, Wilco