W3C

- DRAFT -

ACT Rules Community Group Teleconference

09 Dec 2021

Attendees

Present
Daniel, Jean-Yves, CarlosD, Helen_, kathyeng_, anne_thyme, ToddL_, Wilco_, Drew_Nielson, TomBrunet, Skotkjerra_
Regrets
Chair
Scribe
CarlosD

Contents

Topics
1. Introductions
2. Measuring test coverage and gaps (Presentation by Kristian Kristofferson)
3. Implementation data on the WAI website
Summary of Action Items
Summary of Resolutions


<scribe> scribe: CarlosD

Introductions

round of introductions

Measuring test coverage and gaps (Presentation by Kristian Kristofferson)

Wilco_: Kristian couldn't make it, but Jean-Yves will take over

Jean-Yves: we have been thinking about how to promote ACT rules and what the benefits are for businesses
... there has been a growing interest in ACT rules
... businesses need confidence in the tests they are using
... but sometimes it is hard to address some of the questions they ask
... sometimes even we don't know what the real coverage is
... with a dataset of pages that we could test, we could identify not only the SCs that are being tested
... we could also get a better estimate of the percentage of SC violations that ACT rules can identify
... it would be easier to evaluate the usefulness of ACT
... and it would also help identify which new rules would represent the biggest gain
... but we face the challenge of building this dataset

Wilco_: you're suggesting we build a set of pages where we know how many violations there are, so we would be able to work out how many would be caught by ACT rules

Jean-Yves: correct
... the dataset should be built on real pages
... the current test cases are useful but don't allow us to estimate the number of problems in real websites

Wilco_: any suggestions on how we would build this?

Jean-Yves: not really... we need to discuss this
... of course, the larger the dataset, the better
... we could build it incrementally
... if our organizations are assessing pages, we could be adding them to the dataset
... but there are issues to consider: we might need to archive the pages
... they might need to be anonymized

anne_thyme: as part of the WAI-Tools project we used a dataset of pages to assess the rules
... they were real pages, and I agree that we should use real pages
... but we might need multiple testers to review the pages, because they will find different problems, and we should try to find as many problems as possible
... we will need to decide on the manual testing methodology for these checks

Jean-Yves: one side benefit of this project could be the identification of places in WCAG that need further clarification

Drew_Nielson: I agree with the use of real pages
... but we need to define a method to incorporate the pages in our dataset
... there are copyright issues
... but also, if we test a page and find issues with it, that can be negative for the owner of the page
... even if we anonymize it, it may still be possible to find out who the owner is by comparing the code

Wilco_: how many people would be interested in contributing to this project?

Jean-Yves: I believe we need a static archive of the page
... otherwise if the page changes we would need to do the manual check again

Wilco_: one potential place to start is with the EU monitoring of the web accessibility directive
... I believe some of the manual assessments should be public

TomBrunet: we might need to consider other aspects of the page, not just the violations

anne_thyme: the testing pages in Denmark will not be public
... maybe we can use the data from gov.uk

<Wilco_> Carlos: Don't think Portugal monitoring pages are public

Helen_: on people only checking what's wrong and not what's right: when auditing, people also tend to mark what passes

kathyeng_: the trusted tester training course has practice content where we know what should pass and fail
... this might be available
... but I don't know how "real world" the content is

Wilco_: I guess that is key for this project: being able to find out how much a rule or an implementation covers

TomBrunet: on the testing, there is implicit stuff that we don't check, because it is part of the context of the test

Wilco_: what would be next steps?

Jean-Yves: we need to research what we can put into the dataset
... we can start with a small dataset
... that will allow us to start comparing the findings of tools with those of manual testers
... and could allow comparing the granularity of checks from automated rules and manual tests

Drew_Nielson: suggestion to address copyright issues
... in the US anything created by a public body is essentially free to use
... we could start collecting examples of real world pages there

<Helen_> +1 to that idea

Wilco_: can we set up a meeting between the people interested in the project to define what the initial steps would be?

Interested people: Jean-Yves, Wilco, Carlos, Andrew, Cliff, Helen, Daniel

<TomBrunet> Just for context and thought on what I mean... w3c.org has 500 DOM elements. A sanity check of name/role/value verification alone would be 1500 test results (three checks per element). Most of these checks are implicitly skipped because they're just correct by default.


Implementation data on the WAI website

Wilco_: the ACT CG website has a table of implementations at the bottom of every rule with at least one implementation
... the table reports consistency and whether the data is complete
... and allows checking the individual results for every test case of a rule
... they don't have to match exactly
... these tables are going to move to the WAI website
... and the rules are going to be linked from the WCAG techniques
... but we're currently redefining what we mean by a consistent implementation

Wilco_: for consistency, we defined levels
... an implementation can be complete, which means no false positives and no false negatives, and you report on the SC
... a complete implementation also cannot have untested results
... if it does have untested results, it will be a partially consistent implementation
... with a partial implementation you can have false negatives, but not false positives
... the minimal implementation is a new level that shows you can test for the existence of something, but not test the requirements
... basically, it tests the applicability of a rule
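
For reference, the levels above could be computed from aggregated test-case outcomes roughly as in the following sketch; this is illustrative only, and all type and function names are assumptions, not the actual WAI reporting tooling:

    // Sketch only: names and shapes are assumptions for illustration,
    // not the actual WAI implementation-report tooling.
    type Outcome = "passed" | "failed" | "cantTell" | "untested";

    interface CaseResult {
      expected: "passed" | "failed" | "inapplicable"; // defined by the test case
      actual: Outcome;                                // reported by the tool
    }

    type Level = "complete" | "partial" | "minimal" | "none";

    function consistency(
      results: CaseResult[],
      reportsSC: boolean,          // does the tool map results to the SC?
      checksRequirements: boolean, // or does it only detect applicability?
    ): Level {
      // False positive: the tool fails a case the rule does not expect to fail.
      const falsePositive = results.some(
        (r) => r.actual === "failed" && r.expected !== "failed",
      );
      // False negative: the tool passes a case the rule expects to fail.
      const falseNegative = results.some(
        (r) => r.expected === "failed" && r.actual === "passed",
      );
      const untested = results.some((r) => r.actual === "untested");

      if (falsePositive) return "none"; // no level tolerates false positives
      if (!checksRequirements) return "minimal"; // applicability checks only
      if (!falseNegative && !untested && reportsSC) return "complete";
      return "partial"; // false negatives and untested results allowed
    }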

TomBrunet: does the test case id change when you modify a test case?

Wilco_: yes

TomBrunet: does the implementation report get updated when these changes happen? we have submitted one report and haven't seen it

Wilco_: I haven't been able to update it yet
... we also want to start reporting the coverage of an implementation
... coverage is a metric of how many test cases you covered compared to the total number of test cases
... tools report cantTell results, so this metric will account for that
... we're also determining automated coverage
... for tools that only do automated testing this will be equal to the total coverage
... for tools that include semi-automated checks it will allow distinguishing the two
... Is this a good direction?
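
One possible reading of the two metrics is that total coverage counts cantTell results as covered (a semi-automated check still ran), while automated coverage does not. A minimal sketch under that assumption (names are illustrative, not the actual report format):

    // Sketch only: a possible coverage computation, not the actual report format.
    type Result = "passed" | "failed" | "inapplicable" | "cantTell" | "untested";

    function coverage(results: Result[]): { total: number; automated: number } {
      if (results.length === 0) return { total: 0, automated: 0 };
      // Total coverage: every test case the tool produced any result for,
      // including cantTell (a semi-automated check still ran).
      const covered = results.filter((r) => r !== "untested").length;
      // Automated coverage: cantTell excluded, since a human must finish the check.
      const automated = results.filter(
        (r) => r !== "untested" && r !== "cantTell",
      ).length;
      return {
        total: covered / results.length,
        automated: automated / results.length, // equals total for fully automated tools
      };
    }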

Skotkjerra: can you provide further context on why we are allowing false negatives but not false positives?

Wilco_: it comes from what ACT rules are designed to do - find problems
... in almost all cases where an ACT rule does not find a problem, further testing needs to be done

Skotkjerra: but if we want to measure the quality of the coverage we also need to know what is not covered
... the best way to have few false positives is to test as little, or as uncritically, as possible
... we might be sending the wrong signal

Wilco_: is the implication that we shouldn't be reporting partial implementations?
... either you have a complete implementation or you don't
... we're running out of time... mail me if you have any concerns about this process

Final thoughts

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes manually created (not a transcript), formatted by David Booth's scribe.perl version 1.200 (CVS log)
$Date: 2021/12/09 16:02:54 $
