W3C

Accessibility Conformance Testing Teleconference

09 Aug 2018

Attendees

Present
Anne, Kathy, Moe, Jey, MaryJo, Wilco, SteinErik, Shadi, Alistair
Regrets

Chair
Wilco, MaryJo
Scribe
SteinErik

Contents

Issue 238: Structure of the text
Issue 239: False negatives definition
Summary of Action Items
Summary of Resolutions

Issue 238: Structure of the text https://github.com/w3c/wcag-act/issues/238

<Wilco> https://github.com/w3c/wcag-act/issues/238

Wilco: This also came up when discussing Shadi's comments
... Do we try to keep the sections as they are, or do we try to separate out the sections related to composed rules?

Anne: I argued that in section 4 - the ACT rule structure - it would be nice to have composed rules and atomic rules side by side to see the difference. But when reading through Shadi's comments, I realized we still need to align things
... since there is a mix of atomic and composed rules sections, it doesn't really match

<Wilco> https://www.w3.org/TR/act-rules-format/#act-rule-structure

Anne: Should we split it up completely, or keep it as it is?

Wilco: My original proposal was to split it out, but I believe we settled on a merged approach
... the accessibility support section

Anne: The order is still like it used to be in the overview

Shadi: In my comments I was only talking about section 4 specifically
... I propose splitting out the items for composed rules. I think the editors tried to be very clear that composed rules should have specific test aspects.
... I think Anne is talking about drawing references to composed rules out separately, rather than having them inline in the text

Wilco: So any section that applies only to composed rules or only to atomic rules should state that in its first sentence

Shadi: Adding a sentence saying e.g. "This section does not apply to composed rules", or vice versa.

Anne: We should be careful that we don't accidentally place information related to composed rules in sections on atomic rules etc.
... We put a separate paragraph in sections that are only for atomic rules

Issue 239: False negatives definition https://github.com/w3c/wcag-act/issues/239

Wilco: The question was whether false negatives should distinguish between inapplicable and pass

Shadi: I see where Audrey is coming from. For you (Wilco) a positive is when a rule correctly triggers. She is thinking about a positive as "no issue".
... So a false positive would be when a tool or an assertor does not identify an issue, so you assume that there is no issue - at least no failure. A false negative would be the opposite: when the assertor says there is something wrong where there isn't

Wilco: I can see why that can be confusing
... my understanding of a false positive is that you have identified something that is not there

Shadi: so you have opposite conditions

Anne: Would it make sense to add a note that false positives are type 1 errors and false negatives are type 2 errors?

Wilco: It should work like this: a false negative occurs if a rule doesn't identify an issue in relation to the accessibility requirement

Anne: That is not what we wrote. We base it on the percentage of test targets
... here we only look at if the expectations work as they should
... as I read it test targets are things that are applicable to the rule

Wilco: So there is no context of the accessibility requirement

Kathy: When I looked at the Google definition, I think the way that the expectations are written... I am going back to 9.2. It is written that an accessible name describes the purpose of the test target. So is that a positive?

Kathy: Is a false positive when something passed that an expert would not have passed - meaning the rule finds the condition present when it is not?

Missed by scribe

Kathy: Because our outcomes are written in positive terminology, a false positive is something that has passed when an accessibility expert would have failed it.

Wilco: I think you're right, and that is not how we collectively talk about false positives
... The way I hear people use it is incorrectly identifying something as a failure.

Anne: At Siteimprove we call them issues and talk about them the same way
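
[To make the two framings concrete, here is a minimal sketch (in TypeScript) of computing false positive and false negative rates under the framing Wilco describes, where a "positive" is the rule reporting a failure. The Outcome and RuleResult types and the accuracy function are hypothetical illustrations, not part of the ACT Rules Format; under Audrey's framing, where a positive means "no issue", the two labels would simply swap.]

  // A minimal sketch, assuming Wilco's framing: a "positive" is the
  // rule reporting a failure. All names here are hypothetical
  // illustrations, not part of the ACT Rules Format.
  type Outcome = "passed" | "failed" | "inapplicable";

  interface RuleResult {
    ruleOutcome: Outcome;   // what the rule reported for a test target
    expertOutcome: Outcome; // what an accessibility expert would say
  }

  function accuracy(results: RuleResult[]) {
    let falsePositives = 0; // rule failed a target the expert would not fail
    let falseNegatives = 0; // rule did not fail a real failure
    for (const r of results) {
      if (r.ruleOutcome === "failed" && r.expertOutcome !== "failed") {
        falsePositives++;
      }
      if (r.ruleOutcome !== "failed" && r.expertOutcome === "failed") {
        falseNegatives++;
      }
    }
    // Rates as a percentage of test targets, per the wording Anne cites
    return {
      falsePositiveRate: falsePositives / results.length,
      falseNegativeRate: falseNegatives / results.length,
    };
  }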

Shadi: Before we continue, I am questioning the purpose of this section

Shadi: The section doesn't add much as it is right now. There is no definition of who qualifies as an accessibility expert. Secondly, it doesn't provide any parameters

Shadi: Either we work on improving the section to state what we expect, or we leave it out. I do prefer putting some kind of measures in there

Stein Erik: Agrees we should not keep the quality metrics in the spec as they are. Either extend them or remove them

Wilco: We wrote this section to get a shared way to measure the accuracy of a rule. If you write a rule you want to know its accuracy, and if we are trying to compare rules and the accuracy of different approaches, that doesn't make any sense without a shared measure

Anne: This is more process related than format related
... This measures implementation accuracy, not rule accuracy

Alistair: Accuracy of unit test comparison is what we have been talking about for ages. You have a shared batch of unit tests and you can run the rule across those unit tests. That is how we understand the accuracy of a test: if it goes into the system and comes out with the result we expect. How do you internally understand whether you have properly programmed a certain test?
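
[A rough sketch of the shared unit test idea Alistair describes: run an implementation's rule over a shared batch of test cases with agreed expected outcomes, and report the fraction of agreement. The TestCase shape and the runRule function are assumptions for illustration only, not an existing AUTO-WCAG or ACT API.]

  // A sketch of benchmarking a rule implementation against a shared
  // batch of test cases. `runRule` stands in for whatever engine
  // executes the rule; it and TestCase are assumed for illustration.
  type Outcome = "passed" | "failed" | "inapplicable";

  interface TestCase {
    url: string;              // the shared test case document
    expectedOutcome: Outcome; // the agreed baseline interpretation
  }

  async function benchmark(
    ruleId: string,
    testCases: TestCase[],
    runRule: (ruleId: string, url: string) => Promise<Outcome>
  ): Promise<number> {
    let agree = 0;
    for (const tc of testCases) {
      if ((await runRule(ruleId, tc.url)) === tc.expectedOutcome) agree++;
    }
    // Fraction of shared test cases where the implementation matches
    // the agreed baseline across tools
    return agree / testCases.length;
  }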

Shadi: This relates to a comment we had last week, where we discussed whether aspects of the review process should be included in the spec

<shadi> i think that the format alone does not contribute much to harmonization. i think we should include a higher-level aspect of review expectations. maybe have two classes: "ACT Rules" and "Harmonized ACT Rules" or such?

<shadi> ...we actually have two classes of rules: those that anyone can write down and share, and then those that are reviewed within a community and implemented by some number of independent organizations/stakeholders

Wilco: It is kind of true. It's not that the ACT Rules Format doesn't help harmonization, but it is one step of a process. With respect to the question of whether we need accuracy, initially I was under the impression that this would be useful if we want to start comparing rules.
... I only know one or two developers who actively do it, which makes me question whether or not it should be in the spec
... If we are collectively unsure of the value of keeping it, that is a sign it should be removed

Alistair: AUTO-WCAG is moving along and things have come out of this. I would like to see us having some sort of shared unit test suite, so we can show that we meet baseline interpretations across different tools. It shouldn't be the tests that differentiate tools; we should be heading towards a baseline where we agree tests do the same thing

Wilco: That is an alternative approach
... it lets you create some sort of benchmark

Alistair: I think the most useful thing would be if we could get to a point where we have a standard format for writing these things up, but also standardized examples: WCAG x.x is this, but is not this, etc.

<Jey> https://github.com/auto-wcag/auto-wcag/issues/184

Jey: There is development work underway to convert all the test cases into a common format

Alistair: Brilliant, if you have already done that in AUTO-WCAG

Wilco: It is using the format we talked about in the ACT TF

Alistair: Who is discussing this?

Wilco: AUTO-WCAG

Alistair: I don't care what format, as long as we publish our content

<Wilco> https://www.w3.org/TR/act-rules-format/#quality-accuracy

Alistair: Currently the issue we all have is that we each do our expert testing in a different way. Coming onto my radar are things like the Trusted Testers in the US. A lot of people are taking it up and there is more and more traction. It would help if we could tie some of this to that methodology

Kathy: With regard to Trusted Testers, we have written our testing outcomes in positive language in an attempt to align with ACT. If we are going to use the ACT rules, I hope they will agree with the outcomes of the Trusted Testers. That is why we originally were using a different approach

Kathy: We were wording our outcomes in failure language. Our failure condition was that the image does not have an adequate description; we are now switching to passing language

Kathy: Shadi has asked me to take a look at some of the tests to make sure we are not deviating in a significant way

Stein Erik: I hope Trusted Tester will write up their test instructions as ACT rules and contribute them

Kathy: We do have a baseline in draft that would be a better version to compare to the ACT rules. Our Trusted Tester instructions use tools to aid the tester. When we are closer to a final version, I'll be glad to share it with the group

<shadi> +1 but needs improvement

<Wilco> 0

<MoeKraft> +1

Wilco: Do people think it is a good idea to have benchmarking in the spec?

<anne_thyme> -1 - suggest having it same place as review process

+1, but the question is whether it belongs in the spec or in the review process

<kathyeng> +1

<Jey> -1

Wilco: We have a good topic for next week here

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.152 (CVS log)
$Date: 2018/08/09 15:27:55 $