Silver Conformance Subgroup -- 11 Feb 2020

<KimD> scribe: KimD

<sajkaj> scribe: sajkaj

Updates to Conformance section of ED

<jeanne> https://raw.githack.com/w3c/silver/ED-draft=comments-changes-js/guidelines/#scoring-conformance

js: Lots of changes, esp to Intro
... Many of the later comments were best addressed in the Intro
... Most info architecture comments easily addressed

<KimD> Yes, please

<jeanne> Diff version https://services.w3.org/htmldiff?doc1=https%3A%2F%2Fraw.githack.com%2Fw3c%2Fsilver%2Fconformance-js-dec%2Fguidelines%2F&doc2=https%3A%2F%2Fraw.githack.com%2Fw3c%2Fsilver%2FED-draft%3Dcomments-changes-js%2Fguidelines%2F

js: reviews scoring bullet points ...

<jeanne> Have the conformance better reflect the experience of people with disabilities using the site or product. It allows an organization to say that "we may not be 100% conformant, but our site or product is very usable by people with disabilities" and they can express that in a meaningful way. By changing the "all or nothing" approach of WCAG success criteria conformance to a percentage based

<jeanne> scoring system there is a more nuanced representation of conformance.

kd: Asks whether wcag 2.x tests will still be relevant to use

js: Yes, Makoto is working on how

kd: So how do we score those if they're pass/fail?

js: Some things are by individual item, instance of, e.g. logical heading structure
... That could be site wide
... Is it semantically coded correctly is item by item
... Rubric seems best for things like clear language

kd: I think I will need this diagrammed to really grok it

<KimD> scribe: KimD

Next bullet: Formalize a representative sampling approach to conformance to address the needs of large, complex, or dynamic sites, apps, projects, or products that cannot test every page as required by WCAG 2.x Conformance. We reference specific sections of WCAG-EM to define the representative sampling approach.

<jeanne> Formalize a representative sampling approach to conformance to address the needs of large, complex, or dynamic sites, apps, projects, or products that cannot test every page as required by WCAG 2.x Conformance. We reference specific sections of WCAG-EM to define the representative sampling approach.

Jeanne: specifically called out WCAG-EM because there were questions about this.

Next bullet: Allow the organization (this includes company, business, non-profit, government) to prioritize what is important for their product for accessibility assessment so that the organization would not fail for bugs that did not have a negative impact on the accessibility of their site or product to users with disabilities. Many sites will have workflows that are less critical than other portions of the site, and that should be determined by the site[CUT]

Next bullet: Ensure that conformance doesn't disproportionately favor one disability over another.

"Scoring and conformance is divided in to the following.." no major changes.

3.2 Points & Levels

<jeanne> Survey responses doc https://docs.google.com/document/d/13SMA551BOg2JAkOqO_oF0jJutQkcUOexp-SOvO0CGXM/edit#heading=h.ea8q19e3etob

Survey comment: The points haven't been introduced before, and it is not clear how they can be gained and how they are calculated. The categories are neither defined nor explained. The structure (Information Architecture) does not show where and how points and categories are supposed to be defined. It does not become clear how the Levels A, AA and AAA are matched into points and how they are reflected in Bronze, Silver, and Gold.

<jeanne> It does not become clear how the Levels A, AA and AAA are matched into points and how they are reflected in Bronze, Silver, and Gold.

<jeanne> Response: In the Information Architecture section it states that we are removing A, AA, and AAA. Bronze, Silver or Gold will be based on the total score once the minimums are met.

A purely numeric approach (no weighting for severity, for example) yields a meaningless number.

Imagine a page with 20 input fields, and one of them cannot be reached by keyboard. This yields 95% compliance, yet the user's task on this page cannot be completed.

Our response: Correct, but we want to give credit for the 19 correct inputs
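
The arithmetic behind the comment above can be sketched as follows. This is only an illustration, not a scoring method the group has adopted: the function names are hypothetical, and it assumes every field on the page is required to complete the task.

```python
def percent_passing(results):
    """Return the percentage of tested items that passed (0 to 100)."""
    if not results:
        raise ValueError("no items were tested")
    return 100.0 * sum(1 for passed in results if passed) / len(results)


def task_completable(results, required):
    """A task can be completed only if every required item passed."""
    return all(results[i] for i in required)


# 20 input fields; one of them (index 7, chosen arbitrarily) cannot be
# reached by keyboard.
fields = [True] * 20
fields[7] = False

score = percent_passing(fields)
# Assumption: all 20 fields must be filled in to finish the task.
completable = task_completable(fields, required=range(20))

print(f"{score:.0f}% of fields pass; task completable: {completable}")
# prints "95% of fields pass; task completable: False"
```

The percentage gives credit for the 19 working inputs, while the separate completability check captures the commenter's point that the user still cannot finish the task.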

What does "a guideline isn't used" mean? How can a guideline not be used? Is it not applicable? Not tested for conformity (and thus rated as non-conformant)? Response: Added “not applicable” and an example.

If there are percentages for single guidelines (requirements) as well as their sum (which people will call a percentage) as well as an overall percentage, confusion and misunderstanding are foreseeable. Response: Suggestions are welcome.

Jeanne: an example would help.

It is unclear how points are applied, accrued or lost. It is also unclear how points are determined against the 7+ functional needs articulated in EN 301549. How would points be applied in the previous example of headings and heading level (in the context of the 2 user-groups I mentioned: non-sighted users versus users with cognitive disabilities)? Response: We will create an example to clarify.

The text should offer up Points & Levels as one approach to rating website accessibility – a thought through suggestion for strong consideration. But it isn’t the only way, nor will it be the best way for all sites. Profoundly unclear how this points thing will work. How does it all add up? Response: Please give an example of a site for which the point proposal will not work? For the lack of clarity, we will create an example to clarify.

Next comment was about not weighting (opinion)

scribe: going through changes

Sampling

Jeanne: answered many questions by quoting the WCAG-EM

<jeanne> While much of the proposed breakpoints are pretty good, for example on sites with 11 - 100 pages it currently suggests "10% of the pages need in-depth testing including manual testing." - this would equate to an 11-page site testing one page. Since Deque has some experience in this realm, this is likely not enough of a representative sampling, and we would propose that for in-depth testing

<jeanne> it would be more like "no fewer than 20 pages or screens". This is (should be) open for discussion.

<jeanne> Response: Good catch. I like the “no fewer than” although I think that 20 is too large for small clients. 10 is more affordable and often works well with small to medium companies.

Janina: agree with your comment

Jeanne: "no fewer than" addresses the testing issues where only one page is required.
... breakpoints seem arbitrary - "where's the research" - we would be happy to have some
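
The effect of a "no fewer than" floor on the 10% rule discussed above can be sketched as follows. This is a minimal illustration, not text from the draft: the function name and parameters are hypothetical, and rounding down is an assumption chosen to match the commenter's "11-page site testing one page".

```python
def pages_to_test(total_pages, percent=10, floor=10):
    """Number of pages needing in-depth testing.

    Takes `percent` of the site (rounded down, an assumption), raises the
    result to at least `floor`, and never asks for more pages than exist.
    """
    if total_pages < 0:
        raise ValueError("total_pages must not be negative")
    by_percent = total_pages * percent // 100
    return min(total_pages, max(by_percent, floor))


for size in (11, 50, 100):
    print(
        size,
        pages_to_test(size, floor=0),   # 10% rule alone
        pages_to_test(size, floor=10),  # floor suggested in the response
        pages_to_test(size, floor=20),  # floor proposed in the comment
    )
```

With no floor, an 11-page site tests one page; with a floor of 10 it tests ten; with a floor of 20 it tests all eleven, since the count is capped at the size of the site.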

survey comment: While there is certainly flexibility, there is no reference to research or statistical models to support the particular breakpoints. They feel arbitrary.

Jeanne: wonders what work went into WCAG-EM and whether they did research

There is data about how big a sampling size needs to be... just have to find it.

Jeanne: any valid research would be helpful. Benefit of FPWD is getting more info

Janina: Are there general principles/guidance for determining sample size?
... CDC must have this issue.

<jeanne> https://www.w3.org/TR/WCAG-EM/

Janina: "statistical margin of error" concept - we need this info
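
One common way to relate sample size to a margin of error is the standard survey-sampling formula for a proportion, with a finite population correction. The sketch below is not taken from WCAG-EM or the draft; the function name and default values are assumptions, and it presumes simple random sampling, which the pages of a real site may not satisfy.

```python
import math


def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion within +/- `margin`.

    z=1.96 corresponds to roughly 95% confidence; p=0.5 is the most
    conservative guess for the true proportion. Both are assumptions.
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction shrinks it for small populations.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


print(sample_size(1000))  # prints 278
```

For a 1000-page site this gives 278 pages at a 5% margin of error, which shows why a purely statistical approach can demand far larger samples than the breakpoints under discussion.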

Lots of detailed info in the WCAG-EM doc

Jeanne: it addresses a lot of the issues people had.
... example is 'pages in a process' issue
... please take a look and see if you can find something substantive for the meeting Friday

Send your info to the List

scribe: wants to review example with someone
... using WAI's Before and After Demo is harder than expected.

Exposed some flaws about how we wanted to do scoring - want to walk through that
