Silver Conformance Subgroup -- 28 Jan 2020

Updates to Conformance section of ED

<Peter_Korn> Am I the only one with audio difficulties?

<janina> scribe: janina

<jeanne> https://raw.githack.com/w3c/silver/conformance-js-dec/guidelines/

<Peter_Korn> scribe: Peter_Korn

jeanne: put in work from last week on sampling, points, into draft.

<jeanne> https://raw.githack.com/w3c/silver/conformance-js-dec/guidelines/#x3-4-points-levels

<janina> pk: Curious about scoring 0-1; what does partial failure look like?

jeanne: depends upon guideline. E.g. 500 images, and 250 have ALT text (via automated text), you could suggest you are at 50% for that SC.
... if you have a rubric, it could be factors of the rubric.
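A minimal sketch of the arithmetic in jeanne's example, assuming a hypothetical helper named `partial_score`; the draft does not define such a function, and a rubric-based guideline would need a different calculation.

```python
def partial_score(passed: int, total: int) -> float:
    """Fraction of checked items that pass, in the range 0 to 1.

    Hypothetical helper for illustration only; not part of the draft.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= passed <= total:
        raise ValueError("passed must be between 0 and total")
    return passed / total


# 250 of 500 images have ALT text, as in the example above.
print(partial_score(250, 500))  # 0.5
```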

<janina> pk: Or, in a situation where you can't check each page, how does that work?

jeanne: ideally you would follow the rules from sampling in next section... For however many pages/screens/whatever you are testing, how does it score.
... this clearly needs to be made more clear...
... could put in a note for 2nd bullet, if you are using sampling, score is based on sample.

<jeanne> If you are using sampling (see the following section), then the score is by the pages or screens tested
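One way to read the proposed note, sketched under assumptions: the score is computed only over the pages or screens actually sampled. The function name `sample_score`, the pass/fail-per-page input, and the use of simple random sampling are all illustrative choices, not something the draft specifies.

```python
import random


def sample_score(results: dict[str, bool], sample_size: int, seed: int = 0) -> float:
    """Share of sampled pages that pass; pages outside the sample are ignored.

    `results` maps a page or screen identifier to pass (True) or fail (False).
    Hypothetical helper for illustration only.
    """
    if not results or sample_size <= 0:
        raise ValueError("need at least one page and a positive sample size")
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    pages = sorted(results)
    sampled = rng.sample(pages, min(sample_size, len(pages)))
    return sum(results[page] for page in sampled) / len(sampled)


# Illustrative data: 20 pages, every fourth one failing.
pages = {f"page-{i}": (i % 4 != 0) for i in range(20)}
print(sample_score(pages, sample_size=5))
```

Because only the sample is scored, the result can differ from the score of the full site, which is the limitation discussed later in the meeting.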

<jeanne> https://raw.githack.com/w3c/silver/conformance-js-dec/guidelines/#x3-4-points-levels

<janina> pk: Would like to see this say "it's an approach, but not necessarily the only approach"

<janina> js: Alternative?

<janina> pk: As on Conformance Doc, many companies have their own methodology and look at critical work flows, typical customer trajectories

<janina> pk: And 250 of 500 alts would certainly score .5; but if the missing ones aren't causing a problem, then .5 is likely too low

<janina> pk: If critical, like shopping cart checkout, .5 might be too high

<janina> pk: Suggesting deeper knowledge of site is important.
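Peter's point that a flat .5 can be too low or too high can be illustrated by weighting each item by how critical it is. This is only a sketch: the function `weighted_score` and the weights (5.0 for a checkout-critical item, 1.0 for a minor one) are assumptions made for the example, not values proposed in the meeting.

```python
def weighted_score(items: list[tuple[bool, float]]) -> float:
    """Each item is (passes, weight); the weight reflects how critical it is.

    Hypothetical helper for illustration only.
    """
    total_weight = sum(weight for _, weight in items)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weight for ok, weight in items if ok) / total_weight


# In both cases 2 of 4 items fail, so an unweighted score would be 0.5.
minor_failures = [(True, 5.0), (True, 5.0), (False, 1.0), (False, 1.0)]
checkout_failures = [(False, 5.0), (False, 5.0), (True, 1.0), (True, 1.0)]

print(weighted_score(minor_failures))     # 10/12, above 0.5
print(weighted_score(checkout_failures))  # 2/12, below 0.5
```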

<janina> js: Believe it's the next section ...

<janina> js: Perhaps we should reorder?

jeanne: moving on to sampling

should sample sizes be different for different guidelines? How to decide?

<janina> pk: Suggest s/answer/assess/ because may not be perfect answer

<janina> pk: or "helps give confidence"

jeanne: likes both of those phrases

janina: likes 2nd especially

pk: "Sampling helps give confidence as to whether people..."

<jeanne> Sampling helps give confidence whether people building the product, app, site, or digital property have done the accessibility work that makes the product more accessible to people with disabilities. Representative sampling is a way to assess large properties without having to test every screen. Representative sampling is common in accessibility testing, particularly for manual testing.

<jeanne> Sampling is optional. Anyone who wants to test all their pages or screens can do so.

<janina> pk: Sampling may not catch all --- need to consider how to phrase ---

<janina> pk: A way to figure out whether you're where you think you are

<janina> pk: if sampling is exposing problems, it doesn't tell you how far you have to go

<janina> pk: I wouldn't know I found the bugs in a QA if I simply sample

<janina> pk: Bug fix rate vis a vis how many may still be there

<janina> pk: Not sure how to capture that in the doc

"Sampling is a tool to help you get a sense of whether things are in good shape or not, but it has significant limits."

<jeanne> Sampling gives an indication of where you are in accessibility, but it doesn't guarantee that you have found every problem. It is particularly effective when a product or site is built with accessible components.

Makoto: in Japan, seeing many cases where companies pick 30-50 pages which are static, and assess on those and make a claim, but only for those pages.
... such cases, they need to test every page; correct?

jeanne: under WCAG, filing a WCAG claim, believes they would have to do that. But not if filing a VPAT.

Makoto: in Japan, 2 options. 1 - claim for limited pages in a site; 2- claim for entire website.
... to make a claim for entire site, do random sampling. For just specific pages, then must test all of those pages.
... in Editors Draft, this seems to be focused on entire website.

jeanne: goal is to move away from "page" to a broader scope. Could be entire site, some portion of site, a mobile app, a web-app. Could be something other than "website". So need to get away from page.
... but don't want to require testing of entire web site - every page on every site.

janina: scoring is a tool to help assess the reliability of whatever the claim made is.

Makoto: in Japan, no obligation/pressure to make websites accessible at this moment. That might be the reason for their approach. Many examples today in Japan of making a claim only for selected pages. Too difficult to make all web pages in a site fully conformant.

jeanne: do you feel people would make more conformance claims if we made it easier to do?

Makoto: encourage folks in Japan to make a min. start - 40-50 pages as a starting point.

<janina> pk: It would make sense to leave conformance and perhaps introduce a different term to help get away from the model of "how far below perfect is OK"

<janina> pk: Where it's measured by how much things fail or succeed vs a task analysis model

jeanne: we want the organization to decide what a representative sample is. But it is their choice to do so.
... then have breakpoints for sampling sizes.

<janina> pk: First reaction is samples are small numbers

<janina> js: Agrees but not sure testing more exposes much more

jeanne: doesn't feel that there are distinctions for more than 1k.

<janina> pk: Thinking of a university with 1k courses plus teachers; very different from a 1+ billion site like where updates are 10K per second

<janina> pk: Think we need to look at dynamism and complexity, not just size

joe: difficult to define complexity. Interactions could be anything, things changing could be anything

jeanne: looking at by size, it is fairly cut & dried (not completely, but fairly). Stopped at 1k because at 1k, the type of testing has to change dramatically.

janina: recalls a discussion where we would give people options, based on the details of their site.
... for FPWD, we might express the challenges we see & look for feedback.

<Makoto> +1 to Janina

jeanne: have a note about "complex dynamic apps", maybe rephrase to address Peter's concern.

<jeanne> Note: Should we also have breakpoints for complex or dynamic products?

<janina> pk: Also looking at 3rd para, two basic approaches, don't see how they fit into sampling discussion

jeanne: glad that was brought up; want to talk about that.

<janina> pk: What about talking about sampling as a check on having done, then a template approach into a discussion of ways to develop an a11y site

<janina> pk: sampling is a kind of check when you're building

<janina> pk: More about validating work flows and that key components do what they should

jeanne: have the guidelines work, telling folks what they need to do.
... we are trying to emphasize to folks to do things early. Sampling is our attempt to address one of the concerns in challenges. Not possible for a large company to test every page (even just 1k pages)
... either you cheat, doing representative sample, or you don't claim WCAG conformance at all.
... want people to be able to say "My site meets WCAG, and I didn't test every page"
... so what wanted to do with sampling, is allow people to "claim WCAG conformance", and giving them some options on how to do that, and put boundaries around it.
... now many may (instead) fill out a VPAT, which is a different issue.

<janina> pk: So trying to define site vs page conformance? Which is why I wonder a different word, rather than conformance, might be helpful

<janina> pk: the new term tells us something different from wcag conformance

<janina> Janina Note: Term == word

jeanne: that is what we are trying to do - find a way for folks to make a strong and meaningful statement about a11y of the site.
... trying to say, you don't have to test everything. Moving to percentages. You can meet WCAG with sampling.

<janina> pk: Even then seems we're still bound up in conformance=perfection terminology and no prioritization of what's important for important work flows

<janina> js: OK, but how to write that?

jeanne & janina: if this is easy, it would have been done already

jeanne: maybe kickoff the whole conformance & scoring section to make clear we're moving away from testing everything on every page to say you conform.
... "conformance" is a W3C term - that technology is interoperable. Two different browsers can take the same content, and render the same result.
... conformance has become a historic term. It used to be required - every spec had to have such a section; it no longer is required.
... BUT - people who are used to WCAG are used to "conformance".
... Maybe kick off with a paragraph - "This hasn't worked well, and..."
... "Want to get away from 100% or nothing, and you should get away from having to test everything"

janina: agrees - messaging is very critical. Need to be very thoughtful about that.
