Silver Deep Dive -- 11 Aug 2020

<ChrisLoiselle> scribe: ChrisLoiselle

Intro & Scribe

<pkorn> I'm having difficulty getting into the Zoom room. I'm told "The host has another meeting in progress"

<kirkwood> same

<jeanne> We are using a different call today for all 3 meetings

<Detlev> I can scribe the next hour

Chuck: Welcomes everyone. Asks for scribes for meetings throughout the day.

Approach for Today’s Deep Dive

<pkorn> Thanks!

When you have comment or question, please announce name for captions and speak slower so captions can be taken appropriately.

Chuck: We are looking to agree on what will be included in first public working draft.

Revisit MVP

<Chuck> https://docs.google.com/document/d/1tQHgVFaJYS1WWs9BKucZxWboMNVuclvdNqnQuzPbWwY/edit#

Chuck: Speaks to the minimum viable product (MVP); shares the link for slide 3, "Revisit MVP", which has milestone dates.

<Chuck> https://w3c.github.io/silver/requirements/

Chuck: Pastes in silver requirements doc link in IRC
... Reads through timeline dates provided on https://docs.google.com/document/d/1tQHgVFaJYS1WWs9BKucZxWboMNVuclvdNqnQuzPbWwY/edit#

Discussion on Representative Sampling

Chuck: We do need agreement from AGWG on representative sampling

<Chuck> https://www.w3.org/WAI/GL/wiki/Draft_Silver_Timeline

We wanted to ask working group if any concerns on addressing sampling later and providing information in an editor's note later.

Wilco: For representative sampling, are you stating that full websites are no longer accessible but only parts accessible?

Chuck: Yes, it would impact large sites.

<Zakim> sajkaj, you wanted to say we never had fully accessible

Janina: In the past, page by page would need to pass or you fail. I don't think we ever got past the page layer.

We talked to that in the challenges document as well.

<Zakim> Lauriat, you wanted to bring up WCAG-EM

ShawnL: Janina's point is correct. WCAG EM gave outline of different methods, including representative sampling.

Like WCAG EM, we intend to talk to it later but not include it in conformance model at moment.

Wilco: On representative sampling, you'd use it to reduce the amount of testing. You'd need a normative methodology to talk to it.

<alastairc> Presumably this could be part of a later Silver draft, or a separate thing?

ShawnL: The statements of conformance you can build up however you like, i.e. WCAG EM. The scope of conformance and the representative sampling would be in alignment.

<Wilco> +1

PeterK: Concern on representative sampling and not speaking to whether website met obligations, i.e. conforms or another term.

ShawnL: We have proposed a way to declare that.

Just not how representative sampling works within the first public working draft. We'd refer to it in an editor's note.

Wilco: It moves WCAG away from a document that says what it means to be accessible and toward a document that says at what point an org has put in sufficient resources to address accessibility.

AlistairG: I helped write WCAG EM. The statement of conformance was the key point. It was not a conformance claim within WCAG EM. Representative sampling doesn't give you straight conformance.

5c in the methodology.

<Lauriat> Thank you for clarifying, Alastair!

AlastairC: I think we have enough to move into first public working draft for feedback. We aren't ruling it out from Silver at a later stage. I think it may be big enough to be a separate document.

Jeanne: We are deliberately moving into that direction and not just being an advice document. It was a request by stakeholder groups on scoring etc.

Wilco: I understand the background. What is being proposed is that much of the content is not tested, and we are making a declaration of testing without testing, if we are using representative sampling.

PeterK: I agree with the key point, Wilco. In many situations, it is impossible to test everything. We need to have some guidance from W3C on how to make a statement of analysis when it is not possible to test all of the site.

<jeanne> +1 Peter

<Chuck> ack Rach

Addressing this is a key requirement for Silver. It feels like a miss if we don't talk to this.

Rachael: The challenge is that Silver is a shift. We need to have a conversation on scoring and sampling. We need to move bar forward. The use of the editor's note acknowledges the importance.

We need the foundation to be set in order to build from it.

<Lauriat> +1 to Rachael

<Chuck> +1 to Rachael

AlistairG: sampling is great for this, but it has to be random. You can't test everything. WCAG EM tried to incorporate random sampling, but bias was introduced on required pages being necessary.

<Zakim> Lauriat, you wanted to +1 Wilco's articulated concern, largely why I'd like to not include representative sampling in the conformance model, at least in the FPWD.

ShawnL: +1 to Wilco's concern on conformance model on first public working draft.

To Peter's point, testing happens at different points in time.

<Chuck> +1 to no rat holes!

PeterK: We are talking to a concept without proper text to review in order to understand it fully. Without text that describes the thinking and how we are working on it, it is hard to say whether we are comfortable with it or not.

<Chuck> straw poll: Any cannot-live-with objections to leaving representative sampling out from the FPWD? -1 cannot live with, +1 editor's note ok

Do we have proposed language? Without that, it would be a thumbs down on any type of vote.

Wilco: Instead of stating representative sampling will be part of WCAG, making conformance claims will be handled in a different document.

<Lauriat> +1 to Wilco

<CharlesHall__> i think the key is not addressed in FPWD versus not addressed in 3.0 at all

<Fazio> +0

<alastairc> +1 for editors note

<Lauriat> Charles: I think we have two parts to this: FPWD, in 3.0 at all. One depends on the other, certainly.

<jeanne> +1 for editor's note

for voting, +1 for editor's note, or -1 if not acceptable.

<Rachael> +1 to editors with a note that we need to work out wording for the editors note that represents the complexity of the issue and need to address in the future. The note will come back to the group.

<Detlev> -1 if conformance concept moves away from page scope towards path-based scope there should at least be an indication how the bits of paths are going to be sampled

<pkorn> -1 on "use an editor's note to say we will come back to this" until having seen at least a rough draft outline of the editors note

ShawnL: One option is to address it as editor note after first public working draft, not including it, or including it in some fashion in the first working draft.

AlastairC: We need to determine scoping. It is difficult to know that in advance.

<pkorn> +1 to Alastair. Scope is a critical question for FPWD

<Detlev> OK, correction - if scoping is in, representative sampling can wait IMO

<Nicaise> -1

<Rachael> +1

<pkorn> -1

<Wilco> -1

<JustineP> -1

<agarriso> -1

<srayos> -1

<Lauriat> -1, back of the queue

<Francis_Storr> -1

<kirkwood> -1

<Detlev> +1

+1 for an editor's note, or -1 to move the topic to the back of the queue.

<bruce_bailey> +1 for editors' note

<GN015> +1

<Fazio> +1

<jeanne> 5 +1 in Zoom chat

RESOLUTION: Move topic to back of queue

Scope & Definitions

Chuck: presents slide 4, Scope and Definitions

<alastairc> View: All content visually and programmatically, without an interaction equivalent to loading a new page or state OR

Purpose of this section is to discuss proposed definitions: view vs. path

<alastairc> View alt definition: All content presented on the screen at a single moment

<alastairc> In either case, “View” can be applied to components, templates, design systems, etc

<alastairc> Path: The content and associated views needed to achieve something OR

<alastairc> Paths can happen on one screen or multiple screens, be (likely always will be) a subset of a Single View or Multiple Views, and always include editorial content

Thanks, Alastair.

Chuck: Any questions on definitions?

<Zakim> Lauriat, you wanted to ask what "always include editorial content" means

ShawnL: What exactly does "always include editorial content" mean?

<kirkwood> I question the “always include editorial content” as well

<CharlesHall__> “All content presented on the screen at a single moment” does not define screen or consider virtualized content.

PeterK: Question on the word something within the path definition. Have we decided what "something" is?

<kirkwood> “something” should be “a goal” no?

<Zakim> sajkaj, you wanted to suggest these two aren't necessarily mutually exclusive

Janina: On the two proposals in the Silver presentations, the concept of the path, i.e. logging in, then the steps to purchase, and then leaving smoothly: none of that should get in the way of evaluating the framework.

I don't think the two proposals are mutually exclusive. I think there is a set of things you need to achieve, and the evaluation of the tooling is separate from the evaluation of tooling populated with content.

Chuck: On content on a page, that refers to a framework of components; they don't have content until you put it into the framework.

<pkorn> +1 to contrasting with the WCAG 2 "complete process". I'm not suggesting it needs to have the same definition, but a contrast would be illuminating.

Detlev: Views and paths, e.g. checking out: what would be a task vs. a path vs. a view? There was talk of complete processes in WCAG 2. E.g. you can place things into a shopping cart, but you can't check out. Is it accessible?

<Zakim> alastairc, you wanted to answer whether options are exclusive or not

<pkorn> I also think having a few examples of "doing something" would be very helpful.

AlastairC: To Janina's point, they weren't mutually exclusive proposals. The real question is do we think path is a useful concept to include with scope of conformance statement.

<sajkaj> +1 to ac

<Ryladog> I think we need to include path

<kirkwood> +1 to path being a useful concept (very)

<Ryladog> It also matches a 'Use Case'

Wilco: I think path seems useful, but I'm not sure I understand it in full. Path starts at a certain point and then moving through that. Similar to process.

<alastairc> I *think* path is more detailed than process, you'd be charting a path through views, including what navigation people use on each view.

<Zakim> Chuck__, you wanted to answer question related to "page"

<Zakim> Rachael, you wanted to say that these are starting definitions, both proposals include both but need to decide on whether view or path is central.

On defining view, what is the purpose? I.e. a design system or component is accessible, but how you use it is still important. I.e. what's the value of breaking it down further than the page level?

<CharlesHall__> task = atomic action; path = superset of tasks to a user need; user need = end goal of use of the technology; view = no longer a useful concept due to things like PWA, virtualized content, and XR.

Rachael: The value is there where one proposal is additive, i.e. moving from one stage of development into another, from design into development.

<Rachael> original definition: A single view or the complete series of views and the specific components & content needed to complete a task from end-to-end.

<Rachael> original definition: A single view or the complete series of views and the content needed to complete a task from end-to-end.

<Zakim> jeanne, you wanted to report on past Silver discussions on transparency

<agarriso> Definition of complete processes from WCAG-EM = https://www.w3.org/TR/WCAG-EM/#complete

Jeanne: The need is for transparency of what they are actually claiming for conformance.

AlistairG: talks to WCAG EM link pasted in IRC. Views and Paths are both useful. If someone's critical path is what they are interested in, that is what they will be interested in. If it's a design system that is being reviewed, that is the important item to be analyzed and reported on.

The remediation process of who made the component goes back to the design system and component topic.

Katie: I think that path is a way to discuss complete processes. It also matches use cases. I.e. this path is accessible, to complete a task. I agree view and path are both important.

<Zakim> Lauriat, you wanted to answer the goal of changing from a page to a task/path/thing.

<jon_avila_> I agree as well that both view and path are needed as well as views being components

ShawnL: The original motivation was to move away from conformance at page level. Tying it to a user action vs. DOM structure is important. Mapping to the user.

<jeanne> +1 Shawn

<Zakim> alastairc, you wanted to say the value of that path is providing a mechanism of saying which areas of a page are more important.

PeterK: I think mapping to what user wants to do is very important.

AlastairC: Using path in defining conformance, you'd define more than a process: the path would be starting on a home page, going through "X" navigation to get to "X".

This helps with granular detail of what is important on a given page.

<Rachael> Path: The content and views that need to be completed in order to accomplish an activity.

<Lauriat> +1, and we do have a topic later today to talk more about how that'd work.

<Fazio> I feel like path and process which is defined can be synonymous

<Lauriat> +1, they can be

<pkorn> Mild suggested revision: "The content and view(s) (or portion of a view) that need to be completed in order to accomplish a desired activity"

Rachael: Portion of view will come down to which proposal we are going with.

<Detlev> I offered

PeterK: To Rachael, outside a path items were looked at, but were scored differently. Does your proposal include that?

<Detlev> scribe: Detlev

<Zakim> CharlesHall__, you wanted to address the word view

<Wilco> +1

<Lauriat> +1 to Charles

<Fazio> +1 to Charles hall

Charles: Things with the word "view" that are troublesome: may appear ableist, has a meaning in native apps, and includes some things we want to include in the future

<jeanne> +1 to Charles - especially ableist.

<Zakim> Rachael, you wanted to say yes and propose revision depending on model chosen" The content and associated view(s) that need to be completed in order to accomplish a desired

Rachael: Agree, we can think of another term

<Lauriat> +1 to Rachael for how to solve

Rachael: agrees with Peter

path = associated views needed to complete a desired activity

Chuck: bouncing between path and view - suggests straw poll

<Chuck> poll pt1: Is path a useful concept to include in the conformance model?

<Fazio> +1

<CharlesHall__> +1

<sajkaj> +1

<kirkwood> +1

<Rachael> +1

<alastairc> +1

<Lauriat> +1

<jon_avila_> +1

<Francis_Storr> +1

<mattg> +1

<agarriso> +1

<JustineP> +1

<sheri_byrne-haber> +1

<jeanne> +1

<bruce_bailey> +1 that path is a useful concept

<Chuck> +1

<Ryladog> +1

<alastairc> Great, then it's just the definitions...

<jeanne> +1 from Jennison in Zoom chat

<pkorn> +1

Chuck: so we need to work on definitions - reg. view - we like the concept but not necessarily the word
... what are current alternatives?

<kirkwood> +1 to both

Katie: Why not have both, path and view?

<agarriso> +1 to both

<jon_avila_> We can have both

<jon_avila_> we would

Alastair: both are included - path just adds more flexibility for related views for the conformance process

<Lauriat> -1 to Alastair, that still ties it too much to traditional HTML

Alastair: there are differences between the definitions of view given: everything given without loading a new page is a lot more than what is currently presented on screen

<Zakim> Lauriat, you wanted to add "an atomic path piece" (which needs better wording) as a concept of view

Shawn: Proposes concept 'atomic path piece' instead of view

<agarriso> What about "Aspects of the UI" instead of view

Shawn: example is the slide presented in Zoom with sidebar nav etc
... declaring the whole thing as part of a single path is not helpful - but homing in on creating a new slide, and what you need to do for that, is more useful

<Fazio> reminds me of user needs

<CharlesHall__> +1 to Shawn, which is also a reason to have the author define their own paths

<Nicaise> +1

Shawn: better to think of atomic pieces rather than visual aspects or traditional pages

<sajkaj> +1 to sl; I have regarded "view" as a common phrase expressing the more technical "atomic piece"

Wilco: a path may include many aspects on the screen, difficult to exclude bits

<jeanne> +1 to atomic piece

Shawn: no conformance claim about a single aspect like 'insert slide' but claim broken down in atomic aspects

AGarrison: Explaining something like "atomic path component" to people will be hard

<Lauriat> +1 to "atomic path piece" needing a better wording, which I flagged as I said it. :-)

<Zakim> alastairc, you wanted to say that the comments would lean us towards the second definition.

AGarrison: we need to cover components, beyond path and view (which is assembled from components)

<Lauriat> +1 to Alastair, non-interference still needs testing for that

Alastair: If we had path as primary scoping mechanism, it is important to check around that path as well (without following other paths) - leads to the second definition of 'all content presented at the same point on that path'

<Wilco> +1

Shadi: Shawn, please continue the thought about process - so the developer would define smaller paths, but the conformance claim would be at the level of the application? So what is the relevance of the paths in terms of conformance?

<Zakim> Lauriat, you wanted to answer Shadi on paths vs. whole thing

<alastairc> Shadi - we'll come onto that in scoring, there is a different level applied to paths rather than the view around them.

Shawn: paths are powerful for transparency - breakdown of application, may exclude additional sidebar content, app level integration
... scope of slides would not include calendar, for example
... so scope would be a collection of paths so missing things can be called out
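
Editor's sketch, not something presented in the meeting: one way to picture "scope as a collection of paths, so missing things can be called out" is a small data structure. The class names, fields, and example activities below are assumptions made for illustration only; no Silver draft defines them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Path:
    """Hypothetical record: an activity plus the steps needed to complete it."""
    activity: str
    steps: tuple


@dataclass
class Scope:
    """Hypothetical conformance scope: the paths an author declares for a product."""
    product: str
    paths: list = field(default_factory=list)

    def covered(self):
        """Activities that have a declared path."""
        return {p.activity for p in self.paths}

    def not_covered(self, known_activities):
        """Known activities with no declared path, so they can be called out."""
        return sorted(set(known_activities) - self.covered())


scope = Scope(
    product="slides application",
    paths=[
        Path(
            activity="create a new slide",
            steps=("open presentation", "choose 'new slide'", "pick a layout"),
        )
    ],
)

# The calendar integration is deliberately left out of the declared scope.
known = ["create a new slide", "schedule a meeting in the calendar"]
print(scope.not_covered(known))  # ['schedule a meeting in the calendar']
```

The point of the sketch is only that a declared list of paths makes omissions visible; how paths are scored is a separate question.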

<Fazio> +1 to shawn

Chuck: on-screen definition of view, paths can be defined to compose views, Shawn?

Shawn: It's the inverse: paths would include views / or atomic bits - a single view would not include everything in the path under consideration - but it needs non-interference criteria

agarrison: from an end-to-end testing background, paths are very brittle - that would put strain on the longevity of the conformance claim

<alastairc> is that different from URLs

<Zakim> Lauriat, you wanted to speak to longevity

Shawn: agrees with agarrison, but it depends on context - true for highly dynamic content; for more sedate scenarios like Slides, it is much more stable - little bits will be tweaked but the path as a whole will be more stable - an old claim may not be useful, but a year-old one may be largely correct. All down to context

Chuck: alternative to 'view' not really concrete / usable yet

<Rachael> path: The content and associated view(s) that need to be completed in order to accomplish a desired activity.

Alastair: path something like "content to achieve a goal"

<kirkwood> “all content present on screen” :view

Alastair: any suggestions?

Chuck: View: all 'content that is programmatically determinable or perceivable'

<Lauriat> -1, that discounts dynamic application-level functionality.

<kirkwood> my suggestion for view “all content present on screen”

<Lauriat> +1 to Alastair (G)

<Zakim> Lauriat, you wanted to speak to non-interference

agarrison: For a component library whose components are spread out all over pages, would you have to present it on one page?

<jon_avila_> I +1 Alistair as well - that we need to allow units in a way to capture components. But we will also need to link holistically to the components in combination.

<CharlesHall__> +1 to non-interference being tertiary content present through a path

Shawn: we need non-interference when looking at paths - some link in the footer is harmless, but other things that affect perception, like blinking or sound, have to be taken into account
... the definition of what is included in a path is defined at the guideline level

Alastair: referring to the addition of the path definition in bold on the slides by Rachael

Chuck: agrees that views can apply to components (discussing editing of the view definition on slides)

<kirkwood> +1 to “all content presented on screen at a single moment”

<Chuck> link to slides and page presently on: https://docs.google.com/presentation/d/1cN0T-t-uFCRnRJN6hKAbLWsjkTEE454O_FbwFmcNk08/edit#slide=id.g9005bf062d_0_26

<Zakim> Lauriat, you wanted to respond to usability testing

Alastair: concerned with focussing too much on paths - doesn't it need all content presented alongside whilst progressing along the path?

Shawn: Usability testing is a good way of validating possible interference of stuff outside the path
... Usability testing should reinforce that

Rachael: Alastair, going down that way we need to take care not to have circular definitions

<Zakim> alastairc, you wanted to say that I still think having the rest of the view/page/content in scope (somewhere) is important

Alastair: we see people explore pages and get stuck in other places - all of the content presented is in scope of the conformance statement, maybe not primary, but the surrounding stuff should be in scope

<Wilco> +1

Shawn: agrees, it is a starting point

Chuck: view definition

<jon_avila_> Does this address component testing?

<alastairc> Path: The steps needed to accomplish a desired activity.

<Lauriat> -1 to content

Chuck: definition will be pasted in once stable

<sajkaj> +1 -- prefer "path" over "process"

<CharlesHall__> “presented” becomes a challenge if interpreted as static. content can be added dynamically.

Alastair: path is a bit more granular than 'complete processes'

<Lauriat> +1 to Charles.

<Chuck> path: The steps needed to accomplish a desired activity.

<jon_avila_> A path should be more flexible than a complete process although at some level a complete process is important but can be covered elsewhere

<Lauriat> +1 to Jon

<Fazio> I keep saying they're synonymous path and process!

<kirkwood> should be “the sequence of steps”

<Ryladog> +1

<jeanne> The problem with the WCAG-EM definition is that it would need to be adopted to apply beyond web -- especially to mobile apps.

<Rachael> Complete Process: A sequence of steps that need to be completed in order to accomplish an activity.

agarrison: From Complete Processes definition we should reuse definitions worked out elsewhere?
... view contains path, we have made view into path

Peter: the definition of path needs to make clear if it is one or more views or a portion of a view, should be differentiated, can then be debated
... making the definition of view dependent on path doesn't make sense

<Lauriat> +1 to Peter, the circular definition doesn't help.

<kirkwood> Path should be: “the SEQUENCE of steps in order to accomplish a desired activity”

Gundula: Current definition is not consistent: path is often a series of interactions; view as 'interface available' seems to spread across the path, which would not make sense

Dfazio: Likes the idea of conformance claimer stating what is in scope (like purchase) then breaking it down in smaller bits, some of them more critical than others

Peter: clarifying reg. Amazon: levels of criticality is not an Amazon position

Jon: We seem to be restricted too tightly to complete processes, would prefer a more open term - that the entire process is covered may be handled at another level
... view will not work well for testing components

<Fazio> I prefer interface

Alastair: Do we need the definition of view or would we use 'interface'?

<Fazio> If we use view yes

Rachael: We have lost the concept of the snapshot in time, current state of the interface

<Zakim> Lauriat, you wanted to say I don't think we need it.

<Fazio> +1 to shawn

Shawn: would prefer 'interface' of a path rather than view as a snapshot that might include a lot more stuff
... view much more difficult to apply to non-visual interfaces, say voice
... we need examples how scope can work for differently shaped things

<pkorn> +1 for examples. They are critical for us to wrap our mind around this.

<Ryladog> +1 to Peter

Peter: struck by comment on work that went into defining things like 'complete processes'. maybe a smaller group working through examples in a consistent way to define this - we might get it wrong here

<jon_avila_> We really need a unit of interface or experience

agarrison: also spend a lot of time to define paths to test targets to nail down what exactly you are testing - so all that may be drawn in

Chuck: Silver TF is the group that has been tasked to review definitions, and examples have been used

<Zakim> alastairc, you wanted to say it needs to be considered as part of the scoring, so could come back to this

Alastair: agrees with Peter that other existing things need to be considered - maybe we should put this discussion on hold to discuss scoring based on examples and come back to definitions

Chuck: so we may take the break early and come back, staying on schedule

Chuck adjourned - come back in 2 hrs 10 mins

Shawn: All three sessions are just one call

Alastair: use the link shared via email

<Jennie> * Please confirm which Zoom room we should be in

<Chuck> AWK:....

<AWK> Chuck.... ?

<Chuck> The schedule for today can be found here.

<Chuck> https://www.w3.org/WAI/GL/wiki/Meetings/Silver_Deep_Dive_2020-08#Session_1_.289-11_ET.2C_13:00-15:00_UTC.29

<Chuck> it's deep dive day.

<Chuck> 2 hours have already happened, and in 1 hour and 52 minutes we meet for the 2nd 2 hour session.

<Chuck> Now is break.

<Chuck> AWK: we can retrospect that if you wish.

<Jennie> * I missed that as well. There are 3-4 people still in the typical meeting room on Zoom

<Chuck> If you wish, I can call you and we can discuss how it was announced, and how we might improve upon that.

<Chuck> in the future

<Chuck> I'm in a mini call now (not directly related), but I can call you soon-ish if you want to send me your phone #.

<Chuck> I have your number.

<Chuck> AWK: I'll phone you once my current call ends.

<ChrisLoiselle> ChrisLoiselle: Scribe

<ChrisLoiselle> Scribe: ChrisLoiselle

<Zakim> bruce_bailey, you wanted to ask about streamtext

AlastairC: We need to explore scoring before moving into the representative sampling topic. Discusses the "Actions from Today" slide on sampling, defining view vs. path, etc.
... talks to slide 5, guideline scoring issues. Two proposals: 3 types of testing and weighted scoring.

<jon_avila_> Is there a link to the slides for this proposal?

Rachael: The concept of a view and a path are looked at, but in a different way if content is outside of desired path / view.

<alastairc> https://docs.google.com/presentation/d/1cN0T-t-uFCRnRJN6hKAbLWsjkTEE454O_FbwFmcNk08/present#slide=id.g9005bf062d_0_41

<alastairc> https://docs.google.com/presentation/d/1zUqVZnSKEmQuRpd7aTLvVeI_A-IV0ly3qt3glqqpNBc/preview#slide=id.g8dc9f88081_23_276

PeterK: Another part of this, would be a person could define a set of paths that the site is trying to accomplish. This would impact third party using this methodology.

Rachael: I see conformance declarations as declaring your paths. When a third party evaluates that, they aren't defining the paths they are evaluating if the company met conformance.

<Zakim> Lauriat, you wanted to mention the possibility of independent conformance claims

PeterK: Determining factor would be who is defining those paths and what the failures are, if any.

ShawnL: Someone could independently define your site or product as being made of "X" parts and identify problems with "X" elements. That would allow a starting point for structure.

Wilco: Are conformance claims going to be comparable? Within the EU, we would want to make sure there is consistency, i.e. whether a third party or the org itself is working on accessibility.

There are monitoring agencies within the EU that harness automation. Would we get similar results compared to a path-driven approach?

<bruce_bailey> GSA is also doing a very lightweight automated monitoring of Federal Agencies.

ShawnL: Yes, we could have automation type of tests, based on paths. Samples and supplemental documentation would help in this area.

AlistairG: Are we looking at only failing content? We internally are looking at sufficient techniques in addition to failing content. We need those two parts to make a conformance claim.

Sheri: I think this would be a great opportunity to talk to ITI in regard to VPAT process where this would be defined.

<shadi> [[please note that most of the world does not use VPAT]]

<jeanne> +1 Sheri, and it is in our plans.

AlastairC: Discusses slide 5 details on proposals.

There is a difference between the proposals. 3 types of testing comes out as a percentage. Weighted scoring is additive. Testing component and then testing component when in website again.
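
Editor's sketch, not taken from either proposal's slides: the contrast between a score that "comes out as a percentage" and one that is "additive" can be pictured with two small functions. The test names, point values, and formulas are assumptions for illustration; the meeting did not define them.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str     # hypothetical test label
    passed: bool
    points: int   # hypothetical points earned on a pass (additive variant only)


def percentage_score(results):
    """Share of tests passed, expressed as a percentage from 0 to 100."""
    if not results:
        return 0.0
    return 100.0 * sum(r.passed for r in results) / len(results)


def additive_score(results):
    """Sum of points from passing tests; a failure simply adds nothing."""
    return sum(r.points for r in results if r.passed)


results = [
    TestResult("text alternative present", True, 3),
    TestResult("heading structure", False, 2),
    TestResult("no flashing content", True, 5),
    TestResult("link purpose clear", True, 1),
]

print(percentage_score(results))  # 75.0
print(additive_score(results))    # 9
```

In this toy form the percentage is bounded and shrinks with each failure, while the additive total only grows as more passing items are counted, which is one reading of why the two proposals behave differently on very large products.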

Rachael: 3 types of testing is either percentage or adjectival.

<alastairc> https://docs.google.com/presentation/d/1w5Pz-T5vvSpe-RZKF6f0uOuql6XHkSJh/edit#slide=id.p9

AlastairC: Are there suggestions / comments on weighting or 3 type proposals?

<jon_avila_> I agree with Shadi that context is really important.

Shadi: I think weighting can become complex. I'm not opposed to it, but we need to review further. It depends on how the requirements are formulated. E.g. for text alternatives, weighting would need to go along with more granular requirements.

Jeanne: We worked on weighting in 2018 and the metrics weren't suitable for a scoring system, i.e. impact to user, applicability to user's task, number of technologies applied to, etc.

<jeanne> https://docs.google.com/document/d/19uWtVJVhIdkxgzNVDYII0NrIhlgmuxao9fq_Kj5_vAo/edit#heading=h.zdlgvzl15w0e

Jeanne: Provides metrics used in research on weighting to be fair to all user groups and stakeholders.

The proposal is different, but issues around weighting may still apply.

<Zakim> shadi, you wanted to ask jeanne if prototyping was using current success criteria or more granular ones

Jeanne: For flashing criteria, the proposal may be for a flat score, plus per-criteria or need-based scoring.

Shadi: When you did the research, did you do it with certain criteria?

Jeanne: WCAG 2.0.

Shadi: We did this on WCAG 1 and 2 as well. I'll reach out on research.

AlastairC: Are we trying to come up with a perfect accessibility score or are we trying to make it as accessible as it can be? Weighting by guideline may be problematic.

What if you have an instance of an issue. How does that impact the task? The instance aspect needs to be part of the solution on scoring.

Wilco: This is more a conversation on accessibility metrics. Did we research this and draw upon it?

Chuck: The weighted scoring is also talking to severity , as a clarifying statement.

Gundula: 3 types of testing talks to counting how many buttons there are, whereas the weighting proposal is more positive, in that scoring is additive. If there is a huge product, counting 11,000 images vs. 2500 buttons on a given app may not be feasible.

<Zakim> Lauriat, you wanted to bring up weighting by test result in the context of a path-type context.

Shadi: having scoring per persona or functional need, the lowest score is what you get (Shawn reading Shawn's comment).
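
Editor's sketch, not something shown in the meeting: "the lowest score is what you get" amounts to taking the minimum across per-need scores. The functional-need names and numbers below are invented for illustration.

```python
def overall_score(scores_by_need):
    """Return the limiting functional need and its score (the lowest one)."""
    if not scores_by_need:
        raise ValueError("at least one functional need must be scored")
    need = min(scores_by_need, key=scores_by_need.get)
    return need, scores_by_need[need]


# Hypothetical per-need scores on a 0-100 scale.
scores = {"vision": 82, "hearing": 95, "cognitive": 64, "mobility": 88}

print(overall_score(scores))  # ('cognitive', 64)
```

Under this reading a strong result for one group cannot offset a weak result for another.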

<Zakim> shadi, you wanted to reflect on weighting vs scoring

<Fazio> Agreed

Shadi: Weighting can be done with or without scoring. I.e. the levels of A, or AA on current 2.X WCAG.

<CharlesHall> +1 to Shadi. we also have a kind of weighting with non-interference

<Fazio> One of the issues was the cost of testing with this kind of process being so intricate

Missing a text alternative on an informative image vs. non informative image may have different weight, for example.

<jeanne> +1 to shadi talking about more granular evaluation. We have talked about putting some of that work at the Method level.

<pkorn> +1 - very concerned with the complexity involved in both proposals

Detlev: What would be an appropriate way of making thoughts available? Would it be appropriate to paste them within IRC or link to them elsewhere?

Would adding a third proposal be beneficial?

<Zakim> alastairc, you wanted to ask about the non-interference aspect and how that affects scoping.

AlastairC: You can post before meeting ends so we can reference it.
... In regard to non interference within a Silver context, can someone in Silver explain how that helps when scoping a path?

ShawnL: On scoping a path, it is about scoping boundaries of a path. Entry point to Final point. Accomplishing is the ultimate goal.

One example is talking to heading structure and heading clarity.

User may think they need to go to a different structure to accomplish the goal if a label is missing, or if a meaningless label is present on the heading.

Another example, link in a footer. Lists of links within footer are probably unlikely to send person off in wrong direction.

This can work at per test level. This is where ACT structure can help for entry points. I.e. for links, this is what you need to check. Given this particular test, this is the score. The score would be framed based on the path the user is able to get through.

Defining path through the testing.

Framing in the context of the user makes it easier to understand. I.e. flashing occurs on this page at this area, however the user is able to get to where they need to and not come across it in the context of what they are trying to accomplish.

Rachael: I think 3 types of testing proposal treated non interference as items that could fail. In Weighted scoring, additional points are given if it doesn't interfere with someone's path

AlistairG: We are almost coming full circle on P1, P3, etc. off of WCAG 1. and then talking to levels of A, AA in WCAG 2.

AlastairC: What do we do with categories of disabilities? Do we try to weight guideline scores? Or are we looking to define if there is a sufficient score?

Wilco: Weighted scoring talks to pass/fail in some capacity, with additive scoring on top.

Rachael: I'm not sure on the weighted scoring. He leveraged the test set I sent out. Tests can be scored as appropriate. Each test may be binary, it may be a percentage, it may be an assigned quality or adjectival.

Each test has a unique way of testing , appropriate to itself as a test.

<Wilco> +1

Shadi: Considering only types of disabilities to create the weights could potentially disadvantage certain groups. It needs to be explored further to be balanced.

<CharlesHall> not categories of disability categories of functional needs (which include temporary, situational, and contextual)

JohnK: I'm concerned on weighting aspect

Damaging toward tracking and weighting with lawsuits, impact on disability group. General impact it will have on accessibility.

<CharlesHall> +1 to JohnK if a disability influences weight. but I don’t think that is the intent

<sajkaj> +1 to chuck -- item by item

Chuck: On weighting and understanding, the severities served as markers, and this group would determine certain weighting within the user needs and overall scoring etc.

Wilco: On percentages, it would move us from a position where we need to find issues to show it isn't conforming, to burdening an org with adding up the images on a page and reviewing each for, say, alt text etc.
... Weighting is also risky.

AlastairC: I agree. I would rather state we would not leave anyone behind rather than weighting. I think inclusive categories page is a safer approach.

Shadi: Does one have to exclude the other?

AlastairC: Weighting, would go to overall score. Vs. a minimum score on other proposal around alt text.

PeterK: What are the characteristics of the different proposals? In following 3 types of testing, if it was not a critical how was that impacted?

Yes, Bruce, that would be great.

<bruce_bailey> scribe: bruce_bailey

<ChrisLoiselle> Thanks!

AC: do we need a draft set of weightings for the FPWD?

PK: depends on the earlier straw poll -- it depends what the weightings are.

AC: We need a subgroup to propose something for that. Rachael will be showing an example (in a bit).

Janina: We had a hard time coming up with correct weighting

<sajkaj> +1 to shadi, I think John was trying to quantify what Shadi is saying

Gundula: Weighting results in a numerical score that might not reflect what you mean.

AC: Both proposals provide for a way to score by category.

Shadi: Gives an example of Flashing as a critical group or need, so it is a minimum requirement.

<jeanne> +q to say that it is difficult to weight by impact and have it be fair across groups. The totality barrier

<pkorn> +1 Really important that we move WCAG 3 to be much closer to the actual user experience

<Lauriat> +1 to pkorn's note, exactly.

Shadi: but most of us intuitively note when technical failures are not really blocking. So there are things that are failures, but not such a big failure.

<shadi> [[for the record: did not mean to say *without* considering impact on users! but also considering other parameters is what i meant]]

Brooks Newton: For product owners, weighting becomes a good faith question of where do we start?

<jon_avila_> Customers do ask which issues to fix first and if you give them a score they will ask how to increase that score.

scribe: There should be something in the Guidelines that gives product owners insight on the weighting.

<Zakim> Lauriat, you wanted to -1 to weighting or rating severity at the guidelines level without taking into account the impact to users.

<Fazio> True

scribe: It is important to be able to prioritize errors, as we have with levels. But an AA issue is probably a blocker for someone.

<pkorn> +1 This has to take into account the path

David Fazio: Neuropsychological evaluations do this. They rate by percentile. They have to have large sample groups to justify their weightings.

scribe: That is too complicated for us. But it is a benchmark we can use.

<Fazio> +1

<Lauriat> +1 to Rachael's point, we definitely need a way to express that.

<CharlesHall> +1 to Rachael on cumulative impact

Shawn Lauriat: I am not in favor of weighting at the guideline level

<Jennie> +1 to Rachael

<alistair_garrison> +1 to Rachael

scribe: The score would be expressed at the test level in the context of path.

<Zakim> jeanne, you wanted to say that it is difficult to weight by impact and have it be fair across groups. The totality barrier

Rachael: The earlier concern as discussed on Silver calls has been the discriminatory effect against individuals with cognitive disabilities.

Wilco: Having greater granularity can incentivize product owners to game the system. I think WCAG 2.x does better with this.

AC: That can be addressed by how scoring is built up.

Jeanne Spellman: We had this problem in a legal setting where judges wanted to know why one Success Criterion was Level A and another was Level AA.

<Zakim> alastairc, you wanted to say that prioritising by disability is an intuitive question, but terrible thing to answer.

<CharlesHall> The Functional Needs list intends to specifically not identify User Groups or Disability Types – only the Functional Needs.

scribe: Judges wanted a clear explanation. As we base Guidelines on user needs, we hope minor guidelines won't have a big impact on users, and could be dropped.

Chuck Adams: As a project manager, I don't see how we can meet the timeline and have this weighted concept included in FPWD.

Alastair: I think we acknowledge that we are working with clients, but it can be counter productive to cater to client needs too much.

<jon_avila_> Amen - weightings are not the way to go.

<Wilco> +1

Alastair: Clients can help set prioritizations, based on backlog and time requirements.
... In our work, we avoid disability categories in general.

Michael Cooper: I worry that any weighting is going to be impossible in practice.

<Jennie> +1 to Shadi - I see a cost to justifying and explaining why one path is chosen, not another. We would need extremely clear guidance for this.

scribe: Even if we agree in principle, we will not have an objective and unchallengeable algorithm.
... And as we tweak, every thing will change. I think we might do better with user group thresholds.

Shadi: Reacting to what Shawn said, I think this could add a lot of complexity. And then the difficulty of figuring out what is on the Path.
... I am starting to change my opinion based on all the concerns people are raising.
... Equal weights cause problems, but the opposite could be true as well.

Peter Korn: Weighting by path can help with cognitive issues. I am thinking about the concept of spools. A number of SC have grey areas.

scribe: Getting muddles and cumulative friction is not a bad match to cognitive issues, so I see a lot of value in Rachael's proposal.

Rachael: We have to recognize that there is going to be complexity.
... The weighting by path helps to address this.

Alistair Garrison: Neither proposal is entirely what we want. There is concern with weighting getting enshrined by laws and regulation.

Alastair Campbell: Not a binary choice now. This is only for FPWD. We do want to avoid premature adoption of Silver.

<CharlesHall> the author should determine the path as part of scoping versus an evaluator determined

Jennie Delisi: I appreciate Rachael's presentation, but I have questions about enforcement around the scoring and the path based analysis.

<Fazio> wireframes should provide the paths pretty well

Rachael: We also have the concept of View. So if you do not have a path, the scoring still works. And it tells you what to fix.
... If your score is too low, whatever the number, scoring shows where the fix should be focused.

Shawn: We need a supplemental document with examples, especially for simpler cases.

Chuck Adams: It is not a binary choice between path and view, or between the scoring models.

<alastairc> Straw poll: Should we commit to weighting guidelines by disability category for the FPWD? +1 for yes, -1 for no

<Wilco> -1

<Chuck> -1

<JustineP> -1

<Rachael> -1

<Brooks> -1

<Lauriat> -1

<jon_avila_> -1

<Melanie> -1

<Ryladog> -1

<jeanne> -1

<Fazio> -1

<Francis_Storr> -1

scribe: There is a sense of urgency because we missed our February milestone for FPWD.

<sajkaj> -1

<CharlesHall> -1 because it is not disability category, but functional needs and human impacts

<Jennie> -1

<mattg> -1

<alistair_garrison> -1

<kirkwood> -1

<Detlev> +0 - possibly needed for European Web Directive testing

Straw poll for WG committing to weighting by GL clearly fails

RESOLUTION: We won't be weighting guidelines by disability for the FPWD

Guideline Scoring Issues

Alastair Campbell: Next topic is about percentages or adjectival ratings

Rachael Montgomery: We have three options...

Normalization versus additive

scribe: normalization to adjectives
... versus cumulative addition of scoring points

Alastair Campbell shared John F's slides. JF reminds us that this is just illustrative.

AC: Do people have strong feeling about points versus percentage?

Peter Korn: What I like about adjectival ratings is that they collect similar issues together.

scribe: Example: missing alt text is rubbish; or alt text is there but not great; or an image with excellent description.
... Approach lets us take slices, and characterize across a whole site.
... It felt more understandable, instead of a range. If most images are mostly okay, then the site is mostly okay. I hope we can keep this feature.

Rachael: Between these three, my concern is that additive is hard as we add new tests and guidelines over time.

<JF> +1 to Gundula

Rachael: It is not clear how it works over time and with different versions.

<Zakim> JF, you wanted to remind Testable, Measurable, and Repeatable

Rachael: I did try, and had a hard time with ratings.

AC: So potentially more work for us, but easier to understand overall.

<Lauriat> +1 to JF's point about the future proofiness of the points, since conformance would point to a place in time for WCAG 3.

Gundula: Starting with adjectives at an early stage leads to rounding errors.
... I recommend keeping the numbers for intermediate steps.

<Chuck> link to requirements: https://w3c.github.io/silver/requirements/

<Lauriat> The requirement around maintenance, for reference: https://w3c.github.io/silver/requirements/#flexible-maintenance-and-extensibility

John Foliot: Testable, measurable, repeatable.

scribe: No reason we cannot adjust min / max numbers
... We need numbers to show where the adjectival comes from.

Wilco Fiers: These detailed numbers are based on assumption. This is not something we know how to do in our industry today.

<Zakim> Lauriat, you wanted to say that more granular scoring can get us more consistency overall (an assertion that needs validation)

<JF> +1

Alistair Garrison: Adjectives always have boundaries and cut-offs, so below 40 is bad, but above 41 is good.

scribe: Totally different ratings even though they are minutely different.
... You cannot have a plus or minus on an adjective.
... It seems attractive, but I don't think it will work.
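
A minimal sketch of the cut-off concern, assuming a hypothetical set of bands. The thresholds and labels below are invented for illustration and were not proposed in the meeting. Two scores that differ by a single point can land in different adjectives.

```python
# Hypothetical cut-offs: (minimum score, adjective), highest band first.
BANDS = [(80, "excellent"), (60, "good"), (41, "fair"), (0, "poor")]


def to_adjective(score):
    """Map a 0-100 score to the first band whose minimum it reaches."""
    for minimum, label in BANDS:
        if score >= minimum:
            return label
    raise ValueError("score must be >= 0")


# Scores of 40 and 41 are nearly identical but receive different adjectives.
print(to_adjective(40), to_adjective(41))
```

Keeping the underlying number alongside the adjective is one way to preserve the information the band hides.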

Shawn Lauriat: More granular scoring can get us to consistency. We have the problem now, though.

scribe: One tester looks at the alt text and says it's okay. Another says it is not good enough.

<alastairc> The bit I was thinking about: "Be flexible enough to support the needs of people with disabilities and keep up with emerging technologies. The information structure allows guidance to be added or removed."

scribe: I think we are aiming just to draw a distinction between terrible and fantastic.

JF: I am not seeing anything in the charter about needing a living standard kind of model.
... In a regulatory environment, we need a great deal of structure.
... We can always adjust so we can figure out how to progress from A to B and B to C.

AC: Scoring does have an impact, so a running percentage seems like it is easier to add and remove things.

Peter Korn: We need to move away from 0 errors on the page, but then we run into problems with scoring.

scribe: But we already have disagreements on the pass / fail level.
... It is a matter of ranges; we don't need floating point numbers.

<Rachael> I think that the unit must be specified at the test level for just that reason.

scribe: Adjectives are understandable in a way numbers are not.
... Just giving an 86 since it is a little better, and 84 was the previous score.
... I don't see how that can be effective for 100% based score.

Wilco: I don't agree that granularity improves testing; it just means more arbitrary decisions.
... If the character is the unit instead of the paragraph, that is a different score [for contrast], but that is not helpful and not written up in the rule.
... Once that level of detail starts to matter, we have lots of discussions and decisions to make, which really slows the work.

<Detlev> https://docs.google.com/presentation/d/1dV1moNnq-56sS1o84UCKkc_g-gE10X6Y/

Detlev: I would say informed decision, not arbitrariness of the decision.
... focus on the menu what you want from the screen reader.

<pkorn> I think that is where paths come into it. If every menu item is needed for that path, then every menu item's issues is separately counted.

Detlev: You will always have cases where people disagree on pass or failing.

<JF> Numbers are understandable in a way adjectives are not.

Detlev: If we have a five-point scale, we can at least make sure things are better reflected.

<Wilco> +1

JF: Numbers can convey things in a way adjectives might not.

<pkorn> John - I think this is a GREAT place where we can do some user testing. Find out from FPWD / larger survey, which works better - adjectives or numbers.

JF: Testable, measurable, repeatable
... You put 5 experts in a room, you get 7 different alt tags.

<pkorn> John - I actually disagree. I don't think "bronze/silver/gold" is as understandable as "OK/good/great"

JF: But we need to start with numbers. We are a technical standards organization.
... Bronze / Silver / Gold were our starting categories. We need numbers as the basis for this rating.

<JF> @Peter - the Bronze/Silver/Gold (riffing off of the Olympics) scales better for i18n - which is why I believe those terms were chosen

Rachael: The unit question is important for the test set. We used that in the draft. I agree we have to specify the unit by test.

For something that can be automated, you might be able to evaluate numerically.

<pkorn> Thank you Alastair.

We have to evaluate on a large-scale basis.

Alastair calls time.

<alastairc> Looking out for a scribe...

<alastairc> scribe:kirkwood

<Zakim> JF, you wanted to ask about revisiting the last resolution - not all requirements are equal

<CharlesHall> +1 to JF. had said same in IRC

JF: we weren’t going to rate by disability, but the points like flashing would not be as critical, concerned about the value
... wasn’t sure if it was evaluated

AC: it did come up

<jon_avila_> We didn't say all things would be equal - we have Level A, AA, and AAA today.

AC: disability categories and functional outcomes. some of scoring methods
... wanted to go through an example
... to see where same and where differences

JF: fair enough

<Rachael> https://raw.githack.com/w3c/silver/TestingPages-May20-js/prototypes/TestingDummyExamples/index.html

Rachael: a link to test pages to compare things. Trying to do a real-world example, but went back to these… screen sharing

<JF> @ Jon, an A SC and an AA SC are equal for most conformance requirements today - all must pass or it is a failure.

Rachael: login page showing with problems
... problems listed below

<JF> @Jon additionally, I thought we were abandoning A, AA, AAA

Rachael: task is to recover password
... will talk about some of the tests and decisions to make as a group
... needed to be able to test things, so broke down SC and ACT rules. when there wasn’t one, made it up to have the conversation
... units used to give an idea about kinds of decisions
... going through making decision about number of units

<jon_avila_> I'd consider the disclosure triangle to be an image - or non-text content.

Rachael: no multimedia on page. describing headings having accessible name; this got one hundred percent as an example. i’m highlighting problems
... showing failures on the path, as different tests have different ways of rating
... if have 3 levels it would have 33 percent etc. if we stay at percentages
... if adjectival 1-5
... stop for questions at end

Gundula: text, not images of text, was chosen as pass/fail; don’t understand the rationale

Rachael: valid point
... this is for illustration

Gundula: button detected as image; should the requirement for image really apply here?

Rachael: i agree it is a debate on how to apply, but beyond scope for today

AC: at test level, variety of ways to score things
... how can roll-up contribute to high levels for a score

Wilco: is there an example where we couldn’t express a percentage?

RM: any time content as a block, good for adjectival
... language of page works well
... how do you do a percentage for a div
... each needs to be debated

JF: feels web centric. how would it apply to different content: templates, pdfs and fundamentally different types of content

<Zakim> JF, you wanted to ask about non-web content

RM: different tests apply based on technology but mapped to guideline

JF: each pass fail?

RM: no; binary, adjectival or percentage

JF: struggling to see how it works for repeatable
... adjectival is subjective
... don’t see how testable and repeatable; requires human evaluator, inserting need of subject matter experts

AC: lets get back to that

RM: if normalizing, decide on percentage or adjectival all the way through
... keeping the not present concept
... here’s an SC failed, passed or not present. if assume not present, shift scores upward

<JF> So one way to add points is to test for content that isn't there, or shoehorn in content that isn't required, but gains more points?

RM: if adjectival, this score set not present up to a 3. then final score in examples shown
... different options and different decision points we need to have
... John, you could show differences in your model

AC: john, is there a straightforward explanation of differences

JF: not enough data to rebut it
... concerned about if it’s not there then we don’t count it
... concerned about not accounting for stuff that is not there; leaves opportunity to game it

AC: model that Rachael showed, if not there doesn’t count against you

JF: that’s my point. if trying to accrue more points, could put a foreign word and lang attribute and i would get a point

AC: not the case, don’t think

<jon_avila_> If you don't get a pass for not having something then it means the other items each count for more. We shouldn't negatively impact people for not including timeouts, etc.

JF: in an adding model, want to do more things to game system to get more points

<GN015> fully agree to John

RM: question on table

<Detlev> That's the benefit of a subtractive scoring system: N.A. makes no difference

JF: i’m proposing having a maximum score and minimum score

AC: additive versus subtractive scoring
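
A minimal sketch contrasting the two approaches, assuming hypothetical point values (1 point per pass, a maximum of 100, and a 10 point deduction per failure). None of these values were proposed in the meeting. It illustrates the gaming concern: extra passing content raises an additive score but leaves a subtractive score unchanged.

```python
def additive_score(results, points_per_pass=1):
    """Sum points for every passing item; more passing content means more points."""
    return sum(points_per_pass for passed in results if passed)


def subtractive_score(results, maximum=100, penalty_per_fail=10):
    """Start from a maximum and deduct per failure; passes and absent content change nothing."""
    failures = sum(1 for passed in results if not passed)
    return max(0, maximum - penalty_per_fail * failures)


page = [True, True, False]   # two passing items, one failing item
padded = page + [True]       # one extra passing item added only to gain points

print(additive_score(page), additive_score(padded))
print(subtractive_score(page), subtractive_score(padded))
```

A subtractive model has its own open questions, such as how large each deduction should be, which this sketch does not address.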

WF: if you can break down to certain answers. strongly prefer stick to pass fail
... can frame to individual questions

CA: comment to that: our requirements document says multiple ways to measure beyond true/false, one of the mandates of Silver
... does your proposal have non binary tests? and quality such as alt text quality?

JF: in terms of quality not yet
... add up total images; if 9 out of 10 passed, get score of 90 percent

<Chuck> The requirement can be found here: 3.1 Multiple ways to measure

JF: haven’t thought about individual tests as much as how to use scoring across different verticals
... vr will be different than PDF for example
... use numbers as starting point

AC: both models cover having different methods and tests for different technologies

JA: support passing when don’t have something like no timeout

<Zakim> alastairc, you wanted to talk about binary testing and levels of test

JA: if you get pass rather than NA should get credit for avoiding issues

AC: talking to wilco’s point of binary pass fail
... testable at method level, which might wrap up to adjectival score for guideline, but don’t think we have gotten to that level
... rollup into adjectival is my understanding

<jeanne> Jon, the problem is that when you normalize, including the not applicable artificially lifts the score. If you omit it, then the other scores count a little more, but it's nowhere near as big a change as including them. It's something that needs to be decided with validity and sensitivity testing.
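
A minimal sketch of the normalization effect described above, assuming made-up test results on a 0 to 100 scale, with `None` standing for a test that is not applicable. The numbers are illustrative only. It compares counting not-applicable tests as full passes against omitting them.

```python
from statistics import mean

# None marks a test that is not applicable (the content is not present).
results = [100, 50, None, None]


def normalized(results, na_counts_as_pass):
    """Average the results, either treating N/A as 100 or leaving N/A out."""
    if na_counts_as_pass:
        values = [100 if r is None else r for r in results]
    else:
        values = [r for r in results if r is not None]
    return mean(values)


print(normalized(results, na_counts_as_pass=True))   # N/A treated as a pass
print(normalized(results, na_counts_as_pass=False))  # N/A omitted
```

The more tests that do not apply, the further the first figure drifts above the second, which is why the choice would need validity and sensitivity testing.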

DF: don’t quite understand issue. easily end up with one failure and several passes; what do you do?
... any failure fails conformance; i give level 3; useful to have intermediate levels

<pkorn> Question for Rachael - do you see this model as carrying forward each WCAG 2.x SC as-is, with either pass/fail or subjective, or do you envision collapsing related A/AA/AAA SCs into an adjective range?

;)

PK: referring to his question above

RM: the second
... some groupings adjectival, some go to contextual test which we haven’t talked about yet
... all important discussion points

<Rachael> +1 to simplification

PK: don’t underestimate value of simplification

WF: want this idea of multiple points to measure something
... WCAG doesn’t specify how to measure, always come down to a true or false question
... reason get percentage like page title is true or false

WF: do we want to specify to that level

<Detlev> you can have less than perfect page titles that you would not want to fail = 75%

AC: if you had alt text how would it be true false for good bad or reasonable alt text

WF: that is four questions you gave, each true false at that level

<Zakim> jeanne, you wanted to answer Wilco on how we got to multiple ways to measure

WF: that is the level of specificity you need

JS: when Silver did 18 months of research in partnership with academics we developed problem statements, and a key problem statement was that binary true/false was preventing some types of advice in guidelines, particularly COGA
... we asked leaders, and among key things that came out of the report from the design sprint, having more flexibility in testing was determined to be a need

<alastairc> I think this is the design sprint report? https://www.w3.org/community/silver/draft-final-report-of-silver/

JS: wanted to give a more accurate reflection of how accessible they are rather than binary, having seen all the research
... binary would let down stakeholders, particularly COGA

AC: scoring on continuum type method might be needed

AG: whole point is using all that work to support this. more and more atomic which are binary but roll up into others

<JF> +1 to Alistair

AG: no way need to be more subjective, need more automated, more tests

JF: support need for higher order tests
... can still be broken down to true false
... not seen any of these types of tests though such as task completion
... granular issues, but both will be pass/fail such as findable help

WF: think John expressed it well

<JF> +1

WF: more granular tests is the way to do it. otherwise creates ambiguity and difficult to repeat

AC: not much disagreement; don’t think anyone is objecting to more granular tests
... in terms of reporting it at guideline level or some other level, that is where it becomes useful
... as opposed to instant fail when you have one heading wrong
... appreciate wish to automate more; until computer knows what appropriate is, don’t know how that is possible

PK: i wonder, we have gone back and forth on pass/fail and roll up to percentage

<jeanne> +1 for public feedback

PK: would be comfortable to get feedback for this question, do we need to settle before first public working draft

AC: need consensus on point, could go all the way through adjectival, percentage, points

<Rachael> in this model, plain language is a contextual test not an atomic test

WF: not agreement on granular tests we are sure about, such as plain language one which came down to asking a plain language expert
... i would like to avoid that rather than asking someone else to rank
... reason for COGA requirements: we don’t know how to test things well

DF: atomic tests, not against them; bit concerned to move to level of guideline like 1.3.1 which is very broad
... i support what Shadi says, useful for more granular requirements like controls separate from images
... if unit was somewhat smaller than guideline

AC: good point
... on Detlev last point it is somewhat arbitrary

<jeanne> We haven't talked about functional outcomes as the more granular requirements

AC: …. …

;)

<Chuck> chair: Chuck

<Rachael> scribe: Rachael

<alastairc> scribe: alastairc

Rachael: For the guidelines, there is a balance point between short enough & long enough. I tried it from an SC level, and then by grouping by similar functional outcomes.
... breaking the guidelines down is a conversation we need, but before we get there we need to establish the scoring method to gather real data.

<Rachael> scribe: Rachael

alistair_garrison__: Guidelines are already broken down. We would be interested in testing them. What are sufficient techniques for ... No one has mentioned them.

Detlev: My impression regarding the sufficient techniques is that they are patchy and that there are other ways of meeting SC that are not documented. In the past we have heard more warnings about techniques being normative. Not sure if that helps.

<Zakim> alastairc, you wanted to comment on using sufficient techniques

<Wilco> +1, I'd love to talk about what will and won't be normative

alastairc: Building on what Detlev said, the sufficient techniques are examples of ways to pass the SC. If you want to have all sufficient techniques of meeting all SC that is a huge task.

<JF> +1

<jon_avila_> We do have general sufficient techniques that cover SC.

Detlev: You have the same problem with the number of tests to work on WCAG 3. Once you've applied techniques, then your customer can at least apply them. It's the combination of sufficient techniques with avoiding failure techniques that gets you there.

<jon_avila_> We always document in conformance reports supporting techniques to show how conformance is supported.

alastairc: I still don't think that is a reasonable way forward. We really struggle with creating failure techniques because the path is so narrow. That makes me think we need a level above the atomic techniques. What you have to do to normatively conform.

Detlev: Question for Rachael. You mentioned functional outcomes. My impression was that functional outcomes are linked to paths or tasks. Not intermediate level. Is that correct?

<alastairc> Rachael: If you go back to the original pres, the doc structure. They aren't related to paths, they are related to functional categories.

<alastairc> Detlev: Ah, ok, I misunderstood.

<alastairc> Rachael: (Shares screen), the diagram explains the document structure.

<scribe> Scribe: Rachael

Chuck: Wilco had question for Jeanne. How far off are we?

Jeanne: I am here. I am not sure I remember enough about Wilco's question to have context for answering.

alastairc: I asserted that it isn't that different because granular tests get bundled. Wilco was asking about plain language testing which he summed up as you'd have to ask an expert. Question around what the level of reliance in the atomic testing would be in Silver.

Jeanne: Some of the critiques of plain language prompted the group to go back and do more work on the testing and functional outcomes.
... to make it more easily testable by people who did not have a large amount of editorial skill. We also included automated tests as appropriate.
... ultimately it's not something easily tested by automated tools. When we wrote adjectival rating for it we defined the tests for each category.

<kirkwood> Reading level can be automated.

I don't see that as something that ACT could write atomic rules for outside of the automated tests. It's not designed for the automated tests.

Wilco: ACT is not just about automated tests. It's accessibility testing period. I asked because ACT does go down to the granular level. It wants a true/false out of what you are doing. Is that the level you envision for everything or not?
... sounds like not.

alastairc: Just to Wilco, I think we do in Silver account for different types of guidelines. Some might be similar to what we have now. Some others might be different and go into the contextual testing concept that we should likely talk about soon. How they are presented at the guideline level should be points, adjectival or whatever but we should allow for other types of guidelines.

Chuck: In the point of plain language, my impression is that you can get tests down to granular level. I'd like to see the tests at the granular level in action.

Wilco: I just don't see the contradiction. If you have images and instead of having pass/fail you want a rating from amazing to really bad you can still break it down into atomic tests. What is amazing? Really well written.

+1 to Wilco's point.

<jon_avila_> You have to break it down - it's a rubric

scribe: I don't see why you couldn't have that and still get more complex, higher level requirements.

<jon_avila_> A rubric is the way to adjectivally describe something in a repeatable way. I agree with Wilco, but it comes down to a pass/fail.
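
A minimal sketch of how a rubric of true/false questions could roll up into an adjectival rating, assuming four hypothetical questions about an image's text alternative and five invented labels. Neither the questions nor the labels come from the meeting or from any ACT rule.

```python
# Hypothetical rubric for one image's text alternative: each question is true/false.
RUBRIC = [
    "has a text alternative",
    "alternative is not a file name or placeholder",
    "alternative conveys the image's purpose",
    "alternative is concise",
]
# Index into LABELS is the number of rubric questions answered True.
LABELS = ["very poor", "poor", "fair", "good", "excellent"]


def rate(answers):
    """Map per-question true/false answers (a dict) to an adjectival rating."""
    if set(answers) != set(RUBRIC):
        raise ValueError("answer every rubric question exactly once")
    passed = sum(1 for question in RUBRIC if answers[question])
    return LABELS[passed]


example = {question: True for question in RUBRIC}
example["alternative is concise"] = False
print(rate(example))
```

Each answer stays binary and repeatable, while the reported result is an adjective; whether every judgement can be broken down this way is the point under debate.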

Detlev: I think we should make a distinction between discrete values and discrete tests. "Does an image have an alt text" is binary. You need the second part to make a judgement - is the label descriptive enough? Is the error message descriptive? They are not binary. They may be discrete at an adjectival rating; that is possible, but they are not binary. That creates a lot of tension. I don't see how those can be broken down into binary decisions.

<Zakim> jeanne, you wanted to say that breaking down the questions can be done by the tool vendors. It's probably harder for manual testers to break it into atomic tests.

<pkorn> +1 to Jeanne's characterization of what is most important.

Jeanne: I think for Wilco's idea about breaking down the questions, for a tool vendor it would be a really useful thing to do and it makes sense to me for a test for a tool. We need to look at the goals. The testing companies are very important but our top priorities are to make content more accessible for people with disabilities and help developers do it right the first time (at least to me). Testing consultancies are extremely important but if we focus too much on what makes it very very testable, we start to lose what is understandable. I would prefer to err on the side of how to make it accessible and understandable vs what makes it testable.

Wilco: ACT is not just for tools. I think the fact that the methodologies exist is a testament to WCAG's success but also to its failure to clearly define what is needed. As for Detlev's point about binary questions, the ACT taskforce tackled it by making a distinction between objective and subjective.
... Subjective is fine. There are many subjective tests.
... you can still boil the subjective tests, the judgement calls down to a subjective level.

<Detlev> but you may not be forced to pass/fail if you have a range like 1-5

<JF> +1 to Jon

jon_avila_: In order to have consistency of test results, we have to have a rubric to help people categorize what is a pass and a fail. Then those pieces can roll up into adjectival terms that people understand. You need the rubric to address the consistency. To Jeanne's point, that is why the sufficient techniques are so important. When someone fails something, we can point to a sufficient technique. Our goals are to help developers and testers test accessibility and have results be credible.

<jeanne> +1 for credible.

<kirkwood> +1

<Zakim> JF, you wanted to note that not just tool vendors - regulators too

<alastairc> Done well, improving a score should improve the accessibility...

JF: No one is going to argue the goal. As to credibility and testability: one of the goals we have is to support the regulatory environment. Testable, measurable, repeatable. If regulators don't buy in, people will stick with 2.x and we don't move the ball forward. As much as we want to support users, this ability to test and measure is key to adoption.

<jeanne> +1 to getting input from regulators and FPWD is the best way to do that.

<alastairc> Rachael: I feel like we're arguing the exact same point? We agree the ACT tests & format, granular, is useful and needed for regulatory. When you get to that level it is generally pass/fail, but it sounded like people were happy with adjectival at a higher level?

Chuck: Did I hear correctly that ACT can support subjective tests?

<pkorn> I'm sorry John - that is putting the cart before the horse. The top priority shouldn't be that regulators adopt it; it is that it substantially advances the a11y of the web. Adoption is an important tool in that, but so is ease of developer understanding/adoption.

Wilco: yes.

pkorn: The emphasis on regulators is putting the cart before the horse. We want the top priority to be substantially advancing the accessibility of technology. Regulatory environment is a part of that but so is developer adoption and needed accessibility criteria. Central focus on testability does not advance these.

<jeanne> +1 Peter

<alistair_garrison__> testable, repeatable, measurable +1

<GN015> +1 to Peter

JF: The tests need to be repeatable so the regulators will take it. When regulators get in the mix, we started to see progress.

<AWK> Agree that the emphasis can't be only on the regulators, but do agree that the standard needs to be testable/measurable/repeatable

JF: It's not the end-all and be-all but is an important consideration.

pkorn: But testability should not stop us from moving forward.

<jon_avila_> Developers need a way to know they meet the requirements.

kirkwood: Having been in both perspectives, I strongly believe we have to continue these guidelines in a way that the regulatory environments will pick them up. The government agencies adopted it and forced the contractors to use it. If we don't have the same type of structure I don't think we meet the structure.

<Zakim> jeanne, you wanted to talk about the Silver research with regulators

<kirkwood> “understandable” is the techniques

Jeanne: We talked with regulators and they were more concerned with people understanding what to do.

<Chuck> straw poll: Use ACT format including a way to address subjective and adjectival content as a way forward. +1 for using ACT, -1 for not using ACT

<jon_avila_> +0

<bruce_bailey> +1 for using ACT

<pkorn> Question about the straw poll language

<Ryladog> +1

<pkorn> Thank you Detlev. I think "use ACT rules" needs a bit more clarity

<jeanne> +1 to include ACT rules, but not exclusively ACT rules

Detlev: I think it's not sufficiently clear to use ACT rules to cover all the things. Where it is useful to address an adjectival rating, is that the best approach? I think it needs more context. The straw poll doesn't make much sense.

Wilco: second that

<JF> +1 to Detlev & Wilco

Wilco: I think it can.

<CharlesHall> my opinion is that the ACT Rule Design format can be used

Alastair: I'm not sure we have enough disagreement to have a poll. I think we all agree that we use ACT format as a method of testing a guideline but not every guideline has to have ACT methods, but maybe what we are trying to do is aim for guidelines to be able to be broken into ACT style rules at least at the atomic testing level.
... which I'm thinking in the 3 types of testing proposal.
... but there is also a contextual level above that which we would not be using atomic tests on.

<jeanne> +1 to context of the page

Detlev: I think the difficult thing is that there are a lot of contexts in testing. There are so many contextual pieces you have to take into account. I don't see how you capture that complexity as far as impact to user.

Wilco: I don't think it's a question of if these things will be broken down into atomic tests. If we don't, other organizations will.

<Zakim> Chuck, you wanted to say that I think Silver has engaged ACT.

If we don't, and we are supposed to be the people who know how to do it, others will make a mess of it.

Chuck: I believe that Silver has engaged ACT and wants to leverage ACT tests where it fits. Where I think we are struggling, is where subjective tests fall.

<alastairc> Poll suggestion: Our aim is that atomic testing (as per both models) should have defined atomic tests (like ACT)

Jennie: From a government perspective, some organizations could choose to pick portions of it as a compromise.

<AWK> +AWK

<alastairc> (But there is another level on that, the contextual testing)

<kirkwood> +1 to Jennie

JF: I think you've put a fine point on my concern. For large entities that are driven by government regulation, the minute it is ambiguous they will leave it out.

<jon_avila_> If there are no test procedures people will make up their own and they will not be consistent - some not being good for people with disabilities.

JF: what will be left out is the higher order. Being able to explain how to test for something is how we explain what we are looking for.

<Chuck> +1 pkorn

<jeanne> +1 Peter

<Zakim> AWK, you wanted to say that Jennie's perspective is exactly why we need to make sure that the testability is clear.

Peter: I think we are spending a lot of time talking about the important hypotheticals and we already have a massive amount of ambiguity in our existing SC. I don't think a test that lays out what a good, great, awful alt text is changes what we have now. I think what will serve us better going forward is to have examples in the FPWD that we can ask questions for feedback about rather than talking about hypotheticals.

AWK: I think Jennie's point highlights the importance of testability.

<jeanne> +1 we can find ways to test things

AWK: Harmonization of standards worldwide is important. It's not the end-all, be-all but we can find ways to test things even if we get creative about how to get it done.

<Zakim> sajkaj, you wanted to remind ourselves we're not going from 2.2 straight to 3.0

sajkaj: We are talking about a FPWD which will be hopefully 30% (optimistically) what the final will be. There is a lot of time between now and then. Glad we are talking about it at AGWG now. Glad 2.2 is out for wide review.
... if you look at silver, there are tests for every guideline statement involved. The question is whether they are appropriate. This last half hour is us getting ahead of ourselves.

<Chuck> Poll: Our aim is that atomic testing (as per both models) should have defined atomic tests (like ACT)

<Zakim> AWK, you wanted to say that the goal shouldn't be to be as testable as WCAG, but to be more.

AWK: The point I was reminded of when Peter was speaking. Everything isn't perfectly testable with 2.x. Yes, agreed it is a problem which is why we want to address it in 3.0.

<Chuck> Poll: Our aim is that atomic testing (as per both models) should have defined atomic tests (like ACT)

alastairc: I'm struggling to hear disagreement. My poll was basically saying both proposals have the atomic layer of testing. We are all agreeing that should be reliably testable. Is anyone objecting to that?

<pkorn> AWK - I would modify your paraphrasal of my earlier comments. Everything isn't perfectly clear/unambiguous with existing SCs. Adjectives can be as easy to explain as what we have in many SCs today.

<Chuck> Poll: Our aim is that atomic testing (as per both models) should have defined atomic tests (like ACT)

Wilco: The point of this: right now there are a lot of large international companies scrambling to test websites using French audits. That is the thing we want to avoid. Now a country has a variation on WCAG. Similar but different enough to require a different audit. That is why the deeper layer is so important.

<AWK> Peter - I may have misunderstood you then. I didn't think that you were saying that everything in 2.x was unambiguous.

Chuck: Do we agree we should have defined atomic tests?

<Ryladog> +1

<pkorn> +1 Sure.

<Wilco> +1

<JF> +1 to defined atomic tests

<AWK> +1

<Chuck> Poll: Our aim is that atomic testing (as per both models) should have defined atomic tests (like ACT) +1 agree

<kirkwood> +1

+1 is agree, -1 is disagree

<Chuck> +1

<alistair_garrison__> +1

<Jennie> +1

<bruce_bailey> +1 for atomic tests, np

<CharlesHall> +1 as part of an overall list of possible tests

<jeanne> + 1 to have defined atomic tests but that we also include other tests

<sajkaj> +1 where we can reasonably get them on our timeline

<Detlev> +1 provided we agree to see atomic test results in context (in terms of user impact)

<Melanie> +1

<pkorn> But we need to have a much more interesting question around the larger structure of our tests (e.g. the approach Rachael laid out)

<jon_avila_> +1

<maryjom> +1

<Ryladog> +1 for Jamison

Wilco: the point of this agreement is to what extent we have this. We should have the intent to cover all of it.

<sheri_byrne-haber_> +1, as long as the atomic tests can be automatable at the discretion of the person certifying

JF: What Wilco is saying is that even the most sophisticated tests can still be broken down to a series of atomic questions.
... we can construct complex questions but at the foundation is still an atomic rule.
... other types of tests need to be broken down into the building blocks and that is where we are not on the same page. That somehow we will need to have another form of test that can't be broken down.

alastairc: That is why one of the proposals had two layers to it. I would like to ask if that has progressed. I think we are all agreeing to the atomic level, but if that gets you to the WCAG 2.x equivalent then we have long talked about another layer on top of that. Contextual testing and holistic testing was talked about.

<Detlev> True/False atomic test often useful but usually comes in tandem with a subjective test

<alistair_garrison__> Cover all SCs by atomic tests +1

JF: True or False?

alastairc: You can approach it that way but it doesn't look like that kind of test to me.

Chuck: I am not certain that every single measure can be broken down into True/false questions.

<Detlev> +1 to Rachael

Chuck: I am not sure it can be broken down into true/false.

<kirkwood> +1 to Rachael

<alastairc> Rachael: Everything could be broken down into true/false, I'm not sure it would be helpful to do so.

JF: A useful exercise would be to discuss those kinds of tests.
... then we have a better idea of how to break it down.

<Wilco> +1 I think we need to do that exercise

<jeanne> We have examples in Clear Language that are a rubric. I hope it will be done next week or the week after.

Chuck: Reads Jeanne. Just to clarify it sounds like they are still in development.

Jeanne: Some are developed and received critique from AGWG back in Feb and are being worked on based on that critique.

Chuck: 5 minutes left. I'm not quite certain how to close this out.

<Detlev> I'd also be interested how existing SCs like 2.4.1 bypass blocks can be succinctly covered in a series of atomic tests that would result in pass/fail decision

Chuck: it looks like we are leaning towards seeing an example. I would like to see an example of those broken down into true/false.

Jeanne you have one where you can actually show us. I don't know if there is anything in development in ACT.

Wilco: IMG elements that are subjective

Chuck: They are broken down.

Wilco: yes

<Wilco> https://act-rules.github.io/rules/qt1vmo: Image accessible name is descriptive

Alastair: We haven't gotten to the point about normalization vs points. I think the next step is to flesh out a few examples. I think we should look at that and have a small group get together and pick one. It's relatively easy to change later but we need an example that works.

Chuck: we can reconvene and decide where to go from here.

Alastair: topics for next week.

<jon_avila_> Thank you everyone

Chuck: Great for AGWG to be involved and hopefully we can continue this.
