12:58:53 RRSAgent has joined #ag
12:58:53 logging to https://www.w3.org/2021/04/29-ag-irc
12:58:55 RRSAgent, make logs Public
12:58:56 Meeting: AGWG Teleconference
12:59:16 present+
12:59:34 Present+
12:59:35 Present+
12:59:42 agenda?
13:00:04 agenda+ Meeting intro
13:00:14 scribe:ChrisLoiselle
13:00:32 present+
13:00:32 present+
13:00:36 present+
13:00:37 agenda+ What types of tests to include?
13:00:49 zakim, who is here?
13:00:49 Present: JakeAbma, Lauriat_, JF, ChrisLoiselle, jeanne, Rachael
13:00:51 On IRC I see RRSAgent, Zakim, JF, SuzanneTaylor, JakeAbma, Lauriat_, sajkaj, join_subline, LisaSeemanKest, jeanne, shadi, Jemma, alastairc, jamesn, jcraig, MichaelC, ChrisLoiselle,
13:00:51 ... hdv, Rachael, joconnor, AWK
13:01:11 Jennie has joined #ag
13:01:26 present+
13:01:27 MelanieP has joined #ag
13:01:49 Makoto has joined #ag
13:01:55 I can do both blocks if I get a 5 min breather :)
13:02:05 present+
13:02:07 Rain has joined #ag
13:02:14 present+
13:02:26 *I can scribe at hour 2
13:02:31 AbiJ has joined #ag
13:02:59 Alastair: Session 1 - Testing. Shares slide presentation in Zoom meeting.
13:03:08 Etiquette for this meeting - using queue, using topics on queue "to say, to ask, to [anything]", keeping points short, avoid metaphors and allegories
13:03:15 present+
13:03:23 Francis_Storr has joined #ag
13:03:24 Context of meeting and short summary of AGWG and Silver Merge plans
13:03:26 Ben has joined #ag
13:03:30 What types of tests to include?
13:03:34 Which tests to include in conformance?
13:03:44 present+
13:03:46 JustineP has joined #ag
13:03:48 present+
13:03:49 present+
13:03:55 present+
13:04:00 jaunita_george has joined #ag
13:04:02 MelissaD has joined #ag
13:04:04 Present+
13:04:33 ... Please reference the W3C Code of Ethics and Professional Conduct if need be.
13:04:52 present+
13:04:56 https://www.w3.org/Consortium/cepc/
13:05:00 Fazio_ has joined #ag
13:05:04 Azlan_ has joined #ag
13:05:05 present+
13:05:13 Zoom info <- https://www.w3.org/2017/08/telecon-info_ag-ftf
13:05:18 present+
13:05:18 https://www.w3.org/WAI/GL/wiki/Meetings/vFtF_2021#Session_1_-_Testing
13:05:21 present+
13:05:28 Agenda <- https://www.w3.org/WAI/GL/task-forces/silver/wiki/Main_Page
13:05:30 Regina has joined #ag
13:05:47 +AWK
13:05:48 Alastair: We have a mix of people who have been in Silver and AGWG.
13:06:05 ... FPWD is a starting point, not the end point.
13:06:46 laura has joined #ag
13:06:51 ... We can use guidelines as examples to talk through things; we aren't updating guidelines today. Our discussions today will inform solutions to the issues.
13:06:59 Could you define the acronym used in the previous slide?
13:07:03 present+ Laura_Carlson
13:07:32 ... declaring scope is important; testing is conducted against scope.
13:07:53 present+
13:07:53 Thanks!
13:07:57 FPWD: First Public Working Draft
13:07:58 ... FPWD = first public working draft
13:08:11 AGWG = Accessibility Guidelines Working Group
13:08:20 Sheri_B-H has joined #ag
13:08:22 bruce_bailey has joined #ag
13:08:22 Thanks JF!
13:08:27 present+
13:08:31 Thanks JF!
13:08:33 present+
13:08:46 Detlev has joined #ag
13:08:52 present+
13:09:05 Alastair: AG will be focusing more on 3.0, with a smaller group to work on WCAG 2.3. Joint meetings with the goal of merging in the near future, date to be determined.
13:09:15 i can scribe for our 2pm slot
13:09:21 ShawnL: That covers the agenda very well.
13:10:01 Alastair: Talks to schedule: Session 1 - Testing, then a second two-hour session, Session 2 - Scoring, then Session 3 - Conformance
13:10:57 slide url again pls ?
13:11:00 ... Overarching themes - simplicity and objectivity with need for flexibility around functional needs.
13:11:12 https://docs.google.com/presentation/d/1eUbNUGFaqbI87tx7vVMvDwxT8GNsAHD-SCWYddgEWEE/edit#slide=id.gd5abc90fa9_0_10
13:12:15 Alastair: Speaks to WCAG 3 Structure, Guidelines, Outcomes, Methods and how that relates to Functional Needs. Functional Needs describes a gap in one's ability.
13:12:22 Functional needs draft if needed: https://www.w3.org/WAI/GL/WCAG3/2020/functional-needs/
13:12:41 KimD has joined #ag
13:12:58 ... Requirements review - Multiple ways to measure, flexible maintenance and extensibility, multiple ways to display and technology neutral.
13:13:05 Full requirements document if needed: https://www.w3.org/TR/wcag-3.0-requirements/#requirements
13:13:29 present+
13:13:39 ... readability and usability, regulatory environment, motivation and scope.
13:13:55 q?
13:14:04 Q+
13:14:08 q?
13:14:11 ack me
13:14:37 johnkirkwood has joined #AG
13:14:45 JF: On slide 7, guidelines, outcomes and methods. Anything on testing?
13:14:47 q+
13:14:51 q-
13:14:56 Alastair: Methods talk to testing and scoring.
13:14:59 q+ to say that slide was mainly to synch vocab
13:15:14 ack Rachael
13:15:14 Rachael, you wanted to say that slide was mainly to synch vocab
13:15:22 JF: Testing was not present on the main slide, just wondering where that was in context.
13:15:32 kathyeng has joined #ag
13:15:37 Rachael: We were looking to streamline on words used to describe context of conversation.
13:15:42 JF: Ok, thank you.
13:15:50 present+
13:16:32 Alastair: What types of tests to include - Granular testing and Holistic Testing. Granular could be subjective but clearly defined tests.
13:16:53 q+ to clarify the "people in seats" testing note
13:17:13 Holistic Testing - includes but not limited to maturity model, people in seats, Assistive Technology (AT), Heuristic testing
13:17:21 Wilco has joined #ag
13:17:28 present+
13:17:32 Jeanne: The heuristic testing was included in FPWD
13:18:04 For those interested in following the Maturity Model Testing, here's our working doc https://docs.google.com/document/d/1Y5EO6zkOMrbyePw5-Crq8ojmhn9OCTRQ6TlgB0cE6YE
13:18:41 Jeanne: Say we are testing a small website, we would run the tests that are included in the methods for each of guidelines, they'd get a score for the test. They'd take that score and apply it to outcome, which includes a scoring level. This gives a similar rating scale to compare.
13:18:44 present+
13:19:16 ... when an org is testing, they need to test for critical errors.
13:19:21 Fazio, could you email the document to me? jaunita_george@navyfederal.org -- I sadly can't use Google Docs on this machine
13:20:04 ... if it prevents users from completing a task, it fails.
13:20:49 Here is the information on critical errors from WCAG 3.0 FPWD https://www.w3.org/TR/wcag-3.0/#critical-errors
13:20:53 mental fatigue
13:21:01 Errors located anywhere within the view that stop a user from being able to use that view (examples: flashing, keyboard trap, audio with no pause);
13:21:03 also called cognitive overload
13:21:07 Errors that when located within a process stop a user from completing a process (example: submit button not in tab order); and
13:21:09 q?
13:21:13 Errors that when aggregated within a view or across a process stop a user from using the view or completing the process (example: a large amount of confusing, ambiguous language).
13:21:25 ack la
13:21:25 Lauriat_, you wanted to clarify the "people in seats" testing note
13:21:26 Outcome rating and score : https://www.w3.org/TR/wcag-3.0/#outcome-rating
13:21:50 Nicaise has joined #ag
13:22:16 ShawnL: On Holistic testing, the people in seats testing relates to that vs. debate around usability testing as there is different usability testing.
13:22:27 q+ to say example of heuristic testing is "provide affordances"
13:22:36 Alastair: Is it correct that nobody was arguing on the topic of granular testing?
13:23:04 Jeanne: There was one comment on less subjectivity on testing in WCAG 3 as WCAG 2 can be subjective.
13:23:08 present+
13:23:40 Jeanne: We have 300 comments on FPWD, there were around 5 or 6 on tests not being objective enough.
13:23:51 TOPIC: Testing objectivity
13:23:57 q+
13:24:06 ... we want to be more objective and what types of tests to include and what to do on subjective tests.
13:24:14 ack ra
13:24:14 Rachael, you wanted to say example of heuristic testing is "provide affordances"
13:24:15 q+ to ask or confirm that the presence of one critical error will result in a score of overall failure in conformance or is it possible to pass with one critical error?
13:24:42 qq?
13:24:47 qv?
13:25:01 Rachael: To clarify on Holistic Testing, we wanted to look at affordances, i.e. does a button look like a button.
13:25:28 q+
13:25:29 Alastair: Do you think that would trigger people's comments on subjectivity?
13:25:48 Rachael: The more objective it is, the fewer the tests we can incorporate.
13:25:57 Chuck__ has joined #ag
13:26:01 ack jau
13:26:19 Q+ to note that increased subjectivity is codifying the need for subject matter expertise
13:26:26 q+ to ask if heuristics defined somewhere ?
-- it is not in FPWD
13:27:22 jon_avila has joined #ag
13:27:22 ack just
13:27:22 JustineP, you wanted to ask or confirm that the presence of one critical error will result in a score of overall failure in conformance or is it possible to pass with one critical
13:27:24 Jaunita: I want to caution on subjectivity. It reduces the willingness of governments and courts to accept results, due to inconsistency. Objectivity is key in the legal world. Automated rubric scoring based on the inputs provided would be beneficial. Allowing for individuals to do this without experts, the clearer the new standard would be on new research and standard.
13:27:26 ... error?
13:27:28 I've mentioned this over the years but Neuropsychological evaluations are a good benchmark
13:27:29 q?
13:27:53 qq+ answer Justine
13:27:55 q+ to note our Challenges doc breaks objectivity/subjectivity as quantitative vs qualitative
13:28:03 q+ Fazio_
13:28:06 Justine: On the critical error topic: if alt text is lacking for all images, does this result in a critical error?
13:28:09 ack ans
13:28:09 answer, you wanted to react to JustineP
13:28:15 Jeanne: Any critical errors, you will not pass.
13:28:26 qv?
13:28:41 present+
13:28:59 present+
13:29:02 q+ to say we put off critical failures because how we handle them will vary based on the other decisions today
13:29:24 Jeanne: We did ask for feedback on this. Whether critical errors fail the entire product is an open question regarding mathematical possibility
13:29:25 q+ to say that objective doesn't mean you can't use percentages / categories
13:29:32 ack wi
13:29:37 qv?
13:29:59 Wilco: I wanted to ask Rachael about the layers of subjectivity statement.
13:30:53 q+
13:30:55 Rachael: I'm not sure how we have the conversation. I think we can say that automated testing has higher objectivity and manual testing is more subjective. There is a continuum that we can talk toward. We know we need simplicity.
13:31:00 In the FPWD, if "any image of text without an appropriate text alternative needed to complete a process" is missing alt text, that's a critical failure. https://www.w3.org/TR/wcag-3.0/#text-alternatives
13:31:18 Q=
13:31:23 ... we are looking for adoption and tolerance points
13:31:27 q?
13:31:29 q- Fazio
13:31:43 Wilco: Almost sounds like repeatability aspect of testing.
13:31:45 ack JF
13:31:45 JF, you wanted to note that increased subjectivity is codifying the need for subject matter expertise
13:31:54 Rachael: Yes, we could say that.
13:32:18 q+ to say that I've viewed adding additional test types as a way of expressing decisions made in ambiguity rather than adding ambiguity
13:32:27 good point JF
13:32:52 Plain language experts to help us make it easier for non experts to use the spec
13:33:01 ack br
13:33:01 bruce_bailey, you wanted to ask if heuristics defined somewhere ? -- it is not in FPWD
13:33:09 That heuristic testing is at the gold and silver levels not at bronze, right?
13:33:11 JF: The more subjectivity we put into WCAG 3, the greater the need for subject matter experts. Concerned about baking in the need for subject matter experts. For example, with plain language experts reviewing our content, we should be making it clearer before it gets to them.
13:33:44 BruceB: I am wondering where holistic is used, https://www.w3.org/TR/wcag-3.0/#holistic-tests
13:33:54 jon_avila: not yet determined, and a likely topic in the third session today (continuing from previous conversations)
13:34:58 q?
13:35:00 ShawnL: It is essentially a cognitive walkthrough. It is at least a walkthrough of the overall flow
13:35:04 q+
13:35:21 Could bronze be sufficient from a court perspective to determine conformance to laws like ADA?
13:35:23 Automated evaluation- Evaluation conducted using software tools, typically evaluating code-level features and applying heuristics for other tests.
13:35:31 ack sa
13:35:31 sajkaj, you wanted to note our Challenges doc breaks objectivity/subjectivity as quantitative vs qualitative
13:35:42 qv?
13:35:48 q-
13:36:03 https://raw.githack.com/w3c/wcag/conformance-challenges-5aside/conformance-challenges/index.html
13:36:28 Usability Heuristic: https://www.nngroup.com/articles/ten-usability-heuristics/
13:36:36 Janina: I wanted to talk to the challenges document (points to a branch copy) and to assessing quantitatively vs. qualitatively
13:36:50 probably the most well-known set of heuristics for interface design are Jakob Nielsen's "10 Usability Heuristics for User Interface Design" https://www.nngroup.com/articles/ten-usability-heuristics/
13:37:57 ... the quantitative set could change, but the threshold could be there. A necessary threshold needs to be present.
13:38:47 ... if we can break them down, i.e. AI in alt text analysis and finding colors used in alt text, is probably not great alt text.
13:39:02 ack ala
13:39:02 alastairc, you wanted to say that objective doesn't mean you can't use percentages / categories
13:40:32 AWK_ has joined #ag
13:40:45 Q+
13:41:11 q+ to say robustness of qualitative methods is very important
13:41:15 Alastair: There are areas that impact people with disabilities that are qualitative in nature, but more granular testing methods may benefit them. Objectivity would be similar to what we have now, i.e. this is an alt text of zero, this is an alt text of 4, for scoring. Similar to what we have now but scoring a bit differently to move toward objectivity and testing reliability.
13:41:16 q- me
13:41:21 shawn has joined #ag
13:41:22 ack de
13:41:34 shawn9 has joined #ag
13:41:42 q+ address tester reliability in defining rating scales
13:41:42 +1 to Alastair (covered what I wanted to say)
13:42:22 q+ to address tester reliability in defining rating scales
13:42:30 qv?
13:42:46 Detlev: I wanted to add on to issue on criticality and critical errors.
When a person defines this as critical, it is an important part. With alternative text, it does help with running automated text. I.e. alt text in a menu missing would be critical vs. an alt in a call to action would not be deemed a critical stopper.
13:42:56 q+ to say that the quality of existing alt text is an example of where subjectivity exists in the current standard.
13:43:32 ... Leveling headings is wrong, could be annoying, but could be very important and critical
13:43:38 Neuropsyche evals are globally accepted measurements of seemingly subjective criteria but with standardized, objective methods - worth exploring how that's accomplished
13:44:02 ack chr
13:44:15 ChrisLoiselle: My comment is on the use of the word heuristic and holistic and how it relates to 3.0
13:44:20 @Fazio how does Neuropsyche evals scale across millions of web sites?
13:44:26 +1 to contextualization and breaking down into more granular requirements/checks
13:44:36 JF - millions of people would be the equivalent.
13:44:41 ChrisLoiselle: I did a search, ...conducted using software tools... we go into holistics in detail, but heuristics is listed once in the doc.
13:44:51 ChrisLoiselle: Next version it may need to be explained in more detail.
13:44:52 Is there a way scoring could be automated? Could a scoring tool be created to help people generate consistent results?
13:44:53 There are 2 types of heuristic - another means user discovered
13:44:53 many accessibility tools use heuristics
13:45:04 q?
13:45:07 ack aw
13:45:11 q?
13:45:21 Yes, and a definition of any term, e.g. "heuristic" should not use the same term in the definition!
13:45:43 ack ab
13:45:43 AbiJ, you wanted to say robustness of qualitative methods is very important
13:46:47 Pls unmute me in zoom AC
13:46:49 Nielsen article on heuristic testing https://www.nngroup.com/articles/how-to-conduct-a-heuristic-evaluation/
13:47:20 +1 to AbiJ, and more repeatable which is important when doing product evaluations/comparisons
13:47:23 +1 to AbiJ, really looking forward to getting into creating some possible examples of this with ACT folks
13:47:24 AbiJ: I would strongly endorse analyzing both qualitatively and quantitatively. I think that would make the guidelines more readable and accessible to people. I feel a shortcoming of WCAG 2 is that it is difficult to implement, i.e. what is subjective and what is not. If WCAG 3 does have that, it would add value.
13:48:58 We will discuss conformance levels in the third session (See slide 20)
13:49:19 +1 to AWK - that sounds vaguely familiar to the feedback that Makoto brought to us from his Japanese workshop
13:49:23 AWK: I wanted to plus 1 to Janina. Conformance level conversation needs to be looked at. What Janina was talking to is incredibly important. The reality is that websites are not meeting WCAG 2. One of the questions from ITI was about a conformance level around programmatic testing, perhaps in combination with critical tests. Not sure of the order of topics, but wanted to bring it up.
13:49:42 q?
13:49:49 ack
13:49:52 ack je
13:49:52 jeanne, you wanted to address tester reliability in defining rating scales
13:49:55 Alastair: If there are conversations about conformance, we can talk to that in this session around automation or manual testing, framing it around the testing conversation.
13:50:57 q-
13:51:26 Jeanne: One of the goals we have is to improve the reliability of testing for testers. I.e. how good is alt text, what is the quality of alt text? Putting in a qualitative scale, with good instructions on what makes a 2 vs. 4 score, is beneficial on subjectivity issues. That would make it more reliable.
13:51:26 +1 to jeanne
13:51:34 q+
13:51:36 q?
13:51:41 q?
13:51:43 q+
13:51:44 ack ja
13:52:28 q+ to say there is currently a reliance on experts
13:52:35 david-macdonald has joined #ag
13:52:37 q+ to say that Silver needs more help with scoring models, because we don't have a lot of people with testing experience in the group. AGWG has that expertise
13:52:45 present+
13:53:16 Jaunita: Can people input items into a rubric framework that automates this process? I.e. making people and other parties accountable is the key piece. Subjectivity and inconsistency would be a barrier to the uptake of the requirement and impact users.
13:53:17 ack wi
13:53:19 q?
13:53:20 +1 to Juanita
13:53:39 +1 to Juanita
13:53:48 q+ to say how a scale can help
13:54:04 Wilco: To Jeanne, why would the scale help with subjectivity? Seems like more work, with the possibility of misinterpretation, leading to more subjectivity
13:55:08 q+ to respond on scale/weighting and requirement breakdown
13:55:09 Jeanne: Our assertion, which we want to work with ACT on, is that when we have existing subjectivity, such as in WCAG 2, what we want to do is make it easier by giving descriptions of the bands on different levels.
13:55:30 q+
13:55:39 ack ala
13:55:39 alastairc, you wanted to say there is currently a reliance on experts
13:55:42 q?
13:56:25 Jake: Rationales on specific subjectivity and then judge against those rationales. Up to us to come up with rationales.
13:57:09 q + are you talking to Adjectival rating - A system to report evaluation results as a set of human-understandable adjectives which represent groupings of scores?
13:57:26 Q+
13:57:29 q+ ChrisLoiselle
13:57:49 qv?
13:58:25 Alastair: We have to undo some of the changes or results they have received from testing as they haven't understood guidelines. I.e. Nielsen's heuristics are general, there are also different ones to look at.
13:58:40 A guideline can be easy to understand, easy to test, or short. Pick any two.
13:58:57 i like that !
13:59:04 Alastair: We are aiming to be easy to understand and test.
13:59:30 q?
13:59:30 At the hour, do we have a new scribe?
13:59:36 ... the types of granular testing remain to be proved, but it is a direction worth exploring.
13:59:38 q?
13:59:39 ack je
13:59:41 jeanne, you wanted to say that Silver needs more help with scoring models, because we don't have a lot of people with testing experience in the group. AGWG has that expertise
13:59:51 *Scribe change after Jeanne, Chris?
14:00:47 Jeanne: When the Silver Task Force looked at this, we were looking at testing, scoring and levels. This was against a time constraint for FPWD.
14:01:55 ... issue was around expertise around testing and migration of WCAG 2.1 and 2.2 into WCAG 3. AGWG has a lot more testing expertise. For example, clear words and testing of clear words.
14:02:17 ... Silver does need AG help around these areas.
14:02:23 Scribe: Jennie
14:02:30 Thanks, Jennie!
14:02:37 Alastair: Yes, thank you Jeanne
14:02:39 q+
14:02:47 q?
14:03:06 ...I would like to work out the decision points. Are there particular issues that we can frame, and bring to some sort of conclusion.
14:03:14 present+
14:03:27 Shawn: 1 aspect we are speaking to is whether to remove subjectivity from the guidelines.
14:03:35 LS: small meeting
14:03:41 ...A separate topic is what type of tests do we have in order to support the tests either way
14:03:52 ack ch
14:03:54 Alastair: Thank you.
14:04:13 Chuck: I want to address Wilco's question about how a scale might worsen subjectivity
14:04:24 ...We will discuss this more in the scoring part of the agenda
14:04:40 ...If most of the people were testing, sometimes we got (missed a portion)
14:04:48 ...2 different people might come to different results
14:05:10 ...We got 8 of 9 or even 10 sometimes
14:05:19 ...In our current scoring model, that is a very big difference
14:05:31 ...I am going to make a proposal that most are covered
14:05:47 ...We think that both those results would fit and manage subjectivity
14:05:51 ack sh
14:05:51 shadi, you wanted to respond on scale/weighting and requirement breakdown
14:06:07 Shadi: I also want to react to the scaling aspect
14:06:10 "most" means more than half to me my friend. Not sure that helps.
14:06:17 ...One other access is the scope of the tests
14:06:35 ...For WCAG 2 - it has subjectivity, which is fine, but sometimes the requirements have a broad scope
14:06:50 ...The text alternative requires objectives to have text alternatives and a certain quality
14:06:55 +1 to Shadi. 2 tests
14:07:06 ...this gets broken down with what Detlev was saying
14:07:24 ...We could break this down even more on functional items (buttons, links) - these could have a higher score
14:07:38 q+ to ask what the score is supposed to represent
14:07:52 ...Just by constructing the requirements in a way to be more specific on a shorter scope I think we would automatically get more of a feeling
14:07:57 ...of levels of severity
14:08:03 +1 to shadi on functional object alt being more critical than on other graphics
14:08:17 +1 Shadi to more granularity in testing.
14:08:19 ...If we put such a scale on the current requirement, I would agree with Wilco that it would provide more subjectivity
14:08:29 qv?
14:08:30 ...But breaking it down would get us further
14:08:35 +1 to shadi's comment on granularity
14:08:37 ack ch
14:08:37 Chuck__, you wanted to say how a scale can help
14:08:38 +1 to shadi
14:08:47 ...I don't think it is just the type of tests, but how the tests are constructed, and we have a problem with that
14:08:56 +1 to Shadi's point on test construction, definitely!
14:09:00 +1 to shadi
14:09:01 q+ to say the granularity was a concern raised in issues
14:09:07 ...I think good presentation you could only show the parts that you need to
14:09:11 ack de
14:09:16 I agree with Shadi that this would help to bring objectivity.
14:09:23 q?
14:09:29 Detlev: I want to address Wilco's question about scale
14:09:51 ...I think the problem is if you rely on things that can only be pass fail then there may be cases from a barrier point of view
14:09:58 ...it's a minor thing, and you would have to call it fail
14:10:04 +1 Detlev to have the user point of view
14:10:11 ...and this can be wrong in terms of the impact on the user
14:10:16 ...You have to look at the context
14:10:38 +1 to Detlev, this gets into the larger scoring and critical failure aspects of testing
14:10:40 ...When I was suggesting test cases for 4 evaluators to address
14:10:49 ...The simple, automated tests help, but you need both
14:11:05 ...You can have a check for an alt attribute, but then you need to check to see if they make sense
14:11:18 q+ to say the testing, leading to scoring should map to impact on the user
14:11:22 ack jf
14:11:31 JF: I want to go back to the statement that other fields have these kinds of subjective evaluations
14:11:43 ...How many have requirements that are being taken up by regulators
14:11:49 +1 JF
14:11:53 ...The regulators are depending on us to define success
14:12:03 I keep hearing the voice of Jamie Knight, "After my presentation, you will have a better understanding of the needs of exactly one person with autism".
14:12:12 ...I'm worried we will fall into the 80/20 rule
14:12:21 +1 on Shadi's bands with examples within the bands - non-experts are much better served by examples than by carefully constructed specialist language
14:12:23 ...After hearing Jamie Knight say (see quote above)
14:12:41 q/
14:12:43 q/
14:12:44 ...Bruce has talked about having different types of currencies - maybe this is something we need to revisit
14:12:44 q?
14:12:48 ack da
14:12:53 +1 for multiple currencies !
14:12:57 s/Shadi's bands/Detlev's bands
14:13:21 DM: My hope with the next major version is that it is simpler, more accurate
14:13:39 ...Some have found they don't have enough expertise, but others are asking for it to be more objective
14:14:25 ...We have discussed conflicting views: holistic, others - we have to balance this with what people want that are waiting for this standard
14:14:43 ...One thought: with 3.0 it seems like we are introducing a lot of new concepts
14:14:51 ...Let's make it more plain language whenever possible
14:14:58 ...We have a lot of ambitious goals
14:15:06 ...Would it make sense to introduce it in stages?
14:15:11 qv?
14:15:16 ...We have a flat structure, with the methods and guidelines
14:15:16 +1 to introducing it in stages
14:15:19 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html MichaelC
14:15:25 ...Then we migrate WCAG 2 into that new structure
14:15:31 ...Then we expand from there
14:15:45 Alastair: In terms of what we've got in front of us today
14:16:00 ack wi
14:16:00 Wilco, you wanted to ask what the score is supposed to represent
14:16:02 ...Going from the first public working draft to WCAG 3.0 - what we are trying to include for testing
14:16:21 Wilco: What are we scoring against?
14:16:32 ...It might seem better to see if it is better or not
14:16:35 shawn9 has left #ag
14:16:39 ...I think that was achieved in the 1st draft
14:16:47 ...Colour contrast, for example, uses a scoring method
14:17:01 ...It does not take into account how important a particular text is
14:17:18 ...That context is really important in establishing how high a thing should be scored
14:17:28 ...I'm not convinced we can even do that because it is subjective
14:17:37 ...I'm not sure we should even try to do that
14:17:43 +1 to Wilco, with the added observation that the current scoring isn't granular enough
14:17:46 ...Does it make sense to do scoring for everything?
14:17:54 qq+ to respond to Wilco that FPWD defines tasks, or process
14:17:55 ...Does everything need a scale like that?
14:18:06 ack ra
14:18:06 Rachael, you wanted to say the granularity was a concern raised in issues
14:18:08 Rachael: I agree not everything does
14:18:17 ...We are intending to have that conversation
14:18:42 ...I agree with Shadi's point, if we can break things down into granularity, we can have a scale to address the quality of the alt text
14:18:55 ...From issues, there is concern about breaking them down because it increases the burden
14:19:11 ...We have to think about how to reduce it when we break things down into smaller pieces
14:19:11 ack je
14:19:11 jeanne, you wanted to react to Wilco to respond to Wilco that FPWD defines tasks, or process
14:19:19 Jeanne: I want to respond to Wilco
14:19:24 shawn9 has joined #ag
14:19:30 ...The first public working draft attempted to establish that context
14:19:49 ...By having the organization that owns the product or website say this is what the user is trying to accomplish on this page
14:19:52 qv?
14:19:52 q+
14:19:55 ...This establishes the process
14:20:07 ...That can establish how critical the particular problem is
14:20:10 q+ to mention that with scoring and conformance still to solidify, additional types of tests do not necessarily mean all tests become required. It depends on how we define conformance for that particular guideline and as a whole for the level of conformance
14:20:16 qv?
14:20:27 ...The example with alt text - if a missing alt text prevents the user from accomplishing the task, that is a critical error
14:20:47 ...But if it is in the footer, or not in the area of the task trying to be accomplished, then it is not a critical error, then it could pass
14:20:56 Q+ to ask how do we know what the user's primary "task" is?
14:21:11 ack ala
14:21:11 alastairc, you wanted to say the testing, leading to scoring should map to impact on the user
14:21:12 ...We are allowing people to have a small number of minor errors as long as they don't interfere with the process that the user is trying to go through
14:21:38 Alastair: To respond to Wilco regarding what the score should represent - I would like it to represent what level of barriers are there to the end user
14:21:55 ...That should represent how good a job the entity has done at accounting for accessibility
14:22:04 ...People talk about subjectivity in WCAG 2
14:22:15 ...I was looking at 2 sites at the same time
14:22:22 ...Both had 12 Level A fails
14:22:37 ...You couldn't differentiate between the sites, though one was generally pretty good
14:22:48 ...I would like it to map better to the impact it has on the end users
14:22:49 the simplest way to do it is to get large sample sizes and break it down by percentile.
Let govt decide what's acceptable, pass, fail, etc
14:22:54 ...To try to move the conversation forward
14:23:11 ...In terms of what we are trying to add to the First Public Working Draft
14:23:27 ...The subjectivity of 3.0 should be the same as WCAG 2 in terms of the granular testing
14:23:27 the same OR LESS
14:23:29 q+
14:23:35 it should not be MORE subjective
14:23:39 +JF
14:23:40 ...per guideline, basically
14:23:46 Should not be more subjective
14:23:46 q?
14:23:49 ack sh
14:23:50 ...Is anyone arguing it should be much more or much less?
14:23:54 it would be nice if it was less, that is good
14:24:06 Shadi: To react to Rachael on adding more requirements
14:24:15 ...I don't think we are adding more requirements - we are making it more transparent
14:24:17 +1
14:24:18 but IMHO subjective of 2x SC is manageable
14:24:24 +1
14:24:25 I agree with you Shadi, I just wanted to point out the feedback we are getting from issues
14:24:29 ...Most testers will have some sort of spreadsheet where they break things down, and test
14:24:43 ...We are making it more transparent, and making it easier for those that are less expert in testing
14:24:52 ...It is not adding requirements, it might be adding lines
14:24:58 ...By doing that, we use subjectivity
14:25:02 q+ to state that I do think that we need to expand a small amount to include tests like affordances
14:25:05 ...It depends on how you want to define subjectivity
14:25:15 ...Will we want qualitative tests for alt text? Yes
14:25:30 +1 to Shadi that the granularity improves the subjectivity and makes it more reliable/repeatable
14:25:33 ...But, dividing it into navigational elements - that already becomes more clear
14:25:36 s/+JF/+1 JF
14:25:42 ...It is still the same qualitative check
14:25:43 q?
14:25:47 ack la
14:25:47 Lauriat_, you wanted to mention that with scoring and conformance still to solidify, additional types of tests do not necessarily mean all tests become required.
It depends on how 14:25:47 +1 Shadi 14:25:48 Alastair: Good point 14:25:51 ... we define conformance for that particular guideline and as a whole for the level of conformance 14:25:52 ben has joined #ag 14:25:55 +1 some SCs already require multiple tests e.g. 1.3.1 so providing more differentiated tests increases accuracy 14:26:04 Shawn: I want to give a preview of the next subtopic 14:26:15 ...We still have scoring and overall conformance left to solidify 14:26:25 ...Additional types of tests do not mean all tests are required 14:26:40 ...It is more of: we want to look at the range of tests to assess how well people have met the guidelines 14:26:50 ...We also want to talk about which to include for conformance 14:27:13 ...We have a lower level of conformance than might be used in a regulatory situation 14:27:33 ...We could have levels that are more or less the same level of subjectivity, but to Shadi's point, change how you navigate that subjectivity 14:27:46 ...And then heuristic evaluations and AT tests 14:27:59 q? 14:28:03 ack jf 14:28:03 JF, you wanted to ask how do we know what the user's primary "task" is? 14:28:06 ...It is more about what we want to make available to those writing WCAG 3.0 14:28:24 JF: I would like to comment on what Jeanne said about activities - what is the primary purpose 14:28:34 ...How do we know what the process is? 14:28:45 ...It is common that in the footer are social media icons 14:28:56 ...There can be a background image 14:29:09 ...when trying to tweet the article - that may be the primary purpose for sharing an article 14:29:17 q+ to say defining process is different discussion from testing 14:29:24 Alastair: Either the organization, or whoever's doing the conformance statement defines that 14:29:32 that's what we agreed on AC 14:29:39 JF: That assumes you know what I am trying to do on your website.
You may have broad guesses, but you may not really know 14:29:51 ...You may not have known that I wanted to share it on Twitter 14:30:06 ...When we make subjective determinations - 80% may want to read the article 14:30:13 ...We could still be failing 20% 14:30:19 ack wi 14:30:28 q- 14:30:32 Wilco: I wanted to give my perspective on Alastair's question 14:30:46 ...I don't like the word subjectivity because I think we are using it differently 14:31:01 ...Lots of organizations have additional documentation on how to test WCAG, which creates differences 14:31:11 ...The fact that they have to do that is a real problem 14:31:23 ...It shouldn't just apply to granular testing 14:31:35 ...I don't think that should be a restriction on what type of testing should be in WCAG 3 14:31:49 ...Usability testing can have a good degree of consistency if you do enough of it 14:31:53 q+ 14:31:56 ack ra 14:31:56 Rachael, you wanted to state that I do think that we need to expand a small amount to include tests like affordances 14:31:57 q+ to suggest a discussion and straw poll on what types of tests for the next draft 14:32:04 Rachael: I also want to go back to Alastair's question. 14:32:08 +1 to Wilco 14:32:14 ...I would like to see the granular tests include a little more context 14:32:20 +1 to Wilco 14:32:39 ...I think that we have a certain level of tests, for example for affordances, that do not make the cut in 2x and I want to see that expanded 14:32:40 ack al 14:33:03 Alastair: I was thinking along the same lines as Wilco - subjectivity is not the term to be using for outcomes for WCAG 3 14:33:21 +1 14:33:23 ...Could we all agree, for the granular testing, that we are all on board with at least as good or better 14:33:26 Proposal: At least as good or better intertester reliability 14:33:26 +1 14:33:26 +1 14:33:32 +1 14:33:33 -1 needs to get better 14:33:34 +1 to Alastair 14:33:37 inter rater reliability?
14:33:49 +1 to Wilco, I included that in my yes 14:33:50 +1 14:33:50 +1 trying for better 14:33:53 +1 on inter-tester reliability as a better measure of success for "subjectivity" issue 14:33:55 at inter rater reliability = or better+1 14:33:56 +1 to inter-rater reliability (as opposed to focus on "subjectivity") 14:34:02 +1 trying for better reliability metric 14:34:03 Alastair: I think that would answer quite a lot of issues that have been raised 14:34:05 Better 14:34:10 +1 14:34:11 +1 trying for better too. 14:34:21 q? 14:34:21 +1 14:34:25 Alastair: On Wilco's 2nd point, I have some disagreements about usability testing being consistent when required in a standard 14:34:29 ack je 14:34:29 jeanne, you wanted to suggest a discussion and straw poll on what types of tests for the next draft 14:34:43 draft RESOLUTION: For WCAG 3, testing will be at least as good or better intertester reliability 14:34:48 Jeanne: I also want to suggest that we wrap up this part of the subjectivity conversation 14:35:00 ...I would like to move to the next question. 14:35:21 +1 14:35:24 Alastair: I think we have +1s on Rachael's proposal 14:35:26 +1 14:35:27 +1 14:35:34 ...This is not a concrete resolution 14:35:45 ...Maybe it is something we can incorporate into the requirements 14:35:59 Draft RESOLUTION: For WCAG 3, testing will improve intertester reliability 14:36:00 ...Would anyone object to saying improve? 14:36:02 q+ 14:36:05 ack br 14:36:22 Bruce: I think we can only say "don't make it worse" because we don't have great measures of the current inter-rater reliability 14:36:31 q+ 14:36:35 Could we work on a phased approach for WCAG 3? I think we might be trying to do everything at once -- which might hurt progress.
14:36:35 ack wi 14:36:37 Alastair: Within individual conversations, it can be good, but different orgs it may not match up 14:36:43 q+ shadi 14:36:53 Wilco: There is some research, but it is dated 14:37:00 q+ to mention Trusted Tester and The Baseline 14:37:03 Alastair: I think that was looked at as part of the Silver research 14:37:10 ...It should be fairly easy to improve on that 14:37:18 Wilco: I think it was 70% or so 14:37:23 ack sh 14:37:23 Alastair: Improving might not be difficult 14:37:45 Shadi: There is also Michael Vigo's research at the University of Manchester 14:37:59 ...We actually have concrete specific improvements in the ACT group 14:38:13 ...We have identified issues, things that can be made more granular, more understandable 14:38:14 Could you share those recommendations Shadi? 14:38:30 ack br 14:38:30 bruce_bailey, you wanted to mention Trusted Tester and The Baseline 14:38:31 totally agree 14:38:35 ...I think we already have existing improvements for WCAG 2, so we should be confident that we can continue to improve 14:38:58 Bruce: The (named an article) is flawed - the summary description has flaws 14:39:22 ...This is one of the motivations for the Trusted Tester at DHS 14:39:35 ...Their whole focus was having inter-rater reliability 14:39:43 ...For people that are new to testing 14:39:54 ...That is the entire point of the DHS Trusted Tester program 14:40:02 ...It is aimed at beginner testers 14:40:18 ...Abstracting the Trusted Tester approach, we are calling the baseline 14:40:33 ...A less objective version of the WCAG criteria that acts more like a checklist 14:40:46 q+ to add a hopefully more minor point of our setting a goal of improving, but not reaching 100% consistency 14:40:46 ...The lack of a good sense of inter rater reliability is very important 14:40:58 Alastair: Would anyone disapprove 14:41:06 +1 to Bruce 14:41:08 +1 Bruce's comment 14:41:11 ...that WCAG 3 testing will improve inter rater reliability? 
14:41:41 AWK: It is ironic that we are trying to agree on a statement that we may not be able to get inter rater reliability on 14:42:06 +1 14:42:09 ...It can be aspirational, but expecting a measure on this, by a university, it may be difficult, and create a bar we cannot clear 14:42:13 Draft RESOLUTION: For WCAG 3, testing will aim to improve intertester reliability 14:42:14 q+ to respond to AWK that we have a testing group already to look at this 14:42:21 AWK: Sounds good to me 14:42:30 +1 14:42:31 ack la 14:42:31 Lauriat_, you wanted to add a hopefully more minor point of our setting a goal of improving, but not reaching 100% consistency 14:42:44 Shawn: I want to make a more minor point 14:42:56 ...While we are aiming to improve it, we are not working towards 100% consistency 14:43:07 Here is the current Trusted Tested testing script 14:43:07 ...We are not going to get 100% inter tester reliability 14:43:07 https://section508coordinators.github.io/TrustedTester/ 14:43:12 ...We are striving to improve only 14:43:14 ack je 14:43:14 jeanne, you wanted to respond to AWK that we have a testing group already to look at this 14:43:34 Jeanne: We do have a subgroup that is working on testing the spec itself, and doing these types of evaluations 14:43:41 ...Francis is leading that group 14:43:49 ...They have 5 metrics and reliability is one of them 14:43:56 ...We are just calling it reliability 14:44:02 Alastair: I think we all came to agreement 14:44:09 ...Any -1s? 14:44:22 ...This helps us create responses to lots of the testing issues 14:44:26 Q+ 14:44:42 JF: To Shawn's point - this isn't going to be perfect 14:44:48 ...Do we set a minimum baseline? 14:45:01 ...The 80/20 rule - are we going to try to qualify it? 
14:45:08 Shawn: I think the baseline is where we are with WCAG 2x 14:45:14 q+ 14:45:21 ack me 14:45:24 ...I don't think we have enough sample tests to have some sort of idea of how much we can improve it 14:45:33 ...This would be good to work on in joint sessions with ACT 14:45:34 ack wi 14:45:42 Wilco: I would be in favour of at least trying to measure it 14:45:51 +1 14:45:52 Alastair: Can we put that on a yes we will measure it? 14:46:05 ...That is in our comments around subjectivity 14:46:06 Agree that measuring reliability is critical 14:46:12 JF: Can we get it in the resolution? 14:46:21 Draft RESOLUTION: For WCAG 3, testing will aim to improve intertester reliability and will work on testing to determine this 14:46:28 q+ 14:46:36 ack ja 14:46:47 JG: I want to see if there is a possibility of developing tooling that could assist with this 14:47:09 ...Is there way we can endorse or help lower the entry for those less familiar with the standard and grading to develop a more consistent score 14:47:13 q+ 14:47:18 Alastair: We had been assuming there would be tools to help with scoring 14:47:20 qq+ to talk about proof of concept tools 14:47:29 ...There is a discussion about that 14:47:32 ack je 14:47:32 jeanne, you wanted to react to jaunita_george to talk about proof of concept tools 14:47:39 Jeanne: We are working on proof of concept tools 14:47:48 ...We expect to rely on the industry to create the tools 14:48:03 q- 14:48:05 ...We received feedback that one tooling organization would like more information about the information behind the rule 14:48:10 A test suite would be helpful 14:48:15 ...Then we will not focus on building the tool ourselves 14:48:19 +1, plus our stated goal of using ACT rules format 14:48:21 +1, and also a lot of opportunity for other types of tools in this space 14:48:25 RESOLUTION: For WCAG 3, testing will aim to improve intertester reliability and will work on testing to determine this 14:48:40 q+ to propose we focus on binary and rating 
types of tests for the next few draft and ask for AGWG and ACT help in drafting some example tests. 14:48:43 Alastair: Moving on to which tests to include in conformance 14:48:49 TOPIC: Which tests to include in conformance 14:48:56 Q+ 14:48:58 Jeanne: This conversation has been very helpful 14:49:13 ...I would like to propose for the next few drafts we focus on including binary testing and qualitative testing 14:49:22 ...I don't have an opinion on what to call it 14:49:32 q+ to ask Jeanne if that means the first three test types? 14:49:36 +1 to that proposal, let's expand in one space at a time 14:49:36 ...Start pursuing greater granularity, more qualitative evaluations 14:49:41 ack je 14:49:41 jeanne, you wanted to propose we focus on binary and rating types of tests for the next few draft and ask for AGWG and ACT help in drafting some example tests. 14:49:45 ...And get people from AGWG to help with that 14:49:54 ack ra 14:49:54 Rachael, you wanted to ask Jeanne if that means the first three test types? 14:50:04 Rachael: you mean the 1st three test types? 14:50:10 Jeanne: I think so, yes 14:50:29 q+ to suggest alt text quality as low hanging fruit for adjectival scoring proof of concept 14:50:32 ...The types of testing we have today - introduce quantitative in the next few drafts 14:50:38 ...To build this out in more detail 14:50:41 ack jf 14:50:48 JF: I want to go back to the resolution 14:50:54 ...I'm looking for the word "measure" 14:51:06 Draft RESOLUTION: For WCAG 3, testing will aim to improve intertester reliability and will work on testing to meaure this 14:51:09 ...I want to be more granular and be able to measure we are achieving it 14:51:28 Alastair: ok 14:51:33 RESOLUTION: For WCAG 3, testing will aim to improve intertester reliability and will work on testing to measure this 14:51:37 q? 
14:51:40 ack br 14:51:40 bruce_bailey, you wanted to suggest alt text quality as low hanging fruit for adjectival scoring proof of concept 14:51:41 JF: thank you 14:51:57 Subjective: For WCAG 3, testing will be at least as good or better intertester reliability 14:51:58 Bruce: I want to suggest that we do scoring of the quality of alt text as a proof of concept 14:52:12 ...to say yes we can have adjectival ratings 14:52:19 +1 Bruce 14:52:30 q/ 14:52:31 +1, and that kind of test also allows for a bit of a "walk-through" evaluation 14:52:32 q? 14:52:32 +1 14:52:34 Alastair: Yes, good area to focus on for the proof of concept 14:52:53 ...Jeanne, who could they contact to volunteer to work on that? 14:53:11 Jeanne: Makoto is leading the subgroup, but you can get in touch of me and I can make the connection 14:53:11 heuristic (likely misnamed): For controls which are necessary to move forward or backwards through a process, spacing and/or font styling are not used as the only visual means of conveying that the control is actionable. 14:53:19 Alastair: Start with Jeanne 14:53:25 I'll happily help with that as I can, with the caveat that I go on vacation in a matter of hours, so more likely will help during ACT sessions. 
14:53:27 jspellman@spellmanconsulting.com 14:53:35 AWK: I would like to understand Jeanne's proposal 14:53:40 ...I also have one 14:53:53 ...Because of the discussion about tests that are subjective 14:54:07 ...It would be great for the group to focus on starting with that as the baseline 14:54:24 ...To be able to say: here's the things we think we can require that can be handled either in an automated way 14:54:33 ...Or are so important they must be tested manually 14:54:38 1 to AWK 14:54:41 ...Gold is going to be hard 14:54:51 ...It will be presumably more subjective 14:54:52 q+ 14:54:57 ack jka 14:54:57 ...We can build up from the bottom 14:55:00 ack ja 14:55:06 ack sa 14:55:07 Janina: I want to +1 that 14:55:13 @bruce, we have a starting point already: https://www.w3.org/WAI/tutorials/images/decision-tree/ 14:55:17 ...Whatever we call that set, there is a 2nd question 14:55:24 ...There is value in knowing what can be automated 14:55:36 ...We can have higher requirements - where you run those tests regularly 14:55:43 ...Then run the subjective parts 14:55:49 q+ on usability testing 14:55:58 +1 to JF's starting point idea 14:56:09 Alastair: Do Jeanne, Rachael, or Shawn want to respond? 14:56:22 q? 14:56:26 Jeanne: I'm trying to craft a question so we can include it in a later section 14:56:58 Alastair: I think it is very difficult to rely on the results of usability testing to contribute to some kind of conformance 14:57:03 question+ Should we have a structure where a site or product must meet all automated tests before doing manual or qualitative tests? 14:57:23 ...I do think it is possible to have some kind of question - have you done usability testing for a particular topic 14:57:28 ...Have you acted on the results? 14:57:34 ...Then it is not relying on the outcome 14:57:53 ...Question: Can anyone outline what format the AT testing would take? 
14:57:57 +1 to "have you done usability testing" yes/no 14:58:00 q+ to speak to AT testing, from my perspective 14:58:02 ...I'm not clear how that would contribute to a score 14:58:05 ack ala 14:58:05 alastairc, you wanted to comment on usability testing 14:58:19 ack la 14:58:19 Lauriat_, you wanted to speak to AT testing, from my perspective 14:58:19 Shawn: I was thinking of it in terms of the people in seats usability testing 14:58:24 +1 to the usability testing suggestion from Alistair 14:58:30 ...Transparency in process for validating how accessible something is 14:58:35 Q+ to ask about AT testing and the test matrix (OS + Browser + AT tool X versions [current & -1?]) 14:58:43 ...It gives a framework for expressing what kinds of AT testing have been done 14:58:50 ack jf 14:58:50 JF, you wanted to ask about AT testing and the test matrix (OS + Browser + AT tool X versions [current & -1?]) 14:58:58 s/Alistair/Alastair 14:59:06 qq+ 14:59:09 JF: My concern around the AT testing - we also have to build out a matrix about what that looks like 14:59:22 +1 JF 14:59:24 ...OS, browser, etc - it gets unmanageable quickly 14:59:26 ack la 14:59:26 Lauriat_, you wanted to react to JF 14:59:32 Shawn: This onus would go on the people making the claim 14:59:43 ...We would say: here is a structure you can use for recording this 14:59:51 Expertise of the user in AT can be a differentiator between AT testing and diverse user testing 14:59:56 ...that responsibility would be on the people making the claim 15:00:05 Could we require AT support a11y standards? e.g. ARIA?
15:00:07 +1 to AbiJ 15:00:19 Alastair: We have reached time 15:00:26 +1 to AbiJ 15:00:38 +1 to AbiJ 15:00:47 rrsagent, make minutes 15:00:47 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html jeanne 15:01:32 zakim, this meeting spans midnight 15:01:32 I don't understand 'this meeting spans midnight', jeanne 15:15:27 Chuck has joined #ag 15:16:02 zakim, this meeting spans midnight 15:16:02 I don't understand 'this meeting spans midnight', sajkaj 15:16:27 zakim, who's here? 15:16:27 Present: JakeAbma, Lauriat_, JF, ChrisLoiselle, jeanne, Rachael, Jennie, Makoto, MelanieP, shadi, Francis_Storr, Rain, sajkaj, JustineP, Ben, AbiJ, Fazio_, jaunita_george, Azlan_, 15:16:31 ... AWK, Laura_Carlson, SuzanneTaylor, Sheri_B-H, bruce_bailey, Detlev, KimD, kathyeng, Wilco, alastairc, Nicaise, jon_avila, Chuck__, david-macdonald, johnkirkwood 15:16:31 On IRC I see Chuck, ben, shawn9, david-macdonald, jon_avila, Wilco, kathyeng, johnkirkwood, KimD, Sheri_B-H, Regina, Azlan_, MelissaD, JustineP, Francis_Storr, AbiJ, Rain, Jennie, 15:16:34 ... RRSAgent, Zakim, JF, SuzanneTaylor, Lauriat_, sajkaj, join_subline, LisaSeemanKest, jeanne, shadi, Jemma, alastairc, jamesn, jcraig, MichaelC, ChrisLoiselle, hdv, Rachael, 15:16:34 ... joconnor, AWK 15:18:49 rrsagent, this meeting spans midnight 15:33:10 KimD has left #ag 15:39:33 rrsagent, this meeting spans midnight 15:39:53 Though we actually don't! 15:40:04 i.e. end at 2300Z 15:40:10 s/zakim, this meeting spans midnight// 15:40:29 Jeanne: You had it right the first time. It's an rrsagent command 15:41:29 5-7PM Boston = 21:00-23:00 -- so, no 15:59:31 Azlan has joined #ag 16:04:55 Meeting resumes at 1700Z. Wiki info at: 16:04:56 https://www.w3.org/WAI/GL/wiki/Meetings/vFtF_2021 16:10:10 ToddLibby has joined #ag 16:14:37 zakim, list questions 16:14:37 I see 1 question remaining: 16:14:38 Q1: Should we have a structure where a site or product must meet all automated tests before doing manual or qualitative tests? 
(1 supporter) 16:19:04 Maybe that should be "automatable tests?" 16:55:08 Fazio has joined #ag 16:55:26 bruce_bailey has joined #ag 16:56:28 +1 to sajkaj, though a different term may help avoid that confusion 16:56:52 Jennie_ has joined #ag 16:57:08 same here if need be on scribing 16:57:40 scribe: bruce_bailey 16:58:04 present+ 16:58:07 Ben has joined #ag 16:58:17 present+ 16:58:21 present+ 16:58:23 present+ 16:58:50 present+ 16:58:51 present+ 16:58:56 present+ 16:59:12 * shocked face here 16:59:29 present+ 16:59:47 Chuck, is this the correct slide deck ? https://docs.google.com/presentation/d/1eUbNUGFaqbI87tx7vVMvDwxT8GNsAHD-SCWYddgEWEE/edit#slide=id.gd5abc90fa9_0_111 16:59:48 Azlan has joined #ag 17:00:06 https://docs.google.com/presentation/d/1eUbNUGFaqbI87tx7vVMvDwxT8GNsAHD-SCWYddgEWEE/edit#slide=id.gd5abc90fa9_0_111 17:00:07 present+ 17:00:10 present+ 17:00:22 present+ 17:00:31 Charles Adams (Chuck): Any new members? 17:00:33 Present+ 17:00:49 ... 1st session was testing, this session focuses on Scoring 17:00:58 Agenda <- https://www.w3.org/WAI/GL/task-forces/silver/wiki/Main_Page 17:00:58 Slide Deck <- https://docs.google.com/presentation/d/1eUbNUGFaqbI87tx7vVMvDwxT8GNsAHD-SCWYddgEWEE/edit#slide=id.gd5abc90fa9_0_111 17:01:10 AbiJ has joined #ag 17:01:25 .. reminder of etiquette, use queue and include topic (slide 2 in deck) 17:01:53 .. keep comments short and literal 17:02:02 present+ 17:02:06 .. overall themes, slide 3 17:02:33 q? 17:02:35 .. need for simplicity and objectivity balanced against flexibility 17:02:47 present+ 17:02:48 .. recapping from session one (slide 16) 17:03:23 ..Resolution: For WCAG 3, testing will aim to improve inter-tester reliability and will work on testing to measure this 17:03:32 ..Agreed that the framing of inter-tester reliability (or reliability for short) was a more productive way to approach testing than discussing subjectivity, since WCAG 2 has subjectivity. We want to improve, if possible. 17:03:40 q?
17:03:41 ..Agreed that AGWG members would assist the Alt Text Subgroup to work on writing qualitative evaluation that can be rated with improved inter-tester reliability. This can include breaking it into more granular tests and outcomes. 17:03:54 ..HT Shadi and Detlev 17:04:32 ..slide 17 recap, Guidelines, Outlines, methods 17:04:43 jaunita_george has joined #ag 17:04:43 Regina__ has joined #ag 17:05:18 present+ 17:05:18 q? 17:05:19 Jeanne Spellman speaks to slide 18, tabular comparison of WCAG 2 mapping to analog in WCAG 3 17:05:50 ..slide 19, conformance comparison WCAG 2 vs WCAG 3 17:06:01 q+ to note that we arrived on processes and views: "Conformance is defined only for processes and views." from https://www.w3.org/TR/wcag-3.0/#defining-conformance-scope 17:06:13 +AWK 17:06:18 Q+ 17:06:22 ..in 2, unit of conformance is page, in 3 evaluate by site or product (or logical subset) 17:06:25 ack me 17:06:37 ack Lau 17:06:37 Lauriat_, you wanted to note that we arrived on processes and views: "Conformance is defined only for processes and views." from 17:06:39 ... https://www.w3.org/TR/wcag-3.0/#defining-conformance-scope 17:06:47 Q+ 17:06:55 Detlev has joined #ag 17:06:57 Shawn Lauriat: in FPWD we defined processes and views 17:07:13 present+ 17:07:13 .. can build up to site vs product 17:07:43 Jeanne: trying to give overview of FPWD and some big picture concepts 17:08:15 Nicaise has joined #ag 17:08:24 present+ 17:08:31 ..returning to slide 19, SC rated by A/AA/AAA. For 3, we did not want to do that because of disparate treatment of disability groups 17:08:35 present+ 17:08:56 .. 3 does have concept of Critical Errors (three types) which are a hard fail 17:08:57 KarenHerr has joined #ag 17:08:58 present+ 17:09:03 present+ 17:09:19 ..not a close correspondence to A/AA/AAA, but some similarity 17:09:38 .. 2 SC are perfection or fail, 3 has point system 17:10:12 q? 17:10:16 ack JF 17:10:18 ..
2 AA level ended up being used by regulators, with 3 we recommend Bronze for regulation 17:10:52 Currently, you can pick your pages as well. 17:10:57 .. 2 all SC are binary T/F, 3 guidelines customize tests and scoring as most appropriate 17:11:42 John Foliet: Concern with scoring.. 17:11:55 +1 to JF calling out the dual role of A, AA, AAA, definitely good to consider as we move forward! 17:11:58 Jeanne: Please defer until December publication. 17:12:00 q? 17:12:01 s/Foliet/Foliot 17:12:05 +1 17:12:16 +1 17:12:32 AngelaAccessForAll_ has joined #ag 17:12:37 JF: Reminder that A AA AA was based on difficulty for develpers 17:12:45 present+ 17:12:50 Chuck: Thank you. 17:13:19 s/Reminder that A AA AA was based on difficulty for develpers/Reminder that A AA AA was based on impact on users AND content creators 17:13:31 Jeanne: Lawyer from OCR wanting to know why A, AA, AAA 17:14:18 ..apparently comes up in court cases, and things around AAA not getting sufficient attention 17:14:40 If anyone has a copy of that paper, I desperately need citations from it! 17:14:45 ..want to make sure that different disability categories are treated equitably 17:15:27 equally or equitably? 17:15:35 ..Slide 20, big picture idea is to have a score at outcome leve 17:15:44 KarenHerr_ has joined #ag 17:15:44 s/leve/level/ 17:16:35 ..threshold for total score would include score by functional need / disability category 17:16:35 q+ to ask about "total score" vs average score 17:17:12 .. this addresses situations like where for flashing seizures, there is basically one test... 17:17:23 ack Ch 17:17:23 Chuck, you wanted to ask about "total score" vs average score 17:17:26 https://www.w3.org/TR/wcag-3.0/#overall-scores 17:17:36 .. whereas for usable without vision there would be dozens of tests 17:18:02 Q+ 17:18:07 ack JF 17:18:11 equally: in the same manner or to the same extent. equitably: in a fair and impartial manner.
17:18:25 Chuck: asks for clarification between total score and average scores, and FPWD picked 3.5 as minimum 17:19:17 q+ 17:19:27 Jeanne: Correct, mostly we are looking for mean average within a functional category, but thresholds for each FPC 17:19:40 JF has a point 17:19:41 .. would not average score between two FPC 17:20:22 JF: There is a difference between equity and equality. 17:20:35 I think this may help with the discussion https://www.mentalfloss.com/article/625404/equity-vs-equality-what-is-the-difference 17:20:43 q? 17:21:16 I think you're agreeing... 17:21:21 KimD has joined #ag 17:21:25 Jeanne: We want the results (of testing) to be equitable, even though the number of tests for two different FPC categories is not equal. 17:21:40 Present+ 17:22:12 ack Ch 17:22:38 JenniferC__ has joined #ag 17:22:41 Article on equity vs. equality https://culturalorganizing.org/the-problem-with-that-equity-vs-equality-graphic/ 17:22:49 I want to suggest spoons problem isn't just a COGA challenge 17:22:53 Chuck: To JF, if we are to determine that a large number of small issues (spoons problem) is blocking for COGA issues just as much as with other issues in WCAG 2. 17:22:59 q+ 17:23:02 q+ 17:23:32 ack Jeanne 17:23:45 david-macdonald has joined #ag 17:23:48 JF: Yes, that is part of it. Not just the impact and numbers level. But not all issues are created equal. 17:23:48 q+ to say I think we're in violent agreement; but we are still normalizing our terminology 17:23:52 present+ 17:23:58 q+ to suggest that we shouldn't try to equalise by disability, but say whether you have made a reasonable effort to accomodate 17:24:39 PeterKorn has joined #ag 17:24:44 Present+ 17:24:45 Q+ to test a hypothesis 17:24:51 Jeanne: What we are saying is that disability category impact is equitable, but not by count of number of SC (or WCAG 1 checkpoints). 17:24:56 q? 17:25:24 ..so part of the idea behind Critical Errors is a way to reflect this reality.
17:25:32 ack Ch 17:25:38 ack saj 17:25:38 sajkaj, you wanted to say I think we're in violent agreement; but we are still normalizing our terminology 17:26:15 ack ala 17:26:15 alastairc, you wanted to suggest that we shouldn't try to equalise by disability, but say whether you have made a reasonable effort to accomodate 17:26:25 Chuck: Want to say that we could be talking about this for quite a while, and we have other important issues on the table. Equity versus equality is important. 17:27:32 Janina Sajka (sajkaj): We are getting better at normalizing the language we are using for these issues. 17:27:54 q? 17:27:57 ack JF 17:27:57 JF, you wanted to test a hypothesis 17:28:51 Alastair Campbell: We have some consensus in the WCAG 2x space for setting responsibility between developers and site owners and what technology users bring to a page. 17:29:30 JF: Is a preference for transcripts over captioning an example of the equity that 3 aims for? 17:29:52 blind functional need: audio description. deaf functional need: captions. deaf/blind functional need: transcripts 17:30:18 q+ to say I think we're all vehemently agreeing and have the same goals in this admittedly complicated space, and why we need to work through scoring in a way that support these goals 17:30:27 I agree with Jeanne. The scores will roll up to disability categories and a score at that level must be met - so while we might have more criteria that apply to one disability - the score of the roll up would need to be the same. 17:30:27 ack lau 17:30:27 Lauriat_, you wanted to say I think we're all vehemently agreeing and have the same goals in this admittedly complicated space, and why we need to work through scoring in a way 17:30:30 ... that support these goals 17:30:36 Jeanne: No.
It is about count of A and AA SC for blindness versus count of AAA SC for COGA issues. 17:31:01 +1 Shawn 17:31:02 +1 to SL 17:31:09 +1 17:31:10 +1 17:31:11 ack me 17:31:20 MelanieP has joined #ag 17:31:22 Shawn: Seems like we are in strong agreement on big picture goals. 17:31:22 +1 to moving on 17:31:24 +1 - we are all saying the same thing. 17:31:27 present+ 17:31:54 Jeanne: Coming back to slide, and scoring. There are a few ways to address this. 17:32:22 .. for example maintain binary T/F, obviously some objection to that 17:32:42 .. we could aim for rating by percentage 17:32:52 .. we could aim for Likert ratings 17:33:06 .. we could have a more granular point based system 17:34:01 Jeanne: Example of Text Alternatives in FPWD 17:34:06 [stepping away for ~30 min] 17:34:14 far too easy to game the outcomes 17:34:14 q+ 17:34:17 .. in document today was automated versus manual 17:34:44 +1 JF 17:34:55 .. Could report percentage of images with alt 17:35:53 .. Manual evaluation could use automated score plus look for critical errors (missing alt on images needed for path / process) 17:36:31 .. Example from this morning, was icon in navigation missing alt: that is a critical error 17:36:43 disagree with that conclusion 17:36:55 .. missing alt for decorative images not a critical error 17:37:05 ack dav 17:37:33 Q+ to note that a script could be created to append 100 1 X 1 "spacer gifs" with alt="" on a page and it would then pass 17:37:49 +1 to concern with gaming 17:38:00 David MacDonald: So we need count of images used with all pages in a task, so all pages in flow, all images encountered in that flow? 17:38:52 .. sites always can be changing, so experience different from day to day 17:38:56 On David and Jeanne's point, https://www.w3.org/TR/wcag-3.0/#defining-conformance-scope In many cases, content changes frequently, causing evaluation to be accurate only for a specific moment in time; 17:38:57 q? 17:39:11 Jeanne: Definitely needs to be a snapshot.
17:39:16 Wilco has joined #ag 17:39:42 q+ to respond to JF in that we can't prevent people from misrepresenting their conformance, they wouldn't need to add images to pages to lie. 17:39:45 JF: Seems like this could be gamed too easily. 17:40:09 ack Lau 17:40:09 Lauriat_, you wanted to respond to JF in that we can't prevent people from misrepresenting their conformance, they wouldn't need to add images to pages to lie. 17:40:14 Could define a non-visible image with null alt text as not in scope. 17:40:18 .. for example add 100 spacer gifs all with alt="" and now the relative percent of images with good alt is much higher 17:40:21 q+ Are we really in the business of anti-gaming which seems like enforcement. That's not our function, imo 17:40:42 q- 17:40:48 ack me 17:40:48 JF, you wanted to note that a script could be created to append 100 1 X 1 "spacer gifs" with alt="" on a page and it would then pass 17:40:51 q+ 17:40:57 Shawn: With conformance claims, we assume people are not lying. 17:41:17 +1 to JF, a definition of an image that is "useful to the task" could help that issue? One for another day probably... 17:41:21 q? 17:41:26 ack Wil 17:41:41 q+ 17:41:42 ..this is not a new problem, but if someone wants to inflate their score currently, it is easy to do. 17:41:51 +1 Wilco - the needs of regulators 17:42:04 q+ to say we don't do compliance, that's a regulatory function 17:42:30 Wilco Fiers: Disagree as the point of scoring is to have something that reflects actual accessibility. 17:42:33 Q+ to talk about "critical images" and "tasks" 17:42:36 ack Ch 17:43:21 q+ 17:43:23 q+ to say we need to align the tests to what's useful for users 17:43:29 .. If we allow loopholes, people will game the system and exploit them. And they should. So we need to have scoring that reflects that reality. 17:43:35 qv?
17:43:49 ack Jan 17:44:04 ack me 17:44:04 JF, you wanted to talk about "critical images" and "tasks" 17:44:04 ack saj 17:44:05 sajkaj, you wanted to say we don't do compliance, that's a regulatory function 17:44:16 Chuck: This is not a new issue. Queue is long, so try to stay on topic. 17:44:25 ack shad 17:44:33 but if we want the regulators to take up WCAG 3, it has to meet their needs as well! 17:44:47 q+ discussing compliance by vendors and employees 17:45:02 q+ to discuss compliance by vendors and employees 17:45:25 Janina: We should be focused on people that want to do the right thing. If someone games the system, they risk being judged by someone else. 17:45:43 +1 17:45:46 +1 to approaching this as "How do we best do this", since we don't need to do it this particular way. 17:45:48 q? 17:45:50 ack ala 17:45:50 alastairc, you wanted to say we need to align the tests to what's useful for users 17:45:59 Shadi Abou-Zahr: I don 17:46:05 Caryn has joined #ag 17:46:11 't agree with word gaming. 17:46:20 .. developer would be following the rules. 17:46:28 question+ to look at the inverted score where the barriers or errors are counted instead of the overall score. 17:46:33 q? 17:46:38 .. it is about conforming or not. It is what the system offers. 17:46:39 +1 17:46:43 ack Jennie 17:46:43 Jennie_, you wanted to discuss compliance by vendors and employees 17:47:01 question+ How do we address attempted gaming? 17:47:14 Alastair: We can add scoping, for example decorative images are not included in the score. 17:47:26 +1 Jennie. W3C can't toss this over the wall and then wash our hands... 17:47:35 +1 Jennie 17:47:45 .. Current approach is to look for fails, not counts of successes, so less subject to this sort of gaming. 
17:47:53 q+ to speak to the importance of the granularity of tests 17:48:13 ack abi 17:48:13 AbiJ, you wanted to speak to the importance of the granularity of tests 17:48:52 Jeannie Delisi: Agree with Janina, that if implemented, many of us work at the state or government levels where the scoring could be taken quite literally. 17:49:19 q? 17:49:27 Chuck: Jeanne has this as a question in the FPWD, so there will be more conversation. 17:50:10 Q+ to ask "which" process? 17:50:17 q+ 17:50:18 Abi Janes: Want to emphasize that current WCAG 2 can also be gamed, especially with all the emphasis on automated testing. 17:50:26 q+ to ask what happens between 69 and 70 percent 17:50:39 q? 17:50:59 Chuck: Moving on to slide 23, Outcome Rating. Ratings from 0 to 4. 17:51:04 ack Jennie 17:51:04 Jennie_, you wanted to ask what happens between 69 and 70 percent 17:51:20 ack JF 17:51:20 JF, you wanted to ask "which" process? 17:51:34 q+ to answer John's question on what "process" 17:51:51 .. potential thresholds for each tier were suggested. Percentage range very much arbitrary. 17:51:54 zakim, list questions 17:51:54 I see 3 questions remaining: 17:51:55 Q1: Should we have a structure where a site or product must meet all automated tests before doing manual or qualitative tests? (1 supporter) 17:51:55 Q2: to look at the inverted score where the barriers or errors are counted instead of the overall score. (1 supporter) 17:51:55 Q3: How do we address attempted gaming? (1 supporter) 17:52:02 We have a slide at the end that is specifically to discuss scope, including the need to better define or move past process 17:52:04 ..idea is to look at big picture idea. 17:52:18 q?
17:52:31 ack Wil 17:52:34 ..These 0 - 4 ratings also potentially map to adjectival ratings 17:52:48 wireframes can be used to define processes for evaluation 17:53:03 this could drive good design standards too 17:53:06 JF: Assumes well defined process, but even social media links are starting a new view 17:53:36 Wilco: These percentages seem to reflect only automated testing, is that correct? 17:53:56 q? 17:54:00 ack Ch 17:54:00 Chuck, you wanted to answer John's question on what "process" 17:54:05 Jeanne: Yes, but that is not the long term intention. 17:54:35 question+ Should critical errors be across a process or by the view? 17:54:38 Re Wilco: Also, not all images are of equal value. We know about "presentational," but a function/control is arguably more critical than an illustration 17:54:44 q+ 17:54:53 From https://www.w3.org/TR/wcag-3.0/#defining-conformance-scope and you can also declare conformance by "view" (to get beyond page-only definitions of a view) 17:55:06 .. the deadlines for publishing were tight. We did not mean for this to be automated testing only. But yes, that is the characterization in the FPWD. 17:55:41 ack Shad 17:55:55 Chuck: An important point that I want to come back to is that the entity making the conformance claim gets to say what the process is. 17:55:59 my concern as well Shadi 17:56:27 JF: If the main drop-down navigation menu has 35 items, is that not 35 processes? 17:57:29 Chuck: We acknowledge that the current phrasing is not sufficient. Please focus on the intent. 17:57:41 Automated testing can identify use of alt="spacer.gif" and it can also identify 1 pixel images 17:57:41 +1 to exploring Shadi's idea - perhaps that is where we should start with the joint ACT meeting? 17:58:09 +1 17:58:11 Shadi: The current approach of process seems like a red herring. 17:58:14 q? 17:58:37 Topic: Scoring Issues 17:58:44 .. Stepping back, we are trying to think about what a process should be, what that should look like. 17:59:02 ..
images in footer without alt tags are not getting in the way. 17:59:20 Chuck: moving on, slide 24, how to handle scoring 17:59:36 q? 17:59:38 Testing degrees of success is not efficient 17:59:39 q+ to ask about the percentages 17:59:45 Need for more than pass/fail but also need better explanation 17:59:56 Need to allow some small failures 18:00:08 Slide have links to issues. 18:00:16 How should the scoring be done at the outcome level? 18:00:23 Binary (WCAG 2.x, FPWD) 18:00:29 Percentage (FPWD at testing level) 18:00:36 Rating scale (FPWD at testing and outcome (confusing)) 18:00:42 Points 18:00:54 q+ 18:01:11 Jeanne: Picked these issues out because they are representative 18:01:12 ack Ch 18:01:30 scribe: Wilco 18:01:37 ack Jennie 18:01:37 Jennie_, you wanted to ask about the percentages 18:01:55 laura has joined #ag 18:02:01 Jennie: Are we talking about percentages as they were in the previous slide? 18:03:01 Jeanne: We had agreed we'd specify which significant digit to average to. 18:03:02 thank you 18:03:13 rrsagent, make minutes 18:03:13 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html laura 18:03:28 q? 18:03:37 ... We are looking at the possibility of using something like this at the outcome level. Previously we were looking at the outcome level. 18:03:49 q? 18:03:49 ... We could do percentage calculations at the outcome level. 18:04:03 https://github.com/w3c/silver/issues/508 18:04:10 I should have said "rounding" instead of "averaging" 18:05:02 Chuck: Was sort of a heavy process. We said we'd look at making it more efficient. You had said some people misunderstood this was a manual process. 18:05:37 q+ to ask if it is really a change (in practice) to count instances of fails? 18:05:43 Jeanne: Every place we intend counting should be tool assisted. The rating scale should be very efficient and have a tester quickly be able to assess the overall scope of what's tested. 
18:05:53 Q+ to ask are 5 critical failures worse than 1 critical failure? Does the number of critical failures have an impact on scoring? (How? Why?) 18:06:16 ... On an overall scale be able to say where it fits on the scale. So they have guidance to say which is okay. 18:07:03 ... It could vary. In an example where you can go beyond the requirement, such as visual contrast, as long as you meet the minimum contrast you could get more points for going beyond the minimum. 18:07:08 q? 18:07:15 ack ala 18:07:15 alastairc, you wanted to ask if it is really a change (in practice) to count instances of fails? 18:07:53 q+ 18:08:16 Alastair: In our current reporting we report instances, include everywhere that fails alt texts. But also we attach importance to particular instances. 18:08:40 ... Wondering if other people report instances, if so is that going to make the reporting any more difficult? 18:08:40 q+ 18:08:41 q+ 18:08:47 ack JF 18:08:47 JF, you wanted to ask are 5 critical failures worse than 1 critical failure? Does the number of critical failures have an impact on scoring? (How? Why?) 18:09:12 +1 to Alastair to practice of reporting instances and fix priority 18:09:15 JF: It depends on who receives the report. I ask if 5 critical errors are worse than 1? 18:09:45 +1 to JF's point about the helpfulness of granular scoring even within levels of conformance 18:09:45 q? 18:09:52 ... Having a score below compliance is still a useful metric. I would like to measure progress, in addition to seeing the minimum bar. 18:09:52 ack david 18:09:54 +1 on showing progress, although I think that would work as proposed? 18:10:00 a +1 to measuring progress. 18:10:05 +1 18:10:09 +1, alastairc, I think it would 18:10:09 I think that would work as proposed as well 18:10:54 David: When I test, I test a set of pages manually, and crawl to test automatically. I'll scan some of the automated and give details on one of them, and tell them to look at automated results for others. 
18:11:07 q? 18:11:10 q+ to talk about different regulatory and cultural approaches 18:11:18 ... I find if developers give a report, people glaze over if they have more than 100 issues. 18:12:02 Chuck: As far as reporting goes, VPAT allows for partial conformance and for indicating issue by issue where you don't comply. 18:12:04 @chuck - should we? 18:12:19 ... I presumed that other entities described the reporting requirements of WCAG 3. 18:12:45 ... I want to ask John if we should track the count of critical failures? 18:12:51 q? 18:12:53 ack Ch 18:13:17 JF: There is value in that. I suggest our scoring mechanism should be an evolving thing that entities can use to track progress well. 18:13:33 q+ to say that nothing prevents companies from recording the information they find of value 18:13:48 q+ to say we are increasing granularity of what is tested, 2.x is very flat 18:13:59 ack me 18:13:59 Chuck, you wanted to say that nothing prevents companies from recording the information they find of value 18:14:07 ... In a 4 point scale you lose granularity. If I can reduce the number of issues, and my scoring counts number of issues and severity, that's a powerful tool that aids content creators in doing the right thing. 18:14:17 ack wil 18:14:17 ... Getting that feedback quickly is really useful. 18:14:30 present+ Laura_Carlson 18:14:52 wilco: I want to highlight the point John made. Scoring model will add to the workload. We are tracking data that we did not track before. 18:15:26 wilco: Often you assign a priority and track it, that's what is used in metrics. Strange that we aren't using that same model. Those are the metrics that orgs use. How many issues and the impact. 18:15:34 +1 Wilco - we previously spoke about "Dashboards" but that conversation seems to have gone away 18:15:34 Wilco: That seems a better fit than current proposal. 18:15:37 q?
18:15:40 ack abi 18:15:40 AbiJ, you wanted to talk about different regulatory and cultural approaches 18:15:42 +1 Wilco 18:15:55 +1 for John Foliot and Wilco Fiers 18:16:13 maturity model? 18:16:19 qv? 18:16:20 Abi: There may be different cultural requirements on this. In the public sector we have to report. Having a mechanism to show progress is very useful. We're looking to show where we need to do further work. 18:16:32 no David - specific "Jira" issues 18:16:56 we have an ICT dev cycle dimension 18:17:00 For reference points - Per WCAG 3 FPWD , https://www.w3.org/TR/wcag-3.0/#critical-errors . Also in WCAG 3 In addition, critical errors within selected processes will be identified and totaled. Any critical errors will result in score of very poor (0). https://www.w3.org/TR/wcag-3.0/#bronze, Views and processes MUST NOT have critical errors. Conformance to this specification at the bronze level does not mean every requirement in every 18:17:00 guideline is fully met. Bronze level means that the content in scope does have any critical errors and meets the minimum percentage of. 18:17:03 q? 18:17:05 ack ala 18:17:05 alastairc, you wanted to say we are increasing granularity of what is tested, 2.x is very flat 18:17:07 q+ to continue my habit of making meta comments in that I think whatever scoring and conformance mechanism we end up with, we'll need to support these different needs, so more to provide the building blocks, rather than the complete system 18:17:07 ... Development teams care to make thing pass when they work on it. Leadership teams want to know where the risk is. Compliance may be more important in some areas than in others. Having a robust methodology would be of value in other environments & countries. 18:17:08 +1 to ABi - Alastair spoke to "emonstraton of effort" 18:17:24 s/emonstraton /demonstration 18:18:01 +1 for John & Wilco as well 18:18:19 q? 18:18:19 Alastair: A lot was built on WCAG 2, like VPATs and reporting. 
Part of this process is to draw out the most useful bits and put them in the standard. 18:18:57 q+ to talk to efficiency 18:19:27 ... Currently people have to add impact. The levels don't really help on impact because it's not associated with task. 18:19:40 q+ to ask Wilco et al if "impact" can not be subjective 18:19:41 ... Adding that bit more granularity could be built on with more reporting. 18:19:49 ack lau 18:19:49 Lauriat_, you wanted to continue my habit of making meta comments in that I think whatever scoring and conformance mechanism we end up with, we'll need to support these different 18:19:52 ... needs, so more to provide the building blocks, rather than the complete system 18:20:15 q+ 18:20:20 +1 to shawn 18:20:20 q+ 18:20:23 Shawn: Whatever scoring we end up with, the goal is not to create an end-to-end system that is ideal, but to create the building block to support different tracking systems. 18:20:45 ack Chu 18:20:45 Chuck, you wanted to talk to efficiency 18:20:49 ... It simplifies the things we're doing, as opposed to making a process that may not align with their needs. 18:21:10 Chuck: If we just have the data requirements. How an organisation tracks that can be left to the organisation. 18:21:23 With what's proposed we could still measure scores and progress for disability category at view level such as 3.4, 3.5, etc. So I think we can pull metrics out of the proposed scoring. 18:21:33 q+ 18:21:34 ... But what they do with the count of critical errors and non-critical errors. They can track that data. 18:22:14 q? 18:22:19 ... Whether or not there is additional cumbersome, there is more data from the FPWD to track. I wonder if efficiency has to relate to the quantity of work, as opposed to the quantity of data. 18:22:30 q? 18:22:31 Shawn: This is where validation with stakeholder is going to be essential. 18:22:32 q? 
18:22:34 ack bru 18:22:34 bruce_bailey, you wanted to ask Wilco et al if "impact" can not be subjective 18:22:44 q+ 18:22:58 Bruce: Impact is pretty important, but it isn't captured in a report like a VPAT. 18:22:59 q+ 18:23:11 Q+ to respond to Bruce 18:23:16 ack dav 18:23:20 isn't "impact" the critical error? 18:24:03 David: Do we aspire to make a statement on whether WCAG 3 will be as hard to evaluate as WCAG 2? 18:24:08 +1 18:24:20 q+ to say that I thought we have a requirement to be easier 18:24:20 -1 that we will need to include more disability needs and that will add to the evaluation. 18:24:22 q? 18:24:23 ... The gains we'll have are worth making it harder to test. 18:24:25 ack shadi 18:24:36 q+ to suggest that bronze = WCAG 2.x level of effort, but Silver/gold should track higher and affect process. 18:24:50 Shadi: Think of this from the perspective of how will this make it better. Transparency is what we want. 18:25:01 +1 for Shadi's comment 18:25:03 ... I think the simpler the scoring method is, the fewer requirements we have for reporting. 18:25:19 q+ to say I think based on earlier conversation whether it will be "harder" or "easier" 18:25:38 ... Let's take WCAG 2. Scoring is fairly opaque. There are assumptions about the levels and requirements. Two sites can have very similar scores but be very different in accessibility. 18:25:44 q- 18:26:10 ... It would be good to have references on how the score was achieved and calculated. But if we have a fairly simple model, such as counting the errors, it is transparent enough. 18:26:20 +1 to Shadi 18:26:33 q? 18:26:36 ack jon 18:26:42 ... The other issue is what data to collect, that could be left up to whoever wants the data. 18:27:39 +1 jon_avila 18:27:46 Jon: The current proposal can show progress improving for disability categories. What I hear is that customers want to improve their score, or they want to improve access for people.
They want to know what the issues are that if they fix them will make their score more compliant. 18:28:08 ... I think we can have that with tests that map to disability groups, and also knowing the score we can calculate what technique would be most effective. 18:28:24 wilco: Suggest for scoring model I hope we can keep it lean. 18:28:39 wilco: That's what Shadi suggested as well. The more data we collect the more complicated it becomes. 18:28:52 qv? 18:28:58 wilco: With questionable benefit to the end user. I like the binary solution. 18:29:14 I know I keep asking this, but can we create a scoring tool to make it easier? 18:29:14 wilco: I'd like to consider using that for the most part for using a scoring system where it's necessary to keep things simpler. 18:29:43 wilco: Bruce asked about impact... it might be beneficial to have an informative part of WCAG 3 to say "if you want a standardized way to prioritize your issues, here's how" 18:30:04 wilco: But not part of conformance model itself. 18:30:12 q+ to remind everyone of Jeanne's suggestion this morning 18:30:13 q? 18:30:16 ack wil 18:30:18 q+ 18:30:26 ack JF 18:30:26 JF, you wanted to respond to Bruce 18:31:36 John: On impact, I used to do workshop where we'd take errors and prioritise them. One thing was that we could prioritise based on different measures. Impact changes based on the role. 18:32:11 ... Binary is complicated, but even the more complicated. A cognitive walkthrough is essentially a series of yes/no questions. 18:32:17 ack ala 18:32:17 alastairc, you wanted to suggest that bronze = WCAG 2.x level of effort, but Silver/gold should track higher and affect process. 18:32:51 Alastair: As a broad brush, it would be useful if bronze was roughly the same effort as WCAG 2 AA. 18:33:56 ... Not sure how binary scores contribute to the overall score. At the guideline level, however things happen without that guideline / outcome contributes a known thing to the final score. 
Does that free us up to experiment with something like an alt text one that is binary vs an alt text using a rating scale. 18:34:10 q? 18:34:10 q- 18:34:12 ... I think you can compare how they track against effort level, reliability and impact to the end user. 18:34:30 ack Chu 18:34:30 Chuck, you wanted to remind everyone of Jeanne's suggestion this morning 18:34:31 ... Can we then focus on making the tests as best we can, with some experimentation. 18:35:12 Chuck: This morning Jeanne suggested developing some binary tests. Nothing prevents us from creating binary tests, while others explore different types of tests. 18:35:34 q+ to clarify binary 18:35:41 q? 18:35:45 ack She 18:35:48 ... That doesn't prevent us from exploring other options. 18:36:20 ack She 18:36:36 is that a good thing or a bad thing Sheri? 18:36:43 Sheri: I want to make sure that everyone understands, the more data is generated, the larger the trail of evidence that can be used in court. Even if generated in court, it is still discoverable evidence. 18:37:03 WHat if we are allowing minor errors? 18:37:10 +1 Sheri's comment 18:37:13 The #1 phrase related to WCAG = "It depends" 18:37:26 q? 18:37:32 ... On the fence on if that is a good thing. If the company does very poor it's a good thing. But if you're close to perfect, that can be problematic. 18:37:32 ack Wil 18:37:32 Wilco, you wanted to clarify binary 18:37:51 Wilco: I want to clarify, I meant preference for outcomes being binary instead of on a scale. 18:37:58 That's why we need to have a very solid definition of "conformance" 18:38:18 Chuck: I believe we can create all the binary tests we want, but we can explore other options too. 18:38:26 q? 
18:38:47 q+ to say binary + critical, scoring is difficult for reliability 18:38:56 ack kathy 18:38:56 kathyeng, you wanted to say binary + critical, scoring is difficult for reliability 18:39:10 https://github.com/w3c/silver/issues/508 18:39:42 q+ 18:39:46 ack Rach 18:39:46 Kathy: From a trusted tester perspective, it wouldn't be too difficult to add a critical reading. But it would be difficult to get testers to score on a 0-4 rating consistently. I think the inter-rater reliability would suffer from adding the scoring part. 18:39:47 +1 18:39:59 +1 18:40:20 +1 to Kathy and that if we make it too hard to learn, we suffer. 18:40:38 Rachael: We have binary tests, but we are moving away from the concept that everything must pass or have a fail. Binary is still a rating system, it's just a two-point rating system. It's just a question of how many divisions we want to have. 18:40:46 q? 18:41:25 Chuck: Moving on to issue 463. There is a lot of positive commentary regarding our effort to expand beyond pass/fail. 18:41:45 https://github.com/w3c/silver/issues/463 18:41:46 ... This particular issue suggested it needs a better explanation. 18:41:47 https://github.com/w3c/silver/issues/463 18:42:33 ... We've delved into the conversation about scoring at the outcome level. The options are binary, percentage, rating scale, or points. 18:43:03 ... Wilco expressed a preference for binary. This repeatedly came up in the issues that were raised. 18:43:08 q? 18:43:21 ... Some had a preference for the WCAG 2 style, others were very supportive. 18:43:43 q? 18:44:29 Rachael: We want a general idea of where the WG & TF are. Understanding where everyone falls on this is helpful. 18:45:09 poll: Binary, Multiple styles (including binary) 18:45:26 JakeAbma has joined #ag 18:45:30 Binary (so that the standard stays legally enforceable) 18:45:31 present+ 18:45:34 instances or pages??
18:45:36 I think it is important to explore multiple styles more 18:45:41 Chuck: If you support the system being limited to binary, select that. Otherwise multiple styles. 18:45:51 multiple styles 18:45:52 Multiple styles 18:45:56 Multiple styles 18:45:57 Multiple styles 18:45:58 multiple styles 18:46:00 multiple styles 18:46:05 multiple styles 18:46:07 multiple styles 18:46:09 Multiple styles 18:46:12 multiple styles 18:46:21 Chuck: This particular question is independent of what the scope is. It works whatever the scope is. 18:46:23 q+ 18:46:24 Binary first, additional styles second 18:46:28 more granular scoring 18:46:44 multiple styles, especially if the scoring rubric for rating scales is clearly defined and accounts for inter-rater reliability 18:46:44 +1 to Binary first, additional styles second 18:46:48 Binary first, additional styles second 18:46:55 My preference is to keep working at making 0 - 4 ratings possible 18:46:56 multiple styles, more granular. Also interested in barrier walkthru method 18:47:08 +1 to Detlev 18:47:15 binary first, additional styles second 18:47:24 Binary first, additional styles second, keep working at making 0 - 4 ratings possible 18:47:25 Detlev: Consider different instances where certain criteria apply. If you have to decide to give 1.1.1 a pass or a fail for a unit that has many instances, you have to decide if you're going to tolerate a few missing, or if you're quite strict. 18:47:37 q+ 18:47:44 Q+ to ask about Shadi's suggestion about counting-up instances 18:47:46 ... I don't agree that it's all the same. 18:47:51 q- later 18:48:10 ack wil 18:48:20 A scoring system can work if we are very clear and granular on how things are scored. 18:48:32 wilco: I don't understand the level we are voting on. I'm suggesting binary scores for outcomes where possible. 18:48:51 wilco: It's 3 levels of conformance. There are different things within that. 18:48:53 q?
18:49:36 Explore new model beyond WCAG 2.0 18:50:12 q+ to say not wcag2 not so binary 18:50:19 +1 Wilco: Good, Better, Best 18:50:28 ack JF 18:50:28 JF, you wanted to ask about Shadi's suggestion about counting-up instances 18:51:16 John: I suggest a more granular scoring mechanism. Shadi proposed counting instances of problems. I'm struggling with the 0-4 scoring. It does not give the granularity I believe we need to track progress. 18:51:20 q? 18:51:38 ... 4 is perfect. 3.5 is minimum, and below it is not good enough. I don't see us gaining much from that. 18:51:48 ack shad 18:51:51 ... I'd like something far more granular. 18:51:57 3.4 is useful because it shows how far you are from 3.5 18:52:26 We know that 3.5 is too high from the testing we already have done 18:52:52 Shadi: Color contrast could clearly be done on a multi-step scale. Other things, such as rating a text alternative 1 - 4, can be quite difficult. You get very inconsistent results. Multiplying that by the number of instances on the page gets more complex. 18:52:54 q? 18:53:06 ... What is the test? How to do the test comes from the nature of the test itself. 18:53:06 ack bruce 18:53:06 bruce_bailey, you wanted to say not wcag2 not so binary 18:53:07 +1 18:53:42 does anyone post claims? 18:53:46 q+ 18:53:51 Bruce: I don't think the 2.0 model is as binary as you characterise it. There are not as many people posting conformance claims. If things get missed, there is continuous improvement to work in that. 18:53:56 All public sector orgs in Europe are posting a form of conformance claim. 18:54:04 ... The question is are we doing good enough. 18:54:19 ... I don't think we should worry too much about this. There is already ambiguity in the current model. 18:54:21 q? 18:54:25 ack shadi 18:54:28 q+ 18:54:56 +1 to shadi 18:55:04 ack jon 18:55:11 Shadi: Binary is in how the test is constructed. We want to avoid the rigidity of WCAG 2. Binary or non-binary isn't the issue.
18:55:18 +1 to that distinction Shadi is making 18:55:22 q+ to say we need to make it more concrete and test it 18:55:46 +1 to avoiding rigidity 18:55:52 q+ to add in the positive direction as well 18:55:55 ack ala 18:55:55 alastairc, you wanted to say we need to make it more concrete and test it 18:55:57 Jon: A lot of contracts are written to conform to a standard, and most of the time nobody does. It would be really good to get more granular data, such as what substantial conformance actually means for user impact. 18:56:11 +1 to Shadi's comment re: rigidity - that's the issue 18:56:17 +1 to jon, WCAG is binary as a standard to conform to, but the fact that companies interpret and report on WCAG in a non-binary form is a whole different topic. 18:56:41 q- 18:56:46 Alastair: This feels quite abstract. It would help to carry on exploring various ways to do tests, and come up with a variety of different requirements, using different methods and see how it works. 18:57:01 q? 18:57:03 ... So long as they all contribute to a final score on conformance. 18:57:54 Chuck: If you look at WCAG 2.0, it isn't fully binary. Exploring different ways to test appeals to me as well. 18:57:58 q? 18:58:20 Thank you to everyone! 18:58:32 Thanks. 18:58:48 Azlan has left #ag 18:58:59 rrsagent, make minutes 18:58:59 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html laura 18:59:00 Look forward to seeing you in 2h! 19:00:04 rrsagent, make minutes 19:00:04 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html jeanne 19:10:23 ToddLibby has left #ag 20:20:40 jeanne has joined #ag 20:22:39 zakim, list questions 20:22:39 I see 4 questions remaining: 20:22:40 Q1: Should we have a structure where a site or product must meet all automated tests before doing manual or qualitative tests? (1 supporter) 20:22:40 Q2: to look at the inverted score where the barriers or errors are counted instead of the overall score.
(1 supporter) 20:22:40 Q3: How do we address attempted gaming? (1 supporter) 20:22:40 Q4: Should critical errors be across a process or by the view? (1 supporter) 20:46:55 Fazio has joined #Ag 20:54:50 Ben has joined #ag 20:56:12 present+ 20:58:15 PeterKorn has joined #ag 20:58:21 Present+ 20:58:32 preseent+ 20:58:45 Jennie has joined #ag 20:58:51 present+ 20:59:07 bruce_bailey has joined #ag 21:01:00 Nicaise has joined #ag 21:01:07 present+ 21:01:27 MichaelC2 has joined #ag 21:02:14 Makoto has joined #ag 21:02:29 present+ 21:02:40 Detlev has joined #ag 21:02:40 Present+ 21:02:46 present+ 21:02:59 present+ 21:03:03 scribe: sajkaj 21:03:16 maybe someone sober 21:03:31 ..or awake! 21:04:01 kathyeng has joined #ag 21:04:08 present+ 21:04:30 agenda? 21:04:35 Topic: Session 3 - Conformance 21:04:38 rrsagent, this meeting spans midnight 21:05:21 Sheri_B-H: Reminds of mtg P & Q -- use q+ with q+ to say or q+ to ask 21:06:03 Sheri_B-H: Also reminder to keep points brief, there are many points (and many participants); please avoid metaphors and allegories; 21:06:23 AngelaAccessForAll has joined #ag 21:06:27 https://w3c.github.io/PWETF/ 21:06:28 Rachael: Slide 28 is current state 21:06:34 present+ 21:06:45 Slides <- https://docs.google.com/presentation/d/1eUbNUGFaqbI87tx7vVMvDwxT8GNsAHD-SCWYddgEWEE/edit#slide=id.gd5abc90fa9_0_116 21:06:58 Rachael: Were at single item test -- "atomic" test -- agreement to explore different ways to do that in 2nd session 21:07:27 s/Sheri_B-H: Reminds/ShawnL: Reminds 21:07:30 Rachael: Also agreement to do better on inter-evaluator agreement -- different people should reach similar conclusions 21:07:50 s/Sheri_B-H: Also/ShawnL: Also 21:08:05 Rachael: Now we get to overarching conformance level discussion 21:08:08 present+ 21:08:27 Rachael: FPWD has three levels, Bronze, (similar to A/AA/AAA and considered minimum); 21:08:46 present+ 21:08:49 q? 
21:08:49 Rachael: Silver and Gold were proposed to be some kind of holistic testing -- but not defined yet in FPWD 21:09:14 Rachael: Notes we've touched on many of these questions during earlier conversations today 21:09:29 Rachael: Should we have a level lower than Bronze? 21:09:37 +AWK 21:09:53 q+ 21:09:58 Q+ 21:09:58 q+ 21:09:58 Rachael: One proposal was a level based on automated testing -- perhaps this would be a qualification for going to a higher level 21:10:03 q+ 21:10:13 q+ 21:10:17 ack Jennie 21:10:22 q+ to s/automated/automatable/ 21:10:33 q+ to say automated testing doesn't align with end user experience 21:10:49 Jennie: State of Minnesota review was concerned that Bronze is not quite A/AA; Could we write something like "Silver Level" required by Minnesota? 21:11:08 Rachael: W3C wouldn't define what is ideal; that's regulatory 21:11:25 Rachael: suggest we make that a question for later discussion 21:11:30 q+ to propose an introductory "level" that is based on EasyChecks. https://www.w3.org/WAI/test-evaluate/preliminary/ 21:11:42 Jennie: meaning will it be spelled out so that's easy enough for us to decide? 21:11:52 q+ to mention that AAA SC being in the same doc is useful. 21:12:03 q+ to speak to levels lower than bronze 21:12:06 [agreement that the idea is the same, but stated differently] 21:12:07 ack JF 21:12:15 qv? 21:12:23 jf: Notes Makoto's workshop feedback ...
21:12:36 jf: Asks about scoring content free applications 21:12:57 question+ How do we score content free components or templates 21:13:13 jf: believe that 3 levels may not be sufficient 21:13:17 ack PeterKorn 21:13:25 q+ to say that a lower than bronze might facilitate components and frameworks 21:13:35 PeterKorn: Lot I like about requiring a level set that could be automated 21:13:52 johnkirkwood has joined #AG 21:13:56 PeterKorn: Notes there are currently AAA items that could be testable but we may not want to require for B 21:14:00 q- 21:14:12 PeterKorn: So we may need more nuance in what goes into levels 21:14:12 We might want to consider types of content like elearning content that isn't built the same way as other online experiences 21:14:15 i think ALL current AAA sc are testable 21:14:22 ack MichaelC2 21:14:36 ack MichaelC 21:14:36 MichaelC, you wanted to s/automated/automatable/ 21:14:46 MichaelC2: "Automated" problem for me because it implies using tools; prefers automatable 21:14:49 +1 to MichaelC 21:14:50 +1 to Michael 21:14:55 +1 21:14:59 +1 21:15:04 +1 21:15:16 +1 with a thought towards "automatable using the ACT Rule Set" 21:15:19 ack AWK 21:15:33 awk: +1 to Peter because we want to be smart about what we require even if it's automatable 21:15:52 awk: What's automatable is clearly insufficient 21:16:06 q+ 21:16:15 awk: Have heard people have a concern about stopping at such a level because it doesn't support a full experience for users 21:16:31 Suggested reframed question: "Should we have a lower level that includes an agreed upon subset of automatable tests?" 21:16:57 awk: But, if a large chunk of the web were nightly tested with what is automatable, wouldn't that flush out data we need to help make the experience better and organizations incented 21:17:15 +1 to AWK. Having a clearly defined "thing" (whatever exactly that is) that is 100% automatable that we incent folks to do is valuable. 
21:17:25 awk: If we set it up in a way that was attainable, that could help ease people into higher levels than if B were the entry point 21:17:29 q+ 21:17:36 awk: There are very few sites that meet A/AA 21:17:39 ack alastairc 21:17:39 alastairc, you wanted to say automated testing doesn't align with end user experience 21:18:01 alastairc: Would not be in favor of automatable testing 21:18:09 Q+ to ask whether we sidestep "conformance" altogether, and simply focus on "scores", with levels based on minimum scores 21:18:37 alastairc: picking the automatable might focus on certain pwd categories 21:18:41 shadi has joined #ag 21:18:55 alastairc: potential that could be a lower level though also not yet convinced 21:19:02 alastairc: few go for level A 21:19:15 q+ to share feedback from Japanese experts 21:19:21 +1 to Alistair 21:19:36 sorry Alastair 21:19:42 ack jeanne 21:19:42 jeanne, you wanted to propose an introductory "level" that is based on EasyChecks. https://www.w3.org/WAI/test-evaluate/preliminary/ 21:19:42 https://www.w3.org/WAI/test-evaluate/preliminary/ 21:19:51 jeanne: is link to Easy Checks 21:20:21 jeanne: also not in favor of a level; but like that it could be another doc that would be an intro level for beginners or small orgs 21:20:46 Plus we could have a percentage score that you can improve on, until you do get to the baseline of bronze. 21:20:47 jeanne: recalls someone suggested WCAG 3 could be multiple normative docs 21:21:06 ack bruce_bailey 21:21:06 bruce_bailey, you wanted to mention that AAA SC being in the same doc is useful. 21:21:08 shawn: Notes that there are additional options as opposed to we need to explore them all now 21:21:19 qv? 
21:21:36 bruce_bailey: in favor of another lower level based on teaching people 21:21:55 bruce_bailey: If one is bringing a site up to speed the first time, it makes sense starting in the A level 21:22:06 bruce_bailey: The levels have been helpful that way 21:22:33 bruce_bailey: When we wrote 2.0 we didn't know regulators would pick up AA; that was unknown 21:22:50 bruce_bailey: And AA is a pretty high bar 21:22:57 Our typical audits find more A issues than AA issues :-/ 21:23:13 bruce: Notes Canada did AA with key exceptions--captions, maps 21:23:24 bruce_bailey: Let regulators figure out their levels 21:23:34 bruce_bailey: Let's figure out our part 21:23:35 ack Ben 21:23:35 Ben, you wanted to speak to levels lower than bronze 21:24:26 ack Chuck 21:24:26 Chuck, you wanted to say that a lower than bronze might facilitate components and frameworks 21:24:37 +1 to describing the low score rather than giving a level 21:24:37 Ben: Concerned about implications of levels describing clearly what they are 21:24:52 ack Ch 21:24:56 q+ 21:24:58 chuck: suggest the two could work well together auto and human 21:25:02 ack PeterKorn 21:25:12 qv? 21:25:17 +1 to chuck, and I would include design systems in the "raw components" list 21:25:31 PeterKorn: Notes that what is automatable changes over time especially now with AI 21:25:56 PeterKorn: so if we go down that path we have a lighter weight mechanism for growing that level without the full W3C TR process 21:26:04 +1 to PeterKorn on the maintainability point 21:26:33 +1 to PK 21:26:33 PeterKorn: important we continue to keep up with advances so that results continue to be most meaningful 21:26:46 ack jaunita_george 21:26:56 q+ to say that if more things become automatable over time, we shouldn't pick guidelines based on that, it should be based on the user-impact. And do we need another level if we have a score? 
21:27:20 ack JF 21:27:20 JF, you wanted to ask whether we sidestep "conformance" altogether, and simply focus on "scores", with levels based on minimum scores 21:27:42 jaunita_george: Suggests possible separate guidance for certain content types, i.e. educational 21:28:07 jf:Suggests looking at reqs and thinking from points perspective rather than A/AA/AAA 21:28:18 jf: might help us not get our comparison categories confused 21:28:52 jf: Could support more customizable testing for particular content type 21:29:10 jf: the more one tests, the more one may improve how well one does 21:29:31 ack Makoto 21:29:31 Makoto, you wanted to share feedback from Japanese experts 21:29:34 Interesting idea John 21:29:50 Makoto: Notes regulators will decide on levels; a11y isn't all or nothing 21:30:24 Makoto: Strong and numerous requests from Japan to induce people to try making content accessible by providing a more attainable level 21:30:33 +1 Makoto 21:30:41 +1 21:30:48 Makoto: many private orgs who aim at A as first step because it's hard to meet some AA 21:31:13 Ok, good to know that some areas do start with single-A and easychecks. 21:31:15 Makoto: Also some people who start with basics like Easy Checks because there's no legal pressure in Japan 21:31:37 Makoto: if WCAG3 becomes i18n standard, Japan will follow it 21:32:05 Makoto: We should create conformance scheme that can meet needs 21:32:10 ack Wilco 21:32:13 qv? 21:32:54 Wilco: Notes many of us work for large orgs that can support people spending lots of time working on WCAG 21:32:56 q+ to point out the difference between levels by difficulty in implementing vs. 
levels by level of accessibility for people with disabilities 21:33:00 Wilco: much of the web can't do that 21:33:05 +1 Wilco 21:33:10 +1 21:33:18 Wilco: would like us to preserve a lower entry barrier 21:33:25 q+ to say that while small organizations can't meet it, often the companies and toolkits they use can get close 21:33:48 Wilco: believe AAA much less used; very few even try and would like to see us encourage making the higher levels meaningful and achievable 21:33:54 To Wilco's point: Over 99 percent of America's 28.7 million firms are small businesses. The vast majority (88 percent) of employer firms have fewer than 20 employees, and nearly 40 percent of all enterprises have under $100k in revenue. 21:34:03 source: https://www.jpmorganchase.com/institute/research/small-business/small-business-dashboard/economic-activity#:~:text=Over%2099%20percent%20of%20America's,under%20$100k%20in%20revenue. 21:34:08 Wilco: also thinks it a bit odd that the same reqs apply to small orgs and mega orgs 21:34:26 q+ 21:34:31 ack ala 21:34:31 alastairc, you wanted to say that if more things become automatable over time, we shouldn't pick guidelines based on that, it should be based on the user-impact. And do we need 21:34:34 ... another level if we have a score? 21:34:38 Wilco: Would like higher level that does attract 21:34:56 alastairc: things becoming more automatable argues for not basing a level on that 21:35:18 alastairc: an argument against on top of not aligning with user needs not supported at that level 21:35:51 qv? 21:35:57 alastairc: Seems Bronze was picked as most common around AA though I hear Makoto's point 21:36:12 +1 Alastair - all progress is good progress 21:36:13 Should we write practical guidance for regulators and courts to use when applying a standard? I want us to aim to think about where the standards fit in future interpretations of Title III of the ADA and other similar laws around the globe.
21:36:14 alastairc: notes that 50% is better than 30% even if one is aiming at 60% 21:36:32 alastairc: suggests some kind of normalization across functional outcomes 21:37:08 shawn: Want to point out that levels defined by difficulty of reaching is one approach; but another is level by support for a11y 21:37:36 shawn: Suggest we consider a system that better supports beginners; and/or who are overwhelmed with 2.x; 21:38:02 shawn: Or, should WCAG3 more explicitly support a lower level 21:38:36 Rachael: is to be based on difficulty of achieving? or on support provided? 21:38:41 q+ to ask, following on Wilco, if it makes sense to set levels to meet requirements based on what the product is doing. For example, a site to buy flowers is very different from a site alerting people to fire evacuations 21:38:50 q+ 21:39:28 shawn: Hearing we want a way to support our users better 21:39:29 q+ 21:39:47 ack Lauriat_ 21:39:47 Lauriat_, you wanted to point out the difference between levels by difficulty in implementing vs. levels by level of accessibility for people with disabilities 21:39:52 ack Rachael 21:39:52 Rachael, you wanted to say that while small organizations can't meet it, often the companies and toolkits they use can get close 21:40:00 qv? 
21:40:23 Rachael: Notes her business works with small orgs, typically they're on a CMS that helps them; and they pass or fail based on CMS support 21:40:52 Q+ to follow up on Rachael's point 21:40:52 +1, fair point Rachael 21:40:54 ack shadi 21:41:00 shadi: concerned about separating by org size 21:41:07 +1 Rachel 21:41:07 +1 to Rachael 21:41:26 +1 for not giving lighter requirements for the large component, template, and authoring tools 21:41:28 shadi: we're straying into law/policy and that brings up undue burden which is out of scope for W3C 21:41:33 +1 to shadi 21:41:34 +1 to Shadi's point 21:41:34 +1 to Shadi 21:41:46 +1, that was not what I intended to suggest 21:41:47 +1 21:41:57 The ADA at least has slightly more flexibility for small orgs while not actually changing the base requirements 21:42:06 shadi: though maybe type of website may be a useful org principle 21:42:28 q? 21:42:38 ack Rain 21:42:38 Rain, you wanted to ask, following on Wilco, if it makes sense to set levels to meet requirements based on what the product is doing. For example, a site to buy flowers is very 21:42:41 ... different from a site alerting people to fire evacuations 21:42:58 Rain: Shadi and Rachael resonate with me, but nervous about different levels of expectations 21:43:18 +1 Rain 21:43:25 +1 21:43:39 Rain: smaller orgs might want to be the same groups who are trying to serve pwd the most and should do better than minimum; especially if threat to life or health 21:44:03 +1 21:44:16 q+ to observe that SBA orgs might warrent a break, but Wix and Square Space are NOT small businesses ! 21:44:31 Rain: we don't want to have orgs stray into levels that aren't supportable 21:44:39 +1 to Rain, also great point by Rachael re CMS/website hosts/builders/providers 21:44:48 proposed straw poll: Should we have a lower level of conformance? 
Yes/No 21:44:51 shawn: having a lower level of conformance appears to not have support 21:45:08 shawn: but having a defined set at such a level seems something useful 21:45:24 Yes 21:45:37 Not sure that's the right question to be asking 21:45:51 Janina: I am intrigued by defining what can be automatable... 21:46:03 ack sajkaj 21:46:19 +1 Janina 21:46:23 +1 21:46:25 ... it also has an opportunity for component libraries. We've talked about it some in conformance challenges but may want to dive into it. We may want to say more about it. 21:46:41 Fazio has joined #Ag 21:46:46 qv? 21:46:54 ...That is where people get their basic frameworks. 21:47:03 ack jaunita_george 21:47:07 question? 21:47:11 question+ Discuss more about how to handle frameworks and CMSes 21:47:12 jaunita_george: concerned about a lower standard 21:47:18 present+ 21:47:19 q+ 21:47:23 jaunita_george: many make sites accessible because they have to 21:47:44 sajkaj: Want to note we can define the lower level as "necessary but insufficient" 21:47:58 jaunita_george: should we make suggestions to regulators around the implications here? 21:48:10 ack JF 21:48:10 JF, you wanted to follow up on Rachael's point 21:48:19 q+ to ask what specifically would be involved (e.g. functional outcomes?), and say that we do have a lower standard than bronze now - single A. 21:48:53 jf: Recalls a goal for WCAG3 to roll in ATAG & UAAG reqs; much of that already there but won't score the same way a completed site that people use 21:49:22 jf: back to suggesting focus on points 21:49:45 jf: notes specific numbers immaterial but would map progress 21:50:02 +1 to JFs explanation of different totals for different product types 21:50:08 ack bruce_bailey 21:50:08 bruce_bailey, you wanted to observe that SBA orgs might warrent a break, but Wix and Square Space are NOT small businesses ! 
21:50:16 +1 21:50:33 +1 - focus on measurement, let the regulators set minimums 21:50:33 bruce: agrees it doesn't need to be a formal level, just a way to note key milestone 21:50:54 ben said earlier that we could give a verbal description of scores below bronze 21:50:59 bruce_bailey: Notes the impact assessment government does; 21:51:19 +1 to Bruce 21:51:28 bruce_bailey: so these CMS systems are large companies and should actually support a11y better so small orgs can do better 21:51:49 q- 21:52:05 shawn: our CMS could say we have a score of X will imply ... 21:52:19 jf: yes progress toward B 21:52:53 jf: believe we're in a silo now and should instead think about a11y of whatever thing we're describing 21:53:08 q+ to clarify question 21:53:15 jf: so maximum for the CMS is X; how do you do against that measure? 21:53:16 ack ala 21:53:16 alastairc, you wanted to ask what specifically would be involved (e.g. functional outcomes?), and say that we do have a lower standard than bronze now - single A. 21:53:43 q+ 21:53:51 alastairc: what would be the diff if regulators said some orgs need to meet X and other orgs X+y 21:54:49 alastairc: notes that A is lower than B 21:54:53 ack Rachael 21:54:53 Rachael, you wanted to clarify question 21:55:03 Rachael: Would like to clarify JF's question 21:55:08 q- [won't add enough to the conversation to be worth taking the time] 21:55:21 Rachael: do we want some named category that means a threshold 21:55:30 sajkaj: Suggest "threshold" is a good name 21:55:42 q- [my comments don't add enough to be worth being in queue] 21:55:56 Rachael: believe we should first answer that 21:56:05 should we even be defining bronze, silver, gold? 21:56:14 +1 to one question at a time! 21:56:20 q+ 21:56:20 shawn: so we have b/s/g now; should we add a -B level? 21:56:37 Rachael: and perhaps how many levels? 3? 4? 5? 
21:56:51 shawn: notes reqs for motivation is source of b/s/g 21:57:08 I'm not familiar enough with the proposals for silver/gold, but I wonder if they could be combined? 21:57:10 https://www.w3.org/TR/wcag-3.0-requirements/#motivation 21:57:15 q- PeterKorn 21:57:19 Motivation: The Guidelines motivate organizations to go beyond minimal accessibility requirements by providing a scoring system that rewards organizations which demonstrate a greater effort to improve accessibility. 21:57:52 Wilco: why wouldn't we equate AA to Silver? We could do that 21:57:59 Francis_Storr has joined #ag 21:57:59 Rachael: Agrees. That's an option 21:58:06 present+ 21:58:25 So question is more like: Should we have a lower level more equivalent to single-A? 21:58:38 q+ 21:58:45 q+ to the problem of puting AA-type guidance in another level 21:58:46 ack Wilco 21:59:17 qv? 21:59:47 Jennie: Notes that 3 seems easier for people to grok 22:00:10 Jennie: has there been conversation on how more levels would be perceived? 
What would promote understanding and adoption 22:00:24 shawn: Research did uncover that current naming has confused people 22:00:48 shawn: Where schools grade A-F, A is great and lead to misunderstanding 22:01:48 scribe: Suzanne 22:01:52 scribe: SuzanneTaylor 22:01:57 ack Jennie 22:02:00 ack jeanne 22:02:00 jeanne, you wanted to the problem of puting AA-type guidance in another level 22:02:29 same problem, different name 22:02:41 q+ to say mixing automatable and requires human evaluation in a lower level would be counter productive 22:03:25 Jeanne: I do want to caution people about the problem of putting guidelines into level in a way that might result in unbalanced representation of different disability groups 22:03:27 ack sajkaj 22:03:27 sajkaj, you wanted to say mixing automatable and requires human evaluation in a lower level would be counter productive 22:04:31 q+ to say participant level 22:04:31 Janina: what is valuable in defining an automated set, is the power of what technology can do 22:04:59 ... we could have this type of level, but make it very clear that this level is not sufficient 22:05:02 ack kathyeng 22:05:02 kathyeng, you wanted to say participant level 22:05:23 "Wooden spoon" comes to mind 22:05:56 +1 to participation medals! 22:06:00 q+ 22:06:03 +1 Kathy, and add that to what Alastair noted about demonstration of effort 22:06:03 Kathy Eng: What about a participant medal - they tried, but they have not achieved bronze yet. I would not support providing a lower level of conformance, beyond something like a participant medal 22:06:11 ack AWK 22:06:15 Eric Eggert (yatil) talked about "wood" medals being a thing 22:06:40 AWK: I don't feel strongly about 3 versus 4 levels. But I do think we need to think very hard about what Bronze actually means 22:06:56 Is it more about sites trying to meet A vs AA? 22:07:19 +100, we need to bring more organisations on board 22:07:23 q+ to ask if there are legal considerations. 22:07:32 ... 
we need to be cautious about believing that not having an easy on-ramp will force widespread Bronze compliance 22:07:40 +1 to AWK, especially onramp 22:07:40 ack Chuck 22:07:40 Chuck, you wanted to ask if there are legal considerations. 22:08:23 q+ 22:08:31 q+ 22:08:46 ack PeterKorn 22:09:04 To your point, based on webaim's test at https://webaim.org/projects/million/ 98.1% of home pages had detectable WCAG 2 failures 22:09:22 Charles Adams: if Bronze is minimum legally, and there is an earlier level, does claiming compliance with the earlier level put an org at a legal risk 22:09:44 Peter Korn: it would be better if we stay away from claiming what will or will not be a legal level 22:09:48 q+ to ask where did the idea come from that w3c could/should tell regulators what to do? 22:09:52 +1 Peter we should focus on measurement, and leave "levels" to others 22:10:18 suggestion: Bronze = between single-A and easy-checks, Silver = AA(ish), Gold = AA+ (whatever has been proposed for silver/gold) 22:10:33 Charles Adams: Could someone be sued, regardless who has set the conformance, based on the idea of claiming a named lower level 22:11:02 Peter Korn: Helpful clarification. Needs more consideration. Good question. 22:11:06 ack shadi 22:12:03 q+ 22:12:09 counter suggestion: Bronze = based on minimum score aggregated from outcomes, Silver = based on higher score aggregated from outcomes, Gold based on additional testing 22:12:10 +1 to Shadi re considering the test may be easy to run, but the solution may be complex 22:12:13 Shadi: about the lower level being based on automation, will that truly be an easier level, if an organization doesn't know how to respond to the test results? 22:12:16 ack bruce_bailey 22:12:16 bruce_bailey, you wanted to ask where did the idea come from that w3c could/should tell regulators what to do? 
22:12:26 +1 to Shadi's comment 22:12:29 +1, automation doesn't equate to the outcome 22:12:54 Bruce: Is it even the W3C's role to make a recommendation as to what should be regulated? 22:13:20 q+ to say I just realized I may be misapprehended--I feel automatable is useful in describing successful a11y, not because it pertains to regulatory adoption 22:13:37 ... for WCAG 2.0 I don't recall conversations about regulation during the development of the standards 22:13:38 +1 Bruce 22:13:44 +1 to Bruce! 22:14:10 ack Chuck 22:14:15 ... we should be writing the best requirements we can, regardless of any thoughts about government actions 22:14:46 Charles Adams: We should make the requirements attractive to regulators... 22:14:52 https://www.w3.org/TR/wcag-3.0-requirements/#regulatory-environment 22:15:03 Bruce Bailey: it is the fact that the guidelines are *good* that makes them attractive 22:15:05 The Guidelines provide broad support, including Structure, methodology, and content that facilitates adoption into law, regulation, or policy, and clear intent and transparency as to purpose and goals, to assist when there are questions or controversy. 22:15:32 Q+ 22:15:41 none of that addresses bronze / silver / gold 22:15:41 ack sajkaj 22:15:41 sajkaj, you wanted to say I just realized I may be misapprehended--I feel automatable is useful in describing successful a11y, not because it pertains to regulatory adoption 22:16:02 Janina: I wanted to clarify as I've been speaking strongly for a lower level 22:16:37 Possible labels: Beginner? Student? 22:16:41 suggested straw poll: Should we have a lower level more equivalent to A/AA? 22:17:12 ... I'm not proposing this for regulation. I'm proposing this as it is useful for getting the message about accessibility out there and getting people started. If they are willing to run the tool, they are interested in this to some extent, and can we use this to leverage the value of automation.
22:17:31 suggested straw poll: Should we have a lower level than one equivalent to A/AA? 22:17:43 ack JF 22:18:50 JF: To me, the things that are really important are clear intent and transparency. Everyone must understand how a content owner got their score. This way, different countries can choose different numerical scores. 22:18:51 Ben_ has joined #ag 22:18:56 regulatory need: clear and unambiguous 22:19:56 Rachael & Shawn: discuss straw poll 22:19:58 How does that work with ATAG Rachael? 22:20:18 I think that is a different question set 22:20:28 Why? 22:20:39 q+ 22:20:51 ack PeterKorn 22:21:02 Alastair: If Bronze is similar to AA, then the lower level would be similar to A, so I think the question should be whether we need something lower than Bronze 22:21:21 Peter Korn: We don't have enough guidelines currently to make this decision 22:21:24 q- 22:21:30 the reality is that if you score below bronze, you have something below bronze... 22:21:36 JF, because it applies to a subset of possible profiles - authoring tools -- than this discussion is about. 22:22:07 Shawn: Disagrees because we are not voting on a level of conformance, but rather an easier on-ramp 22:22:14 s/ because it applies / because ATAG applies 22:22:50 Peter Korn: Even still, we will need to revisit once we have more guidelines 22:22:56 draft resolution: We support a simpler on ramp below bronze that is not conformance, and will explore this further later 22:23:34 I can get behind that statement. 22:23:40 Shawn Lauriat: Agree, we are setting aside the question of whether or not there is a lower level, but we can likely agree now that we support an on-ramp 22:23:55 +1 -- could be in a best practices, even 22:23:59 Maybe a conformance roadmap? Like a path to Silver? 22:24:01 @Jeanne, sure, but what about when authoring tools are part of a larger offering? 
Members of the accessibility community were furious when WordPress initially rolled out Gutenberg, which went backwards in accessibility 22:24:25 AWK: I would agree if the statement said "that is not necessarily conformance" 22:24:44 +1 22:24:50 +1 to onramp but not conformance 22:24:52 +1 22:24:53 +1 22:24:54 +1 22:24:55 Shawn: yes, it is about supporting the use case, leaving the question of conformance for later 22:24:56 +1 22:24:56 +1 22:24:57 +1 22:24:57 +1 22:24:58 +1 22:24:58 +1 22:24:59 +1 22:25:00 0 22:25:00 +1 22:25:00 +1 22:25:00 +1 22:25:02 +1 to onramp but not conformance 22:25:02 +1 22:25:05 +1 22:25:05 +1 to onramp but not conformance 22:25:08 +1 22:25:10 +1 22:25:12 +1 to a roadmap/onramp to conformance 22:25:37 RESOLUTION: We support a simpler on ramp below bronze that is not conformance, and will explore this further later 22:25:56 Shawn: Sees lots of +1, some with qualifications, which we will note for the next draft 22:26:59 options: https://docs.google.com/document/d/1BjH_9iEr_JL8d7sE7BoQckmkpaDksKZiH7Q-RdDide4/edit#heading=h.r8n8wkp3rutl 22:27:05 Rachel and Shawn: discuss next topic and choose a narrowly scoped discussion of Options for Levels 22:27:24 s/Rachel/Rachael 22:27:54 Profiles 22:29:15 q+ to say that I don't oppose aggregate scores, but I am concerned that a company could acrue points that are 22:29:20 Rachael: Can we discuss the pros and cons of subsets of tests versus overall scores. Let's not discuss types of products. 22:29:23 q- 22:30:43 q+ 22:31:17 q+ to say that I don't oppose aggregate scores, but I am concerned that a company could acrue points that are "less expensive" than others that are critical to certain disabilities, for example, the expense of supporting people with hearing disabilities could be ignored. We have to protect against that. 22:31:32 Rachael question is about the pros and cons of aggregate points, whether against profiles or not, 22:32:24 ack Sheri_B-H 22:32:26 Sheri: Points are much more granular. 
750 vs 900 much more meaningful than Bronze vs Silver 22:32:26 +1 Sheri, I've previously referenced FICO scores as a model 22:32:36 ack jeanne 22:32:36 jeanne, you wanted to say that I don't oppose aggregate scores, but I am concerned that a company could acrue points that are "less expensive" than others that are critical to 22:32:39 ... certain disabilities, for example, the expense of supporting people with hearing disabilities could be ignored. We have to protect against that. 22:32:52 Only using scores skips Functional categories, so you could end up with several being left out 22:33:08 more expensive = more points 22:33:10 Jeanne: I don't oppose aggregate scores, but I am concerned that we have to put in protections so that companies don't skip over the more expensive supports 22:33:18 q+ 22:33:27 ... for example, captions and ASL might be more expensive and be dropped 22:33:28 Q+ 22:33:32 Q+ to also note that we have the functional requirements minimums 22:33:35 ack Rachael 22:33:43 ... so I still think we need to have minimums by disability 22:33:57 Rachael: Agree that would need minimums by disability 22:34:19 q+ 22:34:31 ack Sheri_B-H 22:34:33 ack me 22:34:33 JF, you wanted to also note that we have the functional requirements minimums 22:34:41 Rachael: thresholds versus points can be discussed without the idea of product profiles. The question to the group is which direction we want to think in, rather than exactly which option to choose. 22:35:22 ack shadi 22:35:43 Sheri: agrees with Jeanne. Has seen examples where people were making decisions based on how they would be reflected on a VPAT. 22:35:46 In terms of what is attractive to regulators, here is the pointer to the preamble of the Original 508 Standards (12/21/2000) where the U.S. 
Access Board explains (comment/response) why WCAG 1.0 was not more closely adopted: 22:35:47 https://www.federalregister.gov/d/00-32017/p-124 22:36:56 as a point, thresholds can also be by number of errors where the lowest number of errors gets the highest medal 22:37:11 Shadi: yes, worries that too much scoring will draw attention to the incorrect things. I think increasingly sites are nearly at WCAG 2 AA. Something similar to the level that we currently have, but perhaps with a different balance of disabilities, etc. 22:37:51 The problem with weighting in this example (how hard it is to implement) is that it can't be standardized. What is hard for VMware could be very easy for WordPress, for example. Legacy software change is much harder to implement than new software change 22:38:07 ... there could perhaps be a threshold, where a few minus points are allowed, but after a certain number, you would no longer pass 22:38:20 +1 to Shadi 22:39:03 As a note, we all agreed at the last face to face not to do weighting 22:39:25 Jeanne: the problem with the example that Sheri gave is that weighting based on how difficult things are would be very difficult to do at the W3C level, because it varies so much between products 22:39:42 Rachael: Yes, we had agreed not to do weighting 22:39:45 q? 22:39:46 david-macdonald has joined #ag 22:40:11 Shawn: weighting is still possible within individual orgs 22:40:23 thresholds 22:40:39 suggested straw poll: Option 1) Levels by thresholds Option 2) Levels by subsets of tests 22:40:46 1 22:40:56 1 22:40:57 1 22:41:10 1 22:41:12 1 22:41:13 q+ 22:41:15 1 22:41:15 Would thresholds include subsets of tests? 
22:41:16 1 22:41:18 Rachael: Option 1 is new, Option 2 is how WCAG 2 does it 22:41:23 1 but can live with 2 22:41:26 David Fazio who had to drop off also says 1 22:41:39 1, but I believe that we will need to test that it works once we have enough outcomes 22:41:41 Shawn: I see 1s coming in 22:41:48 ack david-macdonald 22:41:55 don't understand options well enough to give meaningful answer 22:42:05 Same as Detlev 22:42:10 i was listening, but also not clear on thresholds! 22:42:13 Agree with Detlev 22:42:27 Jeanne: Can you give a summary of the straw poll? 22:42:58 what's 'a group of tests'? 22:43:19 Rachael: Option 1 is where you complete a group of tests which result in a numerical score and score leads to a level 22:43:35 q+ 22:44:02 q+ 22:44:11 Rachael: Option 2 is where you have sets of tests (A, AA, AAA) and the subsets of the tests are what define the levels (like WCAG 2.x) 22:44:14 So when you have thresholds, you wouldn't have to test as many things at a lower level? 22:44:31 Question: if I score "80%" on thresholds is it the same as scoring "80%" of the subset? 22:44:37 ack shadi 22:44:43 detlev: can you clarify "group of tests" 22:45:07 +1 Shadi - each subset has minimum thresholds 22:45:13 Shadi: I thought I was suggesting a combination of both, where you do have the subsets, but you also have the thresholds 22:45:22 +1 for concrete examples ! 22:45:25 ack david-macdonald 22:45:50 Rachael and Shawn: decide to clarify the question further with examples 22:45:54 Action: Draw up examples of differences and revisit 22:45:58 seems like a very important question so appreciate cycling back 22:45:58 I think we can set the table at the very least. 22:46:14 Shawn: No resolution on that question, but we have next steps 22:47:25 or, providing a maturity model accrues X number of points 22:47:40 at any level 22:47:55 Rachael: Maturity Model is being created. 
Some options have placed it as the Gold level of WCAG 3.0 compliance, some have suggested it be separate from WCAG 3 compliance, and some have suggested it be just a part of what Gold is 22:48:50 Sheri: We have been working on a Maturity Model with 7 dimensions 22:48:59 q+ 22:49:01 ... each dimension = slice of behavior 22:49:23 q+ to ask how much overlap their work has with the WCAG 3 draft? 22:49:29 ... we have been making sure it will work regardless of WCAG level, regardless of the type of organization 22:49:44 q+ to reframe this as how an organization makes something, rather than looking at the thing made, itself? 22:49:49 ... each of the 7 dimensions is scored from 0 to 4 22:50:17 Q+ to note there is a non-trivial cost to documenting a maturity model, which is a negative for smaller businesses 22:50:20 Sheri, can you give an example of a dimension? Just name them? 22:50:42 ack shadi 22:50:48 Shadi: I think this is a great resource. 22:51:22 ... The issue for me is how the two connect. 22:51:50 ... it would be very difficult for a Web a11y auditor to get access to the information to support the model 22:52:01 +1 to Shadi 22:52:02 ... great work, but I feel it is a separate effort 22:52:08 +1 to Shadi 22:52:11 +1 to Shadi's comments. 22:52:19 ... some organizations also outsource all of this 22:52:20 +1 22:52:32 q+ 22:52:40 q- 22:52:43 Sheri: We have addressed the outsourcing issue within the model 22:53:17 ... I'm not sure how I feel about it - there is great value in it, but it also makes sense to publish on the side as a note 22:53:33 Shawn: Does the separate document mean it must be a note? 22:53:38 q+ 22:53:57 ack ala 22:53:57 alastairc, you wanted to ask how much overlap their work has with the WCAG 3 draft? 
22:53:57 ack alastairc 22:53:59 Michael Cooper/Sheri: There are a number of options, can decide later 22:54:33 @alastair pls post link in irc if you have it handy 22:54:37 It's WCAG agnostic 22:54:56 Alastair: Maturity model does not show a lot of overlap with WCAG 3. You mentioned it could work with WCAG 2 or WCAG 3. 22:55:19 rrsagent, make minutes 22:55:19 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html david-macdonald 22:55:25 Sheri: It was a deliberate choice to not tie it to WCAG 3 22:55:36 Apologies, I need to drop a bit early. 22:56:00 ... Since WCAG 3 will not *replace* WCAG 2, the maturity model is separate 22:56:06 ack JF 22:56:06 JF, you wanted to note there is a non-trivial cost to documenting a maturity model, which is a negative for smaller businesses 22:56:50 ack david-macdonald 22:56:56 draft straw poll: Publish maturity model as separate document 22:57:03 Jf: there is a non-trivial cost to documenting this; small businesses may not be able to do this; might make it a barrier to achieving gold. All for publishing it as very useful, but WCAG should measure outcome not process 22:57:41 ack Wilco 22:57:42 David: It doesn't seem that W3C could really tie conformance to culture change, HR policy, etc. Seems it needs to be separate document. 22:57:42 proposed Poll: Should the maturity model go out as a separate document? yes/no 22:58:00 At gold level, I think we could cross-reference and add points above the baseline of our usual testing. 22:58:01 q+ 22:58:53 Wilco: I really like this idea. There is a potential solution in this to many things that are not documented in an international standard - and will help make changes for their specific users. But, should be separate. 22:58:55 ack Ben 22:59:07 +1 Ben, my experience too 22:59:17 A note for future considerations. Can this be paired with the lower/nonconformant level? Could be interesting to discuss in the future. 
22:59:55 Ben Tillyer: In my experience it is possible to score highly in a maturity model, while also providing a low-accessibility product. I don't think we should boost a product's score based on a maturity model. 22:59:55 +1 to Ben’s concerns 22:59:59 Poll: Should the maturity model go out as a separate document? yes/no 23:00:17 yes 23:00:19 Yes 23:00:21 Sheri: One big motivator, though, is to ensure that a product can/will be expected to *remain* accessible 23:00:21 We have always talked about requiring bronze regardless of maturity model 23:00:24 yes 23:00:24 yes 23:00:25 Yes 23:00:28 yes 23:00:28 yes 23:00:28 Yes 23:00:29 yes 23:00:30 yes 23:00:30 Yes 23:00:30 yes 23:00:30 yes 23:00:32 yes 23:00:34 y 23:00:35 yes 23:00:36 separate 23:00:37 no because it needs the exposure of WCAG3 23:00:39 yes 23:00:42 +1 23:00:44 0 23:01:21 Shawn: We could publish as a separate document and then later have WCAG 3 reference it 23:01:25 we've not discussed Wilco's proposal for a suite of documents, so... 23:02:12 use "may"? 23:02:18 q+ 23:02:30 +1 to "may" 23:02:42 RESOLUTION: The maturity model may be published as a separate document. 23:02:53 "we will pursue publishing Maturity Model as a separate document" 23:03:11 good night to all 23:03:17 Good night 23:03:43 +1 to Jeanne's wording 23:03:46 +1 23:03:56 bye all 23:04:09 s /The maturity model may be published as a separate document./We will pursue publishing Maturity Model as a separate document./ 23:04:13 good work today. Thanks, all 23:04:28 Thank you all. That was a lot but made good progress. 23:04:39 bye all 23:04:42 sayonara 23:04:56 thank you!
23:05:08 rrsagent, make minutes 23:05:08 I have made the request to generate https://www.w3.org/2021/04/29-ag-minutes.html jeanne 23:40:15 list the questions 00:49:02 johnkirkwood has joined #AG 01:45:15 SuzanneTaylor has joined #ag 01:52:06 johnkirkwood has joined #AG 02:33:36 johnkirkwood has joined #AG 04:17:33 shawn has left #ag 06:44:45 shadi has joined #ag 10:13:34 jeanne has joined #ag